
Comments (11)

andravin commented on August 15, 2024

The authors released their reference implementation: https://github.com/facebookresearch/OctConv

from octaveconv.

PistonY commented on August 15, 2024

Hello, how do you handle a 7x7 feature map?
When the feature map is 7x7, Pooling(data, (2, 2), 'avg', stride=(2, 2)) reduces it to 3x3, and UpSampling(avg_pool, scale=2, sample_type='nearest', num_args=1) then produces 6x6 rather than 7x7. How do you handle this?
I read your code, and it seems that you don't use OctConv in the last residual block?
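The size mismatch described above can be reproduced with plain shape arithmetic. This is only a sketch: the helper names are made up for illustration, and it assumes the pooling uses the floor ("valid") output-size convention with no padding.

```python
def avg_pool_out(size, kernel=2, stride=2):
    """Output length of an unpadded pooling window (floor convention, assumed)."""
    return (size - kernel) // stride + 1


def nearest_up_out(size, scale=2):
    """Output length of nearest-neighbor upsampling by an integer scale."""
    return size * scale


for n in (14, 7):
    low = avg_pool_out(n)
    back = nearest_up_out(low)
    # An even size survives the round trip; an odd size loses one row/column.
    print(f"{n}x{n} -> {low}x{low} -> {back}x{back}")
```

For an even size such as 14 the round trip returns to 14, while 7 comes back as 6, so the upsampled low-frequency map can no longer be added to the 7x7 high-frequency map.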

andravin commented on August 15, 2024

@PistonY Looks like the authors just set alpha=0 in the last ResNet stages, so that the 7x7 feature map is not downsampled to a lower resolution:

        # ratio is forced to be 0. for the last stage
        # (because do 3x3 conv on 3.5x3.5 resolution map does not make sense)

https://github.com/facebookresearch/OctConv/blob/253139b1dc842f04030077d983b2cabb5a754b3a/utils/symbol/symbol_resnetv2.py#L82-L83
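As a rough illustration of what forcing the ratio to 0 in the last stage means, here is a small sketch. The function names, the alpha value of 0.25, and the rounding of the channel split are assumptions for illustration and are not taken from the reference code; the stage sizes and widths are the usual ResNet-50 ones for a 224x224 input.

```python
def stage_alphas(alpha, num_stages=4):
    """Use `alpha` in every stage except the last, where it is forced to 0."""
    return [alpha] * (num_stages - 1) + [0.0]


def split_channels(channels, alpha):
    """Return (high_frequency, low_frequency) channel counts for ratio alpha."""
    low = int(alpha * channels)  # rounding rule is an assumption
    return channels - low, low


stage_sizes = [56, 28, 14, 7]            # feature map side per stage
stage_channels = [256, 512, 1024, 2048]  # output width per stage

plan = []
for size, channels, a in zip(stage_sizes, stage_channels, stage_alphas(0.25)):
    high, low = split_channels(channels, a)
    plan.append((size, high, low))
    print(f"{size}x{size}: {high} high-frequency, {low} low-frequency channels")
```

With alpha=0 the last stage has no low-frequency channels at all, so the 7x7 map is never pooled to a half-resolution grid.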

deyituo commented on August 15, 2024

I searched GitHub for OctConv and found that the implementations are the same as this one. It seems that strided convolution and upsampling/downsampling need to be considered carefully when the size is odd. @andravin

andravin commented on August 15, 2024

I agree that we have to be careful when the strided convolution (or max-pool) filter size is odd, because the OctConv downsampling/upsampling filter size is even.

Just passing the stride to the HH, HL, LH, and LL convolutions inside OctConv would be the wrong thing to do (and at least one GitHub implementation does this), because it does not correct the half-pixel misalignment that is caused by downsampling with an even-sized filter.

Average pooling is the 2-tap filter [1 1]/2, and nearest neighbor upsampling is the filter [1 1]. Average pooling shifts the feature map by 1/2 pixel, and, with proper padding, nearest neighbor upsampling unshifts the feature map by the same amount.
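A 1-D ramp makes the half-pixel shift easy to see, because each sample's value equals its position. The sketch below is illustrative only; the plain subsampling path is added here as a contrast and is not something discussed above.

```python
def avg_pool_1d(x):
    """2-tap average pooling [1 1]/2 with stride 2."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def subsample_1d(x):
    """Stride-2 subsampling without any filter (contrast case)."""
    return x[::2]


def nearest_upsample_1d(x):
    """Nearest-neighbor upsampling by 2, i.e. the filter [1 1]."""
    return [v for v in x for _ in range(2)]


def mean_offset(y, x):
    """Average signed difference between two equally long signals."""
    return sum(a - b for a, b in zip(y, x)) / len(x)


ramp = [float(i) for i in range(8)]

pooled = avg_pool_1d(ramp)  # each value is 2k + 0.5: shifted by half a pixel
round_trip = nearest_upsample_1d(pooled)
naive = nearest_upsample_1d(subsample_1d(ramp))

print(pooled)
print(mean_offset(round_trip, ramp))  # pooling + upsampling: no net shift
print(mean_offset(naive, ramp))       # subsampling + upsampling: biased
```

The pooled samples sit at positions 2k + 0.5, and nearest-neighbor upsampling spreads each one over positions 2k and 2k + 1, so the round trip has no net offset. Subsampling followed by the same upsampling leaves a half-pixel bias.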

Of course, the authors were not required to choose even-size downsampling/upsampling filters. Laplacian pyramids almost always use odd-size filters so that there are no half-pixel phase shifts between layers.
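For comparison, here is a sketch of stride-2 downsampling with an odd-size filter. The 3-tap binomial kernel [1 2 1]/4 and the reflect padding are assumptions chosen for illustration; they are not taken from the paper or from any particular pyramid implementation.

```python
def binomial_downsample_1d(x):
    """Filter with [1 2 1]/4, then keep every second sample (reflect padding)."""
    padded = [x[1]] + list(x) + [x[-2]]
    return [
        (padded[i] + 2 * padded[i + 1] + padded[i + 2]) / 4
        for i in range(0, len(x), 2)
    ]


ramp = [float(i) for i in range(8)]  # value equals position
low = binomial_downsample_1d(ramp)

# Away from the reflected border, sample k lands exactly on position 2k.
print(low[1:])
```

On the ramp, the interior outputs are exactly 2k, so the low-resolution grid stays aligned with the even samples of the high-resolution grid. Only the first sample is affected by the reflected border.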

There are still different choices you could make when implementing strided OctConv, and we could discuss each of them. But we still would not know exactly what the authors did and would not be confident that we were reproducing the paper accurately.

savourylie commented on August 15, 2024

I believe it's stated in this part of the paper,

However, since the index of X_H can only be an integer, we could either round the index to (2∗p+i, 2∗q+j) or approximate the value at (2∗p+0.5+i, 2∗q+0.5+j) by averaging all 4 adjacent locations. The first one is also known as strided convolution and the second one as average pooling. As we discuss in Section 3.3 and Fig. 3, strided convolution leads to misalignment; we therefore use average pooling to approximate this value for the rest of the paper.

that average pooling is chosen over strided convolution for the rest of the paper. Please also see ThoroughImages/OctConv#1 (comment)

andravin commented on August 15, 2024

That does not actually say how they port Conv2d(stride=2) to OctConv(stride=2), which is the subject of this issue.

It is true however that mixing Conv2d(kernel_size=3, stride=2) with AvgPool(kernel_size=2, stride=2) is not good, because this creates 2 different half-scale grids that are shifted relative to each other.

So the authors probably did replace Conv2d(stride=2) with something like AvgPool(stride=2)->Conv2d(stride=1), as this repo does, but the exact formulation is not documented in the paper.
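The grid argument in the last two paragraphs can be checked by computing where each output sample is centered in input coordinates. This is a sketch under assumptions: the helper names are hypothetical, and padding=1 for the 3x3 convolutions is assumed since the thread does not state it.

```python
def to_input_coords(c, kernel, stride, padding=0):
    """Map an output-grid coordinate to the center of its input window."""
    return c * stride - padding + (kernel - 1) / 2


def window_centers(n_in, kernel, stride, padding=0):
    """Input-coordinate centers of every output sample along one axis."""
    n_out = (n_in + 2 * padding - kernel) // stride + 1
    return [to_input_coords(k, kernel, stride, padding) for k in range(n_out)]


n = 8
# Conv2d(kernel_size=3, stride=2), assuming padding=1
strided_conv = window_centers(n, kernel=3, stride=2, padding=1)
# AvgPool(kernel_size=2, stride=2)
avg_pool = window_centers(n, kernel=2, stride=2)
# AvgPool(stride=2) -> Conv2d(kernel_size=3, stride=1), assuming padding=1
pool_then_conv = [
    to_input_coords(c, kernel=2, stride=2)
    for c in window_centers(len(avg_pool), kernel=3, stride=1, padding=1)
]

print(strided_conv)    # centered on even pixels
print(avg_pool)        # centered half a pixel later
print(pool_then_conv)  # same grid as avg_pool
```

The strided convolution samples the grid 0, 2, 4, 6 while average pooling samples 0.5, 2.5, 4.5, 6.5, which is the half-pixel disagreement. Pooling first and then convolving with stride 1 keeps everything on the pooled grid.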

PistonY commented on August 15, 2024

@andravin Yeah, thanks. So it's not a totally plug-and-play "tool", and it's not suitable for feature maps with an odd side length.

andravin commented on August 15, 2024

@PistonY I guess .. in my experiments, I avoided this issue by training with 256x256 input image resolution, so all layer sizes are a power of 2. If the filter was adapted to take the image boundary into account, one could downsample from 7x7 to 4x4. Or maybe it would make more sense to downsample the last 14x14 layer to 8x8.
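One possible reading of "adapted to take the image boundary into account" is a pooling that lets the last window hang over the edge and averages only the pixels that exist, with the upsampled result cropped back to the original size. The sketch below is just that one interpretation, with hypothetical helper names, and is not what the paper or the reference implementation does.

```python
def avg_pool_2x2_ceil(x):
    """2x2 average pooling, stride 2; partial border windows average
    only the pixels that actually exist (assumed boundary rule)."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            window = [
                x[a][b]
                for a in range(i, min(i + 2, h))
                for b in range(j, min(j + 2, w))
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def nearest_upsample_crop(x, h, w):
    """Nearest-neighbor upsampling by 2, cropped to h x w."""
    return [[x[i // 2][j // 2] for j in range(w)] for i in range(h)]


fmap = [[float(i * 7 + j) for j in range(7)] for i in range(7)]
pooled = avg_pool_2x2_ceil(fmap)                 # 7x7 -> 4x4
restored = nearest_upsample_crop(pooled, 7, 7)   # 4x4 -> 7x7

print(len(pooled), len(pooled[0]))
print(len(restored), len(restored[0]))
```

With this rule the low-frequency map is 4x4, and its upsampled version matches the 7x7 high-frequency map again, at the cost of border samples that average fewer pixels than interior ones.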

andravin commented on August 15, 2024

Closing this issue, because we can now look at the reference implementation to see how OctConv implements strided convolutions: https://github.com/facebookresearch/OctConv

jackchinor commented on August 15, 2024

Hello, how do you handle a 7x7 feature map?
When the feature map is 7x7, Pooling(data, (2, 2), 'avg', stride=(2, 2)) reduces it to 3x3, and UpSampling(avg_pool, scale=2, sample_type='nearest', num_args=1) then produces 6x6 rather than 7x7. How do you handle this?
I read your code, and it seems that you don't use OctConv in the last residual block?

@PistonY Have you solved this problem? I ran into the same problem when dealing with the last octave conv layer, and I don't know how to handle it.
