

octaveconv's Issues

1d version

Thank you very much. I would welcome it if you could share the 1D version of OctConv as well.

Super resolution result

On super-resolution tasks, I replaced the ordinary convolutional layers with octave convolutions, but both the loss and the computation time increased. What could be the cause?

How does OctConv really implement strided convolutions

First of all, I appreciate the author's open-source spirit. Great idea!

I know that the low- and high-frequency features are divided according to a given proportion, but how are they divided along the channel dimension? Is the split random according to that proportion, or is there some other pre-processing involved?

Thank you very much.

How does OctConv really implement strided convolutions

if stride == (2, 2):

It is not clear to me from reading the paper how OctConv implements strided convolutions. Because the paper is not explicit about this, it would be great if we could get clarification from the authors.

This also leads to a deeper conversation about how OctConv handles downsampling / upsampling, which perhaps should happen here, because this GitHub repo is the most popular implementation.

It seems most readers assume that OctConv implements strided convolution the same way the downsampling operation is implemented: average pooling followed by a regular convolution. This might be true, but I cannot see where the paper actually says so.
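Under that assumed reading (which, again, the paper does not state explicitly), the stride-2 path would look something like this minimal NumPy sketch. The function names are illustrative and not taken from the repo:

```python
import numpy as np

def avg_pool2x2(x):
    """2x2 average pooling with stride 2 on an (H, W) map (H, W even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def conv3x3_same(x, k):
    """3x3 cross-correlation with zero padding ('same', stride 1)."""
    p = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def strided_oct_branch(x, k):
    # Assumed interpretation: a stride-2 OctConv branch first average-pools
    # the input, then applies the regular stride-1 convolution.
    return conv3x3_same(avg_pool2x2(x), k)

x = np.arange(64, dtype=float).reshape(8, 8)
k = np.full((3, 3), 1 / 9)  # box filter, just for the demo
y = strided_oct_branch(x, k)  # output is (4, 4): spatially halved
```

This composition (pool, then convolve) is what the half-step alignment discussion below is about, since the 2x2 pooling window is the even-sized filter in the chain.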

Adding to the confusion, I think this premise from the paper is wrong:

shifting the location by half step is to ensure the down-sampled maps well aligned with the input

One reason the downsampled maps would be shifted by a half step relative to the full resolution input would be if they were produced by convolution with an even-sized filter, such as 2x2 average pooling with stride 2x2. If instead the downsampled maps were produced by convolution with an odd sized filter, such as a 3x3 convolution with padding = 1, then the output would be perfectly aligned with the input.

Another cause of feature misalignment is "valid" padding, in other words, when padding is zero, which shifts the alignment of the outputs relative to the inputs. This is actually the cause of misalignment corrected by the "Deformable Convolutional Neural Networks" paper: [1]

In the original Inception-ResNet [51] architecture, multiple layers of valid convolution/pooling are utilized, which brings feature alignment issues for dense prediction tasks.

[...]

Aligned-Inception-ResNet does not have the feature alignment problem, by proper padding in convolutional and pooling layers

That is actually the citation used by the OctConv paper to make this claim:

we adopt average pooling to get more accurate approximation. This helps alleviate misalignments that appear when aggregating information from different scales

So that appears to be a misattribution, because the "Deformable ConvNets" paper was actually addressing an alignment problem caused by valid convolutions used in Inception-ResNet.

Figure 3 from the paper is misleading. The strided convolution with 3x3 filter does not cause misalignment. It is the subsequent up-sampling using nearest neighbor interpolation that causes it. Nearest neighbor is equivalent to upsampling with the filter [1 1] which has an even number of taps. If the upsampling instead used a 3-tap or 5-tap interpolation filter, the upsampled feature map would remain aligned with the original.
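The equivalence claimed above can be verified numerically: nearest-neighbor 2x upsampling is exactly zero-stuffing followed by convolution with the even 2-tap filter [1, 1], while an odd 3-tap filter keeps the original samples in place. A quick NumPy check (my own sketch, not repo code):

```python
import numpy as np

x = np.array([0., 1., 2., 3.])

# Nearest-neighbor 2x upsampling: each sample is duplicated.
nn = np.repeat(x, 2)

# Same result via zero-stuffing + the 2-tap filter [1, 1]. An
# even-length filter has its centroid at a half-integer position,
# hence the half-step shift discussed above.
stuffed = np.zeros(2 * len(x))
stuffed[::2] = x
even = np.convolve(stuffed, [1., 1.])[:2 * len(x)]

# With an odd 3-tap filter (linear interpolation), the even-indexed
# outputs reproduce the original samples exactly: no shift.
odd = np.convolve(stuffed, [0.5, 1., 0.5])[1:1 + 2 * len(x)]
```

Here `even` equals `nn` exactly, while `odd[::2]` equals `x`, showing that the misalignment comes from the even-tap interpolation filter, not from the strided convolution itself.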

This is the reason that Laplacian Pyramids usually use odd-sized filters, to keep the multiple layers aligned.

There are good engineering reasons for using the simple 2-tap filters (average pooling and nearest-neighbor interpolation): they are fast, they are already implemented by the major frameworks, and they do not require any padding. So I am not questioning these design choices, just the justification given for them.

References:
[1] Dai, Jifeng, et al. "Deformable Convolutional Networks." arXiv preprint arXiv:1703.06211 (2017).
https://arxiv.org/abs/1703.06211

about last layer of octconv in mobilenetv2

Dear author,
How should the depthwise conv replaced by octave conv be handled at the last octave conv layer? The num_filter and num_group don't match. How can the high-frequency and low-frequency channels be merged into one symbol using a depthwise octave conv? Thanks.
