terrychenism / octaveconv
An MXNet Implementation of Drop an Octave
Thank you very much. I would welcome it if you could share a 1D version of OctConv as well.
Hi, and thanks for this repo.
I want to know whether OctConv is really faster than vanilla convolution in a typical GPU environment.
Could you provide the elapsed training time, or sec/epoch, for each model (ResNet-v1-50 and OctResNet-v1-50)?
On super-resolution tasks, I replaced the ordinary convolutional layers with octave convolution, but both the loss and the computation time became larger. What could be the cause?
First of all, I appreciate the author's open-source spirit. Great idea!
I know that the low and high frequencies are divided according to a given ratio, but how are they divided along the channel dimension? Are the channels assigned randomly according to that ratio, or is there some other preprocessing involved?
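For what it's worth, a common reading of the paper (and of typical implementations) is that the split is simply a fixed, contiguous slice along the channel axis: the first `round(alpha * C)` channels become the low-frequency group and the rest the high-frequency group, with no shuffling. The sketch below illustrates that assumption in plain NumPy; `split_channels` and the shapes are hypothetical, not the repo's actual API:

```python
import numpy as np

def split_channels(x, alpha=0.25):
    """Partition a feature map (N, C, H, W) into high/low frequency groups.

    Assumed behavior: a fixed, contiguous slice along the channel axis --
    the first round(alpha * C) channels become the low-frequency group.
    No random assignment or preprocessing; the convolution weights simply
    learn features appropriate to each group.
    """
    c_low = int(round(alpha * x.shape[1]))
    x_low, x_high = x[:, :c_low], x[:, c_low:]
    # In OctConv the low-frequency group is additionally kept at half
    # spatial resolution (2x2 average pooling), reproduced here.
    n, c, h, w = x_low.shape
    x_low = x_low.reshape(n, c, h // 2, 2, w // 2, 2).mean(axis=(3, 5))
    return x_high, x_low

x = np.random.randn(1, 16, 8, 8)
x_high, x_low = split_channels(x, alpha=0.25)
print(x_high.shape, x_low.shape)  # (1, 12, 8, 8) (1, 4, 4, 4)
```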
Thank you very much.
Line 46 in 5fa78ae
It is not clear to me from reading the paper how OctConv implements strided convolutions. Because the paper is not explicit about this, it would be great if we could get clarification from the authors.
This also leads to a deeper conversation about how OctConv is handling downsampling / upsampling, which maybe should be had here, because this github repo is the most popular implementation.
Most readers seem to assume that OctConv implements strided convolution the same way the downsampling operation is implemented: average pooling followed by a regular convolution. This might be true, but I cannot see where the paper actually says so.
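To make that assumed reading concrete, here is a minimal NumPy sketch of "average-pool first, then a regular stride-1 convolution" standing in for a true stride-2 convolution. This is an illustration of the interpretation under discussion, not the repo's actual code path:

```python
import numpy as np

def avg_pool2x2(x):
    """2x2 average pooling with stride 2 on a (C, H, W) map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def conv3x3(x, w):
    """Plain stride-1 3x3 convolution with padding 1; w is (Cout, Cin, 3, 3)."""
    cin, h, wid = x.shape
    cout = w.shape[0]
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((cout, h, wid))
    for o in range(cout):
        for i in range(h):
            for j in range(wid):
                out[o, i, j] = np.sum(w[o] * xp[:, i:i + 3, j:j + 3])
    return out

def strided_octconv_path(x, w):
    """Hypothetical strided path: pool to half resolution, then convolve.
    This matches the downsampling operator OctConv uses elsewhere, but the
    paper never states explicitly that strides are handled this way."""
    return conv3x3(avg_pool2x2(x), w)

x = np.random.randn(3, 8, 8)
w = np.random.randn(5, 3, 3, 3)
print(strided_octconv_path(x, w).shape)  # (5, 4, 4)
```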
Adding to the confusion, I think this premise from the paper is wrong:
shifting the location by half step is to ensure the down-sampled maps well aligned with the input
One reason the downsampled maps would be shifted by a half step relative to the full-resolution input is if they were produced by convolution with an even-sized filter, such as 2x2 average pooling with stride 2x2. If instead the downsampled maps were produced by convolution with an odd-sized filter, such as a 3x3 convolution with padding = 1, then the output would be perfectly aligned with the input.
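This even-vs-odd filter argument can be checked with simple arithmetic: for kernel size k, stride s, and padding p, output sample i is centered at input coordinate i*s - p + (k-1)/2. The snippet below (a standalone check, not from the repo) shows the half-step offset for 2x2 pooling and perfect alignment for a 3x3 stride-2 convolution with padding 1:

```python
def output_centers(k, stride, pad, n_out):
    """Center, in input pixel coordinates, of each output sample of a
    convolution with kernel size k, given stride and padding."""
    return [i * stride - pad + (k - 1) / 2 for i in range(n_out)]

# 2x2 average pooling, stride 2, no padding: centers fall between input
# pixels, i.e. the downsampled map is shifted half a step.
print(output_centers(2, 2, 0, 4))  # [0.5, 2.5, 4.5, 6.5]

# 3x3 convolution, stride 2, padding 1: centers land exactly on input
# pixels, so the downsampled map stays aligned with the input.
print(output_centers(3, 2, 1, 4))  # [0.0, 2.0, 4.0, 6.0]
```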
Another cause of feature misalignment is "valid" padding, in other words, when padding is zero, which shifts the alignment of the outputs relative to the inputs. This is actually the cause of misalignment corrected by the "Deformable Convolutional Networks" paper: [1]
In the original Inception-ResNet [51] architecture, multiple layers of valid convolution/pooling are utilized, which brings feature alignment issues for dense prediction tasks.
[...]
Aligned-Inception-ResNet does not have the feature alignment problem, by proper padding in convolutional and pooling layers
That is actually the citation used by the OctConv paper to make this claim:
we adopt average pooling to get more accurate approximation. This helps alleviate misalignments that appear when aggregating information from different scales
So that appears to be a misattribution, because the "Deformable Convnets" paper was actually addressing an alignment problem caused by valid convolutions used by Inception-Resnet.
Figure 3 from the paper is misleading. The strided convolution with 3x3 filter does not cause misalignment. It is the subsequent up-sampling using nearest neighbor interpolation that causes it. Nearest neighbor is equivalent to upsampling with the filter [1 1] which has an even number of taps. If the upsampling instead used a 3-tap or 5-tap interpolation filter, the upsampled feature map would remain aligned with the original.
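The even-tap versus odd-tap upsampling claim can also be demonstrated directly. Nearest-neighbor duplication corresponds to the 2-tap filter [1, 1], while a 3-tap filter such as [0.5, 1, 0.5] (linear interpolation) reproduces on-grid samples exactly. A self-contained NumPy demonstration (not code from this repo):

```python
import numpy as np

def nn_upsample(x):
    """Nearest-neighbor 2x upsampling: each sample is duplicated, which is
    convolution of the zero-stuffed signal with the even 2-tap filter [1, 1].
    The result is shifted half a step relative to the original grid."""
    return np.repeat(x, 2)

def linear_upsample(x):
    """2x upsampling with the odd 3-tap filter [0.5, 1, 0.5] (linear
    interpolation): on-grid samples are reproduced exactly, so the result
    stays aligned with the original grid."""
    up = np.zeros(2 * len(x))
    up[::2] = x  # zero-stuffing
    return np.convolve(up, np.array([0.5, 1.0, 0.5]), mode="same")

x = np.array([0.0, 1.0, 2.0, 3.0])
print(nn_upsample(x))       # [0. 0. 1. 1. 2. 2. 3. 3.]
print(linear_upsample(x))   # even positions hold the original samples
```

Note that `linear_upsample(x)[::2]` equals `x` exactly, whereas the nearest-neighbor output has no subsampling phase that recovers the original grid positions, which is the misalignment at issue.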
This is the reason that Laplacian Pyramids usually use odd-sized filters, to keep the multiple layers aligned.
There are good engineering reasons for using the simple 2-tap filters (average pooling and nearest-neighbor interpolation): they are fast, they are already implemented by the major frameworks, and they do not require any padding. So I am not questioning these design choices, just the justification given for them.
References:
[1] Dai, Jifeng, et al. "Deformable Convolutional Networks." arXiv preprint arXiv:1703.06211 (2017).
https://arxiv.org/abs/1703.06211
dear author,
How should a depthwise conv be handled when it is replaced by an octave conv at the last OctConv layer? The num_filter and num_group don't match. How can the high-frequency and low-frequency channels be merged into one symbol using a depthwise octave conv? Thanks.
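For context, the paper's last OctConv layer sets the output low-frequency ratio to zero: each branch is convolved to the full output channel count, the low branch is upsampled, and the two are summed into a single full-resolution tensor. The sketch below shows only that merge step in NumPy (the per-branch convolutions are omitted); it is an illustration of the paper's formulation, not this repo's depthwise implementation, and the depthwise num_group mismatch arises because each branch needs its own grouping:

```python
import numpy as np

def nn_upsample2d(x):
    """Nearest-neighbor 2x spatial upsampling on a (C, H, W) map."""
    return np.repeat(np.repeat(x, 2, axis=1), 2, axis=2)

def last_octconv_merge(y_high, y_low):
    """Merge step of a final OctConv layer (alpha_out = 0), per the paper:
    Y = f_HH(X_H) + upsample(f_LH(X_L)). Here y_high and y_low stand for
    the already-convolved branch outputs, both with the full output channel
    count, so they can be summed after upsampling the low branch."""
    return y_high + nn_upsample2d(y_low)

y_high = np.random.randn(12, 8, 8)
y_low = np.random.randn(12, 4, 4)
print(last_octconv_merge(y_high, y_low).shape)  # (12, 8, 8)
```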
How do I use hf_ch_in?