terrychenism / octaveconv
An MXNet Implementation of Drop an Octave
Thank you very much. I would welcome it if you could share a 1D version of OctConv as well.
Hi, and thanks for this repo.
I want to know whether OctConv is really faster than vanilla convolution in a typical GPU environment.
Could you provide the elapsed training time, or sec/epoch, for each model (ResNet-v1-50 and OctResNet-v1-50)?
On super-resolution tasks, I replaced the ordinary convolutional layers with octave convolution, but both the loss and the computation time became larger. What could be the cause?
First of all, I appreciate the author's open-source spirit. Great idea!
I know that the low and high frequencies are divided according to a given ratio, but how are they divided along the channel dimension? Are the channels assigned randomly according to that ratio, or is there some other preprocessing involved?
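For what it's worth, a common reading of the paper (and of typical implementations) is that the split is simply a fixed, contiguous slice along the channel axis: the first `round(alpha * C)` channels become the low-frequency group and the rest the high-frequency group, with no shuffling. The sketch below illustrates that assumption in plain NumPy; `split_channels` and the shapes are hypothetical, not the repo's actual API:

```python
import numpy as np

def split_channels(x, alpha=0.25):
    """Partition a feature map (N, C, H, W) into high/low frequency groups.

    Assumed behavior: a fixed, contiguous slice along the channel axis --
    the first round(alpha * C) channels become the low-frequency group.
    No random assignment or preprocessing; the convolution weights simply
    learn features appropriate to each group.
    """
    c_low = int(round(alpha * x.shape[1]))
    x_low, x_high = x[:, :c_low], x[:, c_low:]
    # In OctConv the low-frequency group is additionally kept at half
    # spatial resolution (2x2 average pooling), reproduced here.
    n, c, h, w = x_low.shape
    x_low = x_low.reshape(n, c, h // 2, 2, w // 2, 2).mean(axis=(3, 5))
    return x_high, x_low

x = np.random.randn(1, 16, 8, 8)
x_high, x_low = split_channels(x, alpha=0.25)
print(x_high.shape, x_low.shape)  # (1, 12, 8, 8) (1, 4, 4, 4)
```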
Thank you very much.
Line 46 in 5fa78ae
It is not clear to me from reading the paper how OctConv implements strided convolutions. Because the paper is not explicit about this, it would be great if we could get clarification from the authors.
This also leads to a deeper conversation about how OctConv is handling downsampling / upsampling, which maybe should be had here, because this github repo is the most popular implementation.
Most readers seem to assume that OctConv implements strided convolution the same way the downsampling operation is implemented: average pooling followed by a regular convolution. This might be true, but I cannot see where the paper actually says so.
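To make that assumed reading concrete, here is a minimal NumPy sketch of "average-pool first, then a regular stride-1 convolution" standing in for a true stride-2 convolution. This is an illustration of the interpretation under discussion, not the repo's actual code path:

```python
import numpy as np

def avg_pool2x2(x):
    """2x2 average pooling with stride 2 on a (C, H, W) map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def conv3x3(x, w):
    """Plain stride-1 3x3 convolution with padding 1; w is (Cout, Cin, 3, 3)."""
    cin, h, wid = x.shape
    cout = w.shape[0]
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((cout, h, wid))
    for o in range(cout):
        for i in range(h):
            for j in range(wid):
                out[o, i, j] = np.sum(w[o] * xp[:, i:i + 3, j:j + 3])
    return out

def strided_octconv_path(x, w):
    """Hypothetical strided path: pool to half resolution, then convolve.
    This matches the downsampling operator OctConv uses elsewhere, but the
    paper never states explicitly that strides are handled this way."""
    return conv3x3(avg_pool2x2(x), w)

x = np.random.randn(3, 8, 8)
w = np.random.randn(5, 3, 3, 3)
print(strided_octconv_path(x, w).shape)  # (5, 4, 4)
```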
Adding to the confusion, I think this premise from the paper is wrong:
shifting the location by half step is to ensure the down-sampled maps well aligned with the input
One reason the downsampled maps would be shifted by a half step relative to the full-resolution input is if they were produced by convolution with an even-sized filter, such as 2x2 average pooling with stride 2x2. If instead the downsampled maps were produced by convolution with an odd-sized filter, such as a 3x3 convolution with padding = 1, then the output would be perfectly aligned with the input.
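This even-vs-odd filter argument can be checked with simple arithmetic: for kernel size k, stride s, and padding p, output sample i is centered at input coordinate i*s - p + (k-1)/2. The snippet below (a standalone check, not from the repo) shows the half-step offset for 2x2 pooling and perfect alignment for a 3x3 stride-2 convolution with padding 1:

```python
def output_centers(k, stride, pad, n_out):
    """Center, in input pixel coordinates, of each output sample of a
    convolution with kernel size k, given stride and padding."""
    return [i * stride - pad + (k - 1) / 2 for i in range(n_out)]

# 2x2 average pooling, stride 2, no padding: centers fall between input
# pixels, i.e. the downsampled map is shifted half a step.
print(output_centers(2, 2, 0, 4))  # [0.5, 2.5, 4.5, 6.5]

# 3x3 convolution, stride 2, padding 1: centers land exactly on input
# pixels, so the downsampled map stays aligned with the input.
print(output_centers(3, 2, 1, 4))  # [0.0, 2.0, 4.0, 6.0]
```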
Another cause of feature misalignment is "valid" padding, in other words, when padding is zero, which shifts the alignment of the outputs relative to the inputs. This is actually the cause of misalignment corrected by the "Deformable Convolutional Networks" paper: [1]
In the original Inception-ResNet [51] architecture, multiple layers of valid convolution/pooling are utilized, which brings feature alignment issues for dense prediction tasks.
[...]
Aligned-Inception-ResNet does not have the feature alignment problem, by proper padding in convolutional and pooling layers
That is actually the citation used by the OctConv paper to make this claim:
we adopt average pooling to get more accurate approximation. This helps alleviate misalignments that appear when aggregating information from different scales
So that appears to be a misattribution, because the "Deformable Convnets" paper was actually addressing an alignment problem caused by valid convolutions used by Inception-Resnet.
Figure 3 from the paper is misleading. The strided convolution with 3x3 filter does not cause misalignment. It is the subsequent up-sampling using nearest neighbor interpolation that causes it. Nearest neighbor is equivalent to upsampling with the filter [1 1] which has an even number of taps. If the upsampling instead used a 3-tap or 5-tap interpolation filter, the upsampled feature map would remain aligned with the original.
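The even-tap versus odd-tap upsampling claim can also be demonstrated directly. Nearest-neighbor duplication corresponds to the 2-tap filter [1, 1], while a 3-tap filter such as [0.5, 1, 0.5] (linear interpolation) reproduces on-grid samples exactly. A self-contained NumPy demonstration (not code from this repo):

```python
import numpy as np

def nn_upsample(x):
    """Nearest-neighbor 2x upsampling: each sample is duplicated, which is
    convolution of the zero-stuffed signal with the even 2-tap filter [1, 1].
    The result is shifted half a step relative to the original grid."""
    return np.repeat(x, 2)

def linear_upsample(x):
    """2x upsampling with the odd 3-tap filter [0.5, 1, 0.5] (linear
    interpolation): on-grid samples are reproduced exactly, so the result
    stays aligned with the original grid."""
    up = np.zeros(2 * len(x))
    up[::2] = x  # zero-stuffing
    return np.convolve(up, np.array([0.5, 1.0, 0.5]), mode="same")

x = np.array([0.0, 1.0, 2.0, 3.0])
print(nn_upsample(x))       # [0. 0. 1. 1. 2. 2. 3. 3.]
print(linear_upsample(x))   # even positions hold the original samples
```

Note that `linear_upsample(x)[::2]` equals `x` exactly, whereas the nearest-neighbor output has no subsampling phase that recovers the original grid positions, which is the misalignment at issue.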
This is the reason that Laplacian Pyramids usually use odd-sized filters, to keep the multiple layers aligned.
There are good engineering reasons for using the simple 2-tap filters (average pooling and nearest-neighbor interpolation): they are fast, they are already implemented by the major frameworks, and they do not require any padding. So I am not questioning these design choices, just the justification given for them.
References:
[1] Dai, Jifeng, et al. "Deformable Convolutional Networks." arXiv preprint arXiv:1703.06211 (2017).
https://arxiv.org/abs/1703.06211
dear author,
How should a depthwise conv be handled when it is replaced by an octave conv at the last OctConv layer? The num_filter and num_group don't match. How can the high-frequency and low-frequency channels be merged into one symbol using a depthwise octave conv? Thanks.
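For context, the paper's last OctConv layer sets the output low-frequency ratio to zero: each branch is convolved to the full output channel count, the low branch is upsampled, and the two are summed into a single full-resolution tensor. The sketch below shows only that merge step in NumPy (the per-branch convolutions are omitted); it is an illustration of the paper's formulation, not this repo's depthwise implementation, and the depthwise num_group mismatch arises because each branch needs its own grouping:

```python
import numpy as np

def nn_upsample2d(x):
    """Nearest-neighbor 2x spatial upsampling on a (C, H, W) map."""
    return np.repeat(np.repeat(x, 2, axis=1), 2, axis=2)

def last_octconv_merge(y_high, y_low):
    """Merge step of a final OctConv layer (alpha_out = 0), per the paper:
    Y = f_HH(X_H) + upsample(f_LH(X_L)). Here y_high and y_low stand for
    the already-convolved branch outputs, both with the full output channel
    count, so they can be summed after upsampling the low branch."""
    return y_high + nn_upsample2d(y_low)

y_high = np.random.randn(12, 8, 8)
y_low = np.random.randn(12, 4, 4)
print(last_octconv_merge(y_high, y_low).shape)  # (12, 8, 8)
```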
How do I use hf_ch_in?