jaxony / shufflenet

ShuffleNet in PyTorch. Based on https://arxiv.org/abs/1707.01083

License: MIT License

Python 100.00%

pytorch deep-learning neural-network artificial-intelligence convolution

shufflenet's Introduction

ShuffleNet in PyTorch

An implementation of ShuffleNet in PyTorch. ShuffleNet is an efficient convolutional neural network architecture for mobile devices. According to the paper, it outperforms Google's MobileNet by a small percentage.

What is ShuffleNet?

In one sentence, ShuffleNet is a ResNet-like model that uses residual blocks (called ShuffleUnits), with the main innovation being the use of pointwise, or 1x1, group convolutions as opposed to normal pointwise convolutions.
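To make the saving from grouping concrete, the sketch below counts the weights of a 1x1 convolution with and without groups. It also computes the channel order produced by the channel shuffle operation that the paper pairs with grouped convolutions. The function names and the example channel counts are illustrative assumptions, not code taken from model.py.

```python
def pointwise_conv_weights(in_channels, out_channels, groups=1):
    """Number of weights in a 1x1 convolution (bias excluded)."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    # Each group maps in_channels/groups inputs to out_channels/groups outputs.
    return groups * (in_channels // groups) * (out_channels // groups)


def channel_shuffle_order(num_channels, groups):
    """Source channel index for each output channel after a channel shuffle."""
    if num_channels % groups:
        raise ValueError("num_channels must be divisible by groups")
    per_group = num_channels // groups
    # View the channels as a (groups, per_group) grid, transpose it, and flatten.
    return [g * per_group + i for i in range(per_group) for g in range(groups)]


dense = pointwise_conv_weights(240, 240)              # ordinary pointwise conv
grouped = pointwise_conv_weights(240, 240, groups=3)  # grouped pointwise conv
print(dense, grouped)                # 57600 19200
print(channel_shuffle_order(6, 3))   # [0, 2, 4, 1, 3, 5]
```

With 3 groups the pointwise layer needs a third of the weights, and the shuffle interleaves the groups so that the next grouped layer sees channels from every group. In PyTorch the same reordering is commonly written as a view, transpose, and reshape on the channel dimension.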

Usage

Clone the repo:

git clone https://github.com/jaxony/ShuffleNet.git

Use the model defined in model.py:

from model import ShuffleNet

# running on MNIST
net = ShuffleNet(num_classes=10, in_channels=1)

Performance

Trained on ImageNet (using the PyTorch ImageNet example) with groups=3 and no channel multiplier. On the test set, the model reached 62.2% top-1 and 84.2% top-5 accuracy, i.e. 37.8% top-1 error. Unfortunately, this isn't directly comparable to Table 5 of the paper, which reports error rates and does not include a network with these settings, but it falls between the network with groups=3 and half the number of channels (42.8% top-1 error) and the network with the same number of channels but groups=8 (32.4% top-1 error). The pretrained state dictionary can be found here, in the following format:

{
    'epoch': epoch + 1,
    'arch': args.arch,
    'state_dict': model.state_dict(),
    'best_prec1': best_prec1,
    'optimizer' : optimizer.state_dict()
}
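A checkpoint in this format wraps the weights under the 'state_dict' key, so they have to be pulled out before calling load_state_dict. The sketch below assumes the keys may carry a 'module.' prefix, as happens when a model is saved while wrapped in nn.DataParallel. The helper name, the key names, and the stand-in checkpoint are hypothetical.

```python
def extract_state_dict(checkpoint, prefix="module."):
    """Return the model weights from a checkpoint dict, dropping `prefix` from keys."""
    state_dict = checkpoint["state_dict"]
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Stand-in checkpoint with the same top-level keys as the format above.
# The values are placeholders, not real weights.
checkpoint = {
    "epoch": 1,
    "arch": "shufflenet",
    "state_dict": {"module.conv1.weight": [0.0], "fc.bias": [0.0]},
    "best_prec1": 0.0,
    "optimizer": {},
}
weights = extract_state_dict(checkpoint)
print(sorted(weights))  # ['conv1.weight', 'fc.bias']

# With the real file (the path is a placeholder):
#   checkpoint = torch.load("path/to/checkpoint", map_location="cpu")
#   net.load_state_dict(extract_state_dict(checkpoint))
```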

Note: the model was trained with the default settings of the PyTorch ImageNet example, which differ from the training regime described in the paper. A rerun with the paper's settings (and groups=8) is pending.

shufflenet's Issues

Pretrained Model Cannot Be Opened

The tar file you provided cannot be opened.
Could you provide the raw .pth file?

the pretrained state dictionary

I downloaded the pretrained state dictionary, but I cannot open it.

Question about the out_channels

Hi,
May I ask you a question? When the stride of the shuffle unit is 2, why is in_channels subtracted from out_channels?
The code in ShuffleUnit:

# ensure output of concat has the same channels as 
# original output channels.
self.out_channels -= self.in_channels
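One way to read this line: with stride 2 the unit concatenates the shortcut with the residual branch instead of adding them, so the branch only needs to supply the channels that the shortcut does not. The sketch below checks that arithmetic. The function name and the combine mode strings are assumptions, and 24 and 240 are used only as example widths (the stage 1 and stage 2 outputs for groups=3 in the paper).

```python
def branch_channels(in_channels, out_channels, combine):
    """Channels the residual branch must produce for a given combine mode."""
    if combine == "add":
        # Element-wise addition needs matching shapes.
        if in_channels != out_channels:
            raise ValueError("'add' requires in_channels == out_channels")
        return out_channels
    if combine == "concat":
        # The shortcut already contributes in_channels to the concatenation.
        return out_channels - in_channels
    raise ValueError(f"unknown combine mode: {combine}")


branch = branch_channels(24, 240, "concat")
print(branch, 24 + branch)  # 216 240
```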

Wrong number of channels for g=1, stage4: should be 576, not 567

In the __init__ function of the ShuffleNet class, when groups == 1 the number of output channels for stage 4 should be 576, not 567.

The correct case should be:

if groups == 1:
    self.stage_out_channels = [-1, 24, 144, 288, 576]
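A quick consistency check supports the fix: in the paper's architecture table the output channels double from one stage to the next, which 567 would break. The sketch below encodes the per-group widths as listed in that table (worth double-checking against the paper) and asserts the doubling. The dictionary name is hypothetical.

```python
# Output channels of stages 2-4 for each group count, following the paper's
# architecture table. Index 0 is unused and index 1 is the 24-channel stem.
STAGE_OUT_CHANNELS = {
    1: [-1, 24, 144, 288, 576],
    2: [-1, 24, 200, 400, 800],
    3: [-1, 24, 240, 480, 960],
    4: [-1, 24, 272, 544, 1088],
    8: [-1, 24, 384, 768, 1536],
}

for groups, channels in STAGE_OUT_CHANNELS.items():
    stage2, stage3, stage4 = channels[2:]
    # Each stage halves the resolution and doubles the width.
    assert stage3 == 2 * stage2 and stage4 == 2 * stage3, groups

print(2 * 288)  # 576
```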

Train/Test Speed

Dear @jaxony ,

Thanks for your work!

I am using this to train ShuffleNet on ImageNet. However, both training and testing look very slow.

ShuffleNet at epoch 2:

Test: [190/196] Time 3.673 (3.323)      Loss 2.3073 (3.7287)    Prec@1 43.750 (24.235)  Prec@5 75.000 (47.801)

While for AlexNet at epoch 2:

Test: [190/196] Time 0.672 (0.558)      Loss 3.7238 (4.3975)    Prec@1 28.125 (15.122)  Prec@5 50.391 (35.214)

ShuffleNet is about 6 times slower than AlexNet. Have you noticed this on MNIST?

Thanks!
Kun

ImageNet result

I noticed your ImageNet result is 62.2% (top-1). Could you share your training log or more details about the training settings?

When groups=8, input channels are not divisible by the number of groups

Hi, when I run your code with groups=8, I get the error ValueError: in_channels must be divisible by groups.
According to the paper, I think there is no problem with your implementation. Have you tested your code with groups=8?
By the way, the same problem occurs when I try to build ShuffleNet 0.5x using scale=0.5.
To reproduce the issue, just modify groups=3 to groups=8 here: https://github.com/jaxony/ShuffleNet/blob/master/tests.py#L104

Thanks in advance!
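PyTorch's nn.Conv2d requires both in_channels and out_channels to be divisible by groups, so one way to locate the failing layer is to check each grouped convolution's channel counts before building it. The sketch below is a hypothetical helper. The channel counts in the example are made up and are not claimed to be the ones that trigger the error in this repo.

```python
def group_conv_problems(in_channels, out_channels, groups):
    """List the divisibility constraints a grouped convolution would violate."""
    problems = []
    if in_channels % groups != 0:
        problems.append(f"in_channels={in_channels} is not divisible by groups={groups}")
    if out_channels % groups != 0:
        problems.append(f"out_channels={out_channels} is not divisible by groups={groups}")
    return problems


# Made-up channel counts, chosen only to show both outcomes.
print(group_conv_problems(96, 360, 8))  # []
print(group_conv_problems(90, 360, 8))
```

Running such a check over every layer's channel counts for a given groups and scale setting would show which unit produces a width that the group count does not divide.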

batch normalization and relu after first conv

Although it is never mentioned in the paper, I think batch normalization and a ReLU activation are necessary after the first 3x3 convolution.

Errors occur when calling load_state_dict

size mismatch for stage4.ShuffleUnit_Stage4_0.depthwise_conv3x3.weight