mobilenetv2.pytorch's Introduction

PyTorch Implementation of MobileNet V2

+ Release of next generation of MobileNet in my repo *mobilenetv3.pytorch*
+ Release of advanced design of MobileNetV2 in my repo *HBONet* [ICCV 2019]
+ Release of better pre-trained model. See below for details.

Reproduction of the MobileNet V2 architecture, as described in MobileNetV2: Inverted Residuals and Linear Bottlenecks by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov and Liang-Chieh Chen, on the ILSVRC2012 benchmark with the PyTorch framework.

This implementation provides an example procedure of training and validating any prevalent deep neural network architecture, with modular data processing, training, logging and visualization integrated.

Requirements

Dependencies

  • PyTorch 1.0+
  • NVIDIA-DALI (in development, not recommended)

Dataset

Download the ImageNet dataset and move validation images to labeled subfolders. To do this, you can use the following script: https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh
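As an illustration of what "labeled subfolders" means, here is a minimal Python sketch that moves validation images into one folder per class. The function name `sort_validation_images` and the `labels` mapping (image file name to class folder name) are hypothetical and only serve this example; the linked shell script remains the reference way to prepare the data.

```python
import os
import shutil


def sort_validation_images(val_dir, labels):
    """Move each validation image into a subfolder named after its class.

    `labels` is a hypothetical mapping from image file name to class folder
    name; how you obtain it is up to you.
    Returns the number of files that were moved.
    """
    moved = 0
    for filename, class_name in labels.items():
        source = os.path.join(val_dir, filename)
        if not os.path.isfile(source):
            continue  # already moved or not present; skip it
        class_dir = os.path.join(val_dir, class_name)
        os.makedirs(class_dir, exist_ok=True)
        shutil.move(source, os.path.join(class_dir, filename))
        moved += 1
    return moved
```

After this step, each class folder under the validation directory contains only the images of that class, which is the layout folder-based dataset loaders typically expect.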

Pretrained models

The pretrained MobileNetV2 1.0 achieves 72.834% top-1 accuracy and 91.060% top-5 accuracy on the ImageNet validation set, which is higher than the numbers reported in the original paper and by the official TensorFlow implementation.

MobileNetV2 with a spectrum of width multipliers

Architecture     | # Parameters | MFLOPs | Top-1 / Top-5 Accuracy (%)
-----------------|--------------|--------|---------------------------
MobileNetV2 1.0  | 3.504M       | 300.79 | 72.192 / 90.534
MobileNetV2 0.75 | 2.636M       | 209.08 | 69.952 / 88.986
MobileNetV2 0.5  | 1.968M       | 97.14  | 64.592 / 85.392
MobileNetV2 0.35 | 1.677M       | 59.29  | 60.092 / 82.172
MobileNetV2 0.25 | 1.519M       | 37.21  | 52.352 / 75.932
MobileNetV2 0.1  | 1.356M       | 12.92  | 34.896 / 56.564

MobileNetV2 1.0 with a spectrum of input resolutions

Architecture        | # Parameters | MFLOPs | Top-1 / Top-5 Accuracy (%)
--------------------|--------------|--------|---------------------------
MobileNetV2 224x224 | 3.504M       | 300.79 | 72.192 / 90.534
MobileNetV2 192x192 | 3.504M       | 221.33 | 71.076 / 89.760
MobileNetV2 160x160 | 3.504M       | 154.10 | 69.504 / 88.848
MobileNetV2 128x128 | 3.504M       | 99.09  | 66.740 / 86.952
MobileNetV2 96x96   | 3.504M       | 56.31  | 62.696 / 84.046
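The parameter count stays at 3.504M across resolutions because the weights do not depend on the input size, while the compute cost shrinks. As a rough sanity check on the MFLOPs figures above, the sketch below assumes that cost scales with the number of input pixels, i.e. with the square of the side length. This is a rule of thumb, not the way the reported numbers were measured, and `estimate_mflops` is a name made up for this example.

```python
BASE_RESOLUTION = 224
BASE_MFLOPS = 300.79  # reported value for 224x224

# Reported values for the other resolutions.
REPORTED_MFLOPS = {192: 221.33, 160: 154.10, 128: 99.09, 96: 56.31}


def estimate_mflops(resolution):
    """Scale the 224x224 cost by the ratio of pixel counts (assumption)."""
    scale = (resolution / BASE_RESOLUTION) ** 2
    return BASE_MFLOPS * scale


for resolution, reported in REPORTED_MFLOPS.items():
    estimate = estimate_mflops(resolution)
    print(f"{resolution}x{resolution}: estimated {estimate:.2f}, reported {reported:.2f}")
```

The estimates land slightly below the reported values, which is expected at least in part because the final classifier costs the same at every resolution.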

Taking MobileNetV2 1.0 as an example, pretrained models can be easily imported using the following lines and then finetuned for other vision tasks or utilized in resource-aware platforms.

import torch
from models.imagenet import mobilenetv2

# Build MobileNetV2 1.0 and load the pretrained ImageNet weights.
net = mobilenetv2()
net.load_state_dict(torch.load('pretrained/mobilenetv2-c5e733a8.pth'))

Usage

Training

The following configuration reproduces our strong results efficiently, taking around 2 days on 4x Titan Xp GPUs with non-distributed DataParallel and the PyTorch dataloader.

  • batch size 256
  • epoch 150
  • learning rate 0.05
  • LR decay strategy cosine
  • weight decay 0.00004

The newly released model achieves even higher accuracy, using a larger batch size (1024) on 8 GPUs, a higher initial learning rate (0.4), and more training epochs (250). In addition, a dropout layer with a dropout rate of 0.2 is inserted before the final FC layer, no weight decay is applied to biases and BN layers, and the learning rate ramps up from 0.1 to 0.4 over the first five training epochs.
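The warm-up followed by cosine decay described for the newly released model can be sketched as a plain function of the epoch. Several details here are assumptions made for the sketch and are not confirmed by this README: the ramp is linear, the cosine phase starts right after warm-up, and the rate decays all the way to zero. The function name `learning_rate` is also just illustrative.

```python
import math


def learning_rate(epoch, total_epochs=250, base_lr=0.4,
                  warmup_start=0.1, warmup_epochs=5):
    """Return the learning rate for a (possibly fractional) epoch.

    Assumed shape: linear ramp from warmup_start to base_lr during warm-up,
    then a half-cosine decay from base_lr to zero.
    """
    if epoch < warmup_epochs:
        return warmup_start + (base_lr - warmup_start) * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 2.5, 5, 127.5, 250):
    print(epoch, round(learning_rate(epoch), 4))
```

Evaluating the schedule per iteration rather than per epoch only requires passing a fractional epoch, such as the epoch index plus the fraction of batches already processed.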

python imagenet.py \
    -a mobilenetv2 \
    -d <path-to-ILSVRC2012-data> \
    --epochs 150 \
    --lr-decay cos \
    --lr 0.05 \
    --wd 4e-5 \
    -c <path-to-save-checkpoints> \
    --width-mult <width-multiplier> \
    --input-size <input-resolution> \
    -j <num-workers>

Test

python imagenet.py \
    -a mobilenetv2 \
    -d <path-to-ILSVRC2012-data> \
    --weight <pretrained-pth-file> \
    --width-mult <width-multiplier> \
    --input-size <input-resolution> \
    -e

Citations

The following is a BibTeX entry for the MobileNet V2 paper that you should cite if you use this model.

@InProceedings{Sandler_2018_CVPR,
author = {Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
title = {MobileNetV2: Inverted Residuals and Linear Bottlenecks},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}

If you find this implementation helpful in your research, please also consider citing:

@InProceedings{Li_2019_ICCV,
author = {Li, Duo and Zhou, Aojun and Yao, Anbang},
title = {HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2019}
}

License

This repository is licensed under the Apache License 2.0.

mobilenetv2.pytorch's People

Contributors

d-li14

mobilenetv2.pytorch's Issues

Question about normalization in data preprocessing

Hi,
Thanks for your wonderful work!
I find that the normalization in your code does not use ``transforms.ToTensor(), transforms.Normalize()'' provided by PyTorch. Instead, it subtracts the mean and divides by the standard deviation directly in [0, 255] space.
I wonder why you use this normalization. Does it make a big difference to the results?
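For what it is worth, the two approaches are algebraically the same when the mean and standard deviation are scaled by 255. The sketch below checks this numerically for every 8-bit pixel value. The statistics used are the commonly quoted ImageNet channel means and standard deviations; that this repository uses exactly these values is an assumption, and both function names are made up for the example.

```python
# Commonly used ImageNet statistics on the [0, 1] scale (assumed here).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_unit_range(pixel, channel):
    """ToTensor-style: scale to [0, 1] first, then normalize."""
    return (pixel / 255.0 - MEAN[channel]) / STD[channel]


def normalize_byte_range(pixel, channel):
    """Normalize directly on [0, 255] with the statistics scaled by 255."""
    return (pixel - 255.0 * MEAN[channel]) / (255.0 * STD[channel])


# Largest difference between the two over all pixel values and channels.
max_gap = max(
    abs(normalize_unit_range(p, c) - normalize_byte_range(p, c))
    for p in range(256)
    for c in range(3)
)
print(max_gap)
```

Under that assumption, any difference in results would come from floating-point rounding only, not from the choice of value range.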

DALI-dataloader

I wonder if you have updated the DALI dataloader since the last push? If not, I'll send a PR that fixes the DALI dataloader for newer versions of DALI and other NVIDIA libraries.

mobilenetv2_0.1-7d1d638a.pth can not be loaded.

Thank you for your great work!
However, some problems occur when I load mobilenetv2_0.1-7d1d638a.pth into your MobileNet with width_mult = 0.1.
Could you tell me the width_mult I should set?

Embedding size

Hello, I found one interesting thing: The last layers in your state_dicts have shape 1280. I think they should change shape according to width_mult, but all checkpoints have shape 1280.

name_of_layer, shape
conv.0.weight torch.Size([1280, 160, 1, 1])
conv.1.weight torch.Size([1280])
conv.1.bias torch.Size([1280])
conv.1.running_mean torch.Size([1280])
conv.1.running_var torch.Size([1280])
conv.1.num_batches_tracked torch.Size([])
classifier.weight torch.Size([1000, 1280])
classifier.bias torch.Size([1000])

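Training parameters

Hello, could you publish your training parameters? The script's default parameters, which are meant for training ResNet, do not train mobilenetv2 well, so I would like to use your settings as a reference.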

MobileNet v2 training options

Could you please kindly share your training options for MobileNet v2 on ImageNet when top1 accuracy finally reaches 72.0%? Thanks a lot!

bias present in pretrained network

Hi,
I don't understand why there are biases in your models. In the model definition, Conv2d has bias set to False:
nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False)
Is this normal?

Validation Input size

Would you please clarify what is the input size during the validation? From the code, as it is defined in function get_pytorch_val_loader in the file utils/dataloaders.py, it seems validation input size is 224/0.875 = 256:
transforms.Resize(int(input_size / 0.875))
I believe the paper reports results for a central crop of size 224x224, doesn't it? So the comparison is not straightforward. Could you please add results on 224x224 crops?
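To make the arithmetic in the quoted line concrete, the sketch below evaluates `int(input_size / 0.875)` for the input resolutions listed in the README. It only computes the resize target. Whether a center crop back to `input_size` follows the resize, as is common practice in ImageNet evaluation, is an assumption not shown in the quoted snippet, and `val_resize_size` is a name invented for this example.

```python
def val_resize_size(input_size, crop_ratio=0.875):
    """Resize target implied by transforms.Resize(int(input_size / 0.875))."""
    return int(input_size / crop_ratio)


for size in (224, 192, 160, 128, 96):
    print(f"input {size} -> resize to {val_resize_size(size)}")
```

If a center crop of `input_size` does follow, the network still sees 224x224 inputs at the default setting, and the 256 figure only describes the intermediate resized image.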

Error when loading the trained model for separate testing

When I use the following code of yours to load the model:
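from collections import OrderedDict

source_state = torch.load(args.weight)
target_state = OrderedDict()
# Add the 'module.' prefix that a DataParallel-wrapped model expects.
for k, v in source_state.items():
    if k[:7] != 'module.':
        k = 'module.' + k
    target_state[k] = v
model.load_state_dict(target_state)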

It reports an error:
[screenshot: 2021-05-30 15-27-50]
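The error text is only available as a screenshot, so its cause cannot be determined from this thread. One common cause of key mismatches in this situation is that the checkpoint keys and the model disagree on the 'module.' prefix, for example when the model is not wrapped in DataParallel at test time. The sketch below is a hypothetical helper (`align_prefix` is not part of this repository) that adds or strips the prefix so the checkpoint keys follow whatever the model uses.

```python
from collections import OrderedDict


def align_prefix(state_dict, model_keys, prefix="module."):
    """Return a copy of state_dict whose keys follow the model's convention.

    If every key in model_keys starts with the prefix, the prefix is added
    where missing; otherwise it is stripped where present.
    """
    wants_prefix = all(key.startswith(prefix) for key in model_keys)
    aligned = OrderedDict()
    for key, value in state_dict.items():
        has_prefix = key.startswith(prefix)
        if wants_prefix and not has_prefix:
            key = prefix + key
        elif not wants_prefix and has_prefix:
            key = key[len(prefix):]
        aligned[key] = value
    return aligned
```

With a real model it could be used as `model.load_state_dict(align_prefix(torch.load(path), model.state_dict().keys()))`.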

Does not run in Pytorch 1.3.1

I just cloned your repo and when I'm launching the command:

CUDA_VISIBLE_DEVICES=2,3,4,5 python imagenet.py -a mobilenetv2 -d /path/to/dataset/ImageNet2012/ --epochs 150 --lr-decay cos --lr 0.05 --wd 4e-5 -c checkpoints --width-mult 1 --input-size 224 -j 12

It gets stuck at this point:

=> creating model 'mobilenetv2'

Epoch: [1 | 150]
Processing

<Ctrl+C pressed after 10 min of nothing happening:>

^CTraceback (most recent call last):
  File "imagenet.py", line 403, in <module>
    main()
  File "imagenet.py", line 224, in main
    train_loss, train_acc = train(train_loader, train_loader_len, model, criterion, optimizer, epoch)
  File "imagenet.py", line 271, in train
    for i, (input, target) in enumerate(train_loader):
  File "/home/michael/mobilenetv2.pytorch/utils/dataloaders.py", line 190, in prefetched_loader
    for next_input, next_target in loader:
  File "/home/michael/miniconda2/envs/pt/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 804, in __next__
    idx, data = self._get_data()
  File "/home/michael/miniconda2/envs/pt/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 761, in _get_data
    success, data = self._try_get_data()
  File "/home/michael/miniconda2/envs/pt/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 724, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/michael/miniconda2/envs/pt/lib/python3.7/queue.py", line 179, in get
    self.not_empty.wait(remaining)
  File "/home/michael/miniconda2/envs/pt/lib/python3.7/threading.py", line 300, in wait
    gotit = waiter.acquire(True, timeout)
KeyboardInterrupt

Nothing is happening at this point. nvidia-smi shows that a single GPU consumes ~500M of memory, and CPU cores are ~60% busy, but it's not clear what they are doing. I waited for 10 minutes before aborting. I also tried it on a single GPU, with the same issue.
If I switch to --data-backend dali-cpu (using nvidia-dali version 0.16), it fails with the following error:

=> creating model 'mobilenetv2'
Traceback (most recent call last):
  File "imagenet.py", line 403, in <module>
    main()
  File "imagenet.py", line 194, in main
    train_loader, train_loader_len = get_train_loader(args.data, args.batch_size, workers=args.workers, input_size=args.input_size)
TypeError: gdtl() got an unexpected keyword argument 'input_size'

I'm using Pytorch 1.3.1 with 4x Titan Xp cards. The only thing I had to change in your code is to replace cuda(async=True) with cuda(non_blocking=True). Changing to non_blocking=False does not help.

Can you please try cloning your repo to a clean Pytorch 1.3.1 environment and see if you can run it? Any idea what's going on?

how to use the pretrained 0.75 model

Hi there!

First of all, many thanks for your work! It helps me a lot!

My question is: how can I use your pretrained mobilenetv2_0.75 model? I looked into the model, and the channels change from [32 16 24 32 64 96 160 320] to [24 16 24 24 48 72 120 240]. Does this mean that, as long as I change input_channel to 24
and change
self.cfgs = [
    # t, c, n, s
    [1, 16, 1, 1],
    [6, 24, 2, 2],
    [6, 32, 3, 2],
    [6, 64, 4, 2],
    [6, 96, 3, 1],
    [6, 160, 3, 2],
    [6, 320, 1, 1],
]
to
self.cfgs = [
    # t, c, n, s
    [1, 16, 1, 1],
    [6, 24, 2, 2],
    [6, 24, 3, 2],
    [6, 48, 4, 2],
    [6, 72, 3, 1],
    [6, 120, 3, 2],
    [6, 240, 1, 1],
]
everything is OK?
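The channel list quoted in the question is what one gets by multiplying the base widths by 0.75 and rounding each result to a multiple of 8, using the rounding rule known from the TensorFlow MobileNet reference code. The sketch below reproduces that calculation. That this repository rounds in exactly this way is an assumption, and `BASE_CHANNELS` is simply the first list from the question.

```python
def make_divisible(value, divisor=8, min_value=None):
    """Round value to the nearest multiple of divisor, never dropping
    more than 10% below the original value."""
    if min_value is None:
        min_value = divisor
    rounded = max(min_value, int(value + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * value:
        rounded += divisor
    return rounded


# Base widths taken from the question: stem channel plus the seven stages.
BASE_CHANNELS = [32, 16, 24, 32, 64, 96, 160, 320]


def scaled_channels(width_mult, divisor=8):
    """Apply a width multiplier to every base channel count."""
    return [make_divisible(c * width_mult, divisor) for c in BASE_CHANNELS]


print(scaled_channels(0.75))
```

If the model applies the multiplier internally in this way, setting the width multiplier to 0.75 when constructing the model (the command-line scripts expose it as `--width-mult`) would likely be enough, with no manual edits to `self.cfgs`.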

want to get some advice for training.

mobilenetv2 1.0 224
epochs: 200
batch size: 512
lr-decay: cos
lr: 0.2
wd: 4e-5
warm-up: 0-0.4 in the first 5 epochs
dropout: 0.2

top1: 0.685
But in your md file, it is 0.722.
Is something wrong in my training?

I didn't find the implementation of the 'linear2exp' you mentioned before

Several months ago you answered a question with your training command. But I didn't find the implementation of the '--lr-decay="linear2exp"' in the function 'adjust_learning_rate'.

I trained MobileNet V2 from scratch by calling
python3 imagenet.py -d /path/to/your/ImageNet/root/ -j16 --epochs=300 --arch="mobilenetv2" --gpu-id="0,1,2,3" --lr=0.045 --lr-decay="linear2exp" --gamma=0.98 --weight-decay=0.00004

Now, the best model achieves 71.79% top-1 accuracy at Epoch 249. The training is expected to finish in a few days. Hardware environment: 4-way 2080ti, Software environment: Pytorch 1.0.

Originally posted by @lld533 in #2 (comment)

Transformations on the input tensor

Hi, thank you for sharing the models. Can you please share transformations that are needed to be applied to the input tensor to get the correct results?

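Image preprocessing

When using the model, how should images be preprocessed? What are the mean, the variance, and the RGB channel order? Thanks.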

how many epochs did you train?

Hi, thanks for your nice work.
I am wondering how many epochs you trained for. The default in your code is epochs=90, which may take around 60 hours, while the LR policy you set spans more than 90 epochs.

about step lr_decay

In imagenet.py:383, since current_iter < max_iter, lr is always set to 0.1 * args.lr, and the step policy is missing.
Did I misunderstand?
Thanks!

Param and MAdd

Hi friend, did you conduct an experiment with width_mul=1.4? If only width_mul is adjusted, I don't think the parameter count of mobilenetv2 can reach 6.9M.

training time cost

Hi @d-li14 ,

Nice work and performance!
Would you please share the training cost here, e.g. GPU details and training hours?
I only have a 1080ti, and it seems that training ImageNet from scratch will take too much time.
Thanks very much!

cosine learning rate

Thanks for sharing your code! However, your implementation of the cosine learning rate does not seem right.
In your implementation, the learning rate starts at the initial learning rate (lr) but stops at 0.5*lr instead of 0.
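For reference, the commonly used cosine annealing schedule that reaches zero can be written as follows. The symbols are notation introduced here and do not come from the repository: \eta_0 is the initial learning rate, t the current iteration and T the total number of iterations.

```latex
\eta_t = \frac{\eta_0}{2}\left(1 + \cos\left(\frac{\pi t}{T}\right)\right),
\qquad \eta_{t=0} = \eta_0, \qquad \eta_{t=T} = 0
```

A schedule that ends at 0.5*lr would result, for example, if the cosine argument only ran up to \pi/2 instead of \pi, since then the final value is \frac{\eta_0}{2}(1 + \cos(\pi/2)) = \frac{\eta_0}{2}. Whether that is what happens in the code in question is not established in this thread.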
