HBONet

Official implementation of our HBONet architecture as described in HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions (ICCV'19) by Duo Li, Aojun Zhou and Anbang Yao, on the ILSVRC2012 benchmark with the PyTorch framework.

We integrate our HBO modules into the state-of-the-art MobileNetV2 backbone as a reference case. Baseline MobileNetV2 counterparts are available in my repository mobilenetv2.pytorch.

Requirements

Dependencies

  • PyTorch 1.0+
  • NVIDIA-DALI (in development, not recommended)

Dataset

Download the ImageNet dataset and move validation images to labeled subfolders. To do this, you can use the following script: https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh
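Once the validation images are in labeled subfolders, both splits follow the standard class-per-subfolder layout expected by torchvision's ImageFolder. A minimal sanity-check sketch (the path placeholder mirrors the commands below; the actual data loading is handled by imagenet.py):

import torchvision.datasets as datasets
import torchvision.transforms as transforms

# After running valprep.sh, both splits follow the layout
#   <path-to-ILSVRC2012-data>/train/<wnid>/*.JPEG
#   <path-to-ILSVRC2012-data>/val/<wnid>/*.JPEG
# so the standard ImageFolder dataset can read them directly.
val_set = datasets.ImageFolder('<path-to-ILSVRC2012-data>/val', transforms.ToTensor())
print(len(val_set.classes))  # expect 1000 ImageNet classes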

Pretrained models

The following statistics are reported on the ILSVRC2012 validation set with single center crop testing.

HBONet with a spectrum of width multipliers (Table 2)

| Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
|--------------|--------|------------------------|
| HBONet 1.0   | 305    | 73.1 / 91.0            |
| HBONet 0.8   | 205    | 71.3 / 89.7            |
| HBONet 0.5   | 96     | 67.0 / 86.9            |
| HBONet 0.35  | 61     | 62.4 / 83.7            |
| HBONet 0.25  | 37     | 57.3 / 79.8            |
| HBONet 0.1   | 14     | 41.5 / 65.7            |

HBONet 0.8 with a spectrum of input resolutions (Table 3)

| Architecture       | MFLOPs | Top-1 / Top-5 Acc. (%) |
|--------------------|--------|------------------------|
| HBONet 0.8 224x224 | 205    | 71.3 / 89.7            |
| HBONet 0.8 192x192 | 150    | 70.0 / 89.2            |
| HBONet 0.8 160x160 | 105    | 68.3 / 87.8            |
| HBONet 0.8 128x128 | 68     | 65.5 / 85.9            |
| HBONet 0.8 96x96   | 39     | 61.4 / 83.0            |

HBONet 0.35 with a spectrum of input resolutions (Table 4)

| Architecture        | MFLOPs | Top-1 / Top-5 Acc. (%) |
|---------------------|--------|------------------------|
| HBONet 0.35 224x224 | 61     | 62.4 / 83.7            |
| HBONet 0.35 192x192 | 45     | 60.9 / 82.6            |
| HBONet 0.35 160x160 | 31     | 58.6 / 80.7            |
| HBONet 0.35 128x128 | 21     | 55.2 / 78.0            |
| HBONet 0.35 96x96   | 12     | 50.3 / 73.8            |

HBONet with different width multipliers and different input resolutions (Table 5)

| Architecture       | MFLOPs | Top-1 / Top-5 Acc. (%) |
|--------------------|--------|------------------------|
| HBONet 0.5 224x224 | 98     | 67.7 / 87.4            |
| HBONet 0.6 192x192 | 108    | 67.3 / 87.3            |

HBONet 0.25 variants with different down-sampling and up-sampling rates (Table 6)

| Architecture    | MFLOPs | Top-1 / Top-5 Acc. (%) |
|-----------------|--------|------------------------|
| HBONet(2x) 0.25 | 44     | 58.3 / 80.6            |
| HBONet(4x) 0.25 | 45     | 59.3 / 81.4            |
| HBONet(8x) 0.25 | 45     | 58.2 / 80.4            |

Taking HBONet 1.0 as an example, pretrained models can be easily imported using the following lines and then finetuned for other vision tasks or utilized on resource-aware platforms. (To create the variant models of Tables 5 & 6, first make slight modifications following the instructions in the docstrings of the model file.)

import torch

from models.imagenet import hbonet

net = hbonet()
net.load_state_dict(torch.load('pretrained/hbonet_1_0.pth'))
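For finetuning on another vision task, a common pattern is to load the ImageNet weights and then replace the classification head. The sketch below is only illustrative and assumes the final fully connected layer is exposed as a classifier attribute, as in the MobileNetV2-style reference code; check the model definition under models/imagenet for the actual attribute name.

import torch
import torch.nn as nn

from models.imagenet import hbonet

net = hbonet()
net.load_state_dict(torch.load('pretrained/hbonet_1_0.pth'))

# Hypothetical downstream task with 10 classes; `classifier` is an assumed
# attribute name -- verify it against the model file before use.
net.classifier = nn.Linear(net.classifier.in_features, 10)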

Usage

Training

The configuration to reproduce our reported results is exactly the same as in mobilenetv2.pytorch for a fair comparison (a sketch of the cosine learning rate schedule is given after the command below):

  • batch size 256
  • epochs 150
  • learning rate 0.05
  • LR decay strategy cosine
  • weight decay 0.00004

python imagenet.py \
    -a hbonet \
    -d <path-to-ILSVRC2012-data> \
    --epochs 150 \
    --lr-decay cos \
    --lr 0.05 \
    --wd 4e-5 \
    -c <path-to-save-checkpoints> \
    --width-mult <width-multiplier> \
    --input-size <input-resolution> \
    -j <num-workers>
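For reference, --lr-decay cos corresponds to cosine annealing of the learning rate over the training run. The following is only a sketch of that schedule under the settings above (lr 0.05, 150 epochs); the exact per-iteration behaviour of imagenet.py may differ slightly:

import math

def cosine_lr(base_lr, epoch, total_epochs):
    # Cosine-annealed learning rate, decaying from base_lr towards 0.
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

for epoch in range(150):
    lr = cosine_lr(0.05, epoch, 150)
    # apply `lr` to the optimizer's parameter groups here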

Test

python imagenet.py \
    -a hbonet \
    -d <path-to-ILSVRC2012-data> \
    --weight <pretrained-pth-file> \
    --width-mult <width-multiplier> \
    --input-size <input-resolution> \
    -e
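The accuracies in the tables above are reported with single center-crop testing. Below is a sketch of the usual preprocessing for 224x224 evaluation (resize the shorter side to 256, then take a 224 center crop); for other input resolutions the resize is typically rescaled proportionally, and the pipeline in imagenet.py should be treated as authoritative:

import torchvision.transforms as transforms

# Single center-crop evaluation transform for 224x224 inputs (illustrative).
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])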

Citations

If you find our work useful in your research, please consider citing:

@InProceedings{Li_2019_ICCV,
author = {Li, Duo and Zhou, Aojun and Yao, Anbang},
title = {HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2019}
}


Issues

HarmoniousBottleneck_2x

In the open-source code, the forward function of the HarmoniousBottleneck_2x module has elif self.stride == 2: return torch.cat((self.avgpool(x[:, -(self.oup - self.oup // 2):, :, :]), self.conv(x)), dim=1), which only uses a cat. The other modules perform an add followed by a cat, and the structure diagram in the paper also shows add before cat. Which one should be taken as correct?

about feature selection

Thanks a lot for your contribution. If I want to combine HBONet with an FPN (feature pyramid network), so that the combined feature extraction network is analogous to ResNet101-FPN, which feature maps extracted from HBONet would be suitable for the FPN?

Could you please give me some suggestions, taking the following as an example?

self.cfgs = [
            # t, c, n, s, block
            [1, 20, 1, 1, InvertedResidual],

            # alternative blocks for 8x variant model
            # [2,  36, 1, 1, HarmoniousBottleneck_8x],
            # [2,  72, 3, 2, HarmoniousBottleneck_8x],
            # [2,  96, 4, 2, HarmoniousBottleneck_4x],

            # alternative blocks for 4x variant model
            # [2,  36, 1, 1, HarmoniousBottleneck_4x],
            # [2,  72, 3, 2, HarmoniousBottleneck_4x],
            # [2,  96, 4, 2, HarmoniousBottleneck_4x],

            # alternative blocks for 2x main model
            [2, 36, 1, 1, HarmoniousBottleneck_2x],
            [2, 72, 3, 2, HarmoniousBottleneck_2x],
            [2, 96, 4, 2, HarmoniousBottleneck_2x],

            # fixed blocks
            [2, 192, 4, 2, HarmoniousBottleneck_2x],
            [2, 288, 1, 1, HarmoniousBottleneck_2x],
            [0, 144, 1, 1, conv_1x1_bn_hbo],
            [6, 200, 2, 2, InvertedResidual],
            [6, 400, 1, 1, InvertedResidual],
        ]

Thanks in advance.

About FLOPs

Hi,

I'm calculating the FLOPs of the default HBONet setting, where width_mult = 1.0 and the input size is 3x224x224, with thop (https://github.com/Lyken17/pytorch-OpCounter). Please find my script below (Python 3.5).

import hbonet
from thop import profile, clever_format
import torch

model = hbonet.hbonet()
flops, params = profile(model, inputs=(torch.randn(1,3,224,224), ))
flops, params = clever_format([flops, params], "%.3f")
print(flops, params) #984.760M 4.562M

The resulting FLOPs and params are 984.760M and 4.562M, respectively. As you can see, the FLOPs count is far more than the 300M claimed in your paper.

May I know if you counted nn.Upsample and nn.BatchNorm2d in your paper? To make it clear, could you please release the code you used to count FLOPs in your paper? Many thanks in advance.

training epoch

Hi,

First of all, congratulations on your acceptance to ICCV. This work looks nice.

I have two questions:

  1. In your paper, these models were trained for 150 epochs, while this repo indicates 300 epochs. Which one is correct?

  2. This repo lists NVIDIA DALI as a dependency, which you say is not recommended. Can I train the model without DALI?

Is the paper available now

Thanks for sharing this awesome work! Could you please provide a link to your paper? The link in the README seems to be invalid.

Some question about other network

Thanks for your great work!
I want to know whether you have tried channel shuffle at the end of each block, as in ShuffleNetV2. Could this bring further improvement? Looking forward to your reply!
