HBONet

Official implementation of our HBONet architecture as described in HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions (ICCV'19) by Duo Li, Aojun Zhou and Anbang Yao, on the ILSVRC2012 benchmark with the PyTorch framework.

We integrate our HBO modules into the state-of-the-art MobileNetV2 backbone as a reference case. Baseline MobileNetV2 counterparts are available in my repository mobilenetv2.pytorch.

Requirements

Dependencies

  • PyTorch 1.0+
  • NVIDIA-DALI (in development, not recommended)

Dataset

Download the ImageNet dataset and move validation images to labeled subfolders. To do this, you can use the following script: https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh
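Once the validation images are in labeled subfolders, both splits follow the standard class-per-subfolder layout expected by torchvision's ImageFolder. A minimal sanity-check sketch (the path placeholder mirrors the commands below; the actual data loading is handled by imagenet.py):

import torchvision.datasets as datasets
import torchvision.transforms as transforms

# After running valprep.sh, both splits follow the layout
#   <path-to-ILSVRC2012-data>/train/<wnid>/*.JPEG
#   <path-to-ILSVRC2012-data>/val/<wnid>/*.JPEG
# so the standard ImageFolder dataset can read them directly.
val_set = datasets.ImageFolder('<path-to-ILSVRC2012-data>/val', transforms.ToTensor())
print(len(val_set.classes))  # expect 1000 ImageNet classes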

Pretrained models

The following statistics are reported on the ILSVRC2012 validation set with single center crop testing.

HBONet with a spectrum of width multipliers (Table 2)

| Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
|--------------|--------|------------------------|
| HBONet 1.0   | 305    | 73.1 / 91.0            |
| HBONet 0.8   | 205    | 71.3 / 89.7            |
| HBONet 0.5   | 96     | 67.0 / 86.9            |
| HBONet 0.35  | 61     | 62.4 / 83.7            |
| HBONet 0.25  | 37     | 57.3 / 79.8            |
| HBONet 0.1   | 14     | 41.5 / 65.7            |

HBONet 0.8 with a spectrum of input resolutions (Table 3)

| Architecture       | MFLOPs | Top-1 / Top-5 Acc. (%) |
|--------------------|--------|------------------------|
| HBONet 0.8 224x224 | 205    | 71.3 / 89.7            |
| HBONet 0.8 192x192 | 150    | 70.0 / 89.2            |
| HBONet 0.8 160x160 | 105    | 68.3 / 87.8            |
| HBONet 0.8 128x128 | 68     | 65.5 / 85.9            |
| HBONet 0.8 96x96   | 39     | 61.4 / 83.0            |

HBONet 0.35 with a spectrum of input resolutions (Table 4)

| Architecture        | MFLOPs | Top-1 / Top-5 Acc. (%) |
|---------------------|--------|------------------------|
| HBONet 0.35 224x224 | 61     | 62.4 / 83.7            |
| HBONet 0.35 192x192 | 45     | 60.9 / 82.6            |
| HBONet 0.35 160x160 | 31     | 58.6 / 80.7            |
| HBONet 0.35 128x128 | 21     | 55.2 / 78.0            |
| HBONet 0.35 96x96   | 12     | 50.3 / 73.8            |

HBONet with different width multipliers and different input resolutions (Table 5)

| Architecture       | MFLOPs | Top-1 / Top-5 Acc. (%) |
|--------------------|--------|------------------------|
| HBONet 0.5 224x224 | 98     | 67.7 / 87.4            |
| HBONet 0.6 192x192 | 108    | 67.3 / 87.3            |

HBONet 0.25 variants with different down-sampling and up-sampling rates (Table 6)

| Architecture    | MFLOPs | Top-1 / Top-5 Acc. (%) |
|-----------------|--------|------------------------|
| HBONet(2x) 0.25 | 44     | 58.3 / 80.6            |
| HBONet(4x) 0.25 | 45     | 59.3 / 81.4            |
| HBONet(8x) 0.25 | 45     | 58.2 / 80.4            |

Taking HBONet 1.0 as an example, pretrained models can be easily imported using the following lines and then finetuned for other vision tasks or utilized on resource-aware platforms. (To create the variant models of Tables 5 & 6, first make slight modifications following the instructions in the docstrings of the model file.)

import torch

from models.imagenet import hbonet

net = hbonet()
net.load_state_dict(torch.load('pretrained/hbonet_1_0.pth'))
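For finetuning on another vision task, a common pattern is to load the ImageNet weights and then replace the classification head. The sketch below is only illustrative and assumes the final fully connected layer is exposed as a classifier attribute, as in the MobileNetV2-style reference code; check the model definition under models/imagenet for the actual attribute name.

import torch
import torch.nn as nn

from models.imagenet import hbonet

net = hbonet()
net.load_state_dict(torch.load('pretrained/hbonet_1_0.pth'))

# Hypothetical downstream task with 10 classes; `classifier` is an assumed
# attribute name -- verify it against the model file before use.
net.classifier = nn.Linear(net.classifier.in_features, 10)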

Usage

Training

The configuration to reproduce our reported results is exactly the same as in mobilenetv2.pytorch for a fair comparison (a sketch of the cosine learning rate schedule is given after the command below):

  • batch size 256
  • epochs 150
  • learning rate 0.05
  • LR decay strategy cosine
  • weight decay 0.00004

python imagenet.py \
    -a hbonet \
    -d <path-to-ILSVRC2012-data> \
    --epochs 150 \
    --lr-decay cos \
    --lr 0.05 \
    --wd 4e-5 \
    -c <path-to-save-checkpoints> \
    --width-mult <width-multiplier> \
    --input-size <input-resolution> \
    -j <num-workers>
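For reference, --lr-decay cos corresponds to cosine annealing of the learning rate over the training run. The following is only a sketch of that schedule under the settings above (lr 0.05, 150 epochs); the exact per-iteration behaviour of imagenet.py may differ slightly:

import math

def cosine_lr(base_lr, epoch, total_epochs):
    # Cosine-annealed learning rate, decaying from base_lr towards 0.
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

for epoch in range(150):
    lr = cosine_lr(0.05, epoch, 150)
    # apply `lr` to the optimizer's parameter groups here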

Test

python imagenet.py \
    -a hbonet \
    -d <path-to-ILSVRC2012-data> \
    --weight <pretrained-pth-file> \
    --width-mult <width-multiplier> \
    --input-size <input-resolution> \
    -e
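The accuracies in the tables above are reported with single center-crop testing. Below is a sketch of the usual preprocessing for 224x224 evaluation (resize the shorter side to 256, then take a 224 center crop); for other input resolutions the resize is typically rescaled proportionally, and the pipeline in imagenet.py should be treated as authoritative:

import torchvision.transforms as transforms

# Single center-crop evaluation transform for 224x224 inputs (illustrative).
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])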

Citations

If you find our work useful in your research, please consider citing:

@InProceedings{Li_2019_ICCV,
author = {Li, Duo and Zhou, Aojun and Yao, Anbang},
title = {HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2019}
}


Issues

HarmoniousBottleneck_2x

In the open-source code, the forward function of the HarmoniousBottleneck_2x module has elif self.stride == 2: return torch.cat((self.avgpool(x[:, -(self.oup - self.oup // 2):, :, :]), self.conv(x)), dim=1), which only uses a cat. The other modules perform an add followed by a cat, and the structure diagram in the paper also shows add before cat. Which one should be taken as correct?

about feature selection

Thanks a lot for your contribution. If I want to combine HBONet with an FPN (feature pyramid network), so that the combined feature extraction network is analogous to ResNet101-FPN, which feature maps extracted from HBONet would be suitable for the FPN?

Could you please give me some suggestions, taking the following as an example?

self.cfgs = [
            # t, c, n, s, block
            [1, 20, 1, 1, InvertedResidual],

            # alternative blocks for 8x variant model
            # [2,  36, 1, 1, HarmoniousBottleneck_8x],
            # [2,  72, 3, 2, HarmoniousBottleneck_8x],
            # [2,  96, 4, 2, HarmoniousBottleneck_4x],

            # alternative blocks for 4x variant model
            # [2,  36, 1, 1, HarmoniousBottleneck_4x],
            # [2,  72, 3, 2, HarmoniousBottleneck_4x],
            # [2,  96, 4, 2, HarmoniousBottleneck_4x],

            # alternative blocks for 2x main model
            [2, 36, 1, 1, HarmoniousBottleneck_2x],
            [2, 72, 3, 2, HarmoniousBottleneck_2x],
            [2, 96, 4, 2, HarmoniousBottleneck_2x],

            # fixed blocks
            [2, 192, 4, 2, HarmoniousBottleneck_2x],
            [2, 288, 1, 1, HarmoniousBottleneck_2x],
            [0, 144, 1, 1, conv_1x1_bn_hbo],
            [6, 200, 2, 2, InvertedResidual],
            [6, 400, 1, 1, InvertedResidual],
        ]

Thanks in advance.

About FLOPs

Hi,

I'm calculating the FLOPs of the default HBONet setting, where width_mult = 1.0 and the input size is 3x224x224, with thop (https://github.com/Lyken17/pytorch-OpCounter). Please find my script below (Python 3.5).

import hbonet
from thop import profile, clever_format
import torch

model = hbonet.hbonet()
flops, params = profile(model, inputs=(torch.randn(1,3,224,224), ))
flops, params = clever_format([flops, params], "%.3f")
print(flops, params) #984.760M 4.562M

The resulting FLOPs and params are 984.760M and 4.562M, respectively. As you can see, the FLOPs count is far more than the 300M claimed in your paper.

May I know if you counted nn.Upsample and nn.BatchNorm2d in your paper? To make it clear, could you please release the code you used to count FLOPs in your paper? Many thanks in advance.

training epoch

Hi,

First of all, congratulations on your acceptance to ICCV. This work looks nice.

I have two questions:

  1. In your paper, these models were trained for 150 epochs, while this repo indicates 300 epochs. Which one is correct?

  2. This repo lists NVIDIA DALI as a dependency, which you say is not recommended. Can I train the model without DALI?

Is the paper available now

Thanks for sharing this awesome work! Could you please provide a link to your paper? The link in the README seems to be invalid.

Some question about other network

Thanks for your great work!
I want to know whether you have tried channel shuffle at the end of each block, as in ShuffleNetV2. Could this bring further improvement? Looking forward to your reply!
