region-based-non-local-network's Introduction

Region-based Non-local Operation for Video Classification [arXiv]

Citation

Please star this repo and cite the following arXiv paper if you find RNL useful:

@article{huang2020region,
  title={Region-based Non-local Operation for Video Classification},
  author={Huang, Guoxi and Bors, Adrian G},
  journal={arXiv preprint arXiv:2007.09033},
  year={2020}
}

Prerequisites

Data Preparation

Please refer to the TSM repo for details on data preparation.
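TSM-style data loaders describe each video with one line of a list file: the directory holding its extracted frames, the number of frames, and the integer class label. The sketch below parses that format. It assumes this repo keeps TSM's list-file layout. The names `VideoRecord`, `parse_list_file` and `frame_path` are hypothetical, and the `img_{:05d}.jpg` frame-name template is an assumption, since TSM uses dataset-specific templates.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class VideoRecord:
    path: str        # directory containing the extracted frames of one video
    num_frames: int  # number of frames extracted into that directory
    label: int       # integer class index


def parse_list_file(lines: Iterable[str]) -> List[VideoRecord]:
    """Parse lines of the form '<frame_dir> <num_frames> <label>'."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        records.append(VideoRecord(parts[0], int(parts[1]), int(parts[2])))
    return records


def frame_path(record: VideoRecord, index: int, template: str = "img_{:05d}.jpg") -> str:
    # Hypothetical naming scheme: use whatever template your frame extractor produced.
    return f"{record.path}/{template.format(index)}"


records = parse_list_file(["abseiling/vid_0001 300 0", "", "juggling/vid_0042 250 17\n"])
print(records[1].label, frame_path(records[0], 1))  # 17 abseiling/vid_0001/img_00001.jpg
```

For a custom dataset, one such line per video would go into separate training and validation list files.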

Pretrained Models

The accuracy may differ slightly from the numbers in the paper because we made some modifications to our models. For example, instead of the SE module reported in the paper, we use the channel-gate module from GCNet to model channel attention.
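For readers unfamiliar with GCNet, the sketch below shows a global-context channel gate in plain Python on a tiny C × N feature map (N = flattened T·H·W positions). It has three steps: attention pooling over positions, a bottleneck transform with LayerNorm and ReLU, and a sigmoid gate that rescales each channel. It follows GCNet's multiplicative ("channel_mul") fusion. Whether the channel-gate module in this repo matches it exactly is an assumption. The function names, weight shapes, and LayerNorm without affine parameters are illustrative simplifications, not this repo's code.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def matvec(w: Matrix, v: List[float]) -> List[float]:
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def layer_norm(v: List[float], eps: float = 1e-5) -> List[float]:
    mu = sum(v) / len(v)
    var = sum((t - mu) ** 2 for t in v) / len(v)
    return [(t - mu) / math.sqrt(var + eps) for t in v]


def channel_gate(x: Matrix, w_key: List[float], w_down: Matrix, w_up: Matrix) -> Matrix:
    """x is C channels x N positions (T*H*W flattened). Assumed GCNet-style gate."""
    C, N = len(x), len(x[0])
    # 1) Context modelling: a 1x1 "key" projection gives one logit per position;
    #    softmax turns them into attention weights shared by all channels.
    attn = softmax([sum(w_key[c] * x[c][j] for c in range(C)) for j in range(N)])
    context = [sum(attn[j] * x[c][j] for j in range(N)) for c in range(C)]
    # 2) Transform: C -> C/r bottleneck with LayerNorm (no affine terms) and ReLU, back to C.
    hidden = [max(0.0, t) for t in layer_norm(matvec(w_down, context))]
    gate = [1.0 / (1.0 + math.exp(-t)) for t in matvec(w_up, hidden)]
    # 3) Fusion: multiply every position of channel c by gate[c], a value in (0, 1).
    return [[gate[c] * v for v in x[c]] for c in range(C)]


x = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0], [0.0, 1.0, 0.0], [2.0, 2.0, 2.0]]
w_key = [0.1, 0.2, -0.1, 0.3]
w_down = [[0.5, -0.2, 0.1, 0.0], [0.0, 0.3, -0.4, 0.2]]   # C=4 -> C/r=2
w_up = [[0.2, -0.1], [0.4, 0.3], [-0.3, 0.5], [0.1, 0.1]]  # C/r=2 -> C=4
y = channel_gate(x, w_key, w_down, w_up)
```

Unlike an SE block, which pools each channel with a plain spatial average, the context vector here is an attention-weighted average whose weights are shared across channels.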

| Method | Frames × clips | Kinetics Acc. | Checkpoint |
|---|---|---|---|
| NL I3D-ResNet50 | 32 × 10 clips | 74.9% | - |
| RNL TSM-ResNet50 | 8 × 10 clips | 75.6% | link |
| RNL TSM-ResNet50 | 16 × 10 clips | 77.2% | link |
| RNL TSM-ResNet50 | (16+8) × 10 clips | 77.4% | - |

On Kinetics, the RNL TSM models outperform the NL I3D model while using less computation, since they process shorter inputs (8 or 16 frames per clip instead of 32).
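The frame counts in the Kinetics table are frames per clip times clips per video, so the number of frames each model processes per test video can be compared directly. Frame count is only a rough proxy for cost, because I3D and TSM backbones spend different amounts of computation per frame. `frames_per_video` is a hypothetical helper.

```python
def frames_per_video(frames_per_clip: int, clips: int) -> int:
    """Total frames fed to the network for one test video."""
    return frames_per_clip * clips


# (method, frames per clip, clips per video) from the Kinetics table
settings = [
    ("NL I3D-ResNet50", 32, 10),
    ("RNL TSM-ResNet50", 8, 10),
    ("RNL TSM-ResNet50", 16, 10),
]
for name, frames, clips in settings:
    print(f"{name}: {frames} x {clips} = {frames_per_video(frames, clips)} frames")
```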

| Method | Frames × clips | Something-V1 Acc. | Checkpoint |
|---|---|---|---|
| RNL TSM-ResNet50 | 8 × 2 clips | 49.5% | link |
| RNL TSM-ResNet50 | 16 × 2 clips | 51.0% | link |
| RNL TSM-ResNet50 | (8+16) × 2 clips | 52.7% | - |
| RNL TSM-ResNet101 | 8 × 2 clips | 50.8% | link |
| RNL 101 + RNL 50 | (8+16) × 2 clips | 54.1% | - |
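The (8+16) and "RNL 101 + RNL 50" rows combine two models. The README does not say how their predictions are fused. The sketch below assumes a common choice: average the per-class softmax scores of the models and take the argmax. The helper names and logits are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble(*model_logits: List[float]) -> List[float]:
    """Late fusion (assumed): average the per-class softmax scores of several models."""
    probs = [softmax(z) for z in model_logits]
    return [sum(p[k] for p in probs) / len(probs) for k in range(len(probs[0]))]


def predict(scores: List[float]) -> int:
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical logits of an 8-frame and a 16-frame model for one video (3 classes)
fused = ensemble([2.0, 0.0, 0.0], [0.0, 0.0, 3.0])
print(predict(fused))  # 2
```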

Training

We provide two examples of training an RNL network with this repo:

  • To train on Kinetics from ImageNet-pretrained models, run the script below:
python main.py --dataset kinetics  --dense_sample --dist-url 'tcp://localhost:6666' \
--dist-backend 'nccl' --multiprocessing-distributed --available_gpus 0,1,2,3 --world-size 1 \
--rank 0 --gd 20 --shift --shift_div=8 --shift_place=blockres --npb --lr 0.02 --wd 2e-4 \
--dropout 0.5 --num_segments 8 --batch_size 16 --batch_multiplier 4 --use_warmup --warmup_epochs 5 \
--lr_type cos --epochs 100 --non_local  --suffix 1
  • To train on Something-Something V1 from ImageNet-pretrained models, run the script below:
python main.py --dist-url 'tcp://localhost:6666' --dist-backend 'nccl' \
--multiprocessing-distributed --available_gpus 0,1,2,3 --world-size 1 --rank 0 \
--dataset something --gd 20 --shift --shift_div=8 --shift_place=blockres --npb \
--lr 0.02 --wd 1e-3 --dropout 0.8 --num_segments 8 --batch_size 16 --batch_multiplier 4 \
--use_warmup --warmup_epochs 1 --lr_type cos --epochs 50 --non_local  --suffix 1

# Note that the total batch size equals batch_size x batch_multiplier x world_size, and
# the learning rate should be scaled up with the batch size. For example, the commands above
# use a total batch size of 16 x 4 x 1 = 64 with lr 0.02; for a batch size of 128, set the learning rate to 0.04.
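The batch-size note amounts to the linear scaling rule. The example commands use a total batch of 16 × 4 × 1 = 64 with lr 0.02, so the learning rate becomes 0.02 × (total batch / 64). The sketch also shows one common form of linear warmup followed by cosine decay, in the spirit of the `--use_warmup`, `--warmup_epochs` and `--lr_type cos` flags. The exact schedule in `main.py` may differ, and all function names here are hypothetical.

```python
import math


def total_batch(batch_size: int, batch_multiplier: int, world_size: int) -> int:
    """Effective batch size, as stated in the note above."""
    return batch_size * batch_multiplier * world_size


def scaled_lr(batch: int, base_lr: float = 0.02, base_batch: int = 64) -> float:
    """Linear scaling rule anchored at the example commands (lr 0.02 for a batch of 64)."""
    return base_lr * batch / base_batch


def lr_at(epoch: float, base_lr: float, epochs: int, warmup_epochs: int) -> float:
    """Assumed schedule: linear warmup from 0, then cosine decay to 0 at `epochs`."""
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


batch = total_batch(16, 4, 1)
print(batch, scaled_lr(128))  # 64 0.04
for e in (0, 2.5, 5, 52.5, 100):  # Kinetics example: 100 epochs, 5 warmup epochs
    print(e, round(lr_at(e, 0.02, 100, 5), 5))
```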

Test

To test the downloaded pretrained models, run the scripts below; both evaluate RNL in the 8-frame setting:

# test on kinetics
python test_models.py kinetics  \
--weights=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_cos_dense_nl_lr0.02_wd2.0e-04.pth.tar \
--test_segments=8 --batch_size=16 -j 25 --test_crops=3  --dense_sample --full_res

# test on Something
python test_models.py something \
--weights=pretrained/TSM_something_RGB_resnet50_shift8_blockres_avg_segment8_e50_cos_nl_h_8e-4.pth.tar \
--test_segments=8 --batch_size=2 -j 25 --test_crops=3  --twice_sample  --full_res
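`--dense_sample` and `--twice_sample` select how frame indices are drawn at test time. The sketch below reimplements, in plain Python, the corresponding test-time samplers as we understand them from the TSM code base this repo builds on:

- Dense sampling takes 10 evenly spaced 64-frame windows with a fixed stride (the 10 clips in the Kinetics table).
- Twice sampling takes the centre and the first frame of each of `num_segments` equal segments (the 2 clips in the Something table).

With `--test_crops=3`, each clip is further evaluated at 3 spatial crops. Indices are 0-based here. Treat the window size and clip count as assumptions if your checkout differs; the function names are hypothetical.

```python
from typing import List


def dense_test_indices(num_frames: int, num_segments: int = 8,
                       window: int = 64, num_clips: int = 10) -> List[List[int]]:
    """Evenly spaced windows; inside each, num_segments frames with a fixed stride."""
    sample_pos = max(1, 1 + num_frames - window)  # number of valid window starts
    stride = window // num_segments
    starts = [i * (sample_pos - 1) // max(num_clips - 1, 1) for i in range(num_clips)]
    # Short videos wrap around via the modulo.
    return [[(s + k * stride) % num_frames for k in range(num_segments)] for s in starts]


def twice_test_indices(num_frames: int, num_segments: int = 8) -> List[List[int]]:
    """Two clips: the centre frame and the first frame of each equal-length segment."""
    tick = num_frames / num_segments
    centres = [int(tick / 2 + tick * k) for k in range(num_segments)]
    firsts = [int(tick * k) for k in range(num_segments)]
    return [centres, firsts]


clips = dense_test_indices(300)
print(len(clips), clips[0])    # 10 [0, 8, 16, 24, 32, 40, 48, 56]
print(twice_test_indices(80))  # [[5, 15, 25, 35, 45, 55, 65, 75], [0, 10, 20, 30, 40, 50, 60, 70]]
```

Under these assumptions, a Kinetics video is scored over 10 × 3 = 30 views and a Something video over 2 × 3 = 6 views.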

Other Info

References

This repository is built upon the following baseline implementations.

Contact

For any questions, please feel free to open an issue or contact:

Guoxi Huang: [email protected]

region-based-non-local-network's People

Contributors

russellllaputa

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

region-based-non-local-network's Issues

Questions about Channel-wise Separable Convolutions

Hi! Thanks for the great work!
I have a question related to the Channel-wise Separable Convolutions part in the RNL module.

In your paper, you mention that Fθ should not fuse together information across channels.
However, in your code the groups parameter of DepthwiseConv3d is not equal to in_channels; it is "groups = max(in_channels // 32, 1)".

Are there any specific reasons for you to choose this group?

Thank you!
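To see what the quoted expression implies, evaluate it for typical ResNet-50 widths. A depthwise convolution would use groups = in_channels, i.e. one channel per group. By contrast, groups = max(in_channels // 32, 1) keeps 32 channels in each group whenever in_channels is a multiple of 32, so channels are mixed within each group of 32. The expression is copied from the question above. `rnl_groups` is a hypothetical helper name, and the channel widths are only examples.

```python
def rnl_groups(in_channels: int) -> int:
    """Group count as quoted in the question above."""
    return max(in_channels // 32, 1)


for c in (16, 64, 256, 512, 1024, 2048):
    g = rnl_groups(c)
    # A grouped conv needs in_channels % groups == 0; channels per group = in_channels // groups
    print(f"in_channels={c:4d}  groups={g:2d}  channels/group={c // g}")
```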

How many clips do you use in Table III?

Thanks for your work.
I have a question.

In Table III, RNL achieves 49.47, while in Table V, RNL with 1 clip achieves 47.3.
Could you explain why this difference occurs?
Does RNL in Table III use 2 clips and 3 crops, as stated in the experimental setting?

How to train?

I am very interested in your research.

I want to train the model on my own data, and I would like to know how to run the example training code and how to organize the data.

If it is not too much trouble, could you provide the code for creating the dataset?

Thank you.
