Spatially and Temporally Efficient Non-local Attention Network for Video-based Person Re-Identification

  • NVAN

  • STE-NVAN

[Paper] [arXiv]

Chih-Ting Liu, Chih-Wei Wu, Yu-Chiang Frank Wang and Shao-Yi Chien,
British Machine Vision Conference (BMVC), 2019

This is the PyTorch implementation of the Spatially and Temporally Efficient Non-local Video Attention Network (STE-NVAN) for video-based person Re-ID.
On the MARS dataset, it achieves 90.0% rank-1 accuracy with the NVAN model and 88.9% with the ST-efficient model (STE-NVAN).

Prerequisites

  • Python 3.5+
  • PyTorch (we run the code under version 1.0)
  • torchvision (we run the code under version 0.2.2)

Getting Started

Installation

  • Install the dependencies. You can install all of them with:
$ pip3 install numpy Pillow progressbar2 tqdm pandas

Datasets

We conduct experiments on MARS and DukeMTMC-VideoReID (DukeV) datasets.

For MARS dataset:

  • Download and unzip the dataset from the official website. (Google Drive)
  • Clone the repo of MARS-evaluation. We will need the files under info/ directory.
    You will have the structure as follows:
path/to/your/MARS dataset/
|-- bbox_train/
|-- bbox_test/
|-- MARS-evaluation/
|   |-- info/
  • Run create_MARS_database.py to create the database files (.txt and .npy files) in the "MARS_database" directory.
$ python3 create_MARS_database.py --data_dir /path/to/MARS dataset/ \
                                --info_dir /path/to/MARS dataset/MARS-evaluation/info/ \
                                --output_dir ./MARS_database/

For DukeV dataset:

  • Download and unzip the dataset from the official github page. (data link)
    You will have the structure as follows:
path/to/your/DukeV dataset/
|-- gallery/
|-- query/
|-- train/
  • Run create_DukeV_database.py to create the database files (.txt and .npy files) in the "DukeV_database" directory.
$ python3 create_DukeV_database.py --data_dir /path/to/DukeV dataset/ \
                                 --output_dir ./DukeV_database/

Usage-Testing

We re-implemented the evaluation code here in Python.

Furthermore, we follow the video-based evaluation metric in this paper.

In detail, each tracklet is divided into chunks, and we sample the first frame of each chunk.
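The chunk-based sampling above can be sketched as follows. This is a minimal illustration of the idea, not the repo's code; the function name is ours, and `S` corresponds to the `--S 8` argument used in the commands below.

```python
def sample_chunk_frames(num_frames, S=8):
    """Divide a tracklet of num_frames frames into S equal chunks
    and return the index of the first frame of each chunk."""
    chunk_size = num_frames // S  # assumes num_frames >= S
    return [c * chunk_size for c in range(S)]

# A 40-frame tracklet split into 8 chunks of 5 frames each:
print(sample_chunk_frames(40, S=8))  # -> [0, 5, 10, 15, 20, 25, 30, 35]
```

This way every clip fed to the network covers the whole tracklet rather than only its beginning.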

Prerequisite

For testing, we provide three trained models on MARS dataset in this link.

First create a directory with $ mkdir ckpt, and put the three models under it.

All three execution commands are in the script run_evaluate.sh. You can check and alter the arguments inside and run

$ sh run_evaluate.sh

to obtain the rank-1 accuracy and the mAP score.

Some scores differ from those in the paper because some model checkpoints were lost with my previous computer. (I have retrained them.)

The evaluation commands of three models are as follows.

Baseline model : ResNet50 + FPL (mean)

Uncomment this part. You will get R1=87.42% and mAP=79.44%.

# Evaluate ResNet50 + FPL (mean or max)
LOAD_CKPT=./ckpt/R50_baseline_mean.pth
python3 evaluate.py --test_txt $TEST_TXT --test_info $TEST_INFO --query_info $QUERY_INFO \
                    --batch_size 64 --model_type 'resnet50_s1' --num_workers 8  --S 8 \
                    --latent_dim 2048 --temporal mean --stride 1 --load_ckpt $LOAD_CKPT 
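Frame-level feature Pooling (FPL) with `--temporal mean` simply averages the per-frame features into one clip-level descriptor. A minimal NumPy sketch (shapes are illustrative; the repo operates on PyTorch tensors):

```python
import numpy as np

def fpl_mean(frame_features):
    """frame_features: (S, D) array of per-frame embeddings,
    e.g. S=8 sampled frames, D=2048 ResNet50 features.
    Returns a single (D,) clip-level feature by temporal mean pooling."""
    return frame_features.mean(axis=0)

clip = fpl_mean(np.random.rand(8, 2048))
assert clip.shape == (2048,)
```

The resulting clip features are then compared between query and gallery tracklets to compute rank-1 and mAP.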

NVAN : R50 + 5 Non-local layers + FPL

Uncomment this part. You will get R1=90.00% and mAP=82.79%.

#Evaluate NVAN (R50 + 5 NL + FPL)
LOAD_CKPT=./ckpt/NVAN.pth
python3 evaluate.py --test_txt $TEST_TXT  --test_info  $TEST_INFO   --query_info $QUERY_INFO \
                    --batch_size 64 --model_type 'resnet50_NL' --num_workers 8  --S 8 --latent_dim 2048 \
                    --temporal Done  --non_layers  0 2 3 0 --load_ckpt $LOAD_CKPT
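For intuition, a non-local layer lets every space-time position attend to all others. Below is a stripped-down NumPy sketch of the generic embedded-Gaussian non-local operation (Wang et al.); dimensions and names are ours and do not match the repo's PyTorch modules.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local(x, W_theta, W_phi, W_g, W_out):
    """x: (N, C) features flattened over T*H*W positions.
    Embedded-Gaussian non-local: y = x + softmax(theta @ phi^T) @ g @ W_out."""
    theta, phi, g = x @ W_theta, x @ W_phi, x @ W_g   # (N, C/2) embeddings
    attn = softmax(theta @ phi.T, axis=-1)            # (N, N) attention over all positions
    return x + (attn @ g) @ W_out                     # residual output, (N, C)

rng = np.random.default_rng(0)
N, C = 8 * 16 * 8, 64                                 # T*H*W positions, channels (illustrative)
x = rng.standard_normal((N, C))
Ws = [rng.standard_normal((C, C // 2)) * 0.01 for _ in range(3)]
W_out = rng.standard_normal((C // 2, C)) * 0.01
y = non_local(x, *Ws, W_out)
assert y.shape == x.shape
```

The `--non_layers 0 2 3 0` argument places 2 and 3 such layers after ResNet stages 2 and 3 respectively, giving the five non-local layers of NVAN.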

STE-NVAN : NVAN + Spatial Reduction + Temporal Reduction

Uncomment this part. You will get R1=88.69% and mAP=81.27%.

# Evaluate NVAN (R50 + 5 NL + Stripe + Hierarchical + FPL)
LOAD_CKPT=./ckpt/STE_NVAN.pth
python3 evaluate.py --test_txt $TEST_TXT  --test_info  $TEST_INFO   --query_info $QUERY_INFO \
                    --batch_size 128 --model_type 'resnet50_NL_stripe_hr' --num_workers 8  --S 8 --latent_dim 2048 \
                    --temporal Done  --non_layers  0 2 3 0 --stripe 16 16 16 16 --load_ckpt $LOAD_CKPT
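The spatial reduction shrinks the non-local keys from all H×W positions to a handful of horizontal stripe averages (the `--stripe 16 16 16 16` argument sets the stripe count per stage). A rough NumPy sketch of the stripe-pooling step; the function name and shapes are ours:

```python
import numpy as np

def stripe_pool(feat, n_stripes=16):
    """feat: (H, W, C) spatial feature map.
    Average-pool each horizontal stripe so the non-local layer
    attends to n_stripes keys instead of H*W positions."""
    H, W, C = feat.shape
    stripes = feat.reshape(n_stripes, H // n_stripes, W, C)  # assumes H % n_stripes == 0
    return stripes.mean(axis=(1, 2))                         # (n_stripes, C)

pooled = stripe_pool(np.ones((16, 8, 128)), n_stripes=16)
assert pooled.shape == (16, 128)
```

Stripes suit person Re-ID because bodies are roughly horizontally structured (head, torso, legs), so each stripe average remains discriminative while the attention cost drops sharply.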

Usage-Training

As mentioned in our paper, we provide three kinds of models: Baseline, NVAN, and STE-NVAN.

Baseline model : ResNet50 + FPL (mean)

You can alter the arguments in run_baseline.sh or just use this command:

$ sh run_baseline.sh

NVAN : R50 + 5 Non-local layers + FPL

You can alter the arguments or uncomment this part in run_NL.sh:

# For NVAN
CKPT=ckpt_NL_0230
python3 train_NL.py --train_txt $TRAIN_TXT --train_info $TRAIN_INFO  --batch_size 64 \
                     --test_txt $TEST_TXT  --test_info  $TEST_INFO   --query_info $QUERY_INFO \
                     --n_epochs 200 --lr 0.0001 --lr_step_size 50 --optimizer adam --ckpt $CKPT --log_path loss.txt --class_per_batch 8 \
                     --model_type 'resnet50_NL' --num_workers 8 --track_per_class 4 --S 8 --latent_dim 2048 --temporal Done  --track_id_loss \
                     --non_layers  0 2 3 0

Then run this script.

$ sh run_NL.sh

STE-NVAN : NVAN + Spatial Reduction + Temporal Reduction

You can alter the arguments or uncomment this part in run_NL.sh:

# For STE-NVAN
CKPT=ckpt_NL_stripe16_hr_0230
python3 train_NL.py --train_txt $TRAIN_TXT --train_info $TRAIN_INFO  --batch_size 64 \
                    --test_txt $TEST_TXT  --test_info  $TEST_INFO   --query_info $QUERY_INFO \
                    --n_epochs 200 --lr 0.0001 --lr_step_size 50 --optimizer adam --ckpt $CKPT --log_path loss.txt --class_per_batch 8 \
                    --model_type 'resnet50_NL_stripe_hr' --num_workers 8 --track_per_class 4 --S 8 --latent_dim 2048 --temporal Done  --track_id_loss \
                    --non_layers  0 2 3 0 --stripes 16 16 16 16 

Then run this script.

$ sh run_NL.sh

Citation

@inproceedings{liu2019spatially,
  title={Spatially and Temporally Efficient Non-local Attention Network for Video-based Person Re-Identification},
  author={Liu, Chih-Ting and Wu, Chih-Wei and Wang, Yu-Chiang Frank and Chien, Shao-Yi},
  booktitle={British Machine Vision Conference},
  year={2019}
}

Reference

Chih-Ting Liu, Media IC & System Lab, National Taiwan University

E-mail : [email protected]
