VLG: General Video Recognition with Web Textual Knowledge

Usage

First, install PyTorch 1.7.1+, torchvision 0.8.2+ and other required packages as follows:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
pip install mmcv==1.3.14
pip install decord
pip install git+https://github.com/ildoonet/pytorch-randaugment
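After installation, it can help to confirm that the environment meets the stated minimums. The following is a minimal sketch, not part of the repository: the helper names (`version_tuple`, `meets_minimum`, `check_environment`) are hypothetical, and it assumes Python 3.8+ so that `importlib.metadata` is available. Only the two packages with explicit minimum versions above are checked.

```python
import re
from importlib import metadata

# Minimum versions stated in the install instructions above.
# "torch" is the pip distribution name of PyTorch.
MINIMUMS = {"torch": "1.7.1", "torchvision": "0.8.2"}


def version_tuple(version):
    """Turn '1.7.1+cu110' into (1, 7, 1); local build tags are ignored."""
    public = version.split("+", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", public)[:3])


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment(minimums=MINIMUMS):
    """Return {package: status string} for each package in `minimums`."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = f"{installed} ({'ok' if ok else 'older than ' + minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

The check reads package metadata only, so it does not import PyTorch or touch the GPU.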

Data Preparation

Kinetics-Close/Kinetics-LT

Download the Kinetics videos from here.

Then download the wiki text and extract it into the same directory. The data directory is expected to look like this:

./data/kinetics400/
  videos_train/
    vid1.mp4
    ...
  videos_val/
    vid2.mp4
    ...
  wiki/
    desc_0.txt
    ...
  k400_LT_train_videos.txt
  k400_LT_val_videos.txt
  kinetics_video_train_list.txt
  kinetics_video_val_list.txt
  labels.txt
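A quick way to catch a misplaced file before launching a job is to compare the directory against the tree above. This is a minimal sketch under stated assumptions: the function name `missing_entries` is hypothetical, and it only checks that the listed top-level entries exist, not their contents or format.

```python
from pathlib import Path

# Entries taken from the directory tree above; a trailing slash marks a folder.
EXPECTED_ENTRIES = [
    "videos_train/",
    "videos_val/",
    "wiki/",
    "k400_LT_train_videos.txt",
    "k400_LT_val_videos.txt",
    "kinetics_video_train_list.txt",
    "kinetics_video_val_list.txt",
    "labels.txt",
]


def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected entries that are absent under `root`."""
    root = Path(root)
    missing = []
    for entry in expected:
        path = root / entry.rstrip("/")
        # Folders must be directories, everything else must be a regular file.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    problems = missing_entries("./data/kinetics400")
    print("layout looks complete" if not problems else f"missing: {problems}")
```

The same function can be pointed at the other dataset directories described below by passing a different `expected` list.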

Kinetics-Fewshot

We used the split from CMN for Kinetics-Fewshot.

Download the wiki text and extract it into the same directory. The data directories are expected to look like this:

./data/kinetics100_base
  wiki/
    desc_0.txt
    ...
  k100_base_train_list.txt
  labels.txt
./data/kinetics100_test
  wiki/
    desc_0.txt
    ...
  k100_support_query_list.txt
  labels.txt

Kinetics-Fewshot-C-way

We used the split from Efficient-Prompt for Kinetics-Fewshot-C-way.

Download the wiki text and extract it into the same directory. The data directory is expected to look like this:

./data/kinetics400_fewshot_C
  wiki/
    desc_0.txt
    ...
  k400_fewshot_c_train_split_0.txt
  k400_fewshot_c_train_split_1.txt
  ...
  k400_fewshot_c_train_split_9.txt
  kinetics_video_val_list.txt
  labels.txt

Kinetics-Openset

Download the split from here for Kinetics-Openset.

Then download the wiki text and extract it into the same directory. The data directory is expected to look like this:

./data/kinetics400_openset
  wiki/
    desc_0.txt
    ...
  k400_openset_train_list.txt
  k400_openset_val_list.txt
  labels.txt

Evaluation

To evaluate VLG, you can run:

Pre-training stage:

bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval --eval-pretrain

Fine-tuning stage:

bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval

For fewshot cases, you can run:

bash dist_train_arun_fewshot.sh ${CONFIG_PATH} 8

For openset cases, you can run:

bash dist_train_arun_openset.sh ${CONFIG_PATH} 8 --test --dist-eval --eval

Here ${CONFIG_PATH} is the relative path of the corresponding configuration file in the config directory.
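The four evaluation commands differ only in the launcher script and the trailing flags, which can be summarized in a small lookup table. The sketch below is illustrative and not part of the repository: the setting names and the `eval_command` helper are hypothetical, and reading the positional `8` as the GPU count is an assumption based on the training instructions. Script names and flags are copied from the commands above.

```python
# Launcher script and flags per evaluation setting, copied from the commands above.
# The dictionary keys are hypothetical labels chosen for this sketch.
EVAL_COMMANDS = {
    "pretrain": ("dist_train_arun.sh", ["--eval", "--eval-pretrain"]),
    "finetune": ("dist_train_arun.sh", ["--eval"]),
    "fewshot": ("dist_train_arun_fewshot.sh", []),
    "openset": ("dist_train_arun_openset.sh", ["--test", "--dist-eval", "--eval"]),
}


def eval_command(setting, config_path, num_gpus=8):
    """Build the argument list for one evaluation setting.

    `num_gpus` fills the positional argument shown as 8 above
    (assumed here to be the number of GPUs).
    """
    script, flags = EVAL_COMMANDS[setting]
    return ["bash", script, config_path, str(num_gpus), *flags]


if __name__ == "__main__":
    # "path/to/config" is a placeholder, not a real file in the repository.
    for setting in EVAL_COMMANDS:
        print(" ".join(eval_command(setting, "path/to/config")))
```

The returned list can be passed directly to `subprocess.run` from the repository root.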

Training

To train VLG on a single node with 8 GPUs:

Pre-training stage:

bash dist_train_arun.sh ${CONFIG_PATH} 8

Fine-tuning stage:

First, select the salient sentences by running this:

bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval --select

Then run the fine-tuning itself:

bash dist_train_arun.sh ${CONFIG_PATH} 8

Here ${CONFIG_PATH} is the relative path of the corresponding configuration file in the config directory.
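The order of the three training commands matters: pre-training comes first, then sentence selection, then fine-tuning. A minimal sketch of that sequence follows. It is not part of the repository: `training_steps` and `run_training` are hypothetical helpers, the config arguments are placeholders, and it assumes that the selection step uses the same configuration file as fine-tuning and that the positional `8` is the GPU count.

```python
import subprocess


def training_steps(pretrain_config, finetune_config, num_gpus=8):
    """Return the three training commands in the order described above."""
    base = ["bash", "dist_train_arun.sh"]
    gpus = str(num_gpus)
    return [
        ("pre-train", base + [pretrain_config, gpus]),
        # Assumption: sentence selection runs with the fine-tuning config.
        ("select salient sentences", base + [finetune_config, gpus, "--eval", "--select"]),
        ("fine-tune", base + [finetune_config, gpus]),
    ]


def run_training(pretrain_config, finetune_config, num_gpus=8, dry_run=True):
    """Print each step and, unless dry_run is set, execute it.

    With check=True, a failing step raises CalledProcessError,
    so later steps never run on top of a failed one.
    """
    handled = []
    for name, command in training_steps(pretrain_config, finetune_config, num_gpus):
        print(f"[{name}] {' '.join(command)}")
        if not dry_run:
            subprocess.run(command, check=True)
        handled.append(name)
    return handled


if __name__ == "__main__":
    # Placeholder paths; substitute real files from the config directory.
    run_training("path/to/pretrain_config", "path/to/finetune_config")
```

The default `dry_run=True` only prints the commands, which makes it easy to review them before committing 8 GPUs to a run.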

Pretrained Models

The checkpoints are available on Baidu Netdisk; the extraction code is nc6e.

Citation

If you are interested in our work, please cite as follows:

@article{lin2022vlg,
  title={VLG: General Video Recognition with Web Textual Knowledge},
  author={Lin, Jintao and Liu, Zhaoyang and Wang, Wenhai and Wu, Wayne and Wang, Limin},
  journal={arXiv preprint arXiv:2212.01638},
  year={2022}
}

Acknowledgements

This repo contains modified code from VL-LTR, ActionCLIP, and OpenMax.
