VLG: General Video Recognition with Web Textual Knowledge

Usage

First, install PyTorch 1.7.1+, torchvision 0.8.2+, and the other required packages:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
pip install mmcv==1.3.14
pip install decord
pip install git+https://github.com/ildoonet/pytorch-randaugment
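After installing, it can help to confirm that the environment meets the stated minimums. The following is a minimal sketch, not part of the repository: the helper names `parse_version` and `check_environment` are hypothetical, and it assumes the packages are registered under the distribution names `torch` and `torchvision`.

```python
import re
from importlib import metadata

# Minimum versions stated in the install instructions above.
MINIMUMS = {"torch": (1, 7, 1), "torchvision": (0, 8, 2)}


def parse_version(text):
    """Turn '1.7.1+cu110' into (1, 7, 1); local and pre-release suffixes are ignored."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))  # pad short versions such as '0.8'
    return tuple(parts)


def check_environment(minimums=MINIMUMS):
    """Return {package: status string} without importing the packages themselves."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = parse_version(installed) >= minimum
        report[name] = f"{installed} ({'ok' if ok else 'too old'})"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

The check reads installed package metadata only, so it runs quickly and does not require a GPU.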

Data Preparation

Kinetics-Close/Kinetics-LT

Download the Kinetics videos from here.

Then download and extract the wiki text into the same directory. The directory tree of data is expected to be like this:

./data/kinetics400/
  videos_train/
    vid1.mp4
    ...
  videos_val/
    vid2.mp4
    ...
  wiki/
    desc_0.txt
    ...
  k400_LT_train_videos.txt
  k400_LT_val_videos.txt
  kinetics_video_train_list.txt
  kinetics_video_val_list.txt
  labels.txt
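Before launching a job, the layout above can be checked with a few lines of Python. This is a sketch under the assumption that the data root is `./data/kinetics400` as shown; the function name `missing_entries` is hypothetical and does not come from the repository.

```python
from pathlib import Path

# Entries taken from the directory tree above (relative to the data root).
EXPECTED_DIRS = ["videos_train", "videos_val", "wiki"]
EXPECTED_FILES = [
    "k400_LT_train_videos.txt",
    "k400_LT_val_videos.txt",
    "kinetics_video_train_list.txt",
    "kinetics_video_val_list.txt",
    "labels.txt",
]


def missing_entries(root):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    problems = missing_entries("./data/kinetics400")
    print("layout looks complete" if not problems else f"missing: {problems}")
```

The same function can be pointed at the other data roots by swapping in the entries from their respective trees.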

Kinetics-Fewshot

We used the split from CMN for Kinetics-Fewshot.

Download and extract the wiki text into the same directory. The directory tree of data is expected to be like this:

./data/kinetics100_base
  wiki/
    desc_0.txt
    ...
  k100_base_train_list.txt
  labels.txt
./data/kinetics100_test
  wiki/
    desc_0.txt
    ...
  k100_support_query_list.txt
  labels.txt

Kinetics-Fewshot-C-way

We used the split from Efficient-Prompt for Kinetics-Fewshot-C-way.

Download and extract the wiki text into the same directory. The directory tree of data is expected to be like this:

./data/kinetics400_fewshot_C
  wiki/
    desc_0.txt
    ...
  k400_fewshot_c_train_split_0.txt
  k400_fewshot_c_train_split_1.txt
  ...
  k400_fewshot_c_train_split_9.txt
  kinetics_video_val_list.txt
  labels.txt

Kinetics-Openset

Download the split from here for Kinetics-Openset.

Then download and extract the wiki text into the same directory. The directory tree of data is expected to be like this:

./data/kinetics400_openset
  wiki/
    desc_0.txt
    ...
  k400_openset_train_list.txt
  k400_openset_val_list.txt
  labels.txt

Evaluation

To evaluate VLG, you can run:

  • Pre-training stage:
bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval --eval-pretrain
  • Fine-tuning stage:
bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval

For fewshot cases, you can run:

bash dist_train_arun_fewshot.sh ${CONFIG_PATH} 8

For openset cases, you can run:

bash dist_train_arun_openset.sh ${CONFIG_PATH} 8 --test --dist-eval --eval

The ${CONFIG_PATH} is the relative path of the corresponding configuration file in the config directory.
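The evaluation commands above differ only in the script name and trailing flags, so they can be assembled from a small table. The sketch below is illustrative: the setting keys (`pretrain`, `finetune`, `fewshot`, `openset`), the function `eval_command`, and the placeholder config path are hypothetical, and it assumes the positional `8` is the number of GPUs.

```python
import shlex

# Script names and flags are copied from the commands above.
EVAL_COMMANDS = {
    "pretrain": ("dist_train_arun.sh", ["--eval", "--eval-pretrain"]),
    "finetune": ("dist_train_arun.sh", ["--eval"]),
    "fewshot": ("dist_train_arun_fewshot.sh", []),
    "openset": ("dist_train_arun_openset.sh", ["--test", "--dist-eval", "--eval"]),
}


def eval_command(setting, config_path, num_gpus=8):
    """Build the argument list for one evaluation setting."""
    script, flags = EVAL_COMMANDS[setting]
    return ["bash", script, str(config_path), str(num_gpus), *flags]


if __name__ == "__main__":
    # "path/to/config.py" is a placeholder, not a file from the repository.
    for setting in EVAL_COMMANDS:
        print(shlex.join(eval_command(setting, "path/to/config.py")))
```

The returned list can be passed directly to `subprocess.run(command, check=True)` to launch the job from Python.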

Training

To train VLG on a single node with 8 GPUs:

  • Pre-training stage, run:
bash dist_train_arun.sh ${CONFIG_PATH} 8
  • Fine-tuning stage:

    • First, select the salient sentences:

      bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval --select

    • Then, run the fine-tuning:

      bash dist_train_arun.sh ${CONFIG_PATH} 8

The ${CONFIG_PATH} is the relative path of the corresponding configuration file in the config directory.
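The training steps above run in a fixed order: pre-training, then sentence selection, then fine-tuning. A minimal sketch of that sequence follows. It assumes the pre-training and fine-tuning stages use separate configuration files, which the instructions above do not state explicitly; `training_steps`, `run_all`, and the config names are hypothetical.

```python
import subprocess


def training_steps(pretrain_config, finetune_config, num_gpus=8):
    """Return (name, command) pairs in the order given in the instructions above."""
    base = ["bash", "dist_train_arun.sh"]
    gpus = str(num_gpus)
    return [
        ("pre-training", base + [pretrain_config, gpus]),
        ("sentence selection", base + [finetune_config, gpus, "--eval", "--select"]),
        ("fine-tuning", base + [finetune_config, gpus]),
    ]


def run_all(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for name, command in steps:
        print(f"[{name}] {' '.join(command)}")
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder config paths, not files from the repository.
    run_all(training_steps("path/to/pretrain_config.py", "path/to/finetune_config.py"))
```

With the default `dry_run=True` the script only prints the commands, which makes it easy to review them before committing GPU time.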

Pretrained Models

The checkpoints are available on Baidu Netdisk; the extraction code is nc6e.

Citation

If you are interested in our work, please cite as follows:

@article{lin2022vlg,
  title={VLG: General Video Recognition with Web Textual Knowledge},
  author={Lin, Jintao and Liu, Zhaoyang and Wang, Wenhai and Wu, Wayne and Wang, Limin},
  journal={arXiv preprint arXiv:2212.01638},
  year={2022}
}

Acknowledgements

This repo contains modified code from VL-LTR, ActionCLIP, and OpenMax.
