[NeurIPS 2022] ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning

This is the official repo of the paper ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning.

@article{pan2022st,
  title={ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning for Action Recognition},
  author={Pan, Junting and Lin, Ziyi and Zhu, Xiatian and Shao, Jing and Li, Hongsheng},
  journal={arXiv preprint arXiv:2206.13559},
  year={2022}
}

Environment

We use conda to manage the Python environment. The exported environment specification is provided in environment.yml and can be recreated with conda env create -f environment.yml.

Configuration

Some common configurations (e.g., dataset paths, pretrained backbone paths) are set in config.py. We've included an example configuration in config.py.example which contains all required fields with values left empty. Please copy config.py.example to config.py and fill in the values before running the models.
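The copy step can also be scripted. Below is a minimal sketch, not part of the repository: the helper name init_config is hypothetical, and the only assumption is that config.py.example sits in the repository root. It refuses to overwrite an existing config.py, so values you have already filled in are never lost.

```python
import os
import shutil


def init_config(repo_dir: str = ".") -> str:
    """Hypothetical helper: copy config.py.example to config.py unless config.py already exists."""
    template = os.path.join(repo_dir, "config.py.example")
    target = os.path.join(repo_dir, "config.py")
    if not os.path.exists(target):
        # Create config.py only once so filled-in values are never overwritten.
        shutil.copyfile(template, target)
    return target
```

After running init_config(), open config.py and fill in the empty fields by hand.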

Dataset preparation

Each data list should be organized as follows:

<video_1> <label_1>
<video_2> <label_2>
...
<video_N> <label_N>

where <video_i> is the path to a video file, and <label_i> is an integer between $0$ and $M-1$ representing the class of the $i$-th video, where $M$ is the total number of classes.
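To check a list before training, the format above can be parsed with a short script. This is a hypothetical sketch, not the repository's own data loader. It assumes entries are whitespace-separated with the label as the last token, and it splits on the last whitespace so that paths containing spaces still parse (the repository's loader may not accept such paths).

```python
from typing import Iterable, List, Tuple


def parse_data_list(lines: Iterable[str], num_classes: int) -> List[Tuple[str, int]]:
    """Hypothetical helper: parse '<video_path> <label>' lines into (path, label) pairs."""
    samples = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        # Split on the last whitespace: everything before it is the path.
        path, label_str = line.rsplit(maxsplit=1)
        label = int(label_str)
        # Labels must lie in [0, M-1], where M is the number of classes.
        if not 0 <= label < num_classes:
            raise ValueError(f"line {lineno}: label {label} not in [0, {num_classes - 1}]")
        samples.append((path, label))
    return samples
```

Usage: with open(list_path) as f: samples = parse_data_list(f, num_classes=400). An out-of-range label raises ValueError with the offending line number.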

We release the data lists we used for Kinetics-400 (k400: train list, val list) and Something-Something-v2 (ssv2: train list, val list). They reflect the class mapping of the released models and the videos available on our side. We strongly recommend cleaning the Kinetics-400 lists first, because some videos may have been taken down from YouTube; the current implementation stops training when it encounters a broken video.
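A first cleaning pass can drop list entries whose video file is missing on disk. The sketch below is a hypothetical helper (clean_data_list is not part of the repository). It assumes the list format described above with paths relative to a root directory, and it only detects missing files; videos that exist but cannot be decoded would additionally require a decode attempt with a video library, which is not shown.

```python
import os


def clean_data_list(list_path: str, root: str, out_path: str) -> int:
    """Hypothetical helper: keep only entries whose video exists under `root`; return how many were dropped."""
    kept, dropped = [], 0
    with open(list_path) as f:
        for line in f:
            entry = line.strip()
            if not entry:
                continue
            # The video path is everything before the trailing label.
            video = entry.rsplit(maxsplit=1)[0]
            if os.path.isfile(os.path.join(root, video)):
                kept.append(entry)
            else:
                dropped += 1
    with open(out_path, "w") as f:
        f.write("\n".join(kept) + ("\n" if kept else ""))
    return dropped
```

Writing to a separate out_path leaves the original list untouched, so the cleaned list can be pointed to from config.py once you are satisfied with it.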

After obtaining the videos and the data lists, set the root directories and list paths in the DATASETS dictionary in config.py (fill in the blanks for k400 and ssv2, or add new entries for custom datasets). Each dataset requires five fields:

  • TRAIN_ROOT: the root directory to which the video paths in the training list are relative.

  • VAL_ROOT: the root directory to which the video paths in the validation list are relative.

  • TRAIN_LIST: the path to the training video list.

  • VAL_LIST: the path to the validation video list.

  • NUM_CLASSES: the number of classes in the dataset.
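As an illustration, a filled-in entry might look like the sketch below. All paths are placeholders, the nested-dictionary layout is an assumption (config.py.example is authoritative), and the helpers missing_fields and resolve_video are hypothetical: one reports fields left empty, the other shows how a list entry is resolved against its root directory.

```python
import os

# Hypothetical DATASETS entry; the layout and all paths are placeholders.
DATASETS = {
    "k400": {
        "TRAIN_ROOT": "/data/k400/train",
        "VAL_ROOT": "/data/k400/val",
        "TRAIN_LIST": "/data/k400/lists/train.txt",
        "VAL_LIST": "/data/k400/lists/val.txt",
        "NUM_CLASSES": 400,
    },
}

REQUIRED_FIELDS = ("TRAIN_ROOT", "VAL_ROOT", "TRAIN_LIST", "VAL_LIST", "NUM_CLASSES")


def missing_fields(datasets: dict) -> dict:
    """Hypothetical check: map each dataset name to the required fields that are absent or empty."""
    report = {}
    for name, cfg in datasets.items():
        missing = [k for k in REQUIRED_FIELDS if cfg.get(k) in (None, "")]
        if missing:
            report[name] = missing
    return report


def resolve_video(cfg: dict, video: str, split: str = "train") -> str:
    """Hypothetical helper: join a list entry's relative path with TRAIN_ROOT or VAL_ROOT."""
    root = cfg["TRAIN_ROOT"] if split == "train" else cfg["VAL_ROOT"]
    return os.path.join(root, video)
```

An empty result from missing_fields(DATASETS) means every dataset entry has all five fields filled in.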

Backbone preparation

We use the CLIP checkpoints from the official release. Put the paths of the downloaded checkpoints in config.py. The currently supported architectures are CLIP-ViT-B/16 (set CLIP_VIT_B16_PATH) and CLIP-ViT-L/14 (set CLIP_VIT_L14_PATH).
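A missing or mistyped checkpoint path is easiest to catch before launching a run. The sketch below is hypothetical: backbone_checkpoint and the architecture-name keys are illustrative, and it assumes the ViT-L/14 path is stored under CLIP_VIT_L14_PATH. It looks up the configured path and fails early if it is unset or does not point to a file.

```python
import os

# Illustrative mapping from architecture name to config.py variable (assumed names).
BACKBONE_KEYS = {
    "ViT-B/16": "CLIP_VIT_B16_PATH",
    "ViT-L/14": "CLIP_VIT_L14_PATH",
}


def backbone_checkpoint(arch: str, config: dict) -> str:
    """Hypothetical helper: return the checkpoint path configured for `arch`, or raise."""
    if arch not in BACKBONE_KEYS:
        raise ValueError(f"unsupported backbone {arch!r}; expected one of {sorted(BACKBONE_KEYS)}")
    key = BACKBONE_KEYS[arch]
    path = config.get(key)
    if not path:
        raise ValueError(f"{key} is not set in config.py")
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return path
```

With a filled-in config.py on the import path, vars(config) turns the module into a dictionary that this helper accepts: import config; backbone_checkpoint("ViT-B/16", vars(config)).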

Run the models

Preset scripts with recommended settings are provided in the scripts/ directory. For a detailed description of the command-line arguments, see the help message of main.py.

Model zoo

Adapter architectures are written as (number of adapter layers) x (number of bottleneck channels); for example, 12x384 denotes 12 adapter layers, each with 384 bottleneck channels. Because this code is a reproduction, the accuracy of the checkpoints may differ slightly from the numbers reported in the paper. More models will be released soon.
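To make the notation concrete, the sketch below parses a spec such as 12x384 and gives a rough count of adapter parameters. The helper names are hypothetical, and the per-adapter structure is an assumption made for illustration: a Linear down-projection (width to C), a depthwise temporal 3D convolution with a 3x1x1 kernel, and a Linear up-projection (C to width), all with biases, where width is 768 for ViT-B/16. The count covers the adapters only, not the classification head; consult the model code for the exact layout.

```python
def parse_adapter_arch(spec: str):
    """Hypothetical helper: split an 'LxC' spec such as '12x384' into (num_layers, bottleneck_channels)."""
    layers, channels = spec.lower().split("x")
    return int(layers), int(channels)


def adapter_params(spec: str, width: int = 768, kernel_t: int = 3) -> int:
    """Rough adapter-only parameter count under the structural assumptions stated above."""
    layers, r = parse_adapter_arch(spec)
    down = width * r + r        # assumed Linear: width -> r, with bias
    dwconv = r * kernel_t + r   # assumed depthwise conv, kernel (kernel_t, 1, 1), with bias
    up = r * width + width      # assumed Linear: r -> width, with bias
    return layers * (down + dwconv + up)
```

Under these assumptions, adapter_params("12x384") returns 7110144, and the 24x384 configuration has exactly twice as many adapter parameters, since the count scales linearly with the number of adapter layers.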

Something-something-v2

Backbone arch.   Adapter arch.   Top-1 Acc. (%)   Links
CLIP ViT-B/16    24x384          66.9             script, log, checkpoint

Kinetics-400

Backbone arch.   Adapter arch.   Top-1 Acc. (%)   Links
CLIP ViT-B/16    12x384          82.2             script, log, checkpoint

Acknowledgements

The CLIP model implementation is adapted from the official CLIP repository. Some data processing code comes from PySlowFast, and part of the training code comes from MAE. Thanks to the authors for their great work!
