mmaction2's Introduction

🥳 🚀 What's New

The default branch has been switched to main (previously 1.x) from master (currently 0.x). We encourage users to migrate to the latest version, which offers more supported models, stronger pre-training checkpoints and simpler coding. Please refer to the Migration Guide for more details.

Release (2023.10.12): v1.2.0 with the following new features:

  • Support the VindLU multi-modality algorithm and training of ActionCLIP
  • Support the lightweight model MobileOne TSN/TSM
  • Support the video retrieval dataset MSVD
  • Support SlowOnly K700 features for training localization models
  • Support Video and Audio Demos

📖 Introduction

MMAction2 is an open-source toolbox for video understanding based on PyTorch. It is a part of the OpenMMLab project.

Action Recognition on Kinetics-400 (left) and Skeleton-based Action Recognition on NTU-RGB+D-120 (right)


Skeleton-based Spatio-Temporal Action Detection and Action Recognition Results on Kinetics-400


Spatio-Temporal Action Detection Results on AVA-2.1

🎁 Major Features

  • Modular design: We decompose a video understanding framework into different components. One can easily construct a customized video understanding framework by combining different modules.

  • Support for five major video understanding tasks: MMAction2 implements various algorithms for multiple video understanding tasks, including action recognition, action localization, spatio-temporal action detection, skeleton-based action recognition and video retrieval.

  • Well tested and documented: We provide detailed documentation and API reference, as well as unit tests.
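
The modular design described above can be pictured with a toy example. This is a minimal sketch and not MMAction2's or MMEngine's actual API: the names `ToyRegistry`, `ToyBackbone`, `ToyHead` and `ToyRecognizer` are hypothetical and exist only to show how components named in a config dict could be combined into one model.

```python
"""Toy illustration of config-driven module composition (not the real API)."""
from typing import Any, Callable, Dict


class ToyRegistry:
    """Maps a class name to a class so components can be built from dicts."""

    def __init__(self) -> None:
        self._classes: Dict[str, Callable[..., Any]] = {}

    def register(self, cls):
        # Used as a decorator: stores the class under its own name.
        self._classes[cls.__name__] = cls
        return cls

    def build(self, cfg: Dict[str, Any]) -> Any:
        # Work on a copy so the caller's config dict is left untouched.
        args = dict(cfg)
        cls = self._classes[args.pop("type")]
        return cls(**args)


COMPONENTS = ToyRegistry()


@COMPONENTS.register
class ToyBackbone:
    def __init__(self, depth: int) -> None:
        self.depth = depth


@COMPONENTS.register
class ToyHead:
    def __init__(self, num_classes: int) -> None:
        self.num_classes = num_classes


@COMPONENTS.register
class ToyRecognizer:
    def __init__(self, backbone: Dict[str, Any], head: Dict[str, Any]) -> None:
        # Nested configs are built through the same registry.
        self.backbone = COMPONENTS.build(backbone)
        self.head = COMPONENTS.build(head)


# Swapping a component only requires changing the config, not the code.
model_cfg = dict(
    type="ToyRecognizer",
    backbone=dict(type="ToyBackbone", depth=50),
    head=dict(type="ToyHead", num_classes=400),
)
model = COMPONENTS.build(model_cfg)
print(type(model.backbone).__name__, model.head.num_classes)
```

In this sketch, replacing the backbone or head means editing one entry of `model_cfg`, which is the kind of recombination the modular design aims for.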

🛠️ Installation

MMAction2 depends on PyTorch, MMCV, MMEngine, MMDetection (optional) and MMPose (optional).

Please refer to install.md for detailed instructions.

Quick instructions
conda create --name openmmlab python=3.8 -y
conda activate openmmlab
conda install pytorch torchvision -c pytorch  # Installs the latest PyTorch and cudatoolkit; check that they match your environment.
pip install -U openmim
mim install mmengine
mim install mmcv
mim install mmdet  # optional
mim install mmpose  # optional
git clone https://github.com/open-mmlab/mmaction2.git
cd mmaction2
pip install -v -e .
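
As a quick sanity check after the steps above, the following sketch reports which of the packages named in this section can be found in the active environment. It uses only the Python standard library. The module names (`torch`, `mmengine`, `mmcv`, `mmaction`, `mmdet`, `mmpose`) are assumed to be the names under which these packages install, and the helper `check_packages` is hypothetical, not part of MMAction2.

```python
from importlib import util
from typing import Dict, Iterable

# Assumed import names for the dependencies listed in this section.
REQUIRED = ("torch", "mmengine", "mmcv", "mmaction")
OPTIONAL = ("mmdet", "mmpose")


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Return, for each top-level module name, whether it can be located."""
    # find_spec only locates a top-level module; it does not import it.
    return {name: util.find_spec(name) is not None for name in names}


for name, found in check_packages(REQUIRED).items():
    print(f"{name}: {'found' if found else 'MISSING'}")

for name, found in check_packages(OPTIONAL).items():
    print(f"{name} (optional): {'found' if found else 'not installed'}")
```

A missing required package usually means one of the `mim install` or `pip install` steps did not complete in the environment that is currently active.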

👀 Model Zoo

Results and models are available in the model zoo.

Supported models

  • Action Recognition: C3D (CVPR'2014), TSN (ECCV'2016), I3D (CVPR'2017), C2D (CVPR'2018), I3D Non-Local (CVPR'2018), R(2+1)D (CVPR'2018), TRN (ECCV'2018), TSM (ICCV'2019), TSM Non-Local (ICCV'2019), SlowOnly (ICCV'2019), SlowFast (ICCV'2019), CSN (ICCV'2019), TIN (AAAI'2020), TPN (CVPR'2020), X3D (CVPR'2020), MultiModality: Audio (ArXiv'2020), TANet (ArXiv'2020), TimeSformer (ICML'2021), ActionCLIP (ArXiv'2021), VideoSwin (CVPR'2022), VideoMAE (NeurIPS'2022), MViT V2 (CVPR'2022), UniFormer V1 (ICLR'2022), UniFormer V2 (ArXiv'2022), VideoMAE V2 (CVPR'2023)
  • Action Localization: BSN (ECCV'2018), BMN (ICCV'2019), TCANet (CVPR'2021)
  • Spatio-Temporal Action Detection: ACRN (ECCV'2018), SlowOnly+Fast R-CNN (ICCV'2019), SlowFast+Fast R-CNN (ICCV'2019), LFB (CVPR'2019), VideoMAE (NeurIPS'2022)
  • Skeleton-based Action Recognition: ST-GCN (AAAI'2018), 2s-AGCN (CVPR'2019), PoseC3D (CVPR'2022), STGCN++ (ArXiv'2022), CTRGCN (CVPR'2021), MSG3D (CVPR'2020)
  • Video Retrieval: CLIP4Clip (ArXiv'2022)

Supported datasets

  • Action Recognition: HMDB51 (ICCV'2011), UCF101 (CRCV-IR-12-01), ActivityNet (CVPR'2015), Kinetics-[400/600/700] (CVPR'2017), SthV1 (ICCV'2017), SthV2 (ICCV'2017), Diving48 (ECCV'2018), Jester (ICCV'2019), Moments in Time (TPAMI'2019), Multi-Moments in Time (ArXiv'2019), HVU (ECCV'2020), OmniSource (ECCV'2020), FineGYM (CVPR'2020), Kinetics-710 (ArXiv'2022)
  • Action Localization: THUMOS14 (THUMOS Challenge 2014), ActivityNet (CVPR'2015), HACS (ICCV'2019)
  • Spatio-Temporal Action Detection: UCF101-24* (CRCV-IR-12-01), JHMDB* (ICCV'2015), AVA (CVPR'2018), AVA-Kinetics (ArXiv'2020), MultiSports (ICCV'2021)
  • Skeleton-based Action Recognition: PoseC3D-FineGYM (ArXiv'2021), PoseC3D-NTURGB+D (ArXiv'2021), PoseC3D-UCF101 (ArXiv'2021), PoseC3D-HMDB51 (ArXiv'2021)
  • Video Retrieval: MSRVTT (CVPR'2016)

👨‍🏫 Get Started

For tutorials, we provide user guides that cover basic usage.

Research works built on MMAction2 by users from the community
  • Video Swin Transformer.
  • Evidential Deep Learning for Open Set Action Recognition, ICCV 2021 Oral.
  • Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective, ICCV 2021 Oral.

🎫 License

This project is released under the Apache 2.0 license.

🖊️ Citation

If you find this project useful in your research, please consider citing:

@misc{2020mmaction2,
    title={OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark},
    author={MMAction2 Contributors},
    howpublished = {\url{https://github.com/open-mmlab/mmaction2}},
    year={2020}
}

🙌 Contributing

We appreciate all contributions to improve MMAction2. Please refer to CONTRIBUTING.md in MMCV for more details about the contribution guidelines.

🤝 Acknowledgement

MMAction2 is an open-source project contributed to by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, and the users who give valuable feedback. We hope that the toolbox and benchmark can serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop new models.

🏗️ Projects in OpenMMLab

  • MMEngine: OpenMMLab foundational library for training deep learning models.
  • MMCV: OpenMMLab foundational library for computer vision.
  • MIM: MIM installs OpenMMLab packages.
  • MMEval: A unified evaluation library for multiple machine learning libraries.
  • MMPreTrain: OpenMMLab pre-training toolbox and benchmark.
  • MMDetection: OpenMMLab detection toolbox and benchmark.
  • MMDetection3D: OpenMMLab's next-generation platform for general 3D object detection.
  • MMRotate: OpenMMLab rotated object detection toolbox and benchmark.
  • MMYOLO: OpenMMLab YOLO series toolbox and benchmark.
  • MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark.
  • MMOCR: OpenMMLab text detection, recognition, and understanding toolbox.
  • MMPose: OpenMMLab pose estimation toolbox and benchmark.
  • MMHuman3D: OpenMMLab 3D human parametric model toolbox and benchmark.
  • MMSelfSup: OpenMMLab self-supervised learning toolbox and benchmark.
  • MMRazor: OpenMMLab model compression toolbox and benchmark.
  • MMFewShot: OpenMMLab fewshot learning toolbox and benchmark.
  • MMAction2: OpenMMLab's next-generation action understanding toolbox and benchmark.
  • MMTracking: OpenMMLab video perception toolbox and benchmark.
  • MMFlow: OpenMMLab optical flow toolbox and benchmark.
  • MMagic: OpenMMLab Advanced, Generative and Intelligent Creation toolbox.
  • MMGeneration: OpenMMLab image and video generative models toolbox.
  • MMDeploy: OpenMMLab model deployment framework.
  • Playground: A central hub for gathering and showcasing amazing projects built upon OpenMMLab.

