
Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning

Official repository of the ACM MM 2021 paper "Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning"

Fine-grained action recognition is attracting increasing attention due to the emerging demand for understanding specific actions in real-world applications, yet data for rare fine-grained categories is very limited. We therefore propose the few-shot fine-grained action recognition problem, which aims to recognize novel fine-grained actions given only a few samples per class. Although progress has been made on coarse-grained actions, existing few-shot recognition methods face two issues when handling fine-grained actions: they fail to capture subtle action details, and they learn poorly from data with low inter-class variance. To tackle the first issue, we propose a bidirectional attention module (BAM) inspired by human vision. Combining top-down task-driven signals with bottom-up salient stimuli, BAM captures subtle action details by accurately highlighting informative spatio-temporal regions. To address the second issue, we introduce contrastive meta-learning (CML). Compared with the widely adopted ProtoNet-based method, CML generates more discriminative video representations for data with low inter-class variance, since it makes full use of the potential contrastive pairs in each training episode. Furthermore, to compare different models fairly, we establish specific benchmark protocols on two large-scale fine-grained action recognition datasets. Extensive experiments show that our method consistently achieves state-of-the-art performance across the evaluated tasks.
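To make the contrast between ProtoNet and "using all contrastive pairs in an episode" more concrete, the sketch below shows two generic pieces: standard ProtoNet classification (class prototypes are mean support embeddings, queries are scored by negative squared Euclidean distance), and a generic supervised contrastive loss in which every same-label pair in an episode is a positive. This is an illustration under stated assumptions, not the paper's CML or BAM: the function names, the cosine similarity, and the temperature of 0.1 are hypothetical choices, embeddings are assumed to come from some video encoder, and the actual CML formulation is given in the paper. Plain Python lists are used to keep it dependency-free.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def protonet_log_probs(
    support: List[Vector], support_labels: List[int], query: Vector
) -> Tuple[List[int], List[float]]:
    """ProtoNet baseline: each class prototype is the mean of its support
    embeddings; a query is scored by negative squared Euclidean distance."""
    groups: Dict[int, List[Vector]] = defaultdict(list)
    for emb, label in zip(support, support_labels):
        groups[label].append(emb)
    classes = sorted(groups)
    prototypes = [
        [sum(col) / len(groups[c]) for col in zip(*groups[c])] for c in classes
    ]
    logits = [-sum((q - p) ** 2 for q, p in zip(query, proto)) for proto in prototypes]
    return classes, _log_softmax(logits)


def _cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def episode_contrastive_loss(
    embeddings: List[Vector], labels: List[int], temperature: float = 0.1
) -> float:
    """Hypothetical generic supervised contrastive loss over one episode:
    every same-label pair is a positive, every other sample is a negative."""
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Indices into `others` (and `sims`) whose label matches the anchor.
        positives = [k for k, j in enumerate(others) if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without any positive pair contributes nothing
        sims = [_cosine(embeddings[i], embeddings[j]) / temperature for j in others]
        log_probs = _log_softmax(sims)
        total -= sum(log_probs[k] for k in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

ProtoNet compares each query only against class means, whereas the pairwise loss draws a training signal from every sample pair in the episode, which is the property the abstract credits CML with exploiting.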

Figure: Overview of our framework.

Figure: Visualization of attention maps generated by our bidirectional attention module (BAM).

Figure: Comparison of video representations generated by ProtoNet and the proposed contrastive meta-learning (CML).

Download Data

The evaluations are performed on two publicly released fine-grained action recognition datasets: FineGym and HAA500.

FineGym

Please visit the official site of FineGym to download the data, and extract frames for the video clips covered by the Gym99 and Gym288 annotations.

HAA500

Please visit the official site of HAA500 and download version 1.1 of the dataset. Since its videos are already trimmed, extract frames from each video directly.
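Frame extraction could be scripted roughly as below, assuming the `ffmpeg` command-line tool is installed and on `PATH`. FineGym's Gym99/Gym288 clips are segments of longer videos, so they are cut with an input seek (`-ss`) plus a duration (`-t`); HAA500 videos are already trimmed, so `start`/`end` are simply omitted. The helper names, the JPEG quality setting, and the `img_%05d.jpg` frame naming are hypothetical choices; match the naming to whatever your dataloader expects. The format of FineGym's annotation files is not described here, so this sketch just takes clip boundaries as plain seconds.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Assumed frame naming: img_00001.jpg, img_00002.jpg, ...
# (ffmpeg's image2 muxer numbers output files starting at 1 by default).
FRAME_TEMPLATE = "img_%05d.jpg"


def ffmpeg_frames_cmd(
    video: str, out_dir: str, start: Optional[float] = None, end: Optional[float] = None
) -> List[str]:
    """Build an ffmpeg command that writes every frame of a video, or of the
    [start, end) segment given in seconds, as numbered JPEG files."""
    cmd = ["ffmpeg", "-y", "-loglevel", "error"]
    if start is not None:
        cmd += ["-ss", str(start)]  # input seek to the clip start
    cmd += ["-i", video]
    if end is not None:
        cmd += ["-t", str(end - (start or 0.0))]  # clip duration, not end time
    cmd += ["-q:v", "2", str(Path(out_dir) / FRAME_TEMPLATE)]  # lower q = better JPEG
    return cmd


def extract_frames(
    video: str, out_dir: str, start: Optional[float] = None, end: Optional[float] = None
) -> int:
    """Run ffmpeg and return the number of JPEG frames in out_dir afterwards."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_frames_cmd(video, out_dir, start, end), check=True)
    return sum(1 for p in Path(out_dir).iterdir() if p.suffix == ".jpg")
```

Extracting each clip into its own directory keeps the returned count usable as the per-video frame total, provided the directory holds no other JPEG files.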

Few-Shot Learning Protocols

You can find the split files of our few-shot fine-grained action recognition benchmarks here. The specific training/test protocols of Gym99, Gym288 and HAA500 are introduced in Sec. 4.1 of our paper.

For each benchmark, "train_class.txt" and "test_class.txt" contain the labels of the training and test categories, respectively. "train_split.txt" and "test_split.txt" contain TPN-style dataloader annotations: each line is the record of one video sample, consisting of the directory path of the video, its total frame number, and its ground-truth label.
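A minimal reader for these split files might look like the following, assuming the three fields are whitespace-separated and the label is an integer id (check against the released files). Because training and evaluation are episodic, the sketch also draws an N-way K-shot episode from the parsed records; `n_way`, `k_shot`, and `n_query` are hypothetical parameters here, and the actual benchmark settings are the ones given in Sec. 4.1 of the paper. All names are illustrative.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class VideoRecord:
    path: str        # directory path of the video
    num_frames: int  # total frame number
    label: int       # ground-truth label (assumed to be an integer id)


def parse_split_line(line: str) -> VideoRecord:
    # Assumption: whitespace-separated fields; splitting from the right
    # keeps directory paths that contain spaces intact.
    path, num_frames, label = line.strip().rsplit(maxsplit=2)
    return VideoRecord(path, int(num_frames), int(label))


def load_split(split_file: str) -> List[VideoRecord]:
    with open(split_file) as f:
        return [parse_split_line(line) for line in f if line.strip()]


def sample_episode(
    records: List[VideoRecord],
    n_way: int,
    k_shot: int,
    n_query: int,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Tuple[VideoRecord, int]], List[Tuple[VideoRecord, int]]]:
    """Sample an N-way K-shot episode as (record, episode_label) pairs,
    with episode labels re-indexed to 0..n_way-1."""
    rng = rng or random.Random()
    by_label: Dict[int, List[VideoRecord]] = defaultdict(list)
    for rec in records:
        by_label[rec.label].append(rec)
    # Only classes with enough videos for both support and query can be drawn.
    eligible = sorted(c for c, recs in by_label.items() if len(recs) >= k_shot + n_query)
    classes = rng.sample(eligible, n_way)  # ValueError if too few eligible classes
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        picked = rng.sample(by_label[cls], k_shot + n_query)
        support += [(rec, episode_label) for rec in picked[:k_shot]]
        query += [(rec, episode_label) for rec in picked[k_shot:]]
    return support, query
```

For example, `sample_episode(load_split("test_split.txt"), n_way=5, k_shot=1, n_query=1)` would draw one 5-way 1-shot test episode, with the file path and parameter values chosen purely for illustration.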

If you find this repository useful, please cite:

@inproceedings{wang2021few-shot,
  title={Few-shot fine-grained action recognition via bidirectional attention and contrastive meta-learning},
  author={Wang, Jiahao and Wang, Yunhong and Liu, Sheng and Li, Annan},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021}
}

