Awesome Deep Learning for Video Analysis

This repo contains some video analysis, especiall multimodal learning for video analysis, research. I summarize some papers and categorize them by myself. You are kindly invited to pull requests!

I pay more attention on multimodal learning related work and some research like action recognition is not the main scope of this repo.

Dataset
Multimodal for Video Analysis
Video Moment Localization
Video Retrieval
Video Advertisement
Visual Commonsense Reasoning
Video Highlight
Object Tracking
Audio-Visual Dialog
Action Recognition

Dataset:

I find a very interesting website

Sortable and searchable compilation of video dataset [Video Dataset Overview]

AVA dataset: AVA is a project that provides audiovisual annotations of video for improving our understanding of human activity. [Project]
PyVideoResearch: A repositsory of common methods, datasets, and tasks for video research [GitHub]
How2 Dataset: How2: A Large-scale Dataset for Multimodal Language Understanding [Paper] [GitHub]
Moments in Time Dataset A large-scale dataset for recognizing and understanding action in videos [Dataset] [Pretrained Model]
Pretrained image and video models for Pytorch [GitHub]
Youtube-8M, new segment task! [Blog]

Tool

This document describes the collection of utilities created for Detection and Classification of Acoustic Scenes and Events (DCASE). [GitHub]

Paper:

Action recognition (Spatiotemporal Features, Video Classification)

Long-Term Feature Banks for Detailed Video Understanding (CVPR2019) [Paper][GitHub]
Deep Learning for Video Classification and Captioning [Paper]
Large-scale Video Classification with Convolutional Neural Networks [Paper]
Learning Spatiotemporal Features with 3D Convolutional Networks [Paper]
Two-Stream Convolutional Networks for Action Recognition in Videos [Paper]
Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors [Paper]
Non-local neural networks [Paper]](http://openaccess.thecvf.com/content_cvpr_2018/papers/Wang_Non-Local_Neural_Networks_CVPR_2018_paper.pdf) GitHub
- Wang, Xiaolong, Ross Girshick, Abhinav Gupta, and Kaiming He. (CVPR 2018)
- Summary:
Learning Correspondence from the Cycle-consistency of Time [Paper] [GitHub]
- Xiaolong Wang and Allan Jabri and Alexei A. Efros (CVPR2019)
- Summary:
3D ConvNets in Pytorch [GitHub]

Multimodal For video Analysis

Awsome list for multimodal learning [GitHub]
VideoBERT: A Joint Model for Video and Language Representation Learning [Paper]
AENet: Learning Deep Audio Features for Video Analysis [Paper] [GitHub]
Look, Listen and Learn [Paper]
Objects that Sound [Paper]
Learning to Separate Object Sounds by Watching Unlabeled Video [Paper]
- Gao, Ruohan, Rogerio Feris, and Kristen Grauman. arXiv preprint arXiv:1804.01665 2018
Ambient Sound Provides Supervision for Visual Learning [Paper]
- Owens, Andrew, Jiajun Wu, Josh H. McDermott, William T. Freeman, and Antonio Torralba. ECCV 2016
- Summary: unsupervised learning

Video Moment Localization

Localizing Moments in Video with Natural Language [Paper][GitHub]

Video Retrieval

Learning a Text-Video Embedding from Incomplete and Heterogeneous Data." [Paper][GitHub]
- Miech, Antoine, Ivan Laptev, and Josef Sivic. ECCV 2018
- Summary: combine multi-modality information, calculate similarities and weight different similarities
Cross-Modal and Hierarchical Modeling of Video and Text [Paper]
- B. Zhang * , H. Hu * , F. Sha ECCV 2018
- Summary: learning the intrinsic hierarchical structures of both videos and texts. (Make video and text closer, make videos closer and make text closer)
A dataset for movie description. [Paper]
- Rohrbach, Anna, Marcus Rohrbach, Niket Tandon, and Bernt Schiele. CVPR 2015
- Summary: dataset paper
Web-scale Multimedia Search for Internet Video Content. [Thesis]
- Lu Jiang
- Summary: amazing thesis

Video Advertisement (Also include some image advertisement paper)

Automatic understanding of image and video advertisements [Paper] [Project]
- Hussain, Zaeem, Mingda Zhang, Xiaozhong Zhang, Keren Ye, Christopher Thomas, Zuha Agha, Nathan Ong, and Adriana Kovashka. CVPR 2017
- Summary: Image and video advertisement datasets and baselines.
Multimodal Representation of Advertisements Using Segment-level Autoencoders [Paper] [GitHub]
- Somandepalli, Krishna, Victor Martinez, Naveen Kumar, and Shrikanth Narayanan. ICMI 2018
- Summary: video and audio features to understand whether video is funny or not.
Story Understanding in Video Advertisements. [Paper] [GitHub]
- Keren Ye, Kyle Buettner, Adriana Kovashka BMVC 2018
- Summary: Combine multiple features including climax, audio and so on to analyze video ads.
ADVISE: Symbolism and External Knowledge for Decoding Advertisements. [Paper] [GitHub]
- Keren Ye and Adriana Kovashka. (ECCV2018)
- Summary: action-reason statement for advertisement. Many pre-trained models are as prior knowledge. SSD, DenseCAP and GloVe.

Visual Commonsense Reasoning

From Recognition to Cognition: Visual Commonsense Reasoning [Paper] [Project Website]
- Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi (CVPR2019)
- Summary: First dataset paper. Use BERT and fastrcnn as the baseline

Video Highlight Prediction

Video highlight prediction using audience chat reactions
- Fu, Cheng-Yang, Joon Lee, Mohit Bansal, and Alexander C. Berg. (EMNLP 2017)

Object Tracking

SenseTime's research platform for single object tracking research, implementing algorithms like SiamRPN and SiamMask. [GitHub]

Audio-Visual Dialog

Audio-Visual Scene-Aware Dialog [GitHub]
- Alamri, Huda, Vincent Cartillier, Abhishek Das, Jue Wang, Stefan Lee, Peter Anderson, Irfan Essa et al.
- arXiv preprint arXiv:1901.09107 (2019)

benjamesbabala / awsome-deep-learning-for-video-analysis Goto Github PK

awsome-deep-learning-for-video-analysis's Introduction

Awesome Deep Learning for Video Analysis

Contents

Dataset:

Sortable and searchable compilation of video dataset [Video Dataset Overview]

Tool

Paper:

Action recognition (Spatiotemporal Features, Video Classification)

Multimodal For video Analysis

Video Moment Localization

Video Retrieval

Video Advertisement (Also include some image advertisement paper)

Visual Commonsense Reasoning

Video Highlight Prediction

Object Tracking

Audio-Visual Dialog

awsome-deep-learning-for-video-analysis's People

Contributors

Watchers

Recommend Projects

Recommend Topics

Recommend Org