Awesome Deep Learning for Video Analysis

This repo collects research on video analysis, especially multimodal learning for video analysis. I summarize papers and categorize them myself. Pull requests are welcome!

I focus on work related to multimodal learning; topics such as action recognition are outside the main scope of this repo.

Dataset:

I found a very interesting website: a sortable and searchable compilation of video datasets [Video Dataset Overview]

  • AVA dataset: AVA is a project that provides audiovisual annotations of video for improving our understanding of human activity. [Project]
  • PyVideoResearch: A repository of common methods, datasets, and tasks for video research [GitHub]
  • How2 Dataset: How2: A Large-scale Dataset for Multimodal Language Understanding [Paper] [GitHub]
  • Moments in Time Dataset: A large-scale dataset for recognizing and understanding action in videos [Dataset] [Pretrained Model]
  • Pretrained image and video models for PyTorch [GitHub]
  • YouTube-8M, new segment task! [Blog]

Tool

  • This document describes the collection of utilities created for Detection and Classification of Acoustic Scenes and Events (DCASE). [GitHub]

Paper:

Action recognition (Spatiotemporal Features, Video Classification)

Multimodal for Video Analysis

  • Awesome list for multimodal learning [GitHub]
  • VideoBERT: A Joint Model for Video and Language Representation Learning [Paper]
  • AENet: Learning Deep Audio Features for Video Analysis [Paper] [GitHub]
  • Look, Listen and Learn [Paper]
  • Objects that Sound [Paper]
  • Learning to Separate Object Sounds by Watching Unlabeled Video [Paper]
    • Gao, Ruohan, Rogerio Feris, and Kristen Grauman. arXiv preprint arXiv:1804.01665 2018
  • Ambient Sound Provides Supervision for Visual Learning [Paper]
    • Owens, Andrew, Jiajun Wu, Josh H. McDermott, William T. Freeman, and Antonio Torralba. ECCV 2016
    • Summary: unsupervised learning

Video Moment Localization

Video Retrieval

  • Learning a Text-Video Embedding from Incomplete and Heterogeneous Data [Paper] [GitHub]
    • Miech, Antoine, Ivan Laptev, and Josef Sivic. ECCV 2018
    • Summary: combines information from multiple modalities, computes a similarity for each, and weights the different similarities
  • Cross-Modal and Hierarchical Modeling of Video and Text [Paper]
    • B. Zhang*, H. Hu*, F. Sha. ECCV 2018
    • Summary: learns the intrinsic hierarchical structures of both videos and texts (brings video and text closer, brings videos closer, and brings texts closer)
  • A Dataset for Movie Description [Paper]
    • Rohrbach, Anna, Marcus Rohrbach, Niket Tandon, and Bernt Schiele. CVPR 2015
    • Summary: dataset paper
  • Web-scale Multimedia Search for Internet Video Content [Thesis]
    • Lu Jiang
    • Summary: amazing thesis

Video Advertisement (also includes some image advertisement papers)

  • Automatic Understanding of Image and Video Advertisements [Paper] [Project]
    • Hussain, Zaeem, Mingda Zhang, Xiaozhong Zhang, Keren Ye, Christopher Thomas, Zuha Agha, Nathan Ong, and Adriana Kovashka. CVPR 2017
    • Summary: Image and video advertisement datasets and baselines.
  • Multimodal Representation of Advertisements Using Segment-level Autoencoders [Paper] [GitHub]
    • Somandepalli, Krishna, Victor Martinez, Naveen Kumar, and Shrikanth Narayanan. ICMI 2018
    • Summary: uses video and audio features to predict whether a video is funny or not.
  • Story Understanding in Video Advertisements [Paper] [GitHub]
    • Keren Ye, Kyle Buettner, and Adriana Kovashka. BMVC 2018
    • Summary: combines multiple features, including climax and audio, to analyze video ads.
  • ADVISE: Symbolism and External Knowledge for Decoding Advertisements [Paper] [GitHub]
    • Keren Ye and Adriana Kovashka. ECCV 2018
    • Summary: action-reason statements for advertisements. Several pre-trained models (SSD, DenseCap, and GloVe) are used as prior knowledge.

Visual Commonsense Reasoning

  • From Recognition to Cognition: Visual Commonsense Reasoning [Paper] [Project Website]
    • Rowan Zellers, Yonatan Bisk, Ali Farhadi, and Yejin Choi. CVPR 2019
    • Summary: the first dataset paper for this task. Uses BERT and fastrcnn as the baseline.

Video Highlight Prediction

  • Video highlight prediction using audience chat reactions
    • Fu, Cheng-Yang, Joon Lee, Mohit Bansal, and Alexander C. Berg. EMNLP 2017

Object Tracking

  • SenseTime's research platform for single object tracking research, implementing algorithms like SiamRPN and SiamMask. [GitHub]

Audio-Visual Dialog

  • Audio-Visual Scene-Aware Dialog [GitHub]
    • Alamri, Huda, Vincent Cartillier, Abhishek Das, Jue Wang, Stefan Lee, Peter Anderson, Irfan Essa et al.
    • arXiv preprint arXiv:1901.09107 (2019)

Contributors

huaizhengzhang