
awesome-audiovisual-learning's Introduction

Overview

This is a curated list of audio-visual learning methods and datasets, based on our survey: <Learning in Audio-visual Context: A Review, Analysis, and New Perspective>. The list will continue to be updated; please feel free to nominate good related works with pull requests!

[Website of Our Survey], [arXiv]

Table of contents

Audio-visual Boosting

Audio-visual Recognition

Speech Recognition

Speaker Recognition

Action Recognition

Emotion Recognition

Uni-modal Enhancement

Speech Enhancement and Separation

Object Sound Separation

Face Super-resolution and Reconstruction

Cross-modal Perception

Cross-modal Generation

Mono Sound Generation

Speech
Music
Natural Sound

Spatial Sound Generation

Video Generation

Talking Face
Gesture
Dance

Depth Estimation

Audio-visual Transfer Learning

Cross-modal Retrieval

Audio-visual Collaboration

Audio-visual Representation Learning

Audio-visual Localization

Sound Localization in Videos

Audio-visual Saliency Detection

Audio-visual Navigation

Audio-visual Event Localization and Parsing

Localization

Parsing

Audio-visual Question Answering and Dialog

Question Answering

Dialog

Datasets

| Dataset | Year | # Videos | Length | Data form | Video source | Task |
|---|---|---|---|---|---|---|
| LRW, LRS2 and LRS3 | 2016, 2018, 2018 | - | 800h+ | video | in the wild | Speech-related, speaker-related, face generation-related tasks |
| VoxCeleb, VoxCeleb2 | 2017, 2018 | - | 2,000h+ | video | YouTube | Speech-related, speaker-related, face generation-related tasks |
| AVA-ActiveSpeaker | 2019 | - | 38.5h | video | YouTube | Speech-related task, speaker-related task |
| Kinetics-400 | 2017 | 306,245 | 850h+ | video | YouTube | Action recognition |
| EPIC-KITCHENS | 2018 | 39,594 | 55h | video | Recorded videos | Action recognition |
| CMU-MOSI | 2016 | 2,199 | 2h+ | video | YouTube | Emotion recognition |
| CMU-MOSEI | 2018 | 23,453 | 65h+ | video | YouTube | Emotion recognition |
| VGGSound | 2020 | 200k+ | 550h+ | video | YouTube | Action recognition, sound localization |
| AudioSet | 2017 | 2M+ | 5,800h+ | video | YouTube | Action recognition, sound separation |
| Greatest Hits | 2016 | 977 | 9h+ | video | Recorded videos | Sound generation |
| MUSIC | 2018 | 714 | 23h+ | video | YouTube | Sound separation, sound localization |
| FAIR-Play | 2019 | 1,871 | 5.2h | video with binaural sound | Recorded videos | Spatial sound generation |
| YT-ALL | 2018 | 1,146 | 113.1h | 360 video | YouTube | Spatial sound generation |
| Replica | 2019 | - | - | 3D environment | 3D simulator | Depth estimation |
| AIST++ | 2021 | - | 5.2h | 3D video | Recorded videos | Dance generation |
| TED | 2019 | - | 52h | video | TED talks | Gesture generation |
| SumMe | 2014 | 25 | 1h+ | video with eye-tracking | User videos | Saliency detection |
| AVE | 2018 | 4,143 | 11h+ | video | YouTube | Event localization |
| LLP | 2020 | 11,849 | 32.9h | video | YouTube | Event parsing |
| SoundSpaces | 2020 | - | - | 3D environment | 3D simulator | Audio-visual navigation |
| AVSD | 2019 | 11,816 | 98h+ | video with dialog | Crowd-sourced | Audio-visual dialog |
| Pano-AVQA | 2021 | 5.4k | 7.7h | 360 video with QA | Video-sharing platforms | Audio-visual question answering |
| MUSIC-AVQA | 2022 | 9,288 | 150h+ | video with QA | YouTube | Audio-visual question answering |
| AVSBench | 2022 | 5,356 | 14.8h+ | video | YouTube | Audio-visual segmentation, sound localization |

awesome-audiovisual-learning's People

Contributors

echo0409, jasongief
