
looking-to-listen-at-cocktail-party's Introduction


This is a Keras + TensorFlow implementation of the paper "Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation" by Ephrat et al. from Google Research. The project also uses ideas from the paper "Seeing Through Noise: Visually Driven Speaker Separation and Enhancement".

Compatibility

The code is tested with TensorFlow 1.13.1 under Ubuntu 18.00 with Python 3.6.

News

Date Update
08-06-2019 Notebook added for full pipeline with pretrained model.
25-05-2019 Datasets added for mixed user videos.
23-04-2019 Added automated scripts for creating database structure.

External Dependencies

This repo uses code from facenet and face_recognition for tracking and extracting features from faces in videos.

Usage

Database structure

The layout below is a way to store the audio and video datasets efficiently, without much duplication.

|--speaker_background_spectrograms/
|  |--per speaker part 1/
|  |  |--speaker_clean.pkl
|  |  |--speaker_chatter_i.pkl
|  |--per speaker part 2/
|  |  |--speaker_clean.pkl
|  |  |--speaker_chatter_i.pkl
|--two_speakers_mix_spectrograms/
|  |--per speaker/
|  |  |--clean.pkl
|  |  |--mix_with_other_i.pkl
|--speaker_video_spectrograms/
|  |--per_speaker part 1/
|  |  |--clean.pkl
|  |--per_speaker part 2/
|  |  |--clean.pkl
|--chatter audios/
|  |--part1/
|  |--part2/
|  |--part3/
|--clean audios/
|  |--videos/
|  |--frames/
|  |--pretrained_model/
|  |  |--facenet_model.h5
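
As a rough illustration of the layout above, the fixed folders could be created from Python as sketched below. This is not the repo's prepare_directory.sh script. The LAYOUT list and the create_layout function are hypothetical names introduced here, and the per-speaker folders are left out because their names depend on the data.

```python
from pathlib import Path

# Fixed folders copied from the tree above. Per-speaker folders are omitted
# because their names depend on the downloaded data (an assumption).
LAYOUT = [
    "speaker_background_spectrograms",
    "two_speakers_mix_spectrograms",
    "speaker_video_spectrograms",
    "chatter audios/part1",
    "chatter audios/part2",
    "chatter audios/part3",
    "clean audios/videos",
    "clean audios/frames",
    "clean audios/pretrained_model",
]


def create_layout(root):
    """Create every folder in LAYOUT under root and return the created paths."""
    root = Path(root)
    created = []
    for relative in LAYOUT:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        created.append(folder)
    return created
```

Calling create_layout("data") would build the skeleton under data/; running it again changes nothing because existing folders are accepted.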

Getting started

1. Install all dependencies

pip install -r requirements.txt

2. Run the prepare_directory script

./data/prepare_directory.sh

3. Download the AVSpeech train and test CSV files and put them in data/

4. Run the background chatter downloader and slicer to download and slice the chatter files. This downloads chatter files with the tag "/m/07rkbfh" from AudioSet.

python data/chatter_download.py
python data/chatter_slicer.py

5. Start downloading data for the AVSpeech dataset and process it with the arguments of your choice.

python data/data_data_download.py --from_id=0 --to_id=1000 --type_of_dataset=audio_dataset
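
The slicing in step 4 can be pictured as cutting a long recording into fixed-length pieces. The sketch below is only an illustration of that idea, not the code in data/chatter_slicer.py. The function name slice_samples is hypothetical, and dropping a short trailing piece is an assumption.

```python
def slice_samples(samples, sample_rate, duration):
    """Split a sequence of audio samples into chunks of `duration` seconds.

    A trailing chunk shorter than `duration` is dropped (an assumption).
    """
    chunk_len = int(sample_rate * duration)
    if chunk_len <= 0:
        raise ValueError("sample_rate * duration must cover at least one sample")
    full_chunks = len(samples) // chunk_len
    return [samples[i * chunk_len:(i + 1) * chunk_len] for i in range(full_chunks)]


# Example: 10 samples at 2 Hz, cut into 2-second (4-sample) chunks.
chunks = slice_samples(list(range(10)), sample_rate=2, duration=2)
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```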

Arguments available

from_id -> id in train.csv at which downloading of YouTube clips starts

to_id -> id in train.csv at which downloading of YouTube clips stops

type_of_dataset -> type of dataset to prepare.
  audio_dataset -> create audio spectrograms mixed with background chatter
  audio_video_dataset -> create audio spectrograms and video embeddings, plus spectrograms of each speaker mixed with other speakers' audio

low_memory -> clear data that is no longer needed, to reduce memory use

chatter_part -> use different parts of the chatter files to mix with the clean speakers' audio

sample_rate, duration, fps, mono, window, stride, fft_length, amp_norm, chatter_norm -> arguments for STFT and audio processing

face_extraction_model -> select which model to use for facial embedding extraction
  hog -> faster on CPU but less accurate
  cnn -> slower on CPU, faster on an NVIDIA GPU, more accurate
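
To make the arguments above concrete, here is a minimal argparse sketch that mirrors a few of them. It is not the parser used by the download script. Every default, the integer type of chatter_part, the boolean form of low_memory, and the half-open from_id/to_id range in the hypothetical select_rows helper are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring some of the documented arguments."""
    parser = argparse.ArgumentParser(description="Illustrative argument sketch")
    parser.add_argument("--from_id", type=int, default=0)    # default is assumed
    parser.add_argument("--to_id", type=int, default=1000)   # default is assumed
    parser.add_argument(
        "--type_of_dataset",
        choices=["audio_dataset", "audio_video_dataset"],
        default="audio_dataset",
    )
    parser.add_argument("--low_memory", action="store_true")  # assumed to be a flag
    parser.add_argument("--chatter_part", type=int, default=1)  # type is assumed
    parser.add_argument(
        "--face_extraction_model", choices=["hog", "cnn"], default="hog"
    )
    return parser


def select_rows(rows, from_id, to_id):
    """Return rows from from_id up to, but not including, to_id (assumed)."""
    return rows[from_id:to_id]


# Parse the same arguments as the example command in step 5.
args = build_parser().parse_args(
    ["--from_id=0", "--to_id=1000", "--type_of_dataset=audio_dataset"]
)
print(args.type_of_dataset, args.from_id, args.to_id)
```

Restricting type_of_dataset and face_extraction_model with choices makes argparse reject a misspelled value at startup, before any download begins.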

Datasets

  1. The video mixed dataset is available on my Kaggle page in 10 parts (created using the default parameters above). To find it:
Go to my Kaggle profile (https://www.kaggle.com/mayurnewase)
Click on Datasets
Sort by new
The datasets are named mix_speakers_ultimate_*

To do

Check here

looking-to-listen-at-cocktail-party's People

Contributors

mayurnewase, chauhanmayur2850, gungui98
