Giter Club home page Giter Club logo

voice_activity_detection's Introduction

Voice Activity Detection project

GitHub stars GitHub forks Docker Pulls

Keywords: Python, TensorFlow, Deep Learning, Time Series classification

Table of contents

  1. Installation
    1.1  Basic installation
    1.2 Virtual environment installation
    1.3 Docker installation
  2. Introduction
    2.1 Goal
    2.2 Results
  3. Project structure
  4. Dataset
  5. Project usage
    5.1 Dataset automatic labeling
    5.2 Record raw data to .tfrecord format
    5.3 Train a CNN to classify Speech & Noise signals
    5.4 Export trained model & run inference on Test set
  6. Todo
  7. Resources

1. Installation

This project was designed for:

  • Ubuntu 20.04
  • Python 3.7.3
  • TensorFlow 1.15.4
$ cd /path/to/project/
$ git clone https://github.com/filippogiruzzi/voice_activity_detection.git
$ cd voice_activity_detection/

1.1 Basic installation

⚠️ It is recommended to use virtual environments !

$ pyenv install 3.7.3
$ pyenv virtualenv 3.7.3 vad-venv
$ pyenv activate vad-venv
$ pip install -r requirements.txt
$ pip install -e .

1.2 Virtual environment installation

1.3 Docker installation

You can pull the latest image from DockerHub and run Python commands inside the container:

$ docker pull filippogrz/tf-vad:latest
$ docker run --rm --gpus all -v /var/run/docker.sock:/var/run/docker.sock -it --entrypoint /bin/bash -e TF_FORCE_GPU_ALLOW_GROWTH=true filippogrz/tf-vad

If you want to build the docker image and run the container from scratch, run the following commands.

Build the docker image:

$ make build

(This might take a while.)

Run the docker image:

$ make local-nobuild

2. Introduction

2.1 Goal

The purpose of this project is to design and implement a real-time Voice Activity Detection algorithm based on Deep Learning.

The designed solution is based on MFCC feature extraction and a 1D-Resnet model that classifies whether a audio signal is speech or noise.

2.2 Results

Model Train acc. Val acc. Test acc.
1D-Resnet 99 % 98 % 97 %

Raw and post-processed inference results on a test audio signal are shown below.

alt text alt text

3. Project structure

The project voice_activity_detection/ has the following structure:

  • vad/data_processing/: raw data labeling, processing, recording & visualization
  • vad/training/: data, input pipeline, model & training / evaluation / prediction
  • vad/inference/: exporting trained model & inference

4. Dataset

Please download the LibriSpeech ASR corpus dataset from https://openslr.org/12/, and extract all files to : /path/to/LibriSpeech/.

The dataset contains approximately 1000 hours of 16kHz read English speech from audiobooks, and is well suited for Voice Activity Detection.

I automatically annotated the test-clean set of the dataset with a pretrained VAD model.

Please feel free to use the labels/ folder and the pre-trained VAD model (only for inference) from this link .

5. Project usage

$ cd /path/to/project/voice_activity_detection/vad/

5.1 Dataset automatic labeling

Skip this subsection if you already have the labels/ folder, that contains annotations from a different pre-trained model.

$ python data_processing/librispeech_label_data.py --data-dir /path/to/LibriSpeech/test-clean/ --exported-model /path/to/pretrained/model/

This will record the annotations into /path/to/LibriSpeech/labels/ as .json files.

5.2 Record raw data to .tfrecord format

$ python data_processing/data_to_tfrecords.py --data-dir /path/to/LibriSpeech/

This will record the splitted data to .tfrecord format in /path/to/LibriSpeech/tfrecords/

5.3 Train a CNN to classify Speech & Noise signals

$ python training/train.py --data-dir /path/to/LibriSpeech/tfrecords/

5.4 Export trained model & run inference on Test set

$ python inference/export_model.py --model-dir /path/to/trained/model/dir/
$ python inference/inference.py --data-dir /path/to/LibriSpeech/ --exported-model /path/to/exported/model/ --smoothing

The trained model will be recorded in /path/to/LibriSpeech/tfrecords/models/resnet1d/. The exported model will be recorded inside this directory.

6. Todo

  • Compare Deep Learning model to a simple baseline
  • Train on full dataset
  • Improve data balancing
  • Add time series data augmentation
  • Study ROC curve & classification threshold
  • Add online inference
  • Evaluate quantitatively post-processing methods on the Test set
  • Add model description & training graphs
  • Add Google Colab demo

7. Resources

  • Voice Activity Detection for Voice User Interface, Medium
  • Deep learning for time series classifcation: a review, Fawaz et al., 2018, Arxiv
  • Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline, Wang et al., 2016, Arxiv

voice_activity_detection's People

Contributors

filippogiruzzi avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.