acoustic-simulator's Introduction

This package provides scripts used to degrade audio files and download/install the required data and code. 

===========
 CONTENTS
===========

The contents of this package are

  - README : This file

  - README-noise-db.txt : Instructions to download and install noise database

  - README-impulse-responses.txt : Instructions to download and install impulse response database

  - README-codecs.txt : Instructions to download and install audio codecs

  - README-file-lists.txt : Instructions on how to generate dev, train and test data sets for the evaluation of speaker recognition systems on degraded speech. This data set requires access to the NIST Speaker Recognition Evaluation (SRE) 2010 data set.

  - download-noise-db.py : Script for downloading the noise database (see README-noise-db.txt)

  - noise-db.txt : List of noise file IDs, tags and licenses for the noise database

  - freesound.py : Python API to the freesound.org service (online audio repository)

  - prepare-impulse-responses.py : Prepares and normalizes the packaged and downloaded impulse responses

  - impulse-responses-original : Directory containing distributable impulse responses

  - degrade-audio-safe-random.py : Degrades an audio file

  - degrade-audio-list-safe-random.py : Degrades a list of audio files under pre-specified degradation conditions (landline, cellular, satellite, interview, playback) along with noisy variants

  - split-dev-train-test.py : Script to split the generated noise file list into dev, train and test data sets

  - train.list : List of ID, file name and gender for the training data set (taken from the NIST SRE 2010 data) 
  
  - test.list : List of ID, file name and gender for the test data set (taken from the NIST SRE 2010 data) 

  - random : List of integer random numbers used to generate random numbers in a reproducible way across machines
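
To illustrate how a fixed list of integers can make random choices reproducible across machines, here is a minimal Python sketch. It is not the package's own code: the function and class names are hypothetical, and the layout of the random file (one integer per line) is an assumption.

```python
from itertools import cycle


def load_random_integers(lines):
    """Parse one integer per line, skipping blank lines (assumed file layout)."""
    return [int(line) for line in lines if line.strip()]


class ReproducibleChooser:
    """Picks list elements using a fixed, pre-generated integer sequence."""

    def __init__(self, integers):
        if not integers:
            raise ValueError("need at least one integer")
        # Wrap around to the start when the sequence is exhausted.
        self._stream = cycle(integers)

    def choose(self, items):
        # Modulo by len(items) so that every element, including the last,
        # can be selected.
        return items[next(self._stream) % len(items)]


if __name__ == "__main__":
    # In practice the lines would come from a file, e.g. open("random").
    numbers = load_random_integers(["7", "12", "3", "", "20"])
    chooser = ReproducibleChooser(numbers)
    categories = ["babble", "music", "nature"]
    print([chooser.choose(categories) for _ in range(4)])
    # prints ['music', 'babble', 'babble', 'nature']
```

Because the choices depend only on the stored integers, and not on any platform-specific random number generator, the same input list yields the same selections everywhere.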

=============
 DESCRIPTION
=============

This acoustic simulator allows you to degrade clean audio recordings using a variety of algorithms and data. The simulator offers three main types of degradations:

  - Additive noise from a large collection (around 60 hours) of open-source recordings of real noise. Each recording has been manually assigned to one of the following categories: announcement, babble, crowd, impulsive, music, nature, outdoors, private, public, signaling, sport, transportation. The database must be downloaded from the www.freesound.org website using an API. Check the file README-noise-db.txt for details on how to download the data.

  - Open-source impulse responses sampled from real audio devices (loudspeakers, cabinets and smartphones) on one side and rooms on the other side. 74 and 54 impulse responses are provided for devices and rooms respectively. The impulse responses are stored in .wav PCM format and have been manually categorized into directories under impulse-responses-original. For the simulator to use these impulse responses, they must be normalized and downsampled to 8 kHz and 16 kHz. Check README-impulse-responses.txt to generate these files as well as to download missing files that could not be distributed as part of this package.

  - Speech and audio codecs, a total of 14 codecs for cellular and satellite telephony, voice over IP, and audio recorders. Because of licensing issues, the code for these codecs is not provided in this package, but download URLs together with code changes and compile instructions are provided in README-codecs.txt. The codec sources should be stored in the src directory. The supported codecs are ITU G.711, ITU G.726, ITU G.722, ITU G.728, ITU G.729a, ETSI AMR-NB, ETSI AMR-WB, ETSI GSM-FR, CVSD, Codec2, Skype SILK, Skype SILK-WB, Fraunhofer MP3, Fraunhofer AAC. Telephony band-pass filters G.712, P341, IRS and MIRS can be used in combination with narrow-band codecs as well.

  - Noise reduction based on Wiener filtering (Qualcomm-ICSI-OGI) and amplitude normalization are also supported.
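
The following Python sketch shows, in a simplified form, how some of these degradations can be combined on raw sample lists: convolution with an impulse response, additive noise, and amplitude normalization. It is an illustration only, not the simulator's implementation. The function names, the target signal-to-noise ratio parameter (snr_db), the peak level and the order of the steps are all assumptions, and codecs are left out.

```python
import math
import random


def convolve(signal, impulse_response):
    """Direct-form linear convolution; output has len(signal) + len(ir) - 1 samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(x * x for x in samples) / len(samples)


def add_noise(speech, noise, snr_db):
    """Scale the noise so the speech-to-noise power ratio is snr_db, then add it."""
    # Loop the noise if it is shorter than the speech, cut it if it is longer.
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def peak_normalize(samples, peak=0.9):
    """Rescale so that the largest absolute sample equals `peak` (assumed level)."""
    largest = max(abs(x) for x in samples)
    if largest == 0:
        return list(samples)
    return [x * peak / largest for x in samples]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy inputs: a 440 Hz tone at 8 kHz, uniform noise, a 3-tap impulse response.
    speech = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
    noise = [rng.uniform(-1, 1) for _ in range(300)]
    impulse_response = [1.0, 0.5, 0.25]
    reverberated = convolve(speech, impulse_response)
    degraded = peak_normalize(add_noise(reverberated, noise, snr_db=10))
    print(len(degraded))
```

The direct-form convolution is quadratic in cost and is only practical for short impulse responses; an FFT-based convolution would be the usual choice for real room responses.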

This package also includes a data set based on the NIST SRE 2010 data (not included) for the evaluation of speaker verification systems. Please read README-file-lists.txt for more information about how to corrupt these audio files so that results are comparable across research labs.

=============
 BUG FIXES
=============

v0.2:

  - Fixed random number generator to choose from all elements in a list

  - Removed medium sized room impulse responses from interview condition. Only small room IR are now used.


=============
 CONTACT
=============

  Please report any issues to [email protected] .

  Contributors: mferras
