Giter Club home page Giter Club logo

urban-sound-classification's Introduction

Urban-Sound-Classification

References

Salamon, J., & Bello, J. P. (2017). Deep convolutional neural networks and data augmentation for environmental sound classification. IEEE Signal processing letters, 24(3), 279-283. https://doi.org/10.1109/LSP.2017.2657381

Dataset

The UrbanSound8k dataset contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, enginge_idling, gun_shot, jackhammer, siren, and street_music. The classes are drawn from the urban sound taxonomy.All excerpts are taken from field recordings uploaded to www.freesound.org.
8732 audio files of urban sounds (see description above) in WAV format. The sampling rate, bit depth, and number of channels are the same as those of the original file uploaded to Freesound (and hence may vary from file to file).
The UrbanSound8k dataset used for model training, can be downloaded from the following link: https://urbansounddataset.weebly.com/

Directory Structure

Urban_data_preprocess.ipynb: Pre-processing data and also augmenting data

Urban_nn_model.ipynb: Running 10 fold cross val on original data using simple NN

Urban_cnn_model.ipynb: Running 10 fold cross val on original and augmented data using CNN

Urban_data_generator.ipynb: Contains code for a data-generator that can be used for training with augmented data using CNN

Results

10 Fold Cross Val Accuracy for NN using original data: 57.43%
10 Fold Cross Val Accuracy for CNN using original data: 62.61%
10 Fold Cross Val Accuracy for CNN using augmented data: 63.90%

Future Work

Extend data more by using different parameters for augmentation
Apply Hyperparameter optimization and test different architectures

Features Extracted

Librosa was used for data preprocessing and feature extraction.

MEL Features

MFCC

In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.
Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum").

MFCC of a dog bark

image

Melspectrogram

A mel-scaled spectrogram.

Melspectrogram of a dog bark

image

Chroma Features

In music, the term chroma feature or chromagram closely relates to the twelve different pitch classes. Chroma-based features, which are also referred to as "pitch class profiles", are a powerful tool for analyzing music whose pitches can be meaningfully categorized (often into twelve categories) and whose tuning approximates to the equal-tempered scale. One main property of chroma features is that they capture harmonic and melodic characteristics of music, while being robust to changes in timbre and instrumentation.

Chroma_stft

A chromagram from a waveform or power spectrogram.

Chromagram of a dog bark

image

Chroma_cqt

Constant-Q chromagram.

Constant-Q chromagram of a dog bark

image

Chroma_cens

The chroma variant “Chroma Energy Normalized” (CENS).

Chroma cens of a dog bark

image

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.