What is Audify?
Audify is an audio classification project built around an Artificial Neural Network model that accurately categorizes audio samples based on their content.
Why is Audify Unique? How does it help the real world?
- Sound monitoring -> By accurately classifying urban sounds such as car horns, sirens, and jackhammers, the model can be used in real-time sound-monitoring applications. It can help city planners, environmental agencies, and policymakers understand noise patterns, identify areas with excessive noise levels, and implement measures to mitigate noise pollution.
- Public safety and security -> The ability to classify audio signals in real time can contribute to public safety and security. For example, the model can be integrated into surveillance systems to automatically detect and recognize critical sounds such as gunshots or alarms.
The dataset consists of 8732 labelled audio files in WAV format, drawn from 10 low-level classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music.
- Numpy
- Pandas
- Scikit-learn
- Keras
- TensorFlow
- Librosa
- Flask
- Pickle
- Audio preprocessing
- Audio classification
- Audio feature extraction
- Deep Learning - model building
In this project, the feature considered is the MFCC (Mel Frequency Cepstral Coefficients). MFCCs are a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear Mel frequency scale. The Mel scale is a perceptual scale of frequency based on the human ear's response to sound. The power spectrum of the sound is divided into a number of frequency bands that are equally spaced on the Mel scale, and the energy in each band is summed and logarithmically compressed. This log-compressed spectrum is then transformed with a Discrete Cosine Transform (DCT) to produce the MFCCs. MFCCs are among the most commonly used features in voice signal processing, with applications such as speaker recognition, speech recognition, and gender identification.
The following code is used to extract the MFCC features from each file:
The extracted MFCC values are assembled into a Pandas DataFrame.
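A sketch of building that DataFrame. Here `metadata_rows` and `extract_mfcc` are hypothetical stand-ins: in the project they would come from the UrbanSound8K metadata and the MFCC extraction step, respectively.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the dataset metadata and the extractor
metadata_rows = [("fold1/101415-3-0-2.wav", "dog_bark"),
                 ("fold1/102106-3-0-0.wav", "dog_bark")]

def extract_mfcc(file_path):
    # Placeholder for the real librosa-based extractor (40-dim vector)
    return np.zeros(40)

# One row per clip: the 40-dim feature vector and its class label
features = [[extract_mfcc(path), label] for path, label in metadata_rows]
features_df = pd.DataFrame(features, columns=["feature", "class"])
print(features_df.shape)  # (2, 2)
```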
The created dataset is split 70-30 for training and testing of the model.
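The split can be sketched with Scikit-learn's `train_test_split`. The feature matrix and labels below are random stand-ins (the real ones come from the DataFrame above), and the labels are one-hot encoded for the softmax output layer:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-ins: one 40-dim MFCC vector per clip, integer class ids 0..9
X = np.random.rand(8732, 40)
y = np.random.randint(0, 10, size=8732)
y_onehot = np.eye(10)[y]  # one-hot encode the 10 classes

# 70-30 split for training and testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y_onehot, test_size=0.3, random_state=42)

print(X_train.shape, X_test.shape)  # (6112, 40) (2620, 40)
```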
The shapes of the training and testing datasets:
In the project, we built a fully connected network with an input layer, three hidden layers, and an output layer using the Keras Sequential API. The first hidden layer consists of 100 neurons, takes an input of 40 features, uses the ReLU activation function, and is followed by a dropout of 50%. The second hidden layer consists of 200 neurons, also with a dropout of 50% and a ReLU activation, extracting more complex features than the first. The third hidden layer is a dense layer with 100 neurons and a ReLU activation, again extracting more complex features. The last layer is a dense layer with one neuron per output class (10) and a softmax activation function.
We trained the model for 100 epochs with a batch size of 50.
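The architecture above can be sketched with the Keras Sequential API. The loss and optimizer are assumptions (categorical cross-entropy with Adam is the usual choice for one-hot multi-class targets), and the `fit` call shows the documented epoch and batch settings against hypothetical `X_train`/`y_train` arrays:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input

model = Sequential([
    Input(shape=(40,)),                 # 40 MFCC features per clip
    Dense(100, activation="relu"),
    Dropout(0.5),
    Dense(200, activation="relu"),
    Dropout(0.5),
    Dense(100, activation="relu"),
    Dense(10, activation="softmax"),    # one unit per class
])

# Assumed compile settings (not stated in the text)
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# Training as described: 100 epochs, batch size 50
# model.fit(X_train, y_train, epochs=100, batch_size=50,
#           validation_data=(X_test, y_test))
```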
From training and testing the model, we obtained the accuracies shown below.
The dataset and baseline results come from the following paper: Salamon, Justin, Christopher Jacoby, and Juan Pablo Bello. "A Dataset and Taxonomy for Urban Sound Research." Proceedings of the 22nd ACM International Conference on Multimedia, 2014.
The paper reports that SVM and Random Forest models achieve a high accuracy of approximately 73%. With the neural network architecture used in this project, we obtained improved accuracies of 83.81% and 78.28% for training and testing, respectively.
We used Python's Flask web framework for the front end and the pickle module for loading the trained model.
When a sample audio file is uploaded, the model classifies it and the result page highlights the predicted class.
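A minimal sketch of how such a Flask endpoint might look. The route name, the upload field name, and the feature-extraction and prediction placeholders are assumptions; the pickled-model loading is shown as a comment. The class names are the 10 UrbanSound8K labels:

```python
import pickle
import numpy as np
from flask import Flask, request

app = Flask(__name__)

# Hypothetical: load the pickled model once at startup
# model = pickle.load(open("model.pkl", "rb"))

CLASSES = ["air_conditioner", "car_horn", "children_playing", "dog_bark",
           "drilling", "engine_idling", "gun_shot", "jackhammer",
           "siren", "street_music"]

@app.route("/predict", methods=["POST"])
def predict():
    # Save the uploaded clip, extract its 40-dim MFCC vector, classify it
    uploaded = request.files["audio"]
    uploaded.save("upload.wav")
    features = np.zeros((1, 40))   # placeholder for extract_mfcc("upload.wav")
    probs = np.random.rand(10)     # placeholder for model.predict(features)[0]
    return {"predicted_class": CLASSES[int(np.argmax(probs))]}
```

The result template would then highlight `predicted_class` on the result page.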