ml_audify's Introduction

Audify - The Audio Classifier

What is Audify?

Audify is an audio classification project built on an Artificial Neural Network model that categorizes audio samples based on their content.

Why is Audify Unique? How does it help the real world?

  • Sound monitoring: By accurately classifying urban sounds such as car horns, sirens, and jackhammers, the audio classification model can be used in real-time sound monitoring applications. It can assist city planners, environmental agencies, and policymakers in understanding noise patterns, identifying areas with excessive noise levels, and implementing measures to mitigate noise pollution.
  • Public safety and security: The ability to classify audio signals in real time can contribute to public safety and security. For example, the audio classification model can be integrated into surveillance systems to automatically detect and recognize critical sounds like gunshots or alarms.

Dataset Link

Link to dataset

The dataset consists of 8732 audio files in WAV format, divided into 10 low-level classes. The number of files in each class is as follows:

[Image: number of files in each class]

Tech Stack

Programming Language: Python

Python Libraries used

  • NumPy
  • Pandas
  • Scikit-learn
  • Keras
  • TensorFlow
  • Librosa
  • Flask
  • Pickle

The project's main topics include:

  • Audio preprocessing
  • Audio classification
  • Audio feature extraction
  • Deep Learning - model building

Project Description

Feature Extraction

The feature used in this project is MFCC (Mel Frequency Cepstral Coefficients). MFCCs are a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear Mel frequency scale. The Mel frequency scale is a perceptual scale of frequency based on the human ear's response to sound. The power spectrum of the sound is divided into a number of frequency bands that are equally spaced on the Mel scale, and the energy in each band is summed and logarithmically compressed. This log-compressed spectrum is then transformed using a Discrete Cosine Transform (DCT) to produce the MFCCs. MFCCs are a commonly used feature in a variety of applications, especially in voice signal processing such as speaker recognition, voice recognition, and gender identification.
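The steps described above can be illustrated with a small, dependency-free sketch for a single audio frame. It is not the project's code (which relies on Librosa): the function names, the 2595/700 Mel formula variant, the 26 filters, and the 13 coefficients are illustrative assumptions, and the naive DFT is only practical for short frames.

```python
import cmath
import math

def hz_to_mel(hz):
    # One common Mel formula (assumed here; other variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of one frame (bins 0 .. N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    mel_points = [mel_max * i / (n_filters + 1) for i in range(n_filters + 2)]
    hz_points = [mel_to_hz(m) for m in mel_points]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    filters = []
    for i in range(1, n_filters + 1):
        left, centre, right = hz_points[i - 1], hz_points[i], hz_points[i + 1]
        row = []
        for f in bin_hz:
            if left < f <= centre:
                row.append((f - left) / (centre - left))
            elif centre < f < right:
                row.append((right - f) / (right - centre))
            else:
                row.append(0.0)
        filters.append(row)
    return filters

def dct_ii(values):
    """Unnormalized type-II Discrete Cosine Transform."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for k in range(n)]

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    power = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Sum the energy in each Mel band, then compress logarithmically.
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]
    # The DCT of the log band energies gives the cepstral coefficients.
    return dct_ii(log_energies)[:n_coeffs]

# Example: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
coeffs = mfcc_frame(frame, 8000)
print(len(coeffs))  # 13
```

A production pipeline would additionally window overlapping frames and use an FFT; libraries such as Librosa handle those details.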

The following code extracts the MFCC values from each file:

[Screenshot: MFCC extraction code]

The extracted MFCC values are then collected into a DataFrame:

[Screenshot: DataFrame of extracted features]
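Since the original code survives only as screenshots, here is a hedged sketch of what such an extraction step could look like with Librosa and Pandas. The function names, the column names, and the choice to average each coefficient over time (to get one fixed-length vector of 40 values per file) are assumptions, not the project's confirmed method. The library imports sit inside the functions so that the averaging helper can be tried without installing the audio libraries.

```python
def mean_over_time(mfcc_matrix):
    """Collapse an (n_mfcc x n_frames) matrix into one value per coefficient."""
    return [sum(row) / len(row) for row in mfcc_matrix]

def extract_features(path, n_mfcc=40):
    """Load one audio file and return a fixed-length MFCC vector (assumed design)."""
    import librosa
    # librosa.load resamples to 22050 Hz mono by default.
    signal, sample_rate = librosa.load(path)
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    return mean_over_time(mfcc.tolist())

def build_feature_table(labelled_paths):
    """labelled_paths: iterable of (wav_path, class_label) pairs (hypothetical input)."""
    import pandas as pd
    rows = [(extract_features(path), label) for path, label in labelled_paths]
    return pd.DataFrame(rows, columns=["feature", "class"])

# The averaging step on a toy 2 x 2 matrix:
print(mean_over_time([[1.0, 3.0], [2.0, 4.0]]))  # [2.0, 3.0]
```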

Data Splitting

The dataset is split 70-30 into training and testing sets:

[Screenshot: data splitting code]

The shapes of the training and testing datasets:

[Screenshot: shapes of the training and testing datasets]
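As a rough illustration of a 70-30 split, the sketch below shuffles sample indices and holds out 30% for testing. The function name, the seed, and rounding the test size up are assumptions; scikit-learn's `train_test_split` provides equivalent functionality, and the original screenshots do not survive to confirm the exact call or the resulting shapes.

```python
import math
import random

def train_test_split_indices(n_samples, test_fraction=0.3, seed=0):
    """Return (train_indices, test_indices) after a seeded shuffle."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(8732)
print(len(train_idx), len(test_idx))  # 6112 2620
```

These sizes are plain arithmetic on the 8732 files; whether the project's split produced exactly the same counts is not stated in the text.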

Deep Neural Network Model

In this project, we built a fully connected network with an input layer, 2 hidden layers, and an output layer using the Keras Sequential API. The first layer consists of 100 neurons, takes an input of 40 features, and uses the ReLU activation function with a dropout of 50%. The second layer consists of 200 neurons with a dropout of 50% and uses the ReLU activation function to extract more complex features than the first layer. The third layer is a dense layer with 100 neurons and a ReLU activation function, again extracting more complex features. The last layer is another dense layer with one neuron per output class (10) and a softmax activation function.

[Screenshot: model architecture]
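The layer stack described above might be written roughly as follows with the Keras Sequential API. This is a sketch under assumptions: the loss function and optimizer are not named in the description, so `categorical_crossentropy` and `adam` are placeholders, and `build_model` and `dense_parameter_count` are hypothetical names. TensorFlow is imported inside the function so the parameter count can be checked without it.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def build_model(n_features=40, n_classes=10):
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(100, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(200, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(100, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    # Loss and optimizer are assumptions; the description does not name them.
    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
    return model

# Trainable parameters implied by the layer sizes 40 -> 100 -> 200 -> 100 -> 10.
print(dense_parameter_count([40, 100, 200, 100, 10]))  # 45410
```

Dropout layers add no trainable parameters, so the count depends only on the dense layers.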

Training the Model

We trained the model for 100 epochs with a batch size of 50.

[Screenshot: model training]
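To give a feel for what these settings mean, the sketch below computes the number of batches per epoch and the total number of weight updates, and shows how the settings would be passed to Keras `fit`. The training-set size of 6112 is an assumption (70% of 8732 files), and `train` is a hypothetical wrapper, not the project's code.

```python
import math

def updates_per_run(n_train, batch_size=50, epochs=100):
    """Return (batches per epoch, total weight updates)."""
    batches_per_epoch = math.ceil(n_train / batch_size)
    return batches_per_epoch, batches_per_epoch * epochs

def train(model, x_train, y_train, x_test, y_test):
    """Hypothetical wrapper around Keras `fit` with the stated settings."""
    return model.fit(x_train, y_train,
                     batch_size=50,
                     epochs=100,
                     validation_data=(x_test, y_test))

print(updates_per_run(6112))  # (123, 12300)
```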

Result

Training and testing the model gave the following accuracy:

[Screenshot: training and testing accuracy]

Comparison with previous works

The following paper: Salamon, Justin, Christopher Jacoby, and Juan Pablo Bello. "A dataset and taxonomy for urban sound research." Proceedings of the 22nd ACM international conference on Multimedia. 2014.

Link to paper

The paper reports that SVM and Random Forest models reach an accuracy of approximately 73%. With the neural network architecture used in this project, we obtained an improved accuracy of 83.81% on the training set and 78.28% on the test set.

Front End of the Project

We used the Flask web framework of Python for the front end and the pickle module for loading the model.

[Screenshot: upload page]

After a sample audio file is uploaded, the model classifies it, and the result page highlights the predicted class.

[Screenshot: result page]
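A minimal sketch of how such a Flask front end could be wired up is shown below. The route name `/predict`, the form field `audio`, the `create_app` factory, and the `extract_features` callable are all hypothetical, and the sketch returns JSON rather than rendering a highlighted result page. It assumes the pickled object exposes a Keras-style `predict` method that returns class probabilities.

```python
def predicted_label(probabilities, class_names):
    """Return the class name with the highest predicted probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best]

def create_app(model_path, class_names, extract_features):
    """Build a Flask app around a pickled model (assumed layout)."""
    import pickle
    import tempfile

    import numpy as np
    from flask import Flask, jsonify, request

    with open(model_path, "rb") as handle:
        model = pickle.load(handle)

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        upload = request.files["audio"]
        # Write the upload to a temporary file so it can be read by path.
        with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
            upload.save(tmp)
            tmp.flush()
            features = extract_features(tmp.name)
        probabilities = model.predict(np.array([features]))[0]
        label = predicted_label(list(probabilities), class_names)
        return jsonify({"predicted_class": label})

    return app

# The label lookup on its own, with made-up probabilities:
print(predicted_label([0.1, 0.7, 0.2], ["dog_bark", "siren", "car_horn"]))  # siren
```

Only unpickle model files from a trusted source, since `pickle.load` can execute arbitrary code.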

ml_audify's People

Contributors

hithesh-mr, shraddhavp
