WildWav - Bird Sound Identifier

Click here to go to the website

Ever heard distinct bird sounds in the wild or around you and wondered which bird it is? WildWav is developed precisely to identify our flappy friend!

WildWav is a web application that analyzes the bird's sound using audio recording captured by the user (or browse a WAV file) to predict the type of bird.

Summary: The Bird Sound Identifier is a personal project of mine with my friend Vaish to explore the possibilities of machine learning. Also, to bring an idea to life and make sure it is usable by everyone by deploying the model on the Heroku server. For this project, we used Flask API to create the app and Python for modeling. Vaish did all the web designing using HTML. The app uses a recorder.js javascript file that we found here.

Functionality: The web app has an interface with two options:

Either record to capture the bird's sound using the device's microphone OR
Browse file option to choose an audio file with a WAV format.

If you choose to RECORD and capture the bird's sound:

Place your device where the bird's sound is most audible.
Click the 'RECORD' button. The analyzer box should display the audio waveform, indicating that the recording has initiated.
Wait for a few seconds (maximum 10-12 seconds) and click the 'STOP' button.
Click the 'PREDICT' button and wait until the application processes the recording.
The type of bird will be displayed along with an image.
Click on the bird image for more information about the bird.
Click 'REFRESH' to start over.

If you choose to BROWSE and upload the bird-sound file (.wav format):

Click the 'BROWSE FILE' button.
Select the bird-sound file (.wav) and click open.
Click the 'PREDICT' button and wait till the file is uploaded.
The type of bird will be displayed along with an image.
Click on the bird image for more information about the bird.
Click 'REFRESH' to start over.

Design: In terms of the application, the app can work on both computer and android devices. The Chrome browser has the best visual effects. The display automatically resizes to fit the device screen best. The app currently does not work on iOS devices due to security reasons.

Languages and Technologies used:

Python
Flask API
HTML
JavaScript
Heroku

You can check out the website here, or you can keep reading to understand more about how we built this app from getting the data to modeling and deployment process.

Data collection

Data collection for this type of problem can be difficult. Fortunately, there are several websites to collect crowdsourced audio files.

YouTube has several videos of bird songs of varying length in time. We downloaded the video and converted them to 5-6 seconds length audio files in WAV format. Totally 531 samples of four birds (Cardinal, Mourning dove, Pigeon, and Blue Jay) were prepared with 133 audio samples for each bird.

Bird	Number of audio samples	Method of collection
Cardinal	133	YouTube video to WAV conversion
Mourning Dove	133	YouTube & manually created recordings
Pigeon	133	YouTube video to WAV conversion
Blue Jay	132	YouTube video to WAV conversion

Data preprocessing

Audio processing is one of the most challenging tasks there exists. Audio has so many features. The challenge is to understand which feature to decide as the best feature for the model.

There are many types of features from an audio. Some are-

For this model, we use MFCC feature. Librosa library from Python can extract MFCCs easily. It can obtained with only one line of code as shown below.

mfcc_=[]
# Get the path for file
path = r"tmp/audio.wav"
# Load the file
X, sample_rate = librosa.load(path, sr = 44100, res_type='kaiser_best')
# Insert MFCC features in mfcc_
mfcc_.append(np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=13).T,axis=0))

A little bit about FFT, STFT, and MFCC

Fast Fourier Transform (FFT):

Fourier transform basically transforms a waveform which contains time domain values to frequency domain.

A waveform is a time domain signal with time at its x-axis and amplitude at its y-axis.

Fourier transform removes the time domain and has frequency domains and amiplitude of each frequency bin is plotted below. FFT is a snapshot of the entire audio in one single view.

There is a problem with FFT. The FFT gives frequency in x-axis and amplitude in y-axis but loses the time feature. If the audio changes with respect to time, we will not be able to see it FFT. To overcome this, there is Short Time Fourier Transform (STFT)

Short Time Fourier Transform (STFT):

STFT essentially does FFT for smaller period of time window instead of the entire length of the song. STFT is a collection of these FFTs for each time window selected. The STFT shows the variation of frequency over time. In the below example the signal has two frequency throughout the entire length of the audio. This corresponds to the FFT plot we saw above.

Mel Frequency Cepstral Coefficients (MFCC):

MFCCs are essentially expressing how humans hear in coefficients with number of coefficients ranging from 1-40. The algorithm has high resolution on low frequency and reduces resolution as the frequency increases. This means that higher the frequency, the algorithm uses mel scale and groups the bunch of frequencies together and gives coefficients to each indicating the value of weight for that frequency bin.

Model buidling

For modeling we initially used SVM. But the model was not predicting with good accuracy.

We then moved to neural netwrork and built a MLP Classifier. The neural network model used is extremely simple with just one hidden layer. Below is the architecture of the neural network.

Deployment

Deployment was done on Heroku with the help of Flask API. The recorder interface was obtained using this repo

Prediction

The model is trained to detect three birds (Cardinal, Mourning Dove, Pigeon, and Blue Jay). More birds will be added soon.

nicklu666 / wildwav Goto Github PK

wildwav's Introduction

WildWav - Bird Sound Identifier

Data collection

Data preprocessing

A little bit about FFT, STFT, and MFCC

Fast Fourier Transform (FFT):

Short Time Fourier Transform (STFT):

Mel Frequency Cepstral Coefficients (MFCC):

Model buidling

Deployment

Prediction

wildwav's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent