bird-species-classification's People

Contributors

aljoh, johnmartinsson

bird-species-classification's Issues

Write tests

A test suite is needed to verify that everything works as expected.

Bird Package

I installed the bird module required by this program, but it does not seem to work: loader, preprocessing, and several other packages in bird are missing or cannot be imported. It seems I installed the wrong bird package, one meant for algorithmic audio denoising, but I cannot find the required one. How can I install the right bird package?

Baseline Network Model

Implement a baseline convolutional neural network model.

"Figure 5 shows a visual representation of our neural network architecture. The network contains 5 convolutional layer, each followed by a max-pooling layer. We insert one dense layer before the final soft-max layer. The dense layer contains 1024 and the soft-max layer 1000 units, generating a probability for each class. We use batch normalization before every convolutional and before the dense layer. The convolutional layers use a rectify activation function. Drop-out is used on the input layer (probability 0.2), on the dense layer (probability 0.4) and on the soft-max layer (probability 0.4). As a cost function we use the single label categorical cross entropy function (in the log domain)."

Architecture

  • Dropout 20%
  • BatchNormalization
  • Convolution with 64 5x5 Kernels Stride Size 2x1
  • ReLU Activation
  • MaxPooling with 2x2 Kernels Stride Size 2x2

Four times:

  • BatchNormalization
  • Convolution Num. Filters = 64, 128, 256, 256
  • Convolution Kernel Sizes = 5x5, 5x5, 5x5, 3x3
  • Convolution Stride Size = 1x1
  • ReLU Activation
  • MaxPooling with 2x2 Kernels and Stride Size 2x2

Fully Connected

  • Dropout(40%)
  • Dense Layer with 1024 units
  • Dropout(40%)
  • SoftMax Layer with 19 units
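
As a sanity check on the layer stack above, the feature-map sizes can be traced by hand. The sketch below is not the repository's model code. It assumes unpadded ("valid") convolutions, reads the 2x1 stride as (rows, columns), and uses a hypothetical 256x512 input, none of which is stated in this issue. Batch normalization, ReLU, and dropout do not change the shape, so they are left out.

```python
def window_out(size, kernel, stride):
    """Output length of an unpadded ("valid") convolution or pooling window."""
    return (size - kernel) // stride + 1


# (filters, kernel, stride) for the five convolutional blocks listed above.
CONV_BLOCKS = [
    (64, (5, 5), (2, 1)),
    (64, (5, 5), (1, 1)),
    (128, (5, 5), (1, 1)),
    (256, (5, 5), (1, 1)),
    (256, (3, 3), (1, 1)),
]


def trace_shapes(input_shape):
    """Return the (rows, cols, channels) shape after each conv + max-pool block."""
    rows, cols = input_shape
    shapes = []
    for filters, (k_rows, k_cols), (s_rows, s_cols) in CONV_BLOCKS:
        # Convolution (padding is an assumption: none).
        rows = window_out(rows, k_rows, s_rows)
        cols = window_out(cols, k_cols, s_cols)
        # Max-pooling with a 2x2 kernel and 2x2 stride.
        rows = window_out(rows, 2, 2)
        cols = window_out(cols, 2, 2)
        shapes.append((rows, cols, filters))
    return shapes
```

Calling `trace_shapes((256, 512))` lists one shape per block; the product of the last entry gives the number of inputs to the 1024-unit dense layer under these assumptions.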

Pitch Shift Data Augmentation

Implement methods that, given time-frequency data, shift the data along the frequency axis.

The augmented data will be used to encourage pitch-shift invariance when training the neural network.

Implementation details:

  1. small shift of around 5%
  2. wrap-around to preserve complete information
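
A minimal sketch of the two details above, assuming the spectrogram is a NumPy array with frequency along axis 0. The function name `pitch_shift` and the `max_fraction` parameter are illustrative, not taken from the repository.

```python
import numpy as np


def pitch_shift(spectrogram, max_fraction=0.05, rng=None):
    """Shift a (frequency, time) array along the frequency axis with wrap-around."""
    rng = np.random.default_rng() if rng is None else rng
    n_bins = spectrogram.shape[0]
    # Largest allowed shift: about 5% of the frequency bins by default.
    max_shift = int(round(max_fraction * n_bins))
    shift = int(rng.integers(-max_shift, max_shift + 1))
    # np.roll moves rows that fall off one edge to the other edge,
    # so no information is discarded.
    return np.roll(spectrogram, shift, axis=0)
```

Drawing the shift at random on every call means the network sees a slightly different vertical offset each time a sample is presented.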

Sprengel et al, 2016

"In a review of different augmentation methods [12] showed that pitch shifts (vertical shifts) also helped reducing the classification error. We found that, while a small shift (about 5%) seemed to help, a larger shift was not beneficial. Again we used a wrap-around method to preserve the complete information."

Thesis Methods

  • Deep Residual Neural Networks
  • Multiple-Width Frequency-Delta Data

Tutorial Model

Implement, train, and evaluate a tutorial network model in Keras that solves the MNIST problem.

Version of the pre-requisites

Can you list the versions of the prerequisites that the code works with?

I understand it will take some time and effort to update the repo to reflect the API changes.

I found 2 issues in trying to run the code out of box.

The first was in signal_processing.py, where librosa.stft requires a float numpy array; I resolved it by adding the line 'wave = wave.astype(float)'.

The second was in conf.ini, which did not specify the optimizer; I believe 'sgd' is used.

Residual Functions

Modify the convolutional neural network such that it learns residual functions, i.e., it will become a deep residual network.

Signal / Noise Separation

Add a method which separates structured sound from noise.

Input: time-frequency data
Output: Intervals with noise, intervals with structure, intervals with neither.

Implementation details:

Signal mask

  • compute the spectrogram of the whole wave file
    • pass signal through STFT, using Hanning window function (size 512, 75% overlap).
    • normalize: divide every element by the maximum value, s.t., all values in [0, 1].
  • select all pixels in the spectrogram that are three times bigger than the row median, and three times bigger than the column median. Set these pixels to 1, all others to 0.
  • apply a binary erosion and dilation filter, 4 by 4 filter produced best results.
  • create indicator vector with as many elements as there are columns in spectrogram
    • set i:th element in indicator to 1 if i:th column contains at least one 1, otherwise set to 0
    • smooth indicator vector by applying two more binary dilation filters (4 by 1).
  • scale indicator vector to the length of the original sound file.
  • use scaled indicator vector as mask to extract signal.

Noise mask
Same as signal, but select pixels larger than 2.5 times the row/column median, and invert the vector at the end.

Everything else is considered to contain no relevant information. The use of dilation filters ensures that the number of generated intervals is kept to a minimum.
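
The steps above can be sketched with SciPy as follows. This is one possible reading of the description, not the repository's implementation: the function names `structure_mask` and `separate` are hypothetical, the input is assumed to be a non-silent float waveform, and the "two more dilation filters" are taken to mean the same 4-element dilation applied twice.

```python
import numpy as np
from scipy import ndimage, signal


def structure_mask(wave, fs, threshold=3.0):
    """Per-sample boolean mask marking where structured sound was detected."""
    # Spectrogram via STFT: Hann window of 512 samples with 75% overlap.
    _, _, stft = signal.stft(wave, fs=fs, window="hann", nperseg=512, noverlap=384)
    spec = np.abs(stft)
    spec = spec / spec.max()  # assumes the recording is not all zeros

    # Keep pixels exceeding `threshold` times both their row and column median.
    row_median = np.median(spec, axis=1, keepdims=True)
    col_median = np.median(spec, axis=0, keepdims=True)
    pixels = (spec > threshold * row_median) & (spec > threshold * col_median)

    # Binary erosion followed by dilation with a 4x4 structuring element.
    square = np.ones((4, 4), dtype=bool)
    pixels = ndimage.binary_erosion(pixels, structure=square)
    pixels = ndimage.binary_dilation(pixels, structure=square)

    # One indicator per spectrogram column, smoothed by two 4x1 dilations.
    indicator = pixels.any(axis=0)
    line = np.ones(4, dtype=bool)
    indicator = ndimage.binary_dilation(indicator, structure=line, iterations=2)

    # Stretch the indicator so it has one entry per audio sample.
    columns = np.arange(len(wave)) * len(indicator) // len(wave)
    return indicator[columns]


def separate(wave, fs):
    """Return (signal part, noise part) of a waveform."""
    signal_part = wave[structure_mask(wave, fs, threshold=3.0)]
    # Noise mask: lower threshold (2.5), then inverted.
    noise_part = wave[~structure_mask(wave, fs, threshold=2.5)]
    return signal_part, noise_part
```

Samples that the 2.5 threshold marks as structured but the 3.0 threshold does not end up in neither part, which matches the "everything else" category above.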

Compute Spectrogram Correctly

It seems that the current use of scipy.signal.spectrogram is not sufficient, and that information is lost. Correct this by a manual implementation of spectrogram computations, or by understanding the scipy.signal.spectrogram method completely so that the desired results can be achieved.

References:

Thesis Previous Work

  • Previous Challenges
    • MLSP 2013
    • NIPS4B 2013
    • BirdCLEF 2014-2016
  • Convolutional Neural Networks for Acoustic Bird Classification

Missing import matplotlib.pyplot as plt

The code for data analysis uses plt for plotting but does not import it explicitly.
Please make sure to include import matplotlib.pyplot as plt at the beginning of the code.

Same Class and Noise Addition

Implement additive data augmentation methods. These will be used to encourage the network to see more combined species samples, and to see more variations of noise/structure.

Methods:

  • load multiple random noise samples
  • load multiple random signal samples of the same class
  • additively combine multiple samples of time-frequency data
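
A rough sketch of the additive combination, working in the time domain as the Sprengel et al. passages quoted in this issue describe. The function names are hypothetical, and "preserve the original maximum amplitude" is interpreted here as rescaling to the larger of the two input peaks, which is an assumption.

```python
import numpy as np


def fit_length(wave, length):
    """Repeat (or truncate) a 1-D wave so that it has exactly `length` samples."""
    return np.resize(wave, length)


def add_same_class(first, second):
    """Add two recordings of the same class and restore the original peak amplitude."""
    length = max(len(first), len(second))
    mixed = fit_length(first, length) + fit_length(second, length)
    peak = max(np.abs(first).max(), np.abs(second).max())
    # Assumes the mix is not identically zero.
    return mixed * (peak / np.abs(mixed).max())


def add_noise(wave, noise_samples, dampening=0.4):
    """Add each noise sample, scaled by `dampening`, on top of the signal."""
    mixed = np.asarray(wave, dtype=float)
    for noise in noise_samples:
        mixed = mixed + dampening * fit_length(noise, len(wave))
    return mixed
```

Passing three randomly chosen noise recordings to `add_noise` with the default `dampening=0.4` corresponds to the setting the quoted paper reports as working best.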

Same Class

"We follow [14] and add sound files that correspond to the same class. Adding is a simple process because each sound file can be represented by a single vector. If one of the sound files is shorter than the other we repeat the shorter one as many times as it is necessary. After adding two sound files, we re-normalize the result to preserve the original maximum amplitude of the sound files. The operation describes the effect of multiple birds (of the same species) singing at the same time. Adding files improves convergence because the neural network sees more important patterns at once, we also found a slight increase in the accuracy of the system (see Table 1)." (Sprengel et al, 2016)

Adding Noise

"One of the most important augmentation steps is to add background noise. In Section 2.1 we described how we split each file into a signal and noise part. For every signal sample we can choose an arbitrary noise sample (since the background noise should be independent of the class label) and add it on top of the original training sample at hand. As for combining same class audio files, this operation should be done in the time domain by adding both sound files and repeating the smaller one as often as necessary. We can even add multiple noise samples. In our test we found that three noise samples added on top of the signal, each with a dampening factor of 0.4 produces the best results. This means that, given enough training time, for a single training sample we eventually add every possible background noise which decreases the generalization error." (Sprengel et al, 2016)

Divide Spectrograms into Chunks

Implement a method which divides the spectrograms into equal chunks.

Implementation details:

  1. Split spectrogram into chunks of equal size (length 512).

Motivation:

  1. Need fixed sized input for the neural network architecture.
    • Allows padding only the last chunk while keeping the step size constant
  2. Each chunk can be used as a unique sample for training (since "empty" parts have been removed)
  3. Network can make multiple predictions per sound file, and average them to generate a final prediction.
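
A minimal sketch of the chunking and of the per-file averaging from point 3, assuming a (frequency, time) NumPy array. Zero-padding the last chunk is an assumption, and `predict_chunk` stands in for whatever model call returns class probabilities.

```python
import numpy as np


def split_into_chunks(spectrogram, chunk_length=512):
    """Split a (frequency, time) array into chunks of `chunk_length` columns."""
    n_bins, n_frames = spectrogram.shape
    n_chunks = max(1, -(-n_frames // chunk_length))  # ceiling division
    # Only the tail of the last chunk is padded (here with zeros).
    padded = np.zeros((n_bins, n_chunks * chunk_length), dtype=spectrogram.dtype)
    padded[:, :n_frames] = spectrogram
    return np.split(padded, n_chunks, axis=1)


def predict_file(chunks, predict_chunk):
    """Average per-chunk class probabilities into one prediction per file."""
    return np.mean([predict_chunk(chunk) for chunk in chunks], axis=0)
```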

ValueError in create_dataset.py

python create_dataset.py --src_dir=<my_src_dir> --dst_dir=<my_dst_dir> --valid_percentage=20
copying sound classes...
splitting train/validation...
Traceback (most recent call last):
  File "create_dataset.py", line 99, in <module>
    main()
  File "create_dataset.py", line 91, in main
    replace=False)
  File "mtrand.pyx", line 1161, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:18155)
ValueError: Cannot take a larger sample than population when 'replace=False'
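
The message itself comes from NumPy: `choice` refuses to draw more items than the population holds when `replace=False`. The traceback does not show why create_dataset.py asks for too many files; one plausible trigger is a class directory with fewer files than the requested validation count. The helper below is a hypothetical guard, not the script's actual code.

```python
import numpy as np


def sample_validation(files, n_requested, rng=None):
    """Draw validation files without ever asking for more than are available."""
    rng = np.random.default_rng() if rng is None else rng
    n = min(n_requested, len(files))  # clamp to the population size
    if n == 0:
        return []
    return [str(name) for name in rng.choice(files, size=n, replace=False)]
```

Clamping silently shrinks the validation set for small classes, so logging a warning when `n_requested > len(files)` may be preferable in practice.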

Baseline Evaluation

Evaluate the baseline classifier with respect to:

  • Area Under Curve (AUC)
  • Mean Average Precision (MAP)
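
Both metrics can be computed with plain NumPy. The sketch below assumes binary labels and real-valued scores, with at least one positive and one negative label per call; the function names are illustrative. MAP is taken here as the mean of per-recording average precision, which is an assumption about how the evaluation is meant to be set up.

```python
import numpy as np


def auc(labels, scores):
    """Area under the ROC curve: chance that a positive outranks a negative."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    # Ties between a positive and a negative count as half a win.
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive label."""
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())


def mean_average_precision(label_rows, score_rows):
    """Average the per-recording average precision over all recordings."""
    pairs = zip(label_rows, score_rows)
    return float(np.mean([average_precision(l, s) for l, s in pairs]))
```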

Time Shift Data Augmentation

Implement a method which randomly shifts the time-frequency input data in the time domain.

The shifted time data will be used as additional samples when training the neural network in order to encourage time-shift invariance.

Implementation details:

  1. split spectrogram in two parts (at random)
  2. place the second part in front of the first
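
The two steps above amount to a circular shift along the time axis. A minimal sketch, assuming a (frequency, time) NumPy array; `time_shift` is an illustrative name.

```python
import numpy as np


def time_shift(spectrogram, rng=None):
    """Cut a (frequency, time) array at a random column and swap the two parts."""
    rng = np.random.default_rng() if rng is None else rng
    n_frames = spectrogram.shape[1]
    split = int(rng.integers(0, n_frames))
    # The second part (from `split` onward) goes in front of the first part.
    return np.concatenate([spectrogram[:, split:], spectrogram[:, :split]], axis=1)
```

Applying the shift each time a sample is drawn, rather than once up front, gives a different split point on every presentation.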

Sprengel et al, 2016

"Every time we present the neural network with a training example, we shift it in time by a random amount. In terms of the spectrogram this means that we cut it into two parts and place the second part in front of the first (wrap around shifts). This creates a sharp corner where the end of the second part meets the beginning of the first part but all the information is preserved. With this augmentation we force the network to deal with irregularities in the spectrogram and also, more importantly, teach the network that bird songs/calls appear at any time, independent of the bird species."
