johnmartinsson / bird-species-classification
Using convolutional neural networks to build and train a bird species classifier on bird song data with corresponding species labels.
License: MIT License
A test suite needs to be implemented to better assure that everything is working as expected.
I installed the bird module required for this program, but it doesn't seem to work. Loader, preprocessing, and several other packages in bird appear to be missing or cannot be imported. The conflict seems to be that I installed the wrong bird package, one meant for algorithmic audio denoising, but I can't find the required bird package. How can I install the right one?
Implement a baseline convolutional neural network model.
"Figure 5 shows a visual representation of our neural network architecture. The network contains 5 convolutional layers, each followed by a max-pooling layer. We insert one dense layer before the final soft-max layer. The dense layer contains 1024 units and the soft-max layer 1000 units, generating a probability for each class. We use batch normalization before every convolutional and before the dense layer. The convolutional layers use a rectify activation function. Drop-out is used on the input layer (probability 0.2), on the dense layer (probability 0.4) and on the soft-max layer (probability 0.4). As a cost function we use the single label categorical cross entropy function (in the log domain)."
Four times:
Fully Connected
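The quoted architecture can be sketched in Keras as follows. This is a sketch under assumptions: the input spectrogram shape and the per-layer filter counts are not stated in the quote and are illustrative only.

```python
# Sketch of the described architecture: 5 conv+max-pool blocks, batch norm
# before every conv and before the dense layer, dropout on input/dense/soft-max.
# Input shape and filter counts are assumptions, not from the paper.
from tensorflow.keras import layers, models

def build_model(input_shape=(256, 512, 1), num_classes=1000):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    model.add(layers.Dropout(0.2))  # drop-out on the input layer
    for filters in [64, 64, 128, 128, 256]:  # 5 conv blocks, filter counts assumed
        model.add(layers.BatchNormalization())  # batch norm before every conv layer
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.BatchNormalization())  # batch norm before the dense layer
    model.add(layers.Dropout(0.4))  # drop-out on the dense layer
    model.add(layers.Dense(1024, activation="relu"))
    model.add(layers.Dropout(0.4))  # drop-out on the soft-max layer
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="sgd", loss="categorical_crossentropy")
    return model
```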
Implement methods that given time-frequency data, will shift the data in the frequency domain.
The augmented data will be used to encourage pitch-shift invariance when training the neural network.
Implementation details:
"In a review of different augmentation methods [12] showed that pitch shifts (vertical shifts) also helped reducing the classification error. We found that, while a small shift (about 5%) seemed to help, a larger shift was not beneficial. Again we used a wrap-around method to preserve the complete information."
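A minimal sketch of such a wrap-around pitch shift, assuming the spectrogram is a (frequency, time) numpy array and using the roughly 5% shift range mentioned above:

```python
import numpy as np

def pitch_shift(spectrogram, max_shift_fraction=0.05, rng=None):
    """Randomly shift a (freq, time) spectrogram along the frequency axis.

    Uses a wrap-around shift (np.roll) so all information is preserved;
    the ~5% maximum shift follows the quoted recommendation.
    """
    rng = rng or np.random.default_rng()
    n_bins = spectrogram.shape[0]
    max_shift = max(1, int(n_bins * max_shift_fraction))
    shift = rng.integers(-max_shift, max_shift + 1)
    return np.roll(spectrogram, shift, axis=0)
```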
Explain how weight initialization works, and why it is important.
Default in Keras: Understanding the difficulty of training deep feedforward neural networks (Glorot 2010)
Alternative explanation: http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization
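For reference, the Glorot/Xavier uniform scheme (the Keras default, `glorot_uniform`) can be written out directly. A minimal numpy sketch:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    """Xavier/Glorot uniform initialization.

    Weights are drawn from U(-limit, limit) with
    limit = sqrt(6 / (fan_in + fan_out)), which keeps the variance of
    activations and gradients roughly constant from layer to layer, so
    signals neither vanish nor explode early in training.
    """
    rng = rng or np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))
```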
Implement, train, and evaluate a tutorial network model in keras which solves the MNIST problem.
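A minimal sketch of such a tutorial MNIST model in Keras; the layer sizes and hyper-parameters below are illustrative assumptions, not a prescribed solution:

```python
# Small Keras CNN for MNIST: one conv block, one dense layer, soft-max over
# the 10 digit classes. Hyper-parameters are illustrative.
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

def build_mnist_model():
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="sgd", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def train_and_evaluate():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train[..., None] / 255.0  # add channel axis, scale to [0, 1]
    x_test = x_test[..., None] / 255.0
    model = build_mnist_model()
    model.fit(x_train, to_categorical(y_train), epochs=1, batch_size=128)
    return model.evaluate(x_test, to_categorical(y_test))
```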
Can you mention the version of the pre-requisites on which the code works?
I can understand it will take some time and effort to update the repo to reflect the changes in the API.
I found 2 issues when trying to run the code out of the box.
First, in signal_processing.py, librosa.stft requires a numpy array in float, which I managed to resolve by adding the line 'wave = wave.astype(float)'.
Second, conf.ini didn't have the optimizer specified; I believe 'sgd' is used.
Modify the convolutional neural network such that it learns residual functions, i.e., it will become a deep residual network.
Add a method which separates structured sound from noise.
Input: time-frequency data
Output: Intervals with noise, intervals with structure, intervals with neither.
Signal mask
Noise mask
Same as the signal mask, but select pixels larger than 2.5 times the row/column median, and invert the vector at the end.
Everything else is considered to contain no relevant information. The use of dilation filters ensures that the number of generated intervals is kept to a minimum.
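The masks described above can be sketched with median clipping and binary dilation. Only the 2.5× noise threshold is stated above; the 3.0× signal threshold and the dilation settings below are assumptions:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def compute_masks(spectrogram, signal_factor=3.0, noise_factor=2.5):
    """Median-clipping masks over a (freq, time) spectrogram.

    signal_factor=3.0 is an assumption; only the 2.5x noise threshold is
    given above. Returns per-frame boolean vectors over the time axis.
    """
    row_median = np.median(spectrogram, axis=1, keepdims=True)
    col_median = np.median(spectrogram, axis=0, keepdims=True)

    # Select pixels well above both the row and the column median.
    signal_px = ((spectrogram > signal_factor * row_median) &
                 (spectrogram > signal_factor * col_median))
    noise_px = ((spectrogram > noise_factor * row_median) &
                (spectrogram > noise_factor * col_median))

    # Dilation smooths the masks so fewer, longer intervals are produced.
    signal_px = binary_dilation(signal_px, iterations=2)
    noise_px = binary_dilation(noise_px, iterations=2)

    # Collapse to per-frame indicator vectors; the noise vector is inverted,
    # marking frames with no structure above the 2.5x threshold.
    signal_mask = signal_px.any(axis=0)
    noise_mask = ~noise_px.any(axis=0)
    return signal_mask, noise_mask
```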
Modify the neural network to include identity mappings.
It seems that the current use of scipy.signal.spectrogram is not sufficient and that information is lost. Correct this either with a manual implementation of the spectrogram computation, or by understanding the scipy.signal.spectrogram method completely so that the desired results can be achieved.
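A minimal manual spectrogram sketch that makes every step (framing, windowing, FFT) explicit, so none of scipy's defaults (detrending, scaling, density normalization) silently alter the result. Window choice and hop size below are assumptions:

```python
import numpy as np

def spectrogram(wave, n_fft=512, hop=128, window=None):
    """Manual magnitude spectrogram of a 1-D signal.

    Frames the signal with hop-sized steps, applies a Hann window, and
    takes the one-sided FFT magnitude; shape is (n_fft // 2 + 1, n_frames).
    """
    window = np.hanning(n_fft) if window is None else window
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T
```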
Implement a deep residual neural network model.
Missing import matplotlib.pyplot as plt:
The code for data analysis uses plt for plotting but does not import it explicitly.
Please make sure to include import matplotlib.pyplot as plt at the beginning of the code.
Implement additive data augmentation methods. These will be used to encourage the network to see more combined species samples, and to see more variations of noise/structure.
Methods:
"We follow [14] and add sound files that correspond to the same class. Adding is a simple process because each sound file can be represented by a single vector. If one of the sound files is shorter than the other we repeat the shorter one as many times as it is necessary. After adding two sound files, we re-normalize the result to preserve the original maximum amplitude of the sound files. The operation describes the effect of multiple birds (of the same species) singing at the same time. Adding files improves convergence because the neural network sees more important patterns at once, we also found a slight increase in the accuracy of the system (see Table 1)." (Sprengel et al, 2016)
"One of the most important augmentation steps is to add background noise. In Section 2.1 we described how we split each file into a signal and noise part. For every signal sample we can choose an arbitrary noise sample (since the background noise should be independent of the class label) and add it on top of the original training sample at hand. As for combining same class audio files, this operation should be done in the time domain by adding both sound files and repeating the smaller one as often as necessary. We can even add multiple noise samples. In our test we found that three noise samples added on top of the signal, each with a dampening factor of 0.4 produces the best results. This means that, given enough training time, for a single training sample we eventually add every possible background noise which decreases the generalization error." (Sprengel et al, 2016)
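The two quoted augmentations can be sketched directly in the time domain. This is a sketch under the assumptions stated in the quotes (repeat the shorter file, re-normalize to the original peak, dampening factor 0.4 for noise):

```python
import numpy as np

def repeat_to_length(wave, length):
    """Tile a 1-D signal until it reaches at least `length` samples, then crop."""
    reps = int(np.ceil(length / len(wave)))
    return np.tile(wave, reps)[:length]

def add_same_class(a, b):
    """Add two same-class recordings, repeating the shorter one as needed,
    then re-normalize to preserve the original maximum amplitude."""
    n = max(len(a), len(b))
    mixed = repeat_to_length(a, n) + repeat_to_length(b, n)
    target_peak = max(np.abs(a).max(), np.abs(b).max())
    return mixed * (target_peak / np.abs(mixed).max())

def add_noise(signal, noise_samples, damping=0.4):
    """Add noise recordings on top of a signal, each scaled by a dampening
    factor (three noises with factor 0.4 worked best per the quote)."""
    out = signal.copy()
    for noise in noise_samples:
        out = out + damping * repeat_to_length(noise, len(signal))
    return out
```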
xml_paths should be replaced by xml_roots in https://github.com/johnmartinsson/bird-species-classification/blob/master/preprocess_birdclef.py#L36.
Maybe you should run your code once to test it?
The naming of some of the arguments for the optparser can be confusing. Should go through and clean this up.
Implement a method which divides the spectrograms into equal chunks.
Implementation details:
Motivation:
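A minimal sketch of such a chunking method, assuming the spectrogram is a (frequency, time) numpy array; dropping the trailing remainder shorter than a chunk is an assumption:

```python
import numpy as np

def split_into_chunks(spectrogram, chunk_width):
    """Split a (freq, time) spectrogram into equal-width chunks along the
    time axis. A trailing remainder narrower than chunk_width is dropped."""
    n_chunks = spectrogram.shape[1] // chunk_width
    return [spectrogram[:, i * chunk_width:(i + 1) * chunk_width]
            for i in range(n_chunks)]
```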
python create_dataset.py --src_dir=<my_src_dir> --dst_dir=<my_dst_dir> --valid_percentage=20
copying sound classes...
splitting train/validation...
Traceback (most recent call last):
  File "create_dataset.py", line 99, in <module>
    main()
  File "create_dataset.py", line 91, in main
    replace=False)
  File "mtrand.pyx", line 1161, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:18155)
ValueError: Cannot take a larger sample than population when 'replace=False'
Evaluate the baseline classifier with respect to:
Evaluate the (improved) classifier with respect to:
Implement the multi-width frequency-delta data augmentation.
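One possible sketch of multi-width frequency deltas: compute difference features along the frequency axis at several spacings and stack them as extra channels. The widths and the simple central-difference scheme are assumptions; the original method may use regression-style deltas instead:

```python
import numpy as np

def frequency_deltas(spectrogram, widths=(3, 5, 9)):
    """Multi-width delta features along the frequency axis of a (freq, time)
    spectrogram, as central differences at several spacings.

    Returns an array of shape (1 + len(widths), freq, time): the original
    spectrogram plus one delta channel per width. Edge values are padded.
    """
    channels = [spectrogram]
    for w in widths:
        half = w // 2
        padded = np.pad(spectrogram, ((half, half), (0, 0)), mode="edge")
        # delta[i] = S[i + half] - S[i - half] over the frequency axis
        delta = padded[2 * half:, :] - padded[:-2 * half or None, :]
        channels.append(delta)
    return np.stack(channels)
```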
Implement a method which randomly shifts the time-frequency input data in the time domain.
The shifted time data will be used as additional samples when training the neural network in order to encourage time-shift invariance.
Implementation details:
"Every time we present the neural network with a training example, we shift it in time by a random amount. In terms of the spectrogram this means that we cut it into two parts and place the second part in front of the first (wrap around shifts). This creates a sharp corner where the end of the second part meets the beginning of the first part but all the information is preserved. With this augmentation we force the network to deal with irregularities in the spectrogram and also, more importantly, teach the network that bird songs/calls appear at any time, independent of the bird species."
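The wrap-around time shift in the quote can be sketched with a single `np.roll`, assuming a (frequency, time) spectrogram array:

```python
import numpy as np

def random_time_shift(spectrogram, rng=None):
    """Wrap-around shift of a (freq, time) spectrogram along the time axis:
    cut it at a random frame and place the second part in front of the
    first, so all information is preserved."""
    rng = rng or np.random.default_rng()
    shift = rng.integers(0, spectrogram.shape[1])
    return np.roll(spectrogram, shift, axis=1)
```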
So
$ python preprocess_birdclef.py --xml_dir=<path-to-xml-dir> \
--wav_dir=<path-to-wav-dir> \
--output_dir=<path-to-output-dir>
still uses a hardcoded path...
Input data should have zero mean and unit variance.
Note: the mean and variance should be computed from the training set ONLY, and then used to normalize the training, validation and test sets.
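A minimal sketch of this standardization, fitting the statistics on the training set and applying them unchanged to every split:

```python
import numpy as np

def fit_standardizer(train_data):
    """Compute per-feature mean and standard deviation on the TRAINING set only."""
    mean = train_data.mean(axis=0)
    std = train_data.std(axis=0) + 1e-8  # avoid division by zero
    return mean, std

def standardize(data, mean, std):
    """Apply the training-set statistics to any split (train/valid/test)."""
    return (data - mean) / std
```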
Possible issues may arise due to updates in "wave" file reading libraries.