ruohoruotsi / lstm-music-genre-classification

Music genre classification with LSTM Recurrent Neural Nets in Keras & PyTorch

License: MIT License

Python 100.00%

lstm music genre classification rnn gtzan-dataset music-genre-classification audio-features-extracted keras pytorch

lstm-music-genre-classification's Introduction

Music Genre Classification with LSTMs

Classify music files based on genre from the GTZAN music corpus
GTZAN corpus is included for ease of use
Use multiple layers of LSTM Recurrent Neural Nets
Implementations in PyTorch, PyTorch-Lightning, Keras

Test trained LSTM model

In ./weights/ you can find the trained model weights and the model architecture.

To test the model on your custom audio file, run

 python3 predict_example.py path/to/custom/file.mp3

or to test the model on our custom files, run

 python3 predict_example.py audio/classical_music.mp3

Audio features extracted

Dependencies

Python3
numpy
librosa → for audio feature extraction
Keras
- pip install keras
PyTorch
- pip install torch torchvision
- brew install libomp

Ideas for improving accuracy:

GTZAN dataset has problems, how do we use it with consideration?
Normalize MFCCs & other input features (Recurrent BatchNorm?)
Decay learning rate
How are we initing the weights?
Better optimization hyperparameters (too little dropout)
Do you have avoidable bias? How's your variance?
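
One of the ideas above, normalizing the input features, could be sketched as follows. This is only an illustration, not the repository's code: it assumes inputs shaped (clips, timesteps, features) like the .npy arrays, the function names are hypothetical, the timestep count of 128 is made up, and the random arrays stand in for real features. The statistics are computed on the training set only and then reused for the other splits.

```python
import numpy as np


def fit_feature_stats(train_x, eps=1e-8):
    """Per-feature mean and std over all clips and timesteps of the training set."""
    mean = train_x.mean(axis=(0, 1), keepdims=True)
    std = train_x.std(axis=(0, 1), keepdims=True)
    return mean, std + eps  # eps guards against division by zero


def normalize(x, mean, std):
    """Standardize every feature column with the training-set statistics."""
    return (x - mean) / std


# Random stand-ins for real feature arrays (33 features per frame, as in the repo).
rng = np.random.default_rng(0)
train_x = rng.normal(loc=5.0, scale=3.0, size=(20, 128, 33))
dev_x = rng.normal(loc=5.0, scale=3.0, size=(5, 128, 33))

mean, std = fit_feature_stats(train_x)
train_norm = normalize(train_x, mean, std)
dev_norm = normalize(dev_x, mean, std)  # dev/test reuse the training statistics
```

Reusing the training statistics for the validation and test splits keeps information from those splits out of the preprocessing step.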

Accuracy

At Epoch 400, training on a TITAN X GPU (October 2017):

Split        Loss             Accuracy
Training     0.5801           0.7810
Validation   0.734523485104   0.766666688025
Testing      0.900845060746   0.683333342274

At Epoch 400, training on a 2018 Macbook Pro CPU (May 2019):

Split        Loss             Accuracy
Training     0.3486           0.8738
Validation   1.028421084086   0.700000017881
Testing      1.209656755129   0.683333347241

lstm-music-genre-classification's Issues

one question in feature extraction

Could you please explain the following lines?
data[i, :, 0:13] = mfcc.T[0:self.timeseries_length, :]
data[i, :, 13:14] = spectral_center.T[0:self.timeseries_length, :]
data[i, :, 14:26] = chroma.T[0:self.timeseries_length, :]
data[i, :, 26:33] = spectral_contrast.T[0:self.timeseries_length, :]
I don't understand what 0:13 and 13:14 mean.
Thanks
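
Read as ordinary Python half-open slices, the ranges in the question select feature columns: 0:13 covers columns 0 to 12, 13:14 covers column 13 alone, and so on up to 33 columns in total. The sketch below reproduces that layout with constant stand-in arrays instead of real librosa output; the timeseries_length and n_frames values are hypothetical, and the per-feature comments are inferred from the slice widths.

```python
import numpy as np

# Column ranges copied from the snippet in the question: (start, stop) per feature.
FEATURE_SLICES = {
    "mfcc": (0, 13),                # 13 columns
    "spectral_center": (13, 14),    # 1 column
    "chroma": (14, 26),             # 12 columns
    "spectral_contrast": (26, 33),  # 7 columns
}

timeseries_length = 8  # hypothetical; the real value is set in the repo's feature code
n_frames = 20          # hypothetical number of analysis frames in one clip

# Stand-ins shaped (n_features, n_frames); each feature is filled with its own constant.
features = {}
for fill, (name, (start, stop)) in enumerate(FEATURE_SLICES.items()):
    features[name] = np.full((stop - start, n_frames), float(fill))

# One clip: transpose to (n_frames, n_features), keep the first frames, write the columns.
data = np.zeros((1, timeseries_length, 33))
for name, (start, stop) in FEATURE_SLICES.items():
    data[0, :, start:stop] = features[name].T[0:timeseries_length, :]
```

After the loop, every frame of the clip holds one 33-value vector in which each feature occupies its own contiguous block of columns.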

Problem with pytorch

In the lstm_genre_classifier_pytorch.py file,
model.hidden = model.init_hidden() does not actually work,
because the model has no attribute 'hidden'.
If you print model.hidden in every batch, you will find it is always [[0,0,...]].
model.hidden has no relation to the actual model. It's just something you define every epoch.
So it's always a stateful LSTM.

Validate accuracy and hyperparams for CPU training

Accuracy figures for training/dev/test set in the README were generated from a run with an older version of TF (1.2) on a GPU.

Running the project fresh on a CPU produced the following figures, which gives me pause. Investigate. Training accuracy is way up and dev/validation accuracy is down significantly.

What are the (extra) implications to training on the CPU vs GPU?

420/420 [==============================] - 3s 6ms/step - loss: 0.3547 - acc: 0.8667

Validating ...
120/120 [==============================] - 0s 4ms/step
Dev loss:   1.0411598483721416
Dev accuracy:   0.6666666865348816

Testing ...
60/60 [==============================] - 0s 1ms/step
Test loss:   1.1342438459396362
Test accuracy:   0.6000000089406967

[ADD] Tensorboard support

This repo was created to facilitate learning how a simple LSTM model works on a small dataset (gtzan) solving a well understood audio/music problem (music-genre).

Add Tensorboard support to help visualize the "learning" process, debug, and optimize the code: https://www.tensorflow.org/guide/summaries_and_tensorboard

Get the list of expected and predicted value?

Is there a way to get a list (.csv) containing the actual and the predicted value for each output? At the moment we only get the loss and accuracy.
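
One possible way to produce such a list is sketched below. It assumes one-hot targets and one row of class probabilities per clip, such as the output of a Keras model.predict call; the GENRES label order and the write_predictions helper are hypothetical and not part of the repository.

```python
import csv
import io

GENRES = ["classical", "jazz", "metal"]  # hypothetical label order


def argmax(row):
    """Index of the largest value in a sequence."""
    return max(range(len(row)), key=lambda i: row[i])


def write_predictions(out, targets, probabilities, labels=GENRES):
    """Write one CSV row per clip: index, expected genre, predicted genre."""
    writer = csv.writer(out)
    writer.writerow(["index", "expected", "predicted"])
    for i, (target, probs) in enumerate(zip(targets, probabilities)):
        writer.writerow([i, labels[argmax(target)], labels[argmax(probs)]])


# Made-up targets and model outputs for three clips.
targets = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
probabilities = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.1, 0.7]]

buffer = io.StringIO()
write_predictions(buffer, targets, probabilities)
csv_text = buffer.getvalue()
```

Passing an open file instead of the StringIO buffer would write the same rows to disk.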

A question about the features.

Thank you for your contribution. I noticed that you extract four audio features. Can you tell me why you chose these four?

Add check to only generate feature data if it is absent or forced

Don't regenerate these files unless there's a force flag or they don't exist.

	data_test_input.npy
	data_test_target.npy
	data_train_input.npy
	data_train_target.npy
	data_validation_input.npy
	data_validation_target.npy
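
A minimal sketch of such a check, assuming the six files live in a single directory. The needs_regeneration helper and its force parameter are hypothetical names, and the demo uses a temporary directory with empty placeholder files.

```python
import os
import tempfile

FEATURE_FILES = [
    "data_test_input.npy",
    "data_test_target.npy",
    "data_train_input.npy",
    "data_train_target.npy",
    "data_validation_input.npy",
    "data_validation_target.npy",
]


def needs_regeneration(directory, force=False, filenames=FEATURE_FILES):
    """True when forced, or when at least one cached feature file is missing."""
    if force:
        return True
    return not all(
        os.path.isfile(os.path.join(directory, name)) for name in filenames
    )


# Demo in a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    before = needs_regeneration(tmp)  # nothing cached yet
    for name in FEATURE_FILES:
        open(os.path.join(tmp, name), "wb").close()
    after = needs_regeneration(tmp)   # all six files now exist
    forced = needs_regeneration(tmp, force=True)
```

Feature extraction would then run only when the helper returns True, and the cached arrays would be loaded otherwise.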

Is the dataset shuffling enough?

When I pool all the data together and reshuffle the dataset, I get better accuracy with the LSTM.
This is the code:

from sklearn.model_selection import train_test_split
X = torch.concatenate((train_X,test_X,dev_X),axis=0)
y = torch.concatenate((train_Y,test_Y,dev_Y),axis=0)
train_X, test_X, train_Y, test_Y = train_test_split(X, y, test_size=0.2, random_state=42)

How to generate model .json file?

I am training your code on a different audio dataset.
To predict results, I have replaced model_weights.h5 with lstm_genre_classifier_lstm.h5 (generated during LSTM training), but I need to know how to generate the model .json file for my data.

LSTM-Music-Genre-Classification/predict_example.py, line 54 in 5f235f6:

MODEL = load_model("./weights/model.json", "./weights/model_weights.h5")

Also, if I want to change the number of layers and the parameters associated with them, how should I go about it?
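
Keras models expose to_json() and save_weights(), so one way to produce both files after training might look like the sketch below. The export_model helper is hypothetical, the file names mirror the paths in the line quoted above, and _FakeModel is only a stand-in so the example runs without Keras installed. Newer Keras releases may enforce a specific file name suffix for save_weights.

```python
import os
import tempfile


def export_model(model, directory,
                 json_name="model.json", weights_name="model_weights.h5"):
    """Write the architecture as JSON and the weights next to it."""
    os.makedirs(directory, exist_ok=True)
    json_path = os.path.join(directory, json_name)
    with open(json_path, "w") as handle:
        handle.write(model.to_json())  # architecture only, no weights
    weights_path = os.path.join(directory, weights_name)
    model.save_weights(weights_path)
    return json_path, weights_path


class _FakeModel:
    """Stand-in with the two methods used above; replace with a trained Keras model."""

    def to_json(self):
        return '{"class_name": "Sequential"}'

    def save_weights(self, path):
        with open(path, "wb") as handle:
            handle.write(b"")


with tempfile.TemporaryDirectory() as tmp:
    json_path, weights_path = export_model(_FakeModel(), tmp)
    with open(json_path) as handle:
        exported_json = handle.read()
    weights_written = os.path.isfile(weights_path)
```

With a real model, calling export_model(model, "./weights") after training would place both files where the quoted load_model call looks for them.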

some questions about generating the npy data of gtzan dataset

[CLEAN] up documentation

Clean up documentation (add a one-liner showing how to train)
Add graphs and/or Tensorboard plots of training progress
What else would make this code approachable?

Can we use .wav files?

I am trying to train on my own dataset of .wav files, but running
lstm_genre_classifier_keras.py throws an error:
self.progbar.update(self.seen, self.log_values)
AttributeError: 'ProgbarLogger' object has no attribute 'log_values'

I have changed line 151 in GenreFeatureData.py
from:
if file.endswith(".au"):
to:
if file.endswith(".wav"):
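
Since str.endswith accepts a tuple of suffixes, both extensions could be accepted without editing the check for each dataset. The sketch below is an illustration only: AUDIO_EXTENSIONS and list_audio_files are hypothetical names, not code from GenreFeatureData.py, and it does not address the ProgbarLogger error itself.

```python
import os
import tempfile

AUDIO_EXTENSIONS = (".au", ".wav")  # assumed set of accepted formats


def list_audio_files(directory, extensions=AUDIO_EXTENSIONS):
    """Collect audio files under a directory, matching extensions case-insensitively."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for file in files:
            if file.lower().endswith(extensions):
                found.append(os.path.join(root, file))
    return sorted(found)


# Demo with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.au", "b.WAV", "c.txt"):
        open(os.path.join(tmp, name), "wb").close()
    names = [os.path.basename(path) for path in list_audio_files(tmp)]
```

Printing the number of files found before training starts is a cheap way to confirm that the loader actually picked up the dataset.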
