lstm-music-genre-classification's Introduction

Music Genre Classification with LSTMs

  • Classify music files based on genre from the GTZAN music corpus
  • GTZAN corpus is included for ease of use
  • Use multiple layers of LSTM Recurrent Neural Nets
  • Implementations in PyTorch, PyTorch-Lightning, Keras

Test trained LSTM model

In the ./weights/ directory you can find the trained model weights and the model architecture.

To test the model on your custom audio file, run

 python3 predict_example.py path/to/custom/file.mp3

Or, to test the model on one of our example files, run

 python3 predict_example.py audio/classical_music.mp3

Audio features extracted

Dependencies

Ideas for improving accuracy:

  • The GTZAN dataset has known problems; how do we account for them when using it?
  • Normalize MFCCs & other input features (Recurrent BatchNorm?)
  • Decay the learning rate
  • How are we initializing the weights?
  • Better optimization hyperparameters (too little dropout)
  • Do you have avoidable bias? How's your variance?
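
For the normalization idea in the list above, one common approach is per-feature standardization using statistics from the training split only. A minimal sketch, assuming the features are held in a NumPy array of shape (samples, timesteps, features); the function names, the 128-timestep length, and the random stand-in data are hypothetical and not taken from the repository:

```python
import numpy as np


def fit_standardizer(train_x, eps=1e-8):
    """Compute per-feature mean and std over all samples and timesteps."""
    mean = train_x.mean(axis=(0, 1), keepdims=True)
    std = train_x.std(axis=(0, 1), keepdims=True)
    return mean, std + eps  # eps guards against division by zero


def standardize(x, mean, std):
    """Apply the training-set statistics to any split."""
    return (x - mean) / std


# Hypothetical stand-in data: (samples, timesteps, features).
rng = np.random.default_rng(0)
train_x = rng.normal(loc=5.0, scale=3.0, size=(420, 128, 33))
dev_x = rng.normal(loc=5.0, scale=3.0, size=(120, 128, 33))

# Fit on the training split only, then reuse the same statistics elsewhere.
mean, std = fit_standardizer(train_x)
train_norm = standardize(train_x, mean, std)
dev_norm = standardize(dev_x, mean, std)
```

Fitting on the training split alone keeps validation and test statistics from leaking into the model's inputs.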

Accuracy

At Epoch 400, training on a TITAN X GPU (October 2017):

Split       Loss            Accuracy
Training    0.5801          0.7810
Validation  0.734523485104  0.766666688025
Testing     0.900845060746  0.683333342274

At Epoch 400, training on a 2018 MacBook Pro CPU (May 2019):

Split       Loss            Accuracy
Training    0.3486          0.8738
Validation  1.028421084086  0.700000017881
Testing     1.209656755129  0.683333347241
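
One way to read the two sets of figures above against the bias/variance question is to compare training and validation accuracy. A small sketch using the reported accuracies (rounded to four decimals); treating the train-minus-validation gap as a proxy for variance is a rule of thumb, not a formal measure:

```python
# Accuracy figures copied from the two result tables (rounded to 4 decimals).
runs = {
    "TITAN X GPU (October 2017)": {"train": 0.7810, "val": 0.7667, "test": 0.6833},
    "MacBook Pro CPU (May 2019)": {"train": 0.8738, "val": 0.7000, "test": 0.6833},
}

gaps = {}
for name, acc in runs.items():
    # A larger train-minus-validation gap hints at more overfitting.
    gaps[name] = round(acc["train"] - acc["val"], 4)
    print(f"{name}: train-val gap = {gaps[name]:.4f}")
```

The gap is roughly 0.01 for the GPU run and roughly 0.17 for the CPU run, while test accuracy is essentially the same in both.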

lstm-music-genre-classification's People

Contributors

abhishek-mvs, ch4ndelier, nazarponochevnyi, ruohoruotsi

lstm-music-genre-classification's Issues

One question about feature extraction

Could you please explain these lines?
data[i, :, 0:13] = mfcc.T[0:self.timeseries_length, :]
data[i, :, 13:14] = spectral_center.T[0:self.timeseries_length, :]
data[i, :, 14:26] = chroma.T[0:self.timeseries_length, :]
data[i, :, 26:33] = spectral_contrast.T[0:self.timeseries_length, :]
I don't understand what 0:13 means, or 13:14.
Thanks.
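
The slices in the quoted lines are NumPy column ranges along the last (feature) axis: 0:13 selects columns 0 through 12, 13:14 selects column 13 only, and so on. A minimal sketch of that layout, assuming each feature matrix has shape (n_features, n_frames) before the transpose; the random arrays, the frame count, and the timeseries_length value are stand-ins for illustration:

```python
import numpy as np

timeseries_length = 128  # assumed value for illustration
n_frames = 200           # hypothetical number of frames in one clip

# Random stand-ins for the extracted feature matrices, shaped (n_features, n_frames).
mfcc = np.random.rand(13, n_frames)
spectral_center = np.random.rand(1, n_frames)
chroma = np.random.rand(12, n_frames)
spectral_contrast = np.random.rand(7, n_frames)

# Output layout: (clips, timesteps, features) with 13 + 1 + 12 + 7 = 33 features.
data = np.zeros((1, timeseries_length, 33))
i = 0

# Each .T turns (n_features, n_frames) into (n_frames, n_features); the row
# slice keeps the first timeseries_length frames.
data[i, :, 0:13] = mfcc.T[0:timeseries_length, :]                # columns 0..12
data[i, :, 13:14] = spectral_center.T[0:timeseries_length, :]    # column 13
data[i, :, 14:26] = chroma.T[0:timeseries_length, :]             # columns 14..25
data[i, :, 26:33] = spectral_contrast.T[0:timeseries_length, :]  # columns 26..32
```

Each timestep therefore holds one 33-value vector in which the four feature groups sit side by side.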

Problem with pytorch

In the lstm_genre_classifier_pytorch.py file,
model.hidden = model.init_hidden() does not actually work,
because the model has no attribute 'hidden'.
If you print model.hidden in every batch, you will find it is always [[0,0,...]].
model.hidden has no relation to the actual model. It's just something you define every epoch.
So it's always a stateful LSTM.
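
The behaviour reported here can be checked in isolation. A minimal sketch, assuming a model whose forward pass calls nn.LSTM without passing a hidden state; the TinyClassifier class, its layer sizes, and the ten-class output are hypothetical and not the repository's model. In that situation nn.LSTM starts every call from zero states, and an attribute such as model.hidden that forward() never reads has no effect on the output:

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in: one LSTM layer followed by a linear layer."""

    def __init__(self, input_dim=33, hidden_dim=16, num_classes=10):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.linear = nn.Linear(hidden_dim, num_classes)

    def init_hidden(self, batch_size):
        # Non-zero states so that their effect is visible when they are used.
        shape = (1, batch_size, self.hidden_dim)
        return torch.ones(shape), torch.ones(shape)

    def forward(self, x, state=None):
        # With state=None, nn.LSTM uses zero initial hidden and cell states.
        out, state = self.lstm(x, state)
        return self.linear(out[:, -1, :]), state


torch.manual_seed(0)
model = TinyClassifier()
x = torch.randn(4, 20, 33)  # (batch, timesteps, features)

with torch.no_grad():
    baseline, _ = model(x)
    model.hidden = model.init_hidden(batch_size=4)  # never read by forward()
    after_assign, _ = model(x)
    explicit, _ = model(x, model.init_hidden(batch_size=4))

unused_attribute_ignored = torch.equal(baseline, after_assign)
explicit_state_matters = not torch.allclose(baseline, explicit)
```

For an initial state to influence the result, it has to be passed into the LSTM call itself, as in the third forward pass.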

Validate accuracy and hyperparams for CPU training

Accuracy figures for training/dev/test set in the README were generated from a run with an older version of TF (1.2) on a GPU.

Running the project fresh on a CPU gave the following figures, which give me pause and need investigating: training accuracy is way up and dev/validation accuracy is down significantly.

What are the (extra) implications of training on a CPU vs. a GPU?

420/420 [==============================] - 3s 6ms/step - loss: 0.3547 - acc: 0.8667

Validating ...
120/120 [==============================] - 0s 4ms/step
Dev loss:   1.0411598483721416
Dev accuracy:   0.6666666865348816

Testing ...
60/60 [==============================] - 0s 1ms/step
Test loss:   1.1342438459396362
Test accuracy:   0.6000000089406967

A question about the features.

Thank you for your contribution. I noticed that you use four audio features in the feature extraction step. Can you tell me why you chose these four?

Is the dataset shuffling enough?

When I pool all the data together and reshuffle the dataset, I get better accuracy with the LSTM.
This is the code:

from sklearn.model_selection import train_test_split
X = torch.concatenate((train_X,test_X,dev_X),axis=0)
y = torch.concatenate((train_Y,test_Y,dev_Y),axis=0)
train_X, test_X, train_Y, test_Y = train_test_split(X, y, test_size=0.2, random_state=42)
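
When re-splitting pooled data like this, it may help to keep the genre balance the same in every split and to fix the seed so that runs stay comparable. A stdlib-only sketch of a stratified split over label indices; the stratified_split helper, the 70/20/10 fractions, and the toy labels are assumptions for illustration:

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.2, 0.1), seed=42):
    """Return (train, dev, test) index lists with per-class proportions preserved."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, dev, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_dev = round(len(indices) * fractions[1])
        train += indices[:n_train]
        dev += indices[n_train:n_train + n_dev]
        test += indices[n_train + n_dev:]  # remainder goes to the test split
    return train, dev, test


# Toy labels: 10 classes with 60 items each (hypothetical counts).
labels = [genre for genre in range(10) for _ in range(60)]
train_idx, dev_idx, test_idx = stratified_split(labels)
```

The returned index lists can then be used to select rows from the pooled feature and label arrays.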

How to generate model .json file?

I am training your code on a different audio dataset.
To predict results, I have replaced model_weights.h5 with lstm_genre_classifier_lstm.h5 (generated during LSTM training), but I need to know how to generate the model .json file for my data.
In predict_example.py:

MODEL = load_model("./weights/model.json", "./weights/model_weights.h5")

Also, if I want to change the number of layers and their parameters, how should I go about it?

[CLEAN] up documentation

  • Clean up documentation (add a one-liner for training)
  • Add graphs and/or TensorBoard plots of training progress
  • What else is useful to make this code approachable?

Can we use .wav files?

I am trying to train on my dataset of .wav files, but it throws an error when I run the
lstm_genre_classifier_keras.py file.
Error:
self.progbar.update(self.seen, self.log_values)
AttributeError: 'ProgbarLogger' object has no attribute 'log_values'

I have changed line 151 in GenreFeatureData.py
from:
if file.endswith(".au"):
to
if file.endswith(".wav"):
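
Rather than swapping one extension for another, the check can accept several at once. A stdlib-only sketch (list_audio_files and AUDIO_EXTENSIONS are hypothetical names, not part of GenreFeatureData.py) that also fails loudly when nothing matches, since an empty file list is one thing worth ruling out when training breaks right at the start:

```python
import os

AUDIO_EXTENSIONS = (".au", ".wav")  # assumed set of accepted extensions


def list_audio_files(root_dir, extensions=AUDIO_EXTENSIONS):
    """Collect audio file paths under root_dir; raise if none are found."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            # str.endswith accepts a tuple, so several extensions are checked at once.
            if name.lower().endswith(extensions):
                paths.append(os.path.join(dirpath, name))
    if not paths:
        raise FileNotFoundError(f"No {extensions} files found under {root_dir!r}")
    return paths
```

Whether the audio loader used by the project can decode a given .wav file is a separate question that this check does not answer.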
