Deep Neural Networks for Piano Music Transcription

DT2119 Speech and Speaker Recognition course's project at KTH Royal Institute of Technology.

Author: Diego González Morín.

Report: Deep Neural Networks for Piano Music Transcription

Poster: Poster Session

Abstract

In this project, the main approaches to Automatic Music Transcription using neural networks are reviewed and explained. As an experimental study of these approaches, several neural network models are proposed and compared on the task of polyphonic piano music transcription. The experimentation focuses first on dataset preprocessing and alignment, and continues with an empirical comparison of the performance of Deep Neural Networks (DNNs) and Long Short-Term Memory (LSTM) networks. The objective of the project is to serve as a first step toward future neural network design and optimization for Automatic Music Transcription by narrowing down the best combination of methods and parameters for this particular task.

Introduction

The main inspiration for this work came from the approach proposed by Sigtia in 2015, in which several neural network architectures, including DNNs, RNNs and CNNs, were proposed for automatic polyphonic piano transcription. However, LSTMs were not tested, so the main objective of this project was to compare the performance of different LSTMs with simple DNNs on the task of Automatic Music Transcription (AMT).

Dataset

The MIDI Aligned Piano Sounds (MAPS) dataset was used as the main dataset. For each WAV audio file, the corresponding MIDI file and a text file with the pitch transcription are included. The 270 classical piano pieces in the dataset were used to train the networks, after first splitting them into training, validation and test sets.
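The split described above can be sketched as follows. This is a minimal illustration, not the split actually used in the project: the 80/10/10 proportions, the random seed, and the placeholder piece names are all assumptions. Splitting by piece rather than by frame keeps frames from one recording out of more than one subset.

```python
import random


def split_pieces(pieces, val_frac=0.1, test_frac=0.1, seed=0):
    """Split piece identifiers into train/validation/test lists.

    The fractions and the seed are assumed values for illustration.
    """
    shuffled = sorted(pieces)               # fixed starting order for reproducibility
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is repeatable
    n_val = round(len(shuffled) * val_frac)
    n_test = round(len(shuffled) * test_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Placeholder names standing in for the 270 pieces of the dataset.
pieces = [f"piece_{i:03d}" for i in range(270)]
train, val, test = split_pieces(pieces)
print(len(train), len(val), len(test))
```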

Data Preprocessing

The audio signals were downsampled from 44.1 kHz to 16 kHz to reduce the amount of data. Then the Constant-Q Transform (CQT) was applied, the extracted features were normalized, and the training-set mean was subtracted from all the sets.
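A minimal sketch of this preprocessing pipeline, assuming librosa for loading and the CQT. The hop length, number of bins and bins per octave are assumed values, not the ones used in the project, and "normalized" is interpreted here as dividing by the training-set standard deviation, which is also an assumption. The key point is that the statistics are computed on the training set only and then applied to every set.

```python
import numpy as np


def extract_cqt(wav_path, sr=16000, hop_length=512, n_bins=84, bins_per_octave=12):
    """Load a WAV file resampled to `sr` and return CQT magnitudes, shape (frames, bins).

    hop_length, n_bins and bins_per_octave are assumed values.
    """
    import librosa  # imported lazily so the helpers below work without librosa

    y, _ = librosa.load(wav_path, sr=sr)  # resamples to 16 kHz while loading
    cqt = librosa.cqt(y, sr=sr, hop_length=hop_length,
                      n_bins=n_bins, bins_per_octave=bins_per_octave)
    return np.abs(cqt).T


def fit_normalizer(train_features):
    """Compute per-bin mean and standard deviation on the training set only."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0) + 1e-8  # avoid division by zero
    return mean, std


def apply_normalizer(features, mean, std):
    """Apply the training-set statistics to any set (train, validation or test)."""
    return (features - mean) / std


# Synthetic features standing in for real CQT frames.
rng = np.random.default_rng(0)
train_feats = rng.random((100, 84))
test_feats = rng.random((20, 84))

mean, std = fit_normalizer(train_feats)
train_norm = apply_normalizer(train_feats, mean, std)
test_norm = apply_normalizer(test_feats, mean, std)
```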

Experiments

As mentioned previously, the goal of the main experiment was to compare DNNs with LSTM networks. Eight networks were built and trained in total:

  • 4 DNNs: {1,2,3,4} layers and 256 units.
  • 4 LSTMs: {1,2,3,4} layers and 256 units.
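The eight configurations could be built in Keras roughly as follows. This is a sketch under assumptions, not the project's actual code: the input size (`n_features`), the 88-key sigmoid output layer, the ReLU activations and the binary cross-entropy loss are not stated in this README. The 20% dropout and the Adam optimizer are taken from the Training section.

```python
def experiment_grid(depths=(1, 2, 3, 4), units=256):
    """Return the eight (kind, n_layers, units) configurations to compare."""
    return [(kind, depth, units) for kind in ("dnn", "lstm") for depth in depths]


def build_model(kind, n_layers, units=256, n_features=84, n_notes=88, dropout=0.2):
    """Build a DNN or LSTM with `n_layers` hidden layers of `units` units.

    n_features, n_notes, the activations and the loss are assumptions.
    """
    if kind not in ("dnn", "lstm"):
        raise ValueError(f"unknown model kind: {kind!r}")

    from tensorflow import keras  # imported lazily; only needed to build models

    layers = keras.layers
    if kind == "dnn":
        inputs = keras.Input(shape=(n_features,))        # one frame at a time
    else:
        inputs = keras.Input(shape=(None, n_features))   # variable-length sequences

    x = inputs
    for _ in range(n_layers):
        if kind == "dnn":
            x = layers.Dense(units, activation="relu")(x)
        else:
            x = layers.LSTM(units, return_sequences=True)(x)
        x = layers.Dropout(dropout)(x)

    # One sigmoid per piano key: several notes can be active in the same frame.
    outputs = layers.Dense(n_notes, activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model


grid = experiment_grid()
for kind, n_layers, units in grid:
    print(kind, n_layers, units)
```

Calling `build_model(*grid[0])` would then build the 1-layer DNN, provided TensorFlow is installed.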

Training

The networks were built and trained using Keras with the TensorFlow backend, with the following settings:

  • Adam optimizer
  • 20% Dropout
  • Early stopping with validation data
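To make the early stopping rule concrete, here is a small framework-independent sketch of the logic: training stops once the validation loss has not improved for a given number of epochs. The `EarlyStopper` class, the patience value and the loss values are hypothetical and only illustrate the idea. In Keras the same behavior is usually obtained by passing the `EarlyStopping` callback to `model.fit` together with `validation_data`.

```python
class EarlyStopper:
    """Track the validation loss and signal when training should stop."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement (assumed)
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def update(self, epoch, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: improvement stops after epoch 2.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.64]
stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.update(epoch, loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_epoch, stopper.best_loss)
```

In practice the weights from the best epoch are kept, not the ones from the epoch at which training stopped.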

Results

The best results are shown in the following table:

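| Model | Size | Test Set | F-measure | Accuracy | F-measure | Accuracy |
|-------|------|----------|-----------|----------|-----------|----------|
| DNN   | 3L   | Set 1    | 69.36%    | 53.09%   | 70.61%    | 54.58%   |
| LSTM  | 3L   | Set 1    | 68.95%    | 52.61%   | 69.36%    | 53.09%   |
| DNN   | 3L   | Set 2    | 65.29%    | 48.47%   | 66.54%    | 49.86%   |
| LSTM  | 3L   | Set 2    | 66.05%    | 49.31%   | 66.37%    | 49.67%   |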
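The README does not define the two metrics. The sketch below assumes the frame-level definitions commonly used in the transcription literature: F-measure as the harmonic mean of precision and recall, and accuracy as TP / (TP + FP + FN). Under these definitions accuracy equals F / (2 - F), and the reported values are consistent with that relation (for example, 0.6936 / (2 - 0.6936) ≈ 0.5309). The function name and the toy piano rolls are illustrative only.

```python
import numpy as np


def frame_metrics(y_true, y_pred):
    """Frame-level F-measure and accuracy for binary piano rolls (frames, keys)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    tp = np.sum(y_true & y_pred)     # notes correctly detected
    fp = np.sum(~y_true & y_pred)    # notes predicted but not played
    fn = np.sum(y_true & ~y_pred)    # notes played but missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return float(f_measure), float(accuracy)


# Toy example: 2 frames, 3 keys.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1]])
f, acc = frame_metrics(y_true, y_pred)
print(f, acc)
```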

The plotted predictions for a 1 min 30 s subset of the test set:

Predictions

Finally, some famous songs were used as input to the network. The predictions for these songs were converted back to MIDI and gathered in the following video:

[Video of the MIDI predictions]
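Converting frame-wise predictions back to MIDI first requires turning them into note events. A possible sketch of that step is shown below. The 0.5 threshold, the frame duration (512 samples at 16 kHz) and the mapping of the first output to MIDI pitch 21 (A0) are assumptions, and `piano_roll_to_notes` is a hypothetical helper, not the project's code. The resulting events could then be written to a file with a MIDI library such as pretty_midi or mido.

```python
import numpy as np


def piano_roll_to_notes(probs, threshold=0.5, frame_time=0.032, lowest_pitch=21):
    """Convert note probabilities (frames, keys) to (pitch, onset_s, offset_s) tuples.

    threshold, frame_time and lowest_pitch are assumed values.
    """
    active = np.asarray(probs) >= threshold
    n_frames, n_keys = active.shape
    notes = []
    for key in range(n_keys):
        onset = None
        for t in range(n_frames):
            if active[t, key] and onset is None:
                onset = t                                   # note starts
            elif not active[t, key] and onset is not None:
                notes.append((lowest_pitch + key, onset * frame_time, t * frame_time))
                onset = None                                # note ends
        if onset is not None:                               # still sounding at the end
            notes.append((lowest_pitch + key, onset * frame_time, n_frames * frame_time))
    return sorted(notes, key=lambda note: (note[1], note[0]))


# Toy example: 3 frames, 2 keys, one second per frame for readability.
probs = [[0.9, 0.1],
         [0.8, 0.7],
         [0.2, 0.6]]
notes = piano_roll_to_notes(probs, frame_time=1.0)
print(notes)
```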
