Musical key detection with Deep Learning

Quick use

Download the latest release min_files.zip, or clone the repository and download the ONNX model attachment.
Use export_tensor.py to transform your .wav file into a JSON-encoded 2D spectrogram.

python .\export_tensor.py AUDIOFILE TARGETFILE

Start a local web server in the root directory of the project, e.g. on port 8080

npx light-server -s . -p 8080

Open localhost:PORT in your browser of choice (JavaScript must be enabled)
Load the exported tensor file and submit it
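
To illustrate what the export step produces, here is a minimal sketch of turning a .wav file into a JSON-encoded 2D spectrogram. It is not the code of export_tensor.py: the function names, the 16-bit PCM restriction, the frame and hop sizes, and the plain log-magnitude STFT are all assumptions made for this example, and the actual script may use a different transform (for example a constant-Q transform). It needs NumPy.

```python
import json
import wave

import numpy as np


def read_mono_wav(path):
    """Read a 16-bit PCM .wav file and return (samples in [-1, 1], sample rate)."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float64) / 32768.0
    # Average the interleaved channels to get a mono signal.
    samples = samples.reshape(-1, channels).mean(axis=1)
    return samples, rate


def log_spectrogram(samples, frame_size=8192, hop_size=4096):
    """Return a 2D array (frames x frequency bins) of log-magnitudes.

    frame_size and hop_size are assumed values, not the project's settings.
    """
    if len(samples) < frame_size:
        # Zero-pad short signals so that at least one frame exists.
        samples = np.pad(samples, (0, frame_size - len(samples)))
    window = np.hanning(frame_size)
    starts = range(0, len(samples) - frame_size + 1, hop_size)
    frames = np.stack([samples[s:s + frame_size] * window for s in starts])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitudes)


def export_tensor(audio_path, target_path):
    """Write the spectrogram of audio_path to target_path as nested JSON lists."""
    samples, _rate = read_mono_wav(audio_path)
    spectrogram = log_spectrogram(samples)
    with open(target_path, "w", encoding="utf-8") as handle:
        json.dump(spectrogram.tolist(), handle)
    return spectrogram.shape
```

The resulting file contains one list per time frame, each holding one value per frequency bin, which is a shape a browser can parse with JSON.parse before building the model input.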

Papers

https://paperswithcode.com/paper/deeper-convolutional-neural-networks-and
https://paperswithcode.com/paper/musical-tempo-and-key-estimation-using
https://arxiv.org/pdf/1706.02921.pdf

Topic

Musical feature extraction

Project type

Bring your own model

Summary

My main focus for this project will be the topic of key detection in musical pieces. I hope to reuse existing approaches and further refine them to improve on their performance. Steps are planned in the following order:

Rebuild an existing solution as quoted above
Experiment with network architectures, types of feature extraction, and applications to the waveform itself.
Expand the existing collection of key detection data sources with simple self-made compositions. Using a modern DAW, it should be fairly easy to construct a series of short audio samples in various keys with different instrument setups.

I plan to use the following datasets:

GiantSteps & GiantSteps MTG
These seem to be common datasets for key extraction and also give us results from other models to compare against
Children's Songs
This is a set of vocal recordings only
Optionally, my own dataset

My main focus will be on model generation. However, if I reach satisfactory results before my estimated time is used up, I will invest the remaining time in dataset creation. Hence the following breakdown is somewhat flexible:

Dataset collection: 2-8 hrs
Design/build network: 9-15 hrs
Train/tune network: 18 hrs
Build application: 8 hrs
Write report: 6 hrs
Presentation: 6 hrs
Total: 55 hrs

Phase 2 - Hacking report

Plan

My reference papers used accuracy (micro-averaged, as far as I understand) as well as the MIREX score. I report both metrics so that my results are comparable.
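
For reference, the two metrics can be sketched as follows. The weights follow the commonly cited MIREX key-evaluation scheme (1.0 for the exact key, 0.5 for a fifth error, 0.3 for the relative major/minor, 0.2 for the parallel major/minor, 0 otherwise). The label format ("C major"), the sharps-only pitch names, the function names, and counting fifth errors in both directions are assumptions of this sketch; implementations differ on the last point, and this is not the notebook's evaluation code.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def parse_key(label):
    """Split a label such as 'A minor' into (pitch class index, mode)."""
    tonic, mode = label.split()
    return PITCH_CLASSES.index(tonic), mode.lower()


def mirex_weight(reference, estimate):
    """Return the MIREX weight of one estimate against its reference key."""
    ref_tonic, ref_mode = parse_key(reference)
    est_tonic, est_mode = parse_key(estimate)
    interval = (est_tonic - ref_tonic) % 12
    if ref_mode == est_mode:
        if interval == 0:
            return 1.0  # exact match
        if interval in (5, 7):
            return 0.5  # fifth error (assumption: either direction counts)
        return 0.0
    if interval == 0:
        return 0.2  # parallel major/minor
    # Relative key: the minor tonic lies 9 semitones above the major tonic.
    if (ref_mode == "major" and interval == 9) or (ref_mode == "minor" and interval == 3):
        return 0.3
    return 0.0


def mirex_score(references, estimates):
    """Average MIREX weight over all tracks, scaled to 0..100."""
    weights = [mirex_weight(r, e) for r, e in zip(references, estimates)]
    return 100.0 * sum(weights) / len(weights)


def accuracy(references, estimates):
    """Micro-averaged accuracy: share of exactly correct keys over all tracks, 0..100."""
    hits = sum(parse_key(r) == parse_key(e) for r, e in zip(references, estimates))
    return 100.0 * hits / len(references)
```

Because the MIREX score gives partial credit for musically related keys, it is always at least as high as the plain accuracy on the same predictions.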

The state of the art for the MIREX score is around 75. My aim was to reach at least 70.

For the implementation I went for a stripped-down version of InceptionKeyNet. I implemented some of the blocks, but after a while I stopped seeing performance increases, and the network's accuracy even seemed to decrease. Personally, I think the full network is overkill for key detection alone, which likely depends mostly on a few base frequencies. My next steps will include experiments with simpler models.

Installation

All code and notes can be found in the Jupyter notebook. Please install the dependencies listed in requirements.txt. The audio files must be downloaded using the repository links above. The project is configured for GiantSteps and GiantSteps MTG, and the data locations can be set within the notebook. It also includes a conversion script from mp3 to wav. I needed to load the files in wav format on my Windows machine, or else a significant chunk of them would not load.
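
If you prefer to convert the files outside the notebook, a batch conversion could look roughly like the sketch below. It assumes the ffmpeg command-line tool is installed and on the PATH; the function names and the directory layout are hypothetical, and the conversion script shipped with the project may work differently.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(mp3_path, wav_path):
    """Build the ffmpeg call that decodes one mp3 file to wav."""
    # -y overwrites an existing output file, -i names the input file.
    return ["ffmpeg", "-y", "-i", str(mp3_path), str(wav_path)]


def convert_directory(source_dir, target_dir):
    """Convert every .mp3 in source_dir to a .wav of the same name in target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    converted = []
    for mp3_path in sorted(Path(source_dir).glob("*.mp3")):
        wav_path = target / (mp3_path.stem + ".wav")
        # check=True raises CalledProcessError if ffmpeg fails on a file.
        subprocess.run(ffmpeg_command(mp3_path, wav_path), check=True)
        converted.append(wav_path)
    return converted
```

The function returns the list of written .wav paths, so the result can be compared with the number of source files to spot tracks that failed to convert.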

Results

The best MIREX score of my network is currently around 60 across several recomputed train-test splits using the optimal configuration (which should be set up at the time of submission), meaning I am sadly behind my target of 70 at the moment.

Rough time investment:

Dataset collection: 3 hrs
Design/build network: 30 hrs
Train/tune network: 20 hrs

entenzahn / adl_ws22
