adl_ws22's Introduction

Musical key detection with Deep Learning

Quick use

  1. Download the latest release min_files.zip, OR clone the repository and download the ONNX model attachment.
  2. Use export_tensor.py to transform your .wav file into a JSON-encoded 2D spectrogram:
     python .\export_tensor.py AUDIOFILE TARGETFILE
  3. Run a localhost server in the root directory of the project, e.g. on port 8080:
     npx light-server -s . -p 8080
  4. Open localhost:PORT in your browser of choice (JavaScript must be enabled).
  5. Load the exported tensor file and submit.
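
To illustrate what a "JSON-encoded 2D spectrogram" from step 2 can look like, here is a minimal sketch. It is not the code of export_tensor.py, whose actual transform, frame size and output layout are not described here. The function name magnitude_spectrogram, the parameters frame_size and hop, and the synthetic test tone are all assumptions made for this example, and the naive DFT is only suitable for tiny inputs.

```python
import cmath
import json
import math


def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return a 2D list [frame][frequency_bin] of DFT magnitudes.

    Hypothetical helper for illustration only; not taken from export_tensor.py.
    """
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    n_bins = frame_size // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT: fine for a demo, far too slow for real audio files.
        row = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(n_bins)
        ]
        frames.append(row)
    return frames


# Synthetic 1 kHz tone standing in for a decoded .wav file (assumption).
sample_rate = 8000
tone_hz = 1000.0
samples = [math.sin(2 * math.pi * tone_hz * t / sample_rate)
           for t in range(256)]

spectrogram = magnitude_spectrogram(samples)
encoded = json.dumps(spectrogram)  # nested JSON arrays: one inner array per frame
decoded = json.loads(encoded)      # round-trips to a plain list of lists
```

The nested-array layout is chosen here because a browser can parse it with JSON.parse and flatten it into a tensor. Whether the project's web page expects exactly this layout is not confirmed by the text above.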

Papers

https://paperswithcode.com/paper/deeper-convolutional-neural-networks-and
https://paperswithcode.com/paper/musical-tempo-and-key-estimation-using
https://arxiv.org/pdf/1706.02921.pdf

Topic

Musical feature extraction

Project type

Bring your own model

Summary

My main focus for this project will be the topic of key detection in musical pieces. I hope to reuse existing approaches and further refine them to improve on their performance. Steps are planned in the following order:

  1. Rebuild an existing solution from the papers cited above.
  2. Experiment with network architectures, types of feature extraction, and applications to the waveform itself.
  3. Expand the existing collection of key detection data sources with simple self-made compositions. Using a modern DAW, it should be fairly straightforward to construct a series of short audio samples in various keys with different instrument setups.

I plan to use the following datasets:

  • GiantSteps & GiantSteps MTG
    These seem to be common datasets for key extraction and also provide comparable results from other models.
  • Children's Songs
    This set contains vocal recordings only.
  • Optionally, my own dataset

My main focus will be on model generation. However, if I reach satisfactory results before my estimated time is used up, I will invest the remaining time in dataset creation. Hence the following breakdown is somewhat flexible:

  • Dataset collection: 2-8 hrs
  • Design/build network: 9-15 hrs
  • Train/tune network: 18 hrs
  • Build application: 8 hrs
  • Write report: 6 hrs
  • Presentation: 6 hrs

Total: 55 hrs

Phase 2 - Hacking report

Plan

My reference papers report accuracy (micro-averaged, as far as I understand) as well as the MIREX score. I use both metrics so that my results are comparable.

The state of the art in MIREX score is around 75. My aim was to reach at least 70.
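
For reference, the two metrics can be sketched as follows. This is a minimal illustration and not the evaluation code from the notebook. It assumes the commonly used MIREX key weighting (1.0 for a correct key, 0.5 for an estimate a perfect fifth above the reference, 0.3 for the relative major/minor, 0.2 for the parallel major/minor, 0 otherwise). Some evaluations also award the fifth score in the other direction, which this sketch does not. The function names and the encoding of keys as (pitch class, mode) pairs are assumptions made for the example.

```python
def mirex_key_score(ref_tonic, ref_mode, est_tonic, est_mode):
    """Score one estimated key against the reference key.

    Tonics are pitch classes 0-11 (C = 0); modes are "major" or "minor".
    """
    interval = (est_tonic - ref_tonic) % 12  # semitones from reference up to estimate
    if interval == 0 and est_mode == ref_mode:
        return 1.0  # exact match
    if est_mode == ref_mode and interval == 7:
        return 0.5  # estimate is a perfect fifth above the reference
    if ref_mode == "major" and est_mode == "minor" and interval == 9:
        return 0.3  # estimate is the relative minor of the reference
    if ref_mode == "minor" and est_mode == "major" and interval == 3:
        return 0.3  # estimate is the relative major of the reference
    if interval == 0:
        return 0.2  # parallel major/minor: same tonic, different mode
    return 0.0


def evaluate(pairs):
    """Return (accuracy, mean weighted score) over (reference, estimate) pairs."""
    scores = [mirex_key_score(*ref, *est) for ref, est in pairs]
    # Averaging over tracks (not over classes) gives micro-averaged accuracy.
    accuracy = sum(1 for s in scores if s == 1.0) / len(scores)
    weighted = sum(scores) / len(scores)
    return accuracy, weighted


# Five hypothetical tracks, one for each scoring case.
pairs = [
    ((0, "major"), (0, "major")),  # C major vs C major: correct
    ((0, "major"), (7, "major")),  # C major vs G major: fifth
    ((9, "minor"), (0, "major")),  # A minor vs C major: relative
    ((0, "major"), (0, "minor")),  # C major vs C minor: parallel
    ((0, "major"), (2, "major")),  # C major vs D major: unrelated
]
accuracy, weighted = evaluate(pairs)
```

With these five pairs, one of five estimates is exactly right, while the weighted score also credits the musically related errors. Assuming the scores quoted in this report are on a 0 to 100 scale, the averaged value would be multiplied by 100.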

For the implementation I went for a stripped-down version of InceptionKeyNet. I implemented some of the blocks, but after a while I stopped seeing performance gains, and the network's accuracy even seemed to decrease. Personally, I think the full network is overkill for key detection alone, which likely depends mostly on a few base frequencies. My next steps will include experiments with simpler models.

Installation

All code and notes can be found in the Jupyter notebook. Please install the dependencies listed in requirements.txt. The audio files must be downloaded using the repository links above. The project is configured for GiantSteps and GiantSteps MTG, and the data locations can be set within the notebook. It also includes a conversion script from mp3 to wav. I needed to load the files in wav format on my Windows machine, otherwise a significant portion of them would not load.

Results

The best MIREX score for my network is currently around 60 across several recomputed train-test splits using the best configuration found (which should be the one set up at the time of submission). This is unfortunately below my target of 70.

Rough time investment:

  • Dataset collection: 3 hrs
  • Design/build network: 30 hrs
  • Train/tune network: 20 hrs

Contributors

entenzahn