FMA: A Dataset For Music Analysis

Kirell Benzi, Michaël Defferrard, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2.

Note that this is a beta release and that this repository as well as the paper and data are subject to change. Stay tuned!

Data

The dataset is a dump of the Free Music Archive. It is available in several sizes:

Small: 4,000 clips of 30 seconds, 10 balanced genres (GTZAN-like) (~3.4 GiB)
Medium: 14,511 clips of 30 seconds, 20 unbalanced genres (~12.2 GiB)
Large (available soon): 77,643 clips of 30 seconds, 68 unbalanced genres (~90 GiB)
Huge (subject to distribution constraints): 77,643 untrimmed clips, 68 unbalanced genres (~900 GiB)

Notes:

All datasets come with MP3 audio (128 kbps, 44.1 kHz, stereo) of all clips.
All datasets come with the following meta-data about each clip: artist, title, list of genres (and top genre), play count.
Meta-data about all clips are stored in a JSON file to be loaded as a pandas dataframe.
As additional audio meta-data, each clip of the Small and Medium datasets comes with all Echonest features.
Please see the paper for a description of how the data was collected and cleaned.
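
As a rough illustration of loading the meta-data into pandas, here is a minimal sketch. The field names (artist, title, genres, top_genre, play_count) and the record-oriented layout are assumptions derived from the list above, not the actual schema of the released JSON file; the sample records are made up so the example runs without the dataset.

```python
import io
import json

import pandas as pd

# Hypothetical records: field names and layout are assumptions based on the
# meta-data listed above, not the real schema of the released JSON file.
records = [
    {"artist": "Artist A", "title": "Track 1", "genres": ["Rock", "Indie"],
     "top_genre": "Rock", "play_count": 120},
    {"artist": "Artist B", "title": "Track 2", "genres": ["Jazz"],
     "top_genre": "Jazz", "play_count": 45},
    {"artist": "Artist C", "title": "Track 3", "genres": ["Rock"],
     "top_genre": "Rock", "play_count": 300},
]
raw = json.dumps(records)

# With the real dataset, pass the path of the JSON file instead of a buffer.
tracks = pd.read_json(io.StringIO(raw), orient="records")

# Number of clips per top genre, and the most played clip.
genre_counts = tracks["top_genre"].value_counts()
most_played = tracks.sort_values("play_count", ascending=False).iloc[0]

print(genre_counts)
print(most_played["title"])
```

If the real file uses a different orientation (for example, keyed by clip identifier), the `orient` argument would need to change accordingly.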

Code

This repository features the following notebooks:

Generation: generation of the datasets.
Analysis: loading and basic analysis of the data.
Baselines: baseline models for various tasks.
Usage: how to load the datasets and train your own models.

Installation

# Install Python 3.6 and create a virtual environment.
pyenv install 3.6.0
pyenv virtualenv 3.6.0 fma
pyenv activate fma

# Clone the repository.
git clone https://github.com/mdeff/fma.git
cd fma

# Install the dependencies.
make install

# Fill in the configuration.
cat .env
DATA_DIR=/path/to/fma_small

# Open the Jupyter notebook.
jupyter-notebook

# Or run a notebook.
make fma_baselines.ipynb

External dependencies: ffmpeg.
Install CUDA to train on GPU. See TensorFlow's instructions.
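
The configuration step above stores the dataset location as DATA_DIR in a .env file. The following is a minimal sketch of how such a file could be read from your own Python code using only the standard library. The `read_env` helper is hypothetical and is not part of this repository, which may load the file differently; a temporary file stands in for the real .env so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def read_env(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# A temporary file stands in for the repository's .env (illustration only).
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text("# dataset location\nDATA_DIR=/path/to/fma_small\n")
    config = read_env(env_file)

data_dir = Path(config["DATA_DIR"])
print(data_dir)
```

In practice, `read_env(".env")` would be called from the repository root after filling in the real path.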

License

Please cite our paper if you use our code or data.
The code is released under the terms of the MIT license.
The dataset is meant for research only.
We are grateful to SWITCH and EPFL for hosting the dataset within the context of the SCALE-UP project, funded in part by the swissuniversities SUC P-2 program.
