Kirell Benzi, Michaël Defferrard, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2.
Note that this is a beta release: the repository, the paper, and the data are all subject to change. Stay tuned!
The dataset is a dump of the Free Music Archive. It comes in several sizes:
- Small: 4,000 clips of 30 seconds, 10 balanced genres (GTZAN-like) (~3.4 GiB)
- Medium: 14,511 clips of 30 seconds, 20 unbalanced genres (~12.2 GiB)
- Large (available soon): 77,643 clips of 30 seconds, 68 unbalanced genres (~90 GiB)
- Huge (subject to distribution constraints): 77,643 untrimmed clips, 68 unbalanced genres (~900 GiB)
Notes:
- All datasets come with MP3 audio (128 kbps, 44.1 kHz, stereo) of all clips.
- All datasets come with the following meta-data about each clip: artist, title, list of genres (and top genre), play count.
- Meta-data about all clips is stored in a JSON file that can be loaded as a pandas DataFrame.
- As additional audio meta-data, each clip of the Small and Medium datasets comes with all Echonest features.
- Please see the paper for a description of how the data was collected and cleaned.
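As a rough illustration of the meta-data notes above, the following sketch loads a JSON file into a pandas DataFrame and counts clips per top genre. The file name `metadata.json`, the list-of-records layout, and the column names (`track_id`, `artist`, `title`, `genres`, `top_genre`, `play_count`) are assumptions made for this example only; check the files shipped with the dataset for the actual schema. The records below are made up.

```python
import json
import os
import tempfile

import pandas as pd


def load_metadata(path):
    """Load clip meta-data from a JSON file into a DataFrame.

    Assumes the file holds a list of records, each with a 'track_id' key
    (hypothetical layout; the real file may be organized differently).
    """
    with open(path) as f:
        records = json.load(f)
    return pd.DataFrame(records).set_index("track_id")


def genre_counts(df):
    """Count clips per top genre (column name 'top_genre' is an assumption)."""
    return df["top_genre"].value_counts()


# Made-up records standing in for the real meta-data file.
sample = [
    {"track_id": 2, "artist": "Artist A", "title": "Song 1",
     "genres": ["Rock", "Pop"], "top_genre": "Rock", "play_count": 120},
    {"track_id": 5, "artist": "Artist B", "title": "Song 2",
     "genres": ["Jazz"], "top_genre": "Jazz", "play_count": 45},
    {"track_id": 9, "artist": "Artist C", "title": "Song 3",
     "genres": ["Rock"], "top_genre": "Rock", "play_count": 300},
]

# Write the sample to a temporary file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metadata.json")  # hypothetical file name
    with open(path, "w") as f:
        json.dump(sample, f)
    tracks = load_metadata(path)

print(tracks[["artist", "title", "top_genre"]])
print(genre_counts(tracks))
```

With the real dataset, `load_metadata` would be pointed at the JSON file inside the downloaded archive, after adjusting the index and column names to match it.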
This repository features the following notebooks:
- Generation: generation of the datasets.
- Analysis: loading and basic analysis of the data.
- Baselines: baseline models for various tasks.
- Usage: how to load the datasets and train your own models.
# Install Python 3.6 and create a virtual environment.
pyenv install 3.6.0
pyenv virtualenv 3.6.0 fma
pyenv activate fma
# Clone the repository.
git clone https://github.com/mdeff/fma.git
cd fma
# Install the dependencies.
make install
# Fill in the configuration.
cat .env
DATA_DIR=/path/to/fma_small
# Open the Jupyter notebook.
jupyter-notebook
# Or run a notebook.
make fma_baselines.ipynb
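The configuration step above stores the dataset location as `DATA_DIR` in a `.env` file. The sketch below shows one way your own scripts could read that value using only the standard library. The helpers `read_env` and `data_dir` are hypothetical names introduced here; the notebooks in this repository may read the configuration through a different mechanism.

```python
import os
import tempfile


def read_env(path):
    """Parse KEY=VALUE lines from a .env-style file.

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Surrounding quotes around values are removed.
    """
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("\"'")
    return values


def data_dir(env_path=".env"):
    """Return DATA_DIR, preferring the process environment over the file."""
    value = os.environ.get("DATA_DIR") or read_env(env_path).get("DATA_DIR")
    if not value:
        raise KeyError("DATA_DIR is not set in the environment or in " + env_path)
    return value


# Demonstration on a temporary file with the same content as the example .env.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w") as f:
        f.write("# Location of the extracted dataset\n")
        f.write("DATA_DIR=/path/to/fma_small\n")
    config = read_env(env_path)

print(config["DATA_DIR"])
```

Letting the environment variable take precedence over the file is a design choice of this sketch: it allows pointing a single run at another dataset size without editing `.env`.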
- External dependencies: ffmpeg.
- Install CUDA to train on GPU. See TensorFlow's instructions.
- Please cite our paper if you use our code or data.
- The code is released under the terms of the MIT license.
- The dataset is meant for research only.
- We are grateful to SWITCH and EPFL for hosting the dataset within the context of the SCALE-UP project, funded in part by the swissuniversities SUC P-2 program.