laion-ai / audio-dataset

Audio Dataset for training CLAP and other models

Python 83.28% Shell 16.24% Dockerfile 0.47%

audio-dataset's Issues

Which is the most suitable Music Dataset for training MuLaN?

@marianna13 Hi Mariana,
There seem to be several options for a music dataset. Could you recommend one (or several) for training the MuLaN model?

The MuLaN authors used 44 million music recordings (almost 370K hours). The following table shows some examples of their texts, which come in 3 different types.

GigaSpeech (process)

https://github.com/speechcolab/gigaspeech

RAVDESS

Assigned: Knoriy

https://zenodo.org/record/1188976

Juno - Music review (@dicknascarsixtynine#3885)

https://www.juno.co.uk/products/wu-tang-clan-the-charmels-cream/861381-01/

files for preprocessing

Where do we get the following files from?
Any help would be appreciated.
Thanks in advance.

audio-dataset/data_preprocess/preprocess_freesound.py

Line 26 in 572ebd2

metadata_file = r'/home/yuchen/raw/freesound/parquet/freesound_parquet.parquet'

audio-dataset/data_preprocess/preprocess_freesound.py

Line 27 in 572ebd2

ignore_file = r'/home/yuchen/raw/freesound/filename_dic.txt'

audio-dataset/data_preprocess/preprocess_freesound.py

Line 28 in 572ebd2

duration_file = r"/home/yuchen/raw/freesound/all_duration.txt"

How to download Freesound?

Hi, can you share some ways to download Freesound? For example, how can Linux scripts be used to download the audio files?

CoVoST (process)

https://github.com/facebookresearch/covost

MIDI50K (prepare from scratch)

CREMA-D (process)

Assigned: Knoriy

https://github.com/CheyneyComputerScience/CREMA-D

See ./data_collection/README.md in the dataset repo. (We will add the full list contents here later.)

MIDI50K datasets (prepare from scratch)

Current location: AWS S3 bucket; not yet prepared.

FSD50K (process)

https://annotator.freesound.org/fsd/release/fsd50k/

Clotho

https://zenodo.org/record/4783391#.ygdaa9-znpy

freesound download

Most of the URLs in the Freesound (no overlap)train+test.csv files are invalid. When I visit one of the URLs, I get a result like this:

https://freesound.org/apiv2/sounds/621393/download/?format=api

How can I download the dataset correctly? Thank you!
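
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the Freesound APIv2 `download` endpoint appears to require OAuth2 authentication, so opening the URL without credentials would not return the audio file. The sketch below extracts the sound id from a URL like the one above and builds an authenticated request. `sound_id_from_url`, `build_download_request`, and the placeholder token are hypothetical names used only for illustration; they are not part of this repo.

```python
import re
import urllib.request

API_ROOT = "https://freesound.org/apiv2"


def sound_id_from_url(url: str) -> int:
    """Extract the numeric sound id from a Freesound APIv2 sound URL."""
    match = re.search(r"/sounds/(\d+)/", url)
    if match is None:
        raise ValueError(f"no sound id found in {url!r}")
    return int(match.group(1))


def build_download_request(sound_id: int, access_token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for the original audio file."""
    url = f"{API_ROOT}/sounds/{sound_id}/download/"
    # Assumption: the download endpoint expects an OAuth2 bearer token.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})


if __name__ == "__main__":
    csv_url = "https://freesound.org/apiv2/sounds/621393/download/?format=api"
    request = build_download_request(sound_id_from_url(csv_url), "YOUR_ACCESS_TOKEN")
    print(request.full_url)
    # To fetch the file, pass `request` to urllib.request.urlopen and write the bytes to disk.
```

The request is only constructed here, not sent, so the sketch can be checked without network access or a real token.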

decoding speed / benchmark

This repo is great. I always wanted to benchmark webdataset for audio. A couple of questions:

Did you find FLAC to be a good trade-off between decoding performance and file size? Have you tried MP3 instead?
Did you benchmark the pipeline against plain torch.data with torchaudio, or against the new torch data pipes? Maybe the benchmark could be added to https://github.com/faroit/python_audio_loading_benchmark/ to give this a go?
How is partial decoding (seeking) typically done with webdataset when long audio is stored but only random chunks are read at the decoding stage? Is seeking supported? If so, does it slow down the I/O pipeline?

Missing 'tag' key in FSD50k preprocessor

Hi,
Thanks for sharing the wonderful code.

According to the data preprocessing README, the output JSON file should contain a 'tag' key (holding the labels) after preprocessing.

This tag extraction/creation is missing from the preprocess_FSD50K.py file.

Am I misunderstanding something, or is the 'tag' creation missing from the file?

Thanks,
Saksham
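
For illustration only, here is a minimal sketch of what adding the 'tag' key could look like. It assumes the labels for each clip arrive as one comma-separated string, and `add_tag_key` is a hypothetical helper, not a function from preprocess_FSD50K.py or the repo's actual fix.

```python
import json


def add_tag_key(metadata: dict, labels: str) -> dict:
    """Return a copy of `metadata` with a 'tag' list parsed from `labels`."""
    # Assumption: `labels` is one comma-separated string, e.g. "Dog,Bark".
    tags = [label.strip() for label in labels.split(",") if label.strip()]
    return {**metadata, "tag": tags}


if __name__ == "__main__":
    # Made-up example values, not taken from the real dataset files.
    entry = add_tag_key({"text": ["a dog barking"]}, "Dog,Bark")
    print(json.dumps(entry))
```

The helper returns a new dict instead of mutating its input, so the existing keys written by the preprocessor stay untouched.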

LJSpeech

https://keithito.com/lj-speech-dataset/

Sonniss Game Effects

LibriSpeech

https://paperswithcode.com/dataset/librispeech

Dataset Plan

@rvencu @rom1504
We need more data for the next step. In order of priority, the data we need are:

Audio data with natural text description(s).
Audio data with other labels, for which we "make up" a text description of the audio.

For audio data with natural text descriptions, we further need:

MACS - Multi-Annotator Captioned Soundscapes: a dataset containing audio captions and corresponding audio tags for 3930 audio files of the TAU Urban Acoustic Scenes 2019 development dataset (airport, public square, and park). The files were annotated using a web-based tool.
Freesound: scrape audio and text descriptions from Freesound. It is fine if the texts are a bit noisy.
High-quality sound effect libraries of similar quality to BBC Sound Effects, such as https://www.sound-ideas.com/Default.aspx or https://www.boomlibrary.com/, which have high-quality text descriptions of the audio rather than tags and labels.
Music review websites, such as Pitchfork.

For audio data with other labels, we need to collect new large datasets while also converting the tag labels of our current datasets into text.
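
Turning tag labels into a "made up" text description could be as simple as filling a template. The sketch below is one possible approach under that assumption; the function name `tags_to_caption` and the "The sound of ..." template are illustrative choices, not the method used in this repo.

```python
def tags_to_caption(tags: list[str]) -> str:
    """Turn a list of tag labels into one natural-language sentence."""
    # Normalise: underscores to spaces, lowercase, drop blanks and duplicates (order kept).
    cleaned = []
    for tag in tags:
        word = tag.replace("_", " ").strip().lower()
        if word and word not in cleaned:
            cleaned.append(word)
    if not cleaned:
        raise ValueError("at least one non-empty tag is required")
    if len(cleaned) == 1:
        joined = cleaned[0]
    else:
        joined = ", ".join(cleaned[:-1]) + " and " + cleaned[-1]
    return f"The sound of {joined}."


if __name__ == "__main__":
    print(tags_to_caption(["Dog", "Bark", "Domestic_animals"]))
    # prints: The sound of dog, bark and domestic animals.
```

A fixed template like this produces repetitive captions, so in practice several templates or a paraphrasing step might be preferable.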

The top-priority datasets are those that are large and whose labels are easy to turn into a text description:

(All of the following datasets have tag labels for the audio.)

The datasets we currently have whose labels need converting to text are: