
AdversarialAudioSeparation

Code accompanying the paper "Semi-supervised adversarial audio source separation applied to singing voice extraction" available on arXiv here:

https://arxiv.org/abs/1711.00048

The idea

Improve existing supervised audio source separation models (commonly neural networks) with extra unlabelled mixture recordings as well as unlabelled solo recordings of the sources we want to separate. The network is trained in the normal supervised fashion to minimise its prediction error on the fully annotated data (samples with mixture and sources paired up), and at the same time to output source estimates for the extra mixture recordings that are indistinguishable from the solo source recordings.

To achieve this, we use adversarial training: one discriminator network is trained per source to identify whether a source excerpt comes from the real solo source recordings or from the separator evaluated on the extra mixtures.

This can prevent overfitting to the often small annotated dataset and makes use of unlabelled data, which is much easier to obtain.
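
As a rough sketch of how the two training signals combine, consider the following hypothetical separator objective. This is not the exact code from Training.py: separator, disc1 and disc2 are placeholder names, a Wasserstein-style critic output is assumed, and the exact weighting of the terms by alpha and beta follows the paper.

import tensorflow as tf

def separator_loss(separator, disc1, disc2,
                   sup_mix, sup_source1, sup_source2,  # paired (annotated) batch
                   unsup_mix,                          # unlabelled mixtures
                   alpha, beta):
    # Supervised term: prediction error on the annotated pairs.
    est1, est2 = separator(sup_mix)
    loss_sup = tf.reduce_mean(tf.square(est1 - sup_source1)) + \
               tf.reduce_mean(tf.square(est2 - sup_source2))

    # Adversarial terms: source estimates for the extra mixtures should be
    # indistinguishable from real solo recordings to the discriminators.
    unsup_est1, unsup_est2 = separator(unsup_mix)
    loss_adv1 = -tf.reduce_mean(disc1(unsup_est1))
    loss_adv2 = -tf.reduce_mean(disc2(unsup_est2))

    return loss_sup + alpha * loss_adv1 + beta * loss_adv2

Each discriminator is in turn trained to distinguish real solo excerpts from these separator outputs (see the paper for the exact training procedure).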

Setup

Requirements

To run the code, the following Python packages are needed. We recommend the GPU version of TensorFlow due to the long running times of this model. You can install the packages easily using pip install -r requirements.txt after saving the list below as requirements.txt.

tensorflow-gpu>=1.2.0  
sacred>=0.7.0  
audioread>=2.1.5
imageio>=2.2.0
librosa>=0.5.1
lxml>=3.8.0
mir_eval>=0.4
scikits.audiolab>=0.11.0
soundfile>=0.9.0

Furthermore, ffmpeg needs to be installed and on your PATH in case you want to read in mp3 files directly.

Dataset preparation

Before the code is runnable, the datasets need to be prepared and integrated into the data loading process. The simpler way is to use the same datasets as in the paper's experiment; the alternative is to use your own datasets and split them into custom partitions. Please see below and the code comments in Training.py for guidance.

When the code is run for the first time, it reads in the dataset and creates a dataset.pkl file containing the dataset structure, so that subsequent starts are much faster.

Option 1: Recreate experiment from the paper

If you want to recreate the experiment from the paper, download the datasets DSD100, MedleyDB, CCMixter, and iKala separately. Then edit the corresponding XML files provided in this repository (DSD100.xml etc.), so that the XML entry

<databaseFolderPath>/mnt/daten/Datasets/DSD100</databaseFolderPath>

points to the root folder of the respective dataset. Save the changes and then execute Training.py.

Option 2: Use your own data of choice

To use your own datasets and dataset partitioning into supervised, unsupervised, validation and test sets, you can replace the data loading code in Training.py with a custom dataset loading function.

The only requirement on this function is its output format: it should return a dictionary that maps the following strings to the respective dataset partitions:

"train_sup" : sample_list
"train_unsup" : [mix_list, source1_list, source2_list]
"train_valid" : sample_list
"train_test" : sample_list

A sample_list is a list whose elements are tuples of three Sample objects, in the order mixture, source 1, source 2. You can initialise Sample objects with the constructor of the Sample class found in Sample.py; each represents an audio signal along with its metadata. The audio should preferably be in .wav format for fast on-the-fly reading, but other formats such as mp3 are also supported.

The entry for "train_unsup" is different, since these recordings are not paired: it is a list containing three lists, which hold the mixture, source 1, and source 2 Sample objects respectively. These lists can be of different lengths. A sketch of such a loading function is shown below.
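
As an illustration, a custom loading function could look roughly like this. All file paths here are placeholders, and the exact constructor arguments should be checked against Sample.py:

from Sample import Sample

# Hypothetical sketch of a custom dataset loading function. The paths are
# placeholders; pass whatever metadata arguments Sample.py actually expects.
def load_custom_dataset():
    def s(path):
        return Sample(path)  # adjust to the real Sample constructor

    # Paired partitions: lists of (mixture, source 1, source 2) tuples.
    train_sup = [(s("sup/mix_0.wav"), s("sup/src1_0.wav"), s("sup/src2_0.wav"))]
    train_valid = [(s("val/mix_0.wav"), s("val/src1_0.wav"), s("val/src2_0.wav"))]
    train_test = [(s("test/mix_0.wav"), s("test/src1_0.wav"), s("test/src2_0.wav"))]

    # Unpaired partition: three independent lists that may differ in length.
    mix_list = [s("unsup/mix_0.wav"), s("unsup/mix_1.wav")]
    source1_list = [s("solo/src1_0.wav")]
    source2_list = [s("solo/src2_0.wav"), s("solo/src2_1.wav")]

    return {"train_sup": train_sup,
            "train_unsup": [mix_list, source1_list, source2_list],
            "train_valid": train_valid,
            "train_test": train_test}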

Configuration and hyperparameters

You can configure settings and hyperparameters by modifying the model_config dictionary defined at the beginning of Training.py, or by setting certain values on the command line via the features of Sacred (see the Sacred documentation).

Note that the loss weighting parameters alpha and beta (hyperparameters from the paper) are relatively important for good performance; tweaking them might be necessary. They are also editable in the model_config dictionary.
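
For example, Sacred's with syntax can override such values from the command line (the values below are purely illustrative):

python Training.py with alpha=0.001 beta=0.01

Depending on how the parameters are nested in the Sacred configuration, a dotted key such as model_config.alpha=0.001 may be needed instead; see the Sacred documentation for details.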

Training

The code is run by executing

python Training.py

It first trains the separator network in a purely supervised way, and then trains the same network using our semi-supervised adversarial approach. In each case, validation performance is measured regularly and early stopping is applied before the final test set performance is evaluated. For the semi-supervised approach, the additional data from dataset["train_unsup"] is used to improve performance.

Finally, BSS evaluation metrics (SDR, SIR, SAR) are computed on the test dataset. This saves the results in a pickled file along with the name of the dataset, so if you aim to use different datasets, the function needs to be extended slightly.

Logs are written continuously to the logs subfolder, so training can be monitored with TensorBoard. Checkpoint files of the model are created whenever validation performance is tested.
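
For example, assuming TensorBoard is installed (it ships with TensorFlow), training progress can be followed by running

tensorboard --logdir logs

and opening the displayed URL in a browser.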


adversarialaudioseparation's Issues

Stuck on "Loading new item into cache from data list starting with"

Hi,

Thanks for your excellent code. I am trying to reproduce your results; however, training gets stuck on loading some item into the cache. Below is the running output log:

[screenshot of the output log omitted]

Do you have any idea about that? The program gets stuck on this line: self.update_next_cache_item(self.communication_queue.get())

Normalizing spectrograms

As far as I know, the result of log1p(x) can be negative. You use this function to 'normalize' the spectrograms of the target accompaniment and vocals, and then use the difference between the network outputs and these spectrograms in your loss function. However, network outputs after ReLU can't be negative.
I have read the paper and I realize that it must work, so what am I missing? Please help me.

Pretrained model?

Hi,

I know this is a bit late, but it would be nice if a pretrained model were available for download, to easily recreate the original results and to use on custom audio.

In lieu of that, I'm trying to recreate the experiment, but I'm having some difficulty. Although the readme helpfully explains what to do, I'm not sure if I can obtain the same datasets. iKala is apparently no longer available at all, and MedleyDB is only available on request. I guess I'll try training using only the other two...

Fully adversarial training

Before I try it myself, I wanted to ask whether you tried training the network from scratch with fully adversarial training, without the supervised pre-training. Is that too hard to train? Did you try some other conditional GAN flavors?

Dataset links

Hi,
Do you have links to the datasets you used? I am new to these datasets, but the paper is very interesting and I want to reproduce the results. However, it is not easy to find them, or to see how to convert the downloaded datasets into the expected format. Although the dataset structure is described in the README, it is still not clear to me what a formatted dataset should look like. Would you mind elaborating on that a little, for example by giving us a sample dataset in the repo?
Thanks!
