Source Separation is a repository for extracting speech from various recorded sounds. It focuses on adapting more realistic datasets for training models.
The latest models in this repository are spectrogram-based. Mainly, Phase-aware Speech Enhancement with Deep Complex U-Net is implemented with modifications.
- Complex Convolution, Masking, Weighted SDR Loss
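As a rough illustration of the weighted SDR loss named above, here is a minimal NumPy sketch following the formulation in the Deep Complex U-Net paper (the function name and variable names are mine; this is not the repository's actual implementation):

```python
import numpy as np

def wsdr_loss(mixture, clean, estimate, eps=1e-8):
    """Weighted SDR loss, as formulated in the Deep Complex U-Net paper.

    mixture x, clean speech y, estimated speech y_hat; the noise targets
    are z = x - y and z_hat = x - y_hat.
    """
    noise = mixture - clean
    noise_est = mixture - estimate

    def neg_cos_sim(target, est):
        # negative cosine similarity; -1 means a perfect estimate
        return -np.sum(target * est) / (
            np.linalg.norm(target) * np.linalg.norm(est) + eps)

    # weight the speech and noise terms by their share of the energy
    alpha = np.sum(clean ** 2) / (np.sum(clean ** 2) + np.sum(noise ** 2) + eps)
    return alpha * neg_cos_sim(clean, estimate) \
        + (1 - alpha) * neg_cos_sim(noise, noise_est)
```

A perfect estimate drives the loss toward -1, and the energy-based weight keeps the loss meaningful even when the clean target is near-silent.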
To make inference more stable in real-world cases, the following are adopted:
- Audioset data is used to augment noise.
The dataset source is available in audioset_augmentor. See this link for an explanation of Audioset. This repo uses the balanced train label dataset (label-balanced, non-human classes, 18,055 samples).
- Preemphasis is used to reduce high-frequency noise when adapting to real samples.
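Preemphasis is a simple first-order high-pass filter applied before the model, with its inverse applied after synthesis. A minimal NumPy sketch (not the repository's exact code; 0.97 is a common default coefficient, not necessarily the one used here):

```python
import numpy as np

def preemphasis(x, coeff=0.97):
    # first-order high-pass filter: y[t] = x[t] - coeff * x[t - 1]
    return np.append(x[0], x[1:] - coeff * x[:-1])

def deemphasis(y, coeff=0.97):
    # inverse filter, applied after synthesis: x[t] = y[t] + coeff * x[t - 1]
    x = np.zeros_like(y)
    x[0] = y[0]
    for t in range(1, len(y)):
        x[t] = y[t] + coeff * x[t - 1]
    return x
```

The two filters are exact inverses, so a preemphasized signal round-trips back to the original after de-emphasis.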
This is not an official implementation by the authors of the paper.
Singing Voice Separation with the DSD100 dataset has now been added! It is trained with a larger model at a higher sample rate (44.1 kHz), so it produces more stable, higher-quality audio. Check out the YouTube playlist with samples of my favorites!
Voice Bank and Audioset (see the section above)
Pre-defined preprocessing and dataset sources are available at https://github.com/Appleholic/pytorch_sound
- Singing Voice Separation is in its finishing stage (2019.10)
- Upload and share the singing voice separation checkpoint.
- Python > 3.6
- PyTorch 1.0
- Ubuntu 16.04
- Brain Cloud (Kakaobrain Cluster) V2.XLARGE (2 V100 GPUs, 28 CPU cores, 244 GB memory)
There are two external repositories used by this repository. They will be updated so that setup works via recursive clone or internal code.
- pytorch_sound package
This repository is built with pytorch_sound, a modeling toolkit that allows engineers to train custom models for sound-related tasks. Many of the sources in this repository are based on the pytorch_sound template.
- audioset_augmentor
Explained in the section above (link).
- General Voice Source Separation
  - Model Name : refine_unet_base (see settings.py)
  - Link : Google Drive
  - Latest Tag : v0.0.0
- Singing Voice Separation
  - Model Name : refine_unet_larger
  - Link : To be updated
  - Latest Tag : v0.1.0
- General Voice Source Separation
  - Validation (10 random samples)
    - Link : Google Drive
  - Test Samples
    - Link : Google Drive
- Singing Voice Separation
  - Check out my YouTube playlist!
  - Link : Youtube Playlist
- Install the external repos above
First read the README.md of audioset_augmentor and pytorch_sound to prepare the datasets and train separation models.
$ pip install git+https://github.com/Appleholic/audioset_augmentor
$ pip install git+https://github.com/Appleholic/[email protected]
- Install package
$ pip install -e .
- Train
$ python source_separation/train.py [YOUR_META_DIR] [SAVE_DIR] [MODEL NAME, see settings.py] [SAVE_PREFIX] [[OTHER OPTIONS...]]
- Joint Train (Voice Bank and DSD100)
$ python source_separation/train_jointly.py [YOUR_VOICE_BANK_META_DIR] [YOUR_DSD100_META_DIR] [SAVE_DIR] [MODEL NAME, see settings.py] [SAVE_PREFIX] [[OTHER OPTIONS...]]
- Synthesize
- Be careful: the sample rates differ between the general case and the singing voice case!
Single sample
$ python source_separation/synthesize.py separate [INPUT_PATH] [OUTPUT_PATH] [MODEL NAME] [PRETRAINED_PATH] [[OTHER OPTIONS...]]
Whole validation samples
$ python source_separation/synthesize.py validate [YOUR_META_DIR] [OUTPUT_DIR] [MODEL NAME] [PRETRAINED_PATH] [[OTHER OPTIONS...]]
All samples in a given directory
$ python source_separation/synthesize.py test-dir [INPUT_DIR] [OUTPUT_DIR] [MODEL NAME] [PRETRAINED_PATH] [[OTHER OPTIONS...]]
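Since the two model families expect different sample rates (44.1 kHz for the singing voice model, a lower rate for the general model), it can help to check an input WAV's rate before synthesizing. A minimal stdlib sketch (the helper name is hypothetical, not part of this repository):

```python
import wave

def wav_sample_rate(path):
    # read the sample rate straight from the WAV header (stdlib only)
    with wave.open(path, "rb") as wf:
        return wf.getframerate()

# example: refuse to run the singing-voice model on a non-44.1 kHz file
# if wav_sample_rate("input.wav") != 44100:
#     raise ValueError("resample the input to 44.1 kHz first")
```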
This repository is developed by ILJI CHOI and distributed under the Apache License 2.0.
If you use this repository for research, please cite:
@misc{specunet-audioset,
author = {Choi, Ilji},
title = {Spectrogram U-net for Source Separation with Audioset Samples},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/Appleholic/source_separation}}
}