In this paper, we adopt the end-to-end framework of VITS for high-quality waveform reconstruction and propose strategies for extracting clean content information without text annotation. We disentangle content information by imposing an information bottleneck on WavLM features, and propose spectrogram-resize (SR) based data augmentation to improve the purity of the extracted content information.
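To make the spectrogram-resize idea more concrete, here is a minimal sketch of one plausible form of the operation. It is not the repository's implementation. It assumes the resize acts along the frequency axis of a mel-spectrogram, that a stretched spectrogram is cropped back to its original height, and that a squeezed one is padded from its highest bin plus a little noise. The names `spectrogram_resize` and `noise_std` are hypothetical. The setup notes list HiFi-GAN as needed only for training with SR, which suggests the resized spectrogram is vocoded back to audio before feature extraction; that step is omitted here. If the `--min 68` and `--max 92` values passed to `preprocess_sr.py` are target heights for an 80-bin mel-spectrogram (an assumption), they would correspond to ratios of roughly 0.85 to 1.15.

```python
import numpy as np


def spectrogram_resize(mel, ratio, rng=None, noise_std=1e-3):
    """Vertically resize a (n_mels, n_frames) mel-spectrogram, keeping its shape.

    ratio > 1 stretches along frequency and crops the top bins;
    ratio < 1 squeezes and pads the top with the highest bin plus small noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_mels, n_frames = mel.shape
    new_bins = max(1, int(round(n_mels * ratio)))

    # Linear interpolation along the frequency axis, one frame at a time.
    src = np.linspace(0.0, n_mels - 1, num=new_bins)
    resized = np.stack(
        [np.interp(src, np.arange(n_mels), mel[:, t]) for t in range(n_frames)],
        axis=1,
    )

    if new_bins >= n_mels:
        # Stretched: drop the bins above the original frequency range.
        return resized[:n_mels]

    # Squeezed: fill the missing top bins from the highest remaining bin.
    pad = np.repeat(resized[-1:], n_mels - new_bins, axis=0)
    pad = pad + rng.normal(0.0, noise_std, size=pad.shape)
    return np.concatenate([resized, pad], axis=0)


if __name__ == "__main__":
    mel = np.random.default_rng(0).random((80, 100))
    for ratio in (0.85, 1.0, 1.15):
        print(ratio, spectrogram_resize(mel, ratio).shape)
```

Keeping the output shape fixed means the distorted spectrogram can be passed to the same downstream steps as the original. The distortion shifts formant and pitch cues, which is one way to see why it might reduce the speaker information left in the content features.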
🤗 Play online at HuggingFace Spaces.
Visit our demo page for audio samples.
We also provide the pretrained models.
Figure: (a) Training, (b) Inference.
- Code release. (Nov 27, 2022)
- Online demo at HuggingFace Spaces. (Dec 14, 2022)
- Supports 24kHz outputs. See here for details. (Dec 15, 2022)
- Fix data loading bug. (Jan 10, 2023)
- Clone this repo:
  git clone https://github.com/OlaWod/FreeVC.git
- CD into this repo:
  cd FreeVC
- Install Python requirements:
  pip install -r requirements.txt
- Download WavLM-Large and put it under directory 'wavlm/'
- Download the VCTK dataset (for training only)
- Download HiFi-GAN model and put it under directory 'hifigan/' (for training with SR only)
Download the pretrained checkpoints and run:
# inference with FreeVC
CUDA_VISIBLE_DEVICES=0 python convert.py --hpfile logs/freevc.json --ptfile checkpoints/freevc.pth --txtpath convert.txt --outdir outputs/freevc
# inference with FreeVC-s
CUDA_VISIBLE_DEVICES=0 python convert.py --hpfile logs/freevc-s.json --ptfile checkpoints/freevc-s.pth --txtpath convert.txt --outdir outputs/freevc-s
- Preprocess
python downsample.py --in_dir </path/to/VCTK/wavs>
ln -s dataset/vctk-16k DUMMY
# run this if you want a different train-val-test split
python preprocess_flist.py
# run this if you want to use a pretrained speaker encoder
CUDA_VISIBLE_DEVICES=0 python preprocess_spk.py
# run this if you want to train without SR-based augmentation
CUDA_VISIBLE_DEVICES=0 python preprocess_ssl.py
# run these if you want to train with SR-based augmentation
CUDA_VISIBLE_DEVICES=1 python preprocess_sr.py --min 68 --max 72
CUDA_VISIBLE_DEVICES=1 python preprocess_sr.py --min 73 --max 76
CUDA_VISIBLE_DEVICES=2 python preprocess_sr.py --min 77 --max 80
CUDA_VISIBLE_DEVICES=2 python preprocess_sr.py --min 81 --max 84
CUDA_VISIBLE_DEVICES=3 python preprocess_sr.py --min 85 --max 88
CUDA_VISIBLE_DEVICES=3 python preprocess_sr.py --min 89 --max 92
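The six `preprocess_sr.py` commands above use different `CUDA_VISIBLE_DEVICES` values, which suggests they are meant to be spread across GPUs. Assuming the runs are independent of each other (this is not confirmed here), a small launcher along the following lines could start them together. The names `SR_JOBS`, `build_sr_commands`, and `run_all` are hypothetical; the GPU indices and `--min`/`--max` values are copied from the commands above.

```python
import os
import subprocess

# (GPU index, --min, --max) for each job, copied from the commands above.
SR_JOBS = [
    (1, 68, 72), (1, 73, 76),
    (2, 77, 80), (2, 81, 84),
    (3, 85, 88), (3, 89, 92),
]


def build_sr_commands(jobs=SR_JOBS):
    """Return (env_overrides, argv) pairs, one per preprocess_sr.py job."""
    commands = []
    for gpu, lo, hi in jobs:
        env = {"CUDA_VISIBLE_DEVICES": str(gpu)}
        argv = ["python", "preprocess_sr.py", "--min", str(lo), "--max", str(hi)]
        commands.append((env, argv))
    return commands


def run_all(commands):
    """Start every job at once, wait for all of them, and return exit codes."""
    procs = [
        subprocess.Popen(argv, env={**os.environ, **env})
        for env, argv in commands
    ]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Dry run: only print the commands. Call run_all(build_sr_commands())
    # from the repo root to actually launch them.
    for env, argv in build_sr_commands():
        prefix = " ".join(f"{key}={value}" for key, value in env.items())
        print(prefix, " ".join(argv))
```

Run as a script, this only prints the commands. Two jobs share each GPU, as in the commands above, so adjust `SR_JOBS` if memory is tight or fewer GPUs are available.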
- Train
# train freevc
CUDA_VISIBLE_DEVICES=0 python train.py -c configs/freevc.json -m freevc
# train freevc-s
CUDA_VISIBLE_DEVICES=2 python train.py -c configs/freevc-s.json -m freevc-s