
stargan-voice-conversion's Introduction

StarGAN-Voice-Conversion

This is a PyTorch implementation of the paper StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks (https://arxiv.org/abs/1806.02169). Note that the model architecture is a little different from that of the original paper.

Dependencies

  • Python 3.6 (3.5 only works if the f-strings are replaced; see the "Python 3.5" issue below)
  • PyTorch 0.4.0
  • pyworld
  • tqdm
  • librosa
  • tensorboardX and tensorboard

Usage

Download Dataset

Download and unzip the VCTK corpus into the designated directory.

mkdir ./data
wget -O VCTK-Corpus.zip "https://datashare.is.ed.ac.uk/bitstream/handle/10283/2651/VCTK-Corpus.zip?sequence=2&isAllowed=y"
unzip VCTK-Corpus.zip -d ./data

If the downloaded VCTK archive is a tar.gz instead, run this:

tar -xzvf VCTK-Corpus.tar.gz -C ./data

Preprocess data

We will use Mel-cepstral coefficients (MCEPs) here.

python preprocess.py --sample_rate 16000 \
                    --origin_wavpath data/VCTK-Corpus/wav48 \
                    --target_wavpath data/VCTK-Corpus/wav16 \
                    --mc_dir_train data/mc/train \
                    --mc_dir_test data/mc/test
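
For reference, here is a minimal sketch of how MCEPs can be extracted with pyworld; the filename is hypothetical, and the 36-dimensional coding matches what is discussed in the issues below (check preprocess.py for the repo's exact parameters).

import librosa
import numpy as np
import pyworld

# Sketch of WORLD-based MCEP extraction (illustrative, not the repo's exact code).
wav, fs = librosa.load("p262_001.wav", sr=16000, dtype=np.float64)  # hypothetical file
f0, timeaxis = pyworld.harvest(wav, fs, frame_period=5.0)  # F0 contour
sp = pyworld.cheaptrick(wav, f0, timeaxis, fs)             # smoothed spectral envelope
ap = pyworld.d4c(wav, f0, timeaxis, fs)                    # aperiodicity
coded_sp = pyworld.code_spectral_envelope(sp, fs, 36)      # (frames, 36) MCEPs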

Train model

Note: you may want to stop training early once the test samples generated during training sound good; you can also inspect the training loss curves to decide when to stop.

python main.py
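
Since tensorboardX is listed in the dependencies, the loss curves mentioned above can presumably be viewed with TensorBoard; the log directory below is an assumption, so check the defaults in main.py:

tensorboard --logdir ./logs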

Convert

For example, to restore the model at step 200000 and convert from source speaker p262 to target speaker p272:

python convert.py --resume_iters 200000 --src_spk p262 --trg_spk p272

To-Do list

  • Post some converted samples (done; please find them in the converted_samples folder).

Papers that use this repo:

  1. AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss (ICML 2019)
  2. Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion (NeurIPS 2019)
  3. AdaGAN: Adaptive GAN for Many-to-Many Non-Parallel Voice Conversion (under review for ICLR 2020)


stargan-voice-conversion's Issues

A question about the adversarial loss.

Thanks for your implementation; it helps me a lot. But I am confused about the adversarial loss. The voice conversion task aims to convert speaker A's voice into speaker B's, and the discriminator is trained iteratively to measure the gap between the current distribution and the target distribution. Why does the implementation compute the adversarial distribution distance between A and B, rather than between B and B' (where I denote G(A, y_b) as B')?
Thank you. Looking forward to your reply.
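
For context, a minimal sketch of the usual StarGAN-style adversarial terms the question refers to (WGAN-flavoured; all names here are hypothetical, not the repo's exact code):

import torch

def adversarial_losses(D, mc_real_b, mc_fake_b, spk_c_b):
    # mc_real_b: real features from target speaker B.
    # mc_fake_b: G(mc_a, spk_c_b), i.e. B' in the question's notation.
    d_loss = -torch.mean(D(mc_real_b, spk_c_b)) + torch.mean(D(mc_fake_b.detach(), spk_c_b))
    g_loss_fake = -torch.mean(D(mc_fake_b, spk_c_b))  # G tries to make B' look like real B
    return d_loss, g_loss_fake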

Id mapping loss

Hi,

I can see from this line:

g_loss = g_loss_fake + self.lambda_rec * g_loss_rec + self.lambda_cls * g_loss_cls_spks

in solver.py that the adversarial loss, reconstruction loss, and domain classification loss are being computed; however, am I correct in saying that this implementation does not include the identity mapping loss introduced in the StarGAN-VC paper?

Thank you,
Sam

Number of Mel-cepstral coefficients (MCEPs)

Hello, I wanted to ask a question.
When we compute the MCEPs of the wav files, we do it with the call coded_sp = pyworld.code_spectral_envelope(sp, fs, dim), where dim=36.
I wanted to ask why you are using 36 dimensions, i.e. 36 MCEPs?
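
For context, the 36-dimensional coding is a lossy compression of the full spectral envelope that can be decoded back for synthesis; a small sketch (assuming 16 kHz audio; the random envelope is just a stand-in):

import numpy as np
import pyworld

fs = 16000
fft_size = pyworld.get_cheaptrick_fft_size(fs)              # WORLD's default FFT size
sp = np.abs(np.random.randn(100, fft_size // 2 + 1)) ** 2   # stand-in spectral envelope
coded_sp = pyworld.code_spectral_envelope(sp, fs, 36)       # compress to 36 MCEPs
sp_hat = pyworld.decode_spectral_envelope(np.ascontiguousarray(coded_sp), fs, fft_size)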

About how to run this repo

Hi,
I want to know what should be put into the path "./mc/train" that appears in data_loader.py. Besides, could you tell me which files need to be run, and in what order?

Python 3.5

In Requirements you say Python 3.6 (or 3.5).
However, Python 3.5 cannot be used here, because there are a few f-strings, which require Python 3.6+ (though we can manually change them to %-formatted strings to work with Python 3.5).
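
For example, an f-string like the hypothetical one below can be rewritten for Python 3.5:

resume_iters = 200000
print(f"resume from step {resume_iters}")           # Python 3.6+ only
print("resume from step %d" % resume_iters)         # Python 3.5-compatible
print("resume from step {}".format(resume_iters))   # also 3.5-compatible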

preprocess.py possible sox issue

The resample_to_16k function of preprocess.py just prints:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

when finished.

The next section begins with "number of workers: 10"; however, here the program just halts.

I think this may be a sox issue, as I have just installed it with conda-forge. I am working on a VM and am not entirely sure how to install sox from a binary.

Thanks
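
A quick sketch for checking whether the sox binary that preprocess.py shells out to is actually visible to Python:

import shutil
import subprocess

print(shutil.which("sox"))                        # None means sox is not on PATH
subprocess.run(["sox", "--version"], check=True)  # raises if sox cannot be executed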

Loss function meanings

Hello, can someone explain to me in a bit more detail what each loss function says about the training?
Thanks in advance

How should I interpret this? Thank you!

sox WARN rate: rate clipped 27 samples; decrease volume?
sox WARN dither: dither clipped 25 samples; decrease volume?
sox WARN rate: rate clipped 3 samples; decrease volume?
sox WARN dither: dither clipped 2 samples; decrease volume?
60%|██████████████████████████▍ | 6/10 [00:27<00:36, 9.02s/it]sox WARN rate: rate clipped 4 samples; decrease volume?
sox WARN dither: dither clipped 4 samples; decrease volume?
100%|███████████████████████████████████████████| 10/10 [00:34<00:00, 5.55s/it]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
number of workers: 10
0%| | 0/10 [00:00<?, ?it/s]concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/root/anaconda3/envs/tensorflow/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "preprocess.py", line 50, in get_spk_world_feats
train_paths, test_paths = split_data(paths)
File "preprocess.py", line 42, in split_data
train_indices, test_indices = train_test_split(indices, test_size=test_size, random_state=1234)
File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 2100, in train_test_split
default_test_size=0.25)
File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 1782, in _validate_shuffle_split
train_size)
ValueError: With n_samples=0, test_size=0.1 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "preprocess.py", line 130, in
result_list = [future.result() for future in tqdm(futures)]
File "preprocess.py", line 130, in
result_list = [future.result() for future in tqdm(futures)]
File "/root/anaconda3/envs/tensorflow/lib/python3.6/concurrent/futures/_base.py", line 405, in result
return self.__get_result()
File "/root/anaconda3/envs/tensorflow/lib/python3.6/concurrent/futures/_base.py", line 357, in __get_result
raise self._exception
ValueError: With n_samples=0, test_size=0.1 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
^CError in atexit._run_exitfuncs:
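
The ValueError above is reproduced whenever train_test_split receives an empty list, which suggests that no wav paths reached split_data (a minimal reproduction):

from sklearn.model_selection import train_test_split

# Reproduces the error: zero samples to split, as when no wav files are found.
train_test_split([], test_size=0.1, random_state=1234)
# ValueError: With n_samples=0, test_size=0.1 and train_size=None, ...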

not find gated cnn

Hi, thanks for your sharing.
The paper uses GLUs (gated CNNs) in the Generator and Discriminator, but I didn't find them.
I only found ReLU, at model.py line 30 in the Generator:

    layers.append(nn.ReLU(inplace=True))
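
For reference, a gated linear unit of the kind the paper describes could look like this in PyTorch (a sketch, not the repo's code; kernel sizes are illustrative):

import torch
import torch.nn as nn

class GLU(nn.Module):
    # Gated CNN block: output = conv_a(x) * sigmoid(conv_b(x)).
    def __init__(self, channels):
        super().__init__()
        self.conv_a = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv_b = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv_a(x) * torch.sigmoid(self.conv_b(x))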

model is not stargan-vc

hi,
(I'm more comfortable writing in Chinese 😆) It seems the model is adapted from CycleGAN-VC rather than being the StarGAN-VC model. I wonder how the conversion results turn out?

I cannot run the code.

When I run the command
python preprocess.py --sample_rate 16000 --origin_wavpath data/VCTK-Corpus/wav48 --target_wavpath data/VCTK-Corpus/wav16 --mc_dir_train data/mc/train --mc_dir_test data/mc/test

I get the following error
Traceback (most recent call last):
File "C:\Users\user pc\Anaconda3\lib\concurrent\futures\process.py", line 232, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "C:\Users\user pc\PycharmProjects\song\preprocess.py", line 25, in resample
subprocess.call(['sox', wav_from, "-r", "16000", wav_to])
File "C:\Users\user pc\Anaconda3\lib\subprocess.py", line 323, in call
with Popen(*popenargs, **kwargs) as p:
File "C:\Users\user pc\Anaconda3\lib\subprocess.py", line 775, in __init__
restore_signals, start_new_session)
File "C:\Users\user pc\Anaconda3\lib\subprocess.py", line 1178, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Error in training with more than 4 speakers

Hi there,
Thanks for the code. I am able to run it with 4 speakers in the Speaker and Speaker_test folders, but if I increase the number of speakers to 6, preprocess.py runs fine while training through main.py throws an error saying "Target x is out of bounds". I am keeping the data very small (100 samples for each of the 6 speakers) to get a successful run first before increasing the sample size. I have tried to read the code but have not been able to debug it thoroughly yet.

UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
input = module(input)
Traceback (most recent call last):
File "main.py", line 92, in
main(config)
File "main.py", line 34, in main
solver.train()
File "/Users/shalinisaini/pytorch-StarGAN-VCtk-oct-6small/solver.py", line 160, in train
cls_loss_real = CELoss(input=cls_real, target=speaker_idx_org)
File "/Users/shalinisaini/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/Users/shalinisaini/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 931, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "/Users/shalinisaini/opt/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2317, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/Users/shalinisaini/opt/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2115, in nll_loss
ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
IndexError: Target 4 is out of bounds.

Can anyone tell me whether this code can be extended to more than 4 speakers? I am not sure if I am missing something obvious, but I would appreciate a pointer in the right direction. I would like to run the training with more speakers.

Thanks for your help.
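
The traceback is characteristic of a classifier head sized for 4 speakers receiving a label of 4; a minimal reproduction (sizes are illustrative):

import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
logits = torch.randn(1, 4)     # classifier output sized for 4 speakers (valid labels: 0-3)
ce(logits, torch.tensor([4]))  # IndexError: Target 4 is out of bounds.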

Are there any requirements for training datasets?

I put my own Chinese corpus into the training model, and the conversion quality is not as good as with the original English corpus.
Are there any requirements for training datasets?

Looking forward to your answer.

Thanks.

How to fine-tune StarGAN-VC model?

Firstly, thanks to the author for the implementation of this work. In ./converted_samples/readme, you mentioned: "These converted samples were obtained from the not-well-fine-tuned model. If you want to get better results, please tune the hyper-parameters carefully." Could you please give me some advice on how to fine-tune this model? Thanks a lot.

Why does g_loss lack g_loss_id?

I found that the identity loss seems to be missing from the "train the G" step; in my opinion, the code should be:

            id_mc = self.G(mc_real, spk_c_org)
            g_loss_id = torch.mean(torch.abs(id_mc - mc_real))

I hope you can explain this. Thank you very much.
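
If this term were added, it would presumably be folded into the total generator loss with its own weight, along the lines of (lambda_id is a hypothetical hyperparameter, not one the repo defines):

g_loss = g_loss_fake + self.lambda_rec * g_loss_rec \
         + self.lambda_cls * g_loss_cls_spks + self.lambda_id * g_loss_id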

Inference time

The paper claims that this approach could run at least in real time. I would like to know the inference time and the hardware you are using. Thanks.
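
For anyone wanting to measure this themselves, a small timing sketch (the model and input are placeholders; CUDA synchronization matters for fair GPU timing):

import time
import torch

def average_inference_time(model, x, n_runs=20):
    # Wall-clock average over n_runs forward passes, synchronizing on GPU.
    with torch.no_grad():
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.time()
        for _ in range(n_runs):
            model(x)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.time() - start) / n_runs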
