
stargan-voice-conversion's Introduction

StarGAN-Voice-Conversion

This is a PyTorch implementation of the paper StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks (https://arxiv.org/abs/1806.02169). Note that the model architecture is a little different from that of the original paper.

Dependencies

  • Python 3.6 (3.5 only works if the f-strings are replaced; see the "Python 3.5" issue below)
  • PyTorch 0.4.0
  • pyworld
  • tqdm
  • librosa
  • tensorboardX and tensorboard

Usage

Download Dataset

Download and unzip the VCTK corpus into the designated directory.

mkdir ./data
wget -O VCTK-Corpus.zip "https://datashare.is.ed.ac.uk/bitstream/handle/10283/2651/VCTK-Corpus.zip?sequence=2&isAllowed=y"
unzip VCTK-Corpus.zip -d ./data

If the downloaded VCTK archive is a tar.gz instead, run this:

tar -xzvf VCTK-Corpus.tar.gz -C ./data

Preprocess data

We will use Mel-cepstral coefficients (MCEPs) here.

python preprocess.py --sample_rate 16000 \
                    --origin_wavpath data/VCTK-Corpus/wav48 \
                    --target_wavpath data/VCTK-Corpus/wav16 \
                    --mc_dir_train data/mc/train \
                    --mc_dir_test data/mc/test
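
For reference, here is a minimal sketch of how MCEPs can be extracted with pyworld; the filename is hypothetical, and the 36-dimensional coding matches what is discussed in the issues below (check preprocess.py for the repo's exact parameters).

import librosa
import numpy as np
import pyworld

# Sketch of WORLD-based MCEP extraction (illustrative, not the repo's exact code).
wav, fs = librosa.load("p262_001.wav", sr=16000, dtype=np.float64)  # hypothetical file
f0, timeaxis = pyworld.harvest(wav, fs, frame_period=5.0)  # F0 contour
sp = pyworld.cheaptrick(wav, f0, timeaxis, fs)             # smoothed spectral envelope
ap = pyworld.d4c(wav, f0, timeaxis, fs)                    # aperiodicity
coded_sp = pyworld.code_spectral_envelope(sp, fs, 36)      # (frames, 36) MCEPs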

Train model

Note: you may want to stop training early once the test samples generated during training sound good; you can also inspect the training loss curves to decide when to stop.

python main.py
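
Since tensorboardX is listed in the dependencies, the loss curves mentioned above can presumably be viewed with TensorBoard; the log directory below is an assumption, so check the defaults in main.py:

tensorboard --logdir ./logs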

Convert

For example, to restore the model at step 200000 and convert from source speaker p262 to target speaker p272:

python convert.py --resume_iters 200000 --src_spk p262 --trg_spk p272

To-Do list

  • Post some converted samples (done; please find them in the converted_samples folder).

Papers that use this repo:

  1. AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss (ICML 2019)
  2. Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion (NeurIPS 2019)
  3. AdaGAN: Adaptive GAN for Many-to-Many Non-Parallel Voice Conversion (under review for ICLR 2020)


stargan-voice-conversion's Issues

A question about the adversarial loss.

Thanks for your implementation; it helps me a lot. But I am confused about the adversarial loss. The voice conversion task aims to convert speaker A's voice into speaker B's, and the discriminator is trained iteratively to measure the gap between the current distribution and the target distribution. Why does the implementation compute the adversarial distribution distance between A and B, rather than between B and B' (where I denote G(A, y_b) as B')?
Thank you. Looking forward to your reply.
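
For context, a minimal sketch of the usual StarGAN-style adversarial terms the question refers to (WGAN-flavoured; all names here are hypothetical, not the repo's exact code):

import torch

def adversarial_losses(D, mc_real_b, mc_fake_b, spk_c_b):
    # mc_real_b: real features from target speaker B.
    # mc_fake_b: G(mc_a, spk_c_b), i.e. B' in the question's notation.
    d_loss = -torch.mean(D(mc_real_b, spk_c_b)) + torch.mean(D(mc_fake_b.detach(), spk_c_b))
    g_loss_fake = -torch.mean(D(mc_fake_b, spk_c_b))  # G tries to make B' look like real B
    return d_loss, g_loss_fake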

Id mapping loss

Hi,

I can see from this line:

g_loss = g_loss_fake + self.lambda_rec * g_loss_rec + self.lambda_cls * g_loss_cls_spks

in solver.py that the adversarial loss, reconstruction loss, and domain classification loss are being computed; however, am I correct in saying that this implementation does not include the identity mapping loss introduced in the StarGAN-VC paper?

Thank you,
Sam

Number of Mel-cepstral coefficients (MCEPs)

Hello, I wanted to ask a question.
When we compute the MCEPs of the wav files, we do it with the call coded_sp = pyworld.code_spectral_envelope(sp, fs, dim), where dim=36.
I wanted to ask why you are using 36 dimensions, i.e. 36 MCEPs?
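
For context, the 36-dimensional coding is a lossy compression of the full spectral envelope that can be decoded back for synthesis; a small sketch (assuming 16 kHz audio; the random envelope is just a stand-in):

import numpy as np
import pyworld

fs = 16000
fft_size = pyworld.get_cheaptrick_fft_size(fs)              # WORLD's default FFT size
sp = np.abs(np.random.randn(100, fft_size // 2 + 1)) ** 2   # stand-in spectral envelope
coded_sp = pyworld.code_spectral_envelope(sp, fs, 36)       # compress to 36 MCEPs
sp_hat = pyworld.decode_spectral_envelope(np.ascontiguousarray(coded_sp), fs, fft_size)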

About how to run this repo

Hi,
I want to know what should be put into the path "./mc/train" that appears in data_loader.py. Besides, could you tell me which files need to be run, and in what order?

Python 3.5

In Requirements you say Python 3.6 (or 3.5).
However, Python 3.5 cannot be used here, because there are a few f-strings, which require Python 3.6+ (though we can manually change them to %-formatted strings to work with Python 3.5).
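
For example, an f-string like the hypothetical one below can be rewritten for Python 3.5:

resume_iters = 200000
print(f"resume from step {resume_iters}")           # Python 3.6+ only
print("resume from step %d" % resume_iters)         # Python 3.5-compatible
print("resume from step {}".format(resume_iters))   # also 3.5-compatible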

preprocess.py possible sox issue

The resample_to_16k function of preprocess.py just prints:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

when finished.

The next section begins with "number of workers: 10"; however, here the program just halts.

I think this may be a sox issue, as I have just installed it with conda-forge. I am working on a VM and am not entirely sure how to install sox from a binary.

Thanks
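
A quick sketch for checking whether the sox binary that preprocess.py shells out to is actually visible to Python:

import shutil
import subprocess

print(shutil.which("sox"))                        # None means sox is not on PATH
subprocess.run(["sox", "--version"], check=True)  # raises if sox cannot be executed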

Loss function meanings

Hello, can someone explain to me in a bit more detail what each loss function says about the training?
Thanks in advance

How should I interpret this? Thank you!

sox WARN rate: rate clipped 27 samples; decrease volume?
sox WARN dither: dither clipped 25 samples; decrease volume?
sox WARN rate: rate clipped 3 samples; decrease volume?
sox WARN dither: dither clipped 2 samples; decrease volume?
60%|██████████████████████████▍ | 6/10 [00:27<00:36, 9.02s/it]sox WARN rate: rate clipped 4 samples; decrease volume?
sox WARN dither: dither clipped 4 samples; decrease volume?
100%|███████████████████████████████████████████| 10/10 [00:34<00:00, 5.55s/it]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
number of workers: 10
0%| | 0/10 [00:00<?, ?it/s]concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/root/anaconda3/envs/tensorflow/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "preprocess.py", line 50, in get_spk_world_feats
train_paths, test_paths = split_data(paths)
File "preprocess.py", line 42, in split_data
train_indices, test_indices = train_test_split(indices, test_size=test_size, random_state=1234)
File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 2100, in train_test_split
default_test_size=0.25)
File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 1782, in _validate_shuffle_split
train_size)
ValueError: With n_samples=0, test_size=0.1 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "preprocess.py", line 130, in
result_list = [future.result() for future in tqdm(futures)]
File "preprocess.py", line 130, in
result_list = [future.result() for future in tqdm(futures)]
File "/root/anaconda3/envs/tensorflow/lib/python3.6/concurrent/futures/_base.py", line 405, in result
return self.__get_result()
File "/root/anaconda3/envs/tensorflow/lib/python3.6/concurrent/futures/_base.py", line 357, in __get_result
raise self._exception
ValueError: With n_samples=0, test_size=0.1 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
^CError in atexit._run_exitfuncs:
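
The ValueError above is reproduced whenever train_test_split receives an empty list, which suggests that no wav paths reached split_data (a minimal reproduction):

from sklearn.model_selection import train_test_split

# Reproduces the error: zero samples to split, as when no wav files are found.
train_test_split([], test_size=0.1, random_state=1234)
# ValueError: With n_samples=0, test_size=0.1 and train_size=None, ...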

not find gated cnn

Hi, thanks for your sharing.
The paper uses GLUs (gated CNNs) in the Generator and Discriminator, but I didn't find them.
I only found ReLU, at model.py line 30 in the Generator:

    layers.append(nn.ReLU(inplace=True))
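
For reference, a gated linear unit of the kind the paper describes could look like this in PyTorch (a sketch, not the repo's code; kernel sizes are illustrative):

import torch
import torch.nn as nn

class GLU(nn.Module):
    # Gated CNN block: output = conv_a(x) * sigmoid(conv_b(x)).
    def __init__(self, channels):
        super().__init__()
        self.conv_a = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv_b = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv_a(x) * torch.sigmoid(self.conv_b(x))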

model is not stargan-vc

hi,
(I'm more comfortable writing in Chinese 😆) It seems the model is adapted from CycleGAN-VC rather than being the StarGAN-VC model. I wonder how the conversion results turn out?

I cannot run the code.

When I run the command
python preprocess.py --sample_rate 16000 --origin_wavpath data/VCTK-Corpus/wav48 --target_wavpath data/VCTK-Corpus/wav16 --mc_dir_train data/mc/train --mc_dir_test data/mc/test

I get the following error
Traceback (most recent call last):
File "C:\Users\user pc\Anaconda3\lib\concurrent\futures\process.py", line 232, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "C:\Users\user pc\PycharmProjects\song\preprocess.py", line 25, in resample
subprocess.call(['sox', wav_from, "-r", "16000", wav_to])
File "C:\Users\user pc\Anaconda3\lib\subprocess.py", line 323, in call
with Popen(*popenargs, **kwargs) as p:
File "C:\Users\user pc\Anaconda3\lib\subprocess.py", line 775, in __init__
restore_signals, start_new_session)
File "C:\Users\user pc\Anaconda3\lib\subprocess.py", line 1178, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Error in training with more than 4 speakers

Hi there,
Thanks for the code. I am able to run it with 4 speakers in the Speaker and Speaker_test folders, but if I increase the number of speakers to 6, preprocess.py runs fine while training through main.py throws an error saying "Target x is out of bounds". I am keeping the data very small (100 samples for each of the 6 speakers) to get a successful run first before increasing the sample size. I have tried to read the code but have not been able to debug it thoroughly yet.

UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
input = module(input)
Traceback (most recent call last):
File "main.py", line 92, in
main(config)
File "main.py", line 34, in main
solver.train()
File "/Users/shalinisaini/pytorch-StarGAN-VCtk-oct-6small/solver.py", line 160, in train
cls_loss_real = CELoss(input=cls_real, target=speaker_idx_org)
File "/Users/shalinisaini/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/Users/shalinisaini/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 931, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "/Users/shalinisaini/opt/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2317, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/Users/shalinisaini/opt/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2115, in nll_loss
ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
IndexError: Target 4 is out of bounds.

Can anyone tell me whether this code can be extended to more than 4 speakers? I am not sure if I am missing something obvious, but I would appreciate a pointer in the right direction. I would like to run the training with more speakers.

Thanks for your help.
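
The traceback is characteristic of a classifier head sized for 4 speakers receiving a label of 4; a minimal reproduction (sizes are illustrative):

import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
logits = torch.randn(1, 4)     # classifier output sized for 4 speakers (valid labels: 0-3)
ce(logits, torch.tensor([4]))  # IndexError: Target 4 is out of bounds.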

Are there any requirements for training datasets?

I put my own Chinese corpus into the training model, and the conversion quality is not as good as with the original English corpus.
Are there any requirements for training datasets?

Looking forward to your answer.

Thanks.

How to fine-tune StarGAN-VC model?

Firstly, thanks to the author for the implementation of this work. In ./converted_samples/readme, you mentioned: "These converted samples were obtained from the not-well-fine-tuned model. If you want to get better results, please tune the hyper-parameters carefully." Could you please give me some advice on how to fine-tune this model? Thanks a lot.

Why does g_loss lack g_loss_id?

I found that the identity loss seems to be missing from the "train the G" step; in my opinion, the code should be:

            id_mc = self.G(mc_real, spk_c_org)
            g_loss_id = torch.mean(torch.abs(id_mc - mc_real))

I hope you can explain this. Thank you very much.
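
If this term were added, it would presumably be folded into the total generator loss with its own weight, along the lines of (lambda_id is a hypothetical hyperparameter, not one the repo defines):

g_loss = g_loss_fake + self.lambda_rec * g_loss_rec \
         + self.lambda_cls * g_loss_cls_spks + self.lambda_id * g_loss_id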

Inference time

The paper claims that this approach could run at least in real time. I would like to know the inference time and the hardware you are using. Thanks.
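
For anyone wanting to measure this themselves, a small timing sketch (the model and input are placeholders; CUDA synchronization matters for fair GPU timing):

import time
import torch

def average_inference_time(model, x, n_runs=20):
    # Wall-clock average over n_runs forward passes, synchronizing on GPU.
    with torch.no_grad():
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.time()
        for _ in range(n_runs):
            model(x)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.time() - start) / n_runs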
