vae_tacotron2's Introduction

VAE Tacotron-2:

Unofficial implementation of "Learning latent representations for style control and transfer in end-to-end speech synthesis".

Repository Structure:

Tacotron-2
├── datasets
├── LJSpeech-1.1	(0)
│   └── wavs
├── logs-Tacotron	(2)
│   ├── mel-spectrograms
│   ├── plots
│   ├── pretrained
│   └── wavs
├── papers
├── tacotron
│   ├── models
│   └── utils
├── tacotron_output	(3)
│   ├── eval
│   ├── gta
│   ├── logs-eval
│   │   ├── plots
│   │   └── wavs
│   └── natural
└── training_data	(1)
    ├── audio
    └── mels

The tree above shows the current state of the repository. The numbers in parentheses mark the step that produces each folder:

  • Step (0): Get your dataset. The example here uses LJSpeech.
  • Step (1): Preprocess your data. This will give you the training_data folder.
  • Step (2): Train your Tacotron model. Yields the logs-Tacotron folder.
  • Step (3): Synthesize/Evaluate the Tacotron model. Gives the tacotron_output folder.
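
The mapping between steps and folders can be checked with a few lines of Python. This is only a sketch: the folder names are taken from the tree above, while the helper name `completed_steps` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Folder produced by each step, copied from the repository tree above.
STEP_FOLDERS = {
    0: "LJSpeech-1.1",
    1: "training_data",
    2: "logs-Tacotron",
    3: "tacotron_output",
}


def completed_steps(repo_root):
    """Return the step numbers whose output folder exists under repo_root."""
    root = Path(repo_root)
    return [step for step, folder in sorted(STEP_FOLDERS.items())
            if (root / folder).is_dir()]


if __name__ == "__main__":
    # Run from inside the Tacotron-2 folder.
    print("Steps with an output folder:", completed_steps("."))
```

A folder that exists only shows that a step was started, not that it finished successfully.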

Requirements

First, you need Python 3.5 installed along with TensorFlow v1.6.

Next, install the requirements:

pip install -r requirements.txt

or

pip3 install -r requirements.txt
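
Since several of the issues further down come from version mismatches, it may help to confirm the environment before going further. The sketch below compares the running interpreter and the installed TensorFlow against the versions stated above. The helper names `version_tuple` and `matches` are hypothetical; `tf.__version__` is the standard TensorFlow version attribute.

```python
import sys


def version_tuple(version):
    """Turn a string such as '1.6.0' into (1, 6), keeping major and minor only."""
    parts = []
    for piece in version.split(".")[:2]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def matches(version, expected):
    """True when the major.minor part of version equals the expected tuple."""
    return version_tuple(version) == expected


if __name__ == "__main__":
    py = "{}.{}".format(*sys.version_info[:2])
    print("Python", py, "ok" if matches(py, (3, 5)) else "differs from 3.5")
    try:
        import tensorflow as tf
        ok = matches(tf.__version__, (1, 6))
        print("TensorFlow", tf.__version__, "ok" if ok else "differs from 1.6")
    except ImportError:
        print("TensorFlow is not installed")
```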

Dataset:

This repo was tested on the LJSpeech dataset, which contains almost 24 hours of labeled recordings from a single female speaker.

Preprocessing

Before running the following steps, make sure you are inside the Tacotron-2 folder:

cd Tacotron-2

Preprocessing can then be started using:

python preprocess.py

or

python3 preprocess.py

The dataset can be chosen with the --dataset argument; the default is Ljspeech.

Training:

The feature prediction model can be trained using:

python train.py --model='Tacotron'

or

python3 train.py --model='Tacotron'

Synthesis

There are three types of mel spectrogram synthesis for the spectrogram prediction network (Tacotron). The one covered here is:

  • Evaluation (synthesis on custom sentences). This is what you will usually use once you have a full end-to-end model.

python synthesize.py --model='Tacotron' --mode='eval' --reference_audio='ref_1.wav'

or

python3 synthesize.py --model='Tacotron' --mode='eval' --reference_audio='ref_1.wav'
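
To try several reference recordings in a row, the same command can be assembled in Python. This is a sketch under assumptions: the flags are exactly those shown above, the helper `build_synthesis_command` is hypothetical, and any file name other than `ref_1.wav` is a placeholder.

```python
import sys


def build_synthesis_command(reference_audio, python=sys.executable):
    """Build the argument list for one eval-mode synthesis run."""
    return [
        python,
        "synthesize.py",
        "--model=Tacotron",
        "--mode=eval",
        "--reference_audio={}".format(reference_audio),
    ]


if __name__ == "__main__":
    # ref_2.wav is a placeholder name; only ref_1.wav appears in the README.
    for ref in ["ref_1.wav", "ref_2.wav"]:
        print(" ".join(build_synthesis_command(ref)))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from inside the Tacotron-2 folder. Passing a list avoids shell quoting, so the model name reaches the script without surrounding quotes.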

Note:

  • This implementation has not been completely tested for all scenarios, but training and synthesis with reference audio are working.
  • Synthesis has only been tested in eval mode, without GTA.
  • After training for 250k steps with a batch size of 32 on LJSpeech, the KL loss settled near zero (around 0.001), but style transfer and control are still not good. This may be because LJSpeech is not a very expressive dataset and has only 24 hours of data. The model might produce better results on an expressive dataset such as the Blizzard Challenge 2013 voice dataset; the authors of the paper used 105 hours of it.
  • In my testing I haven't gotten good results on style transfer so far; some more tweaking may be required. This implementation can easily be integrated with WaveNet as well as WaveRNN.
  • Feel free to suggest changes or, even better, raise a PR.
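
For context on the KL value mentioned in the notes: for a diagonal Gaussian posterior measured against a standard normal prior, the KL term of a VAE has the closed form -0.5 * sum(1 + log_var - mu^2 - exp(log_var)). The sketch below evaluates that textbook formula. It assumes this is the form the repository uses, which has not been verified here, and the function name is hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ), summed over latent dimensions."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv)
        for m, lv in zip(mu, log_var)
    )


if __name__ == "__main__":
    # Posterior identical to the prior: the KL term is zero.
    print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
    # Mean shifted by one in a single dimension: the KL term is 0.5.
    print(kl_to_standard_normal([1.0], [0.0]))
```

A KL term near 0.001 means the posterior stays almost identical to the prior whatever the reference audio is, so the latent variable carries little information about style. That is one possible reading of the weak style transfer reported above, not a confirmed diagnosis.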

Pretrained model and Samples:

TODO

Claimed samples from the research paper: http://home.ustc.edu.cn/~zyj008/ICASSP2019

References and Resources:

Work in progress

vae_tacotron2's People

Contributors

rishikksh20

vae_tacotron2's Issues

What should a successful error curve look like?

@rishikksh20 Hi, I am using this repo with the Blizzard2013 dataset instead of the LJSpeech dataset, with default settings. I want to know whether I am training this VAE model correctly. What should a successful error curve look like with this repo? The following are my curves. I wonder whether the gap between kl_loss and reconstruction_loss is too wide. Thanks for any help.

[three images: the poster's loss curves]

Loss exploded???

I get the error "loss explode" in the training stage!
I have not modified the original hyperparameters, and I would like to know how to solve the problem.

about preprocess

pip install -r -requirements.txt
Is it enough to install the libraries listed in the requirements file? What about LWS?

Loaded runtime CuDNN library error

Hi, I ran your code with tf1.6, once with cudnn10.1 and once with cudnn9.2. Both gave:

2019-08-13 23:13:50.199673: E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 7600 (compatibility version 7600) but source was compiled with 7102 (compatibility version 7100).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2019-08-13 23:13:50.201349: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 

I googled this error, and most results say it is related to the cuDNN version. May I ask which cuDNN version you used? Or could you give some advice? Thanks a lot.
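
The two integers in the log can be read as version numbers. The sketch below assumes the encoding major * 1000 + minor * 100 + patch used by cuDNN 7; the function name is hypothetical.

```python
def decode_cudnn_version(code):
    """Split an integer such as 7102 into (major, minor, patch)."""
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


if __name__ == "__main__":
    loaded = decode_cudnn_version(7600)    # runtime library found on the machine
    compiled = decode_cudnn_version(7102)  # version TensorFlow was built against
    print("loaded runtime:", ".".join(str(n) for n in loaded))
    print("compiled with: ", ".".join(str(n) for n in compiled))
```

Read this way, the log says the machine loaded cuDNN 7.6.0 while the TensorFlow binary was built against cuDNN 7.1.2, which is the mismatch the error message complains about.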

Why? Thanks

G:\vae_tacotron2-master>python train.py --model='Tacotron'
Traceback (most recent call last):
  File "train.py", line 33, in <module>
    main()
  File "train.py", line 24, in main
    raise ValueError('please enter a valid model to train: {}'.format(accepted_models))
ValueError: please enter a valid model to train: ['Tacotron', 'Wavenet']
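
A likely cause, judging from the `G:\` prompt: the command was run from the Windows command prompt, which does not treat single quotes as quoting characters. The script then receives the literal string `'Tacotron'`, quotes included, and that string is not in the accepted list. Running `python train.py --model=Tacotron` without quotes should avoid this. The sketch below illustrates the mismatch; the helper `normalize_model_name` is hypothetical and not part of train.py.

```python
# The accepted list is copied from the error message above.
ACCEPTED_MODELS = ["Tacotron", "Wavenet"]


def normalize_model_name(raw):
    """Strip one layer of matching single or double quotes, if present."""
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
        return raw[1:-1]
    return raw


if __name__ == "__main__":
    received = "'Tacotron'"  # what the script sees when cmd.exe keeps the quotes
    print(received in ACCEPTED_MODELS)                        # False
    print(normalize_model_name(received) in ACCEPTED_MODELS)  # True
```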
