ide8 / tacotron2

Stars: 126 · Watchers: 8 · Forks: 25 · Size: 3.03 MB

Multispeaker & Emotional TTS based on Tacotron 2 and Waveglow

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 48.28%, Jupyter Notebook 51.61%, Dockerfile 0.11%

Topics: tacotron tacotron2 tacotron2-pytorch waveglow tts multispeaker emotions nvidia

tacotron2's Issues

Inference inconsistency

After 750 epochs, we tested the trained Tacotron model via inference.ipynb and found that the same input text sequence produces a different output audio file on each run. Additionally, there is always a long stretch of silence, approximately 30 seconds, at the beginning of the audio file. Note that the data was preprocessed as explained in the README. Sometimes the audio file contains only noise; other times there is some speech at the end of the file.
Do you have any experience with this issue?

Loss is NaN on first step, if data exceeds 100 sentences

Thank you for this repo. I trained for 300 epochs. I am getting audio output, but it's noisy. Can you suggest how to get better audio output from this model?

Pretrained Model

Hey, it would be great if you could share your pretrained model with proper alignment. I have been training from scratch for 6 days and am not getting any alignment.

Any related papers?

Griffin Lim

First of all, thanks for the repository.

I am trying to train on a dataset in another language using this repository, and since I do not have a pretrained WaveGlow model, I cannot train a new Tacotron2 model... Is there any way to apply Griffin-Lim to the inferred mel spectrograms? I am having some issues with tensor dimensionality and have not managed to get any audio...

Thanks in advance

Ander
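A minimal NumPy-only sketch of the Griffin-Lim route asked about above. It assumes the mel spectrograms follow the NVIDIA Tacotron 2 defaults (natural-log compressed magnitudes, 80 mel bands, 22050 Hz, 1024-point FFT, hop 256, fmax 8000 Hz); those values and every function name here (`mel_filterbank`, `stft`, `istft`, `mel_to_audio`) are hypothetical, not this repository's API. The common dimensionality problem is that inference returns a batch of shape (1, n_mels, frames), so the sketch drops the batch axis first. The HTK-style filterbank is a simplification and will not exactly match the one used in training, so expect some coloration; reusing the model's own mel basis would be more faithful.

```python
import numpy as np


def hz_to_mel(f):
    # HTK mel scale (an assumption; training may have used the Slaney scale)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin, fmax):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    hz = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        lo, ctr, hi = hz[m], hz[m + 1], hz[m + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def stft(y, n_fft, hop, window):
    """Uncentered STFT, shape (n_fft // 2 + 1, n_frames)."""
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)


def istft(S, hop, window):
    """Weighted overlap-add inverse of `stft`."""
    n_fft = len(window)
    frames = np.fft.irfft(S, n=n_fft, axis=0) * window[:, None]
    length = n_fft + hop * (S.shape[1] - 1)
    y = np.zeros(length)
    wsum = np.zeros(length)
    for i in range(S.shape[1]):
        y[i * hop:i * hop + n_fft] += frames[:, i]
        wsum[i * hop:i * hop + n_fft] += window ** 2
    nonzero = wsum > 1e-8
    y[nonzero] /= wsum[nonzero]
    return y


def mel_to_audio(log_mel, sr=22050, n_fft=1024, hop=256, fmin=0.0, fmax=8000.0,
                 n_iter=60, seed=0):
    log_mel = np.asarray(log_mel, dtype=np.float64)
    if log_mel.ndim == 3:  # drop the batch axis: (1, n_mels, T) -> (n_mels, T)
        log_mel = log_mel[0]
    mel = np.exp(log_mel)  # undo natural-log compression (assumed convention)
    fb = mel_filterbank(sr, n_fft, mel.shape[0], fmin, fmax)
    # Approximate linear magnitudes with the pseudo-inverse, clipped to be non-negative
    mag = np.maximum(np.linalg.pinv(fb) @ mel, 0.0)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)  # periodic Hann
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        # Griffin-Lim step: keep the target magnitude, adopt the phase of a consistent STFT
        y = istft(mag * phase, hop, window)
        phase = np.exp(1j * np.angle(stft(y, n_fft, hop, window)))
    return istft(mag * phase, hop, window)
```

Assuming the notebook names its output `mel_outputs_postnet` as NVIDIA's inference notebook does, a PyTorch tensor would be converted first, e.g. `audio = mel_to_audio(mel_outputs_postnet.detach().cpu().numpy())`, and then peak-normalized before being written to a WAV file. Random initial phase plus a fixed `seed` keeps the vocoding step itself reproducible.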

Dataset

What dataset did you use to train and test your network? I can't find any information about it apart from how to preprocess the data.