acetylsv / gst-tacotron

Reproducing Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis (https://arxiv.org/pdf/1803.09017.pdf)

Python 100.00%

speech-synthesis gst-tacotron tacotron global-style-tokens

gst-tacotron's People

Contributors

Stargazers

Watchers

Forkers

lturing entn-at peter05010402 templeblock

gst-tacotron's Issues

Pretrained model please

If possible, please share pretrained models for the respective datasets.

Reference Encoder Padding

How do we ensure that the padding of the reference mel spectrogram is taken into account when the reference encoder is applied to a batch of mels?
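
The thread records no answer to this question. One common approach, sketched here under stated assumptions, is to track how many frames of each mel remain valid after the strided convolutions and pass those lengths to the recurrent layer so that padded frames are ignored. The six layers with stride 2 follow the reference encoder described in the paper, and "SAME"-style padding (output length = ceil(input / stride)) is assumed; neither is confirmed for this repository, and the function names are hypothetical.

```python
import math


def valid_length_after_convs(length, num_layers=6, stride=2):
    """Number of time steps that still cover real (unpadded) frames.

    Assumes each conv layer uses "SAME"-style padding, so one layer maps
    a length n to ceil(n / stride).
    """
    for _ in range(num_layers):
        length = math.ceil(length / stride)
    return length


def batch_valid_lengths(mel_lengths, num_layers=6, stride=2):
    """Apply the length reduction to every unpadded mel length in a batch."""
    return [valid_length_after_convs(n, num_layers, stride) for n in mel_lengths]


if __name__ == "__main__":
    # Unpadded frame counts of three reference mels in one padded batch.
    print(batch_valid_lengths([100, 64, 1]))
```

The resulting list is what would be supplied as the per-example sequence length of the recurrent layer that follows the convolutions, so that its final state is read at the last valid step rather than at the end of the padding.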

How to pass to multi-head attention?

Hi,

For the condition_on_audio = False case, how is style_emb computed? What should the GST tokens be?

Pre-trained models

It seems that the link to the pre-trained models is down. Would you be able to put them up again? Thanks in advance for any help.

how to use pre-trained models

Thanks for sharing the pre-trained models. Could you please let me know how to use them? When I run it, it asks me for an input dataset.

Use GST inference error

Hi,

I think this project is fantastic; thanks for your contribution.

However, I ran into a problem when trying to run inference with GST tokens.

I have already trained a model on LJSpeech with condition_on_audio = False, but infer.py fails with the error below:

ValueError: operands could not be broadcast together with shapes (320,) (256,)

I understand that the shapes (320,) and (256,) do not match, but how should I modify the code to fix this?

Any suggestions would be appreciated. Thanks.
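
For context, the message quoted in this issue is NumPy's standard broadcasting failure: two arrays can be combined element-wise only if, comparing axes from the last one backwards, each pair of sizes is equal or one of them is 1. The sketch below restates that rule in plain Python to show why (320,) and (256,) are rejected. The function name is hypothetical, and the sketch does not identify which arrays in infer.py carry these shapes.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the shape produced by broadcasting shapes a and b.

    Raises ValueError when the shapes are incompatible under NumPy's rule.
    """
    out = []
    # Walk both shapes from the trailing axis; a missing axis counts as size 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"shapes {a} and {b} cannot be broadcast together")
    return tuple(reversed(out))


if __name__ == "__main__":
    print(broadcast_shape((320,), (1,)))  # compatible: size 1 stretches
    try:
        broadcast_shape((320,), (256,))
    except ValueError as err:
        print(err)
```

Since neither size is 1, the two vectors have to be brought to the same length before they are combined; which of the two lengths is the intended one depends on the model configuration and is not stated in the thread.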

questions about the datasets.

Hello! I have some questions about the BZ dataset. Did you preprocess the BZ dataset before training the model, for example by splitting long sentences into shorter segments? Some sentences in the BZ dataset are much longer than those in LJ.

the model is hard to converge with LJSpeech

Invalid reference audio?

I use the pre-trained models with different reference audio, but the resulting audio barely changes.
What could be the reason for this?

