ga642381 / fastspeech2 Goto Github PK


89.0 8.0 17.0 40.7 MB

Multi-Speaker Pytorch FastSpeech2: Fast and High-Quality End-to-End Text to Speech :fist:

Python 100.00%

text-to-speech fastspeech2 pytorch multi-speaker-tts melgan tts waveglow

fastspeech2's People

Contributors

Stargazers

Watchers

Forkers

pohanchi raytrac3r xuexidi eric102004 rielkim hh1992 mnm-matin b07902011 muxichu taalua haojunyong shaun95 mbencherif amorjnyh zhangziliang04 balakkvj

fastspeech2's Issues

Training for Korean Languages

Hello authors,
First of all, thank you for sharing this impressive repository.
I would like to retrain your model on a Korean dataset, for example KSS (Korean Single Speaker). However, the synthesized Korean speech is of poor quality. Could you give me some guidelines for this?

Thank you very much.

Invalid tensor shape

Dataset: VCTK. Model training completes without errors, but running synthesize.py crashes.

Additional diagnostic just before crash:

print("Print")
print(text)
print(src_len)
print(spk_ids)

Using cache found in /root/.cache/torch/hub/descriptinc_melgan-neurips_master
Synthesizing...
Weather forecast for tonight: dark
|{W EH1 DH ER0 F AO1 R K AE2 S T F AO1 R T AH0 N AY1 T D AA1 R K}|
Print
tensor([[144, 94, 91, 97, 104, 78, 130, 116, 71, 131, 133, 104, 78, 130,
133, 73, 119, 86, 133, 90, 66, 130, 116],
[144, 94, 91, 97, 104, 78, 130, 116, 71, 131, 133, 104, 78, 130,
133, 73, 119, 86, 133, 90, 66, 130, 116],
[144, 94, 91, 97, 104, 78, 130, 116, 71, 131, 133, 104, 78, 130,
133, 73, 119, 86, 133, 90, 66, 130, 116]], device='cuda:0')
tensor([23, 23, 23], device='cuda:0')
tensor([5, 6, 7], device='cuda:0')
Traceback (most recent call last):
File "synthesize.py", line 168, in <module>
synthesize(model, waveglow, melgan, text, sentence, prefix='step_{}'.format(args.step))
File "synthesize.py", line 85, in synthesize
mel, mel_postnet, log_duration_output, f0_output, energy_output, _, _, mel_len = model(text, src_len, speaker_ids=spk_ids)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
return self.gather(outputs, self.output_device)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/data_parallel.py", line 181, in gather
return gather(outputs, output_device, dim=self.dim)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/scatter_gather.py", line 78, in gather
res = gather_map(outputs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/scatter_gather.py", line 73, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return Gather.apply(target_device, dim, *outputs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/_functions.py", line 75, in forward
return comm.gather(inputs, ctx.dim, ctx.target_device)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/comm.py", line 235, in gather
return torch._C._gather(tensors, dim, destination)
RuntimeError: Input tensor at index 1 has invalid shape [1, 293, 80], but expected [1, 371, 80]
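
A possible reading of this traceback: the failure happens in the gather step of torch's DataParallel wrapper, which concatenates the per-device outputs along the batch dimension. That only works if every other dimension matches, and here two replicas produced mel outputs of different lengths (293 vs. 371 frames). The sketch below reproduces that shape rule in plain Python and shows one common workaround, calling the wrapped network directly for inference. Both helper names are hypothetical and not part of this repository, and the workaround has not been tested against it.

```python
def first_gather_mismatch(shapes, dim=0):
    """Return the index of the first shape that cannot be concatenated
    with shapes[0] along `dim`, or None if all shapes are compatible."""
    reference = tuple(shapes[0])
    for index, shape in enumerate(shapes[1:], start=1):
        shape = tuple(shape)
        if len(shape) != len(reference):
            return index
        for axis, (got, expected) in enumerate(zip(shape, reference)):
            # Only the concatenation axis is allowed to differ.
            if axis != dim and got != expected:
                return index
    return None


def unwrap_data_parallel(model):
    """Return the wrapped network if `model` exposes a `.module` attribute
    (as nn.DataParallel does); otherwise return `model` unchanged.
    Assumption: the model has no unrelated attribute named `module`."""
    return getattr(model, "module", model)


# The two shapes reported in the error message above.
replica_shapes = [(1, 371, 80), (1, 293, 80)]
print(first_gather_mismatch(replica_shapes))  # prints 1
```

Under these assumptions, calling `unwrap_data_parallel(model)(text, src_len, speaker_ids=spk_ids)` in synthesize.py would run the whole batch on one device, so no gather across replicas takes place.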

Division by Zero

VCTK corpus, not all words present.

step : 1000
Epoch [1/1000], Step [1000/2688000]:
Total Loss: 2.4196, Mel Loss: 0.2047, Mel PostNet Loss: 0.2929, Duration Loss: 0.3105, F0 Loss: 72.4894, Energy Loss: 8.8656;
Time Used: 300.306s, Estimated Time Remaining: 781687.842s.
step : 2000
Epoch [1/1000], Step [2000/2688000]:
Total Loss: 1.8951, Mel Loss: 0.1795, Mel PostNet Loss: 0.1984, Duration Loss: 0.2650, F0 Loss: 48.0024, Energy Loss: 7.7215;
Time Used: 598.532s, Estimated Time Remaining: 686206.710s.
step: 2000 , length 249, tensor([249, 363, 264, 352, 281, 264, 221, 275, 264, 221, 307, 277, 298, 266,
286, 268], device='cuda:0')
Traceback (most recent call last):
File "train.py", line 265, in <module>
main(args)
File "train.py", line 239, in main
d_l, f_l, e_l, m_l, m_p_l = evaluate(model, current_step)
File "/home/FastSpeech2/evaluate.py", line 115, in evaluate
d_l = sum(d_l) / len(d_l)
ZeroDivisionError: division by zero
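
The division at evaluate.py line 115 fails because `d_l` is empty, which suggests the evaluation loop collected no losses at all (for example, because the validation set is empty or every batch was skipped; the log does not say which). A minimal sketch of a guard follows. `safe_mean` is a hypothetical helper, not part of the repository, and it only hides the symptom: an empty loss list still deserves investigation.

```python
import math


def safe_mean(values, default=math.nan):
    """Average `values`, returning `default` instead of raising
    ZeroDivisionError when there is nothing to average."""
    values = list(values)
    if not values:
        return default
    return sum(values) / len(values)


print(safe_mean([0.3105, 0.2650]))  # ordinary average of two losses
print(safe_mean([]))                # prints nan instead of crashing
```

Returning NaN rather than 0.0 keeps an empty evaluation visible in the logs instead of making it look like a perfect score.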

Is it able to learn a certain voice style?

Pretrained model

Could you provide a model pretrained on the LibriTTS dataset?
Thanks.

ValueError: num_samples should be a positive integer value, but got num_samples=0

File "train.py", line 109, in __init_dataset
train_loader = DataLoader(
File "/home/ssn/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 344, in __init__
sampler = RandomSampler(dataset, generator=generator) # type: ignore[arg-type]
File "/home/ssn/.local/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 107, in __init__
raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
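
`RandomSampler` raises this error when the dataset passed to `DataLoader` has length 0, so the likely cause is that the training dataset loaded no entries (for instance, a missing or empty preprocessed metadata file; the traceback does not show which). A minimal sketch of an earlier, more descriptive check is shown below. `require_non_empty` and its parameters are hypothetical names, not part of the repository.

```python
def require_non_empty(dataset, name="dataset", source=None):
    """Return len(dataset), raising a descriptive error if it is empty.
    `source` is an optional path or label used only in the message."""
    size = len(dataset)
    if size == 0:
        origin = f" (loaded from {source})" if source else ""
        raise ValueError(
            f"{name} is empty{origin}; check that preprocessing finished "
            "and that the configured data path points to its output"
        )
    return size


# Works with any object that supports len(), such as a list or a Dataset.
print(require_non_empty(["utt_001", "utt_002"], name="train set"))  # prints 2
```

Calling such a check just before constructing the `DataLoader` would surface the empty dataset with a message that points at the data path rather than at the sampler.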
