ivanvovk / wavegrad
Implementation of WaveGrad high-fidelity vocoder from Google Brain in PyTorch.
License: BSD 3-Clause "New" or "Revised" License
Hello, thanks for your work. It is very helpful to me.
I have a question about exploding loss.
I tried to train on the LJSpeech data you used with the default settings (lr = 1e-3) and ran into a NaN loss issue.
So I reduced lr to 5e-4. The NaN loss is gone, but the loss still explodes (normal loss: < 0.07, exploded loss: > 732M).
I know the code already has measures to prevent loss explosion, such as the lr schedule and clipping, but they are not working.
Can you help me?
Hi! My name is Junhyeok Lee and I appreciate your work!
I may have found a small mistake in your config file.
Lines 6 to 12 in 721c37c
Hi,
When training with f_max = 10000, static noise is introduced by the model. We are using our own custom dataset. Is there any way to improve on this?
Thanks.
Hi, in the notebook it isn't clear whether you can load your own spectrogram directly (for TTS inference) or not. Is this possible, and if so, have you tried it?
I'm also adding WaveGrad to my implementation. I have a question for you.
Were your generated_samples generated using a model trained with AMP?
I think this repository is very nice. Good job!
I'm trying to run training on an NVIDIA Xavier AGX device running an NVIDIA Docker container based on these instructions: https://ngc.nvidia.com/catalog/containers/nvidia:l4t-pytorch
But I receive the following error:
Initializing logger...
Initializing model...
Number of parameters: 15810401
Initializing optimizer, scheduler and losses...
Initializing data loaders...
Traceback (most recent call last):
File "train.py", line 185, in <module>
run(config, args)
File "train.py", line 72, in run
logger.log_specs(0, specs)
File "/media/908f901d-e80b-4a8e-8a16-9e0f1b896732/TTS/thorsten-de/models/model-v02/WaveGrad/logger.py", line 53, in log_specs
self.add_image(key, plot_tensor_to_numpy(image), iteration, dataformats='HWC')
File "/media/908f901d-e80b-4a8e-8a16-9e0f1b896732/TTS/thorsten-de/models/model-v02/WaveGrad/utils.py", line 66, in plot_tensor_to_numpy
im = ax.imshow(tensor, aspect="auto", origin="bottom", interpolation='none', cmap='hot')
File "/usr/local/lib/python3.6/dist-packages/matplotlib/__init__.py", line 1438, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "/usr/local/lib/python3.6/dist-packages/matplotlib/axes/_axes.py", line 5521, in imshow
resample=resample, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/matplotlib/image.py", line 905, in __init__
**kwargs
File "/usr/local/lib/python3.6/dist-packages/matplotlib/image.py", line 246, in __init__
cbook._check_in_list(["upper", "lower"], origin=origin)
File "/usr/local/lib/python3.6/dist-packages/matplotlib/cbook/__init__.py", line 2257, in _check_in_list
.format(v, k, ', '.join(map(repr, values))))
ValueError: 'bottom' is not a valid value for origin; supported values are 'upper', 'lower'
python3 -V: Python 3.6.9
pip3 -V: 20.2.3
Running pip3 list shows the following installed packages:
absl-py (0.10.0)
appdirs (1.4.4)
cachetools (4.1.1)
certifi (2020.6.20)
chardet (3.0.4)
cycler (0.10.0)
Cython (0.29.20)
decorator (4.4.2)
future (0.18.2)
google-auth (1.22.1)
google-auth-oauthlib (0.4.1)
grpcio (1.32.0)
idna (2.10)
importlib-metadata (2.0.0)
kiwisolver (1.2.0)
Mako (1.1.3)
Markdown (3.3)
MarkupSafe (1.1.1)
matplotlib (3.3.1)
numpy (1.19.0)
oauthlib (3.1.0)
Pillow (7.2.0)
pip (9.0.1)
protobuf (3.13.0)
pyasn1 (0.4.8)
pyasn1-modules (0.2.8)
pycuda (2019.1.2)
pyparsing (2.4.7)
python-dateutil (2.8.1)
pytools (2020.3.1)
requests (2.24.0)
requests-oauthlib (1.3.0)
rsa (4.6)
setuptools (50.3.0)
six (1.15.0)
tensorboard (2.3.0)
tensorboard-plugin-wit (1.7.0)
torch (1.6.0)
torchaudio (0.6.0a0+d6f81d1)
torchvision (0.7.0a0+6631b74)
tqdm (4.50.2)
urllib3 (1.25.10)
Werkzeug (1.0.1)
wheel (0.35.1)
zipp (3.3.0)
I tried matplotlib 3.3.1 and 3.3.2, both with the same result.
Any ideas what I'm missing?
Thank you.
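The error message itself lists the accepted values, so the likely fix is to change origin="bottom" to origin="lower" in the imshow call at utils.py line 66. Below is a minimal sketch of that mapping; normalize_origin is a hypothetical helper, not part of the repository, and only the 'bottom' to 'lower' substitution is taken from the traceback above.

```python
def normalize_origin(origin):
    """Map the legacy value 'bottom' to 'lower' and reject anything unsupported."""
    if origin == "bottom":
        origin = "lower"
    if origin not in ("upper", "lower"):
        raise ValueError(f"unsupported origin: {origin!r}")
    return origin


# The call in plot_tensor_to_numpy would then read (shown only, not executed here):
#   im = ax.imshow(tensor, aspect="auto", origin=normalize_origin("bottom"),
#                  interpolation="none", cmap="hot")
print(normalize_origin("bottom"))
```

Editing the literal string in utils.py is simpler; the helper only matters if the same code has to accept both spellings.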
I trained the model with my own data, but it is so slow! It took me nearly 5 days to train 30k steps. BTW, I used 4 Tesla P40s. Do you know how to speed it up?
The checkpoint is loaded in inference.ipynb but not in inference.py; I guess it was forgotten.
WaveGrad/model/linear_modulation.py
Lines 21 to 24 in f59d4bd
At line 22, the exponents are calculated as exponents = exponents ** 1e-4 instead of exponents = 1e-4 ** exponents, as in the original Transformer paper.
This makes the values very close to each other across dimensions. I plotted an example using exponents = exponents ** 1e-4, with noise levels linspace(0, 1, 250) and n_channels=512.
After changing to exponents = 1e-4 ** exponents, the positional encoding looks fine.
The strange thing is that even with the current positional encoding, the model seems to train well. I tried training on LibriTTS, and the generated speech sounds okay to me. I'll try switching to the latter and see whether there is an improvement.
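The difference between the two formulas can be checked numerically. This is a minimal pure-Python sketch, assuming the exponents start out as i / half_dim for i in range(half_dim) with half_dim = n_channels // 2 (an assumption about the surrounding code, which is not shown here); the function name is hypothetical.

```python
def scaled_exponents(n_channels, transformer_style):
    """Per-dimension frequency scales for the noise-level encoding."""
    half_dim = n_channels // 2
    fractions = [i / half_dim for i in range(half_dim)]  # assumed starting values
    if transformer_style:
        return [1e-4 ** f for f in fractions]  # geometric progression from 1 toward 1e-4
    return [f ** 1e-4 for f in fractions]      # formula reported at line 22


current = scaled_exponents(512, transformer_style=False)
proposed = scaled_exponents(512, transformer_style=True)

# Apart from index 0, the current values are nearly identical across dimensions.
print(min(current[1:]), max(current[1:]))
# The proposed values span roughly four orders of magnitude.
print(min(proposed), max(proposed))
```

With the current formula almost every dimension oscillates at practically the same frequency, which matches the flat plot described above.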
Huge thanks for the implementation! I have a question regarding the single-GPU training time you mentioned.
I followed the same training procedure with batch size 96 on an RTX 2080 Ti GPU as you did, but it took much longer than the training time you mentioned (12 hrs to ~10k training iterations).
I have no idea what causes this. Could you describe your training environment precisely?
My working environment is listed below.
Docker environment with
CUDA 10.1
cuDNN v7
Ubuntu 18.04
Python 3.8
Hi, thanks for your work and all the info in your README!
Did you measure the RTF on a CPU?
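For reference, RTF (real-time factor) is synthesis time divided by the duration of the generated audio, so it can be measured on any device. A minimal sketch follows; real_time_factor and dummy_synthesize are hypothetical names, and the dummy function stands in for an actual WaveGrad inference call on CPU.

```python
import time


def real_time_factor(synthesize, n_samples, sample_rate):
    """Wall-clock synthesis time divided by the duration of the generated audio."""
    start = time.perf_counter()
    synthesize()
    elapsed = time.perf_counter() - start
    return elapsed / (n_samples / sample_rate)


def dummy_synthesize():
    # Placeholder for model inference; replace with the real forward pass.
    time.sleep(0.01)


# One second of audio at 22050 Hz; values below 1.0 mean faster than real time.
rtf = real_time_factor(dummy_synthesize, n_samples=22050, sample_rate=22050)
print(f"RTF: {rtf:.4f}")
```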
I'm fortunate enough to have a machine with an NVIDIA RTX 3090 GPU. However, the GPU-enabled binary versions of PyTorch 1.6.0 available from the PyTorch project won't run on the 3090, and probably won't run on any 3000 series GPUs - the necessary CUDA binaries don't seem to be compiled in.
PyTorch 1.7.0 does run on my 3090, so I've built a virtual environment with that and torchaudio 0.7.0. I started training on the "LJ" dataset to see if it worked, and it appeared to be functioning; it used about 11.5 GB of GPU RAM and about 45% of GPU processors. Do you anticipate any other problems with PyTorch 1.7.0, or should I go ahead with training on my own dataset?
Lines 47 to 49 in 6be2f4c
Since np.random.randint has an exclusive upper bound, np.random.randint(0, 0) raises an error. Consider switching to random.randint, or changing the condition to audio.shape[-1] > self.segment_length?
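A minimal sketch of the first suggestion, using the standard library's random.randint, whose bounds are inclusive on both ends. pick_segment_start is a hypothetical helper, not the repository's code, and it assumes the caller pads audio that is shorter than the segment.

```python
import random


def pick_segment_start(audio_length, segment_length):
    """Random start index such that [start, start + segment_length) fits in the audio."""
    max_start = audio_length - segment_length
    if max_start <= 0:
        # Audio is exactly segment_length or shorter: only start 0 makes sense.
        return 0
    return random.randint(0, max_start)  # inclusive upper bound, so max_start is valid


print(pick_segment_start(7200, 7200))   # equal lengths no longer raise
print(pick_segment_start(10000, 7200))  # some value between 0 and 2800
```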
I do not fully understand the noise schedules.
Are the schedules in schedules/pretrained suitable for other datasets, at 22k and 16k?
I trained on my own dataset with a sample rate of 16000 and used the pretrained schedules (16, 25 and 100 iterations). The predicted results sound good, especially with 100 iterations.
But I don't understand why the schedules can also be used for a 16k sample rate.
Or, even though the synthesized wavs are good, is this not the correct way?
Could you provide the evaluation code? Thank you.
Hello, I have two somewhat identical datasets with similar samples. I have trained the WaveGrad with 22k sample rate audios and it is quite good. However, the synthesis quality for 44k sample rate data is not as good. Would really appreciate any suggestions, especially in terms of changing model parameters. The only changes in parameters are as follows:
sample rate: 44k
n_fft: 2048
window_size: 2048
hop_length: 512
I do not understand this function in the script "diffusion_process.py". Could someone help me?
Frankly speaking, I do not fully understand the diffusion process and the inverse process.
For the other functions, I can find the corresponding equations in the WaveGrad paper or in "Denoising Diffusion Probabilistic Models", but for this function I cannot.
Also, the inverse process that generates a waveform seems quite different from, and more complex than, the process in
https://github.com/lmnt-com/wavegrad/blob/master/src/wavegrad/inference.py, lines 56~64.
Why is there such a difference?
What train/test values are reasonable after 1 or 2 days of training?
Thank you for releasing the great project.
I'd like to train on a TPU (v3) on Google Cloud. How can I change the code to make it work?
Hi,
Could you share your best 6-iteration noise schedule for the LJSpeech dataset?
What's your suggestion for reducing the background noise with fewer iterations (e.g., 100 iterations) during inference?
The model is actually okay for inference with 1000 iterations.
As I understand it, this TTS algorithm works with your audio files without assigned text.
Hi, awesome contribution for the TTS community :) I am wondering, did you manage to train a model with higher audio quality than the pretrained checkpoint provided with this repo? The audio samples seem to have lower quality than the ones presented in the paper. Any ideas what might be missing?
I am now training the model from scratch and the audio samples are still very noisy (approx. 12 hours on 2 GPUs, batch size 128). It is getting better, but I am curious about the upper bound on quality achievable with the provided source code.
When I configure factors as [4,4,4,2,2] to match hop_size=256, I cannot find a segment_length that works.
Do you have any idea how to choose segment_length so that it exactly matches the dilation and padding for this configuration?
I got the error below with segment_length=7200
Traceback (most recent call last):
File "train.py", line 185, in <module>
run(config, args)
File "train.py", line 92, in run
loss = model.compute_loss(mels, batch)
File "/content/WaveGrad/model/diffusion_process.py", line 176, in compute_loss
eps_recon = self.nn(mels, y_noisy, continuous_sqrt_alpha_cumprod)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/WaveGrad/model/nn.py", line 119, in forward
ublock_outputs = ublock(x=ublock_outputs, scale=scale, shift=shift)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/WaveGrad/model/upsampling.py", line 82, in forward
outputs = self.first_block_main_branch['modulation'](outputs, scale, shift)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/WaveGrad/model/upsampling.py", line 30, in forward
outputs = self.featurewise_affine(x, scale, shift)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/WaveGrad/model/linear_modulation.py", line 98, in forward
outputs = scale * x + shift
RuntimeError: The size of tensor a (450) must match the size of tensor b (448) at non-singleton dimension 2
Full configuration:
{
"model_config": {
"factors": [4,4,4,2,2],
"upsampling_preconv_out_channels": 768,
"upsampling_out_channels": [512, 512, 256, 128, 128],
"upsampling_dilations": [
[1, 2, 1, 2],
[1, 2, 1, 2],
[1, 2, 4, 8],
[1, 2, 4, 8],
[1, 2, 4, 8]
],
"downsampling_preconv_out_channels": 32,
"downsampling_out_channels": [128, 128, 256, 512],
"downsampling_dilations": [
[1, 2, 4], [1, 2, 4], [1, 2, 4], [1, 2, 4]
]
},
"data_config": {
"sample_rate": 22050,
"n_fft": 1024,
"win_length": 1024,
"hop_length": 256,
"f_min": 0,
"f_max": 8000,
"n_mels": 80
},
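One reading of the error, offered as an assumption rather than a confirmed diagnosis: segment_length has to be a multiple of hop_length (the product of the factors), and 7200 is not a multiple of 256. The numbers in the traceback are consistent with this, since 7200 / (2 * 2 * 4) = 450 on the audio branch while 28 mel frames * (4 * 4) = 448 on the mel branch. The sketch below checks this arithmetic and lists the nearest multiples; both function names are hypothetical.

```python
def total_upsampling(factors):
    """Product of the upsampling factors; it should equal hop_length."""
    total = 1
    for factor in factors:
        total *= factor
    return total


def nearest_valid_segment_lengths(segment_length, hop_length):
    """Closest multiples of hop_length at or below and at or above segment_length."""
    lower = (segment_length // hop_length) * hop_length
    upper = lower if lower == segment_length else lower + hop_length
    return lower, upper


factors = [4, 4, 4, 2, 2]  # from the configuration above
hop_length = 256           # from the configuration above
segment_length = 7200      # the value that triggered the error

assert total_upsampling(factors) == hop_length

# Assumed shape arithmetic behind the "450 vs 448" mismatch:
audio_branch = segment_length // (2 * 2 * 4)          # 450
mel_branch = (segment_length // hop_length) * 4 * 4   # 28 * 16 = 448
print(audio_branch, mel_branch)

print(nearest_valid_segment_lengths(segment_length, hop_length))  # (7168, 7424)
```

Under this assumption, segment_length=7168 or 7424 would be the closest candidates to try.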
Hey, the quality of generated results is amazing! 🤩 But, unfortunately, I have a little problem with inferencing the model.
I downloaded the checkpoint file you provide, named wavegrad_lj_pretrained.pt (which I renamed to checkpoint_wavegrad_lj_pretrained.pt), and moved it to the following location in the WaveGrad directory.
├── logs
│   └── default
│       └── checkpoint_wavegrad_lj_pretrained.pt
This is because of the regex requirement in the following code
Line 19 in 714cb82
which is called from the fifth cell of inference.ipynb as follows:
# model.load_state_dict(torch.load('../logs/default/checkpoint_180630.pt)['model'], strict=False)
model, _, _ = utils.load_latest_checkpoint(config.training_config.logdir, model)
When I run this cell I get the following error. 🙁
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-2e2546d6cd38> in <module>
1 # model.load_state_dict(torch.load('../logs/default/checkpoint_180630.pt)['model'], strict=False)
----> 2 model, _, _ = utils.load_latest_checkpoint(config.training_config.logdir, model)
~/Desktop/WaveGrad/utils.py in load_latest_checkpoint(logdir, model, optimizer)
25
26 def load_latest_checkpoint(logdir, model, optimizer=None):
---> 27 latest_model_path = latest_checkpoint_path(logdir, regex="checkpoint_*.pt")
28 print(f'Latest checkpoint: {latest_model_path}')
29 d = torch.load(
~/Desktop/WaveGrad/utils.py in latest_checkpoint_path(dir_path, regex)
19 def latest_checkpoint_path(dir_path, regex="checkpoint_*.pt"):
20 f_list = glob.glob(os.path.join(dir_path, regex))
---> 21 f_list.sort(key=lambda f: int("".join(filter(str.isdigit, f))))
22 x = f_list[-1]
23 return x
~/Desktop/WaveGrad/utils.py in <lambda>(f)
19 def latest_checkpoint_path(dir_path, regex="checkpoint_*.pt"):
20 f_list = glob.glob(os.path.join(dir_path, regex))
---> 21 f_list.sort(key=lambda f: int("".join(filter(str.isdigit, f))))
22 x = f_list[-1]
23 return x
ValueError: invalid literal for int() with base 10: ''
Can you please tell me how to fix this issue?
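The traceback shows that the sort key concatenates every digit in the file path and passes the result to int(). The path to checkpoint_wavegrad_lj_pretrained.pt contains no digits, so int('') fails. Renaming the file to include an iteration number (for example checkpoint_0.pt, a hypothetical name) should satisfy the key. Alternatively, here is a sketch of a stricter lookup that reads the number from the file name only and skips files without one; it is an illustration, not the repository's implementation.

```python
import glob
import os
import re
import tempfile

CHECKPOINT_RE = re.compile(r"^checkpoint_(\d+)\.pt$")


def latest_checkpoint_path(dir_path, pattern="checkpoint_*.pt"):
    """Return the checkpoint with the highest iteration number in its file name."""
    numbered = []
    for path in glob.glob(os.path.join(dir_path, pattern)):
        match = CHECKPOINT_RE.match(os.path.basename(path))
        if match:  # files without an iteration number are ignored
            numbered.append((int(match.group(1)), path))
    if not numbered:
        raise FileNotFoundError(f"no checkpoint_<iteration>.pt files in {dir_path!r}")
    return max(numbered)[1]


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as logdir:
    for name in ("checkpoint_9.pt", "checkpoint_10.pt",
                 "checkpoint_wavegrad_lj_pretrained.pt"):
        open(os.path.join(logdir, name), "w").close()
    latest_name = os.path.basename(latest_checkpoint_path(logdir))

print(latest_name)  # checkpoint_10.pt
```

Matching on the base name also avoids picking up digits from directory names, which the original key would include.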