Comments (8)
@TakoYuxin Can you please share how you managed to "convert the mel spectrogram predicted by FastSpeech into the same scale used by WaveGlow"?
Here is what I added after FastSpeech predicted mel_postnet, but it didn't work:
waveglow_npy = audio._db_to_amp(audio._denormalize(mel_postnet) + hp.ref_level_db)
waveglow_npy = torch.from_numpy(waveglow_npy)
waveglow_npy = torch.log(torch.clamp(waveglow_npy, min=1e-5) * 1)
torch.save(waveglow_npy, os.path.join(mode+ '.pt'))
You can find the FastSpeech audio processing functions in audio.py, and the WaveGlow audio processing functions in NVIDIA/tacotron2 (layers.py and audio_processing.py).
from fastspeech.
I tried TTS with WaveGlow as follows, but the result is just noise.
Could you explain the reason?
def synthesis_waveglow(text_seq, model, waveglow, alpha=1.0, mode=""):
    denoiser = Denoiser(waveglow)
    text = text_to_sequence(text_seq, hp.text_cleaners)
    text = text + [0]
    text = np.stack([np.array(text)])
    text = torch.from_numpy(text).long().to(device)
    pos = torch.stack([torch.Tensor([i+1 for i in range(text.size(1))])])
    pos = pos.long().to(device)

    model.eval()
    with torch.no_grad():
        _, mel_postnet = model(text, pos, alpha=alpha)
    with torch.no_grad():
        # wav = waveglow.infer(mel_postnet, sigma=0.666)
        wav = waveglow.infer(torch.transpose(mel_postnet, 1, 2).type(torch.cuda.HalfTensor), sigma=0.666)
    print("Wav Have Been Synthesized.")

    if not os.path.exists("results"):
        os.mkdir("results")
    wav_denoised = denoiser(wav, strength=0.01)[:, 0]
    # audio.save_wav(wav[0].data.cpu().numpy(), os.path.join(
    #     "results", text_seq + mode + ".wav"))
    audio.save_wav(wav_denoised[0].cpu().numpy(), os.path.join(
        "results", text_seq + mode + ".wav"))
Thank you
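One quick way to narrow down a noisy result like this is to look at the value range of the mel that is handed to waveglow.infer. The sketch below is not part of either repo: describe_mel is a hypothetical helper. It assumes, as the snippets in this thread suggest, that FastSpeech's audio.py produces mels normalized into [0, 1] while WaveGlow expects log(clamp(mel, min=1e-5)) values.

```python
import numpy as np

# Lowest value a WaveGlow-style log-mel can take, given clamp(min=1e-5).
LOG_MEL_FLOOR = float(np.log(1e-5))


def describe_mel(mel):
    """Summarize a mel spectrogram so its scale can be inspected.

    Hypothetical helper for debugging; `mel` is any array-like of floats.
    """
    mel = np.asarray(mel, dtype=np.float64)
    lo, hi = float(mel.min()), float(mel.max())
    return {
        "shape": mel.shape,
        "min": lo,
        "max": hi,
        "mean": float(mel.mean()),
        # True when every value sits in [0, 1], which hints at a
        # Tacotron-style normalized mel rather than a log-amplitude mel.
        "looks_normalized": bool(lo >= 0.0 and hi <= 1.0),
    }
```

Inside synthesis_waveglow this could be called as describe_mel(mel_postnet.detach().cpu().numpy()). If looks_normalized comes back True, the mel is probably on a different scale from the one WaveGlow was trained on.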
I also got noisy results. I think the reason is that WaveGlow uses a slightly different audio processing method, so the two models are trained on differently scaled mel spectrograms and are not compatible. I tried to convert the mel spectrogram predicted by FastSpeech into the same scale used by WaveGlow, but failed. If you happen to get good results, please let me know.
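The scale conversion described above can be sketched roughly as follows. This is only an illustration in NumPy, not code from either repo. The constants MIN_LEVEL_DB and REF_LEVEL_DB are assumed values of the kind used in Tacotron-style audio.py files, so check your own hparams. Matching the scale alone may not be enough: STFT and mel filterbank settings can also differ between the two pipelines, and a similar attempt elsewhere in this thread did not work.

```python
import numpy as np

# Assumed hyperparameters; replace with the values from your own hparams.
MIN_LEVEL_DB = -100.0
REF_LEVEL_DB = 20.0


def denormalize(mel_norm, min_level_db=MIN_LEVEL_DB):
    """Map a [0, 1] normalized mel back to decibels."""
    return np.clip(mel_norm, 0.0, 1.0) * -min_level_db + min_level_db


def db_to_amp(db):
    """Convert decibels to linear amplitude."""
    return np.power(10.0, db * 0.05)


def to_waveglow_scale(mel_norm, min_level_db=MIN_LEVEL_DB, ref_level_db=REF_LEVEL_DB):
    """Convert a normalized dB mel to a natural-log amplitude mel.

    Mirrors log(clamp(mel, min=1e-5)), the compression used on the
    WaveGlow side according to the snippet earlier in this thread.
    """
    amp = db_to_amp(denormalize(mel_norm, min_level_db) + ref_level_db)
    return np.log(np.clip(amp, 1e-5, None))
```

With the assumed constants, a normalized value of 1.0 corresponds to an amplitude of 10 and a value of 0.0 to an amplitude of 1e-4, so the converted mel lands in the negative-to-small-positive range that a log-amplitude mel normally occupies.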
Thank you @TakoYuxin .
I got some results, but the voice quality is not good.
My result is from epoch 1000.
Should I train the FastSpeech model for longer?
Could you explain this for me?
I haven't got any good results either >_< I trained the model for 198k steps, but the generated voice was so unclear that I could barely understand what it was saying, and a few words were skipped. I don't know exactly how to fix this, but continuing to train for more epochs sounds like a plan lol. We should probably wait for the author's answer.
Has anyone got good results with WaveGlow? I tried Tako's denormalize and got some audible results, but I still can't get anything better. It's not even close to Griffin-Lim.
The newest repo has an audio example synthesized by WaveGlow in result.
I am also having this issue - I trained to 164,000 steps but the wav file was just silent background noise.
I also tried running synthesis.py on earlier training steps (2000, 9000, 130,000) and the only one that sounded remotely like speech was 2000. By 9000 steps it was just an empty silent wav file.
I saw that @xcmyz said that batch size needed to be 32 or more, but if I set it to 32 I get an out of memory error despite running on a good GPU. I reduced to 16 batch size and it runs without error.
I am not sure why the audio wav files are basically silent noise.
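If a batch size of 32 does not fit in memory, one common workaround is gradient accumulation: run two micro-batches of 16 and average their gradients before taking an optimizer step. The toy NumPy sketch below uses a plain linear model, not FastSpeech, and all names in it are illustrative. It only demonstrates the arithmetic: for a mean-reduced loss and equally sized micro-batches, the accumulated gradient equals the full-batch gradient. Whether a larger effective batch fixes the silent output reported here is not confirmed anywhere in this thread.

```python
import numpy as np


def mse_grad(w, x, y):
    """Gradient of mean((x @ w - y) ** 2) with respect to w."""
    err = x @ w - y
    return 2.0 * x.T @ err / len(y)


def accumulated_grad(w, x, y, micro_batch):
    """Average per-micro-batch gradients over equally sized chunks."""
    total = np.zeros_like(w)
    n_chunks = 0
    for start in range(0, len(y), micro_batch):
        stop = start + micro_batch
        total += mse_grad(w, x[start:stop], y[start:stop])
        n_chunks += 1
    return total / n_chunks


rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))   # 32 samples, 4 features
y = rng.normal(size=32)
w = rng.normal(size=4)

full = mse_grad(w, x, y)                            # one batch of 32
accum = accumulated_grad(w, x, y, micro_batch=16)   # two batches of 16
```

In a PyTorch training loop the same idea usually means dividing the loss by the number of accumulation steps, calling backward() on each micro-batch, and calling the optimizer step only after the last one.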