Comments (6)
What are alignments used for, after all? The Tacotron 2 paper does not mention alignments.
from fastspeech.
I found this in FastSpeech2 paper:
The training of FastSpeech relies on an autoregressive teacher model to provide 1) the duration of each phoneme to train a duration predictor, and 2) the generated mel-spectrograms for knowledge distillation. While these designs in FastSpeech ease the learning of the one-to-many mapping problem in TTS, they also bring several disadvantages: 1) the two-stage teacher-student distillation pipeline is complicated; 2) the duration extracted from the attention map of the teacher model is not accurate enough, and the target mel-spectrograms distilled from the teacher model suffer from information loss due to data simplification, both of which limit the voice quality and prosody.
This makes it clear that you need another trained model (the autoregressive teacher) before you can train FastSpeech on a custom dataset, which is not very convenient.
Put differently, the alignments are a big problem because training depends on them: no alignments, no training. The FastSpeech paper is worth reading to understand how this is done in principle, but FastSpeech is not the best choice if you want out-of-the-box training.
Note that an alignments.py file was present in this project before but was removed (commit id: e11b60d); no commit message was set to explain the change.
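For readers wondering what "the duration extracted from the attention map of the teacher model" means in practice, here is a minimal sketch of the idea as described in the FastSpeech paper: each mel frame is assigned to the phoneme it attends to most, and a phoneme's duration is the number of frames assigned to it. The function name `durations_from_attention` and the toy attention matrix are made up for illustration; this is not the removed alignments.py.

```python
import numpy as np

def durations_from_attention(attention):
    """Count, for each phoneme, how many mel frames attend to it most.

    attention: array of shape (n_mel_frames, n_phonemes), one row of
    attention weights per decoder frame (assumed layout).
    """
    attention = np.asarray(attention)
    n_phonemes = attention.shape[1]
    # For every mel frame, pick the phoneme that receives the most attention.
    attended = attention.argmax(axis=1)
    # A phoneme's duration is the number of frames that picked it.
    return np.bincount(attended, minlength=n_phonemes)

# Toy attention map: 5 mel frames over 3 phonemes (hypothetical values).
attention = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.3, 0.6, 0.1],
    [0.1, 0.2, 0.7],
    [0.0, 0.1, 0.9],
])
durations = durations_from_attention(attention)
print(durations)  # [2 1 2]
```

By construction the durations sum to the number of mel frames. A phoneme that never wins the argmax gets a duration of 0, so a blurry attention map from the teacher can plausibly lead to skipped phonemes.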
Thank you, I found alignments.py in a previous commit and tried it. The resulting synthesis quality is not bad, but when I run inference on sentences longer than five or six words, the output stutters and drops letters. I am now trying FastSpeech2. Alignments really are a big problem.
Hi, I have the same question. I am also trying to train FastSpeech2 on my own language, but getting alignments is really difficult.
My Tacotron 2 model trains very well on my dataset, so its alignments should be good, but the synthesis is quite bad.
The output seems intelligible, but the sounds are mixed up. So my question is whether the durations generated by Tacotron match the mels, energies, and pitches generated by librosa or the TacotronSTFT module. The problem is complex enough that it is hard to explain how FastSpeech2 produces good-quality audio. Thanks.
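One way to test whether the durations match the other features is to compare frame counts: the durations of an utterance should sum to the number of mel frames, and frame-level pitch and energy should have that same length. The sketch below assumes mels stored as `(n_frames, n_mels)` and frame-level pitch and energy; the helper names and the toy arrays are hypothetical.

```python
import numpy as np

def frame_counts(durations, mel, pitch, energy):
    """Collect the number of frames implied by each feature.

    Assumes mel has shape (n_frames, n_mels) and that pitch and energy
    are 1-D frame-level sequences.
    """
    return {
        "durations": int(np.sum(durations)),
        "mel": int(mel.shape[0]),
        "pitch": len(pitch),
        "energy": len(energy),
    }

def is_consistent(counts):
    # All features must describe the same number of frames.
    return len(set(counts.values())) == 1

# Toy utterance: energy was extracted with one frame too many.
counts = frame_counts(
    durations=[2, 1, 2],
    mel=np.zeros((5, 80)),
    pitch=np.zeros(5),
    energy=np.zeros(6),
)
print(counts, is_consistent(counts))
```

If the counts disagree for real data, the features were probably extracted with different hop lengths, padding, or trimming, and the durations cannot be used as-is.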
I researched this problem and saw something about the reduction factor. I don't fully understand the architecture, but it seems Tacotron learns easily with a large reduction factor, whereas NVIDIA's Tacotron 2 implementation has no reduction factor. Maybe NVIDIA's Tacotron 2 is good for synthesis but bad for extracting alignments. I'm not sure; I will research further and edit this comment.
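If a teacher with a reduction factor r were used, each decoder step would emit r mel frames, so durations counted from its attention map would be in decoder steps rather than frames. A rough sketch of converting them back, assuming any padding sits at the end of the utterance (the function `scale_durations` is hypothetical and not part of this repository):

```python
import numpy as np

def scale_durations(step_durations, reduction_factor, n_mel_frames):
    """Convert durations counted in decoder steps into mel-frame durations."""
    durations = np.asarray(step_durations) * reduction_factor
    excess = int(durations.sum()) - n_mel_frames
    if excess < 0:
        raise ValueError("durations cover fewer frames than the mel has")
    # The last decoder step may contain padding frames; trim them from the end.
    i = len(durations) - 1
    while excess > 0 and i >= 0:
        cut = min(excess, int(durations[i]))
        durations[i] -= cut
        excess -= cut
        i -= 1
    return durations

# 5 decoder steps with r = 2 cover 10 frames, but the mel has only 9.
frame_durations = scale_durations([2, 1, 2], reduction_factor=2, n_mel_frames=9)
print(frame_durations)  # [4 2 3]
```

With a reduction factor of 1 the function returns the input unchanged as long as the frame count already matches.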
@CanKorkut Hi, I'm using that alignment.py (commit id: e11b60d) to extract alignment files, but the results have different dimensions from the LJSpeech alignment files that already ship with this FastSpeech source code. Can you show me the code you used to extract alignment files, so I can train on another language? Thank you.