Hi, I want to try fastspeech on different dataset. therefore, can yo


How to extract alignment from tacotron2? about fastspeech HOT 6 OPEN

CanKorkut commented on July 1, 2024

How to extract alignment from tacotron2?

from fastspeech.

Comments (6)

prorev commented on July 1, 2024

What are the alignments used for, after all? The Tacotron 2 paper does not mention alignments.


prorev commented on July 1, 2024

I found this in the FastSpeech2 paper:

The training of FastSpeech relies on an autoregressive teacher model to provide 1) the duration of each phoneme to train a duration predictor, and 2) the generated mel-spectrograms for knowledge distillation. While these designs in FastSpeech ease the learning of the one-to-many mapping problem in TTS, they also bring several disadvantages: 1) the two-stage teacher-student distillation pipeline is complicated; 2) the duration extracted from the attention map of the teacher model is not accurate enough, and the target mel-spectrograms distilled from the teacher model suffer from information loss due to data simplification, both of which limit the voice quality and prosody.

This makes it clear that you need another trained model (the teacher) before you can train FastSpeech on a custom dataset, which is not ideal.

So the alignments are a big problem, because training is only possible with them: no alignments, no training. The FastSpeech paper is worth reading to understand how this is done in principle, but it is not the best choice if you want out-of-the-box training.

An alignments.py file was present in this project before but was removed. Commit id: e11b60d; there is no commit message explaining the change.
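For readers wondering what "duration extracted from the attention map" means in practice, here is a minimal sketch. It is not the removed alignments.py; it only assumes the teacher's attention is available as a matrix with one row per mel frame and one column per input token (the function name and the toy matrix are made up for illustration). Each frame is assigned to its most-attended token, and a token's duration is the number of frames assigned to it.

```python
from typing import List, Sequence


def durations_from_attention(
    attention: Sequence[Sequence[float]], num_tokens: int
) -> List[int]:
    """Count, for each input token, how many frames attend to it most.

    attention[t][i] is assumed to be the weight that frame t puts on token i.
    """
    durations = [0] * num_tokens
    for frame_weights in attention:
        # Index of the token with the largest attention weight for this frame.
        best = max(range(num_tokens), key=lambda i: frame_weights[i])
        durations[best] += 1
    return durations


# Toy attention map: 5 mel frames (rows) over 3 tokens (columns).
attention = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.2, 0.8],
    [0.0, 0.1, 0.9],
]
durations = durations_from_attention(attention, num_tokens=3)
print(durations)  # [2, 1, 2]
```

By construction the durations sum to the number of frames, which is the property the student model's length regulator relies on. If the teacher's attention is blurry or skips tokens, the counts will be wrong even though the sum still matches.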


CanKorkut commented on July 1, 2024

Thank you. I found alignments.py in the previous commit and tried it. The synthesis quality is not bad, but when I run inference on sentences longer than five or six words, the output stutters and drops letters. I am now trying FastSpeech2. Alignments really are a big problem.


cuongnguyengit commented on July 1, 2024

Hi, I have the same question. I am also trying to train FastSpeech2 on my own language, but getting the alignments is really difficult.
My Tacotron2 model trains very well on my dataset, so its alignments should be good, but the synthesis is quite bad.
The output seems intelligible but jumbled. So my question is whether the durations generated by Tacotron match the mels, energies, and pitches generated by librosa or the TacotronSTFT module. This makes it hard to understand how FastSpeech2 is supposed to produce good-quality audio. Thanks.
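One way to test the question above is to check, per utterance, that the durations sum to the number of mel frames actually used for training. The sketch below is hypothetical (the helper names are not from this repository) and assumes durations are counted in mel frames. If the teacher and the feature extractor use different hop lengths or padding, the difference will be nonzero and every expanded token will drift against the mel, energy, and pitch targets.

```python
from typing import List, Sequence


def duration_mismatch(durations: Sequence[int], num_mel_frames: int) -> int:
    """Return sum(durations) - num_mel_frames; 0 means the two line up."""
    return sum(durations) - num_mel_frames


def expand_by_duration(token_ids: Sequence[int], durations: Sequence[int]) -> List[int]:
    """Repeat each token id by its duration, as a length regulator would."""
    if len(token_ids) != len(durations):
        raise ValueError("need exactly one duration per token")
    expanded: List[int] = []
    for token, duration in zip(token_ids, durations):
        expanded.extend([token] * duration)
    return expanded


durations = [2, 1, 2]
print(duration_mismatch(durations, num_mel_frames=5))  # 0, consistent
print(duration_mismatch(durations, num_mel_frames=6))  # -1, one frame short
print(expand_by_duration([10, 11, 12], durations))     # [10, 10, 11, 12, 12]
```

Running the check over the whole dataset before training is cheap, and a consistent nonzero offset usually points at a preprocessing mismatch rather than at the model.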


CanKorkut commented on July 1, 2024

I looked into this problem and found something about the reduction factor. I don't fully understand the architecture, but it seems Tacotron learns alignments more easily with a large reduction factor; however, the NVIDIA Tacotron2 implementation has no reduction factor. Maybe NVIDIA's Tacotron2 is good for synthesis but bad for extracting alignments. I'm not sure; I will look into it further and edit this comment.
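If a teacher with a reduction factor r is used, each decoder step emits r mel frames, so the attention map has one row per decoder step rather than per frame. Durations counted from it then need to be scaled by r before they can be used as frame-level targets. The following is a rough sketch under that assumption; the function name is hypothetical, and trimming the surplus from the end assumes the extra frames are padding on the last decoder step.

```python
from typing import List, Sequence


def steps_to_frame_durations(
    step_durations: Sequence[int], reduction_factor: int, num_mel_frames: int
) -> List[int]:
    """Convert durations in decoder steps to durations in mel frames."""
    frame_durations = [d * reduction_factor for d in step_durations]
    excess = sum(frame_durations) - num_mel_frames
    if excess < 0:
        raise ValueError("durations cover fewer frames than the mel has")
    # Remove padded frames starting from the last token.
    for i in range(len(frame_durations) - 1, -1, -1):
        if excess == 0:
            break
        take = min(excess, frame_durations[i])
        frame_durations[i] -= take
        excess -= take
    return frame_durations


# Reduction factor 2: 5 decoder steps cover 10 frames, but the mel has 9.
print(steps_to_frame_durations([2, 1, 2], reduction_factor=2, num_mel_frames=9))
# [4, 2, 3]
```

With a reduction factor of 1, which is the situation the comment describes for the NVIDIA implementation, the function returns the input unchanged when the sums already match.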


khainh3101 commented on July 1, 2024

@CanKorkut Hi, I'm using that alignment.py (commit id: e11b60d) to extract alignment files, but the results have different dimensions from the LJSpeech alignment files that already ship with this FastSpeech source code. Can you share the exact code you used to extract alignment files for training on another language? Thank you.

