
yqzhishen commented on July 30, 2024

Rhythmizers are actually a temporary solution for phoneme duration prediction in MIDI-less models. A rhythmizer contains the FastSpeech2 encoder module and the DurationPredictor module from a MIDI-A model.
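Conceptually, exporting a rhythmizer means keeping only the duration-prediction parts of a full checkpoint. The sketch below is a hypothetical illustration of that idea, not the actual export script mentioned in this thread; the key prefixes (`fs2.encoder.`, `fs2.dur_predictor.`) are assumptions for illustration.

```python
# Hypothetical sketch: a rhythmizer keeps only the encoder and
# duration-predictor weights from a full MIDI-A checkpoint.
# The key prefixes here are assumptions, not DiffSinger's real layout.

def extract_rhythmizer(state_dict,
                       prefixes=("fs2.encoder.", "fs2.dur_predictor.")):
    """Filter a checkpoint's state dict down to the duration-prediction parts."""
    return {k: v for k, v in state_dict.items() if k.startswith(prefixes)}

# Toy checkpoint with placeholder values instead of real tensors:
ckpt = {
    "fs2.encoder.layer0.weight": "...",
    "fs2.dur_predictor.conv.weight": "...",
    "fs2.pitch_predictor.conv.weight": "...",  # dropped: not needed for rhythm
    "decoder.denoiser.weight": "...",          # dropped: diffusion decoder
}
rhythmizer = extract_rhythmizer(ckpt)
print(sorted(rhythmizer))  # only encoder and dur_predictor keys remain
```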

Models of MIDI-A mode can predict phoneme durations well and generate nice spectrograms, but their datasets are hard to label (you need MIDI sequences and slurs), and they predict pitch poorly, even though they do have a PitchPredictor. That is why we are deprecating this mode in this forked repository.

To get a rhythmizer, you need to first choose or design a phoneme dictionary. Then you should label your dataset in the opencpop segments format. Please note that the MIDI duration transcriptions of opencpop are in consonant-vowel format, while you need to label your dataset in vowel-consonant format; that is, the beginning of each note should be aligned with the beginning of the vowel instead of the consonant (see issue). Here is an example of the labels that we converted from the original opencpop transcriptions: transcriptions-strict-revised2.txt. The last step is to preprocess your dataset and train a MIDI-A model with this config. After that, you can export the part for duration prediction with this script.
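The consonant-vowel vs. vowel-consonant distinction can be sketched as follows. This is a minimal illustration assuming a hypothetical data layout (syllables with per-phoneme timings and a toy vowel set), not the actual opencpop transcription schema: the note onset is moved from the syllable start to the start of its first vowel, so any leading consonant falls under the previous note's span.

```python
# Hypothetical sketch of vowel-consonant note alignment.
# Data layout and vowel set are assumptions for illustration only.

def to_vowel_aligned_notes(syllables):
    """Each syllable: {"phones": [(phone, start, end), ...], "midi": pitch}.
    Move the note onset from the syllable start to the first vowel's start."""
    vowels = {"a", "i", "u", "e", "o"}  # toy vowel set
    notes = []
    for syl in syllables:
        onset = syl["phones"][0][1]  # default: syllable (consonant) start
        for phone, start, end in syl["phones"]:
            if phone in vowels:      # first vowel phone defines the onset
                onset = start
                break
        notes.append({"midi": syl["midi"],
                      "start": onset,
                      "end": syl["phones"][-1][2]})
    return notes

# "shi" and "ge": note onsets shift from 0.00->0.08 and 0.40->0.45.
demo = [
    {"phones": [("sh", 0.00, 0.08), ("i", 0.08, 0.40)], "midi": 64},
    {"phones": [("g", 0.40, 0.45), ("e", 0.45, 0.90)], "midi": 62},
]
print(to_vowel_aligned_notes(demo))
```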

For CVVC languages like English and Polish, the answer is no, because we can currently only handle two-phase (CV) phoneme systems like Chinese and Japanese. MIDI-A, MIDI-B, duration predictors, data labels, and all other word-phoneme related parts will be redesigned in the future, and at that point you can expect full support for all languages. No rhythmizers will be needed then: everyone will be able to train their own variance adaptors (containing duration and pitch models and much more) via standard pipelines, as easily as preparing and training MIDI-less acoustic models today.

By the way, members of our team are already preparing a Japanese rhythmizer. When they finish the dictionary, the rhythmizer, and the MFA model, we will formally support Japanese MIDI-less mode preparation in our pipeline. If you run into difficulties preparing one on your own, it is fine to just wait for our progress.

from diffsinger.
