Hello, The work you are doing and the work the people at PiperTTS ar

Making sense of all the PiperTTS models about dsnote HOT 3 OPEN

Kentoseth commented on June 12, 2024

Making sense of all the PiperTTS models

from dsnote.

Comments (3)

mkiol commented on June 12, 2024

Thanks for the question.

In general, it is hard. The problem here is that there are too many English voices! I didn't want to decide which one was better and which one was worse, so I included them all. Maybe it wasn't a good strategy and I should have made some selection, but I didn't. Everything that is available in Piper is also available in Speech Note.

Piper LibriTTS and Piper LibriTTS-R are multi-speaker models, so one checkpoint file can generate many totally different voices. In Speech Note every "voice" is presented as a separate model, but under the hood all LibriTTS/LibriTTS voices use the same checkpoint file. The file is downloaded only once, so no worries.

Names like "P7910" or "3615" come from the original speaker names in the training data. My initial idea was to add at least a male/female indication to each name, but I gave up because there are just too many of them. That's why you see these long and meaningless names :/

In this particular example, LibriTTS-R is a restored version of the LibriTTS corpus. The voices are similar, but LibriTTS-R should sound a bit better.
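The multi-speaker arrangement described above can be sketched roughly as follows. This is only an illustration, not Speech Note's actual code: the `VoiceEntry` type, the helper functions, the checkpoint file name, and the pairing of these speaker names with this model are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceEntry:
    """One "voice" as shown to the user (hypothetical structure)."""
    display_name: str
    checkpoint: str  # file shared by every voice of the same model
    speaker: str     # speaker name taken from the training data


def build_voice_entries(model_name, checkpoint, speakers):
    # Every speaker becomes its own entry, but all of them
    # point at the same checkpoint file.
    return [VoiceEntry(f"{model_name} {s}", checkpoint, s) for s in speakers]


def files_to_download(entries):
    # A set removes duplicates, so a shared checkpoint is fetched once.
    return {e.checkpoint for e in entries}


# Hypothetical file name; speaker names borrowed from the comment above.
entries = build_voice_entries(
    "Piper LibriTTS", "libritts-checkpoint.onnx", ["3615", "P7910"]
)
print(len(entries), "voices,", len(files_to_download(entries)), "file")
```

Under these assumptions, listing many voices costs nothing extra in download size, because the set of required files stays at one per multi-speaker model.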


mkiol commented on June 12, 2024

> My suggestion to fix this issue of many voices using one model is to enable downloading of the model as only one option. And then within the main interface, a person can choose the many different voices that are available. Similar to how the CoQui X-TTS model works

That's a very good idea! 👍🏿

> If you choose to do that, then I can go through some of the different voice models and add some metadata to them to indicate whether it is male or female.

That would be super great :) I will let you know when it is ready. Most likely I won't be able to implement this in an upcoming release, but later.


Kentoseth commented on June 12, 2024

> Piper LibriTTS and Piper LibriTTS-R are multi-speaker models, so one checkpoint file can generate many totally different voices. In Speech Note every "voice" is presented as a separate model, but under the hood all LibriTTS/LibriTTS voices use the same checkpoint file. The file is downloaded only once, so no worries.

My suggestion to fix this issue of many voices using one model is to enable downloading of the model as only one option. And then within the main interface, a person can choose the many different voices that are available. Similar to how the CoQui X-TTS model works, where a person can choose different voice options from the main interface.

If you choose to do that, then I can go through some of the different voice models and add some metadata to them to indicate whether it is male or female.

I can submit this as a text file here in the GitHub issue, or you can indicate your preferred format and I will provide that instead, so that adding it to the application is as easy as adding the file and linking to its data.
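As a starting point for the format discussion, here is a minimal sketch of what such a metadata file and its loader could look like. No format has been agreed on in this thread, so the JSON layout, the key names, the `load_voice_labels` helper, and the example speaker names are all hypothetical.

```python
import json

# Hypothetical metadata file content: one entry per speaker of a
# multi-speaker model. Names and values are placeholders only.
METADATA_TEXT = """
{
  "model": "example-multi-speaker-model",
  "speakers": {
    "example-speaker-1": {"gender": "female"},
    "example-speaker-2": {"gender": "male"},
    "example-speaker-3": {}
  }
}
"""


def load_voice_labels(text):
    """Return a display label per speaker, e.g. 'name (female)'."""
    data = json.loads(text)
    labels = {}
    for speaker, info in data["speakers"].items():
        # Speakers that nobody has labelled yet fall back to "unknown".
        gender = info.get("gender", "unknown")
        labels[speaker] = f"{speaker} ({gender})"
    return labels


labels = load_voice_labels(METADATA_TEXT)
for label in labels.values():
    print(label)
```

Keeping unlabelled speakers in the file with an "unknown" fallback would let the labelling be done gradually, which seems useful given how many voices there are.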

