
Improve Audio Generation Speed about thorsten-voice HOT 8 CLOSED

thorstenmueller commented on May 27, 2024

Improve Audio Generation Speed

from thorsten-voice.

Comments (8)

synesthesiam commented on May 27, 2024

@r4nc0r Keep watch for the release of Mimic 3 (samples), which should be this month. You should get an 8-10x speedup with it. I typically get an RTF of 0.03, but I'm also on a Ryzen 5950X.


thorstenMueller commented on May 27, 2024

@domcross and I are working on new/better models using the HifiGAN vocoder. Samples are available on the Thorsten-Voice project website. These models might be faster than the one currently available. But maybe you should check out the work by @synesthesiam on Larynx. My voice is available there too and it's really fast.

Did you test with the "WaveGrad" or the "Fullband-MelGAN" vocoder? (Fullband-MelGAN is way faster.)


r4nc0r commented on May 27, 2024

Thanks for your quick reply and for pointing me in the right direction!

I just used your model with the parameters specified in the README: tts-server --model_name tts_models/de/thorsten/tacotron2-DCA


thorstenMueller commented on May 27, 2024

I tried pip3 install tts==0.5.0 followed by tts-server --model_name tts_models/de/thorsten/tacotron2-DCA and got an RTF of around 0.6-1 on my notebook CPU, which I think isn't too bad. What RTF do you get?

In case you're interested:
https://www.thorsten-voice.de/2022/03/20/vergleich-thorsten-aktuell-mit-dem-neuen-modell/


r4nc0r commented on May 27, 2024

I just did that with the addition of --show_details SHOW_DETAILS, and my RTF is about 0.6:

 > Processing time: 3.101564407348633
 > Real-time factor: 0.5756691513639508
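For reference, the real-time factor (RTF) is conventionally the time spent synthesizing divided by the duration of the audio produced, so values below 1.0 mean faster than real time. Assuming tts-server reports RTF that way, the two numbers in the log above also reveal how long the generated clip was. A small sketch (the helper names are illustrative, not part of Coqui TTS):

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = synthesis time / duration of the generated audio."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


def implied_audio_duration(processing_s: float, rtf: float) -> float:
    """Invert the RTF definition to recover the length of the generated audio."""
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return processing_s / rtf


# Values copied from the --show_details output in this thread.
processing_time = 3.101564407348633
rtf = 0.5756691513639508

audio_seconds = implied_audio_duration(processing_time, rtf)
print(f"audio length ~ {audio_seconds:.2f} s")  # -> audio length ~ 5.39 s

# Round trip: recomputing the RTF from both durations gives the logged value back.
print(f"RTF ~ {real_time_factor(processing_time, audio_seconds):.3f}")  # -> RTF ~ 0.576
```

In other words, roughly 5.4 s of speech took about 3.1 s to generate. An RTF below 1 is fine for batch use, but for a voice assistant the absolute latency before the first sound is what the user notices.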

I use a 12-core Ryzen 3000 processor.
But a processing time of 3 s is extremely high for my use case of generating just-in-time responses for my voice assistant.
I built a workaround which caches most WAV files, but this doesn't work when I generate responses with variables in the text.
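A caching workaround like the one described can be sketched as a disk cache keyed by a hash of the exact response text. This is a minimal illustration, not the poster's actual code: `synthesize` is a hypothetical stand-in for whatever calls the TTS engine. It also shows why variable text defeats the cache, since every distinct string is a miss that pays the full synthesis cost.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, List


class WavCache:
    """Disk cache of synthesized WAV bytes, keyed by a SHA-256 of the exact text."""

    def __init__(self, cache_dir: Path, synthesize: Callable[[str], bytes]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.synthesize = synthesize  # hypothetical: text -> WAV bytes

    def _path_for(self, text: str) -> Path:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.wav"

    def get(self, text: str) -> bytes:
        path = self._path_for(text)
        if path.exists():
            return path.read_bytes()  # cache hit: no synthesis needed
        audio = self.synthesize(text)  # cache miss: slow TTS call
        path.write_bytes(audio)
        return audio


# Demo with a fake synthesizer that records every text it is asked to speak.
calls: List[str] = []


def fake_synthesize(text: str) -> bytes:
    calls.append(text)
    return b"RIFF" + text.encode("utf-8")  # placeholder bytes, not a real WAV


with tempfile.TemporaryDirectory() as tmp:
    cache = WavCache(Path(tmp), fake_synthesize)
    first = cache.get("Guten Morgen")
    second = cache.get("Guten Morgen")  # served from disk
    cache.get("Es ist 7 Uhr")  # new text, so the TTS engine runs again
```

Fixed phrases are synthesized once, while a response like "Es ist 7 Uhr" misses the cache for every new time value.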

Also, I would love to use your new model. Is there a way to use it?


thorstenMueller commented on May 27, 2024

The new model is not released yet. I'll keep the community updated on the release date on Twitter or my YouTube channel.
I'd recommend taking a look at Larynx, as it's designed for devices with little compute power (like a Raspberry Pi), and my German voice is available there too.


thorstenMueller commented on May 27, 2024

> Also i would love to use your new model, is there a way to use it?

Hi @r4nc0r,
you can download the model and config from the @coqui-ai 0.7.0 prerelease here: https://github.com/coqui-ai/TTS/releases
Easy pip-based installation will follow once the final 0.7.0 is released.

> Keep watch for the release of Mimic 3

You can play around with the Mimic 3 beta using my German voice (and some more German voices), as mentioned by @synesthesiam: https://mycroft.ai/blog/mimic-3-preview/


thorstenMueller commented on May 27, 2024

As Mimic 3 has now been released, you can easily use it. You can watch this video on how to set it up and use it, and/or check the official documentation.

If you want to use Coqui TTS (a little slower, but better quality), you can do so with:

pip install tts==0.7.1
tts-server --model_name tts_models/de/thorsten/vits
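Once tts-server is running, a voice assistant can request audio over HTTP instead of shelling out. The sketch below assumes the server listens on localhost:5002 (believed to be its default port) and answers GET /api/tts?text=... with WAV audio; check your installed version if it differs. It also measures an end-to-end RTF, which includes HTTP overhead, so expect it to be slightly higher than the value printed by --show_details.

```python
import io
import time
import urllib.parse
import urllib.request
import wave
from typing import Tuple


def tts_url(text: str, host: str = "localhost", port: int = 5002) -> str:
    """Build the request URL (assumed endpoint: GET /api/tts?text=...)."""
    query = urllib.parse.urlencode({"text": text})
    return f"http://{host}:{port}/api/tts?{query}"


def wav_duration(wav_bytes: bytes) -> float:
    """Length of a WAV clip in seconds, read from its header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def synthesize(text: str, host: str = "localhost", port: int = 5002) -> Tuple[bytes, float]:
    """Fetch WAV audio from a running tts-server and return (audio, end-to-end RTF)."""
    start = time.perf_counter()
    with urllib.request.urlopen(tts_url(text, host, port)) as resp:
        audio = resp.read()
    elapsed = time.perf_counter() - start
    return audio, elapsed / wav_duration(audio)
```

Usage, with the server from the commands above running: `audio, rtf = synthesize("Hallo, wie geht es dir?")`, then write `audio` to a .wav file or hand it to your audio player.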

I'm closing this issue for now, but feel free to reopen it if you have further questions.
