
Open Text to Speech Server

Unifies access to multiple open source text to speech systems and voices for many languages.

Supports a subset of SSML that can use multiple voices, text to speech systems, and languages!

<speak>
  The 1st thing to remember is that 27 languages are supported in Open TTS as of 10/13/2021 at 3pm.

  <voice name="glow-speak:en-us_mary_ann">
    <s>
      The current voice can be changed, even to a different text to speech system!
    </s>
  </voice>

  <voice name="coqui-tts:en_vctk#p228">
    <s>Breaks are possible</s>
    <break time="0.5s" />
    <s>between sentences.</s>
  </voice>

  <s lang="en">
    One language is never enough
  </s>
  <s lang="de">
   Eine Sprache ist niemals genug
  </s>
  <s lang="ja">
    言語を一つは決して足りない
  </s>
  <s lang="sw">
    Lugha moja haitoshi
  </s>
</speak>

See the full SSML example (use the synesthesiam/opentts:all Docker image, which includes all voices)

Listen to voice samples

Web interface screenshot

Voices

  • Larynx
    • English (27), German (7), French (3), Spanish (2), Dutch (4), Russian (3), Swedish (1), Italian (2), Swahili (1)
    • Model types available: GlowTTS
    • Vocoders available: HiFi-Gan (3 levels of quality)
    • Patched embedded version of Larynx 1.0
  • Glow-Speak
    • English (2), German (1), French (1), Spanish (1), Dutch (1), Russian (1), Swedish (1), Italian (1), Swahili (1), Greek (1), Finnish (1), Hungarian (1), Korean (1)
    • Model types available: GlowTTS
    • Vocoders available: HiFi-Gan (3 levels of quality)
  • Coqui-TTS
    • English (110), Japanese (1), Chinese (1)
    • Patched embedded version of Coqui-TTS 0.3.1
  • nanoTTS
    • English (2), German (1), French (1), Italian (1), Spanish (1)
  • MaryTTS
    • English (7), German (3), French (4), Italian (1), Russian (1), Swedish (1), Telugu (1), Turkish (1)
    • Includes embedded MaryTTS
  • flite
    • English (19), Hindi (1), Bengali (1), Gujarati (3), Kannada (1), Marathi (2), Punjabi (1), Tamil (1), Telugu (3)
  • Festival
    • English (9), Spanish (1), Catalan (1), Czech (4), Russian (1), Finnish (2), Marathi (1), Telugu (1), Hindi (1), Italian (2), Arabic (2)
    • Spanish/Catalan/Finnish use ISO-8859-15 encoding
    • Czech uses ISO-8859-2 encoding
    • Russian is transliterated from Cyrillic to Latin script automatically
    • Arabic uses UTF-8 and is diacritized with mishkal
  • eSpeak
    • Supports a huge number of languages/locales, but sounds robotic

Running

Basic OpenTTS server:

$ docker run -it -p 5500:5500 synesthesiam/opentts:<LANGUAGE>

where <LANGUAGE> is one of:

  • all (All languages)
  • ar (Arabic)
  • bn (Bengali)
  • ca (Catalan)
  • cs (Czech)
  • de (German)
  • el (Greek)
  • en (English)
  • es (Spanish)
  • fi (Finnish)
  • fr (French)
  • gu (Gujarati)
  • hi (Hindi)
  • hu (Hungarian)
  • it (Italian)
  • ja (Japanese)
  • kn (Kannada)
  • ko (Korean)
  • mr (Marathi)
  • nl (Dutch)
  • pa (Punjabi)
  • ru (Russian)
  • sv (Swedish)
  • sw (Swahili)
  • ta (Tamil)
  • te (Telugu)
  • tr (Turkish)
  • zh (Chinese)

Visit http://localhost:5500

For the HTTP API test page, visit http://localhost:5500/openapi/

Exclude eSpeak (robotic voices):

$ docker run -it -p 5500:5500 synesthesiam/opentts:<LANGUAGE> --no-espeak

WAV Cache

You can have the OpenTTS server cache WAV files with --cache:

$ docker run -it -p 5500:5500 synesthesiam/opentts:<LANGUAGE> --cache

This will store WAV files in a temporary directory (inside the Docker container). A specific directory can also be used:

$ docker run -it -v /path/to/cache:/cache -p 5500:5500 synesthesiam/opentts:<LANGUAGE> --cache /cache
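The idea behind the cache can be pictured with a small sketch. This is illustrative only: the hash-keyed layout and the `WavCache` class below are assumptions for the sake of the example, not OpenTTS's actual cache format.

```python
import hashlib
import tempfile
from pathlib import Path

# Illustrative sketch only: a cache like this stores one WAV file per
# unique (voice, text) request, keyed by a hash of the request.
# OpenTTS's real on-disk layout may differ.
class WavCache:
    def __init__(self, cache_dir=None):
        # Default to a temporary directory, like running with bare --cache.
        self.dir = Path(cache_dir or tempfile.mkdtemp(prefix="opentts-cache-"))

    def _key(self, voice, text):
        return hashlib.sha256(f"{voice}|{text}".encode("utf-8")).hexdigest() + ".wav"

    def get(self, voice, text):
        path = self.dir / self._key(voice, text)
        return path.read_bytes() if path.exists() else None

    def put(self, voice, text, wav_bytes):
        (self.dir / self._key(voice, text)).write_bytes(wav_bytes)

cache = WavCache()
cache.put("espeak:en", "hello", b"RIFF...")
```

Repeated requests for the same voice and text then return the stored WAV instead of re-synthesizing.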

HTTP API Endpoints

See swagger.yaml

  • GET /api/tts
    • ?voice - voice in the form tts:voice (e.g., espeak:en)
    • ?text - text to speak
    • ?cache - disable WAV cache with false
    • Returns audio/wav bytes
  • GET /api/voices
    • Returns JSON object
    • Keys are voice ids in the form tts:voice
    • Values are objects with:
      • id - voice identifier for TTS system (string)
      • name - friendly name of voice (string)
      • gender - M or F (string)
      • language - 2-character language code (e.g., "en")
      • locale - lower-case locale code (e.g., "en-gb")
      • tts_name - name of text to speech system
    • Filter voices using query parameters:
      • ?tts_name - only text to speech system(s)
      • ?language - only language(s)
      • ?locale - only locale(s)
      • ?gender - only gender(s)
  • GET /api/languages
    • Returns JSON list of supported languages
    • Filter languages using query parameters:
      • ?tts_name - only text to speech system(s)
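As a quick client-side sketch, the /api/tts endpoint above can be called by building a URL from its query parameters. The `tts_url` helper below is illustrative, not part of OpenTTS:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def tts_url(base, voice, text, cache=True):
    """Build a GET /api/tts URL for an OpenTTS server at `base`."""
    params = {"voice": voice, "text": text}
    if not cache:
        # ?cache=false disables the WAV cache for this request
        params["cache"] = "false"
    return f"{base}/api/tts?{urlencode(params)}"

url = tts_url("http://localhost:5500", "espeak:en", "Hello, world!")
# With a running server, fetch the audio/wav bytes:
# wav_bytes = urlopen(url).read()
```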

SSML

A subset of SSML is supported:

  • <speak> - wrap around SSML text
    • lang - set language for document
  • <s> - sentence (disables automatic sentence breaking)
    • lang - set language for sentence
  • <w> / <token> - word (disables automatic tokenization)
  • <voice name="..."> - set voice of inner text
    • voice - name or language of voice
      • Name format is tts:voice (e.g., "glow-speak:en-us_mary_ann") or tts:voice#speaker_id (e.g., "coqui-tts:en_vctk#p228")
      • If one of the supported languages, a preferred voice is used (override with --preferred-voice <lang> <voice>)
  • <say-as interpret-as=""> - force interpretation of inner text
    • interpret-as one of "spell-out", "date", "number", "time", or "currency"
    • format - way to format text depending on interpret-as
      • number - one of "cardinal", "ordinal", "digits", "year"
      • date - string with "d" (cardinal day), "o" (ordinal day), "m" (month), or "y" (year)
  • <break time=""> - Pause for given amount of time
    • time - seconds ("123s") or milliseconds ("123ms")
  • <sub alias=""> - substitute alias for inner text

MaryTTS Compatible Endpoint

Use OpenTTS as a drop-in replacement for MaryTTS.

The voice format is <TTS_SYSTEM>:<VOICE_NAME>. Visit the OpenTTS web UI and copy/paste the "voice id" of your favorite voice into your configuration.

You may need to change the port in your docker run command to -p 59125:5500 for compatibility with existing software.
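A request to the MaryTTS-compatible endpoint can be sketched as below. The parameter names follow the standard MaryTTS HTTP API convention (GET /process with INPUT_TEXT, VOICE, and so on); verify the exact set your OpenTTS version accepts, and note that `marytts_url` is an illustrative helper, not part of OpenTTS:

```python
from urllib.parse import urlencode

def marytts_url(base, voice, text, locale="en_US"):
    """Build a MaryTTS-style /process URL (parameter names assumed
    from the MaryTTS HTTP API convention)."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
        "VOICE": voice,  # an OpenTTS voice id, e.g. "larynx:harvard"
    }
    return f"{base}/process?{urlencode(params)}"

url = marytts_url("http://localhost:59125", "larynx:harvard", "Hello")
```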

Larynx Voice Quality

On the Raspberry Pi, you may need to lower the quality of Larynx voices to get reasonable response times.

This is done by appending the quality level to the end of your voice:

tts:
  - platform: marytts
    voice: larynx:harvard;low

Available quality levels are high (the default), medium, and low.

Note that this only applies to Larynx and Glow-Speak voices.

Speaker ID

For multi-speaker models (currently just coqui-tts:en_vctk), you can append a speaker name or id to your voice:

tts:
  - platform: marytts
    voice: coqui-tts:en_vctk#p228

You can get the available speaker names from /api/voices or provide a 0-based index instead:

tts:
  - platform: marytts
    voice: coqui-tts:en_vctk#42
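The voice-string conventions above (tts:voice, an optional #speaker suffix, and an optional ;quality suffix for Larynx and Glow-Speak voices) can be parsed with a small sketch; `parse_voice` is a hypothetical helper, not OpenTTS's own parser:

```python
def parse_voice(voice_str):
    """Split a voice string of the assumed form
    tts_system:voice_name[#speaker][;quality]
    e.g. "coqui-tts:en_vctk#p228" or "larynx:harvard;low"."""
    quality = None
    speaker = None
    if ";" in voice_str:
        voice_str, quality = voice_str.rsplit(";", 1)
    if "#" in voice_str:
        voice_str, speaker = voice_str.rsplit("#", 1)
    system, _, name = voice_str.partition(":")
    return {"tts": system, "voice": name, "speaker": speaker, "quality": quality}

parse_voice("coqui-tts:en_vctk#p228")
# → {'tts': 'coqui-tts', 'voice': 'en_vctk', 'speaker': 'p228', 'quality': None}
```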

Default Larynx Settings

Default settings for Larynx can be provided on the command-line:

  • --larynx-quality - vocoder quality ("high", "medium", or "low", default: "high")
  • --larynx-noise-scale - voice volatility (0-1, default: 0.667)
  • --larynx-length-scale - voice speed (< 1 is faster, default: 1.0)

Building From Source

OpenTTS uses Docker buildx to build multi-platform images based on Debian bullseye.

Before building, make sure to download the voices you want to the voices directory. Each TTS system that uses external voices has a sub-directory with instructions on how to download voices.

If you only plan to build an image for your current platform, you should be able to run:

make <lang>

from the root of the cloned repository, where <lang> is one of the supported languages. If it builds successfully, you can run it with:

make <lang>-run

For example, the English image can be built and run with:

make en
make en-run

Under the hood, this does two things:

  1. Runs the configure script with --languages <lang>
  2. Runs docker buildx build with the appropriate arguments

You can manually run the configure script -- see ./configure --help for more options. This script generates the following files (used by the build process):

  • build_packages - Debian packages installed with apt-get during the build only
  • packages - Debian packages installed with apt-get for runtime
  • python_packages - Python packages installed with pip
  • .dockerignore - Files that docker will ignore during building ("!" inverts)
  • .dockerargs - Command-line arguments passed to docker buildx build

Multi-Platform images

To build an image for a different platform, you need to initialize a docker buildx builder:

docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
docker buildx create --config /etc/docker/buildx.conf --use --name mybuilder
docker buildx use mybuilder
docker buildx inspect --bootstrap

NOTE: For some reason, you have to do these steps each time you reboot. If you see errors like "Error while loading /usr/sbin/dpkg-split: No such file or directory", run docker buildx rm mybuilder and re-run the steps above.

When you run make, specify the platform(s) you want to build for:

DOCKER_PLATFORMS='--platform linux/amd64,linux/arm64,linux/arm/v7' make <lang>

You may place pre-compiled Python wheels in the download directory. They will be used during the installation of Python packages.


Contributors

adelriosantiago, alexbarcelo, dependabot[bot], drsensor, nagyrobi, synesthesiam

Issues

OpenBLAS Warning when using coqui-tts

Running OpenTTS in docker.

When using coqui-tts, the following message appears in the console and the application becomes unresponsive.

OpenBLAS Warning: Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option

Festival breaks on special characters

I was trying the Catalan Festival voice and it works well except on special characters (like à, é, í, ç...). The same misbehaviour also happens in Spanish. Other backends (espeak, nanotts) handle those same characters correctly.

These languages require those special characters. When they appear in a word, the TTS misbehaves and skips that letter.

Maybe it is related to Festival not supporting UTF-8? I found this link https://www.web3.lu/character-encoding-for-festival-tts-files/ but I know nothing about OpenTTS or Festival internals. If that is indeed the case, maybe the Festival backend needs to convert text from UTF-8 to ISO-8859-15? Does that make sense?

ValueError on ARM chip

Hi Michael. Your work allows me to install TTS with ease. There was no issue on an Intel chip, but recently I installed OpenTTS on an ARM-based VPS and problems showed up. While other voice ids work fine, the following error always occurs with coqui-tts:en_ljspeech, which IMO is the best voice.

Hope you might have time to take a look.

My current installation is here: http://168.138.190.231:5555/

Cheers.

Voice id: coqui-tts:en_ljspeech
ValueError: On entry to DLASCL parameter number 4 had an illegal value

MaryTTS : ValueError: invalid literal for int() with base 10: ''

[screenshot of the error]

It looks like MaryTTS crashes when asked to convert a large text. After the error in the screenshot above, if we click the "Speak" button again, we get this error message:

ConnectionResetError: Connection lost

Here's the piece of text I took from a random article to try out the voices:

Rassurez-vous, on peut aussi faire de très jolies photos à d’autres moments de la journée. Mais vous devrez sans doute composer avec les ombres. À midi par exemple, le soleil est à son zénith ce qui laisse apparaître beaucoup d’ombre, notamment sur les sujets humains ou les animaux. En revanche, cette lumière apporte du contraste sur les photos de paysages majestueux.

FileNotFoundError: [Errno 2] No such file or directory: '/home/opentts/app/VERSION'

Docker Image: synesthesiam/opentts:en-2.1

Docker container fails to run and produces the following error in the log:

Traceback (most recent call last):
  File "/home/opentts/app/app.py", line 50, in <module>
...
FileNotFoundError: [Errno 2] No such file or directory: '/home/opentts/app/VERSION'

I skimmed the Dockerfile and it seems as if it isn't configured to copy the VERSION file into the /home/opentts/app directory during the build. I'm not quite savvy enough to test/fix it, but it seems to me that might be where the problem is.

generated audio timestamps

i'm trying to use the generated audio for some automation.
Is there any way to ascertain something like word/character "timestamps" from the generation process? either would work.
Obviously the tts blends, it isn't sounding one character or one word at a time, but i'd imagine it still has to organise itself somehow.

Sorry i'm not too familiar with how tts engines work, hopefully that makes sense?

Still active?

Is this repo going to be maintained? It looks like the last updates were 3 years ago.

[Question] Guidance adding (coquitts) voices

Hi!
Thank you for all the work!

I wanted to ask for guidance on adding voices to OpenTTS. I understand that I would have to rebuild OpenTTS and its Docker image, as explained in the README. What I am unsure about is what the voice files are and how to add more voices.

In the case of Coqui-TTS, I gather that I should put the voice files in the voices/coqui-tts folder before building. Some voice files used in OpenTTS by default are part of the release files.

My main question is: which files from Coqui are needed to make up a voice? I am personally interested in adding a German voice to OpenTTS.

Larynx voices sometimes erroring

Hello, today I was trying to use one of the new larynx voices and got this traceback.

Traceback (most recent call last):
  File "/app/usr/local/lib/python3.7/site-packages/quart/app.py", line 1821, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/app/usr/local/lib/python3.7/site-packages/quart/app.py", line 1869, in dispatch_request
    return await handler(**request_.view_args)
  File "/app/app.py", line 371, in app_say
    use_cache=use_cache,
  File "/app/app.py", line 244, in text_to_wav
    line, voice_id, vocoder=vocoder, denoiser_strength=denoiser_strength
  File "/app/tts.py", line 1377, in say
    for _, audio in text_and_audios:
  File "/app/usr/local/lib/python3.7/site-packages/larynx/__init__.py", line 82, in text_to_speech
    sentence.tokens, word_indexes=word_indexes, word_breaks=True
  File "/app/usr/local/lib/python3.7/site-packages/gruut/phonemize.py", line 207, in phonemize
    for word, word_phonemes in self.predict(words=words_to_guess):
  File "/app/usr/local/lib/python3.7/site-packages/gruut/phonemize.py", line 247, in predict
    words, model_path=self.g2p_model_path, **kwargs
  File "/app/usr/local/lib/python3.7/site-packages/phonetisaurus/__init__.py", line 60, in predict
    phonetisaurus_cmd, env=env, universal_newlines=True
  File "/app/usr/local/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/app/usr/local/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['phonetisaurus-apply', '--model', '/app/usr/local/lib/python3.7/site-packages/gruut/data/en-us/g2p.fst', '--word_list', '/tmp/tmpyuws4fry.txt', '--nbest', '1']' returned non-zero exit status 127.
Exception ignored in: <function Wave_write.__del__ at 0x7f81e4ad1e60>
Traceback (most recent call last):
  File "/app/usr/local/lib/python3.7/wave.py", line 327, in __del__
    self.close()
  File "/app/usr/local/lib/python3.7/wave.py", line 445, in close
    self._ensure_header_written(0)
  File "/app/usr/local/lib/python3.7/wave.py", line 463, in _ensure_header_written
    raise Error('# channels not specified')
wave.Error: # channels not specified

BTW: Thank you so much for making this! It is amazing to have so many voices easily accessible.

An execution error occurs when certain strings are included.

An execution error occurs when the text contains certain strings, for example when a string such as "<" is included.

Error Message in Terminal

$ docker run -it -p 5500:5500 synesthesiam/opentts:ja

Traceback (most recent call last):
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/quart/app.py", line 1490, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/quart/app.py", line 1536, in dispatch_request
    return await self.ensure_async(handler)(**request_.view_args)
  File "/home/opentts/app/app.py", line 718, in app_say
    wav_bytes = await text_to_wav(
  File "/home/opentts/app/app.py", line 368, in text_to_wav
    wavs = [result async for result in wavs_gen]
  File "/home/opentts/app/app.py", line 368, in <listcomp>
    wavs = [result async for result in wavs_gen]
  File "/home/opentts/app/app.py", line 492, in ssml_to_wavs
    for sent_index, sentence in enumerate(
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/gruut/__init__.py", line 79, in sentences
    graph, root = text_processor(text, lang=lang, ssml=ssml, **process_args)
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/gruut/text_processor.py", line 439, in __call__
    return self.process(*args, **kwargs)
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/gruut/text_processor.py", line 490, in process
    root_element = etree.fromstring(f"<speak>{text}</speak>")
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1347, in XML
    parser.feed(text)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 10

Error Message in Devtools

GET http://0.0.0.0:5500/api/tts?voice=coqui-tts%3Aja_kokoro&lang=ja&vocoder=high&denoiserStrength=0.005&text=%3C%E6%A6%82%E8%A6%81&speakerId=&ssml=true&ssmlNumbers=true&ssmlDates=true&ssmlCurrency=true&cache=false 500 (Internal Server Error)

PermissionError Operation Not Permitted

I'm running the docker command docker run -it -p 5500:5500 synesthesiam/opentts:en --no-espeak but getting the following error:

Traceback (most recent call last):
  File "/app/app.py", line 46, in <module>
    _LOOP = asyncio.get_event_loop()
  File "/app/usr/local/lib/python3.7/asyncio/events.py", line 640, in get_event_loop
    self.set_event_loop(self.new_event_loop())
  File "/app/usr/local/lib/python3.7/asyncio/events.py", line 660, in new_event_loop
    return self._loop_factory()
  File "/app/usr/local/lib/python3.7/asyncio/unix_events.py", line 51, in __init__
    super().__init__(selector)
  File "/app/usr/local/lib/python3.7/asyncio/selector_events.py", line 54, in __init__
    super().__init__()
  File "/app/usr/local/lib/python3.7/asyncio/base_events.py", line 370, in __init__
    self._clock_resolution = time.get_clock_info('monotonic').resolution
PermissionError: [Errno 1] Operation not permitted
Exception ignored in: <function BaseEventLoop.__del__ at 0x765cf150>
Traceback (most recent call last):
  File "/app/usr/local/lib/python3.7/asyncio/base_events.py", line 625, in __del__
    warnings.warn(f"unclosed event loop {self!r}", ResourceWarning,
  File "/app/usr/local/lib/python3.7/asyncio/base_events.py", line 389, in __repr__
    f'<{self.__class__.__name__} running={self.is_running()} '
  File "/app/usr/local/lib/python3.7/asyncio/base_events.py", line 1805, in get_debug
    return self._debug
AttributeError: '_UnixSelectorEventLoop' object has no attribute '_debug'

I can see that line 46 in app.py is causing the error, but I have no idea why or how. Maybe it's something else?

Error when trying to use any larynx voices

Thank you for the amazing project / container, but I am getting the following error when trying to use any larynx voice using :latest (v2.1)

ERROR:opentts:/onnxruntime_src/onnxruntime/core/platform/posix/env.cc:183 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolInterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed, error code: 0 error msg:
Traceback (most recent call last):
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/quart/app.py", line 1490, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/quart/app.py", line 1536, in dispatch_request
    return await self.ensure_async(handler)(**request_.view_args)
  File "/home/opentts/app/app.py", line 718, in app_say
    wav_bytes = await text_to_wav(
  File "/home/opentts/app/app.py", line 368, in text_to_wav
    wavs = [result async for result in wavs_gen]
  File "/home/opentts/app/app.py", line 368, in <listcomp>
    wavs = [result async for result in wavs_gen]
  File "/home/opentts/app/app.py", line 469, in text_to_wavs
    line_wav_bytes = await tts.say(line, voice_id, **say_args)
  File "/home/opentts/app/tts.py", line 1288, in say
    for result in results:
  File "/home/opentts/app/larynx/__init__.py", line 88, in text_to_speech
    tts_model = get_tts_model(
  File "/home/opentts/app/larynx/__init__.py", line 300, in get_tts_model
    model = load_tts_model(voice_model_type, model_dir,)
  File "/home/opentts/app/larynx/__init__.py", line 337, in load_tts_model
    return GlowTextToSpeech(config)
  File "/home/opentts/app/larynx/glow_tts.py", line 25, in __init__
    self.onnx_model = onnxruntime.InferenceSession(
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 335, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 368, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
RuntimeError: /onnxruntime_src/onnxruntime/core/platform/posix/env.cc:183 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolInterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed, error code: 0 error msg:

I did find the following; I'm not sure if it's the actual issue, or how to apply the fix:

microsoft/onnxruntime#10113

[Request] Repository not updated

On Docker Hub you have updated the voice quality and features, but when I build the image from the GitHub Dockerfile, the voice quality is poor. Please update the repository.

Unable to build: .dockerargs: No such file or directory

Hi, I've made some source changes to give cache files human-readable names (rather than hashed file names). I'm now trying to build the project with make en so that those changes will take effect. But I'm getting the following error:

$ make en
./configure --language en
en
./configure: line 484: build_packages[@]: unbound variable
xargs < .dockerargs docker buildx build . -f Dockerfile  --output=type=docker --tag synesthesiam/opentts:en --tag synesthesiam/opentts:latest
/bin/sh: .dockerargs: No such file or directory
make: *** [en] Error 1

In the Makefile, I see the reference to .dockerargs but there doesn't seem to be a .dockerargs file in the directory.

I'm on MacOS 10.15.

How can I add a custom voice? Should I add it under /home/opentts/app/voices/?

Thanks for OpenTTS.

I'm curious about how to add more voices. I think adding a voice means creating a new folder and deploying some files, such as generator.onnx. For example, for the ko_kss voice there is /app/voices/glow-speak/ko_kss.

So my question is how to add custom voices. I think I have to create a generator.onnx file, but that's not the easy part.

Can anyone help with that?

MozillaTTS support removed in v2.1

In v2.1, support for MozillaTTS was removed, see 6de77a7 , file tts.py, lines 920ff.

As I do not see this mentioned in the CHANGELOG, I was wondering if this was intentional, and if yes, why?

OpenTTS provided a very handy way to use MozillaTTS with MaryTTS-compatible applications such as Home Assistant.

SSML doesn't work (for me?)

Hi,

I might be doing it wrong, but I'm trying to use SSML to add breaks to my text. I activated the SSML checkbox and wrapped everything in the <speak> tag. Then I added a <break> tag to my transcript. It gets totally ignored. Am I missing anything here?


Could not initialize NNPACK! Reason: Unsupported hardware

INFO:opentts:Synthesizing with coqui-tts:zh_baker (3 char(s))...

Using model: tacotron2
Model's reduction rate r is set to: 2
Vocoder Model: fullband_melgan
Generator Model: fullband_melgan_generator
Discriminator Model: melgan_multiscale_discriminator
INFO:opentts:Synthesizing with coqui-tts:zh_baker (9 char(s))...
Text splitted to sentences.
Text splitted to sentences.
['门开了.']
['开锁失败请再试一次.']
Building prefix dict from the default dictionary ...
DEBUG:jieba:Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
DEBUG:jieba:Loading model from cache /tmp/jieba.cache
Loading model cost 0.891 seconds.
DEBUG:jieba:Loading model cost 0.891 seconds.
Prefix dict has been built successfully.
DEBUG:jieba:Prefix dict has been built successfully.
[W NNPACK.cpp:80] Could not initialize NNPACK! Reason: Unsupported hardware.
ERROR:opentts:Sizes of tensors must match except in dimension 1. Got 15 and 39 in dimension 2 (The offending index is 1)
Traceback (most recent call last):
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/quart/app.py", line 1490, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/quart/app.py", line 1536, in dispatch_request
    return await self.ensure_async(handler)(**request_.view_args)
  File "/home/opentts/app/app.py", line 718, in app_say
    wav_bytes = await text_to_wav(
  File "/home/opentts/app/app.py", line 368, in text_to_wav
    wavs = [result async for result in wavs_gen]
  File "/home/opentts/app/app.py", line 368, in <listcomp>
    wavs = [result async for result in wavs_gen]
  File "/home/opentts/app/app.py", line 469, in text_to_wavs
    line_wav_bytes = await tts.say(line, voice_id, **say_args)
  File "/home/opentts/app/tts.py", line 1716, in say
    audio = await loop.run_in_executor(
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/opentts/app/TTS/utils/synthesizer.py", line 303, in tts
    outputs = synthesis(
  File "/home/opentts/app/TTS/tts/utils/synthesis.py", line 271, in synthesis
    outputs = run_model_torch(
  File "/home/opentts/app/TTS/tts/utils/synthesis.py", line 100, in run_model_torch
    outputs = _func(
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/opentts/app/TTS/tts/models/tacotron2.py", line 229, in inference
    decoder_outputs, alignments, stop_tokens = self.decoder.inference(
  File "/home/opentts/app/TTS/tts/layers/tacotron/tacotron2.py", line 397, in inference
    decoder_output, alignment, stop_token = self.decode(memory)
  File "/home/opentts/app/TTS/tts/layers/tacotron/tacotron2.py", line 314, in decode
    self.context = self.attention(
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/opentts/app/TTS/tts/layers/tacotron/attentions.py", line 322, in forward
    attention, _ = self.get_location_attention(query, processed_inputs)
  File "/home/opentts/app/TTS/tts/layers/tacotron/attentions.py", line 252, in get_location_attention
    attention_cat = torch.cat(
RuntimeError: Sizes of tensors must match except in dimension 1. Got 15 and 39 in dimension 2 (The offending index is 1)

(The same error and traceback are printed a second time for the second sentence.)

Original error was: libcblas.so.3: cannot open shared object file: No such file or directory

I use
docker run -it -p 5500:5500 synesthesiam/opentts:zh

Error
ImportError: IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for many reasons, often due to issues with your setup or how NumPy was installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.9 from "/home/opentts/app/.venv/bin/python3"
  * The NumPy version is: "1.20.3"

and make sure that they are the versions you expect. Please carefully study the documentation linked above for further help.

Original error was: libcblas.so.3: cannot open shared object file: No such file or directory

[CONTRIBUTION] Speech Dataset Generator

Hi everyone!

My name is David Martin Rius and I have just published this project on GitHub: https://github.com/davidmartinrius/speech-dataset-generator/

Now you can create datasets automatically from any audio file or list of audio files.

I hope you find it useful.

Here are the key functionalities of the project:

  1. Dataset Generation: Creation of multilingual datasets with Mean Opinion Score (MOS).

  2. Silence Removal: It includes a feature to remove silences from audio files, enhancing the overall quality.

  3. Sound Quality Improvement: It improves the quality of the audio when needed.

  4. Audio Segmentation: It can segment audio files within specified second ranges.

  5. Transcription: The project transcribes the segmented audio, providing a textual representation.

  6. Gender Identification: It identifies the gender of each speaker in the audio.

  7. Pyannote Embeddings: Utilizes pyannote embeddings for speaker detection across multiple audio files.

  8. Automatic Speaker Naming: Automatically assigns names to speakers detected in multiple audios.

  9. Multiple Speaker Detection: Capable of detecting multiple speakers within each audio file.

  10. Store speaker embeddings: The speakers are detected and stored in a Chroma database, so you do not need to assign a speaker name.

  11. Syllabic and words-per-minute metrics

Feel free to explore the project at https://github.com/davidmartinrius/speech-dataset-generator

David Martin Rius

Docker image with nanoTTS voices

I would like to use the nanoTTS voices because I think the quality for Italian is good,
but the "latest" and "all" images don't contain this component.
Only "it-2.1" works with nanoTTS.
Is there a pre-built image that contains all voices and all languages?

Add Festival Spanish voice

Hi, is there a manual for installing other voices? I ran OpenTTS via a Docker container, then installed the .deb Festival package with the new voice and tested it in the console, but after refreshing the OpenTTS site, neither the site nor the API updated the list of Spanish voices. I think I need to update some files to configure these new languages so they appear in the web interface.

Alternatively, here are the packages if you want to add them to OpenTTS (I am almost sure these files are open source):
https://github.com/guadalinex-archive/hispavoces

Standalone version for OS X

Sure would be cool for us non-coders to be able to use something besides MaryTTS's standalone from 2016. I can't stand the Mac OS X voices, especially since they are not available for commercial use.

Any way to queue operations?

I am using OpenTTS to dynamically generate audio files through an automation platform. When I call the API to generate multiple files at the same time, each file comes out distorted if the text is long. For smaller files this isn't a problem, but we will be scaling up to larger text generations, so it will become one. Is there a way to queue operations so that only one request is processed at a time and the current file isn't distorted?
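Assuming the distortion comes from concurrent requests hitting the server at once, one client-side workaround (the server itself does not expose a queue, as far as I can tell) is to serialize requests yourself so only one synthesis call is in flight at a time. A minimal sketch, where `synthesize` is a hypothetical stand-in for your actual HTTP call to `/api/tts`:

```python
# Client-side serialization sketch: a single worker thread drains a queue,
# so only one synthesis request runs at a time. `synthesize` is a
# hypothetical placeholder for your real HTTP call to /api/tts.
import queue
import threading

class SerialTTSQueue:
    def __init__(self, synthesize):
        self._synthesize = synthesize          # callable(text) -> audio bytes
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Worker loop: take one job, synthesize it, signal completion.
        while True:
            text, done = self._jobs.get()
            try:
                done["result"] = self._synthesize(text)
            finally:
                done["event"].set()
                self._jobs.task_done()

    def speak(self, text):
        """Block until this text has been synthesized (one job at a time)."""
        done = {"event": threading.Event(), "result": None}
        self._jobs.put((text, done))
        done["event"].wait()
        return done["result"]
```

Calls to `speak()` from multiple automation threads are then processed strictly one after another, so long texts never overlap on the server.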

PermissionError: [Errno 1] Operation not permitted

opentts_1  | Traceback (most recent call last):
opentts_1  |   File "/home/opentts/app/app.py", line 54, in <module>
opentts_1  |     _LOOP = asyncio.get_event_loop()
opentts_1  |   File "/usr/lib/python3.9/asyncio/events.py", line 639, in get_event_loop
opentts_1  |     self.set_event_loop(self.new_event_loop())
opentts_1  |   File "/usr/lib/python3.9/asyncio/events.py", line 659, in new_event_loop
opentts_1  |     return self._loop_factory()
opentts_1  |   File "/usr/lib/python3.9/asyncio/unix_events.py", line 54, in __init__
opentts_1  |     super().__init__(selector)
opentts_1  |   File "/usr/lib/python3.9/asyncio/selector_events.py", line 55, in __init__
opentts_1  |     super().__init__()
opentts_1  |   File "/usr/lib/python3.9/asyncio/base_events.py", line 397, in __init__
opentts_1  |     self._clock_resolution = time.get_clock_info('monotonic').resolution
opentts_1  | PermissionError: [Errno 1] Operation not permitted
opentts_1  | Exception ignored in: <function BaseEventLoop.__del__ at 0x76750f58>
opentts_1  | Traceback (most recent call last):
opentts_1  |   File "/usr/lib/python3.9/asyncio/base_events.py", line 681, in __del__
opentts_1  |     _warn(f"unclosed event loop {self!r}", ResourceWarning, source=self)
opentts_1  |   File "/usr/lib/python3.9/asyncio/base_events.py", line 419, in __repr__
opentts_1  |     f'closed={self.is_closed()} debug={self.get_debug()}>'
opentts_1  |   File "/usr/lib/python3.9/asyncio/base_events.py", line 1909, in get_debug
opentts_1  |     return self._debug

Trying to run on a Raspberry Pi 3. Here is my docker-compose:

  opentts:
    image: synesthesiam/opentts:fi
    restart: unless-stopped
    volumes:
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "5500:5500"
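For what it's worth, this kind of PermissionError on 32-bit ARM hosts like the Pi 3 is commonly caused by an older host libseccomp denying newer time-related syscalls (such as clock_gettime64) that recent Debian-based images use. Upgrading libseccomp2 on the host is the preferred fix; a blunter workaround, sketched below with the caveat that it disables syscall filtering for the container, is relaxing the seccomp profile in compose:

```yaml
  opentts:
    image: synesthesiam/opentts:fi
    restart: unless-stopped
    # Blunt workaround: disable the default seccomp profile so the
    # container's time-related syscalls are not denied by an old host
    # libseccomp. Prefer upgrading libseccomp2 on the host if possible.
    security_opt:
      - seccomp=unconfined
    volumes:
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "5500:5500"
```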

Fast synthesis for speech length estimation

Hi, thanks for the great software.

I was wondering if there is a way to scale down the voice quality (which is very good, by the way) to accelerate synthesis. I often use OpenTTS merely to estimate the length of a given spoken text, and for that I don't need the high-quality version. It currently takes quite some time to synthesize a 10-minute text. Any ideas?
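One lever that already exists in the API is the vocoder quality setting: Larynx voices ship HiFi-GAN vocoders at three quality levels, and `/api/tts` accepts a `vocoder` parameter (it appears as `vocoder=low` in other requests in this thread). A sketch of a length-estimation request, assuming a local server; the voice name is just one example:

```python
# Sketch: build a faster, lower-quality synthesis request for length
# estimation. Server URL and voice name are assumptions; `vocoder=low`
# trades audio quality for synthesis speed on Larynx voices.
from urllib.parse import urlencode

def tts_url(text, voice="larynx:cmu_aew-glow_tts", vocoder="low",
            base="http://localhost:5500/api/tts"):
    return base + "?" + urlencode({
        "voice": voice,
        "text": text,
        "vocoder": vocoder,   # "low" is the fastest of the three levels
    })

print(tts_url("Just need the duration of this one."))
```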

Cheers

Python Basics Example/Demo

Hi team,
I ended up here from browsing Hacker News, where many people were looking for open-source TTS software packages: https://news.ycombinator.com/item?id=34211457
I started having a go with OpenTTS but was significantly slowed down because I could not quickly find a basic Python example showing exactly how to get it up and running (i.e., reading "hello world" aloud in one of the many voices, from Python). Is there any chance such a thing could be added to the repo for people to build upon, rather than the current HTML-interface focus?
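In the meantime, here is a minimal "hello world" sketch against the HTTP API using only the standard library. It assumes a server running locally (e.g. `docker run -it -p 5500:5500 synesthesiam/opentts:en`); the voice name is one example from this thread, and `GET /api/voices` lists the rest:

```python
# Minimal OpenTTS client sketch: build the /api/tts URL, fetch the WAV
# bytes, write them to disk. Server URL and voice name are assumptions.
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://localhost:5500/api/tts"

def build_url(text, voice="larynx:cmu_aew-glow_tts", base=BASE):
    """Return the GET URL for synthesizing `text` with `voice`."""
    return base + "?" + urlencode({"voice": voice, "text": text})

def say_to_file(text, path="hello.wav", **kwargs):
    """Fetch synthesized speech from the server and save it as a WAV file."""
    with urlopen(build_url(text, **kwargs)) as resp:
        audio = resp.read()
    with open(path, "wb") as f:
        f.write(audio)

# say_to_file("Hello world")   # writes hello.wav; play it with any audio player
```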

mozilla-tts: how to build

I am currently trying to reproduce your Docker container, but my version sounds a lot more metallic than yours.
I copied out the model, but it does not sound as good.
Can you tell me which git commit you used?

How to install as an engine on Windows 10?

Hello. I want a third-party application to use a voice other than the defaults available on Windows 10.
I've been googling this but can't figure out a way to install anything beyond the frameworks and Docker containers. I need to install it into Windows so the application would give me the choice to use it.

docker container

hi
I am a pretty basic user. I installed Docker on my Mac, pulled the Docker image, and am running it.

I am getting the following error; can anyone help, please?
[error screenshot attached]

Integrating OpenTTS into an Android TTS engine

Hello!
I thought I'd raise this issue here, as it seems to fit. Be aware I'm quite new to working with or understanding source code.
I have been trying to integrate the OpenTTS API into an open-source Android system-wide TTS engine (tts-server-android). It was going well, except there seems to be a conflict when making an HTTP request to the OpenTTS API.

The application allows custom HTTP requests in this format:

"The format is the same as the Legado APP network TTS engine:
http://url, {"method":"POST", "body": "POST body. support using {{js code or variable}} "}

Built-in variables:

  • Text:{{speakText}}
  • Speed:{{speakSpeed}}
  • Volume:{{speakVolume}}

Baidu Example:
http://tsn.baidu.com/text2audio,{"method": "POST", "body": "tex={{encodeURI(speakText)}}&spd={{speakSpeed}}&per=4114&cuid=baidu_speech_demo&idx=1&cod=2&lan=zh&ctp=1&pdt=220&vol={{speakVolume}}&aue=6&pit=5&res_tag=audio"} "

I tried to make a custom HTTP request to the OpenTTS server running in Docker, using this URL:
http://192.168.0.226:5500/api/tts?voice=larynx%3Acmu_aew-glow_tts&text={{java.encodeURI(speakText)}}&vocoder=low&denoiserStrength=0&cache=true

Some raw inputs work and others seem to conflict with the syntax.

This does not work:

from an intelligence explosion (Good 1965): a process in which software based intelligent minds enter a runaway reaction of self improvement cycles, with each new and more intelligent generation appearing faster than its predecessor

log output:

Failed: (1) cc.l: Expected start of the object '{', but had 'EOF' instead at path: $ JSON input: %20with%20each%20new%20and%20more%20intelligent%20generation%20appearing%20faster%20than%20its%20predecessor.&vocoder=low&denoiserStrength=0&cache=true

This does work:

Part I of this volume is dedicated to essays which argue that progress in artificial intelligence and machine learning may indeed increase machine intelligence beyond that of any human being.

I'm curious to see what you think (or whether you notice an issue I can't seem to detect), as I strongly believe that if I can get this reliably integrated into this application, I will have a functioning, incredibly good quality TTS, which might encourage further development. So far, the text that does get parsed sounds incredible.

Additionally, here is a link to an issue I raised with the developer of the Android application as well. It has more detail specific to the application itself.
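A guess at the cause, from the log: the Legado-style config itself uses a comma to separate the URL from the JSON options, and the failing sample contains commas, so everything after the first comma in the substituted text gets parsed as JSON — which is exactly what the `JSON input: %20with%20each…` fragment shows. If the app's `encodeURI` helper follows JavaScript semantics, it leaves `,` unescaped; fully percent-encoding the text before substitution may avoid the clash. A Python sketch of the difference:

```python
# Sketch: fully percent-encode the text so a literal "," cannot be
# mistaken for the config-format separator. JavaScript's encodeURI
# leaves "," unescaped; quote(text, safe="") encodes it as "%2C".
from urllib.parse import quote

text = ("a runaway reaction of self improvement cycles, "
        "with each new and more intelligent generation "
        "appearing faster than its predecessor")

encoded = quote(text, safe="")   # "," -> "%2C", spaces -> "%20"
print(encoded)
```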
