voxpopuli's Introduction

Voxpopuli

A wrapper around Espeak and Mbrola.

This is a lightweight Python wrapper for Espeak and Mbrola, two co-dependent TTS tools. It lets you render sound by simply feeding it text and voice parameters. Phonemes (the data transmitted by Espeak to Mbrola) can also be manipulated using a minimalistic API.

This is a short introduction; for more detail, see the documentation on Read the Docs.

Install

These instructions should work on any Debian/Ubuntu derivative.

Install with pip as:

pip install voxpopuli

You have to have espeak and mbrola installed beforehand:

sudo apt install mbrola espeak

You'll also need some Mbrola voices installed. You can either get them from the Mbrola project page and unpack them in /usr/share/mbrola/<lang><voiceid>/ or, more simply, install them from the Ubuntu repos. All the voice packages are named mbrola-<lang><voiceid>. Simpler still, you can install all the available voices by running:

sudo apt install mbrola-*

In case the voices you need aren't all in the Ubuntu repos, you can use this convenient little script, which installs voices directly from Mbrola's voice repo:

# this installs all british english and french voices for instance
sudo python3 -m voxpopuli.voice_install en fr
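To check which voices ended up on disk, the directory layout described above can be scanned directly. This is a minimal sketch, not part of voxpopuli: the helper name list_installed_voices is hypothetical, and it assumes every voice lives in a directory named <lang><voiceid> (lowercase letters followed by digits) under /usr/share/mbrola.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: lowercase language code, then a numeric voice id ("fr1", "en1").
VOICE_DIR_PATTERN = re.compile(r"^([a-z]+)(\d+)$")


def list_installed_voices(root="/usr/share/mbrola"):
    """Map each language code to the sorted voice ids found under root."""
    root_path = Path(root)
    if not root_path.is_dir():
        return {}
    voices = defaultdict(list)
    for entry in root_path.iterdir():
        match = VOICE_DIR_PATTERN.match(entry.name)
        # Only directories count as voices; stray files are ignored.
        if match and entry.is_dir():
            lang, voice_id = match.groups()
            voices[lang].append(int(voice_id))
    return {lang: sorted(ids) for lang, ids in voices.items()}
```

Calling list_installed_voices() with no argument scans the default location; passing another path is handy when the voices were unpacked somewhere else.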

Usage

Picking a voice and making it say things

The simplest usage of this lib is bare TTS, using a voice and a text. The rendered audio is returned as a bytes object containing .wav data:

from voxpopuli import Voice
voice = Voice(lang="fr")
wav = voice.to_audio("salut c'est cool")

Evaluating type(wav) would return bytes. You can then save the wav by opening a file in wb (binary write) mode:

with open("salut.wav", "wb") as wavfile:
    wavfile.write(wav)
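Since the audio is plain bytes, it can also be inspected in memory without touching the disk. The sketch below uses only the standard library wave module; describe_wav is a hypothetical helper, and it assumes the bytes carry a standard PCM WAV header.

```python
import io
import wave


def describe_wav(wav_bytes):
    """Return basic facts read from the header of in-memory WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

With the wav object from the example above, describe_wav(wav) would report the channel count, sample rate and duration of the rendered speech.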

If you wish to hear how it sounds right away, you'll have to make sure you installed pyaudio via pip, and then do:

voice.say("Salut c'est cool")

You can also, say, use scipy to get the PCM audio as an ndarray:

from scipy.io.wavfile import read, write
from io import BytesIO

rate, wave_array = read(BytesIO(wav))
reversed_array = wave_array[::-1]  # reversing the sound file
write("tulas.wav", rate, reversed_array)

Getting different voices

You can set some parameters on the voice, such as language or pitch:

from voxpopuli import Voice
# really slow voice with high pitch
voice = Voice(lang="us", pitch=99, speed=40, voice_id=2)
voice.say("I'm high on helium")

The exhaustive list of parameters is:

  • lang, a language code among those available (us, fr, en, es, ...). You can list them using the listvoices method from a Voice instance.
  • voice_id, an integer, used to select the voice id for a language. If not specified, the first voice id found for a given language is used.
  • pitch, an integer between 0 and 99 (inclusive).
  • speed, an integer, in words per minute. The default and regular speed is 160 wpm.
  • volume, float ratio applied to the output sample. Some languages have presets that our best specialists tested. Otherwise, defaults to 1.
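The ranges listed above can be checked before a Voice is built. This is only an illustrative sketch: check_voice_params is a hypothetical helper, not a voxpopuli function. The pitch default of 50 is taken from the espeak command line quoted in the issues further down, and the requirements that speed be positive and volume non-negative are assumptions.

```python
def check_voice_params(pitch=50, speed=160, volume=1.0):
    """Raise ValueError if a value falls outside the documented ranges."""
    if not isinstance(pitch, int) or not 0 <= pitch <= 99:
        raise ValueError(f"pitch must be an integer in [0, 99], got {pitch!r}")
    # Assumption: a speed in words per minute only makes sense when positive.
    if not isinstance(speed, int) or speed <= 0:
        raise ValueError(f"speed must be a positive integer, got {speed!r}")
    # Assumption: volume is a ratio applied to the samples, so it cannot be negative.
    if volume < 0:
        raise ValueError(f"volume must be non-negative, got {volume!r}")
    return {"pitch": pitch, "speed": speed, "volume": float(volume)}
```

The returned dict could then be unpacked into the constructor, for example Voice(lang="us", **check_voice_params(pitch=99, speed=40)).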

Handling the phonemic form

To render a string of text to audio, the Voice object actually chains espeak's output to mbrola, which then renders it to audio. Espeak only converts the text to a list of phonemes (such as the ones in the IPA), which are then processed by mbrola. For those who like pictures, here is a diagram of what happens when you run voice.to_audio("Hello world"):

[Diagram: the text goes to espeak, which emits phonemes that mbrola renders to audio]

Phonemes are represented sequentially by a code, a duration in milliseconds, and a list of pitch modifiers. The pitch modifiers are a list of pairs, each pair giving the percentage of the sample at which to apply the pitch modification, followed by the pitch.
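To make that layout concrete, here is a minimal parser for a single phoneme line. It is a sketch and not voxpopuli's own phoneme class: PhoLine and parse_pho_line are hypothetical names, and the code assumes the whitespace-separated form "code duration position pitch position pitch ..." with integer values.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PhoLine:
    name: str
    duration_ms: int
    # Each pair is (percentage of the sample, pitch).
    pitch_mods: List[Tuple[int, int]] = field(default_factory=list)


def parse_pho_line(line):
    """Parse one phoneme line such as "a 100 0 120 50 130"."""
    parts = line.split()
    name, duration = parts[0], int(parts[1])
    numbers = [int(value) for value in parts[2:]]
    # Group the remaining numbers two by two into (position, pitch) pairs.
    pairs = list(zip(numbers[0::2], numbers[1::2]))
    return PhoLine(name, duration, pairs)
```

For example, parse_pho_line("a 100 0 120 50 130") describes a phoneme "a" lasting 100 ms, with the pitch set to 120 at the start of the sample and to 130 at its midpoint.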

Funny thing is, with voxpopuli, you can "intercept" that phoneme list as a simple object, modify it, and then pass it back to the voice to render it to audio. For instance, let's make a simple alteration that triples the duration of each vowel in an English text.

from voxpopuli import Voice, BritishEnglishPhonemes

voice = Voice(lang="en")
# here's how you get the phonemes list
phoneme_list = voice.to_phonemes("Now go away or I will taunt you a second time.")
for phoneme in phoneme_list:  # the phoneme list object inherits from list
    if phoneme.name in BritishEnglishPhonemes.VOWELS:
        phoneme.duration *= 3
        
# rendering and saving the sound, then saying it out loud:
voice.to_audio(phoneme_list, "modified.wav")
voice.say(phoneme_list)

Notes:

  • For French, Spanish, German and Italian, the phoneme codes used by espeak and mbrola are available as class attributes, in classes similar to the BritishEnglishPhonemes class used above.
  • More info on the phonemes can be found here: SAMPA page

What's left to do

  • Moar unit tests
  • Maybe some examples

voxpopuli's People

Contributors

bakszero, gnomeddev, hadware, klvbdmh

voxpopuli's Issues

to_phonemes() only parses the first word in a sentence [Windows]

Code that reproduces the issue

Straight from the unit tests:

from voxpopuli import Voice

voice = Voice(lang="en")
print(voice.to_phonems("Salut les amis").phonemes_str)

Expected behavior

'Salut les amis'
salylezami__

Observed behavior

'Salut les amis'
saly__

Comments

Another espeak parsing issue that happens only on Windows. A phrase with spaces has to be enclosed in double quotes. So instead of

"C:\Program Files (x86)\eSpeak\command_line\espeak" -s 160 -p 50 --pho -q -v mb-fr1 'Salut les amis'

it has to be

"C:\Program Files (x86)\eSpeak\command_line\espeak" -s 160 -p 50 --pho -q -v mb-fr1 "Salut les amis"

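The difference between the two command lines can be reproduced from Python. This sketch reuses the flags quoted above; build_espeak_args is a hypothetical helper, and subprocess.list2cmdline is an undocumented but long-standing standard library function that applies the Windows quoting rules.

```python
import shlex
import subprocess


def build_espeak_args(text, voice="mb-fr1", speed=160, pitch=50):
    """Build the argument list for the espeak call quoted in this issue."""
    return ["espeak", "-s", str(speed), "-p", str(pitch), "--pho", "-q", "-v", voice, text]


text = "Salut les amis"

# POSIX-style quoting wraps the phrase in single quotes, which the Windows
# command line does not treat as quoting characters.
posix_quoted = shlex.quote(text)

# The Windows rules wrap an argument containing spaces in double quotes.
windows_cmdline = subprocess.list2cmdline(build_espeak_args(text))

print(posix_quoted)
print(windows_cmdline)
```

Passing the argument list straight to subprocess.run, without shell=True, leaves the quoting to Python on both platforms, which would sidestep the problem.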
mbrola languages (Windows support)

Since mbrola and espeak both provide Windows binaries, I'd like to give voxpopuli a shot and see if I can make it work on Windows.

However, the mbrola project website seems to be unavailable and I can download neither the binary nor the language files. Is there any mirror that hosts those files?

ImportError: cannot import name 'EnglishPhonemes' from 'voxpopuli'

Comments

I see that the latest version of phonemes.py file does not contain a class for EnglishPhonemes, but rather only a class for BritishEnglishPhonemes which seems to be the cause for an error when importing EnglishPhonemes as follows:

from voxpopuli import EnglishPhonemes
ImportError: cannot import name 'EnglishPhonemes' from 'voxpopuli' (/home/$USER/anaconda3/lib/python3.7/site-packages/voxpopuli/__init__.py)

Possible Solution

Either the class must be renamed or the README must be updated to reflect this. I can send in a pull request for the latter but since you mention the following:

For French, Spanish, American English, British English and german, the phoneme codes used by espeak and mbrola are available as class attributes like in the Englishphonemes class used before.

I'm assuming you have some design decision w.r.t EnglishPhonemes class, would love to know! Great piece of work btw! :D

Mandarin no output

I have installed the Mandarin voice and am trying to use it with voxpopuli. The language file is installed correctly in /usr/share/mbrola/cn1.

However, there is no output for Mandarin no matter what I do. Would appreciate some help here. Thanks

from voxpopuli import Voice
voice = Voice(lang="cn")
wav = voice.to_audio("你好")

Can't generate English phonemes from some words containing letter N

Code that reproduces the issue

from voxpopuli import Voice

voice = Voice(lang="en")
print(voice.to_phonems("second").phonemes_str)

Expected behavior

The phonemes list is printed

Observed behavior

IndexError is raised:

Traceback (most recent call last):
  File "D:/Dev/voicesynth/speakeasy.py", line 6, in <module>
    print(voice.to_phonems("second").phonemes_str)
  File "D:\Dev\voicesynth\voxpopuli\main.py", line 187, in to_phonems
    return self._str_to_phonems(quote(text))
  File "D:\Dev\voicesynth\voxpopuli\main.py", line 156, in _str_to_phonems
    .decode("utf-8")
  File "D:\Dev\voicesynth\voxpopuli\phonems.py", line 40, in __init__
    super().__init__([Phonem.from_str(pho_str) for pho_str in pho_str_list.split("\n") if pho_str])
  File "D:\Dev\voicesynth\voxpopuli\phonems.py", line 40, in <listcomp>
    super().__init__([Phonem.from_str(pho_str) for pho_str in pho_str_list.split("\n") if pho_str])
  File "D:\Dev\voicesynth\voxpopuli\phonems.py", line 27, in from_str
    name = split_pho.pop(0)  # type:str
IndexError: pop from empty list

Comments

Interestingly, it works when I change language to French or when I try the word corner in English.
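One plausible cause, which the issue does not confirm, is a whitespace-only line in espeak's output. On Windows, a stray "\r" passes the `if pho_str` filter shown in the traceback but splits into an empty list, so pop(0) fails. The sketch below shows that failure mode on made-up sample output, together with a stricter filter; split_pho_lines is a hypothetical helper.

```python
def split_pho_lines(raw_output):
    """Return the non-blank lines of phoneme output, without line endings."""
    return [line.strip() for line in raw_output.splitlines() if line.strip()]


# Made-up sample with Windows line endings and blank lines in between.
sample = "s 80\r\n\r\nE 60 0 110\r\n \r\n"

# The filter from the traceback keeps "\r" and " \r" because they are non-empty.
naive = [line for line in sample.split("\n") if line]

# Lines that would make split_pho.pop(0) raise IndexError.
unparseable = [line for line in naive if not line.split()]

robust = split_pho_lines(sample)
```

Here naive still contains two whitespace-only entries, while robust keeps only the two real phoneme lines.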

Add tests to PyPI package

Please add the tests to the PyPI package. I'm packaging this for Alpine Linux and use the tagged release from PyPI, but those releases don't include the tests, so we can't verify at build time that the package works.
