Comments (19)

klvbdmh commented on June 12, 2024

Ok, the first obvious problem is that the mbrola_voices_folder variable in the Voice class is hardcoded to /usr/share/mbrola. A folder like this doesn't exist on Windows. In fact, there's no default location for voice databases on Windows; it's left up to the user. So anyone who wants to use voxpopuli on Windows needs to provide their own path. I've done it with:

from voxpopuli import Voice
Voice.mbrola_voices_folder = r'D:\Downloads\mbrola-voices'  # raw string, so the backslashes stay literal

The next problem (kind of a recurring theme) is that the private methods in the Voice class related to mbrola and espeak use hardcoded command strings. Since espeak and mbrola aren't likely to be on the Windows PATH, we need to spell out the whole paths to the executables. I think using the default paths created during installation (C:\Program Files (x86)\Mbrola Tools and C:\Program Files (x86)\eSpeak\command_line\espeak) is a sensible approach.
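
A minimal sketch of that lookup, assuming the executables are either on PATH or in one of the install directories quoted above. The function name find_binary and the extra_dirs parameter are made up for illustration and are not part of voxpopuli:

```python
import os
import shutil


def find_binary(name, extra_dirs=()):
    """Return the full path of `name`, or None if it cannot be found.

    Hypothetical helper: PATH is searched first, then the directories
    in `extra_dirs` (for example the Windows installer defaults).
    """
    found = shutil.which(name)
    if found is None and extra_dirs:
        # shutil.which accepts an explicit search path string.
        found = shutil.which(name, path=os.pathsep.join(extra_dirs))
    return found


# Example call with the install directories mentioned in this comment.
espeak_path = find_binary(
    "espeak", [r"C:\Program Files (x86)\eSpeak\command_line"]
)
```

If nothing is found the helper returns None, so the caller can raise a clear error instead of failing later inside a subprocess call.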

Another Windows-specific issue is that mbrola.exe doesn't exist. Mbrola Tools ships a bunch of executables; two of them are GUIs, which are not that interesting. phoplayer.exe is a command line client, and I was able to successfully generate a .wav file from a .pho file. As far as I tested, espeak works without any problems.

I've noticed that most of the run commands are formatted as strings, which makes them less flexible and harder to maintain. I think using an argument list adds clarity and makes it easier to include platform-specific settings like MALLOC_CHECK_ (btw, a comment in the source code explaining why it's required would be great).
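
To illustrate the argument-list idea, here is a rough sketch. The function names are hypothetical, and the mbrola flags (-v for volume ratio, -e to ignore unknown diphones, "-" for stdin, "-.wav" for WAV on stdout) are my understanding of the mbrola command line rather than something confirmed in this thread:

```python
import os
import sys


def build_mbrola_command(voice_db, volume=1.0, binary="mbrola"):
    """Build the mbrola invocation as a list, one element per argument."""
    return [binary, "-v", str(volume), "-e", voice_db, "-", "-.wav"]


def build_env(platform=sys.platform, base=None):
    """Return the environment for the child process.

    MALLOC_CHECK_ is a glibc setting, so it is only added on Linux.
    """
    env = dict(os.environ if base is None else base)
    if platform.startswith("linux"):
        env["MALLOC_CHECK_"] = "0"
    return env
```

Both results can be handed to subprocess.run(cmd, env=env, ...) without any shell quoting, which also sidesteps the spaces in paths such as "Program Files (x86)".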

I've also noticed that a functionality of _str_to_audio method is duplicated - it's basically the same as _str_to_phonems and _phonems_to_audio. Can we combine those two in Python instead of piping the outputs?

Anyway, I managed to fork the project locally and played with it a bit. So far I got TestStrToPhonems tests to pass (except test_german, for some reason de1 mbrola voice file is not detected - but de2 is). Can you check my fork and see if there's no regression on Linux?

from voxpopuli.

hadware commented on June 12, 2024

Ooh. My first contributor. Alright, first, I really appreciate your help. However, I've always programmed Python on Linux, so for me, Windows porting is dark magic.

I can try to set up a Python dev env on Windows to help you, but I really don't know how the subprocess library behaves on Windows, nor how the PyAudio sound backend does, so those two things should be considered.
Otherwise, well, since voxpopuli is basically a wrapper for espeak and mbrola I/O, everything should be the same.

EDIT: mbrola's website seems to be down...

hadware commented on June 12, 2024

Here are the windows binaries, btw: http://tcts.fpms.ac.be/synthesis/mbrola/bin/pcwin/MbrolaTools35.exe

klvbdmh commented on June 12, 2024

Oh thanks! Yeah, the site was down but it looks like it works now. I'll download the binaries and language files and I'll give it a go later (and of course I'll report my findings).

hadware commented on June 12, 2024

Nice. I'll set up a python dev env to test out your discoveries :)

hadware commented on June 12, 2024

Aw, well, I was a bit too tired and I merged your fork without looking at this very exhaustive message. I'm sorry for that 🤕. It doesn't matter much anyway, since I haven't pushed the current version to the PyPI repo. I'm sorry, it's my first package, and I'm a bit clumsy with these things.

I'm currently fixing some regressions that have been introduced, but nothing bad, don't worry.

Concerning the problems linked to the windows version of mbrola, I truly have no clue about what to do. I should be able to set up a python windows env by the end of the week to see what I can do.

I've also noticed that a functionality of _str_to_audio method is duplicated - it's basically the same as _str_to_phonems and _phonems_to_audio. Can we combine those two in Python instead of piping the outputs?

Oh, that's on purpose. It's a small optimisation. Since, most of the time, people don't care about phonems, and just want some audio out of some text (I have some very strong statistics to support this claim!), I figured it's more efficient to have this pipeline:

python -> espeak -> mbrola -> python

instead of this pipeline (which is used when editing phonems):

python -> espeak -> python -> mbrola -> python

It avoids parsing espeak's output into a phonem object only to immediately serialize it back to a string. Hope this makes sense to you.
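
The two pipelines could be sketched roughly as follows. This is not the actual voxpopuli code: the function names are invented, and the espeak flags (--pho to emit mbrola phoneme data, -q to suppress playback) and mbrola arguments are assumptions on my part:

```python
import subprocess


def espeak_cmd(text, voice="mb-fr1"):
    # Assumed flags: --pho prints mbrola phoneme data, -q plays nothing.
    return ["espeak", "-v", voice, "--pho", "-q", text]


def mbrola_cmd(voice_db):
    # Assumed arguments: read phonemes from stdin, write WAV to stdout.
    return ["mbrola", "-e", voice_db, "-", "-.wav"]


def text_to_audio_direct(text, voice, voice_db):
    """python -> espeak -> mbrola -> python (OS-level pipe)."""
    espeak = subprocess.Popen(espeak_cmd(text, voice), stdout=subprocess.PIPE)
    mbrola = subprocess.Popen(
        mbrola_cmd(voice_db), stdin=espeak.stdout, stdout=subprocess.PIPE
    )
    espeak.stdout.close()  # mbrola now owns the read end of the pipe
    audio, _ = mbrola.communicate()
    espeak.wait()
    return audio


def text_to_audio_via_python(text, voice, voice_db, edit=lambda pho: pho):
    """python -> espeak -> python -> mbrola -> python."""
    pho = subprocess.run(
        espeak_cmd(text, voice), stdout=subprocess.PIPE, check=True
    ).stdout
    pho = edit(pho)  # hook where the phonemes could be parsed and modified
    return subprocess.run(
        mbrola_cmd(voice_db), input=pho, stdout=subprocess.PIPE, check=True
    ).stdout
```

The second variant only differs by the round trip through Python, which is where the extra parsing and serializing cost would come from.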

klvbdmh commented on June 12, 2024

It totally makes sense. At the same time it looks like premature optimization to me. How big are the gains exactly? Also it leads to code duplication - we have big chunks of code doing exactly the same thing.

In other news, I got a working .wav file with voice.to_audio()! I had to add .wav to the stdout parameter. Otherwise the produced wav files were unplayable. However, I noticed this warning in mbrola readme:

Never use .wav format when you pipe the ouput (mbrola can't rewind the
file to write the audio size in the header). Wav format was not
developped for Unix (on the contrary Au format let you specify in the
header "we're on a pipe, read until end of file").

Can you check if it's going to be a problem?

Also, I can't get de1 database to work on my system no matter what, so I can't complete two tests. I even downloaded the Ubuntu package and extracted voice files from it (they were the same).

hadware commented on June 12, 2024

I have no idea about the de1 database either. Maybe try getting a trace of the error somehow (although I don't think mbrola has a --verbose option).

Regarding the .wav stuff, I have to admit I copied someone else's code to "package" mbrola's output into a "real" wav file (cf. the _wav_format method). I also figured that, even if the .wav format is old and comes from Microsoft, it works perfectly on Linux, and since it's uncompressed, it leaves the choice of final format to the user. Moreover, wav is widely used (.au and .aiff are mostly used by audio professionals, if I'm not mistaken). Finally, we've been using mbrola's output (as wav) in this exact format very intensively for about a year on https://loult.family/, and it has proved reliable.

Your concern is a valid one though.

We could, however, add an output_format option that defaults to wav.

PS: it'd be good though if we could comment that mysterious line from _wav_format so regular people understand it. I'll probably investigate.

klvbdmh commented on June 12, 2024

I can't even run it from the command line. It simply says Failed to read voice 'mb-de1'.

I see. So there's no problem with using wav as default.

Fully agree on commenting the _wav_format. Perhaps WAVE PCM soundfile format could be a good start.
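
For reference, this is roughly what such a header fix looks like. It is a sketch under the assumption that mbrola emits the canonical 44-byte PCM header (RIFF size at byte offset 4, data size at offset 40); it is not the actual _wav_format code, which isn't shown in this thread, and fix_wav_sizes is a made-up name:

```python
import struct

HEADER_SIZE = 44  # canonical PCM WAV header: RIFF + fmt chunk + data tag


def fix_wav_sizes(raw):
    """Rewrite the two size fields that a piped writer could not fill in."""
    if raw[:4] != b"RIFF" or raw[8:12] != b"WAVE" or raw[36:40] != b"data":
        raise ValueError("not a canonical 44-byte-header PCM WAV")
    data_size = len(raw) - HEADER_SIZE
    patched = bytearray(raw)
    # RIFF chunk size: everything after the first 8 bytes.
    patched[4:8] = struct.pack("<I", data_size + 36)
    # data chunk size: the raw sample bytes only.
    patched[40:44] = struct.pack("<I", data_size)
    return bytes(patched)
```

Both fields are little-endian unsigned 32-bit integers, hence the "<I" format string.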

hadware commented on June 12, 2024

Nice, I'm going to dig into it to figure out how that bytes-packing sorcery works. It's the kind of stuff I tend to like.

Regarding the wav discussion, I think keeping only wav is a good option for now, but if someone asks for more formats, we could add those. In the meantime, I'm probably going to add some examples of how to use that .wav bytes object to do other kinds of stuff, a bit like in the README.md.

Regarding the mb-de1 problem, is the error coming from espeak or from mbrola?

Thanks again for your help.

EDIT: I figured out how the bytes-packing sorcery works and, more importantly, thanks to your quote from the mbrola readme, why exactly the dude who wrote this code did it that way. I'll add comments to _wav_format as soon as possible.

klvbdmh commented on June 12, 2024

Good call on the examples. And it's great that you figured out the bytes method.

The error is coming from espeak.

hadware commented on June 12, 2024

Hm. I'll check again on linux this evening, see if it's related to windows.

klvbdmh commented on June 12, 2024

Ok, we're almost there. There's one more problem with running tests on Windows. For some reason I still fail all ToAudio tests. They all raise AssertionError when comparing bytes of .wav files. Example:

Failure
Traceback (most recent call last):
  File "D:\lib\env\py35\lib\unittest\case.py", line 58, in testPartExecutor
    yield
  File "D:\lib\env\py35\lib\unittest\case.py", line 600, in run
    testMethod()
  File "D:\Dev\voicesynth\tests\tests.py", line 65, in test_en
    self.assertEqual(wavfile.read(), wav_byte)
  File "D:\lib\env\py35\lib\unittest\case.py", line 820, in assertEqual
    assertion_func(first, second, msg=msg)
  File "D:\lib\env\py35\lib\unittest\case.py", line 813, in _baseAssertEqual
    raise self.failureException(msg)
AssertionError: b'RIF[6565 chars]03\x9d\x03a\x03\x06\x03$\x03G\x03\x14\x04\xe5\[156127 chars]\x00' != b'RIF[6565 chars]03\x9e\x03a\x03\x06\x03$\x03G\x03\x14\x04\xe5\[156127 chars]\x00'
Failure
Traceback (most recent call last):
  File "D:\lib\env\py35\lib\unittest\case.py", line 58, in testPartExecutor
    yield
  File "D:\lib\env\py35\lib\unittest\case.py", line 600, in run
    testMethod()
  File "D:\Dev\voicesynth\tests\tests.py", line 39, in test_salut
    self.assertEqual(wavfile.read(), wav_byte)
  File "D:\lib\env\py35\lib\unittest\case.py", line 820, in assertEqual
    assertion_func(first, second, msg=msg)
  File "D:\lib\env\py35\lib\unittest\case.py", line 813, in _baseAssertEqual
    raise self.failureException(msg)
AssertionError: b'RIFF2\xad\x00\x00WAVEfmt \x10\x00\x00\x00\x01[159028 chars]\x00' != b'RIFF\x00}\x00\x00WAVEfmt \x10\x00\x00\x00\x01[114443 chars]\x00'
-------------------- >> begin captured stdout << ---------------------
'Salut les amis'

--------------------- >> end captured stdout << ----------------------

Interestingly, voice.say("Salut les amis") plays the sentence without any problems.
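
One way to see how far apart the two files really are, instead of a raw assertEqual on the bytes, is to parse both with the standard wave module. This is only a diagnostic sketch with invented names, and it assumes both byte strings carry valid WAV headers:

```python
import io
import wave


def describe_wav(raw):
    """Return (params, frame_bytes) for WAV data held in memory."""
    with wave.open(io.BytesIO(raw), "rb") as wav:
        params = wav.getparams()
        frames = wav.readframes(wav.getnframes())
    return params, frames


def wav_diff(a, b):
    """Summarize how two in-memory WAV files differ."""
    params_a, frames_a = describe_wav(a)
    params_b, frames_b = describe_wav(b)
    return {
        # channels, sample width and frame rate
        "same_format": params_a[:3] == params_b[:3],
        "frames": (params_a.nframes, params_b.nframes),
        # byte positions that differ within the common length
        "differing_bytes": sum(x != y for x, y in zip(frames_a, frames_b)),
    }
```

For the first failure above this would show whether only a handful of sample bytes differ (such as the \x9d vs \x9e visible in the output), while for test_salut the frame counts themselves would differ.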

hadware commented on June 12, 2024

I think I know what causes these errors: it probably has to do with the quoting you introduced to fix your previous error on Windows. I'll have to test this at home this evening.

Btw, I had some problems installing PyAudio on Travis, but it's fixed. Once the unit tests are OK on Ubuntu 14.04, we should be good to go for a 0.2 release. \o/

klvbdmh commented on June 12, 2024

Awesome news about travis!

The error happened before I made those changes too (and I confirmed it by reverting to an old version on my local repo).

I added a default voice folder location on Windows (it will be empty at first, of course).

Also, I figured out the cause of my mb-de1 problem. My espeak installation didn't have mb-de1 file in eSpeak\espeak-data\voices\mb. Simply copying and renaming mb-de2 solved it. It's very peculiar; I checked the latest source version (1.48.04) from SourceForge and it is indeed missing from there too. Do you have that file on your Linux installation?
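
A quick way to compare installations is to list what is actually in that directory. A small sketch, where available_mb_voices is a hypothetical helper and the espeak-data location has to be supplied by the caller:

```python
import os


def available_mb_voices(espeak_data_dir):
    """List the mbrola voice definitions shipped with an espeak install."""
    mb_dir = os.path.join(espeak_data_dir, "voices", "mb")
    if not os.path.isdir(mb_dir):
        return []
    return sorted(name for name in os.listdir(mb_dir) if name.startswith("mb-"))
```

Calling it with the espeak-data folder of each machine and checking for "mb-de1" in the result would show whether the file is missing everywhere or only in some packages.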

hadware commented on June 12, 2024

Sorry, I was away for some time. I checked my installation on an Ubuntu 14.04 machine, and it did not have the de2 mbrola voice file. You can see all of those in this listing:
http://packages.ubuntu.com/trusty/amd64/espeak-data/filelist (directory /voice/mb/).

This is probably a problem to be solved using the self.sex variable, which I should probably rename to self.espeak_voice_id. I also more or less copied the weird logic around espeak voices from some other code, and it's a bit tricky to understand what happens there. Maybe this should also be clarified.

PS: I've just noticed that, in theory, it's possible to make espeak say stuff in Greek with a German accent: /usr/lib/x86_64-linux-gnu/espeak-data/voices/mb/mb-de6-grc . How ironic is that?

klvbdmh commented on June 12, 2024

Then how are you able to pass the tests with de1 voice?

Rachine commented on June 12, 2024

mbrola_voices_folder should be configurable, because without sudo rights it's impossible for ordinary users to install and use voices, even on Linux.

PierreOrhan commented on June 12, 2024

For anyone looking for Windows support in 2023: one can install mbrola and espeak through WSL (Windows Subsystem for Linux), the Linux environment provided by Microsoft. Here I was running Python from the Windows side, but simply calling the Linux mbrola and espeak. (I guess one could also do everything inside WSL, which would require no change to this software.)

Effectively, one just needs to change a few lines of code in this library:

    if platform in ('linux', 'darwin'):
        espeak_binary = 'espeak'
        mbrola_binary = 'mbrola'
        mbrola_voices_folder = "/usr/share/mbrola"
    elif platform == 'win32':
        # If the path has spaces it needs to be enclosed in double quotes.
        espeak_binary = '"C:\\Program Files (x86)\\eSpeak\\command_line\\espeak"'
        mbrola_binary = '"C:\\Program Files (x86)\\Mbrola Tools\\mbrola"'
        mbrola_voices_folder = os.path.expanduser('~\\.mbrola\\')
        if not os.path.exists(mbrola_voices_folder):
            os.makedirs(mbrola_voices_folder)
        if not os.path.exists(espeak_binary):
            espeak_binary = "wsl espeak"
            mbrola_binary = "wsl MALLOC_CHECK_=0 mbrola"
            mbrola_voices_folder = "/usr/share/mbrola"
        # if (Path(self.mbrola_voices_folder)
        #     / Path(voice_name)
        #     / Path(voice_name)).is_file():
        #     self.lang = lang
        #     self.voice_id = voice_id
        # else:
        #     raise self.InvalidVoiceParameters(
        #         "Voice %s not found. Check language and voice id, or install "
        #         "by running 'sudo apt install mbrola-%s'. On Windows download "
        #         "voices from https://github.com/numediart/MBROLA-voices"
        #         % (voice_name, voice_name))
        self.lang = lang
        self.voice_id = voice_id
    def _phonemes_to_audio(self, phonemes: PhonemeList) -> bytes:
        # voice_path_template = ('%s/%s%d/%s%d'
        #                        if platform in ("linux", "darwin")
        #                        else '%s\\%s%d\\%s%d')
        voice_path_template = "%s/%s%d/%s%d"
        voice_phonemic_db = (voice_path_template
                             % (self.mbrola_voices_folder, self.lang,
                                self.voice_id, self.lang, self.voice_id))

The reason for commenting out the path check and switching to a Linux-style path template is that I am not sure how to properly handle paths into the WSL filesystem in my situation.
So, a quick and dirty fix!
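
One detail worth noting about the snippet above: os.path.exists is called on a string that already contains the double quotes, so the check looks for a file whose name literally includes quote characters, and the WSL fallback is probably always taken. A possible variation, as a sketch only (pick_espeak is a made-up name), keeps the path unquoted and returns an argument list, so no quoting is needed at all:

```python
import os


def pick_espeak(native_path, fallback=("wsl", "espeak")):
    """Return the espeak command prefix as an argument list.

    The existence check uses the plain, unquoted path. Because the result
    is a list, subprocess handles spaces in the path without extra quotes.
    """
    if os.path.isfile(native_path):
        return [native_path]
    return list(fallback)
```

The rest of the command (voice, text, flags) would then be appended to this list before it is passed to subprocess.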
