readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

Home Page: http://www.readbeyond.it/aeneas/

License: GNU Affero General Public License v3.0

Python 49.18% HTML 3.95% Shell 0.45% C 9.73% C++ 36.35% Makefile 0.34%

speech alignment tts python linux macos windows nlp espeak espeak-ng

aeneas's Introduction

aeneas

Version: 1.7.3
Date: 2017-03-15
Developed by: ReadBeyond
Lead Developer: Alberto Pettarin
License: the GNU Affero General Public License Version 3 (AGPL v3)
Contact: [email protected]
Quick Links: Home - GitHub - PyPI - Docs - Tutorial - Benchmark - Mailing List - Web App

Goal

aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.

For example, given this text file and this audio file, aeneas determines, for each fragment, the corresponding time interval in the audio file:

1                                                     => [00:00:00.000, 00:00:02.640]
From fairest creatures we desire increase,            => [00:00:02.640, 00:00:05.880]
That thereby beauty's rose might never die,           => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease,              => [00:00:09.240, 00:00:11.920]
His tender heir might bear his memory:                => [00:00:11.920, 00:00:15.280]
But thou contracted to thine own bright eyes,         => [00:00:15.280, 00:00:18.800]
Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760]
Making a famine where abundance lies,                 => [00:00:22.760, 00:00:25.680]
Thy self thy foe, to thy sweet self too cruel:        => [00:00:25.680, 00:00:31.240]
Thou that art now the world's fresh ornament,         => [00:00:31.240, 00:00:34.400]
And only herald to the gaudy spring,                  => [00:00:34.400, 00:00:36.920]
Within thine own bud buriest thy content,             => [00:00:36.920, 00:00:40.640]
And tender churl mak'st waste in niggarding:          => [00:00:40.640, 00:00:43.640]
Pity the world, or else this glutton be,              => [00:00:43.640, 00:00:48.080]
To eat the world's due, by the grave and thee.        => [00:00:48.080, 00:00:53.240]

This synchronization map can be output to file in several formats, depending on its application:

research: Audacity (AUD), ELAN (EAF), TextGrid;
digital publishing: SMIL for EPUB 3;
closed captioning: SubRip (SRT), SubViewer (SBV/SUB), TTML, WebVTT (VTT);
Web: JSON;
further processing: CSV, SSV, TSV, TXT, XML.
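
As an illustration of the interval notation used in the example above, the following minimal Python sketch formats time values given in seconds as HH:MM:SS.mmm. The function names and the (text, begin, end) tuple layout are assumptions made for this example; they are not part of the aeneas API.

```python
def format_time(seconds):
    """Format a time value in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3600 * 1000)
    minutes, rest = divmod(rest, 60 * 1000)
    secs, ms = divmod(rest, 1000)
    return "%02d:%02d:%02d.%03d" % (hours, minutes, secs, ms)


def format_fragment(text, begin, end):
    """Render one fragment as 'text => [begin, end]'."""
    return "%s => [%s, %s]" % (text, format_time(begin), format_time(end))


# Hypothetical in-memory sync map: (text, begin, end), times in seconds
fragments = [
    ("1", 0.0, 2.64),
    ("From fairest creatures we desire increase,", 2.64, 5.88),
]
for text, begin, end in fragments:
    print(format_fragment(text, begin, end))
```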

System Requirements, Supported Platforms and Installation

System Requirements

a reasonably recent machine (recommended 4 GB RAM, 2 GHz 64bit CPU)
Python 2.7 (Linux, OS X, Windows) or 3.5 or later (Linux, OS X)
FFmpeg
eSpeak
Python packages BeautifulSoup4, lxml, and numpy
Python headers to compile the Python C/C++ extensions (optional but strongly recommended)
A shell supporting UTF-8 (optional but strongly recommended)

Supported Platforms

aeneas has been developed and tested on Debian 64bit, with Python 2.7 and Python 3.5, which are the only supported platforms at the moment. Nevertheless, aeneas has been confirmed to work on other Linux distributions, Mac OS X, and Windows. See the PLATFORMS file for details.

If installing aeneas natively on your OS proves difficult, you are strongly encouraged to use aeneas-vagrant, which provides aeneas inside a virtualized Debian image running under VirtualBox and Vagrant. Both can be installed on any modern OS (Linux, Mac OS X, Windows).

Installation

All-in-one installers are available for Mac OS X and Windows, and a Bash script for deb-based Linux distributions (Debian, Ubuntu) is provided in this repository. It is also possible to download a VirtualBox+Vagrant virtual machine. Please see the INSTALL file for detailed, step-by-step installation procedures for different operating systems.

The generic OS-independent procedure is simple:

Install Python (2.7.x preferred), FFmpeg, and eSpeak
Make sure the following executables can be called from your shell: espeak, ffmpeg, ffprobe, pip, and python
First install numpy with pip and then aeneas (this order is important):
```
pip install numpy
pip install aeneas
```
To check whether you installed aeneas correctly, run:
```
python -m aeneas.diagnostics
```
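
If the diagnostics fail, a quick way to verify the second step (executables callable from your shell) is a small Python 3 check like the one below. This is only a sketch based on the list of executables given above; it does not replace aeneas.diagnostics, and the helper name find_missing is made up for this example.

```python
import shutil

# Executables that the installation steps say must be callable from the shell
REQUIRED = ["espeak", "ffmpeg", "ffprobe", "pip", "python"]


def find_missing(executables):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in executables if shutil.which(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All required executables were found")
```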

Usage

Run without arguments to get the usage message:

python -m aeneas.tools.execute_task
python -m aeneas.tools.execute_job

You can also get a list of live examples that you can immediately run on your machine thanks to the included files:

python -m aeneas.tools.execute_task --examples
python -m aeneas.tools.execute_task --examples-all

To compute a synchronization map map.json for a pair (audio.mp3, text.txt in plain text format), you can run:
```
python -m aeneas.tools.execute_task \
    audio.mp3 \
    text.txt \
    "task_language=eng|os_task_file_format=json|is_text_type=plain" \
    map.json
```
(The command has been split into lines with \ for visual clarity; in production you can have the entire command on a single line and/or you can use shell variables.)
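
When the command is assembled from a script, the configuration string can be built from key/value pairs instead of being typed by hand. The following is a minimal sketch: the helper names are hypothetical, and the rule about reserved characters is an assumption derived from the key=value|key=value shape shown above.

```python
def build_config_string(parameters):
    """Join (key, value) pairs into a key=value|key=value string."""
    parts = []
    for key, value in parameters:
        value = str(value)
        # Assumption: "|" separates pairs and "=" separates key from value,
        # so they must not appear where they would be ambiguous.
        if "|" in key or "=" in key or "|" in value:
            raise ValueError("reserved character in %r=%r" % (key, value))
        parts.append("%s=%s" % (key, value))
    return "|".join(parts)


def build_command(audio_path, text_path, parameters, sync_map_path):
    """Return the argument list for the execute_task command shown above."""
    return [
        "python", "-m", "aeneas.tools.execute_task",
        audio_path, text_path,
        build_config_string(parameters),
        sync_map_path,
    ]


params = [
    ("task_language", "eng"),
    ("os_task_file_format", "json"),
    ("is_text_type", "plain"),
]
print(build_command("audio.mp3", "text.txt", params, "map.json"))
```

The resulting list can be passed to subprocess.call() without shell quoting concerns.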

To compute a synchronization map map.smil for a pair (audio.mp3, page.xhtml containing fragments marked by id attributes like f001), you can run:
```
python -m aeneas.tools.execute_task \
    audio.mp3 \
    page.xhtml \
    "task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
    map.smil
```
As you can see, the third argument (the configuration string) specifies the parameters controlling the I/O formats and the processing options for the task. Consult the documentation for details.
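
To give an idea of what the is_text_unparsed_id_regex and is_text_unparsed_id_sort=numeric parameters express, the sketch below selects the id values matching a regular expression and orders them by the number they contain. It only illustrates the concept and is not the aeneas implementation; in particular, requiring a full match of the id is an assumption.

```python
import re


def select_and_sort_ids(ids, id_regex):
    """Keep the ids matching id_regex and sort them by their numeric part."""
    pattern = re.compile(id_regex)
    # Assumption: the whole id must match the regular expression
    selected = [identifier for identifier in ids if pattern.fullmatch(identifier)]

    def numeric_key(identifier):
        # Drop every non-digit character and compare the remaining number
        digits = re.sub(r"[^0-9]", "", identifier)
        return int(digits) if digits else 0

    return sorted(selected, key=numeric_key)


print(select_and_sort_ids(["f10", "f2", "title", "f001"], r"f[0-9]+"))
```

A plain lexicographic sort would place f10 before f2, which is why a numeric ordering matters for ids without zero padding.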
If you have several tasks to process, you can create a job container to batch process them:
```
python -m aeneas.tools.execute_job job.zip output_directory
```
File job.zip should contain a config.txt or config.xml configuration file, providing aeneas with all the information needed to parse the input assets and format the output sync map files. Consult the documentation for details.
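
A job container of this kind can be assembled with the Python standard library. The sketch below is a hedged example: the function name, the asset layout, and the file names in the usage comment are hypothetical, and the contents of config.txt must follow the aeneas documentation.

```python
import zipfile


def write_job_container(zip_path, config_text, assets):
    """Create a ZIP job container holding config.txt plus the given assets.

    assets maps the name inside the archive to a path on disk.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as container:
        container.writestr("config.txt", config_text)
        for archive_name, local_path in sorted(assets.items()):
            container.write(local_path, archive_name)
    return zip_path


# Hypothetical usage (file names are placeholders):
# write_job_container(
#     "job.zip",
#     open("config.txt").read(),
#     {"input/01.txt": "texts/01.txt", "input/01.mp3": "audio/01.mp3"},
# )
```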

The documentation contains a highly recommended tutorial which explains how to use the built-in command line tools.

Documentation and Support

Documentation: http://www.readbeyond.it/aeneas/docs/
Command line tools tutorial: http://www.readbeyond.it/aeneas/docs/clitutorial.html
Library tutorial: http://www.readbeyond.it/aeneas/docs/libtutorial.html
Old, verbose tutorial: A Practical Introduction To The aeneas Package
Mailing list: https://groups.google.com/d/forum/aeneas-forced-alignment
Changelog: http://www.readbeyond.it/aeneas/docs/changelog.html
High level description of how aeneas works: HOWITWORKS
Development history: HISTORY
Testing: TESTING
Benchmark suite: https://readbeyond.github.io/aeneas-benchmark/

Supported Features

Input text files in parsed, plain, subtitles, or unparsed (XML) format
Multilevel input text files in mplain and munparsed (XML) format
Text extraction from XML (e.g., XHTML) files using id and class attributes
Arbitrary text fragment granularity (single word, subphrase, phrase, paragraph, etc.)
Input audio file formats: all those readable by ffmpeg
Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB, TEXTGRID, TSV, TTML, TXT, VTT, XML
Confirmed working on 38 languages: AFR, ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, JPN, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
MFCC and DTW computed via Python C extensions to reduce the processing time
Several built-in TTS engine wrappers: AWS Polly TTS API, eSpeak (default), eSpeak-ng, Festival, MacOS (via say), Nuance TTS API
Default TTS (eSpeak) called via a Python C extension for fast audio synthesis
Possibility of running a custom, user-provided TTS engine Python wrapper (e.g., included example for speect)
Batch processing of multiple audio/text pairs
Download audio from a YouTube video
In multilevel mode, recursive alignment from paragraph to sentence to word level
In multilevel mode, MFCC resolution, MFCC masking, DTW margin, and TTS engine can be specified for each level independently
Robust against misspelled/mispronounced words, local rearrangements of words, background noise/sporadic spikes
Adjustable splitting times, including a max character/second constraint for CC applications
Automated detection of audio head/tail
Output an HTML file for fine tuning the sync map manually (finetuneas project)
Execution parameters tunable at runtime
Code suitable for Web app deployment (e.g., on-demand cloud computing instances)
Extensive test suite including 1,200+ unit/integration/performance tests, that run and must pass before each release

Limitations and Missing Features

Audio should match the text: large portions of spurious text or audio might produce a wrong sync map
Audio is assumed to be spoken: not suitable for song captioning, YMMV for CC applications
No protection against memory swapping: be sure your amount of RAM is adequate for the maximum duration of a single audio file (e.g., 4 GB RAM => max 2h audio; 16 GB RAM => max 10h audio)
Open issues

A Note on Word-Level Alignment

A significant number of users run aeneas to align audio and text at word level (i.e., each fragment is a word). Although aeneas was not designed with word-level alignment in mind and the results might be inferior to ASR-based forced aligners for languages with good ASR models, aeneas offers some options to improve the quality of the alignment at word level:

multilevel text (since v1.5.1),
MFCC nonspeech masking (since v1.7.0, disabled by default),
use better TTS engines, like Festival or AWS/Nuance TTS API (since v1.5.0).

If you use the aeneas.tools.execute_task command line tool, you can add the --presets-word switch to enable MFCC nonspeech masking, for example:

$ python -m aeneas.tools.execute_task --example-words --presets-word
$ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word

If you use aeneas as a library, just set the appropriate RuntimeConfiguration parameters. Please see the command line tutorial for details.

License

aeneas is released under the terms of the GNU Affero General Public License Version 3. See the LICENSE file for details.

Licenses for third party code and files included in aeneas can be found in the licenses directory.

No copy rights were harmed in the making of this project.

Supporting and Contributing

Supporting

Would you like to support the development of aeneas?

I accept sponsorships to

fix bugs,
add new features,
improve the quality and the performance of the code,
port the code to other languages/platforms, and
improve the documentation.

Feel free to get in touch.

Contributing

If you think you found a bug or you have a feature request, please use the GitHub issue tracker to submit it.

If you want to ask a question about using aeneas, your best option is to send an email to the mailing list.

Finally, code contributions are welcome! Please refer to the Code Contribution Guide for details about the branch policies and the code style to follow.

Acknowledgments

Many thanks to Nicola Montecchio, who suggested using MFCCs and DTW, and co-developed the first experimental code for aligning audio and text.

Paolo Bertasi, who developed the APIs and Web application for ReadBeyond Sync, helped shape the structure of this package for its asynchronous usage.

Chris Hubbard prepared the files for packaging aeneas as a Debian/Ubuntu .deb.

Daniel Bair prepared the brew formula for installing aeneas and its dependencies on Mac OS X.

Daniel Bair, Chris Hubbard, and Richard Margetts packaged the installers for Mac OS X and Windows.

Firat Ozdemir contributed the finetuneas HTML/JS code for fine tuning sync maps in the browser.

Willem van der Walt contributed the code snippet to output a sync map in TextGrid format.

Chris Vaughn contributed the MacOS TTS wrapper.

All the mighty GitHub contributors, and the members of the Google Group.

aeneas's Issues

Creating executables of aeneas with pyinstaller

Working on it on my personal repo, in devel branch.

This needs:

addressing sys.in.encoding being None
creating a hydra tool, so that only one executable needs to be built for each (OS, 32/64-bit) pair
including the correct res/ files in the .spec configuration
providing the .spec configurations: one for "one directory" and one for "one file" mode

Packaging for OSX

At SIL, we are working on releasing Scripture App Builder for Mac (will build Android and iOS apps). We would like to include Aeneas support on the Mac. I have been in discussion with @danielbair on creating a package for OSX. Would you accept a pull request for this (similar to the debian packaging) or should we keep it as a separate repo?

Thanks,

Chris

Rewrite ``adjustboundaryalgorithm``

The current functions are not very elegant or pythonic.

Cache synthesized WAV files

Currently, when using a TTS called via subprocess or remote API, each fragment is synthesized individually. Hence, in case of repeated fragments, they get synthesized more than once.

The problem especially affects those using (paid or free but limited) TTS APIs.

The solution would be to add a "cache" mechanism that avoids synthesizing a fragment again if a fragment with the same text and language has already been synthesized. This requires two things:

keeping a dictionary, mapping fragment (language, text) => tmp WAV file
removing all the WAV files at the end of the synt process

Perhaps this caching must be explicitly enabled by the user (since it requires more tmp disk space) and/or enabled by default only for TTS API wrappers, like the current Nuance one.
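
A minimal sketch of the proposed mechanism follows. The class name and the synthesize callable are hypothetical stand-ins for a real TTS wrapper; the sketch only shows the (language, text) => tmp WAV file dictionary and the final cleanup.

```python
import os
import tempfile


class SynthesisCache(object):
    """Cache synthesized fragments, keyed by (language, text)."""

    def __init__(self, synthesize):
        # synthesize(language, text, path) is assumed to write a WAV file to path
        self._synthesize = synthesize
        self._paths = {}

    def get(self, language, text):
        """Return the path of the WAV file for this fragment, synthesizing once."""
        key = (language, text)
        if key not in self._paths:
            handle, path = tempfile.mkstemp(suffix=".wav")
            os.close(handle)
            self._synthesize(language, text, path)
            self._paths[key] = path
        return self._paths[key]

    def clear(self):
        """Remove all the temporary WAV files at the end of the synthesis."""
        for path in self._paths.values():
            if os.path.exists(path):
                os.remove(path)
        self._paths = {}
```

The trade-off mentioned above is visible here: every distinct fragment keeps a temporary file on disk until clear() is called.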

Remove linux-only blocks for aeneas.cew

Please remove the linux-only blocks for aeneas.cew now.
I have merged the patches from https://github.com/pettarin/espeakosx into the homebrew espeak to compile and install libespeak.
I've submitted a pull request against the espeak.rb formula, but the homebrew maintainers are now considering dropping espeak from their official formula list (see Homebrew/homebrew-core#2726), so it may be necessary to use my homebrew tap from now on.

Add the tap:
brew tap danielbair/tap
Then install as any other formula:
brew install danielbair/tap/espeak

Mac and Windows installers are available for aeneas from https://github.com/sillsdev/aeneas-installer/releases with cew compiled and working!

Investigate mypy (static type checker via type annotations)

Since the code base is both PY2 and PY3, we would need to use the comment-based type annotations.

Yet, it is an interesting idea that could find some errors in advance, like the float vs. TimeValue bug of a few releases ago.
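
For reference, this is what a comment-based annotation (PEP 484 type comment) looks like. The function is a made-up example, and Decimal is used here only as a stand-in for a dedicated time value type.

```python
from decimal import Decimal


def interval_length(begin, end):
    # type: (Decimal, Decimal) -> Decimal
    """Return the length of the interval [begin, end]."""
    return end - begin


print(interval_length(Decimal("2.640"), Decimal("5.880")))

# A static checker such as mypy would report the call below as an
# incompatible argument type, before the code is ever run:
# interval_length(2.640, Decimal("5.880"))
```

The type comment is ignored by both Python 2 and Python 3 at run time, so the same source keeps working on both.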

Former TODO list (to be split out)

Improving robustness against music in background
Isolating non-speech intervals (music, prolonged silence)
Automated text fragmentation based on audio analysis
Auto-tuning DTW parameters
Reporting the alignment score
Multilevel sync map granularity (e.g., multilevel SMIL output)
Testing other approaches, like GMM/HMM/NN (e.g., using HTK or Kaldi)

Expose additional eSpeak voices

Currently the languages allowed by the validation process are a subset of the voices available to espeak. Could we add the rest, or at least the english variations such as en-gb and en-us?

Testing other approaches, like GMM/HMM/NN (e.g., using HTK or Kaldi)

Extract MFCCs without loading the whole WAVE file in memory

In theory the MFCCs can be computed by buffering a portion of the WAVE file, processing it, and moving to the next portion.

This helps with long WAVE files (say, >5h) that cannot fit in 2-4GB RAM.
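
The buffering part can be sketched with the standard wave module, as below. The generator name and chunk size are arbitrary choices for this example. Note that a real MFCC extractor would also need to carry over the samples of the window that straddles two consecutive chunks, which this sketch does not do.

```python
import wave


def iter_wave_chunks(path, frames_per_chunk=65536):
    """Yield the raw audio data of a WAVE file, one chunk at a time."""
    wav = wave.open(path, "rb")
    try:
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            # Only this chunk is held in memory; process it, then move on
            yield data
    finally:
        wav.close()
```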

Move check aeneas_check_setup.py inside aeneas module

Investigate whether other aligners can be integrated/wrapped

For example:

gentle (based on Kaldi) : https://lowerquality.com/gentle/
Prosodylab-Aligner (based on HTK) : http://prosodylab.org/tools/aligner/

Call festival via C++ extension

Festival has a C++ API, so we might consider creating a cfw Python C(++?) extension, similar to cew for eSpeak.

From my preliminary test (a simple C++ executable that synthesizes a given number of fragments and concatenates them, saving a single file to disk), it is 8-10x faster to generate 100-1000 fragments than the current subprocess-based Python wrapper. For 1k fragments (2k words, ~21min total audio), the C++ code takes about 2 min, instead of ~25 min of the Python code.

There might be issues getting the Python C(++?) extension to compile, as the C++ part depends on several libraries, in particular festival and several sub-libraries of speech_tools.

cc @ozdefir

Consider replacing pafy with youtube-dl

Apparently pafy uses youtube-dl, so it makes sense to use youtube-dl directly.

Create minimal driver programs for Python C extensions

cew on Windows

The Python C extension cew can be compiled on Windows, but it requires manually patching the espeak DLLs, etc.

See if espeak-ng makes this feasible.

Auto-tuning DTW parameters, running external when too big for in-memory

Add finetuneas to repo and add flag to execute_task to output pre-compiled HTML file

See https://groups.google.com/forum/#!topic/aeneas-forced-alignment/MXyNl2juZW0

Add exception management to aeneas.tools.*

Speed Python MFCC code up

This is faster than the current mfcc.py, but it seems to adopt a slightly different definition:

https://github.com/jameslyons/python_speech_features

Automated text fragmentation based on audio analysis

Please update debian/changelog?

Hello Alberto,

Thank you for all the work you have been doing with Aeneas! It is great work.

We would like to update the package that we build of Aeneas that gets used by Scripture App Builder and Reading App Builder. Could you update the debian/changelog and create an entry for 1.5.0.3 and include the changes in the log that you think are relevant? You have done such a great job of including information about changes for previous entries in the changelog. I could try to come up with a list, but I don't know whether I could get a good list.

Thanks,

Chris Hubbard

Replace scikits.audiolab with scipy.io.wavfile

The dependency on scikits.audiolab seems to cause a lot of issues for non-Debian users. Moving to something more standard, like scipy.io.wavfile, might help eliminate these issues.

Rewrite ``vad``

Use numpy more, e.g. boolean masks (numpy.ma) and rolling windows.

Catch exception on MemoryError due to very long task

Currently aeneas fails "silently" when fed with a task too long to fit the WAVE file in RAM.

Catch the MemoryError explicitly and generate a human-readable error.

See: http://www.mobileread.com/forums/showpost.php?p=3245813&postcount=17
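
One possible shape for this fix is sketched below. The exception class, the wrapper function, and the wording of the message are all hypothetical; load_audio stands for whatever function reads the WAVE file into memory.

```python
class TaskTooLongError(Exception):
    """Raised when the audio file of a task does not fit in memory."""


def load_audio_or_explain(load_audio, path):
    """Call load_audio(path), turning MemoryError into a readable error."""
    try:
        return load_audio(path)
    except MemoryError:
        raise TaskTooLongError(
            "Unable to load '%s': the audio file is too long to fit in RAM. "
            "Split it into shorter files or use a machine with more memory."
            % path
        )
```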

Reporting the alignment score

Not sure this can be done, however leaving here a placeholder.

Prune Debian dependencies in install_dependencies.sh

Possibly, in a way compatible with aeneas-vagrant

Add Travis CI

debian/ubuntu package

We would like to include aeneas as a package dependency in the Linux version of Scripture App Builder (http://software.sil.org/scriptureappbuilder), which is free software. Is anyone working on a Debian/Ubuntu package? Would you accept a pull request if I did the work as a native package, or should I create a non-native package and keep it in a separate repo? What would you prefer?

Some unit tests are missing

Chiefly:

dtw
sd

Add check on audio head/tail/process

Currently if e.g. the user sets an audio tail beyond the actual length of the audio file, a cryptic error Unexpected error while executing task : The given index is not valid is returned.

Adding a check will help the user diagnose the issue.
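
Such a check could look like the sketch below. The function name, its parameters (all in seconds), and the messages are assumptions made for this example.

```python
def check_audio_bounds(audio_length, head=0.0, tail=0.0, process=None):
    """Return a list of human-readable problems (empty if there are none)."""
    problems = []
    if head < 0 or tail < 0:
        problems.append("head and tail lengths must be non-negative")
    if head + tail >= audio_length:
        problems.append(
            "head (%.3f s) plus tail (%.3f s) leave no audio to process "
            "(audio length: %.3f s)" % (head, tail, audio_length)
        )
    if process is not None:
        if process <= 0:
            problems.append("process length must be positive")
        elif head + process > audio_length:
            problems.append(
                "head (%.3f s) plus process length (%.3f s) exceed the audio "
                "length (%.3f s)" % (head, process, audio_length)
            )
    return problems


# An audio tail beyond the actual length of the file is reported explicitly
print(check_audio_bounds(60.0, tail=75.0))
```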

Add parameter to format identifiers when input text format is plain or subtitles

Suggested by Joseph Polizzotto in the ML: "Alternatively, could Aeneas also have an export argument that gives us the possibility of autogenerating an ID with Bookmark* or Word* formulas in the SMIL file?"

Creating a Path class or some path sanitize functions

Right now paths are treated as (Unicode) strings, and this might pose problems for all the nefarious Windows issues we all know.

Perhaps it is worth considering creating a specialized class or some path sanitize functions in globalfunctions.py.

A specialized class has the advantage of making e.g. "slash conversion" (/ => \ on Windows) transparent to the rest of the code. But perhaps it is overkill and global functions will suffice.
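
As a sketch of the "global function" option, the standard ntpath and posixpath modules already implement the per-platform normalization rules. The function name below is hypothetical.

```python
import ntpath
import os
import posixpath


def normalize_path(path, windows=None):
    """Normalize a path for the target platform (current OS by default)."""
    if windows is None:
        windows = (os.name == "nt")
    if windows:
        # ntpath.normpath converts "/" to "\" and collapses redundant parts
        return ntpath.normpath(path)
    return posixpath.normpath(path)


print(normalize_path("output/job\\map.json", windows=True))
print(normalize_path("output//job/./map.json", windows=False))
```

Passing windows explicitly makes the function testable on any OS, which is handy for reproducing Windows-only reports on a Linux development machine.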

cew on OS X

At the moment the Python C extension cew works on OS X (with a modified cew_setup.py), but it requires compiling espeak as a static library and copying it into the aeneas/ directory.

See if this can be automated, especially now that espeak-ng seems the active upstream.

Rewrite ``executetask``

The current code is not elegant or pythonic.

execute_job fails on Windows

Probably a path join/normalization problem, very likely a case of mixed forward- and back- slashes.

See https://groups.google.com/d/msg/aeneas-forced-alignment/p9cb1FA0X0I/CtCHH_mpBQAJ

Better VAD: py-webrtcvad

This VAD, conveniently available on PyPI, is a Python wrapper around the VAD from WebRTC by Google:

https://github.com/wiseman/py-webrtcvad

From preliminary tests it works well and it is fast. Its API might need some adaptation and/or we can create our own C extension.

BeautifulSoup4 v4.5.0 breaks aeneas (API change?)

BeautifulSoup4 v4.5.0, released on PyPI on 2016-07-20, seems to include some API change that breaks aeneas when trying to parse XML files with lxml:

soup = BeautifulSoup("\n".join(lines), "lxml")

I am not sure whether this is a bug (there is nothing on the bs4 bug tracker yet), or an intentional API change in bs4.

For now (=> aeneas v1.5.1), with #92 I fixed this issue by setting exact version numbers for lxml and BeautifulSoup4 in requirements.txt and in setup.py, but the issue should be investigated further for the next releases.

For example, we might end up specifying exact versions for all pip-installable packages.

CC: @danielbair @chrisvire --- your installers should be fine, as they require BeautifulSoup4==4.4.1 and lxml==3.6.0. Same for the Vagrant procedure, which relies on pip install aeneas which should install the correct versions.

Compiling C extensions on Windows and Python 3.4/3.5

After a preliminary search, it looks like there is no equivalent of "Microsoft Visual C++ compiler for Python 2.7" for Python 3.

One must install the correct Microsoft Visual Studio or Visual C/C++ (free, but several GB of download...), as described here:

https://matthew-brett.github.io/pydagogue/python_msvc.html

http://stackoverflow.com/questions/29909330/microsoft-visual-c-compiler-for-python-3-4

before being able to compile Python C extensions.

Investigate this further.

Config files and parameter names

This is a long term goal.

Adopting a popular format (INI-like, e.g. TOML).

Changing the current parameter names (too long and complex), with simpler ones.

Add explicit is_audio_file_tail_length, analogous to is_audio_file_head_length

Already done in my private devel branch. To be merged next time.

Multilevel sync map granularity (e.g., multilevel SMIL output)

Rewrite ``sd``

Too many magic numbers. Test other/better approaches.

Global execution parameters

Either on command line, config file or ~/.config/aeneas.conf .

For stuff like setting the MFCC window size, disabling C extensions, etc.
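
A possible starting point, assuming an INI-like file and Python 3, is sketched below. The section name, the parameter names, and the default values are all hypothetical placeholders, not existing aeneas settings.

```python
import configparser

# Hypothetical global parameters and their defaults
DEFAULTS = {"mfcc_window_length": "0.100", "c_extensions": "true"}


def load_global_parameters(text):
    """Parse INI-like text, falling back to DEFAULTS for missing values."""
    parser = configparser.ConfigParser()
    parser.read_dict({"aeneas": DEFAULTS})
    parser.read_string(text)
    section = parser["aeneas"]
    return {
        "mfcc_window_length": section.getfloat("mfcc_window_length"),
        "c_extensions": section.getboolean("c_extensions"),
    }


print(load_global_parameters("[aeneas]\nc_extensions = false\n"))
```

Values given on the command line could then override the dictionary returned here, with the file in ~/.config acting as the lowest-priority source.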

Suitable replacement for old DAISY Pipeline "aligner"?

Great job @pettarin ! :)

CC @rdeltour @mgylling

I wonder if Aeneas could be integrated into the DAISY Pipeline (preferably v2).

https://github.com/daisy/pipeline-scripts

See the old DAISY Pipeline "aligner":

http://sourceforge.net/p/daisymfc/code/HEAD/tree/trunk/dmfc/transformers/se_tpb_aligner/

Aeneas and Python3

Hi there,

I have not dived yet into the actual aeneas code, but I'd like to get things clear before doing that.
For testing purposes, I wanted to include it in a Python 3 project, but that choked on the beautifulsoup version (3.2.1) that it required.

Am I correct that aeneas only runs in Python 2?
Could Aeneas work with a higher version of BS?
How much would it take to rework Aeneas into a Py 3 version?

Thanks a lot

The job cannot be loaded from the specified container

This is the result of my execute_job test. I couldn't find what's causing the problem.
It worked when tested on a Unix machine, but it doesn't work on Windows 7 64-bit.
Fresh installation of Python 2.7.10 (+BeautifulSoup and lxml), ffmpeg-20150916, espeak-1.48.04, numpy-1.9.2+mkl-cp27, scikits.audiolab-0.11.0-cp27, and VCForPython27.msi

c:\sync\aeneas-master>python -m aeneas.tools.execute_job test/01.zip output/ -v
[INFO] Loading job from container...
[DEBU] 2015-09-21 21:20:38.113000 ExecuteJob: Loading job from container...
[DEBU] 2015-09-21 21:20:38.113000 ExecuteJob: Validating container...
[DEBU] 2015-09-21 21:20:38.113000 Validator: Checking container file 'test/01.zip'
[DEBU] 2015-09-21 21:20:38.128000 Validator: Checking container file exists
[DEBU] 2015-09-21 21:20:38.128000 Validator: Checking container file has config file
[DEBU] 2015-09-21 21:20:38.128000 Validator: Container has TXT config file
[DEBU] 2015-09-21 21:20:38.128000 Validator: Checking container with TXT config file
[DEBU] 2015-09-21 21:20:38.128000 Validator: Trying to read config file from container
[DEBU] 2015-09-21 21:20:38.144000 Validator: Config file found in container
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking contents TXT config file
[DEBU] 2015-09-21 21:20:38.144000 Validator: Converting file contents to config string
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking that string is well encoded
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking that the given string is well encoded
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking encoding of string
[DEBU] 2015-09-21 21:20:38.144000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking for reserved characters
[DEBU] 2015-09-21 21:20:38.144000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.144000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking required parameters
[DEBU] 2015-09-21 21:20:38.160000 Validator: Checking required parameters '['is_hierarchy_type', 'is_hierarchy_prefix', 'is_text_file_relative_path', 'is_text_file_name_regex', 'is_text_type', 'is_audio_file_relative_path', 'is_audio_file_name_regex', 'os_job_file_name', 'os_job_file_container', 'os_job_file_hierarchy_type', 'os_job_file_hierarchy_prefix', 'os_task_file_name', 'os_task_file_format', 'job_language']'
[DEBU] 2015-09-21 21:20:38.285000 Validator: Checking required parameters
[DEBU] 2015-09-21 21:20:38.300000 Validator: Checking input parameters are not empty
[DEBU] 2015-09-21 21:20:38.332000 Validator: Checking no required parameter is missing
[DEBU] 2015-09-21 21:20:38.378000 Validator: Checking all parameter values are allowed
[DEBU] 2015-09-21 21:20:38.410000 Validator: Checking allowed values for parameter 'job_language'
[DEBU] 2015-09-21 21:20:38.457000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.472000 Validator: Checking allowed values for parameter 'task_language'
[DEBU] 2015-09-21 21:20:38.519000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.535000 Validator: Checking allowed values for parameter 'os_job_file_container'
[DEBU] 2015-09-21 21:20:38.582000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.597000 Validator: Checking allowed values for parameter 'is_hierarchy_type'
[DEBU] 2015-09-21 21:20:38.644000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.660000 Validator: Checking allowed values for parameter 'os_job_file_hierarchy_type'
[DEBU] 2015-09-21 21:20:38.707000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.722000 Validator: Checking allowed values for parameter 'is_text_type'
[DEBU] 2015-09-21 21:20:38.753000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.785000 Validator: Checking allowed values for parameter 'os_task_file_format'
[DEBU] 2015-09-21 21:20:38.816000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.847000 Validator: Checking allowed values for parameter 'task_adjust_boundary_algorithm'
[DEBU] 2015-09-21 21:20:38.878000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.910000 Validator: Checking all implied parameters are present
[DEBU] 2015-09-21 21:20:38.941000 Validator: Checking implied parameters by 'is_hierarchy_type'='paged'
[DEBU] 2015-09-21 21:20:38.988000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.003000 Validator: Checking implied parameters by 'is_text_type'='unparsed'
[DEBU] 2015-09-21 21:20:39.050000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.066000 Validator: Checking implied parameters by 'is_text_type'='unparsed'
[DEBU] 2015-09-21 21:20:39.113000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.128000 Validator: Checking implied parameters by 'os_task_file_format'='smil'
[DEBU] 2015-09-21 21:20:39.160000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.191000 Validator: Checking implied parameters by 'os_task_file_format'='smil'
[DEBU] 2015-09-21 21:20:39.222000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.238000 Validator: Checking implied parameters by 'task_adjust_boundary_algorithm'='percent'
[DEBU] 2015-09-21 21:20:39.285000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.300000 Validator: Checking implied parameters by 'task_adjust_boundary_algorithm'='rate'
[DEBU] 2015-09-21 21:20:39.347000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.363000 Validator: Checking implied parameters by 'task_adjust_boundary_algorithm'='rateaggressive'
[DEBU] 2015-09-21 21:20:39.394000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.425000 Validator: Checking implied parameters by 'task_adjust_boundary_algorithm'='aftercurrent'
[DEBU] 2015-09-21 21:20:39.457000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.472000 Validator: Checking implied parameters by 'task_adjust_boundary_algorithm'='beforenext'
[DEBU] 2015-09-21 21:20:39.519000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.550000 Validator: Checking required parameters: returning True
[DEBU] 2015-09-21 21:20:39.582000 Validator: Checking contents TXT config file: returning True
[DEBU] 2015-09-21 21:20:39.628000 Validator: Analyze the contents of the container
[DEBU] 2015-09-21 21:20:39.675000 Validator: Checking the Job object generated from container
[DEBU] 2015-09-21 21:20:39.722000 Validator: Checking the Job is not None
[DEBU] 2015-09-21 21:20:39.738000 Validator: Checking the Job has at least one Task
[DEBU] 2015-09-21 21:20:39.785000 Validator: Unable to create at least one Task from the container.
[DEBU] 2015-09-21 21:20:39.816000 Validator: Checking container with TXT config file: returning False
[DEBU] 2015-09-21 21:20:39.863000 Validator: Checking container: returning False
[DEBU] 2015-09-21 21:20:39.894000 ExecuteJob: Validating container: failed
[DEBU] 2015-09-21 21:20:39.925000 ExecuteJob: Loading job from container: failed
[INFO] Loading job from container... done
[ERRO] The job cannot be loaded from the specified container

Config:

is_hierarchy_type=flat
is_hierarchy_prefix=input/
is_text_file_relative_path=.
is_text_file_name_regex=..txt
is_text_type=parsed
is_audio_file_relative_path=.
is_audio_file_name_regex=..MP3

os_job_file_name=output_test-01
os_job_file_container=zip
os_job_file_hierarchy_type=flat
os_job_file_hierarchy_prefix=input/
os_task_file_name=$PREFIX.smil
os_task_file_format=smil
os_task_file_smil_page_ref=$PREFIX.xhtml
os_task_file_smil_audio_ref=$PREFIX.mp3

job_language=en
job_description=Test 01 (flat hierarchy, parsed text files)

Long term move from Python C extensions to CFFI

Today I tried running aeneas under PyPy (Python 2.7.10 branch). Everything seems to work, except cdtw and cmfcc: they compile, but they fail to import, producing the following error: AttributeError: _ARRAY_API not found ... ImportError: numpy.core.multiarray failed to import, both with NumPyPy and upstream NumPy.

When I asked on their IRC channel, I was strongly advised to switch to CFFI, as the C API is not the preferred mechanism of PyPy for calling C code.

So, in the long run, it might be worth considering switching to CFFI or supporting it alongside C extensions.

DTW anchor indexing problem due to non-integer TTS sample rate * shift (was: Systematic negative bias observable in longer audios)

With longer audios I observe a consistent negative bias which increases gradually towards the end. To make sure it's not a playback issue I tested with Audacity which confirmed the observation.
Examples:

https://readiance.org/finetuneas/librivox/the-brothers-karamazov-by-fyodor-dostoyevsky/40-book-6-chapter-2-the-duel-the
https://readiance.org/finetuneas/librivox/childrens-short-works-vol-011-by-various/the-little-mermaid-childrens-short-works?g=s

The alignments are almost perfect, so I thought it could be due to floating point math or rounding.
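
The issue title points at a non-integer product of sample rate and shift. The arithmetic sketch below shows how such a product can produce a mismatch that grows linearly with the audio length, if the shift is truncated to a whole number of samples while times are still computed with the nominal shift. The sample rate, shift values, and one-hour duration are illustrative assumptions, not values taken from the aeneas code, and the sketch does not claim to reproduce the actual bug or its sign.

```python
def accumulated_drift(sample_rate, shift, duration):
    """Return the timing mismatch (in seconds) after `duration` seconds of
    audio when the frame shift is truncated to a whole number of samples."""
    samples_per_shift = int(sample_rate * shift)        # truncation happens here
    actual_shift = samples_per_shift / float(sample_rate)
    frames = int(duration * sample_rate) // samples_per_shift
    # Each frame index is off by (shift - actual_shift) seconds
    return frames * (shift - actual_shift)


# 22050 * 0.040 is a whole number of samples; 22050 * 0.025 is not
for rate, shift in [(22050, 0.040), (22050, 0.025)]:
    print(rate, shift, round(accumulated_drift(rate, shift, 3600.0), 3))
```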

Creating a TimeInterval class

To replace all the [begin, end] lists floating around the Python code.
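
A minimal sketch of what such a class might look like follows. The attribute and method names are assumptions, and Decimal is used to avoid floating point rounding in the time values.

```python
from decimal import Decimal


class TimeInterval(object):
    """A closed time interval [begin, end], with times in seconds."""

    def __init__(self, begin, end):
        begin = Decimal(str(begin))
        end = Decimal(str(end))
        if begin < 0:
            raise ValueError("begin must be non-negative")
        if end < begin:
            raise ValueError("end must not precede begin")
        self.begin = begin
        self.end = end

    @property
    def length(self):
        """Duration of the interval."""
        return self.end - self.begin

    def contains(self, time_point):
        """Return True if time_point lies inside the interval."""
        return self.begin <= Decimal(str(time_point)) <= self.end

    def __repr__(self):
        return "TimeInterval(%s, %s)" % (self.begin, self.end)


interval = TimeInterval("2.640", "5.880")
print(interval, interval.length)
```

Centralizing the begin <= end check in the constructor means that invalid intervals fail early, instead of surfacing later as a malformed sync map.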