aeneas's Introduction

aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment).

Goal

aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.

For example, given this text file and this audio file, aeneas determines, for each fragment, the corresponding time interval in the audio file:

1                                                     => [00:00:00.000, 00:00:02.640]
From fairest creatures we desire increase,            => [00:00:02.640, 00:00:05.880]
That thereby beauty's rose might never die,           => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease,              => [00:00:09.240, 00:00:11.920]
His tender heir might bear his memory:                => [00:00:11.920, 00:00:15.280]
But thou contracted to thine own bright eyes,         => [00:00:15.280, 00:00:18.800]
Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760]
Making a famine where abundance lies,                 => [00:00:22.760, 00:00:25.680]
Thy self thy foe, to thy sweet self too cruel:        => [00:00:25.680, 00:00:31.240]
Thou that art now the world's fresh ornament,         => [00:00:31.240, 00:00:34.400]
And only herald to the gaudy spring,                  => [00:00:34.400, 00:00:36.920]
Within thine own bud buriest thy content,             => [00:00:36.920, 00:00:40.640]
And tender churl mak'st waste in niggarding:          => [00:00:40.640, 00:00:43.640]
Pity the world, or else this glutton be,              => [00:00:43.640, 00:00:48.080]
To eat the world's due, by the grave and thee.        => [00:00:48.080, 00:00:53.240]

Waveform with aligned labels, detail

This synchronization map can be output to file in several formats, depending on its application:

  • research: Audacity (AUD), ELAN (EAF), TextGrid;
  • digital publishing: SMIL for EPUB 3;
  • closed captioning: SubRip (SRT), SubViewer (SBV/SUB), TTML, WebVTT (VTT);
  • Web: JSON;
  • further processing: CSV, SSV, TSV, TXT, XML.
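As an illustration of what such a map looks like once serialized, the following minimal sketch turns (begin, end, text) intervals like those listed above into SubRip (SRT) cues. It is not the writer used by aeneas: the function names `format_srt_time` and `to_srt` are hypothetical, and times are assumed to be given in integer milliseconds.

```python
def format_srt_time(milliseconds):
    """Format an integer number of milliseconds as HH:MM:SS,mmm (SRT style)."""
    hours, rest = divmod(milliseconds, 3600000)
    minutes, rest = divmod(rest, 60000)
    seconds, millis = divmod(rest, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


def to_srt(fragments):
    """Build SRT text from an iterable of (begin_ms, end_ms, text) tuples."""
    cues = []
    for index, (begin_ms, end_ms, text) in enumerate(fragments, start=1):
        cues.append("%d\n%s --> %s\n%s\n" % (
            index,
            format_srt_time(begin_ms),
            format_srt_time(end_ms),
            text,
        ))
    # Cues are separated by a blank line
    return "\n".join(cues)


# First two fragments of the example above, in milliseconds
fragments = [
    (0, 2640, "1"),
    (2640, 5880, "From fairest creatures we desire increase,"),
]
print(to_srt(fragments))
```

Working in integer milliseconds avoids floating point rounding when the time stamps are formatted.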

System Requirements, Supported Platforms and Installation

System Requirements

  1. a reasonably recent machine (recommended 4 GB RAM, 2 GHz 64bit CPU)
  2. Python 2.7 (Linux, OS X, Windows) or 3.5 or later (Linux, OS X)
  3. FFmpeg
  4. eSpeak
  5. Python packages BeautifulSoup4, lxml, and numpy
  6. Python headers to compile the Python C/C++ extensions (optional but strongly recommended)
  7. A shell supporting UTF-8 (optional but strongly recommended)

Supported Platforms

aeneas has been developed and tested on Debian 64bit, with Python 2.7 and Python 3.5, which are the only supported platforms at the moment. Nevertheless, aeneas has been confirmed to work on other Linux distributions, Mac OS X, and Windows. See the PLATFORMS file for details.

If installing aeneas natively on your OS proves difficult, you are strongly encouraged to use aeneas-vagrant, which provides aeneas inside a virtualized Debian image running under VirtualBox and Vagrant. Both tools can be installed on any modern OS (Linux, Mac OS X, Windows).

Installation

All-in-one installers are available for Mac OS X and Windows, and a Bash script for deb-based Linux distributions (Debian, Ubuntu) is provided in this repository. It is also possible to download a VirtualBox+Vagrant virtual machine. Please see the INSTALL file for detailed, step-by-step installation procedures for different operating systems.

The generic OS-independent procedure is simple:

  1. Install Python (2.7.x preferred), FFmpeg, and eSpeak

  2. Make sure the following executables can be called from your shell: espeak, ffmpeg, ffprobe, pip, and python

  3. First install numpy with pip and then aeneas (this order is important):

    pip install numpy
    pip install aeneas
  4. To check whether you installed aeneas correctly, run:

     python -m aeneas.diagnostics

Usage

  1. Run without arguments to get the usage message:

    python -m aeneas.tools.execute_task
    python -m aeneas.tools.execute_job

    You can also get a list of live examples that you can immediately run on your machine thanks to the included files:

    python -m aeneas.tools.execute_task --examples
    python -m aeneas.tools.execute_task --examples-all
  2. To compute a synchronization map map.json for a pair (audio.mp3, text.txt in plain text format), you can run:

    python -m aeneas.tools.execute_task \
        audio.mp3 \
        text.txt \
        "task_language=eng|os_task_file_format=json|is_text_type=plain" \
        map.json

    (The command has been split into lines with \ for visual clarity; in production you can have the entire command on a single line and/or you can use shell variables.)

    To compute a synchronization map map.smil for a pair (audio.mp3, page.xhtml containing fragments marked by id attributes like f001), you can run:

    python -m aeneas.tools.execute_task \
        audio.mp3 \
        page.xhtml \
        "task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
        map.smil

    As you can see, the third argument (the configuration string) specifies the parameters controlling the I/O formats and the processing options for the task. Consult the documentation for details.

  3. If you have several tasks to process, you can create a job container to batch process them:

    python -m aeneas.tools.execute_job job.zip output_directory

    File job.zip should contain a config.txt or config.xml configuration file, providing aeneas with all the information needed to parse the input assets and format the output sync map files. Consult the documentation for details.

The documentation contains a highly recommended tutorial, which explains how to use the built-in command line tools.
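When the commands above are launched from a script, the configuration string can be assembled programmatically. The sketch below is only an illustration: `build_config_string` and `build_command` are hypothetical helpers, not part of aeneas, and the rule that "|" may not appear inside a key or value is an assumption based on its role as separator.

```python
import sys


def build_config_string(parameters):
    """Join (key, value) pairs into a 'key=value|key=value' string."""
    parts = []
    for key, value in parameters:
        if "|" in key or "|" in value:
            # Assumption: "|" separates pairs, so it cannot appear inside one
            raise ValueError("'|' is not allowed in %r=%r" % (key, value))
        parts.append("%s=%s" % (key, value))
    return "|".join(parts)


def build_command(audio_path, text_path, parameters, output_path):
    """Return the argument list for python -m aeneas.tools.execute_task."""
    return [
        sys.executable, "-m", "aeneas.tools.execute_task",
        audio_path,
        text_path,
        build_config_string(parameters),
        output_path,
    ]


command = build_command(
    "audio.mp3",
    "text.txt",
    [
        ("task_language", "eng"),
        ("os_task_file_format", "json"),
        ("is_text_type", "plain"),
    ],
    "map.json",
)
print(command)
# subprocess.call(command) would run it, provided aeneas is installed
```

Passing the command as a list, rather than as a single shell string, means the "|" characters in the configuration string need no shell quoting.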

Documentation and Support

Supported Features

  • Input text files in parsed, plain, subtitles, or unparsed (XML) format
  • Multilevel input text files in mplain and munparsed (XML) format
  • Text extraction from XML (e.g., XHTML) files using id and class attributes
  • Arbitrary text fragment granularity (single word, subphrase, phrase, paragraph, etc.)
  • Input audio file formats: all those readable by ffmpeg
  • Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB, TEXTGRID, TSV, TTML, TXT, VTT, XML
  • Confirmed working on 38 languages: AFR, ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, JPN, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
  • MFCC and DTW computed via Python C extensions to reduce the processing time
  • Several built-in TTS engine wrappers: AWS Polly TTS API, eSpeak (default), eSpeak-ng, Festival, MacOS (via say), Nuance TTS API
  • Default TTS (eSpeak) called via a Python C extension for fast audio synthesis
  • Possibility of running a custom, user-provided TTS engine Python wrapper (e.g., included example for speect)
  • Batch processing of multiple audio/text pairs
  • Download audio from a YouTube video
  • In multilevel mode, recursive alignment from paragraph to sentence to word level
  • In multilevel mode, MFCC resolution, MFCC masking, DTW margin, and TTS engine can be specified for each level independently
  • Robust against misspelled/mispronounced words, local rearrangements of words, background noise/sporadic spikes
  • Adjustable splitting times, including a max character/second constraint for CC applications
  • Automated detection of audio head/tail
  • Output an HTML file for fine tuning the sync map manually (finetuneas project)
  • Execution parameters tunable at runtime
  • Code suitable for Web app deployment (e.g., on-demand cloud computing instances)
  • Extensive test suite including 1,200+ unit/integration/performance tests, which are run and must pass before each release

Limitations and Missing Features

  • Audio should match the text: large portions of spurious text or audio might produce a wrong sync map
  • Audio is assumed to be spoken: not suitable for song captioning, YMMV for CC applications
  • No protection against memory swapping: be sure your amount of RAM is adequate for the maximum duration of a single audio file (e.g., 4 GB RAM => max 2h audio; 16 GB RAM => max 10h audio)
  • Open issues

A Note on Word-Level Alignment

A significant number of users run aeneas to align audio and text at word level (i.e., each fragment is a word). Although aeneas was not designed with word-level alignment in mind and the results might be inferior to ASR-based forced aligners for languages with good ASR models, aeneas offers some options to improve the quality of the alignment at word level:

  • multilevel text (since v1.5.1),
  • MFCC nonspeech masking (since v1.7.0, disabled by default),
  • use better TTS engines, like Festival or AWS/Nuance TTS API (since v1.5.0).

If you use the aeneas.tools.execute_task command line tool, you can add the --presets-word switch to enable MFCC nonspeech masking, for example:

$ python -m aeneas.tools.execute_task --example-words --presets-word
$ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word

If you use aeneas as a library, just set the appropriate RuntimeConfiguration parameters. Please see the command line tutorial for details.

License

aeneas is released under the terms of the GNU Affero General Public License Version 3. See the LICENSE file for details.

Licenses for third party code and files included in aeneas can be found in the licenses directory.

No copy rights were harmed in the making of this project.

Supporting and Contributing

Sponsors

  • July 2015: Michele Gianella generously supported the development of the boundary adjustment code (v1.0.4)

  • August 2015: Michele Gianella partially sponsored the port of the MFCC/DTW code to C (v1.1.0)

  • September 2015: friends in West Africa partially sponsored the development of the head/tail detection code (v1.2.0)

  • October 2015: an anonymous donation sponsored the development of the "YouTube downloader" option (v1.3.0)

  • April 2016: the Fruch Foundation kindly sponsored the development and documentation of v1.5.0

  • December 2016: the Centro Internazionale Del Libro Parlato "Adriano Sernagiotto" (Feltre, Italy) partially sponsored the development of the v1.7 series

Supporting

Would you like to support the development of aeneas?

I accept sponsorships to

  • fix bugs,
  • add new features,
  • improve the quality and the performance of the code,
  • port the code to other languages/platforms, and
  • improve the documentation.

Feel free to get in touch.

Contributing

If you think you found a bug or you have a feature request, please use the GitHub issue tracker to submit it.

If you want to ask a question about using aeneas, your best option is to send an email to the mailing list.

Finally, code contributions are welcome! Please refer to the Code Contribution Guide for details about the branch policies and the code style to follow.

Acknowledgments

Many thanks to Nicola Montecchio, who suggested using MFCCs and DTW, and co-developed the first experimental code for aligning audio and text.

Paolo Bertasi, who developed the APIs and Web application for ReadBeyond Sync, helped shape the structure of this package for its asynchronous usage.

Chris Hubbard prepared the files for packaging aeneas as a Debian/Ubuntu .deb.

Daniel Bair prepared the brew formula for installing aeneas and its dependencies on Mac OS X.

Daniel Bair, Chris Hubbard, and Richard Margetts packaged the installers for Mac OS X and Windows.

Firat Ozdemir contributed the finetuneas HTML/JS code for fine tuning sync maps in the browser.

Willem van der Walt contributed the code snippet to output a sync map in TextGrid format.

Chris Vaughn contributed the MacOS TTS wrapper.

All the mighty GitHub contributors, and the members of the Google Group.

aeneas's Issues

Creating executables of aeneas with pyinstaller

Working on it on my personal repo, in devel branch.

This needs:

  1. addressing sys.stdin.encoding being None
  2. creating a hydra tool, so that only one executable needs to be built for each (OS, 32/64-bit) pair
  3. including the correct res/ files in the .spec configuration
  4. providing the .spec configurations: one for "one directory" and one for "one file" mode

Packaging for OSX

At SIL, we are working on releasing Scripture App Builder for Mac (will build Android and iOS apps). We would like to include Aeneas support on the Mac. I have been in discussion with @danielbair on creating a package for OSX. Would you accept a pull request for this (similar to the debian packaging) or should we keep it as a separate repo?

Thanks,

Chris

Cache synthesized WAV files

Currently, when using a TTS called via subprocess or remote API, each fragment is synthesized individually. Hence, in case of repeated fragments, they get synthesized more than once.

This especially affects those using a TTS API that is paid, or free but limited.

The solution would be to add a "cache" mechanism, so that a fragment is not synthesized again if a fragment with the same text and language has already been synthesized. This requires two things:

  1. keeping a dictionary, mapping fragment (language, text) => tmp WAV file
  2. removing all the WAV files at the end of the synthesis process

Perhaps this caching must be explicitly enabled by the user (since it requires more tmp disk space) and/or enabled by default only for TTS API wrappers, like the current Nuance one.
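The two requirements above can be sketched as follows. This is a minimal illustration, not the aeneas implementation: the `SynthesisCache` class is hypothetical, and `synthesize` stands for any callable taking (language, text, output_path) and writing a WAV file to output_path, for example a wrapper around a TTS subprocess or remote API.

```python
import os
import tempfile


class SynthesisCache(object):
    """Cache synthesized WAV files, keyed by (language, text)."""

    def __init__(self, synthesize):
        # synthesize: callable(language, text, output_path), hypothetical
        self._synthesize = synthesize
        self._files = {}

    def get(self, language, text):
        """Return the path of the WAV file for the given fragment,
        synthesizing it only if it is not in the cache yet."""
        key = (language, text)
        if key not in self._files:
            handle, path = tempfile.mkstemp(suffix=".wav")
            os.close(handle)
            self._synthesize(language, text, path)
            self._files[key] = path
        return self._files[key]

    def clear(self):
        """Remove all the cached WAV files from disk."""
        for path in self._files.values():
            if os.path.exists(path):
                os.remove(path)
        self._files = {}
```

A caller would create one cache per synthesis run, call `get()` for each fragment, and call `clear()` at the end, ideally inside a try/finally block so that temporary files are removed even if synthesis fails midway.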

Remove linux-only blocks for aeneas.cew

Please remove the linux-only blocks for aeneas.cew now.
I have merged the patches from https://github.com/pettarin/espeakosx into the homebrew espeak to compile and install libespeak.
I've submitted a pull request against the espeak.rb formula, but the Homebrew maintainers are now considering dropping espeak from their official formula list (see Homebrew/homebrew-core#2726), so it may be necessary to use my Homebrew tap from now on.

Add the tap:
brew tap danielbair/tap
Then install as any other formula:
brew install danielbair/tap/espeak

Mac and Windows installers are available for aeneas from https://github.com/sillsdev/aeneas-installer/releases with cew compiled and working!

Former TODO list (to be split out)

  • Improving robustness against music in background
  • Isolating non-speech intervals (music, prolonged silence)
  • Automated text fragmentation based on audio analysis
  • Auto-tuning DTW parameters
  • Reporting the alignment score
  • Multilevel sync map granularity (e.g., multilevel SMIL output)
  • Testing other approaches, like GMM/HMM/NN (e.g., using HTK or Kaldi)

Expose additional eSpeak voices

Currently the languages allowed by the validation process are a subset of the voices available to espeak. Could we add the rest, or at least the English variations such as en-gb and en-us?

Call festival via C++ extension

Festival has a C++ API, so we might consider creating a cfw Python C(++?) extension, similar to cew for eSpeak.

From my preliminary test (a simple C++ executable that synthesizes a given number of fragments and concatenates them, saving a single file to disk), it is 8-10x faster to generate 100-1000 fragments than the current subprocess-based Python wrapper. For 1k fragments (2k words, ~21min total audio), the C++ code takes about 2 min, instead of ~25 min of the Python code.

There might be issues with getting the Python C(++?) extension to compile, as the C++ part depends on several libraries, in particular festival and several sub-libraries of speech_tools.

cc @ozdefir

cew on Windows

The Python C extension cew can be compiled on Windows, but it requires manually patching the espeak DLLs, etc.

See if espeak-ng makes this feasible.

Please update debian/changelog?

Hello Alberto,

Thank you for all the work you have been doing with Aeneas! It is great work.

We would like to update the package that we build of Aeneas that gets used by Scripture App Builder and Reading App Builder. Could you update the debian/changelog and create an entry for 1.5.0.3 and include the changes in the log that you think are relevant? You have done such a great job of including information about changes for previous entries in the changelog. I could try to come up with a list, but I don't know whether I could get a good list.

Thanks,

Chris Hubbard

Rewrite ``vad``

Use numpy more, e.g. boolean masks (numpy.ma) and rolling windows.
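To give an idea of what "boolean masks and rolling windows" could look like, here is a minimal sketch of an energy-threshold detector. It is not the algorithm of the current vad module: the function name `detect_speech`, the per-frame `energy` input, and the `threshold` and `min_run` parameters are all assumptions made for illustration.

```python
import numpy


def detect_speech(energy, threshold, min_run):
    """Return a boolean mask marking frames that belong to a run of at least
    min_run consecutive frames whose energy exceeds threshold."""
    energy = numpy.asarray(energy, dtype=float)
    speech = numpy.zeros(len(energy), dtype=bool)
    if len(energy) < min_run:
        return speech
    # Boolean mask of frames above the threshold
    above = energy > threshold
    # Rolling window: number of above-threshold frames in each window
    window = numpy.ones(min_run, dtype=int)
    counts = numpy.convolve(above.astype(int), window, mode="valid")
    # Windows whose frames are all above the threshold, indexed by start frame
    full = counts == min_run
    # Mark every frame covered by at least one full window
    for offset in range(min_run):
        speech[offset:offset + len(full)] |= full
    return speech


mask = detect_speech([0, 0, 5, 5, 5, 0, 5, 0, 0], threshold=1, min_run=2)
print(mask.tolist())
```

The only Python-level loop runs over the window length, not over the frames, so the cost per frame stays inside numpy. Isolated spikes shorter than min_run frames are discarded.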

debian/ubuntu package

We would like to include aeneas as a package dependency of the Linux version of Scripture App Builder (http://software.sil.org/scriptureappbuilder), which is free software. Is anyone working on a Debian/Ubuntu package? Would you accept a pull request if I did the work as a native package, or should I create a non-native package and keep it in a separate repo? What would you prefer?

Add check on audio head/tail/process

Currently, if the user sets e.g. an audio tail beyond the actual length of the audio file, a cryptic error is returned: "Unexpected error while executing task : The given index is not valid".

Adding a check will help the user diagnose the issue.
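Such a check could look like the sketch below. The function `check_head_tail` is hypothetical, all values are in seconds, and the rule that the process length takes precedence over the tail length when both are given is an assumption of this sketch.

```python
def check_head_tail(audio_length, head=0.0, process=None, tail=None):
    """Raise ValueError with a readable message if head/process/tail
    do not fit inside an audio file of the given length (in seconds)."""
    for name, value in (("head", head), ("process", process), ("tail", tail)):
        if value is not None and value < 0:
            raise ValueError(
                "The %s length must be non-negative (got %.3f s)" % (name, value)
            )
    if head > audio_length:
        raise ValueError(
            "The head length (%.3f s) exceeds the audio length (%.3f s)"
            % (head, audio_length)
        )
    if process is not None:
        # Assumption: process takes precedence over tail
        if head + process > audio_length:
            raise ValueError(
                "head + process length (%.3f s) exceeds the audio length (%.3f s)"
                % (head + process, audio_length)
            )
    elif tail is not None:
        if head + tail > audio_length:
            raise ValueError(
                "head + tail length (%.3f s) exceeds the audio length (%.3f s)"
                % (head + tail, audio_length)
            )


# A 60 s file with a 5 s head and a 10 s tail is fine
check_head_tail(60.0, head=5.0, tail=10.0)
```

Running the check before the alignment starts lets the tool report which parameter is out of range instead of a generic index error.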

Creating a Path class or some path sanitize functions

Right now paths are treated as (Unicode) strings, and this might cause problems, given all the nefarious Windows issues we all know.

Perhaps it is worth considering creating a specialized class or some path sanitize functions in globalfunctions.py.

A specialized class has the advantage of making e.g. "slash conversion" (/ => \ on Windows) transparent to the rest of the code. But perhaps it is overkill and global functions will suffice.
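The "global function" variant might be as small as the following sketch, which relies on the standard library rather than on anything in aeneas. The name `sanitize_path` is hypothetical; the `flavour` argument exists only so that the Windows behaviour can be demonstrated on any OS, and defaults to the path module of the running platform.

```python
import ntpath
import os
import posixpath


def sanitize_path(path, flavour=os.path):
    """Normalize a path: collapse redundant separators and '..' components
    and, with the Windows flavour, convert '/' into '\\'."""
    if path is None:
        return None
    return flavour.normpath(path)


# Same input, normalized under Windows and POSIX rules
print(sanitize_path("input/audio/../text.txt", flavour=ntpath))
print(sanitize_path("input//audio/../text.txt", flavour=posixpath))
```

With the default `flavour=os.path`, the rest of the code can keep writing paths with forward slashes and let the function apply the conventions of the OS it runs on.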

cew on OS X

At the moment the Python C extension cew works on OS X (with a modified cew_setup.py) but it requires compiling espeak as a static library and copying it in the aeneas/ directory.

See if this can be automated, especially now that espeak-ng seems the active upstream.

BeautifulSoup4 v4.5.0 breaks aeneas (API change?)

BeautifulSoup4 v4.5.0, released on PyPI on 2016-07-20, seems to include some API change that breaks aeneas when trying to parse XML files with lxml:

soup = BeautifulSoup("\n".join(lines), "lxml")

I am not sure whether this is a bug (there is nothing on the bs4 bug tracker yet), or an intentional API change in bs4.

For now (=> aeneas v1.5.1), with #92 I fixed this issue by setting exact version numbers for lxml and BeautifulSoup4 in requirements.txt and in setup.py, but the issue should be investigated further for the next releases.

For example, we might end up specifying exact versions for all pip-installable packages.

CC: @danielbair @chrisvire --- your installers should be fine, as they require BeautifulSoup4==4.4.1 and lxml==3.6.0. Same for the Vagrant procedure, which relies on pip install aeneas which should install the correct versions.

Compiling C extensions on Windows and Python 3.4/3.5

After a preliminary search, it looks like there is no equivalent of "Microsoft Visual C++ compiler for Python 2.7" for Python 3.

One must install the correct Microsoft Visual Studio or Visual C/C++ (free, but several GB of download...), as described here:

https://matthew-brett.github.io/pydagogue/python_msvc.html

or

http://stackoverflow.com/questions/29909330/microsoft-visual-c-compiler-for-python-3-4

before being able to compile Python C extensions.

Investigate this further.

Config files and parameter names

This is a long term goal.

Adopting a popular format (INI-like, e.g. TOML).

Replacing the current parameter names (too long and complex) with simpler ones.

Rewrite ``sd``

Too many magic numbers. Test other/better approaches.

Global execution parameters

Either on the command line, in a config file, or in ~/.config/aeneas.conf.

For stuff like setting the MFCC window size, disabling C extensions, etc.
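One possible way to combine these sources is a layered lookup, sketched below. Everything here is an assumption made for illustration: the parameter names and default values, the key=value file format, the "|"-separated command line string, and the precedence order (defaults, then config file, then command line).

```python
# Hypothetical parameter names and default values
DEFAULTS = {
    "mfcc_window_length": "0.100",
    "c_extensions": "True",
}


def parse_pairs(text, separator):
    """Parse 'key=value' items separated by the given separator."""
    parameters = {}
    for item in text.split(separator):
        item = item.strip()
        if not item or item.startswith("#") or "=" not in item:
            continue
        key, value = item.split("=", 1)
        parameters[key.strip()] = value.strip()
    return parameters


def resolve_parameters(defaults, file_text=None, cli_string=None):
    """Merge the sources; later sources override earlier ones."""
    resolved = dict(defaults)
    if file_text:
        # e.g. the contents of os.path.expanduser("~/.config/aeneas.conf")
        resolved.update(parse_pairs(file_text, "\n"))
    if cli_string:
        resolved.update(parse_pairs(cli_string, "|"))
    return resolved


print(resolve_parameters(
    DEFAULTS,
    file_text="# user settings\nc_extensions=False\n",
    cli_string="mfcc_window_length=0.050",
))
```

Keeping the merge in one function makes the precedence explicit and easy to change if a different order turns out to be preferable.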

Aeneas and Python3

Hi there,

I have not dived yet into the actual aeneas code, but I'd like to get things clear before doing that.
For testing purposes, I wanted to include it in a Python 3 project, but that choked on the beautifulsoup version (3.2.1) that it required.

  • Am I correct that aeneas only runs in Python 2?
  • Could Aeneas work with a higher version of BS?
  • How much would it take to rework Aeneas into a Py 3 version?

Thanks a lot

The job cannot be loaded from the specified container

This is the result from my execute_job test. I couldn't find what's causing the problem.
It worked when it was tested on Unix machine, but on Windows 7 64-bit it doesn't work.
Fresh installation of Python 2.7.10 (+BeautifulSoup and lxml), ffmpeg-20150916, espeak-1.48.04, numpy-1.9.2+mkl-cp27, scikits.audiolab-0.11.0-cp27, and VCForPython27.msi

c:\sync\aeneas-master>python -m aeneas.tools.execute_job test/01.zip output/ -v
[INFO] Loading job from container...
[DEBU] 2015-09-21 21:20:38.113000 ExecuteJob: Loading job from container...
[DEBU] 2015-09-21 21:20:38.113000 ExecuteJob: Validating container...
[DEBU] 2015-09-21 21:20:38.113000 Validator: Checking container file 'test/01.zip'
[DEBU] 2015-09-21 21:20:38.128000 Validator: Checking container file exists
[DEBU] 2015-09-21 21:20:38.128000 Validator: Checking container file has config file
[DEBU] 2015-09-21 21:20:38.128000 Validator: Container has TXT config file
[DEBU] 2015-09-21 21:20:38.128000 Validator: Checking container with TXT config file
[DEBU] 2015-09-21 21:20:38.128000 Validator: Trying to read config file from container
[DEBU] 2015-09-21 21:20:38.144000 Validator: Config file found in container
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking contents TXT config file
[DEBU] 2015-09-21 21:20:38.144000 Validator: Converting file contents to config string
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking that string is well encoded
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking that the given string is well encoded
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking encoding of string
[DEBU] 2015-09-21 21:20:38.144000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking for reserved characters
[DEBU] 2015-09-21 21:20:38.144000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.144000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking required parameters
[DEBU] 2015-09-21 21:20:38.160000 Validator: Checking required parameters '['is_hierarchy_type', 'is_hierarchy_prefix', 'is_text_file_relative_path', 'is_text_file_name_regex', 'is_text_type', 'is_audio_file_relative_path', 'is_audio_file_name_regex', 'os_job_file_name', 'os_job_file_container', 'os_job_file_hierarchy_type', 'os_job_file_hierarchy_prefix', 'os_task_file_name', 'os_task_file_format', 'job_language']'
[DEBU] 2015-09-21 21:20:38.285000 Validator: Checking required parameters
[DEBU] 2015-09-21 21:20:38.300000 Validator: Checking input parameters are not empty
[DEBU] 2015-09-21 21:20:38.332000 Validator: Checking no required parameter is missing
[DEBU] 2015-09-21 21:20:38.378000 Validator: Checking all parameter values are allowed
[DEBU] 2015-09-21 21:20:38.410000 Validator: Checking allowed values for parameter 'job_language'
[DEBU] 2015-09-21 21:20:38.457000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.472000 Validator: Checking allowed values for parameter 'task_language'
[DEBU] 2015-09-21 21:20:38.519000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.535000 Validator: Checking allowed values for parameter 'os_job_file_container'
[DEBU] 2015-09-21 21:20:38.582000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.597000 Validator: Checking allowed values for parameter 'is_hierarchy_type'
[DEBU] 2015-09-21 21:20:38.644000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.660000 Validator: Checking allowed values for parameter 'os_job_file_hierarchy_type'
[DEBU] 2015-09-21 21:20:38.707000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.722000 Validator: Checking allowed values for parameter 'is_text_type'
[DEBU] 2015-09-21 21:20:38.753000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.785000 Validator: Checking allowed values for parameter 'os_task_file_format'
[DEBU] 2015-09-21 21:20:38.816000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.847000 Validator: Checking allowed values for parameter 'task_adjust_boundary_algorithm'
[DEBU] 2015-09-21 21:20:38.878000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.910000 Validator: Checking all implied parameters are present
[DEBU] 2015-09-21 21:20:38.941000 Validator: Checking implied parameters by 'is_hierarchy_type'='paged'
[DEBU] 2015-09-21 21:20:38.988000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.003000 Validator: Checking implied parameters by 'is_text_type'='unparsed'
[DEBU] 2015-09-21 21:20:39.050000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.066000 Validator: Checking implied parameters by 'is_text_type'='unparsed'
[DEBU] 2015-09-21 21:20:39.113000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.128000 Validator: Checking implied parameters by 'os_task_file_format'='smil'
[DEBU] 2015-09-21 21:20:39.160000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.191000 Validator: Checking implied parameters by 'os_task_file_format'='smil'
[DEBU] 2015-09-21 21:20:39.222000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.238000 Validator: Checking implied parameters by 'task_adjust_boundary_algorithm'='percent'
[DEBU] 2015-09-21 21:20:39.285000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.300000 Validator: Checking implied parameters by 'task_adjust_boundary_algorithm'='rate'
[DEBU] 2015-09-21 21:20:39.347000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.363000 Validator: Checking implied parameters by 'task_adjust_boundary_algorithm'='rateaggressive'
[DEBU] 2015-09-21 21:20:39.394000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.425000 Validator: Checking implied parameters by 'task_adjust_boundary_algorithm'='aftercurrent'
[DEBU] 2015-09-21 21:20:39.457000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.472000 Validator: Checking implied parameters by 'task_adjust_boundary_algorithm'='beforenext'
[DEBU] 2015-09-21 21:20:39.519000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.550000 Validator: Checking required parameters: returning True
[DEBU] 2015-09-21 21:20:39.582000 Validator: Checking contents TXT config file: returning True
[DEBU] 2015-09-21 21:20:39.628000 Validator: Analyze the contents of the container
[DEBU] 2015-09-21 21:20:39.675000 Validator: Checking the Job object generated from container
[DEBU] 2015-09-21 21:20:39.722000 Validator: Checking the Job is not None
[DEBU] 2015-09-21 21:20:39.738000 Validator: Checking the Job has at least one Task
[DEBU] 2015-09-21 21:20:39.785000 Validator: Unable to create at least one Task from the container.
[DEBU] 2015-09-21 21:20:39.816000 Validator: Checking container with TXT config file: returning False
[DEBU] 2015-09-21 21:20:39.863000 Validator: Checking container: returning False
[DEBU] 2015-09-21 21:20:39.894000 ExecuteJob: Validating container: failed
[DEBU] 2015-09-21 21:20:39.925000 ExecuteJob: Loading job from container: failed
[INFO] Loading job from container... done
[ERRO] The job cannot be loaded from the specified container

Config:

is_hierarchy_type=flat
is_hierarchy_prefix=input/
is_text_file_relative_path=.
is_text_file_name_regex=.*\.txt
is_text_type=parsed
is_audio_file_relative_path=.
is_audio_file_name_regex=.*\.MP3

os_job_file_name=output_test-01
os_job_file_container=zip
os_job_file_hierarchy_type=flat
os_job_file_hierarchy_prefix=input/
os_task_file_name=$PREFIX.smil
os_task_file_format=smil
os_task_file_smil_page_ref=$PREFIX.xhtml
os_task_file_smil_audio_ref=$PREFIX.mp3

job_language=en
job_description=Test 01 (flat hierarchy, parsed text files)
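When the validator reports that no Task can be created, it can help to check by hand which entries of the container would match the configured prefix and file name patterns. The sketch below is a hypothetical diagnostic, not the way aeneas itself analyzes a container: `find_matching_entries` is an invented helper, and matching the regex against the part of the entry name that follows the prefix is an assumption.

```python
import re
import zipfile


def find_matching_entries(container, prefix, name_regex):
    """Return the entries of a ZIP container that start with prefix and
    whose remaining name matches name_regex (case-sensitive)."""
    pattern = re.compile(name_regex)
    with zipfile.ZipFile(container) as archive:
        names = archive.namelist()
    matches = []
    for name in names:
        if name.startswith(prefix) and pattern.match(name[len(prefix):]):
            matches.append(name)
    return matches


# Example usage, with a container path and patterns chosen for illustration:
# print(find_matching_entries("test/01.zip", "input/", r".*\.txt"))
# print(find_matching_entries("test/01.zip", "input/", r".*\.MP3"))
```

An empty result for either the text or the audio pattern would suggest looking at the prefix, the pattern, or the letter case of the file extensions inside the archive.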

Long term move from Python C extensions to CFFI

Today I tried running aeneas under PyPy (Python 2.7.10 branch). Everything seems to work, except cdtw and cmfcc: they compile, but they do not import, producing the following error: AttributeError: _ARRAY_API not found ... ImportError: numpy.core.multiarray failed to import, both with NumPyPy and upstream NumPy.

I asked on their IRC channel, and they strongly suggested switching to CFFI, as the C API is not the preferred mechanism of PyPy for calling C code.

So, in the long run, it might be worth considering switching to CFFI or supporting it alongside C extensions.

DTW anchor indexing problem due to non-integer TTS sample rate * shift (was: Systematic negative bias observable in longer audios)

With longer audio files I observe a consistent negative bias, which increases gradually towards the end. To make sure it's not a playback issue, I tested with Audacity, which confirmed the observation.
Examples:

https://readiance.org/finetuneas/librivox/the-brothers-karamazov-by-fyodor-dostoyevsky/40-book-6-chapter-2-the-duel-the
https://readiance.org/finetuneas/librivox/childrens-short-works-vol-011-by-various/the-little-mermaid-childrens-short-works?g=s

The alignments are almost perfect, so I thought it could be due to floating point math or rounding.
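The effect named in the title can be illustrated with a little arithmetic. In the sketch below, the sample rate and shift values are illustrative assumptions, not the settings used for the examples above, and truncating the frame hop to an integer number of samples is an assumed behaviour, not a confirmed description of the aeneas code. When sample rate * shift is not an integer, each frame really advances by slightly less than the nominal shift, and the error accumulates with the frame index.

```python
def frame_drift(sample_rate, shift_ms, duration_s):
    """Return the accumulated difference (in seconds) between nominal and
    actual frame times at the end of an audio file of the given duration."""
    exact_samples = sample_rate * shift_ms / 1000.0
    used_samples = int(exact_samples)  # assumed truncation to whole samples
    nominal_shift = shift_ms / 1000.0
    actual_shift = used_samples / float(sample_rate)
    frames = int(duration_s / actual_shift)
    return frames * (nominal_shift - actual_shift)


# 22050 Hz * 40 ms = 882 samples: integer, so no drift
print(frame_drift(22050, 40, 3600))
# 22050 Hz * 5 ms = 110.25 samples: truncated to 110, drift accumulates
print(frame_drift(22050, 5, 3600))
```

With these illustrative values, a quarter of a sample lost per frame adds up to roughly 8 seconds over one hour of audio, growing linearly with position in the file.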
