Phonemizer -- foʊnmaɪzɚ

  • Simple text-to-phonemes converter for multiple languages, based on the festival and espeak/espeak-ng text-to-speech systems.

  • Provides both the phonemize command-line tool and the Python function phonemizer.phonemize.

  • Festival provides US English phonemization with syllable tokenization; espeak supports multiple languages but does not mark syllable boundaries.

  • The phoneset is IPA for the espeak backend, whereas festival uses its default US phoneset.

Installation

  • First, install festival and espeak on your system; see the festival and espeak websites for installation guidelines. On Debian/Ubuntu, simply run:

      $ sudo apt-get install festival espeak
    

    Alternatively, you may use espeak-ng (Next Generation) instead of espeak. It can be installed from its GitHub repository.

  • Then download and install phonemizer from GitHub with:

      $ git clone https://github.com/bootphon/phonemizer
      $ cd phonemizer
      $ python setup.py build
      $ [sudo] python setup.py install
    

    The phonemize command should be in your $PATH.
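To confirm that the installation steps above succeeded, you can check whether the relevant executables are visible on your $PATH. The following Python sketch is a hypothetical helper, not part of phonemizer; it uses only the standard library and assumes the executables are named festival, espeak, espeak-ng and phonemize.

```python
import shutil


def find_backends():
    """Map each expected executable to its full path, or None if it is not on $PATH."""
    # Hypothetical helper, not part of phonemizer: the executable names are assumptions.
    names = ["festival", "espeak", "espeak-ng", "phonemize"]
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    found = find_backends()
    for name, path in found.items():
        print(f"{name:10s} {path or 'NOT FOUND'}")
    # festival serves en-us-festival; espeak (or espeak-ng) serves the other languages.
    if found["espeak"] is None and found["espeak-ng"] is None:
        print("warning: neither espeak nor espeak-ng is on $PATH")
```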

Command-line examples

  • First, have a look at the help:

      $ phonemize --help
    
  • Input/output examples

    • from stdin to stdout:

        $ echo "hello world" | phonemize
        hhaxlow werld
      
    • from file to stdout:

        $ echo "hello world" > hello.txt
        $ phonemize hello.txt
        hhaxlow werld
      
    • from file to file:

        $ phonemize hello.txt -o hello.phon --strip
        $ cat hello.phon
        hhaxlow werld
      
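  • Token separators: -p sets the phone separator, -s the syllable separator and -w the word separator. By default every token is followed by its separator; --strip keeps separators only between tokens.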

      $ echo "hello world" | phonemize -p '-' -s '|'
      hh-ax-l-|ow-| w-er-l-d-|
    
      $ echo "hello world" | phonemize -p '-' -s '|' --strip
      hh-ax-l|ow w-er-l-d
    
      $ echo "hello world" | phonemize -p ' ' -s ';esyll ' -w ';eword '
      hh ax l ;esyll ow ;esyll ;eword w er l d ;esyll ;eword
    
  • Languages

    Festival US English is the default:

      $ echo "hello world" | phonemize -l en-us-festival
      hhaxlow werld
    

    This uses espeak instead:

      $ echo "hello world" | phonemize -l en-us
      həloʊ wɜːld
    

    In French:

      $ echo "bonjour le monde" | phonemize -l fr-fr
      bɔ̃ʒuʁ lə- mɔ̃d
    

    Languages supported by festival are:

      en-us-festival	->	english-us
    

    Languages supported by espeak are (espeak-ng supports even more of them):

      af	->	afrikaans
      an	->	aragonese
      bg	->	bulgarian
      bs	->	bosnian
      ca	->	catalan
      cs	->	czech
      cy	->	welsh
      da	->	danish
      de	->	german
      el	->	greek
      en	->	default
      en-gb	->	english
      en-sc	->	en-scottish
      en-uk-north	->	english-north
      en-uk-rp	->	english_rp
      en-uk-wmids	->	english_wmids
      en-us	->	english-us
      en-wi	->	en-westindies
      eo	->	esperanto
      es	->	spanish
      es-la	->	spanish-latin-am
      et	->	estonian
      fa	->	persian
      fa-pin	->	persian-pinglish
      fi	->	finnish
      fr-be	->	french-Belgium
      fr-fr	->	french
      ga	->	irish-gaeilge
      grc	->	greek-ancient
      hi	->	hindi
      hr	->	croatian
      hu	->	hungarian
      hy	->	armenian
      hy-west	->	armenian-west
      id	->	indonesian
      is	->	icelandic
      it	->	italian
      jbo	->	lojban
      ka	->	georgian
      kn	->	kannada
      ku	->	kurdish
      la	->	latin
      lfn	->	lingua_franca_nova
      lt	->	lithuanian
      lv	->	latvian
      mk	->	macedonian
      ml	->	malayalam
      ms	->	malay
      ne	->	nepali
      nl	->	dutch
      no	->	norwegian
      pa	->	punjabi
      pl	->	polish
      pt-br	->	brazil
      pt-pt	->	portugal
      ro	->	romanian
      ru	->	russian
      sk	->	slovak
      sq	->	albanian
      sr	->	serbian
      sv	->	swedish
      sw	->	swahili-test
      ta	->	tamil
      tr	->	turkish
      vi	->	vietnam
      vi-hue	->	vietnam_hue
      vi-sgn	->	vietnam_sgn
      zh	->	Mandarin
      zh-yue	->	cantonese
    
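The Token separators examples follow a simple rule: without --strip, every phone, syllable and word is followed by its own separator; with --strip, separators appear only between tokens. The Python sketch below is a hypothetical re-creation of that rule (format_phones is an illustrative name, not a phonemizer function). It takes the festival syllabification of "hello world" from the examples as hand-written input, and its default separators ('' for phones and syllables, ' ' for words) are inferred from the default output rather than taken from the phonemizer source.

```python
from typing import List

# A word is a list of syllables; a syllable is a list of phones.
Word = List[List[str]]


def format_phones(words: List[Word], phone_sep: str = "", syll_sep: str = "",
                  word_sep: str = " ", strip: bool = False) -> str:
    """Hypothetical re-creation of the separator behaviour shown in the examples."""
    if strip:
        # --strip: separators only *between* phones, syllables and words.
        return word_sep.join(
            syll_sep.join(phone_sep.join(syll) for syll in word) for word in words)
    # No --strip: every phone, syllable and word is *followed* by its separator.
    out = []
    for word in words:
        for syll in word:
            for phone in syll:
                out.append(phone + phone_sep)
            out.append(syll_sep)
        out.append(word_sep)
    return "".join(out)


# festival's segmentation of "hello world", copied from the examples above
hello_world = [[["hh", "ax", "l"], ["ow"]], [["w", "er", "l", "d"]]]

print(repr(format_phones(hello_world, "-", "|")))       # 'hh-ax-l-|ow-| w-er-l-d-| '
print(format_phones(hello_world, "-", "|", strip=True))  # hh-ax-l|ow w-er-l-d
print(format_phones(hello_world, " ", ";esyll ", ";eword "))
```

Without strip the output ends with a trailing word separator, which is easy to miss in a terminal.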

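Since the phonemize command reads from stdin and writes to stdout, it can also be driven from another Python program through the standard subprocess module. The sketch below is a hypothetical wrapper (build_phonemize_cmd and phonemize_text are illustrative names, not part of phonemizer) that uses only the -l, -p, -s, -w and --strip options demonstrated above; the language argument takes one of the codes from the tables.

```python
import subprocess
from typing import List, Optional


def build_phonemize_cmd(language: str = "en-us-festival",
                        phone_sep: Optional[str] = None,
                        syll_sep: Optional[str] = None,
                        word_sep: Optional[str] = None,
                        strip: bool = False) -> List[str]:
    """Build an argv list for the phonemize CLI.

    Hypothetical wrapper: uses only the options shown in this README.
    """
    cmd = ["phonemize", "-l", language]
    if phone_sep is not None:
        cmd += ["-p", phone_sep]
    if syll_sep is not None:
        cmd += ["-s", syll_sep]
    if word_sep is not None:
        cmd += ["-w", word_sep]
    if strip:
        cmd.append("--strip")
    return cmd


def phonemize_text(text: str, **options) -> str:
    """Pipe text through phonemize on stdin and return its stdout (requires the CLI)."""
    result = subprocess.run(build_phonemize_cmd(**options), input=text,
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only builds the command; running it needs phonemize and a backend installed.
    print(build_phonemize_cmd("fr-fr", phone_sep="-", strip=True))
```

Passing the arguments as a list means separators such as '|' or ';eword ' need no shell quoting. Once phonemize and espeak are installed, phonemize_text("bonjour le monde", language="fr-fr") should give the same output as the French example. For in-process use, the Python function phonemizer.phonemize avoids spawning a process; its parameters may differ between versions, so check its docstring before relying on them.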
Licence

Copyright 2015 - 2017 Mathieu Bernard

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
