text2num

text2num is a Python package that provides functions and parser classes for:

  • parsing numbers expressed as words in French, English and Spanish and converting them to integer values;
  • detecting ordinal, cardinal and decimal numbers in a stream of French, English or Spanish words and transcribing them to their decimal digit representations. Ordinal numbers are not yet supported for Spanish.

Compatibility

Tested on Python 3.7. Requires Python >= 3.6.

License

This software is distributed under the MIT license, a copy of which you should have received (see the LICENSE file in this repository).

Installation

text2num does not depend on any third-party package.

To install text2num in your (virtual) environment:

pip install text2num

That's all folks!

Usage examples

Parse and convert

French examples:

>>> from text_to_num import text2num
>>> text2num('quatre-vingt-quinze', "fr")
95

>>> text2num('nonante-cinq', "fr")
95

>>> text2num('mille neuf cent quatre-vingt dix-neuf', "fr")
1999

>>> text2num('dix-neuf cent quatre-vingt dix-neuf', "fr")
1999

>>> text2num("cinquante et un million cinq cent soixante dix-huit mille trois cent deux", "fr")
51578302

>>> text2num('mille mille deux cents', "fr")
ValueError: invalid literal for text2num: 'mille mille deux cents'

English examples:

>>> from text_to_num import text2num

>>> text2num("fifty-one million five hundred seventy-eight thousand three hundred two", "en")
51578302

>>> text2num("eighty-one", "en")
81

Spanish examples:

>>> from text_to_num import text2num
>>> text2num("ochenta y uno", "es")
81

>>> text2num("nueve mil novecientos noventa y nueve", "es")
9999

>>> text2num("cincuenta y tres millones doscientos cuarenta y tres mil setecientos veinticuatro", "es")
53243724
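
As the French example above shows, text2num raises ValueError when the words do not form a valid number. When the input comes from users or from a speech transcript, it can be handy to turn that exception into a None result. The helper below is a minimal sketch and not part of the package: parse_or_none is a hypothetical name, and the parser is passed in as an argument so the helper only assumes a callable with the same (text, lang) signature as text2num.

```python
from typing import Callable, Optional


def parse_or_none(
    text: str, lang: str, parser: Callable[[str, str], int]
) -> Optional[int]:
    """Return parser(text, lang), or None if the parser raises ValueError.

    `parser` is assumed to behave like text_to_num.text2num: it returns
    an int on success and raises ValueError on invalid input.
    """
    try:
        return parser(text, lang)
    except ValueError:
        return None


# With the package installed, the helper would be used like this:
#   from text_to_num import text2num
#   parse_or_none("quatre-vingt-quinze", "fr", text2num)
#   parse_or_none("mille mille deux cents", "fr", text2num)
```

Given the examples above, the first call should give 95 and the second should give None instead of raising.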

Find and transcribe

Any numbers, even ordinals.

French:

>>> from text_to_num import alpha2digit
>>> sentence = (
...         "Huit cent quarante-deux pommes, vingt-cinq chiens, mille trois chevaux, "
...         "douze mille six cent quatre-vingt-dix-huit clous.\n"
...         "Quatre-vingt-quinze vaut nonante-cinq. On tolère l'absence de tirets avant les unités : "
...         "soixante seize vaut septante six.\n"
...         "Nombres en série : douze quinze zéro zéro quatre vingt cinquante-deux cent trois cinquante deux "
...         "trente et un.\n"
...         "Ordinaux: cinquième troisième vingt et unième centième mille deux cent trentième.\n"
...         "Décimaux: douze virgule quatre-vingt dix-neuf, cent vingt virgule zéro cinq ; "
...         "mais soixante zéro deux."
...     )
>>> print(alpha2digit(sentence, "fr"))
842 pommes, 25 chiens, 1003 chevaux, 12698 clous.
95 vaut 95. On tolère l'absence de tirets avant les unités : 76 vaut 76.
Nombres en série : 12 15 004 20 52 103 52 31.
Ordinaux: 5ème 3ème 21ème 100ème 1230ème.
Décimaux: 12,99, 120,05 ; mais 60 02.

English:

>>> from text_to_num import alpha2digit

>>> text = "On May twenty-third, I bought twenty-five cows, twelve chickens and one hundred twenty five point forty kg of potatoes."
>>> alpha2digit(text, "en")
'On May 23rd, I bought 25 cows, 12 chickens and 125.40 kg of potatoes.'

Spanish (ordinals not supported):

>>> from text_to_num import alpha2digit

>>> text = "Compramos veinticinco vacas, doce gallinas y ciento veinticinco coma cuarenta kg de patatas."
>>> alpha2digit(text, "es")
'Compramos 25 vacas, 12 gallinas y 125.40 kg de patatas.'

>>> text = "Tenemos mas veinte grados dentro y menos quince fuera."
>>> alpha2digit(text, "es")
'Tenemos +20 grados dentro y -15 fuera.'
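
Once a sentence has been transcribed, the digit strings can be collected with the standard library alone. The sketch below is an illustration, not a feature of text2num: extract_numbers and the regular expression are assumptions made for this example, and the inputs are output strings copied from the examples above. It keeps an optional sign and accepts either "." or "," as the decimal separator.

```python
import re
from typing import List

# Optional sign, digits, then an optional decimal part using "." or ",".
NUMBER = re.compile(r"[+-]?\d+(?:[.,]\d+)?")


def extract_numbers(transcribed: str) -> List[str]:
    """Return every digit string found in already-transcribed text."""
    return NUMBER.findall(transcribed)


english = "On May 23rd, I bought 25 cows, 12 chickens and 125.40 kg of potatoes."
spanish = "Tenemos +20 grados dentro y -15 fuera."

print(extract_numbers(english))  # ['23', '25', '12', '125.40']
print(extract_numbers(spanish))  # ['+20', '-15']
```

Note that the ordinal suffix is dropped ("23rd" yields "23"); the results stay strings so that the caller decides how to interpret the decimal separator.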

Read the complete documentation on ReadTheDocs.

Contribute

Join us on https://github.com/allo-media/text2num
