Language Learning Toolkit

The Language Learning Toolkit combines several approaches such as natural language processing and web scraping to perform a variety of tasks useful for (human) language learning. This includes:

  • Part-of-speech tagging (POS) supported by Pattern
  • Phonetic transcriptions in accordance with the International Phonetic Alphabet (IPA)
  • Audio samples (Forvo, Google Translate)
  • Text samples/sample sentences (Tatoeba)
  • Visual representations of a given word using Google Images
  • Conjugation of verbs (Present, Perfect, Past, Pluperfect, Future) supported by Verbix
  • Pluralization of nouns (accuracy depends on the language)
  • Indefinite and definite articles for nouns (accuracy depends on the language)
  • Comparative and superlative forms of adjectives
  • Basic gender detection for nouns

General information

Everything inside LLTK is split into separate modules, which keeps the parts flexible and interchangeable. In fact, each language is a module of its own. When calling a language-specific function, you can either address the module directly (e.g. lltk.nl.plural('hond')) or use the generic interface (e.g. lltk.generic.plural('nl', 'hond')). Both calls pass the request down to an appropriate scraper and can be considered equivalent.
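
To illustrate why the two call styles are equivalent, here is a minimal sketch of a generic function that forwards to a per-language object. It does not use LLTK itself: the Language class, the LANGUAGES registry, generic_plural and the lookup table are all hypothetical stand-ins, and LLTK's real internals may be organised differently.

```python
# Hypothetical sketch of a generic interface that forwards to per-language
# modules. None of these names are LLTK's own; they only mirror the idea.


class Language(object):
    """Stand-in for a language module such as lltk.nl."""

    def __init__(self, plurals):
        self._plurals = plurals

    def plural(self, word):
        # A real module would ask a scraper; this sketch uses a fixed table.
        return self._plurals.get(word)


# Assumed registry mapping language codes to their modules.
LANGUAGES = {
    'nl': Language({'hond': ['honden'], 'boom': ['bomen']}),
}


def generic_plural(language, word):
    """Look up the language module and forward the call to it."""
    if language not in LANGUAGES:
        raise NotImplementedError('no module for language %r' % language)
    return LANGUAGES[language].plural(word)
```

In this sketch, generic_plural('nl', 'hond') and LANGUAGES['nl'].plural('hond') run the same lookup, which is the sense in which the direct and generic calls are interchangeable.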

To get a quick overview of LLTK's syntax, launch IPython, import lltk and browse the package using tab completion. To enable debug mode, set lltk.config['debug'] = True.

Examples

The syntax should be pretty straightforward and intuitive. Nevertheless, you might want to have a look at the following examples:

  • IPA: lltk.generic.ipa('de', 'Blume') returns a list of possible IPA transcriptions or None.
  • Pluralization: lltk.generic.plural('nl', 'boom') returns a list of plural forms or None.

Some scrapers can tell when a given word has no plural form. In that case they return [''].
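
A plural lookup can therefore come back in three shapes: None, [''] or a list of forms. The helper below is a small sketch of how calling code might tell them apart. describe_plural is a hypothetical name, and it assumes that None means no result was found.

```python
def describe_plural(result):
    """Classify the value returned by a plural lookup."""
    if result is None:
        # Assumption: None means the scraper found no result.
        return 'unknown'
    if result == ['']:
        # The scraper knows that the word has no plural form.
        return 'no plural form'
    return 'plural: ' + ', '.join(result)
```

With LLTK installed, the argument would be the return value of a call such as lltk.generic.plural('nl', 'boom').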

  • Definite/Indefinite articles: lltk.generic.articles('de', 'Katze') returns a list of lists of valid articles (singular and plural). Have a look at lltk.generic.reference as well.

When using the generic interface, LLTK raises NotImplementedError if the desired functionality is not available in your target language.
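
If you call the generic interface for several languages, you may want to skip unsupported features instead of letting the exception propagate. The sketch below shows one way to do that. try_feature and stub_feature are hypothetical names; stub_feature only stands in for a generic LLTK function so that the example runs without LLTK.

```python
def try_feature(func, *args):
    """Return (True, result), or (False, None) if the feature is unavailable."""
    try:
        return True, func(*args)
    except NotImplementedError:
        return False, None


def stub_feature(language, word):
    # Hypothetical stand-in for a generic LLTK call; 'xx' plays the role of
    # a language in which the feature is not implemented.
    if language == 'xx':
        raise NotImplementedError(language)
    return [word]
```

With LLTK installed, you would pass a real function instead of the stub, for example try_feature(lltk.generic.articles, 'de', 'Katze').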

  • For conjugation of verbs, try the following:
lltk.generic.conjugate('de', 'bauen', 'present')
lltk.generic.conjugate('de', 'bauen', 'past')
lltk.generic.conjugate('de', 'bauen', 'perfect')
  • To listen to audio samples, register at Forvo and get your API key. Then run the following, replacing '---' with your key:
urls = lltk.generic.audiosamples('it', 'mela', key = '---')
lltk.helpers.download(urls[0], '/tmp/audiosample-it-mela.mp3')
lltk.helpers.play('/tmp/audiosample-it-mela.mp3')
  • To see a word used in context, request sample sentences (currently using Tatoeba). Try:
sentences = lltk.generic.textsamples('es', u'jardín')
for sentence in sentences:
    print(sentence)
  • View images related to a given word (currently using Google Images). Try the following:
photos = lltk.generic.images('fr', u'souris')
clipart = lltk.generic.images('fr', u'souris', itype = 'clipart', isize = 'large')
lineart = lltk.generic.images('fr', u'souris', itype = 'lineart', isize = 'small')

Requirements

The Language Learning Toolkit is written for Python 2.7. There is no Python 3 support yet. Please install the following Python packages: requests, lxml, Pattern, functools32. You can do that by running:

sudo pip install -r requirements/base.txt

We also strongly recommend installing CouchDB for caching. If you are a developer, you should probably install everything from base.txt, extra.txt and development.txt.
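
To check whether the packages listed above are importable in your environment, a short script like the following may help. It is only a sketch: missing_modules and REQUIRED are hypothetical names, and the import names are assumptions based on the package list (Pattern is imported as pattern).

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Assumed import names for the required packages.
REQUIRED = ['requests', 'lxml', 'pattern', 'functools32']
```

Calling missing_modules(REQUIRED) returns the names that still need to be installed. An empty list means all four packages were found.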

License

GNU Lesser General Public License (LGPL), see LICENSE.txt for further details.

Language Learning Toolkit's Projects

batyr

Batyr is a web application that can be used to improve listening skills.

koko

Language Learning for primates.

lltk

The Language Learning Toolkit (LLTK) performs a variety of tasks useful for (human) language learning.
