aspell-python - Python bindings for GNU aspell

Introduction

GNU Aspell is a leading spelling engine: it is fast and has many dictionaries available. See also the Python Cookbook, where Ryan Kelly has collected links to all Python bindings for spell checkers.

aspell-python is a Python wrapper for GNU Aspell. There are two variants:

  • pyaspell.py: a Python library that uses the ctypes module; compatible with Python 3;
  • aspell-python: a C extension, available in two versions, one for Python 2.x and one for Python 3.x.

The version for Python 2 has been tested with Python 2.1, Python 2.3.4 and Python 2.4.1. It probably works with all Python versions not older than 2.0. The version for Python 3 has been tested with Python 3.2.

License

Both libraries are licensed under the BSD license.

Author

Wojciech Muła, [email protected]

Thanks to:

  • Adam Karpierz for convincing me to change the license from GPL to BSD and for compiling early versions of the C extension under Windows
  • Gora Mohanty for reporting a bug.

Installation

To build and install the module for Python 2.x, use the script setup.2.py:

$ python setup.2.py build
$ python setup.2.py install

The module for Python 3.x is built with setup.py:

$ python setup.py build
$ python setup.py install

Details

You need to have the libaspell headers installed. The Debian package is called libaspell-dev; other distributions should have a similar package.

To install aspell-python for all users, you must be root. If you are, type the following command:

$ python setup.py install

It builds the package and installs aspell.so in the directory /usr/lib/{python}/site-packages.

If you don't have root access, you can build the package locally. Type the following command:

$ python setup.py build

It builds the package and places aspell.so in build/lib.{something}. Change to this directory and run Python there; it will be able to import the module. You may also copy aspell.so wherever you want and run Python from that location.
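As an alternative to changing directory, a script can put the local build directory on its import path. The following is a minimal sketch, assuming the build/lib.{something} layout described above; the helper name add_build_dirs is hypothetical and not part of aspell-python.

```python
import glob
import os
import sys


def add_build_dirs(project_root="."):
    """Prepend every build/lib.* directory under project_root to sys.path."""
    pattern = os.path.join(project_root, "build", "lib.*")
    found = sorted(p for p in glob.glob(pattern) if os.path.isdir(p))
    for path in reversed(found):
        if path not in sys.path:
            sys.path.insert(0, path)
    return found


# Prints the directories that were found (an empty list if nothing was built).
print(add_build_dirs())
```

After this call, import aspell should find the locally built module, provided the build step succeeded.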

API

The aspell-python module is visible in Python under the name aspell, so it is imported as follows:

import aspell

The module provides the Speller class, two methods, and three types of exceptions, all described below.

Methods

ConfigKeys() => dictionary

Returns a dictionary whose keys are the names of configuration items and whose values are 3-tuples:

  • key type (string, integer, boolean, list)
  • default value for the key
  • short description; "internal" means that aspell doesn't provide any description of the item and you shouldn't set or change it unless you know what you are doing

Aspell's documentation covers all the keys and their meaning in detail. Below is a list of the most useful and obvious options (it is a filtered output of ConfigKeys).

('data-dir', 'string', '/usr/lib/aspell-0.60', 'location of language data files')
('dict-dir', 'string', '/usr/lib/aspell-0.60', 'location of the main word list')
('encoding', 'string', 'ISO-8859-2', 'encoding to expect data to be in')
('home-dir', 'string', '/home/wojtek', 'location for personal files')
('ignore', 'integer', 1, 'ignore words <= n chars')
('ignore-accents', 'boolean', False, 'ignore accents when checking words -- CURRENTLY IGNORED')
('ignore-case', 'boolean', False, 'ignore case when checking words')
('ignore-repl', 'boolean', False, 'ignore commands to store replacement pairs')
('keyboard', 'string', 'standard', 'keyboard definition to use for typo analysis')
('lang', 'string', 'pl_PL', 'language code')
('master', 'string', 'pl_PL', 'base name of the main dictionary to use')
('personal-path', 'string', '/home/wojtek/.aspell.pl_PL.pws', 'internal')
('repl-path', 'string', '/home/wojtek/.aspell.pl_PL.prepl', 'internal')
('run-together', 'boolean', False, 'consider run-together words legal')
('save-repl', 'boolean', True, 'save replacement pairs on save all')
('warn', 'boolean', True, 'enable warnings')
('backup', 'boolean', True, 'create a backup file by appending ".bak"')
('reverse', 'boolean', False, 'reverse the order of the suggest list')
('suggest', 'boolean', True, 'suggest possible replacements')
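A filter like the one used for the list above can be sketched as follows. This is only an illustration: the helper name settable_keys and the sample dictionary are hypothetical, shaped like the ConfigKeys() result described in this section (name mapped to a 3-tuple of type, default and description).

```python
def settable_keys(config):
    """Return {name: (type, default)} for keys whose description is not 'internal'."""
    result = {}
    for name, (kind, default, description) in config.items():
        if description == "internal":
            continue  # aspell gives no description; leave these keys alone
        result[name] = (kind, default)
    return result


# Hypothetical sample, shaped like the documented ConfigKeys() result.
sample = {
    "lang": ("string", "pl_PL", "language code"),
    "ignore": ("integer", 1, "ignore words <= n chars"),
    "personal-path": ("string", "/home/wojtek/.aspell.pl_PL.pws", "internal"),
}

for name, (kind, default) in sorted(settable_keys(sample).items()):
    print(name, kind, default)
```

With the module installed, the result of aspell.ConfigKeys() would be passed in place of sample.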

Classes

Speller()

Creates an AspellSpeller object, which is an interface to GNU Aspell.

Speller called with no parameters creates a speller using the default configuration. To change or set a parameter, pass a pair of strings: the key and its value. The available keys can be obtained with ConfigKeys.

>>> aspell.Speller("key", "value")

To set more than one key/value pair, pass each pair as a tuple to Speller().

>>> aspell.Speller( ("k1","v1"), ("k2","v2"), ("k3","v3") )
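When the options live in a dictionary, the pairs can be built first and then unpacked into the call. A minimal sketch, assuming the multi-pair calling convention shown above; the helper speller_args is hypothetical, and the aspell call itself is left as a comment so the snippet runs without the module.

```python
def speller_args(options):
    """Turn a mapping of aspell options into a tuple of (key, value) string pairs."""
    return tuple((str(key), str(value)) for key, value in options.items())


args = speller_args({"lang": "en", "encoding": "utf-8"})
print(args)

# With aspell-python installed, the pairs would be unpacked into the call:
#   import aspell
#   speller = aspell.Speller(*args)
```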

Exceptions

The module defines the errors described below. Additionally, TypeError is raised when you pass wrong parameters to a method.

AspellConfigError

This error is reported by the methods Speller and ConfigKeys. The most common cause is passing an unknown key.

>>> s = aspell.Speller('python', '2.3')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
aspell.AspellConfigError: The key "python" is unknown.
>>>

AspellModuleError

This error is reported when the module can't allocate aspell structures.

AspellSpellerError

This error is reported by libaspell.

>>> # we set master dictionary file, the file doesn't exist
>>> s = aspell.Speller('master', '/home/dictionary.rws')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
aspell.AspellSpellerError: The file "/home/dictionary.rws" can not be opened for reading.
>>>
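In a program, both errors can be caught where the speller is created. The sketch below is one possible pattern, not part of aspell-python: the helper open_speller is hypothetical and takes the module as a parameter, so the snippet itself runs even where aspell is not installed.

```python
def open_speller(aspell_module, *args):
    """Create a speller; return (speller, None) or (None, error message)."""
    try:
        return aspell_module.Speller(*args), None
    except aspell_module.AspellConfigError as exc:
        # e.g. an unknown configuration key
        return None, "bad configuration: %s" % exc
    except aspell_module.AspellSpellerError as exc:
        # e.g. a dictionary file that cannot be opened
        return None, "libaspell error: %s" % exc


# Intended use, with aspell-python installed:
#   import aspell
#   speller, error = open_speller(aspell, 'lang', 'en')
#   if speller is None:
#       print(error)
```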

AspellSpeller Object

The AspellSpeller object provides an interface to aspell. It has several methods, described below.

The examples assume that the following code has been executed earlier:

>>> import aspell
>>> s = aspell.Speller('lang', 'en')
>>> s
<AspellSpeller object at 0x40209050>
>>>

ConfigKeys() => dictionary

New in version 1.1, changed in 1.13.

Returns the current configuration of the speller.

The result has the same meaning as that of the module-level ConfigKeys() function.

check(word) => boolean

Checks the spelling of the given word. Returns True if the word is present in the main, personal (see addtoPersonal) or session (see addtoSession) dictionary, otherwise False.

>>> s.check('word') # correct word
True
>>> s.check('wrod') # incorrect
False
>>>

New in version 1.13.

It's possible to use the operators in and not in instead of check().

>>> 'word' in s
True
>>> 'wrod' in s
False
>>>
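The in operator makes it easy to scan a whole text. The following is a rough sketch: the function misspelled and the stand-in class WordSet are hypothetical, and the word pattern only handles ASCII letters and apostrophes.

```python
import re


def misspelled(text, speller):
    """Return the words of text that speller rejects, in order, without repeats."""
    seen = set()
    result = []
    for word in re.findall(r"[A-Za-z']+", text):
        if word not in seen and word not in speller:
            result.append(word)
        seen.add(word)
    return result


class WordSet:
    """Hypothetical stand-in for AspellSpeller: supports only the `in` operator."""

    def __init__(self, words):
        self._words = set(words)

    def __contains__(self, word):
        return word.lower() in self._words


demo = WordSet(["the", "word", "is", "right"])
print(misspelled("The wrod is right, the wrod", demo))
```

With the module installed, an AspellSpeller object (version 1.13 or later) would be passed instead of the WordSet stand-in.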

suggest(word) => list of suggestions

Returns a list of suggested spellings for the given word. The lookup is performed even if the word is correct, i.e. check returned True.

>>> s.suggest('wrod') # we made a mistake, what does aspell suggest?
['word', 'Rod', 'rod', 'Brod', 'prod', 'trod', 'Wood', 'wood', 'wried']
>>>

Warning! suggest() in aspell 0.50 is very, very slow. I recommend caching its results if the program calls the function several times with the same argument.
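One way to cache the results is to wrap suggest() with functools.lru_cache from the standard library. This is a minimal sketch: cached_suggest is a hypothetical helper, and CountingSpeller is a stand-in used only to show that the second call does not reach the speller.

```python
from functools import lru_cache


def cached_suggest(speller, maxsize=1024):
    """Return a function that memoizes speller.suggest(word)."""

    @lru_cache(maxsize=maxsize)
    def suggest(word):
        # Store a tuple so callers cannot mutate the cached value.
        return tuple(speller.suggest(word))

    return suggest


class CountingSpeller:
    """Hypothetical stand-in that counts how often suggest() is really called."""

    def __init__(self):
        self.calls = 0

    def suggest(self, word):
        self.calls += 1
        return ["word", "rod"]


fake = CountingSpeller()
suggest = cached_suggest(fake)
suggest("wrod")
suggest("wrod")
print(fake.calls)  # the second call is served from the cache
```

Note that cached results become stale if the order of suggestions changes later, for example after a replacement pair is added; in that case call suggest.cache_clear().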

addReplecement(incorrect, correct) => None

Adds a replacement pair, which affects the order of words in the result of suggest.

>>> # we choose the 6th word from the previous result
>>> s.addReplecement('wrod', 'trod')

>>> # and the selected word appears at the 1st position
>>> s.suggest('wrod')
['trod', 'word', 'Rod', 'rod', 'Brod', 'prod', 'Wood', 'wood', 'wried']

If the config key save-repl is true, the method saveAllwords saves the replacement pairs to the file ~/.aspell.{lang_code}.prepl.

addtoPersonal(word) => None

Adds word to the personal dictionary, which is stored in the file ~/.aspell.{lang_code}.pws. The added words are available to the AspellSpeller object, but they remain unsaved until the method saveAllwords is called.

# personal dictionary is empty now
$ cat ~/.aspell.en.pws
personal_ws-1.1 en 0

$ python
>>> import aspell
>>> s = aspell.Speller('lang', 'en')
# word 'aspell' doesn't exist
>>> s.check('aspell')
0

# we add it to the personal dictionary
>>> s.addtoPersonal('aspell')

# and now aspell knows it
>>> s.check('aspell')
1

# we save personal dictionary
>>> s.saveAllwords()

# new word appeared in the file
$ cat ~/.aspell.en.pws
personal_ws-1.1 en 1
aspell

# check it once again
$ python
>>> import aspell
>>> s = aspell.Speller('lang', 'en')

# aspell still knows its own name
>>> s.check('aspell')
1

>>> s.check('aaa')
0
>>> s.check('bbb')
0
# add incorrect words, they shouldn't be saved
>>> s.addtoPersonal('aaa')
>>> s.addtoPersonal('bbb')
>>> s.check('aaa')
1
>>> s.check('bbb')
1

# we've exited without saving, so the words 'aaa' and 'bbb' don't exist in the file
$ cat ~/.aspell.en.pws
personal_ws-1.1 en 1
aspell
$
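The personal word list shown in the session above is a plain text file, so it can also be inspected without aspell. The parser below is only a sketch based on the sample output in this section (a header with format name, language code and word count, followed by one word per line); parse_pws is a hypothetical helper and real files may contain more than this.

```python
def parse_pws(text):
    """Parse personal word list text into (language, declared_count, words)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty word list")
    header = lines[0].split()
    # Assumed header layout, taken from the sample: "personal_ws-1.1 en 1"
    if len(header) < 3 or not header[0].startswith("personal_ws"):
        raise ValueError("unexpected header: %r" % lines[0])
    return header[1], int(header[2]), lines[1:]


sample = "personal_ws-1.1 en 1\naspell\n"
print(parse_pws(sample))
```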

addtoSession(word) => None

Adds word to the session dictionary. The session dictionary is volatile: it is not saved to any file. It is destroyed together with the AspellSpeller object or when the method clearSession is called.

saveAllwords() => None

Saves all words from the personal dictionary.

clearSession() => None

Clears the session dictionary.

>>> import aspell
>>> s = aspell.Speller('lang', 'en')
>>> s.check('linux')
0
>>> s.addtoSession('linux')
>>> s.check('linux')
1
>>> s.clearSession()
>>> s.check('linux')
0

getPersonalwordlist() => [list of strings]

Returns the list of words from the personal dictionary.

getSessionwordlist() => [list of strings]

Returns the list of words from the session dictionary.

>>> s.addtoSession('aaa')
>>> s.addtoSession('bbb')
>>> s.getSessionwordlist()
['aaa', 'bbb']
>>> s.clearSession()
>>> s.getSessionwordlist()
[]
>>>
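Because the session dictionary is volatile, it fits a "temporarily accept these words" pattern. The context manager below is a sketch built only on addtoSession and clearSession; session_words is a hypothetical helper. Note that clearSession empties the whole session dictionary, not just the words added here.

```python
from contextlib import contextmanager


@contextmanager
def session_words(speller, words):
    """Temporarily accept words, clearing the session dictionary afterwards."""
    for word in words:
        speller.addtoSession(word)
    try:
        yield speller
    finally:
        # Runs even if the body raises; removes every session word.
        speller.clearSession()


# Intended use, with aspell-python installed:
#   with session_words(s, ['linux', 'aspell']):
#       s.check('linux')   # accepted inside the block
```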

getMainwordlist() => [list of strings]

Returns the list of words from the main dictionary.

Known problems

All versions of aspell I've tested have the same error: calling the method getMainwordlist produces SIGKILL. It is an aspell problem; if you really need a full list of words, use the external program word-list-compress.

method               aspell 0.50.5  aspell 0.60.2       aspell 0.60.3
ConfigKeys           ok             ok                  ok
Speller              ok             ok                  ok
check                ok             ok                  ok
suggest              ok             ok                  ok
addReplecement       ok             ok                  ok
addtoPersonal        ok             ok                  ok
saveAllwords         ok             ok                  ok
addtoSession         ok             ok                  ok
clearSession         ok             AspellSpellerError  ok
getPersonalwordlist  ok             SIGKILL             ok
getSessionwordlist   ok             SIGKILL             ok
getMainwordlist      SIGKILL        SIGKILL             SIGKILL

Character encoding

Aspell uses an 8-bit encoding. The encoding depends on the dictionary setting and is stored in the key encoding. This key can be obtained using the speller's ConfigKeys.

If your application uses a different encoding than aspell, the text must be translated. Here is a sample session (a Polish dictionary is used).

>>> import aspell
>>> s = aspell.Speller('lang', 'pl')
>>>
>>> s.ConfigKeys()['encoding']
[('encoding', 'string', 'iso8859-2')]
>>> enc = s.ConfigKeys()['encoding'][2]
>>> enc # dictionary encoding
'iso8859-2'
>>> word # encoding of word is utf8
# 'gżegżółka' means in some polish dialects 'cuckoo'
'g\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
>>> s.check(word)
0
>>> s.check( unicode(word, 'utf-8').encode(enc) )
1
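The session above uses the Python 2 unicode() built-in. The same re-encoding step can be written with bytes.decode and str.encode, which also works on Python 3. This sketch covers only the translation itself; to_dictionary_encoding is a hypothetical helper, and whether the Python 3 extension expects str or bytes is not covered here.

```python
def to_dictionary_encoding(raw, dictionary_encoding, source_encoding="utf-8"):
    """Re-encode bytes from source_encoding into the dictionary's encoding."""
    return raw.decode(source_encoding).encode(dictionary_encoding)


# The word from the session above, as UTF-8 bytes.
word = "gżegżółka".encode("utf-8")
print(to_dictionary_encoding(word, "iso8859-2"))
```

A word containing characters that the dictionary encoding cannot represent raises UnicodeEncodeError, which is a reasonable signal that the word cannot be in that dictionary.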

Major updates

  • 2011-03-06: version for Python 3.x
