spacy_hunspell's Introduction

spacy_hunspell: Hunspell extension for spaCy

This package uses spaCy 2.0 custom extensions to add Hunspell spellchecking support. It was inspired by this discussion.

Usage

Add the spaCyHunSpell component to the spaCy pipeline:

import spacy
from spacy_hunspell import spaCyHunSpell

nlp = spacy.load('en_core_web_sm')
# 'mac' selects the default Hunspell dictionary location for that platform
hunspell = spaCyHunSpell(nlp, 'mac')
nlp.add_pipe(hunspell)

doc = nlp('I can haz cheezeburger.')
haz = doc[2]  # the third token, 'haz'
haz._.hunspell_spell  # False
haz._.hunspell_suggest  # ['ha', 'haze', 'hazy', 'has', 'hat', 'had', 'hag', 'ham', 'hap', 'hay', 'haw', 'ha z']
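
As a rough sketch of how the two attributes might be used together, the helper below collects every token flagged as misspelled along with its suggestions. The name `misspelled_tokens` is hypothetical and not part of spacy_hunspell; it assumes only the `hunspell_spell` and `hunspell_suggest` attributes shown above, and it skips tokens whose `hunspell_spell` value is `None` (an assumption about how an unset attribute would appear).

```python
def misspelled_tokens(doc):
    """Return (text, suggestions) pairs for tokens flagged as misspelled.

    `doc` is assumed to be a spaCy Doc processed by a pipeline that
    includes spaCyHunSpell. Illustrative helper, not part of the package.
    """
    flagged = []
    for token in doc:
        checked = token._.hunspell_spell
        # Skip tokens that were never checked (None) or that are spelled correctly.
        if checked is not None and not checked:
            flagged.append((token.text, list(token._.hunspell_suggest)))
    return flagged
```

With the pipeline above, `misspelled_tokens(nlp('I can haz cheezeburger.'))` would be expected to contain an entry for `haz`.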

Default Hunspell dictionary locations are built in for two platforms (mac and linux). If your dictionaries are not in the default location, you can specify the two files (.dic and .aff) manually.

hunspell = spaCyHunSpell(nlp, 'mac')
hunspell = spaCyHunSpell(nlp, 'linux')
hunspell = spaCyHunSpell(nlp, ('en_US.dic', 'en_US.aff'))

You can find the English dictionary files here.
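
When passing the two files manually, it can help to confirm that both exist before constructing the component. The following is a minimal sketch under that assumption; `dictionary_pair` is a hypothetical helper, and the tuple order (.dic first, then .aff) simply follows the example above.

```python
import os


def dictionary_pair(directory, lang='en_US'):
    """Build a (dic_path, aff_path) tuple and check that both files exist.

    Hypothetical helper: the order (.dic first, then .aff) mirrors the
    tuple passed to spaCyHunSpell in the usage example.
    """
    dic_path = os.path.join(directory, lang + '.dic')
    aff_path = os.path.join(directory, lang + '.aff')
    missing = [p for p in (dic_path, aff_path) if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError('Missing Hunspell files: ' + ', '.join(missing))
    return dic_path, aff_path
```

The result could then be passed as the second argument, for example `spaCyHunSpell(nlp, dictionary_pair('/path/to/dictionaries'))`, where the directory is a placeholder.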

Installation

You can install the package directly with pip if the prerequisites for building Hunspell are already present. If the installation fails, install Hunspell manually (see below).

pip install spacy_hunspell

Install Hunspell on Linux:

sudo apt-get install libhunspell-dev

Install Hunspell on Mac:

brew install hunspell

Install the Python bindings for Hunspell (pyhunspell):

pip install hunspell

On Mac, you may need a few extra steps before running pip install (replace {VERSION_NUMBER} with the version of Hunspell you installed):

export C_INCLUDE_PATH=/usr/local/include/hunspell
ln -s /usr/local/lib/libhunspell-{VERSION_NUMBER}.a /usr/local/lib/libhunspell.a

On macOS 10.13 High Sierra, you may have to set the compiler and linker flags (issue):

CFLAGS=$(pkg-config --cflags hunspell) LDFLAGS=$(pkg-config --libs hunspell) pip install hunspell

Install the rest of the requirements:

pip install -r requirements.txt

Then download at least one spaCy model:

python -m spacy download en_core_web_sm
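
After these steps, one quick way to see whether the installation succeeded is to check that the relevant modules can be found. The snippet below is a sketch using only the standard library; `missing_modules` is a hypothetical helper, and the module names are taken from the import statements and pip commands above. It does not check that the dictionaries or the spaCy model are present.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    # Module names assumed from the imports and pip commands in this README.
    missing = missing_modules(['hunspell', 'spacy', 'spacy_hunspell'])
    print('Missing:', ', '.join(missing) if missing else 'none')
```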

spacy_hunspell's People

Contributors

tokestermw

spacy_hunspell's Issues

Error on Jupyter when setting the extension

Problem: Because state persists so messily in Jupyter (and in Python in general), normal operation causes errors when the extensions are set.

Solution: When setting the extensions, either check whether Token already has them or force-set them.
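
The first option could be sketched roughly as follows. `ensure_extension` is a hypothetical helper, not code from spacy_hunspell; it relies only on the `has_extension` and `set_extension` class methods that spaCy's `Token` exposes, and it takes the class as a parameter so the sketch itself has no dependencies.

```python
def ensure_extension(cls, name, **kwargs):
    """Register a custom attribute on `cls` only if it is not already set.

    `cls` is expected to expose has_extension/set_extension the way
    spaCy's Token class does. Returns True if the extension was newly
    registered and False if it already existed.
    """
    if cls.has_extension(name):
        return False
    cls.set_extension(name, **kwargs)
    return True
```

With spaCy this might be called as `ensure_extension(Token, 'hunspell_spell', default=None)` after `from spacy.tokens import Token`; the default value there is an assumption. The second option, forcing the registration, corresponds to passing `force=True` to `Token.set_extension` in spaCy versions that support that argument.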

Note: If unclear, I can do a pull request on Saturday.

Hard time to install spacy_hunspell on Windows

After satisfying everything the installer asked for (including all the *.hxx headers, etc.), I finally got the following error, which I have no idea how to solve.

`Microsoft Windows [Version 10.0.14393]
(c) 2016 Microsoft Corporation. All rights reserved.

C:\Users\esther.reis\PycharmProjects\migracao-sas>C:\Users\esther.reis\AppData\Local\Programs\Python\Python36\python.exe -m pip install spacy_hunspell
Collecting spacy_hunspell
Using cached https://files.pythonhosted.org/packages/d9/6a/d977f74eff8354a5fdd6b5c0d8b4f8caa8d676970e18ff961694d978e7f7/spacy_hunspell-0.1.0.tar.gz
Requirement already satisfied: spacy>=2.0.0 in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from spacy_hunspell) (2.0.11)
Collecting hunspell==0.5.0 (from spacy_hunspell)
Using cached https://files.pythonhosted.org/packages/2d/77/8c68d28afca3b07d3b89d3c60af56e1a3e5f381ddd1bc01f31e97233a03c/hunspell-0.5.0.tar.gz
Requirement already satisfied: numpy>=1.7 in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from spacy>=2.0.0->spacy_hunspell) (1.14.3)
Requirement already satisfied: murmurhash<0.29,>=0.28 in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from spacy>=2.0.0->spacy_hunspell) (0.28.0)
Requirement already satisfied: cymem<1.32,>=1.30 in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from spacy>=2.0.0->spacy_hunspell) (1.31.2)
Requirement already satisfied: preshed<2.0.0,>=1.0.0 in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from spacy>=2.0.0->spacy_hunspell) (1.0.0)
Requirement already satisfied: thinc<6.11.0,>=6.10.1 in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from spacy>=2.0.0->spacy_hunspell) (6.10.2)
Requirement already satisfied: plac<1.0.0,>=0.9.6 in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from spacy>=2.0.0->spacy_hunspell) (0.9.6)
Requirement already satisfied: pathlib in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from spacy>=2.0.0->spacy_hunspell) (1.0.1)
Requirement already satisfied: ujson>=1.35 in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from spacy>=2.0.0->spacy_hunspell) (1.35)
Requirement already satisfied: dill<0.3,>=0.2 in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from spacy>=2.0.0->spacy_hunspell) (0.2.7.1)
Requirement already satisfied: regex==2017.4.5 in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from spacy>=2.0.0->spacy_hunspell) (2017.4.5)
Requirement already satisfied: wrapt in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from thinc<6.11.0,>=6.10.1->spacy>=2.0.0->spacy_hunspell) (1.10.11)
Requirement already satisfied: tqdm<5.0.0,>=4.10.0 in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from thinc<6.11.0,>=6.10.1->spacy>=2.0.0->spacy_hunspell) (4.23.4)
Requirement already satisfied: cytoolz<0.9,>=0.8 in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from thinc<6.11.0,>=6.10.1->spacy>=2.0.0->spacy_hunspell) (0.8.2)
Requirement already satisfied: six<2.0.0,>=1.10.0 in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from thinc<6.11.0,>=6.10.1->spacy>=2.0.0->spacy_hunspell) (1.11.0)
Requirement already satisfied: termcolor in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from thinc<6.11.0,>=6.10.1->spacy>=2.0.0->spacy_hunspell) (1.1.0)
Requirement already satisfied: msgpack-python in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from thinc<6.11.0,>=6.10.1->spacy>=2.0.0->spacy_hunspell) (0.5.6)
Requirement already satisfied: msgpack-numpy==0.4.1 in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from thinc<6.11.0,>=6.10.1->spacy>=2.0.0->spacy_hunspell) (0.4.1)
Requirement already satisfied: pyreadline>=1.7.1 in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from dill<0.3,>=0.2->spacy>=2.0.0->spacy_hunspell) (2.1)
Requirement already satisfied: toolz>=0.8.0 in c:\users\esther.reis\appdata\local\programs\python\python36\lib\site-packages (from cytoolz<0.9,>=0.8->thinc<6.11.0,>=6.10.1->spacy>=2.0.0->spacy_hunspell) (0.9.0)
Installing collected packages: hunspell, spacy-hunspell
Running setup.py install for hunspell ... error
Complete output from command C:\Users\esther.reis\AppData\Local\Programs\Python\Python36\python.exe -u -c "import setuptools, tokenize;file='C:\Users\ESTHER1.REI\AppData\Local\Temp\pip-install-wwpostjh\hunspell\setup.py';f=getattr(tokenize, 'open'
, open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record C:\Users\ESTHER
1.REI\AppData\Local\Temp\pip-record-qzjz10kn\install-record.txt --single-version-externally-managed --compile:
C:\Users\esther.reis\AppData\Local\Programs\Python\Python36\lib\distutils\extension.py:131: UserWarning: Unknown Extension options: 'compile_args', 'macros'
warnings.warn(msg)
running install
running build
running build_ext
building 'hunspell' extension
creating build
creating build\temp.win-amd64-3.6
creating build\temp.win-amd64-3.6\Release
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IV:/hunspell-1.3.3/src/hunspell -IC:\Users\esther.reis\AppData\Local\Programs\Python\Python36\include -IC:\Users
\esther.reis\AppData\Local\Programs\Python\Python36\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\includ
e" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.
17134.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\cppwinrt" /EHsc /Tphunspell.cpp /Fobuild\temp.win-amd64-3.6\Release\hunspell.obj
hunspell.cpp
creating C:\Users\ESTHER~1.REI\AppData\Local\Temp\pip-install-wwpostjh\hunspell\build\lib.win-amd64-3.6
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:V:/hunspell-1.3.3/src/win_api/x64/Release/libhunspell /LIBPATH:C:
Users\esther.reis\AppData\Local\Programs\Python\Python36\libs /LIBPATH:C:\Users\esther.reis\AppData\Local\Programs\Python\Python36\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\ATLMFC\lib\x64" "/LIB
PATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17134.0\ucrt\x64" "/LIBPATH:C:
Program Files (x86)\Windows Kits\10\lib\10.0.17134.0\um\x64" libhunspell.lib /EXPORT:PyInit_hunspell build\temp.win-amd64-3.6\Release\hunspell.obj /OUT:build\lib.win-amd64-3.6\hunspell.cp36-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.6\Release\hunspell.cp36-win_a
md64.lib
Creating library build\temp.win-amd64-3.6\Release\hunspell.cp36-win_amd64.lib and object build\temp.win-amd64-3.6\Release\hunspell.cp36-win_amd64.exp
hunspell.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: int __cdecl Hunspell::remove(char const *)" (_imp?remove@Hunspell@@QEAAHPEBD@Z)
hunspell.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: int __cdecl Hunspell::add_with_affix(char const *,char const *)" (_imp?add_with_affix@Hunspell@@QEAAHPEBD0@Z)
hunspell.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: int __cdecl Hunspell::add(char const *)" (_imp?add@Hunspell@@QEAAHPEBD@Z)
hunspell.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: int __cdecl Hunspell::generate(char * * *,char const *,char const *)" (_imp?generate@Hunspell@@QEAAHPEAPEAPEADPEBD1@Z)
hunspell.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: int __cdecl Hunspell::generate(char * * *,char const *,char * *,int)" (_imp?generate@Hunspell@@QEAAHPEAPEAPEADPEBDPEAPEADH@Z)
hunspell.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: int __cdecl Hunspell::stem(char * * *,char const *)" (_imp?stem@Hunspell@@QEAAHPEAPEAPEADPEBD@Z)
hunspell.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: int __cdecl Hunspell::analyze(char * * *,char const *)" (_imp?analyze@Hunspell@@QEAAHPEAPEAPEADPEBD@Z)
hunspell.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: char * __cdecl Hunspell::get_dic_encoding(void)" (_imp?get_dic_encoding@Hunspell@@QEAAPEADXZ)
hunspell.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: void __cdecl Hunspell::free_list(char * * *,int)" (_imp?free_list@Hunspell@@QEAAXPEAPEAPEADH@Z)
hunspell.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: int __cdecl Hunspell::suggest(char * * *,char const *)" (_imp?suggest@Hunspell@@QEAAHPEAPEAPEADPEBD@Z)
hunspell.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: int __cdecl Hunspell::spell(char const *,int *,char * *)" (_imp?spell@Hunspell@@QEAAHPEBDPEAHPEAPEAD@Z)
hunspell.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: int __cdecl Hunspell::add_dic(char const *,char const *)" (_imp?add_dic@Hunspell@@QEAAHPEBD0@Z)
hunspell.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) public: __cdecl Hunspell::Hunspell(char const *,char const *,char const *)" (_imp??0Hunspell@@qeaa@PEBD00@Z)
build\lib.win-amd64-3.6\hunspell.cp36-win_amd64.pyd : fatal error LNK1120: 13 unresolved externals
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\bin\HostX86\x64\link.exe' failed with exit status 1120

----------------------------------------

Command "C:\Users\esther.reis\AppData\Local\Programs\Python\Python36\python.exe -u -c "import setuptools, tokenize;file='C:\Users\ESTHER1.REI\AppData\Local\Temp\pip-install-wwpostjh\hunspell\setup.py';f=getattr(tokenize, 'open', open)(file);code=f
.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record C:\Users\ESTHER
1.REI\AppData\Local\Temp\pip-record-qzjz10kn\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\ESTH
ER~1.REI\AppData\Local\Temp\pip-install-wwpostjh\hunspell
`

Fixed hunspell version?

Your requirements.txt specifies a fixed version for the hunspell package. Is this necessary, or could it be modified to allow hunspell>=0.5.0?
