
nuspell's Introduction

About Nuspell

Nuspell is a fast and safe spelling checker software program. It is designed for languages with rich morphology and complex word compounding. Nuspell is written in modern C++ and it supports Hunspell dictionaries.

Main features of Nuspell spelling checker:

  • Provides software library and command-line tool.
  • Suggests high-quality spelling corrections.
  • Backward compatibility with Hunspell dictionary file format.
  • Up to 3.5 times faster than Hunspell.
  • Full Unicode support backed by ICU.
  • Twofold affix stripping (for agglutinative languages, like Azeri, Basque, Estonian, Finnish, Hungarian, Turkish, etc.).
  • Supports complex compounds (for example, Hungarian, German and Dutch).
  • Supports advanced features, for example: special casing rules (Turkish dotted i or German sharp s), conditional affixes, circumfixes, fogemorphemes, forbidden words, pseudoroots and homonyms.
  • Free and open source software. Licensed under GNU LGPL v3 or later.

Building Nuspell

Dependencies

Build-only dependencies:

  • C++ 17 compiler with support for std::filesystem, e.g. GCC >= v9
  • CMake >= v3.12
  • Catch2 >= v3.1.1 (It is only needed when building the tests. If it is not available as a system package, then CMake will download it using FetchContent.)
  • Getopt (It is needed only on Windows + MSVC and only when the CLI tool or the tests are built. It is available in vcpkg. Other platforms provide it out of the box.)

Run-time (and build-time) dependencies:

  • ICU4C

Recommended tools for developers: qtcreator, ninja, clang-format, gdb, vim, doxygen.

Building on GNU/Linux and Unixes

We first need to download the dependencies. Some may already be preinstalled.

For Ubuntu and Debian:

sudo apt install git cmake libicu-dev

Then run the following commands inside the Nuspell directory:

mkdir build
cd build
cmake ..
make
sudo make install

For a faster build, run make -j or use Ninja instead of Make.

If you are making a Linux distribution package (deb, rpm), you need some additional configuration on the CMake invocation. For example:

cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr

Building on OSX and macOS

  1. Install Apple's Command-line tools.
  2. Install Homebrew package manager.
  3. Install dependencies with the next commands.
brew install cmake icu4c
export ICU_ROOT=$(brew --prefix icu4c)

Then run the standard cmake and make. See above. The ICU_ROOT variable is needed because icu4c is a keg-only package in Homebrew and CMake cannot find it by default. Alternatively, you can use -DICU_ROOT=... on the cmake command line.

If you want to build with GCC instead of Clang, you need to pull GCC with Homebrew and rebuild all the dependencies with it. See the Homebrew manuals.

Building on Windows

Compiling with Visual C++

  1. Install Visual Studio 2017 or newer. Alternatively, you can use Visual Studio Build Tools.
  2. Install Git for Windows and CMake.
  3. Install Vcpkg in some folder, e.g. in c:\vcpkg.
  4. Run the commands below. Vcpkg will work in manifest mode and it will automatically install the dependencies.
mkdir build
cd build
cmake .. -DCMAKE_TOOLCHAIN_FILE=c:\vcpkg\scripts\buildsystems\vcpkg.cmake -A x64
cmake --build .

Compiling with Mingw64 and MSYS2

Download MSYS2, update everything and install the following packages:

pacman -S base-devel mingw-w64-x86_64-toolchain mingw-w64-x86_64-icu \
          mingw-w64-x86_64-cmake

Then from inside the Nuspell folder run:

mkdir build
cd build
cmake .. -G "Unix Makefiles"
make
make install

Building in Cygwin environment

Download the above mentioned dependencies with Cygwin package manager. Then compile the same way as on Linux. Cygwin builds depend on Cygwin1.dll.

Building on FreeBSD

Install the following required packages:

pkg install cmake icu catch

Then run the standard cmake and make as on Linux. See above.

Using the software

Using the command-line tool

The main executable is located in src/nuspell.

After compiling and installing you can run the Nuspell spell checker with a Nuspell, Hunspell or Myspell dictionary:

nuspell -d en_US text.txt

For more details run nuspell --help.

Using the Library

Sample program:

#include <filesystem>
#include <iostream>
#include <string>
#include <vector>
#include <nuspell/dictionary.hxx>
#include <nuspell/finder.hxx>

using namespace std;

int main()
{
	auto dirs = vector<filesystem::path>();
	nuspell::append_default_dir_paths(dirs);
	auto dict_path = nuspell::search_dirs_for_one_dict(dirs, "en_US");
	if (empty(dict_path))
		return 1; // Return error because we can not find the requested
		          // dictionary.

	auto dict = nuspell::Dictionary();
	try {
		dict.load_aff_dic(dict_path);
	}
	catch (const nuspell::Dictionary_Loading_Error& e) {
		cerr << e.what() << '\n';
		return 1;
	}
	auto word = string();
	auto sugs = vector<string>();
	while (cin >> word) {
		if (dict.spell(word)) {
			cout << "Word \"" << word << "\" is ok.\n";
			continue;
		}

		cout << "Word \"" << word << "\" is incorrect.\n";
		dict.suggest(word, sugs);
		if (sugs.empty())
			continue;
		cout << "  Suggestions are: ";
		for (auto& sug : sugs)
			cout << sug << ' ';
		cout << '\n';
	}
}

On the command line you can link like this:

g++ example.cxx -std=c++17 -lnuspell -licuuc -licudata
# or better, use pkg-config
g++ example.cxx -std=c++17 $(pkg-config --cflags --libs nuspell)

Within CMake you can use find_package() to link. For example:

find_package(Nuspell)
add_executable(myprogram main.cpp)
target_link_libraries(myprogram Nuspell::nuspell)

Dictionaries

Myspell, Hunspell and Nuspell dictionaries:

https://github.com/nuspell/nuspell/wiki/Dictionaries-and-Contacts

Advanced topics

Debugging Nuspell

First, always install the debugger:

sudo apt install gdb

For debugging we need to create a debug build and then we need to start gdb.

mkdir debug
cd debug
cmake .. -DCMAKE_BUILD_TYPE=Debug
make -j
gdb src/nuspell/nuspell

We recommend debugging to be done with an IDE.

Testing

To run the tests, run the following command after building:

ctest

See also

Full documentation in the wiki.

API Documentation for developers can be generated from the source files by running:

doxygen

The result can be viewed by opening doxygen/html/index.html in a web browser.

nuspell's People

Contributors

aarondandy, be-we, changwoo, dimztimz, doughdemon, edwardbetts, fitojb, fxkr, hartzell, ibragimov, jeroen, jwilk, kaboomium, laszlonemeth, matt-lough, mhosken, orgads, pandermusubi, panderopentaal, phajdan, phcoder, piotrdrag, plicease, plusky, rffontenelle, rul, sanshinkaiaikido, stbergmann, tanzislam


nuspell's Issues

Output different from Hunspell

The issues can be of three types (put an X between the brackets):

  • Bug reports
  • Change request or feature request
  • Others, questions

Reporting bugs

When reporting a bug you must tell us the state of your system and the
steps to reproduce the bug. For the state please specify the following:

key value
OS, distro, version =
Hunspell version =
Dictionary,package name,version =
Command line tool or GUI app =

Steps to reproduce

Watch the difference in outputted characters for Nuspell and hunspell when affixes were used. Hunspell shows +stem, Nuspell does not.

ruud@ruud-laptop:~/Bureaublad/nuspell-master$ nuspell
INFO: Input  locale name=nl_NL.UTF-8, lang=nl, country=NL, enc=utf-8
INFO: Output locale name=nl_NL.UTF-8, lang=nl, country=NL, enc=utf-8
INFO: Pointed dictionary /usr/share/hunspell/nl_NL.{dic,aff}
gebogen
*
fietsjes
*
boog
*
boogje
*
^C
ruud@ruud-laptop:~/Bureaublad/nuspell-master$ hunspell
Hunspell 2.0.0
fietsje
+ fiets

fietsjes
+ fiets

boog
*

boogjes
+ boog

Change/feature request

This might be intentional, but could also be a mistake.

Nuspell now included as "app-text/nuspell" in Gentoo

Hi!

I ran into Sander's Nuspell talk at FOSDEM 2020 and its explicit Help wanted slide around 18m21s with a list that included Gentoo. So I felt like giving packaging Nuspell for Gentoo a shot. I think it's fair to say that it was a painless experience so far — thanks to you for that!

I would make wiki page "Nuspell packaged binaries" point to the upcoming package page https://packages.gentoo.org/packages/app-text/nuspell —might take a few minutes to go out of 404— for Gentoo but it seems I lack write permissions in the Nuspell wiki. Can you do that update for me?

Thanks and best, Sebastian

Compile error on Windows using Mingw64 and MSYS2

  • Bug reports
  • Change request or feature request
  • Others, questions

Reporting bugs

key value
OS, distro, version = Win 10
nuspell version = latest master branch

Current latest commit hash:

$ git rev-parse HEAD
7c075a2d77d3a1783fd52a2f0fe2a6aef4e231d0

Here is the build error:

will@DESKTOP-UCBSDOR MINGW64 ~/Desktop/nuspell/nuspell
$ make
make  all-recursive
make[1]: Entering directory '/c/Users/will/Desktop/nuspell/nuspell'
Making all in src
make[2]: Entering directory '/c/Users/will/Desktop/nuspell/nuspell/src'
Making all in hunspell
make[3]: Entering directory '/c/Users/will/Desktop/nuspell/nuspell/src/hunspell'
make[3]: Nothing to be done for 'all'.
make[3]: Leaving directory '/c/Users/will/Desktop/nuspell/nuspell/src/hunspell'
Making all in parsers
make[3]: Entering directory '/c/Users/will/Desktop/nuspell/nuspell/src/parsers'
make[3]: Nothing to be done for 'all'.
make[3]: Leaving directory '/c/Users/will/Desktop/nuspell/nuspell/src/parsers'
Making all in tools
make[3]: Entering directory '/c/Users/will/Desktop/nuspell/nuspell/src/tools'
g++ -DHAVE_CONFIG_H -I. -I../..  -I../../src/hunspell -I../../src/hunspell -I../../src/parsers  -I/mingw64/include -std=c++14  -g -O2 -MT hunspell.o -MD -MP -MF .deps/hunspell.Tpo -c -o hunspell.o hunspell.cxx
hunspell.cxx: In function 'char* mymkdtemp(char*)':
hunspell.cxx:649:10: error: 'mkdtemp' was not declared in this scope
   return mkdtemp(templ);
          ^~~~~~~
hunspell.cxx:649:10: note: suggested alternative: 'mktemp'
   return mkdtemp(templ);
          ^~~~~~~
          mktemp
make[3]: *** [Makefile:476: hunspell.o] Error 1
make[3]: Leaving directory '/c/Users/will/Desktop/nuspell/nuspell/src/tools'
make[2]: *** [Makefile:386: all-recursive] Error 1
make[2]: Leaving directory '/c/Users/will/Desktop/nuspell/nuspell/src'
make[1]: *** [Makefile:491: all-recursive] Error 1
make[1]: Leaving directory '/c/Users/will/Desktop/nuspell/nuspell'
make: *** [Makefile:400: all] Error 2

will@DESKTOP-UCBSDOR MINGW64 ~/Desktop/nuspell/nuspell
$                                                                       ^

Steps to reproduce

  1. Install https://www.msys2.org/ and follow installation steps.

  2. Clone latest from master git clone https://github.com/hunspell/nuspell.git
    into a folder that doesn't need Admin rights such as your Desktop.

  3. Run Msys64 and install pacman -S base-devel mingw-w64-x86_64-toolchain mingw-w64-x86_64-libtool mingw-w64-x86_64-boost

  4. build:

autoreconf -vfi
./configure
make

Make should fail with a build error.

Checklist 1 basic checking and affixing

1. Documenting old code

Write the comments in the libreoffice docs. Use examples. If something is not clear,
run a debug session with test data. Even better, run a "debug session" in
your head and write state on paper.

  • HunspellImpl::spell
    • HunspellImpl::checksharps
    • HunspellImpl::checkword
  • Affixmgr::check_affix
    • AffixMgr::prefix_check
    • AffixMgr::prefix_check_twosfx
    • AffixMgr::suffix_check
    • AffixMgr::suffix_check_twosfx
    • PfxEntry::checkword
    • PfxEntry::check_twosfx
    • SfxEntry::checkword
    • SfxEntry::check_twosfx

2. Implementation

  • spell
    • spell_break
    • spell_casing
    • spell_uppercase
    • spell_titlecase
    • checkword
    • homonym support (use multimap instead of map)
    • hidden uppercased homonym for mixed case words
  • affixing
    • strip only suffix
    • strip only prefix
    • strip prefix then suffix
    • strip suffix then prefix
    • strip two suffixes
    • strip two prefixes (COMPLEXPREFIXES)
    • strip prefix, suffix, suffix
    • strip suffix, prefix, suffix
    • strip suffix, suffix, prefix
    • strip suffix, prefix, prefix (COMPLEXPREFIXES)
    • strip prefix, suffix, prefix (COMPLEXPREFIXES)
    • strip prefix, prefix, suffix (COMPLEXPREFIXES)
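
The first few affixing cases in the checklist can be illustrated with a small sketch. This is not Hunspell's or Nuspell's implementation: the names Affix, undo_suffix, undo_prefix and spell are hypothetical, and real affix rules also carry flags and conditions that are left out here. The sketch only covers "strip only suffix", "strip only prefix" and "strip prefix then suffix" against a plain word list.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified affix rule (no flags, no conditions).
struct Affix {
	std::string stripping; // removed from the stem when the affix is applied
	std::string appending; // added to the stem when the affix is applied
};

// Undo a suffix: drop `appending` from the end and restore `stripping`.
// Returns false if the word does not end with the suffix.
bool undo_suffix(const std::string& word, const Affix& sfx, std::string& stem)
{
	auto n = sfx.appending.size();
	if (word.size() < n ||
	    word.compare(word.size() - n, n, sfx.appending) != 0)
		return false;
	stem = word.substr(0, word.size() - n) + sfx.stripping;
	return true;
}

// Undo a prefix: drop `appending` from the start and restore `stripping`.
bool undo_prefix(const std::string& word, const Affix& pfx, std::string& stem)
{
	auto n = pfx.appending.size();
	if (word.size() < n || word.compare(0, n, pfx.appending) != 0)
		return false;
	stem = pfx.stripping + word.substr(n);
	return true;
}

// Accepts a word if it is listed as is, or becomes a listed stem after
// stripping one suffix, one prefix, or a prefix and then a suffix.
bool spell(const std::string& word,
           const std::unordered_set<std::string>& words,
           const std::vector<Affix>& prefixes,
           const std::vector<Affix>& suffixes)
{
	if (words.count(word))
		return true;
	std::string stem, inner;
	for (auto& sfx : suffixes)
		if (undo_suffix(word, sfx, stem) && words.count(stem))
			return true;
	for (auto& pfx : prefixes) {
		if (!undo_prefix(word, pfx, stem))
			continue;
		if (words.count(stem))
			return true;
		for (auto& sfx : suffixes)
			if (undo_suffix(stem, sfx, inner) && words.count(inner))
				return true;
	}
	return false;
}
```

With a toy word list {"happy", "pony"}, a prefix rule that appends "un", and suffix rules that strip "y" and append "ies" or "iness", this accepts "ponies", "unhappy" and "unhappiness". The remaining checklist items (two suffixes, COMPLEXPREFIXES orderings) would need further nesting of the same two helpers.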

Tries to build bundled Catch2 even if system-wide copy is found

key value
Operating system, distribution, version = FreeBSD 11.3
Nuspell version = cacd0c8

Steps to reproduce

$ pkg install catch cmake icu boost-libs
$ fetch https://github.com/nuspell/nuspell/archive/master.tar.gz
$ tar xf master.tar.gz
$ cd nuspell-master
$ cmake -DBUILD_TESTING:BOOL=true .
-- The C compiler identification is Clang 8.0.0
-- The CXX compiler identification is Clang 8.0.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found the following ICU libraries:
--   uc (required)
--   data (required)
-- Found ICU: /usr/local/include (found version "64.2")
-- Boost version: 1.70.0
-- Found the following Boost libraries:
--   locale
CMake Warning at docs/CMakeLists.txt:4 (message):
  Ronn not found, generating man-pages will be skipped


CMake Error at CMakeLists.txt:40 (add_subdirectory):
  The source directory

    /tmp/nuspell-master/external/Catch2

  does not contain a CMakeLists.txt file.


-- Configuring incomplete, errors occurred!
See also "/tmp/nuspell-master/CMakeFiles/CMakeOutput.log".

Bugged behavior (output)

CMake Error at CMakeLists.txt:40 (add_subdirectory):
  The source directory

    /tmp/nuspell-master/external/Catch2

  does not contain a CMakeLists.txt file.

Expected behavior (output)

-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/nuspell-master

Change/feature request

Hide add_subdirectory behind NOT Catch2_FOUND or similar.

Boost Locale on OS X does not use the ICU backend

The issues can be of three types (put an X between the brackets):

  • Bug reports

Steps to reproduce

See tests for to_upper and to_title https://travis-ci.org/hunspell/nuspell/builds/332942906

Possible solutions

  1. Configure Boost locale to link with icu4c on OS X.
  2. Disable OS X from CI and handle everything at the end. OS X is not a good platform for development, just disturbs. Solve OS X issues at the end.
  3. Don't use boost::locale::to_upper, to_title. Handle them manually like in hunspell v1. Not that big deal.

Logo proposal for NUSPELL

Hello, I'm a graphic designer and I like to collaborate with open source projects. I would like to design a logo for your project. I will be happy to collaborate with you :)

compile error "void value not ignored" (Ubuntu 16.04)

I've checked out the latest code, then called ./configure --with-boost=/prg/boost_1_67_0, then make. It fails with this message:

/usr/include/c++/5/bits/stl_algo.h:566:32:   required from ‘_IIter std::find_if_not(_IIter, _IIter, _Predicate) [with _IIter = const wchar_t*; _Predicate = nuspell::my_ctype<wchar_t>::do_scan_not(std::ctype_base::mask, const char_type*, const char_type*) const::<lambda(auto:3)>]’
locale_utils.cxx:713:57:   required from here
/usr/include/c++/5/bits/predefined_ops.h:296:31: error: void value not ignored as it ought to be
  { return !bool(_M_pred(*__it)); }
                               ^

Any idea?

Use namespaced exported target in CMake

Change/feature request

Two things need to be done, and those are:

  1. change the command install(EXPORT ...) to use its feature NAMESPACE.
  2. Add additional target that has namespace and is alias to the regular target.

The namespace will be Nuspell:: (first letter is uppercase, ends with two colons). The uppercase is consistent with the built-in modules for find_package. They all have the first letter in uppercase. Additionally, it is consistent with our configuration file for find_package, which starts with uppercase, too.

The reasons for using namespace at all are consistency with other find_package modules and better error reporting for the downstream users if they try to use the library but the library is not installed.

dictionary issues in hunspell

I've recently run into issues with the German dictionary of Hunspell, as integrated in LibreOffice.
I am probably in the wrong place here, asking for help with things like this:
[screenshot: photo_2020-01-20_01-04-35(1)]
[screenshot: photo_2020-01-20_01-04-55(1)]
I also saw a suggestion to replace https in URLs with http.

Which direction should I go to contribute to the German dictionary?

Questions

The issues can be of three types (put an X between the brackets):

  • Bug reports
  • Change request or feature request
  • Others, questions

Since I use Hunspell with over 40 languages, I could give it a try to check all collected words for all languages and use -G and -L to list correct and incorrect words with Hunspell as well as Nuspell, and list differences. If this would be of any help, that is.

Apart from the 'to do list', I am also curious about the functionality already implemented.

Does Nuspell support infixes?

Other issues

I am trying to make a spell checker for Kurdish. The problem is that Kurdish relies a lot on infixes (mostly because of clitic pronouns). I'd appreciate it if you could provide any guidance on the best approach for a language like that.

If Nuspell supports Infixes

That's great news. I'd rather create a Nuspell dictionary than a custom library of my own.

If Nuspell doesn't support Infixes and there aren't any reasonable ways to workaround that limitation

I have noticed that Hunspell uses very little memory and is quite fast. So if I want to create a custom library for Kurdish, I want to know which algorithms Hunspell uses.

Here is a general idea of what I am trying to accomplish:

Consider this word: Bexshin (Forgiving), It can come in these forms:

  • Bimbexshe => [You] Forgive me
  • Bitbexshm => [I] Forgive you
  • Biyanbexshe => [You] Forgive them
  • And many more!

Instead of a list of words, we can have a list of patterns like so: Bi{pronoun}bexsh{pronoun}

More Examples:

Eat (Dexo{pronoun})

  • I eat => Dexom
  • We eat => Dexoyn
  • They eat => Dexon

Can be represented as:

Work (Kar{pronoun}dekird)

  • I worked => Karmdekird
  • He Worked => Karîdekird
  • They worked => Karyandekird

So I need an algorithm to very quickly tell me what are the closest matching patterns, and then I can expand only those patterns and based on the Levenshtein distance to the input word give back a list of suggestions.

I know that I can read the source code, and I will. But it'd make my job much easier if you gave me a few leads on which algorithms can be useful based on your experience.
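
The pattern idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not something Hunspell or Nuspell does: the function names expand and suggest are hypothetical, the pronoun fragments used with it would come from the examples in this post rather than from a real Kurdish clitic inventory, and the distance is computed byte-wise. It also expands every pattern by brute force, so it does not answer the question of how to find the closest patterns quickly.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Classic dynamic-programming Levenshtein distance (byte-wise).
std::size_t levenshtein(const std::string& a, const std::string& b)
{
	std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
	for (std::size_t j = 0; j <= b.size(); ++j)
		prev[j] = j;
	for (std::size_t i = 1; i <= a.size(); ++i) {
		cur[0] = i;
		for (std::size_t j = 1; j <= b.size(); ++j) {
			std::size_t subst = prev[j - 1] + (a[i - 1] != b[j - 1] ? 1 : 0);
			cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
		}
		std::swap(prev, cur);
	}
	return prev[b.size()];
}

// Replaces every "{pronoun}" placeholder in `pattern` with every entry of
// `pronouns`, producing all combinations.
std::vector<std::string> expand(const std::string& pattern,
                                const std::vector<std::string>& pronouns)
{
	static const std::string slot = "{pronoun}";
	auto pos = pattern.find(slot);
	if (pos == std::string::npos)
		return {pattern};
	std::vector<std::string> out;
	for (auto& p : pronouns) {
		auto next = pattern.substr(0, pos) + p +
		            pattern.substr(pos + slot.size());
		for (auto& w : expand(next, pronouns))
			out.push_back(w);
	}
	return out;
}

// Returns all expansions within `max_dist` edits of `word`, closest first.
std::vector<std::string> suggest(const std::string& word,
                                 const std::vector<std::string>& patterns,
                                 const std::vector<std::string>& pronouns,
                                 std::size_t max_dist)
{
	std::vector<std::pair<std::size_t, std::string>> scored;
	for (auto& pat : patterns)
		for (auto& cand : expand(pat, pronouns)) {
			auto d = levenshtein(word, cand);
			if (d <= max_dist)
				scored.emplace_back(d, cand);
		}
	std::sort(scored.begin(), scored.end());
	std::vector<std::string> out;
	for (auto& s : scored)
		out.push_back(s.second);
	return out;
}
```

For example, with the patterns Bi{pronoun}bexsh{pronoun} and Dexo{pronoun} and the fragments m, t, yan and e, calling suggest with the input Bimbexshe and a maximum distance of 0 yields just that word. A pre-filter that avoids expanding every pattern would be the natural next step, but is left open here.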

[master] Two half-working scripts clang-format.sh?

Hi!

I came across two files called clang-format.sh in here:

# cat clang-format.sh 
cd $(dirname "$0")
clang-format -style=file -i src/nuspell/*.[ch]xx tests/*.[ch]xx

# cat src/nuspell/clang-format.sh 
clang-format -style=file -i *.[ch]xx

To my surprise, they:

  • do not come with a portable (or any) shebang line like #! /usr/bin/env bash
  • continue despite failure of cd $(dirname "$0"), which happens due to lack of quoting when the directory name contains one or more spaces.

Does that mean these scripts are not used in practice?

Install instruction possibly incomplete: configure: error: Package requirements (icu-uc) were not met:

The issues can be of three types (put an X between the brackets):

  • Bug reports
  • Change request or feature request
  • Others, questions

Reporting bugs

When reporting a bug you must tell us the state of your system and the
steps to reproduce the bug. For the state please specify the following:

Steps to reproduce

Bugged behavior (output)

ruud@ruud-laptop:~/Bureaublad/nuspell-master$ autoreconf -vfi
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force -I m4
autoreconf: configure.ac: tracing
autoreconf: running: libtoolize --copy --force
libtoolize: putting auxiliary files in '.'.
libtoolize: copying file './ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: copying file 'm4/libtool.m4'
libtoolize: copying file 'm4/ltoptions.m4'
libtoolize: copying file 'm4/ltsugar.m4'
libtoolize: copying file 'm4/ltversion.m4'
libtoolize: copying file 'm4/lt~obsolete.m4'
autoreconf: running: /usr/bin/autoconf --force
autoreconf: running: /usr/bin/autoheader --force
autoreconf: running: automake --add-missing --copy --force-missing
configure.ac:19: installing './compile'
configure.ac:20: installing './config.guess'
configure.ac:20: installing './config.sub'
configure.ac:10: installing './install-sh'
configure.ac:10: installing './missing'
configure.ac:30: installing './tap-driver.sh'
Makefile.am: installing './INSTALL'
src/hunspell/Makefile.am: installing './depcomp'
parallel-tests: installing './test-driver'
autoreconf: Leaving directory `.'
ruud@ruud-laptop:~/Bureaublad/nuspell-master$ ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for a sed that does not truncate output... /bin/sed
checking whether to build with code coverage support... no
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking for style of include used by make... GNU
checking dependency style of g++... gcc3
checking for gcc... gcc
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking dependency style of gcc... gcc3
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking how to print strings... printf
checking for a sed that does not truncate output... (cached) /bin/sed
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for fgrep... /bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking how to convert x86_64-pc-linux-gnu file names to x86_64-pc-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-pc-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for ar... ar
checking for archiver @file support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for a working dd... /bin/dd
checking how to truncate binary pipes... /bin/dd bs=4096 count=1
checking for mt... mt
checking if mt is a manifest tool... no
checking how to run the C preprocessor... gcc -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking how to run the C++ preprocessor... g++ -E
checking for ld used by g++... /usr/bin/ld -m elf_x86_64
checking if the linker (/usr/bin/ld -m elf_x86_64) is GNU ld... yes
checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking for g++ option to produce PIC... -fPIC -DPIC
checking if g++ PIC flag -fPIC -DPIC works... yes
checking if g++ static flag -static works... yes
checking if g++ supports -c -o file.o... yes
checking if g++ supports -c -o file.o... (cached) yes
checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking dynamic linker characteristics... (cached) GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking for gawk... (cached) gawk
checking for ronn... no
checking for ld used by gcc... /usr/bin/ld -m elf_x86_64
checking if the linker (/usr/bin/ld -m elf_x86_64) is GNU ld... yes
checking for shared library run path origin... done
checking for iconv... yes
checking for working iconv... yes
checking for iconv declaration...
extern size_t iconv (iconv_t cd, char * *inbuf, size_t *inbytesleft, char * *outbuf, size_t *outbytesleft);
checking for Boost headers version >= 1.62.0... yes
checking for Boost's header version... 1_62
checking for the toolset name used by Boost for g++... configure: WARNING: could not figure out which toolset name to use for g++

checking boost/system/error_code.hpp usability... yes
checking boost/system/error_code.hpp presence... yes
checking for boost/system/error_code.hpp... yes
checking for the Boost system library... yes
checking boost/locale.hpp usability... yes
checking boost/locale.hpp presence... yes
checking for boost/locale.hpp... yes
checking for the Boost locale library... (cached) yes
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for ICU... no
configure: error: Package requirements (icu-uc) were not met:

No package 'icu-uc' found

Consider adjusting the PKG_CONFIG_PATH environment variable if you
installed software in a non-standard prefix.

Alternatively, you may set the environment variables ICU_CFLAGS
and ICU_LIBS to avoid the need to call pkg-config.
See the pkg-config man page for more details.
ruud@ruud-laptop:~/Bureaublad/nuspell-master$

Expected behavior (output)

ending this with success.

Change/feature request

If you want the current behavior changed, please explain how you
want it changed. If it is completely new, please explain how you
want it to work, in as much detail as possible.

Other issues

If you have just questions or some other type of issue, you have the
freedom to ask it in any way. Try to be as verbose as possible.

Autoconf dependencies from gettext

Even though most of gettext has been removed, still the following error occurs:

autoreconf -vfi
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force -I m4
...
...
...
m4/iconv.m4:20: AM_ICONV_LINK is expanded from...
m4/iconv.m4:233: AM_ICONV is expanded from...
configure.ac:26: the top level
configure:16487: error: possibly undefined macro: AC_LIB_PREPARE_PREFIX
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
configure:16488: error: possibly undefined macro: AC_LIB_RPATH
configure:16493: error: possibly undefined macro: AC_LIB_LINKFLAGS_BODY
configure:16501: error: possibly undefined macro: AC_LIB_APPENDTOVAR
autoreconf: /usr/bin/autoconf failed with exit status: 1

This can be fixed with sudo apt-get install gettext, but that means gettext is still needed. So a better solution for this needs to be found.

Checklist 4 - Suggestions

  • Understand v1 code and give some docs on it
  • Implement the standard algorithm for suggestions (TRY).
  • Add the keyboard closeness factor to the algorithm (KEY).
  • Add support for manually written replacements of character sequences (REP).
  • Add the character closeness factor (MAP).
  • Add the phonetic closeness factor (PHONET).
  • Write suggestion tests.
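
Two of the checklist items, REP and TRY, can be illustrated with a minimal sketch. It is not the Hunspell v1 algorithm and does not reproduce its ordering or weighting of suggestions: the function names add_if_known and suggest are hypothetical, the REP table is modelled as plain pairs of strings, and TRY is used for single-character substitution and insertion only. KEY, MAP and PHONET are not covered.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

using Word_List = std::unordered_set<std::string>;
using Rep_Table = std::vector<std::pair<std::string, std::string>>;

// Appends `cand` to `out` if it is a known word and not suggested yet.
void add_if_known(const std::string& cand, const Word_List& words,
                  std::vector<std::string>& out)
{
	if (!words.count(cand))
		return;
	for (auto& s : out)
		if (s == cand)
			return;
	out.push_back(cand);
}

std::vector<std::string> suggest(const std::string& word,
                                 const Word_List& words, const Rep_Table& rep,
                                 const std::string& try_chars)
{
	std::vector<std::string> out;
	// REP: replace each occurrence of a listed character sequence.
	for (auto& r : rep) {
		if (r.first.empty())
			continue;
		for (auto pos = word.find(r.first); pos != std::string::npos;
		     pos = word.find(r.first, pos + 1)) {
			auto cand = word;
			cand.replace(pos, r.first.size(), r.second);
			add_if_known(cand, words, out);
		}
	}
	// TRY: substitute every position with every TRY character.
	for (std::size_t i = 0; i < word.size(); ++i)
		for (char c : try_chars) {
			auto cand = word;
			cand[i] = c;
			add_if_known(cand, words, out);
		}
	// TRY: insert every TRY character at every position.
	for (std::size_t i = 0; i <= word.size(); ++i)
		for (char c : try_chars) {
			auto cand = word;
			cand.insert(i, 1, c);
			add_if_known(cand, words, out);
		}
	return out;
}
```

With a word list containing "phone" and "door", a REP entry mapping "f" to "ph" and the TRY characters "aeiou", the inputs "fone", "daor" and "dor" each produce one suggestion. Suggestion tests from the last checklist item could be written as input and expected-output pairs against a function of this shape.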

How to build nuspell as a shared library?

I tried to compile nuspell on Linux as a shared library to replace Hunspell system-wide, but I only managed to obtain a .a file.

key value
Operating system, distribution, version = Arch Linux
Nuspell version 2.1.0

No suggestions on command line

The issues can be of three types (put an X between the brackets):

  • [ x] Bug reports
  • Change request or feature request
  • Others, questions

Reporting bugs

When reporting a bug you must tell us the state of your system and the
steps to reproduce the bug. For the state please specify the following:

key value
OS, distro, version = Kubuntu 16
Hunspell version = nuspell
Dictionary,package name,version =
Command line tool or GUI app = cmdline

Steps to reproduce

start Nuspell, settings to Dutch.

Bugged behavior (output)

ruud@ruud-laptop:~/Bureaublad/nuspell-master$ nuspell
INFO: Input locale name=nl_NL.UTF-8, lang=nl, country=NL, enc=utf-8
INFO: Output locale name=nl_NL.UTF-8, lang=nl, country=NL, enc=utf-8
INFO: Pointed dictionary /usr/share/hunspell/nl_NL.{dic,aff}
duer
&
deur
*

Expected behaviour (output)

Expected a primitive user interface that shows suggestions.


Checklist 3 - Testing

3. Testing

  • Unit testing
    • Reuse old unit tests via CLI interface
    • Write new C++ unit tests for new code only, don't test old code, it already has tests.
    • Report code coverage unit tests
  • Regression testing
    • select, download, extract and report Ubuntu Hunspell language support packages
    • select, download, extract and report Ubuntu word list packages
    • gather words, split on separators, filter characters for each language
    • check spelling and report correctness and speed for Hunspell
    • optimize Hunspell for regression testing with extra parameter
    • check spelling and report correctness and speed for Nuspell commits
    • report regression for correctness with Nuspell also with regard to Hunspell reference
    • report regression for speed with Nuspell also with regard to Hunspell reference

Homebrew build options have been removed

(First of all congrats on the project – something like this was long overdue! Excited to see what you come up with for the next version)

Your instructions in the README for building on macOS are unfortunately no longer working. Due to Homebrew removing build options, the command brew install boost --with-icu4c fails with:

Error: invalid option: --with-icu4c

Doing just brew install boost and proceeding as usual will lead, as expected, to a cmake failure:

-- The following ICU libraries were not found:
--   uc (required)
--   data (required)
CMake Error at /usr/local/Cellar/cmake/3.14.4/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Failed to find all ICU components (missing: ICU_INCLUDE_DIR ICU_LIBRARY
  _ICU_REQUIRED_LIBS_FOUND)
Call Stack (most recent call first):
  /usr/local/Cellar/cmake/3.14.4/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  /usr/local/Cellar/cmake/3.14.4/share/cmake/Modules/FindICU.cmake:324 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  CMakeLists.txt:9 (find_package)


-- Configuring incomplete, errors occurred!

(I'm on macOS 10.14.4 and I'm trying to build master, but I doubt that's relevant)

Error while parsing updated Aragonese dictionary

Reported on Ubuntu 18.10 with Nuspell 2.2. Aragonese dictionary 'hunspell-an' version '0.2-3'. The output is:

INFO: I/O locale name=en_US.UTF-8, lang=en, country=US, enc=utf-8
INFO: Dictionary path=../1-support/files/hunspell-an/0.2-3/usr/share/hunspell/an_ES.{dic,aff}
Nuspell error: missing flags in line 4193

Either a fix is needed upstream in this dictionary or Nuspell should be able to handle whatever is causing this error. (Hunspell version 1.6.2 can load this dictionary without errors.)

missing -a parameter

The issues can be of three types (put an X between the brackets):

  • Bug reports
  • [x ] Change request or feature request
  • Others, questions

Reporting bugs

The hunspell -a parameter is not available in Nuspell. I use this parameter a lot to collect suggestions for (wrong) words and to judge the corrections as well. (They can be added as forbidden words when they are nonsense.)

key value
OS, distro, version = Kubuntu 16
Hunspell version = -
Dictionary,package name,version =
Command line tool or GUI app = command line

Steps to reproduce

try nuspell -a

Bugged behavior (output)

ruud@ruud-laptop:~/Bureaublad/nuspell-master$ nuspell -a
Unrecognized option: '-a'
Invalid (combination of) arguments, try 'nuspell --help' for more information
ruud@ruud-laptop:~/Bureaublad/nuspell-master$ hunspell -a
@(#) International Ispell Version 3.2.06 (but really Hunspell 2.0.0)
duer
& duer 10 0: deur, dure, dier, duet, der, duwer, deer, duel, duur, perdue

^C
ruud@ruud-laptop:~/Bureaublad/nuspell-master$ nuspell -a
Unrecognized option: '-a'
Invalid (combination of) arguments, try 'nuspell --help' for more information
ruud@ruud-laptop:~/Bureaublad/nuspell-master$

Expected behavior (output)

Change/feature request

Add -a or something similar to get suggestions on the command line.
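
For readers who consume this output programmatically, the following is a minimal sketch of parsing a "miss" line in the Ispell pipe format that hunspell -a prints in the session above ("& original count offset: suggestion, suggestion, ..."). The struct and function names are made up for the example and are not part of Hunspell or Nuspell.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type for one "& ..." line.
struct Ispell_Miss {
    std::string original;                 // the misspelled word
    std::vector<std::string> suggestions; // in the order printed
};

// Parses a line such as "& duer 10 0: deur, dure, dier".
// Returns false for any line that is not a miss line with suggestions.
bool parse_miss_line(const std::string& line, Ispell_Miss& out)
{
    if (line.rfind("& ", 0) != 0) // must start with "& "
        return false;
    const auto colon = line.find(": ");
    if (colon == std::string::npos)
        return false;

    // Header part: original word, suggestion count, offset in the input.
    std::istringstream head(line.substr(2, colon - 2));
    std::size_t count = 0, offset = 0;
    if (!(head >> out.original >> count >> offset))
        return false;

    // Suggestions are separated by ", ".
    out.suggestions.clear();
    const std::string rest = line.substr(colon + 2);
    std::size_t start = 0;
    while (!rest.empty()) {
        const auto comma = rest.find(", ", start);
        if (comma == std::string::npos) {
            out.suggestions.push_back(rest.substr(start));
            break;
        }
        out.suggestions.push_back(rest.substr(start, comma - start));
        start = comma + 2;
    }
    return true;
}
```

Applied to the line from the session above, the parser returns the original word "duer" and ten suggestions, starting with "deur". Lines for correct words ("*") are rejected by the function and would need separate handling.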

Other issues

Move keepcase checks to lower levels

This is because our dictionaries are multisets/multimaps. See how we solved this for the hidden homonym check.

This is very low priority so it might not even be done.

question about the hyphenator

Issue type:

  • Others, questions

I would like to ask whether there is any documentation on the hyphenation algorithm, particularly on the NEXTLEVEL keyword.
I don't understand how it is used. Does it divide the patterns into two groups, where the first is used to hyphenate a non-compounded word and the second a compounded word? How can I know whether a word is compounded? I've read https://github.com/hunspell/hyphen/blob/a7255913300734655691fc3e8ce20041d611fbdb/README.compound but I don't quite understand what is going on.

When it is written "Hyphen, apostrophe and other characters may be word boundary characters, but they don't need (extra) hyphenation. [...] Without explicite NEXTLEVEL declaration, Hyphen 2.8 uses the previous settings, plus in UTF-8 encoding, endash (U+2013) and typographical apostrophe (U+2019) are NOHYPHEN characters, too.", does that mean the hyphen and apostrophe (and additionally the en dash and typographical apostrophe if no NEXTLEVEL keyword is present) define a break point by default, without checking any patterns?

When it is written

"ISO8859-1
NOHYPHEN -,'
1-1
1'1
NEXTLEVEL

Description:
1-1 and 1'1 declare hyphen and apostrophe as word boundary characters
and NOHYPHEN with the comma separated character (or character sequence)
list forbid the (extra) hyphens at the hyphen and apostrophe characters."

What is the meaning of "(extra)"? If I don't include the NOHYPHEN -,' part, will there be an extra hyphen?

When it is written

"The algorithm is recursive: every word parts of a successful
first (compound) level hyphenation will be rehyphenated
by the same (first) pattern set.

Finally, when first level hyphenation is not possible, Hyphen uses
the second level hyphenation for the word or the word parts."

Does that mean that, if the NEXTLEVEL option is present, the algorithm scans the first set twice and, for the "sub-words" that were not split again the second time, the second set is used? Do I understand correctly?

Thank you

Add a setting to not cut words containing at least one capital letter

The issues can be of three types (put an X between the brackets):

  • Bug reports
  • Change request or feature request
  • Others, questions

Change/feature request

Add a setting to not cut words containing at least one capital letter. Why not just the words beginning with a capital letter? Because, for example, there is “openSUSE”; in programming, the variable names may also not begin with a capital letter, but they may contain at least one.

→ This is the same thread as hunspell/hunspell#555, but I do not know if you stopped the development of Hunspell.

N-gram based suggestions

This method of suggestions is the last major piece that needs to be implemented. This is the thing to work on before any other minor issues.

functioning of morphological fields and other things

Other issues

Since I found no documentation whatsoever on the Hunspell format, and contacting László Németh for help is nearly impossible, I would like to ask you how morphological fields work.

In particular, can the affix rules in the .aff file have "po:" (part-of-speech) fields? How do "is:" and "ts:" interact with one another? Are there any other interactions when other fields are combined? Can a morphological field be repeated, as in "o po:noun is:singular is:femenine is:non_enumerable"?

I found in some place that:
"Derivational Suffix: stemming doesn't remove derivational suffixes (morphological generation depends on the order of the suffix fields)" >> Does that mean that all other, non-derivational suffixes are removed? What are the derivational suffixes?
"Inflectional Suffix: all inflectional suffixes are removed by stemming (morphological generation depends on the order of the suffix fields)" >> What are these inflectional suffixes, other than "is:"?
"Terminal Suffix: inflectional suffix fields removed by additional (not terminal) suffixes, useful for zero morphemes and affixes removed by splitting rules" >> What are these non-terminal suffixes? All but "ts:"?

Regarding REP: are the replacements applied just once, or can a word have multiple REP substitutions?

Thank you.

Errors while parsing updated dictionary for Catalan language

Reported on Ubuntu 18.10 with Nuspell 2.2. Catalan dictionary 'hunspell-ca' version '3.0.3+repack1-1'. The output is:

INFO: I/O locale name=en_US.UTF-8, lang=en, country=US, enc=utf-8
INFO: Dictionary path=../1-support/files/hunspell-ca/3.0.3+repack1-1/usr/share/hunspell/ca.{dic,aff}
Nuspell error: missing flags in line 675
Nuspell error: could not parse affix file line 675: SFX 00 r t/ ar
Nuspell error: missing flags in line 676
Nuspell error: could not parse affix file line 676: SFX 00 r da/ ar
Nuspell error: missing flags in line 677
Nuspell error: could not parse affix file line 677: SFX 00 r ts/ ar
Nuspell error: missing flags in line 678
Nuspell error: could not parse affix file line 678: SFX 00 r des/ ar
Nuspell error: missing flags in line 679
Nuspell error: could not parse affix file line 679: SFX 00 ar o/ ar
Nuspell error: missing flags in line 680
Nuspell error: could not parse affix file line 680: SFX 00 ar e/ ar
Nuspell error: missing flags in line 681
Nuspell error: could not parse affix file line 681: SFX 00 ar 0/ ar
Nuspell error: missing flags in line 682
Nuspell error: could not parse affix file line 682: SFX 00 ar es/ ar
Nuspell error: missing flags in line 683
Nuspell error: could not parse affix file line 683: SFX 00 r 0/ ar
Nuspell error: missing flags in line 684
Nuspell error: could not parse affix file line 684: SFX 00 ar em/ ar
Nuspell error: missing flags in line 685
Nuspell error: could not parse affix file line 685: SFX 00 r m/ ar
Nuspell error: missing flags in line 686
Nuspell error: could not parse affix file line 686: SFX 00 ar eu/ ar
Nuspell error: missing flags in line 687
Nuspell error: could not parse affix file line 687: SFX 00 r u/ ar
Nuspell error: missing flags in line 688
Nuspell error: could not parse affix file line 688: SFX 00 ar en/ ar
Nuspell error: missing flags in line 689
Nuspell error: could not parse affix file line 689: SFX 00 r va/ ar
Nuspell error: missing flags in line 690
Nuspell error: could not parse affix file line 690: SFX 00 r ves/ ar
Nuspell error: missing flags in line 691
Nuspell error: could not parse affix file line 691: SFX 00 r va/ ar
Nuspell error: missing flags in line 692
Nuspell error: could not parse affix file line 692: SFX 00 ar àvem/ ar
Nuspell error: missing flags in line 693
Nuspell error: could not parse affix file line 693: SFX 00 ar àveu/ ar
Nuspell error: missing flags in line 694
Nuspell error: could not parse affix file line 694: SFX 00 r ven/ ar
Nuspell error: missing flags in line 695
Nuspell error: could not parse affix file line 695: SFX 00 ar í/ ar
Nuspell error: missing flags in line 696
Nuspell error: could not parse affix file line 696: SFX 00 r res/ ar
Nuspell error: missing flags in line 697
Nuspell error: could not parse affix file line 697: SFX 00 ar à/ ar
Nuspell error: missing flags in line 698
Nuspell error: could not parse affix file line 698: SFX 00 ar àrem/ ar
Nuspell error: missing flags in line 699
Nuspell error: could not parse affix file line 699: SFX 00 ar àreu/ ar
Nuspell error: missing flags in line 700
Nuspell error: could not parse affix file line 700: SFX 00 r ren/ ar
Nuspell error: missing flags in line 701
Nuspell error: could not parse affix file line 701: SFX 00 r ré/ ar
Nuspell error: missing flags in line 702
Nuspell error: could not parse affix file line 702: SFX 00 r ràs/ ar
Nuspell error: missing flags in line 703
Nuspell error: could not parse affix file line 703: SFX 00 r rà/ ar
Nuspell error: missing flags in line 704
Nuspell error: could not parse affix file line 704: SFX 00 r rem/ ar
Nuspell error: missing flags in line 705
Nuspell error: could not parse affix file line 705: SFX 00 r reu/ ar
Nuspell error: missing flags in line 706
Nuspell error: could not parse affix file line 706: SFX 00 r ran/ ar
Nuspell error: missing flags in line 707
Nuspell error: could not parse affix file line 707: SFX 00 r ria/ ar
Nuspell error: missing flags in line 708
Nuspell error: could not parse affix file line 708: SFX 00 r ries/ ar
Nuspell error: missing flags in line 709
Nuspell error: could not parse affix file line 709: SFX 00 r ria/ ar
Nuspell error: missing flags in line 710
Nuspell error: could not parse affix file line 710: SFX 00 r ríem/ ar
Nuspell error: missing flags in line 711
Nuspell error: could not parse affix file line 711: SFX 00 r ríeu/ ar
Nuspell error: missing flags in line 712
Nuspell error: could not parse affix file line 712: SFX 00 r rien/ ar
Nuspell error: missing flags in line 713
Nuspell error: could not parse affix file line 713: SFX 00 ar e/ ar
Nuspell error: missing flags in line 714
Nuspell error: could not parse affix file line 714: SFX 00 ar i/ ar
Nuspell error: missing flags in line 715
Nuspell error: could not parse affix file line 715: SFX 00 ar es/ ar
Nuspell error: missing flags in line 716
Nuspell error: could not parse affix file line 716: SFX 00 ar is/ ar
Nuspell error: missing flags in line 717
Nuspell error: could not parse affix file line 717: SFX 00 ar e/ ar
Nuspell error: missing flags in line 718
Nuspell error: could not parse affix file line 718: SFX 00 ar i/ ar
Nuspell error: missing flags in line 719
Nuspell error: could not parse affix file line 719: SFX 00 ar em/ ar
Nuspell error: missing flags in line 720
Nuspell error: could not parse affix file line 720: SFX 00 ar eu/ ar
Nuspell error: missing flags in line 721
Nuspell error: could not parse affix file line 721: SFX 00 ar en/ ar
Nuspell error: missing flags in line 722
Nuspell error: could not parse affix file line 722: SFX 00 ar in/ ar
Nuspell error: missing flags in line 723
Nuspell error: could not parse affix file line 723: SFX 00 ar és/ ar
Nuspell error: missing flags in line 724
Nuspell error: could not parse affix file line 724: SFX 00 r ra/ ar
Nuspell error: missing flags in line 725
Nuspell error: could not parse affix file line 725: SFX 00 ar às/ ar
Nuspell error: missing flags in line 726
Nuspell error: could not parse affix file line 726: SFX 00 ar essis/ ar
Nuspell error: missing flags in line 727
Nuspell error: could not parse affix file line 727: SFX 00 r res/ ar
Nuspell error: missing flags in line 728
Nuspell error: could not parse affix file line 728: SFX 00 r ssis/ ar
Nuspell error: missing flags in line 729
Nuspell error: could not parse affix file line 729: SFX 00 ar esses/ ar
Nuspell error: missing flags in line 730
Nuspell error: could not parse affix file line 730: SFX 00 r sses/ ar
Nuspell error: missing flags in line 731
Nuspell error: could not parse affix file line 731: SFX 00 ar és/ ar
Nuspell error: missing flags in line 732
Nuspell error: could not parse affix file line 732: SFX 00 r ra/ ar
Nuspell error: missing flags in line 733
Nuspell error: could not parse affix file line 733: SFX 00 ar às/ ar
Nuspell error: missing flags in line 734
Nuspell error: could not parse affix file line 734: SFX 00 ar éssim/ ar
Nuspell error: missing flags in line 735
Nuspell error: could not parse affix file line 735: SFX 00 ar àrem/ ar
Nuspell error: missing flags in line 736
Nuspell error: could not parse affix file line 736: SFX 00 ar àssim/ ar
Nuspell error: missing flags in line 737
Nuspell error: could not parse affix file line 737: SFX 00 ar éssem/ ar
Nuspell error: missing flags in line 738
Nuspell error: could not parse affix file line 738: SFX 00 ar àssem/ ar
Nuspell error: missing flags in line 739
Nuspell error: could not parse affix file line 739: SFX 00 ar éssiu/ ar
Nuspell error: missing flags in line 740
Nuspell error: could not parse affix file line 740: SFX 00 ar àreu/ ar
Nuspell error: missing flags in line 741
Nuspell error: could not parse affix file line 741: SFX 00 ar àssiu/ ar
Nuspell error: missing flags in line 742
Nuspell error: could not parse affix file line 742: SFX 00 ar ésseu/ ar
Nuspell error: missing flags in line 743
Nuspell error: could not parse affix file line 743: SFX 00 ar àsseu/ ar
Nuspell error: missing flags in line 744
Nuspell error: could not parse affix file line 744: SFX 00 ar essin/ ar
Nuspell error: missing flags in line 745
Nuspell error: could not parse affix file line 745: SFX 00 r ren/ ar
Nuspell error: missing flags in line 746
Nuspell error: could not parse affix file line 746: SFX 00 r ssin/ ar
Nuspell error: missing flags in line 747
Nuspell error: could not parse affix file line 747: SFX 00 ar essen/ ar
Nuspell error: missing flags in line 748
Nuspell error: could not parse affix file line 748: SFX 00 r ssen/ ar

Either a fix is needed upstream in this dictionary or Nuspell should be able to handle whatever is causing this error. (Hunspell version 1.6.2 can load this dictionary without errors.)

Similar errors are reported for the ca_ES-valencia dictionary in the same Debian package.

LGPL "v3 only" or "v3 or later"?

Dear nuspell developers,

I'm aware that we don't have LGPL v4; yet I noticed an inconsistency:

I think it would be nice to have all places say "v3 or later" consistently, provided that that's the actual license. What do you think?

Thanks and best, Sebastian

Javascript bindings via webassembly

The issues can be of three types (put an X between the brackets):

  • Bug reports
  • Change request or feature request
  • Others, questions

(Probably more close to Others part)

Other issues

I am aware that development is at a very early stage and that language bindings are currently targeted at the post-2 initial MVP stage, but please allow me to raise this question a bit earlier. Feel free to close the issue as needed.

I currently maintain and use hunspell-asm, JavaScript bindings to Hunspell that expose a near 1:1 interface to the original Hunspell C API (near in the sense that I omitted a lot of interfaces I do not currently use). The basic idea is to compile Hunspell's C/C++ code via Emscripten into a WebAssembly binary and to provide a JavaScript glue wrapper to access it easily.

With the new Nuspell, when it comes to language bindings, what do you think about officially supporting WebAssembly bindings as the JavaScript binding? There is a small handful of Emscripten-specific macros for binding C++ classes to JavaScript (https://kripken.github.io/emscripten-site/docs/porting/connecting_cpp_and_javascript/embind.html). These could be applied in user land by manually patching the main repo code, but I am curious whether those changes could go upstream so that the JS binding lives as a sort of official one instead.

I can definitely try out the basic changes and CI setup if needed. I did similar things for hunspell-asm, mostly isolated in Docker images, so they could easily run in CI to produce a build.

Proper Unicode case transformations of single code point

Most of the case transformations are done properly according to Unicode specs, e.g. a whole string is made in uppercase.

In a few places I use u_toupper() and u_tolower(). Instead I should use something like to_lower_char_at(), which is able to do one-to-many case transformations (e.g. sharp s to double S). Make it efficient by minimizing branches and byte copying.

Avoid lagging for edge cases while suggesting

Hunspell uses 3 techniques.

  1. Add time limits in the slow algorithms (O(n^2) or more). Hunspell uses this in only 3 of all 12 algorithms. It does the time check only after it checks 100 candidates. If the time limit was exceeded, it stops, otherwise it goes on for another 100 attempts and checks the time limit again and so on.
  2. Firstly, check the candidates of all 12 algorithms only as simple words, and only if there are no suggestions, rerun the same algorithms checking the candidates as compounds. This shuffling speeds up the case when there are suggestions, but it does not speed up the case when there are not any suggestions.
  3. Limit the number of suggestions. Hunspell has a hard limit of 15 suggestions. In my experience this limit is very rarely hit, so it is the least effective technique. It is probably there for some pathological cases.
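
Techniques 1 and 3 above can be sketched roughly as follows. This is an illustration only, not code from Hunspell or Nuspell: the function name, the callback-based dictionary check, and the parameter names are assumptions. The batch size of 100 and the default cap of 15 follow the description above.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch. Checks candidates in order, looks at the clock only
// once per batch of 100 candidates, and stops early when either the time
// budget is exceeded or the suggestion cap is reached.
std::vector<std::string> check_with_budget(
    const std::vector<std::string>& candidates,
    const std::function<bool(const std::string&)>& is_correct,
    std::chrono::milliseconds budget,
    std::size_t max_suggestions = 15)
{
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + budget;

    std::vector<std::string> out;
    std::size_t since_last_time_check = 0;
    for (const auto& candidate : candidates) {
        if (is_correct(candidate)) {
            out.push_back(candidate);
            if (out.size() >= max_suggestions)
                break; // technique 3: hard limit on suggestions
        }
        if (++since_last_time_check == 100) {
            since_last_time_check = 0;
            if (clock::now() >= deadline)
                break; // technique 1: time limit, checked per batch
        }
    }
    return out;
}
```

Reading the clock only once per batch keeps the cost of the time check negligible compared with the dictionary lookups, at the price of overshooting the budget by at most one batch.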

Reduce binary size by reducing templates

See the git branch less-templates for initial work.

Some more work can be done to avoid branches (if in C++). The compile-time branches became run-time branches in that code, and that lowers the performance by a few percent. Some of those ifs are invariant in those hot loops and can be moved to higher levels.
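
The idea of moving a loop-invariant if to a higher level can be illustrated with a small, made-up example. The functions below are not from the Nuspell code base; they only show the transformation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical example: count the words shorter than max_len
// (strict == true) or at most max_len bytes long (strict == false).

// Version 1: the flag is tested on every iteration, although it never
// changes inside the loop.
std::size_t count_short_branchy(const std::vector<std::string>& words,
                                std::size_t max_len, bool strict)
{
    std::size_t n = 0;
    for (const auto& w : words) {
        if (strict) {
            if (w.size() < max_len)
                ++n;
        } else {
            if (w.size() <= max_len)
                ++n;
        }
    }
    return n;
}

// Version 2: the invariant test is hoisted out of the loop, so it runs
// once per call instead of once per word.
std::size_t count_short_hoisted(const std::vector<std::string>& words,
                                std::size_t max_len, bool strict)
{
    std::size_t n = 0;
    if (strict) {
        for (const auto& w : words)
            n += w.size() < max_len;
    } else {
        for (const auto& w : words)
            n += w.size() <= max_len;
    }
    return n;
}
```

Both versions return the same result. Optimizing compilers can sometimes perform this transformation on their own (loop unswitching), so any gain should be measured rather than assumed.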

Compiling error under MSVC

The issues can be of three types (put an X between the brackets):

  • Bug reports
  • Change request or feature request
  • Others, questions

Reporting bugs

When reporting a bug you must tell us the state of your system and the
steps to reproduce the bug. For the state please specify the following:

key value
OS, distro, version = Windows 7
Hunspell version =
Dictionary,package name,version =
Command line tool or GUI app = VS 2017 15.4

Steps to reproduce

Bugged behavior (output)

1>------ Build started: Project: Project1, Configuration: Debug Win32 ------
1>affentry.cxx
1>c:\dev\nuspell-master\src\hunspell\hunzip.hxx(70): warning C4251: 'Hunzip::fin': class 'std::basic_ifstream<char,std::char_traits<char>>' needs to have dll-interface to be used by clients of class 'Hunzip'
1>c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.14.26428\include\iosfwd(608): note: see declaration of 'std::basic_ifstream<char,std::char_traits<char>>'
1>c:\dev\nuspell-master\src\hunspell\hunzip.hxx(72): warning C4251: 'Hunzip::dec': class 'std::vector<bit,std::allocator<_Ty>>' needs to have dll-interface to be used by clients of class 'Hunzip'
1> with
1> [
1> _Ty=bit
1> ]
1>c:\dev\nuspell-master\src\hunspell\hunzip.hxx(72): note: see declaration of 'std::vector<bit,std::allocator<_Ty>>'
1> with
1> [
1> _Ty=bit
1> ]
1>c:\dev\nuspell-master\src\hunspell\affentry.hxx(121): warning C4267: 'return': conversion from 'size_t' to 'short', possible loss of data
1>c:\dev\nuspell-master\src\hunspell\affentry.hxx(202): warning C4267: 'return': conversion from 'size_t' to 'short', possible loss of data
1>dictionary.cxx
1>Info: Boost.Config is older than your compiler version - probably nothing bad will happen - but you may wish to look for an update Boost version. Define BOOST_CONFIG_SUPPRESS_OUTDATED_MESSAGE to suppress this message.
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(230): warning C4068: unknown pragma
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(231): warning C4068: unknown pragma
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(235): warning C4068: unknown pragma
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(233): error C2091: function returns function
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(295): note: see reference to class template instantiation 'nuspell::String_Set' being compiled
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(234): error C2091: function returns function
1>finder.cxx
1>locale_utils.cxx
1>Info: Boost.Config is older than your compiler version - probably nothing bad will happen - but you may wish to look for an update Boost version. Define BOOST_CONFIG_SUPPRESS_OUTDATED_MESSAGE to suppress this message.
1>c:\dev\nuspell-master\src\nuspell\locale_utils.cxx(31): fatal error C1083: Cannot open include file: 'unicode/uchar.h': No such file or directory
1>main.cxx
1>Info: Boost.Config is older than your compiler version - probably nothing bad will happen - but you may wish to look for an update Boost version. Define BOOST_CONFIG_SUPPRESS_OUTDATED_MESSAGE to suppress this message.
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(230): warning C4068: unknown pragma
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(231): warning C4068: unknown pragma
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(235): warning C4068: unknown pragma
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(233): error C2091: function returns function
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(295): note: see reference to class template instantiation 'nuspell::String_Set' being compiled
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(234): error C2091: function returns function
1>structures.cxx
1>Info: Boost.Config is older than your compiler version - probably nothing bad will happen - but you may wish to look for an update Boost version. Define BOOST_CONFIG_SUPPRESS_OUTDATED_MESSAGE to suppress this message.
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(230): warning C4068: unknown pragma
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(231): warning C4068: unknown pragma
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(235): warning C4068: unknown pragma
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(233): error C2091: function returns function
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(295): note: see reference to class template instantiation 'nuspell::String_Set' being compiled
1>c:\dev\nuspell-master\src\nuspell\structures.hxx(234): error C2091: function returns function
1>firstparser.cxx
1>Generating Code...
1>Done building project "Project1.vcxproj" -- FAILED.
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========


New dictionary package Latvian results in core dump

Reported on Ubuntu 18.10 with Nuspell 2.2. The Latvian dictionary was recently refactored and moved from myspell-lv (which worked) to hunspell-lv version 1.3.0-2. This new package results in a core dump with the following output:

terminate called after throwing an instance of 'std::invalid_argument'
  what():  closing bracket has no matching opening bracket
Aborted (core dumped)

Either a fix is needed upstream in this dictionary or Nuspell should be able to handle whatever is causing this error. (Hunspell version 1.6.2 can load this dictionary.)

Building on FreeBSD

There is a small issue building Nuspell on FreeBSD and other flavors of BSD. When following https://github.com/hunspell/nuspell#building-on-freebsd-netbsd-openbsd-and-bsd-variants the build process stops after:

autoreconf -vfi
./configure --with-warnings CXXFLAGS='-g -O0 -Wall -Wextra'

with this message:

...
checking for Boost headers version >= 1.62.0... yes
checking for Boost's header version... 1_66
checking for the toolset name used by Boost for g++... configure: WARNING: could not figure out which toolset name to use for g++

checking boost/system/error_code.hpp usability... yes
checking boost/system/error_code.hpp presence... yes
checking for boost/system/error_code.hpp... yes
checking for the Boost system library... yes
checking boost/locale.hpp usability... yes
checking boost/locale.hpp presence... yes
checking for boost/locale.hpp... yes
checking for the Boost locale library... no
configure: error: cannot find the flags to link with Boost locale

and the file config.log contains errors such as:

...
configure:18007: g++ -o conftest -g -O0 -Wall -Wextra -I/usr/local/include   conftest.o -lboost_system--mt-1_66  >&5
/usr/local/bin/ld: cannot find -lboost_system--mt-1_66
collect2: error: ld returned 1 exit status
...
configure:18296: g++ -o conftest -g -O0 -Wall -Wextra -I/usr/local/include    conftest.o -lboost_locale  -lboost_system >&5
conftest.o: In function `boost::locale::generator::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const': /usr/local/include/boost/locale/generator.hpp:202: undefined reference to `boost::locale::generator::generate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
collect2: error: ld returned 1 exit status
...

How can this be fixed?

Checklist 2 Compounding

  • Decipher v1 code
  • Simple compounding with BREAK
  • Advanced compounding. Has 3 major modes:
    • With flags only (COMPOUNDFLAG, COMPOUNDMIDDLE, ...) + CHECKCOMPOUND patterns without replacing.
    • With flags only + CHECKCOMPOUND patterns with replacing
    • COMPOUNDRULE

[regression] Fails to build on FreeBSD

key value
Operating system, distribution, version = FreeBSD 11.3
Nuspell version = cacd0c8

Steps to reproduce

$ pkg install git cmake ninja icu boost-libs
$ git clone https://github.com/nuspell/nuspell/
$ cd nuspell
$ git submodule update --init --recursive
$ cmake -GNinja .
$ ninja

Bugged behavior (output)

Regressed by 63a696f

[5/29] Building CXX object src/nuspell/CMakeFiles/nuspell.dir/locale_utils.cxx.o
FAILED: src/nuspell/CMakeFiles/nuspell.dir/locale_utils.cxx.o
/usr/bin/c++   -isystem /usr/local/include -std=gnu++14 -MD -MT src/nuspell/CMakeFiles/nuspell.dir/locale_utils.cxx.o -MF src/nuspell/CMakeFiles/nuspell.dir/locale_utils.cxx.o.d -o src/nuspell/CMakeFiles/nuspell.dir/locale_utils.cxx.o -c src/nuspell/locale_utils.cxx
src/nuspell/locale_utils.cxx:37:2: error: "Platform has poor Unicode support. wchar_t must be Unicode."
#error "Platform has poor Unicode support. wchar_t must be Unicode."
 ^
1 error generated.
ninja: build stopped: subcommand failed.

Expected behavior (output)

[29/29] Linking CXX executable tests/unit_test

Change/feature request

Restore && !defined(__FreeBSD__) and maybe add && !defined(__DragonFly__). None of ICU conditionals match even on development version of FreeBSD.

Hu move rule

I already wrote a lot of code specific to Hungarian, and the last missing piece is the so-called "Hu-mov" rule, which is out of scope for now. Some code needs to be added in compounding and a very small amount in suggestions.

Testing

The issues can be of three types (put an X between the brackets):

  • Bug reports
  • Change request or feature request
  • Others, questions

Other issues

Is there an alpha/beta release of Nuspell for Mac?

I can perform test for known test cases and create new test cases.

Binary dictionary

The issues can be of three types (delete two rows and leave only one):

  • Others, questions

Reporting bugs

When reporting a bug you must tell us the state of your system and the
steps to reproduce the bug. For the state please specify the following:

key value
Operating system, distribution, version = Any
Nuspell version = 2.2
Dictionary, package name, version = Any
Command-line tool or GUI application = Nul

Other issues

Can we create a binary dictionary for distribution?
If yes, can the binary dictionary be converted back to UTF-8 format?
How can a binary version of a dictionary be created?
Is there any performance loss when using a binary dictionary?

Attempt to use Catch2 via find_package in CMake

Change/feature request

This is low priority because most Linux distros don't provide Catch2 in their package managers. The search for Catch2 in the CMake scripts should be as follows:

  1. Try find_package(Catch2)
  2. If that fails, try the current approach for git submodule + add_subdirectory()
  3. If that fails don't build the testsuite and warn.

We already have 2 and 3.
