The algorithm is described in this paper, "Applying Conditional Random Fields to Japanese Morphological Analysis", though you'll probably have to look at the code for the details.
The basic idea is that it builds a lattice of candidate words and then uses the Viterbi algorithm to find the cheapest path through it. The tricky part is handling unknown words, whose dictionary entries (and costs) are generated on the fly based on the number and type of characters.
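Below is a minimal Python sketch of that idea, under loudly stated assumptions: the dictionary, part-of-speech tags, connection costs and unknown-word costs are all invented for illustration, and the sketch only invents unknown words at positions where no dictionary entry starts (a simplification). MeCab itself reads learned word costs and a left/right connection-cost matrix from its compiled dictionary, so treat this as an illustration of lattice + Viterbi, not as MeCab's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical toy dictionary: surface -> [(part of speech, word cost)].
DICTIONARY: Dict[str, List[Tuple[str, int]]] = {
    "東京": [("noun", 100)],
    "東": [("noun", 400)],
    "京都": [("noun", 100)],
    "に": [("particle", 50)],
    "行く": [("verb", 150)],
}

# Hypothetical connection costs between adjacent tags; unlisted pairs get the default.
CONNECTION: Dict[Tuple[str, str], int] = {
    ("BOS", "noun"): 0,
    ("noun", "particle"): -100,
    ("particle", "verb"): -50,
    ("verb", "EOS"): 0,
}
DEFAULT_CONNECTION = 500

# Hypothetical base costs for unknown words generated on the fly, per character type.
UNKNOWN_BASE = {"kanji": 3000, "katakana": 2000, "hiragana": 4000,
                "alpha": 1500, "digit": 1500, "other": 5000}


def char_type(ch: str) -> str:
    """Classify a character by Unicode block (simplified)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alpha"
    return "other"


@dataclass
class Node:
    surface: str
    pos: str
    start: int
    end: int
    word_cost: int
    total: float = float("inf")    # best path cost from BOS through this node
    prev: Optional["Node"] = None  # back-pointer along the best path


def build_lattice(text: str) -> List[List[Node]]:
    """starts[i] holds every candidate word beginning at position i."""
    starts: List[List[Node]] = [[] for _ in text]
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            for pos, cost in DICTIONARY.get(text[i:j], []):
                starts[i].append(Node(text[i:j], pos, i, j, cost))
        if starts[i]:
            continue  # sketch assumption: invent words only where the dictionary has none
        kind = char_type(text[i])
        run_end = i + 1
        while run_end < len(text) and char_type(text[run_end]) == kind:
            run_end += 1
        # Unknown candidates: one character, and the whole run of same-type characters.
        for length in {1, run_end - i}:
            cost = UNKNOWN_BASE[kind] + 100 * length
            starts[i].append(Node(text[i:i + length], "unknown", i, i + length, cost))
    return starts


def tokenize(text: str) -> List[Tuple[str, str]]:
    n = len(text)
    starts = build_lattice(text)
    bos = Node("", "BOS", 0, 0, 0, total=0)
    eos = Node("", "EOS", n, n, 0)
    ends: List[List[Node]] = [[] for _ in range(n + 1)]
    ends[0].append(bos)
    # Left to right: every node ending at i is final before nodes starting at i are scored.
    for i in range(n + 1):
        for node in (starts[i] if i < n else [eos]):
            for left in ends[i]:
                cost = (left.total + node.word_cost
                        + CONNECTION.get((left.pos, node.pos), DEFAULT_CONNECTION))
                if cost < node.total:
                    node.total, node.prev = cost, left
            if node.prev is not None and node is not eos:
                ends[node.end].append(node)
    # Follow back-pointers from EOS to recover the cheapest segmentation.
    path: List[Tuple[str, str]] = []
    node = eos.prev
    while node is not None and node is not bos:
        path.append((node.surface, node.pos))
        node = node.prev
    return path[::-1]
```

With these made-up costs, `tokenize("東京に行く")` picks 東京 / に / 行く over the 東 / 京 split, and `tokenize("パソコンに行く")` groups the unknown katakana run パソコン into one word, because one grouped unknown word is cheaper than several single-character ones plus their connection costs.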
from mecab.
Related Issues
- matrix right/left dimension checking is inconsistent (compiling user dictionary/assigning user dict costs)
- mecab-dict-gen crashes after a long time
- Memoly leak when use python-wrapper and input string is too long
- Installing mecab
- Meet a undefined reference to '__imp__ZN5MeCab12createTaggerEPKc' when running the example.cpp
- Words do not get divided properly when small letters (捨て仮名) are included in word
- Tag repo please
- Support for Ruby2.7?
- Failure initializing Tagger has no error message
- Inflected adjective form 「正しく」 is treated as an adverb
- http://creativecommons.org/licenses/by-sa/3.0/
- Max Grouping Size off-by-one error
- “'gcc' failed with exit status 1” when trying to install Mecab with PyPy docker image
- WPATH_FORCE() not defined on windows when compiling with msvc.
- Output Format
- How to set --input-buffer-size when using -p option
- Build fails with LTO
- The validation of Dictionary::assignUserDictionaryCosts() is inappropriate
- Intentional fallthrough?