wangxigui / cnsegmenter

This project forked from pp1230/cnsegmenter

CN segmenter based on Hidden Markov Model.

Java 100.00%

# CNSegmenter

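## Model

This project segments Chinese text with a Hidden Markov Model (HMM) trained on the PKU corpus. The Markov model is a statistical model widely used across natural language processing, including speech recognition, automatic part-of-speech tagging, phonetic-to-character conversion, and probabilistic grammars. After long development, and especially after its success in speech recognition, it has become a general-purpose statistical tool.

For an observation sequence O = O1, O2, …, On and a state sequence C = C1, C2, …, Cn, the model selects:

argmax_C P(O|C) P(C)

The state sequence probability factorizes as:

P(C1, C2, …, Cn) = P(C1) P(C2|C1) P(C3|C2) … P(Cn|Cn-1)

where P(Cj|Ci) is the state transition probability from state Ci to state Cj. The observation probability factorizes as:

P(O|C) = P(O1|C) P(O2|C) … P(On|C) = P(O1|C1) P(O2|C2) … P(On|Cn)

P(O|C) is the probability of the observed data. All of these probabilities can be estimated by counting over a large corpus.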
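The formulas above can be illustrated with a minimal sketch. It is not this project's code: the class `HmmSketch`, its method names, the B/M/E/S tag set (begin, middle, end, single-character word), the smoothing constant `ALPHA`, and the three toy training sentences are all assumptions made for the example. Counts from pre-segmented text stand in for P(C1), P(Cj|Ci), and P(Ot|Ct), and a Viterbi search finds the tag sequence that maximizes P(O|C)P(C).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy first-order HMM segmenter (hypothetical; not the project's API). */
class HmmSketch {
    static final int B = 0, M = 1, E = 2, S = 3;   // assumed tag set
    static final double ALPHA = 1e-3;              // assumed smoothing constant
    final double[] start = new double[4];          // counts for P(C1)
    final double[][] trans = new double[4][4];     // counts for P(Cj|Ci)
    final List<Map<Character, Double>> emit = new ArrayList<>(); // counts for P(Ot|Ct)
    final double[] tagTotal = new double[4];
    final Set<Character> vocab = new HashSet<>();

    HmmSketch() {
        for (int i = 0; i < 4; i++) emit.add(new HashMap<>());
    }

    /** Collects counts from one pre-segmented sentence with space-separated words. */
    void train(String segmented) {
        int prev = -1;
        for (String word : segmented.trim().split("\\s+")) {
            int len = word.length();
            for (int k = 0; k < len; k++) {
                int tag = len == 1 ? S : k == 0 ? B : k == len - 1 ? E : M;
                if (prev < 0) start[tag]++; else trans[prev][tag]++;
                emit.get(tag).merge(word.charAt(k), 1.0, Double::sum);
                vocab.add(word.charAt(k));
                tagTotal[tag]++;
                prev = tag;
            }
        }
    }

    /** Smoothed log relative frequency, so unseen events stay possible but unlikely. */
    static double smooth(double count, double total, int outcomes) {
        return Math.log((count + ALPHA) / (total + ALPHA * outcomes));
    }
    static double sum(double[] a) { double x = 0; for (double v : a) x += v; return x; }
    double logStart(int s) { return smooth(start[s], sum(start), 4); }
    double logTrans(int p, int s) { return smooth(trans[p][s], sum(trans[p]), 4); }
    double logEmit(int s, char ch) {
        return smooth(emit.get(s).getOrDefault(ch, 0.0), tagTotal[s], vocab.size() + 1);
    }

    /** Viterbi search in log space, then a cut after every E or S tag. */
    List<String> segment(String text) {
        List<String> words = new ArrayList<>();
        int n = text.length();
        if (n == 0) return words;
        double[][] score = new double[n][4];
        int[][] back = new int[n][4];
        for (int s = 0; s < 4; s++) score[0][s] = logStart(s) + logEmit(s, text.charAt(0));
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < 4; s++) {
                int best = 0;
                for (int p = 1; p < 4; p++) {
                    if (score[t - 1][p] + logTrans(p, s) > score[t - 1][best] + logTrans(best, s)) best = p;
                }
                back[t][s] = best;
                score[t][s] = score[t - 1][best] + logTrans(best, s) + logEmit(s, text.charAt(t));
            }
        }
        int state = 0;
        for (int s = 1; s < 4; s++) if (score[n - 1][s] > score[n - 1][state]) state = s;
        int[] path = new int[n];
        for (int t = n - 1; t >= 0; t--) { path[t] = state; state = back[t][state]; }
        int from = 0;
        for (int t = 0; t < n; t++) {
            if (path[t] == E || path[t] == S || t == n - 1) {   // word boundary
                words.add(text.substring(from, t + 1));
                from = t + 1;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        HmmSketch hmm = new HmmSketch();
        hmm.train("我们 喜欢 学习");   // toy data, not the PKU corpus
        hmm.train("我 喜欢 你");
        hmm.train("我们 学习");
        System.out.println(hmm.segment("我们喜欢学习"));   // prints [我们, 喜欢, 学习]
    }
}
```

The sketch works in log space so that long products of small probabilities do not underflow. It also treats each Java `char` as one character, which holds for common Chinese text but not for characters outside the Basic Multilingual Plane.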

## Usage

Run the testSegment() method of SegmenterTest to segment text.

## Note

Because the training corpus is limited, segmentation accuracy is low. The project is intended for learning purposes.

More: PHY's blog
