ChineseNaturalLanguageProcess

Chinese Natural Language Processing (Chinese NLP)

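Task 1: Chinese word segmentation (Text segmentation)

Chinese word segmentation is the task of having a computer automatically split Chinese text into words, so that word boundaries in a Chinese sentence are marked the way spaces mark them in English. It is regarded as one of the most basic steps in Chinese natural language processing.

Example:
Original sentence: 南京市长江大桥。
Segmentation result 1: 南京 / 市长 / 江大桥 / 。 (Nanjing / mayor / Jiang Daqiao)
Segmentation result 2: 南京市 / 长江 / 大桥 / 。 (Nanjing City / Yangtze River / bridge)

Keywords: dictionary (dict), forward maximum matching, backward maximum matching.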
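
The two matching strategies named in the keywords can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function names, the `max_len` parameter, and the six-word toy dictionary are all assumptions chosen so that the example sentence above produces both segmentations.

```python
def forward_max_match(text, vocab, max_len=4):
    """Scan left to right, taking the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len=4):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the sentence
    return tokens


# Hypothetical toy dictionary, for illustration only.
VOCAB = {"南京", "南京市", "市长", "长江", "大桥", "江大桥"}
SENTENCE = "南京市长江大桥。"

forward_result = forward_max_match(SENTENCE, VOCAB)
backward_result = backward_max_match(SENTENCE, VOCAB)
print(" / ".join(forward_result))   # 南京市 / 长江 / 大桥 / 。
print(" / ".join(backward_result))  # 南京 / 市长 / 江大桥 / 。
```

With this particular dictionary, forward matching yields segmentation result 2 and backward matching yields result 1. Which direction gives the intended reading depends entirely on the dictionary contents, which is why the two are often run together and compared.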

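Task 2: Chinese new word discovery (New word discovery)

Dictionaries play a pivotal role in Chinese text processing, but no dictionary is complete: rare words and newly coined words are not included and therefore cannot be segmented correctly. New word discovery helps a segmenter recognize words that are not yet in its dictionary, which improves segmentation performance. It can also be used in public opinion analysis to surface trending topics.

Example:
Original sentence: 王尼玛表现出一脸蓝瘦香菇的样子。
Extracted new words: 王尼玛, 蓝瘦香菇.
Without new word discovery, the sentence would be segmented as:
王 / 尼 / 玛 / 表现 / 出 / 一脸 / 蓝 / 瘦 / 香菇 / 的 / 样子 / 。
With new word discovery, the segmentation becomes:
王尼玛 / 表现 / 出 / 一脸 / 蓝瘦香菇 / 的 / 样子 / 。

Keywords: information entropy.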
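
One common way to apply information entropy to new word discovery is to measure how varied the characters are on either side of a candidate string (often called branch or boundary entropy). The text above names only the keyword, so the sketch below is an assumption about the general technique rather than the repository's method; the function names and the three-sentence corpus are made up for illustration.

```python
import math
from collections import Counter


def entropy(counter):
    """Shannon entropy, in bits, of the distribution given by a Counter."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counter.values())


def branch_entropy(corpus, candidate):
    """Return (left, right) entropy of the characters adjacent to `candidate`."""
    left, right = Counter(), Counter()
    for sentence in corpus:
        start = sentence.find(candidate)
        while start != -1:
            end = start + len(candidate)
            if start > 0:
                left[sentence[start - 1]] += 1
            if end < len(sentence):
                right[sentence[end]] += 1
            start = sentence.find(candidate, start + 1)
    return entropy(left), entropy(right)


# Hypothetical toy corpus, for illustration only.
CORPUS = ["他蓝瘦香菇了", "我蓝瘦香菇啊", "真蓝瘦香菇呢"]

full_left, full_right = branch_entropy(CORPUS, "蓝瘦香菇")
part_left, part_right = branch_entropy(CORPUS, "蓝瘦")
print(full_left, full_right)  # both sides vary: high entropy
print(part_left, part_right)  # right side is always 香: zero entropy
```

In this toy corpus, 蓝瘦香菇 is surrounded by three different characters on each side, so both entropies are about 1.58 bits, while the fragment 蓝瘦 is always followed by 香 and its right entropy is 0. A candidate with high entropy on both sides behaves like a free-standing word; practical systems usually combine this signal with frequency and cohesion thresholds.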
