harmonious_dictionary's Introduction

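Harmonious Dictionary (和谐宝典)

Harmonious Dictionary checks whether input contains Chinese or English sensitive words and can replace them with special characters. If you live in the "Celestial Empire", it is a must-have.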

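Features

  • Fast: more than 10 times faster than ordinary regular-expression matching; run the benchmark for exact figures
  • Can report the sensitive words it detected (see the usage sections)
  • Simple: the sensitive-word list is easy to adjust as needed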

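Default usage

validate_harmonious_of *attr_names
  • Check whether the input contains sensitive words: HarmoniousDictionary.clean?(your_input)
  • List the sensitive words the input contains: HarmoniousDictionary.harmonious_words(your_input)
  • Replace the sensitive words with *: HarmoniousDictionary.clean(your_input)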

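Custom usage

You can use a sensitive-word dictionary specific to a model, for example

validate_harmonious_of [:title, :body], model: post

This uses the post_chinese_dictionary.hash and post_english_dictionary.yml dictionaries.

The same two dictionaries can also be used directly:

  • Check whether the input contains sensitive words: HarmoniousDictionary.clean?(your_input, 'post')
  • List the sensitive words the input contains: HarmoniousDictionary.harmonious_words(your_input, 'post')
  • Replace the sensitive words with *: HarmoniousDictionary.clean(your_input, 'post')

(Note: custom usage requires the corresponding sensitive-word dictionaries to exist.)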
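To make the call shapes above concrete, here is a minimal stand-in sketch. It is not the gem's code: the module name HarmoniousSketch, the in-memory word lists, and the choice of one * per replaced character are all assumptions made for illustration. The real gem loads its words from the generated dictionary files.

```ruby
# Hypothetical stand-in, NOT the gem's implementation. It only mirrors the
# three calls described above, with an optional dictionary name.
module HarmoniousSketch
  # Assumed in-memory word lists keyed by dictionary name (nil = default).
  DICTIONARIES = {
    nil    => %w[badword 敏感词],
    'post' => %w[spam]
  }.freeze

  # Returns every listed word that occurs in the input.
  def self.harmonious_words(input, model = nil)
    words = DICTIONARIES.fetch(model) do
      raise ArgumentError, "no dictionary for #{model.inspect}"
    end
    words.select { |word| input.include?(word) }
  end

  # True when the input contains none of the listed words.
  def self.clean?(input, model = nil)
    harmonious_words(input, model).empty?
  end

  # Replaces each listed word with one * per character (an assumption).
  def self.clean(input, model = nil)
    harmonious_words(input, model).reduce(input) do |text, word|
      text.gsub(word, '*' * word.length)
    end
  end
end

puts HarmoniousSketch.clean?('hello world')
puts HarmoniousSketch.harmonious_words('a badword here').inspect
puts HarmoniousSketch.clean('buy spam now', 'post')
```

In this sketch, asking for a dictionary name that has no word list raises an error, which matches the note that custom usage needs a corresponding dictionary; how the gem itself reacts to a missing dictionary is not stated here.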

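Installation

Add the gem to your Gemfile

gem 'harmonious_dictionary'

Create the required configuration files

rails g harmonious_dictionary:setup

Next, prepare the sensitive-word lists. If you already have your own, copy the Chinese and English words into chinese_dictionary.txt and english_dictionary.txt respectively, under config/harmonious_dictionary/ in your project. If you don't, use the lists prepared for Harmonious Dictionary: download https://github.com/downloads/wear/harmonious_dictionary/dictionaries.zip, unzip it, and replace the files.

Finally, generate the serialized dictionaries. For default usage, run

rake harmonious_dictionary:generate

This generates the harmonious.hash and harmonious_english.yml dictionaries.

For custom usage, run

rake harmonious_dictionary:generate model=post

This generates the post_harmonious.hash and post_harmonious_english.yml dictionaries.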
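As a rough picture of what a generation step like this might do, the sketch below reads the two text lists and writes serialized files with the names mentioned above. The on-disk formats are assumptions: it treats the .hash file as a Marshal-dumped Hash and the .yml file as a YAML array, and it assumes one word per line in the .txt files. The helper names read_word_list and generate_dictionaries are hypothetical, not part of the gem.

```ruby
require 'yaml'
require 'tmpdir'

# Reads a plain-text word list, assuming one word per line.
# Blank lines and duplicates are dropped.
def read_word_list(path)
  File.readlines(path, chomp: true, encoding: 'UTF-8')
      .map(&:strip)
      .reject(&:empty?)
      .uniq
end

# Writes the Chinese list as a Marshal-dumped Hash and the English list as
# YAML (both formats are assumptions). The prefix stands in for the optional
# model name, e.g. "post_".
def generate_dictionaries(dir, prefix = '')
  chinese = read_word_list(File.join(dir, 'chinese_dictionary.txt'))
  english = read_word_list(File.join(dir, 'english_dictionary.txt'))

  hash_path = File.join(dir, "#{prefix}harmonious.hash")
  yaml_path = File.join(dir, "#{prefix}harmonious_english.yml")

  File.binwrite(hash_path, Marshal.dump(chinese.map { |word| [word, true] }.to_h))
  File.write(yaml_path, english.to_yaml)
  [hash_path, yaml_path]
end

# Demo in a throwaway directory so nothing in a real project is touched.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'chinese_dictionary.txt'), "敏感词\n违禁词\n")
  File.write(File.join(dir, 'english_dictionary.txt'), "badword\nspam\n")
  hash_path, yaml_path = generate_dictionaries(dir, 'post_')
  p Marshal.load(File.binread(hash_path)).keys
  p YAML.load_file(yaml_path)
end
```

Serializing once and loading the result at boot avoids re-parsing the text lists on every start, which is one plausible reason for having a separate generate step.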

Using with Ruby 1.8

For the Rails 2 series, use the 1.8 branch and install it as a plugin. The sensitive-word dictionaries are all configured inside the harmonious_dictionary plugin.

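How it works

Unlike the usual regular-expression approach to sensitive-word checking, Harmonious Dictionary segments the Chinese input into words, using the given sensitive-word list as the segmentation dictionary, and in doing so picks out the sensitive words the input contains. The algorithm is adapted from yzhang's https://github.com/yzhang/rseg; Harmonious Dictionary simplifies it to improve efficiency.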
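The idea of matching against a dictionary instead of a regular expression can be sketched with a small trie and a left-to-right longest-match scan. This is a simplified illustration of the general technique, not the gem's or rseg's actual code; the class name WordScanner and its methods are hypothetical.

```ruby
# Minimal trie-based scanner: an illustration only.
class WordScanner
  def initialize(words)
    @root = {}
    words.each do |word|
      node = @root
      word.each_char { |char| node = (node[char] ||= {}) }
      node[:end] = true # a complete word stops at this node
    end
  end

  # Returns the listed words found in the text, scanning left to right and
  # taking the longest match at each position.
  def scan(text)
    chars = text.chars
    found = []
    index = 0
    while index < chars.length
      length = longest_match(chars, index)
      if length.zero?
        index += 1
      else
        found << chars[index, length].join
        index += length
      end
    end
    found
  end

  private

  # Length of the longest dictionary word starting at `start`, or 0.
  def longest_match(chars, start)
    node = @root
    best = 0
    (start...chars.length).each do |position|
      node = node[chars[position]]
      break unless node
      best = position - start + 1 if node[:end]
    end
    best
  end
end

scanner = WordScanner.new(%w[敏感 敏感词 bad])
p scanner.scan('这里有敏感词和bad内容')
```

In this sketch the work done at each position is bounded by the length of the longest listed word, not by the number of words in the list, which gives an intuition for why a dictionary lookup can outpace a large regular-expression alternation.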

Contributors

License

MIT license

Contact

[email protected]

harmonious_dictionary's People

Contributors: jyootai, kzjeef, wear
