
errortextdetection's Introduction

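Rejection text classification

This repository hosts the production-level project.

Keywords:

Notes

The results of every test run are written to result.csv.

0. Data preparation

data.csv must contain at least the columns [sent, target].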
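The column requirement above can be checked before running the pipeline. The following is a minimal sketch, not code from this repository: the helper name check_columns, the sample sentences, and the 0/1 label values are assumptions made for illustration. Only the column names sent and target come from this README.

```python
import csv
import io

# Column names taken from this README; everything else here is illustrative.
REQUIRED_COLUMNS = {"sent", "target"}


def check_columns(csv_file):
    """Read a CSV file object and return its rows as dicts.

    Raises ValueError if a required column is missing.
    """
    reader = csv.DictReader(csv_file)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"data.csv is missing columns: {sorted(missing)}")
    return list(reader)


# In-memory stand-in for data.csv (hypothetical rows and labels).
# A real file would be opened with open("data.csv", encoding="utf-8", newline="").
sample = io.StringIO("sent,target\n今天天气很好,0\nasdf1234,1\n")
rows = check_columns(sample)
print(len(rows), rows[0]["sent"], rows[0]["target"])
```

Extra columns are allowed in this sketch, which matches the "at least" wording above.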

1. Preprocessing

python preprocess.py

Generates the columns [sent, sent_chars, sent_words, target]

data_train.csv data_test.csv
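What this step produces can be sketched roughly as follows. This is an illustration under assumptions, not the logic of preprocess.py: the function names, the 80/20 split, the fixed seed, and the whitespace-based word segmenter are all hypothetical. Chinese text would need a real word segmenter passed in as segment.

```python
import random


def to_chars(sent):
    """Return the sentence as space-separated characters, dropping whitespace."""
    return " ".join(ch for ch in sent if not ch.isspace())


def to_words(sent, segment=str.split):
    """Return the sentence as space-separated words.

    `segment` is a pluggable function; str.split is only a placeholder
    and does not segment Chinese text.
    """
    return " ".join(segment(sent))


def preprocess(rows, test_ratio=0.2, seed=0):
    """Add sent_chars and sent_words, then split into (train, test) lists."""
    out = [
        {
            "sent": r["sent"],
            "sent_chars": to_chars(r["sent"]),
            "sent_words": to_words(r["sent"]),
            "target": r["target"],
        }
        for r in rows
    ]
    random.Random(seed).shuffle(out)  # deterministic shuffle for repeatable splits
    n_test = max(1, int(len(out) * test_ratio))
    return out[n_test:], out[:n_test]


rows = [{"sent": f"sample text {i}", "target": str(i % 2)} for i in range(5)]
train, test = preprocess(rows)
print(len(train), len(test))
```

The two returned lists correspond to the rows that would be written to data_train.csv and data_test.csv.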

2. Rule-based filtering

python FilterRules.py

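(This adds a column ['isFilter']. It defaults to None; if a row is filtered, it shows the violated rule, e.g. _islen.)

  • To use exploring mode (which evaluates the filter's effectiveness), comment out the filtering call in __main__ and uncomment the exploring call
  • To change the filter rules, edit the toFilter function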
python FilterRules.py -task exploring

Exploring mode evaluates the accuracy of the current rules.
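The two modes can be pictured with the sketch below. It is not the code in FilterRules.py: the length threshold, the second rule _nocontent, the reject label "1", and the definition of accuracy (the share of filtered rows that carry the reject label) are assumptions. Only the isFilter column, the None default, and the rule name _islen come from this README, and the meaning given to _islen here is a guess.

```python
def to_filter(sent, max_len=50):
    """Return the name of the first violated rule, or None if the sentence passes."""
    if len(sent) > max_len:
        return "_islen"  # assumed meaning: sentence exceeds a length limit
    if not any(ch.isalnum() for ch in sent):
        return "_nocontent"  # hypothetical rule: no letters or digits at all
    return None


def filtering(rows):
    """Add the isFilter column to every row in place and return the rows."""
    for row in rows:
        row["isFilter"] = to_filter(row["sent"])
    return rows


def exploring(rows, reject_label="1"):
    """Return the fraction of filtered rows whose target is the reject label.

    Returns None when no row is filtered.
    """
    flagged = [r for r in filtering(rows) if r["isFilter"] is not None]
    if not flagged:
        return None
    return sum(r["target"] == reject_label for r in flagged) / len(flagged)


rows = [
    {"sent": "a" * 60, "target": "1"},
    {"sent": "你好", "target": "0"},
    {"sent": "!!!", "target": "0"},
]
print(exploring(rows))
```

A low score in this sketch would suggest the rules are flagging sentences that should have been accepted.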

3.1 Features from language models

(1) Train the language model

python LangModelMgr.py
python LangModelMgr.py -n 2 -dtype words -dsource std -dname weibo
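To make the idea concrete, here is a minimal sketch of a word-level bigram model (n = 2, as in the command above) that scores a sentence by perplexity. It is not the implementation in LangModelMgr.py: the class name, the add-one smoothing, and the toy corpus are assumptions.

```python
import math
from collections import Counter


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        sentences = [list(s) for s in sentences]
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            self.context_counts.update(padded[:-1])
            self.bigram_counts.update(zip(padded, padded[1:]))
        # Tokens the model can predict: every training word plus the end marker.
        self.vocab = {tok for s in sentences for tok in s} | {"</s>"}

    def prob(self, prev, word):
        # Unseen pairs get a small smoothed floor. With out-of-vocabulary words
        # this is a scoring heuristic, not a normalized distribution.
        return (self.bigram_counts[(prev, word)] + 1) / (
            self.context_counts[prev] + len(self.vocab)
        )

    def perplexity(self, tokens):
        padded = ["<s>"] + list(tokens) + ["</s>"]
        log_prob = sum(
            math.log(self.prob(p, w)) for p, w in zip(padded, padded[1:])
        )
        return math.exp(-log_prob / (len(padded) - 1))


model = BigramModel([["我", "喜欢", "你"], ["我", "喜欢", "他"]])
seen = model.perplexity(["我", "喜欢", "你"])
scrambled = model.perplexity(["你", "喜欢", "我"])
print(seen < scrambled)
```

A sentence whose word order matches the training corpus gets a lower perplexity than a scrambled one, which is what makes perplexity usable as a feature for spotting malformed text.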

(2) Feature engineering

python FeatureEngr.py

data_train_feat.csv data_test_feat.csv

(3) Feature selection

python Visualization.py

Generates a heatmap of the Pearson correlation coefficients between the features and the label.

python Visualization.py -plot len l3_neg_ppl
python FeatureEngr.py -del len 
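Each cell of such a heatmap is a Pearson correlation coefficient. The sketch below shows how one coefficient is computed in plain Python. It is an illustration and not code from Visualization.py, and the feature values and labels are made up.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences.

    Returns NaN when either sequence has zero variance.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return float("nan")
    return cov / (norm_x * norm_y)


# Hypothetical feature column and label column.
length_feature = [3, 5, 8, 40, 55]
target = [0, 0, 0, 1, 1]
print(round(pearson(length_feature, target), 3))
```

A feature whose coefficient with the label stays close to zero carries little linear signal and is a candidate for removal.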

(4) Discriminative model

python DiscriminantModel.py

/Model *.model files

Based on word vectors

(1) Obtain the word vectors

python ToVectorMgr.py

data_train_chars_d2v.vec data_test__chars_d2v.vec

  • Document-level Doc2Vec is used by default
  • Document-level Word2Vec (not yet implemented)
  • Vocabulary-level WordList2Vec (not yet implemented)

(2) Generative model

python GenerativeModel.py
  • Uses an SVM model by default; LR or MLP can be selected instead

Neural networks

python DeepNet.py
  • Uses fasttext by default
python DeepNet.py -net textcnn

Ensemble learning

python Ensenmble.py
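This README does not say how the ensemble combines the models. As one common possibility, the sketch below uses hard majority voting over per-model label predictions. The function name and the sample predictions are hypothetical, and the actual script may use a different scheme.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by majority vote.

    `predictions` holds one list of labels per model, all the same length.
    On a tie, the label that appears first among the models wins.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Hypothetical outputs of three models on three sentences.
model_a = [0, 1, 1]
model_b = [0, 0, 1]
model_c = [1, 0, 1]
print(majority_vote([model_a, model_b, model_c]))
```

Using an odd number of models avoids ties for binary labels.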

errortextdetection's People

Contributors

sixingyan
