bixin's Introduction

bixin

Chinese sentiment analysis based on dictionaries and rules.

CHANGELOG

Prior to v0.0.4, bixin depended on cppjieba-py, which requires C++11 compilation and is therefore hard to install, so I decided to switch to jieba_fast.

This solves the following problems:

  • the dependency cppjieba-py is hard to install
  • user dictionaries can't be loaded
  • word segmentation differs from jieba

However, it is slower than cppjieba-py.

Installation

> pip3 install bixin

Usage

    from bixin import predict
    text = "幸福每时每刻都会像路边的乞丐一样出现在你面前。要是你觉得你所梦想的幸福不是这样的,因而断言你的幸福已死亡,你只接受符合你的原则和心愿的幸福,那么你就会落得不幸。"
    # From André Gide, "Les Nourritures terrestres" (The Fruits of the Earth)
    predict(text)
    # sentiment score: 0.42

The sentiment score is in the range of -1 to 1.

predict loads the dictionary data on its first call; to load it manually, use predict.classifier.initialize()
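If a coarse label is needed rather than a raw score, a cutoff can be applied to the value. The sketch below is not part of bixin: `label_from_score` and its `threshold` parameter are hypothetical names, and the default of 0.1 is an arbitrary assumption rather than a value taken from the library.

```python
def label_from_score(score: float, threshold: float = 0.1) -> str:
    """Map a sentiment score in [-1, 1] to a coarse label.

    `threshold` is an assumed cutoff, not something bixin defines.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# 0.42 is the score shown in the usage example above.
print(label_from_score(0.42))  # prints "positive"
```

In practice the argument would be the value returned by predict(text); the threshold should be tuned on your own data.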

Accuracy

Tested on a tagged corpus of 6226 texts, a mix of shopping reviews, Sina Weibo posts, hotel reviews, news, and financial news.

accuracy: 0.827771

Note: neutral texts are all ignored.

For details about the test dataset, see the wiki page 关于测试数据集 (About the test dataset).
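As a rough illustration of how an accuracy figure of this kind can be computed, the sketch below compares the sign of each score with the tag and skips anything not tagged p or n. It is not bixin's scoring script: the `accuracy` helper is hypothetical, reading p as positive and n as negative is an assumption, and so is counting a score of exactly 0 as n.

```python
from typing import Iterable, Tuple


def accuracy(pairs: Iterable[Tuple[float, str]]) -> float:
    """Fraction of tagged texts whose score sign matches the tag.

    Each pair is (predicted_score, tag). Tags other than "p" or "n"
    (for example neutral texts) are skipped entirely.
    """
    total = 0
    correct = 0
    for score, tag in pairs:
        if tag not in ("p", "n"):
            continue  # ignored, as neutral texts are in the reported figure
        total += 1
        # Assumption: score > 0 means "p"; zero or below means "n".
        predicted = "p" if score > 0 else "n"
        if predicted == tag:
            correct += 1
    return correct / total if total else 0.0


# Made-up scores and tags, for illustration only.
sample = [(0.5, "p"), (-0.2, "n"), (0.3, "n"), (0.1, "x")]
print(round(accuracy(sample), 4))  # prints 0.6667
```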

Development

> pip3 install -e ".[dev]" git+https://github.com/bung87/bixin

./dictionaries: dictionaries from various sources
./data: dictionaries processed by ./scripts/tagger.py
./scripts/release_data.py: releases data to the package

./scripts/score.py

all data archives: https://github.com/bung87/bixin/releases/tag/v0.0.1

Accuracy testing runs against all .txt files under the test_data directory. Each line holds one sentence, followed by a space and a tag, n or p.
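A line in that format can be split on its last space. The helper below is a minimal sketch, not code from the repository: `parse_tagged_line` is a hypothetical name and the sample sentence is made up.

```python
from typing import Tuple


def parse_tagged_line(line: str) -> Tuple[str, str]:
    """Split 'sentence<space>tag' into (sentence, tag)."""
    # rsplit on the last space so spaces inside the sentence are preserved.
    sentence, tag = line.rstrip("\r\n").rsplit(" ", 1)
    if tag not in ("n", "p"):
        raise ValueError(f"unexpected tag: {tag!r}")
    return sentence, tag


# Made-up sample line ("this hotel is quite good", tagged p).
print(parse_tagged_line("这家酒店很不错 p\n"))  # prints ('这家酒店很不错', 'p')
```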

Test

nosetests -c nose.cfg: test against a single Python version
tox: test against multiple Python versions

Acknowledgments

bixin was inspired by dongyuanxin's DictEmotionAlgorithm

License

MIT © bung

bixin's Issues

About citation

Could the author provide the following citation information?

Failed to install

I tried to install bixin on Win10, but it always shows the error "Failed building wheel for cppjieba_py" even after reinstalling and updating wheel and trying --no-cache-dir. I am a novice and have no idea what to do. Thanks!

Permission Denied. What to do?

    PermissionError: [Errno 13] Permission denied:

    ~\Anaconda3\lib\site-packages\bixin\__init__.py in initialize(self, include_evalution_dict, include_tc)
         71 # places = data["places"]
         72 self._initialize(pos_emotion, pos_evaluation, neg_emotion,
    ---> 73 neg_evaluation, degrees, negations)
         74
         75 def predict(self, news, debug=False):

    ~\Anaconda3\lib\site-packages\bixin\__init__.py in _initialize(self, pos_emotion, pos_evaluation, neg_emotion, neg_evaluation, degrees, negations)
         53 with tempfile.NamedTemporaryFile(suffix=".txt",mode="w",encoding="utf-8") as f:
         54 f.write("\n".join(pos_neg.union(pos_neg_eva)))
    ---> 55 tokenizer.load_userdict(f.name)
         56
         57 self.initialized = True

    ~\Anaconda3\lib\site-packages\jieba_fast\__init__.py in load_userdict(self, f)
        379 if isinstance(f, string_types):
        380 f_name = f
    --> 381 f = open(f, 'rb')
        382 else:
        383 f_name = resolve_filename(f)
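A likely cause, judging from the traceback: on Windows, a file created by tempfile.NamedTemporaryFile cannot be opened a second time by name while the first handle is still open, and load_userdict calls open() on that name. One possible workaround is to create the file with delete=False, close it before handing over the path, and remove it afterwards. The sketch below is not bixin's code; `load_words_via_tempfile` and its `loader` parameter are hypothetical names, with `loader` standing in for something like tokenizer.load_userdict.

```python
import os
import tempfile
from typing import Callable, Iterable


def load_words_via_tempfile(words: Iterable[str], loader: Callable[[str], None]) -> None:
    """Write words to a temp file, close it, then pass its path to `loader`."""
    # delete=False keeps the file on disk after close(), so the loader can
    # reopen it by name even on Windows.
    f = tempfile.NamedTemporaryFile(
        suffix=".txt", mode="w", encoding="utf-8", delete=False
    )
    try:
        f.write("\n".join(words))
        f.close()  # flush and release the handle before reopening by name
        loader(f.name)
    finally:
        f.close()  # harmless if the file is already closed
        os.remove(f.name)  # clean up manually, since delete=False


def show(path: str) -> None:
    """Stand-in loader that just prints the file's content."""
    with open(path, "rb") as fh:
        print(fh.read().decode("utf-8").split())


load_words_via_tempfile(["好", "坏"], show)
```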

load_userdict failure

Hello, I installed bixin and tried to use it. However, I can't do a prediction as there's no tmpy34pb3x7.txt file in the Temp folder. How to fix this problem? Thank you.

Installation failed

    (venv) C:\ub16_prj\bixin>pip3 install -e ".[dev]" git+https://github.com/bung87/bixin
    Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
    Collecting git+https://github.com/bung87/bixin
      Cloning https://github.com/bung87/bixin to c:\temp\pip-req-build-f06eh4tp
    Obtaining file:///C:/ub16_prj/bixin
    Collecting cppjieba-py (from bixin==0.0.2)
      Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ae/53/ed41d2fa14a6fa38850eec69a5f97d0097fe104ae4faa7ebca8b166fee4d/cppjieba_py-0.0.10.tar.gz (5.0MB)
        100% |████████████████████████████████| 5.0MB 3.4MB/s
        Complete output from command python setup.py egg_info:
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "C:\Temp\pip-install-n7xdrboy\cppjieba-py\setup.py", line 129, in <module>
            long_description= open("README.md").read(),
        UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 153: illegal multibyte sequence
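The error suggests that open("README.md") fell back to the platform's default codec (gbk here) while the file is presumably UTF-8. In a setup.py, passing the encoding explicitly avoids this. The sketch below shows the pattern only; `read_long_description` is a hypothetical helper, not code from cppjieba-py or bixin. For a third-party package that cannot be patched, running pip with the environment variable PYTHONUTF8=1 (Python 3.7+) may also help, since it makes open() default to UTF-8.

```python
from pathlib import Path


def read_long_description(path: str = "README.md") -> str:
    """Read a text file as UTF-8 regardless of the platform's default codec."""
    return Path(path).read_text(encoding="utf-8")
```

A setup() call could then use long_description=read_long_description().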
