pytorch-sentiment-analysis-kor by huffon

Sentiment Analysis PyTorch implementations

This repo contains various sequential models used to classify the sentiment of a sentence.

The base code is adapted from this great sentiment-analysis tutorial.

In this project, I used the Korean corpus NSMC (Naver Sentiment Movie Corpus) to apply torchtext to a Korean dataset.

I also used the soynlp library to tokenize Korean sentences. It is really nice and easy to use; you should try it if you work with Korean sentences :)

Overview

Number of training examples: 105,000
Number of validation examples: 45,000
Number of test examples: 50,000
Number of possible classes: 2 (pos / neg)
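
The training and validation counts above add up to 150,000 examples, which works out to a 70/30 split. A minimal sketch of such a split, using only the standard library, could look like the following. The function name, the seed, and the shuffle-then-slice approach are assumptions for illustration, not the repo's actual code.

```python
import random


def split_train_valid(examples, valid_ratio=0.3, seed=42):
    """Shuffle a copy of `examples` and split it into (train, valid)."""
    shuffled = list(examples)
    # A dedicated Random instance keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_ratio)
    return shuffled[n_valid:], shuffled[:n_valid]


# Integers stand in for the 150,000 labelled reviews (hypothetical data).
train, valid = split_train_valid(range(150_000))
print(len(train), len(valid))  # 105000 45000
```

Fixing the seed means the same validation set is reused across runs, which keeps validation scores comparable between models.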

Example:
{
  'text': ['액션', '이', '없는', '데도', '재미', '있는', '몇안되는', '영화'],
  'label': 'pos'
}

Requirements

The following libraries are fundamental to this repo. Since I used a conda environment, requirements.txt lists many more dependencies.
If you encounter any dependency problems, just run the following command:
- pip install -r requirements.txt

numpy==1.16.4
pandas==0.25.1
scikit-learn==0.21.3
soynlp==0.0.493
torch==1.2.0
torchtext==0.4.0

Models

In this repository, the following models are implemented to analyze the sentiment of an input sentence. Other well-known classification models will be added as well!

Usage

Before training a model, train the soynlp tokenizer on your training dataset and build the vocabulary with the following command.
Running it produces tokenizer.pickle, text.pickle and label.pickle, which are used to train and test the model and to predict the sentiment of a user's input sentence.

python build_pickle.py
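
To illustrate the idea of building a vocabulary once and reusing it through a pickle, here is a minimal standard-library sketch. What the repo actually stores in each pickle file is not shown in this README; the dictionary vocabulary, the `build_vocab` helper, and the reserved `<unk>` / `<pad>` entries below are hypothetical stand-ins.

```python
import pickle
from collections import Counter


def build_vocab(tokenized_sentences, min_freq=1):
    """Map each token to an integer id; ids 0 and 1 are reserved."""
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    vocab = {"<unk>": 0, "<pad>": 1}
    # More frequent tokens receive smaller ids.
    for token, freq in counts.most_common():
        if freq >= min_freq:
            vocab[token] = len(vocab)
    return vocab


# Already-tokenized sentences (the first one is the example from the Overview).
sentences = [
    ["액션", "이", "없는", "데도", "재미", "있는", "몇안되는", "영화"],
    ["재미", "있는", "영화"],
]
vocab = build_vocab(sentences)

# Serialize once, then restore wherever the vocabulary is needed again.
blob = pickle.dumps(vocab)
restored = pickle.loads(blob)
```

Persisting the vocabulary this way guarantees that training, testing and prediction all map tokens to the same ids.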

For training, run main.py in train mode (the default):

python main.py --model MODEL_NAME

For testing, run main.py in test mode:

python main.py --model MODEL_NAME --mode test
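
The two commands above differ only in the --mode flag. A sketch of how such a command line might be parsed with argparse is shown below. The flag names come from the commands above, but making --model required and restricting --mode to train / test are assumptions, and the real main.py may define more options.

```python
import argparse


def build_parser():
    """Build a parser for the --model and --mode flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Train or test a sentiment model")
    parser.add_argument("--model", required=True, help="name of the model to run")
    parser.add_argument("--mode", choices=["train", "test"], default="train",
                        help="train is used when the flag is omitted")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--model", "MODEL_NAME"])
print(args.mode)  # train
```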

For prediction, run predict.py with your Korean input sentence.
Don't forget to wrap your input in double quotation marks!

python predict.py --model MODEL_NAME --input "YOUR_INPUT"

Example

[in]  >> 노잼 뻔한 스토리 뻔한 결말...
[out] >> 0.84 % : Negative

[in]  >> 마음도 따뜻.마요미의 진가. 그리고 감동. 뭐 힐링타임용으로 무난한 가족영화탄생~^^
[out] >> 97.64 % : Positive

[in]  >> 클리쉐 덩어리 예산도 적게들었을듯 한데 마지막 관중조차 CG
[out] >> 26.68 % : Negative
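
Judging from the outputs above, the percentage looks like the model's probability for the positive class, with the label switching at 50 %. That reading is inferred from the three examples, not confirmed by the source. Under that assumption, the output line could be produced by a helper like this hypothetical `format_prediction`:

```python
def format_prediction(positive_prob, threshold=0.5):
    """Format a positive-class probability in the style of the examples above."""
    label = "Positive" if positive_prob >= threshold else "Negative"
    return f"{positive_prob * 100:.2f} % : {label}"


print(format_prediction(0.9764))  # 97.64 % : Positive
print(format_prediction(0.2668))  # 26.68 % : Negative
```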

You can test the trained model with the following command:

curl -X POST https://us-central1-nlp-api-252209.cloudfunctions.net/sentiment \
 -H 'Content-Type:application/json' \
 -d '{"input":"YOUR INPUT IN KOREAN"}'
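
The same request can be sent from Python with only the standard library. This is a minimal sketch: the URL and the "input" field come from the curl command above, while the helper names and the timeout are assumptions. The response format is not documented here, so the body is returned as raw text, and the endpoint may no longer be available.

```python
import json
import urllib.request

API_URL = "https://us-central1-nlp-api-252209.cloudfunctions.net/sentiment"


def build_request(text, url=API_URL):
    """Build a JSON POST request equivalent to the curl command."""
    # ensure_ascii=False keeps the Korean text readable in the payload.
    payload = json.dumps({"input": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_sentiment(text, timeout=10):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling query_sentiment("노잼 뻔한 스토리 뻔한 결말...") requires network access and a live endpoint.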
