pytorch-sentiment-analysis-kor's Introduction

Sentiment Analysis PyTorch implementations

This repo contains various sequential models for classifying the sentiment of a sentence.

The base code is adapted from this great sentiment-analysis tutorial.

In this project, I specifically used the Korean corpus NSMC (Naver Sentiment Movie Corpus) to apply torchtext to a Korean dataset.

I also used the soynlp library to tokenize Korean sentences. It is really nice and easy to use; you should try it if you handle Korean sentences :)


Overview

  • Number of training examples: 105,000
  • Number of validation examples: 45,000
  • Number of test examples: 50,000
  • Number of classes: 2 (pos / neg)
Example:
{
  'text': ['액션', '이', '없는', '데도', '재미', '있는', '몇안되는', '영화'],
  'label': 'pos'
}
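
The training and validation counts above add up to 150,000 reviews, which corresponds to a 70/30 split. A minimal sketch of such a split, assuming a seeded random shuffle; the helper name `split_train_valid` and the seed are illustrative, and the repo's actual splitting code may differ:

```python
import random


def split_train_valid(examples, valid_ratio=0.3, seed=42):
    """Shuffle examples and split them into (train, valid) lists.

    Hypothetical helper: the ratio matches the counts listed above,
    but the shuffling strategy and the seed are assumptions.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_ratio)
    return shuffled[n_valid:], shuffled[:n_valid]


# Stand-in for the 150,000 labelled reviews used for training and validation.
examples = list(range(150_000))
train, valid = split_train_valid(examples)
print(len(train), len(valid))
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing several models on the same validation set.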

Requirements

  • The following libraries are fundamental to this repo. Since I used a conda environment, requirements.txt lists many more dependencies.
  • If you encounter any dependency problem, just run the following command
    • pip install -r requirements.txt
numpy==1.16.4
pandas==0.25.1
scikit-learn==0.21.3
soynlp==0.0.493
torch==1.2.0
torchtext==0.4.0

Models


Usage

  • Before training a model, you should train the soynlp tokenizer on your training dataset and build the vocabulary using the following command.
  • Running it produces tokenizer.pickle, text.pickle and label.pickle, which are used to train and test the model and to predict the sentiment of a user's input sentence
python build_pickle.py
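
As a rough illustration of what a vocabulary-building step like this involves, the sketch below counts tokens, builds a token-to-index mapping, and serializes it with `pickle`. It is not the repo's `build_pickle.py`: whitespace splitting stands in for the trained soynlp tokenizer, and `build_vocab`, the special tokens, and `min_freq` are assumptions made for this example.

```python
import pickle
from collections import Counter


def build_vocab(tokenized_sentences, min_freq=1):
    """Map each sufficiently frequent token to an integer index."""
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    vocab = {"<unk>": 0, "<pad>": 1}  # assumed special tokens
    for token, freq in counts.most_common():
        if freq >= min_freq:
            vocab[token] = len(vocab)
    return vocab


# Whitespace splitting is only a placeholder for a trained Korean tokenizer.
sentences = ["재미 있는 영화", "재미 없는 영화"]
tokenized = [s.split() for s in sentences]
vocab = build_vocab(tokenized)

# Round-trip through pickle, as one would when saving to a .pickle file.
restored = pickle.loads(pickle.dumps(vocab))
print(restored == vocab)
```

Pickling the fitted objects once means training, testing, and prediction all share exactly the same tokenization and index mapping.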
  • For training, run main.py in train mode (which is the default option)
python main.py --model MODEL_NAME
  • For testing, run main.py in test mode
python main.py --model MODEL_NAME --mode test
  • For predicting, run predict.py with your Korean input sentence.
  • Don't forget to wrap your input in double quotation marks!
python predict.py --model MODEL_NAME --input "YOUR_INPUT"
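
The commands above imply a small command-line interface. A minimal `argparse` sketch of how such flags could be declared follows. Only `--model`, `--mode`, and `--input` appear in the commands above; the single combined parser, the help text, and the `choices` restriction are assumptions for this example (the repo splits these flags across `main.py` and `predict.py`).

```python
import argparse


def build_parser():
    """Declare the flags seen in the usage commands (hypothetical parser)."""
    parser = argparse.ArgumentParser(description="Sentiment analysis CLI sketch")
    parser.add_argument("--model", required=True, help="name of the model to use")
    # Training is the default mode, matching the usage notes.
    parser.add_argument("--mode", choices=["train", "test"], default="train")
    parser.add_argument("--input", default=None, help="Korean sentence to classify")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--model", "MODEL_NAME", "--mode", "test"])
print(args.model, args.mode, args.input)
```

The shell strips the double quotes around the input before the script sees it, so a sentence containing spaces arrives as one `--input` value only when it is quoted.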

Example

[in]  >> 노잼 뻔한 스토리 뻔한 결말...
[out] >> 0.84 % : Negative

[in]  >> 마음도 따뜻.마요미의 진가. 그리고 감동. 뭐 힐링타임용으로 무난한 가족영화탄생~^^
[out] >> 97.64 % : Positive

[in]  >> 클리쉐 덩어리 예산도 적게들었을듯 한데 마지막 관중조차 CG
[out] >> 26.68 % : Negative
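
One reading of the outputs above is that the percentage is the model's positive-class probability and the label comes from a 0.5 threshold. A minimal sketch under that assumption; `format_prediction`, the sigmoid output, and the threshold are guesses for illustration, not the repo's confirmed logic:

```python
import math


def sigmoid(logit):
    """Squash a raw model score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def format_prediction(prob_positive, threshold=0.5):
    """Render a positive-class probability in the '[out]' style shown above."""
    label = "Positive" if prob_positive >= threshold else "Negative"
    return f"{prob_positive * 100:.2f} % : {label}"


print(format_prediction(0.9764))
print(format_prediction(sigmoid(-1.0)))
```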

  • You can test the trained model with the following command
curl -X POST https://us-central1-nlp-api-252209.cloudfunctions.net/sentiment \
 -H 'Content-Type:application/json' \
 -d '{"input":"YOUR INPUT IN KOREAN"}'
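
The same request can be issued from Python with only the standard library. The sketch below mirrors the curl command; the endpoint URL is the one given above and may no longer be live, and the response format is not documented here, so `classify` simply returns the raw response body. The function names are illustrative.

```python
import json
import urllib.request

API_URL = "https://us-central1-nlp-api-252209.cloudfunctions.net/sentiment"


def build_request(text):
    """Build a POST request carrying the sentence as a JSON body."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text, timeout=10):
    """Send the request and return the raw response text (format unspecified)."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Only the request is built here; call classify(...) to actually contact the endpoint.
request = build_request("YOUR INPUT IN KOREAN")
print(request.get_method(), request.full_url)
```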

Contributors

  • huffon
