This repo contains several sequential models for classifying the sentiment of a sentence.
The base code is adapted from this great sentiment-analysis tutorial.
In this project, I specifically used the Korean corpus NSMC (Naver Sentiment Movie Corpus) to apply torchtext to a Korean dataset.
I also used the soynlp library to tokenize Korean sentences. It is really nice and easy to use, so you should try it if you work with Korean text :)
- Number of training examples: 105,000
- Number of validation examples: 45,000
- Number of test examples: 50,000
- Number of possible classes: 2 (pos / neg)
Example:
{
'text': ['액션', '이', '없는', '데도', '재미', '있는', '몇안되는', '영화'],
'label': 'pos'
}
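The counts above add up to 150,000 labelled reviews divided 70/30 into training and validation sets. A minimal sketch of such a split, using only the standard library, might look like the following. The helper name split_train_valid, the seed, and the toy examples are assumptions for illustration, not necessarily how this repo does it.

```python
import random


def split_train_valid(examples, valid_ratio=0.3, seed=42):
    """Shuffle examples and split them into (train, valid) lists."""
    shuffled = list(examples)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_valid = round(len(shuffled) * valid_ratio)
    return shuffled[n_valid:], shuffled[:n_valid]


# Toy data in the same shape as the example above (hypothetical contents).
examples = [{"text": ["token%d" % i], "label": "pos" if i % 2 else "neg"}
            for i in range(150_000)]
train, valid = split_train_valid(examples)
print(len(train), len(valid))  # 105000 45000
```

Fixing the seed keeps the validation set stable across runs, which makes results comparable between models.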
- The libraries listed below are fundamental to this repo. Since I used a conda environment, requirements.txt contains many more dependencies.
- If you encounter any dependency problem, just run the following command:
pip install -r requirements.txt
numpy==1.16.4
pandas==0.25.1
scikit-learn==0.21.3
soynlp==0.0.493
torch==1.2.0
torchtext==0.4.0
- In this repository, the following models are implemented to analyze the sentiment of an input sentence. Other well-known classification models will be added as well!
- Before training a model, you should train the soynlp tokenizer on your training dataset and build the vocabulary using the following command.
- Running it produces tokenizer.pickle, text.pickle and label.pickle, which are used to train and test the model and to predict the sentiment of a user's input sentence.
python build_pickle.py
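As a rough illustration of what a vocabulary-building step like this involves (not the actual contents of build_pickle.py), the sketch below builds a token-to-index mapping from tokenized sentences and round-trips it through pickle. The build_vocab helper, the special tokens, and the in-memory buffer are all assumptions made for the example.

```python
import io
import pickle
from collections import Counter


def build_vocab(tokenized_sentences, min_freq=1):
    """Map each token seen at least min_freq times to an integer index."""
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    vocab = {"<unk>": 0, "<pad>": 1}       # hypothetical special tokens
    for tok, freq in counts.most_common():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


sentences = [["액션", "이", "없는", "데도"], ["재미", "있는", "영화"], ["재미", "없는", "영화"]]
vocab = build_vocab(sentences)

# Serialize and restore; a real script would write to a file on disk instead.
buffer = io.BytesIO()
pickle.dump(vocab, buffer)
buffer.seek(0)
restored = pickle.load(buffer)

# Tokens missing from the vocabulary fall back to the <unk> index.
ids = [restored.get(tok, restored["<unk>"]) for tok in ["재미", "있는", "드라마"]]
print(ids)
```

Pickling the fitted objects once means training, testing, and prediction all share exactly the same token-to-index mapping.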
- For training, run main.py in train mode (the default option):
python main.py --model MODEL_NAME
- For testing, run main.py in test mode:
python main.py --model MODEL_NAME --mode test
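The two commands above imply a command-line interface with a --model option and a --mode option that defaults to train. A minimal sketch of such a parser is shown below. It is an assumption about how the flags could be wired up with argparse, and the real main.py may define more options or different defaults.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Train or test a sentiment model")
    # --model is required in this sketch; the real script may provide a default.
    parser.add_argument("--model", required=True, help="name of the model to use")
    parser.add_argument("--mode", default="train", choices=["train", "test"],
                        help="train (default) or test")
    return parser


# Omitting --mode falls back to training.
train_args = build_parser().parse_args(["--model", "MODEL_NAME"])
print(train_args.model, train_args.mode)

test_args = build_parser().parse_args(["--model", "MODEL_NAME", "--mode", "test"])
print(test_args.mode)
```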
- For predicting, run predict.py with your Korean input sentence.
- Don't forget to wrap your input in double quotation marks!
python predict.py --model MODEL_NAME --input "YOUR_INPUT"
[in] >> 노잼 뻔한 스토리 뻔한 결말...
[out] >> 0.84 % : Negative
[in] >> 마음도 따뜻.마요미의 진가. 그리고 감동. 뭐 힐링타임용으로 무난한 가족영화탄생~^^
[out] >> 97.64 % : Positive
[in] >> 클리쉐 덩어리 예산도 적게들었을듯 한데 마지막 관중조차 CG
[out] >> 26.68 % : Negative
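The sample outputs above read like the model's positive-class probability printed as a percentage, with the label chosen by a 50 % cutoff. That interpretation is an assumption based only on the three examples. Under it, the formatting step could be sketched as follows, where format_prediction and the single-logit input are hypothetical.

```python
import math


def format_prediction(logit, threshold=0.5):
    """Turn a raw model score into a 'NN.NN % : Label' string."""
    prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid maps the score into (0, 1)
    label = "Positive" if prob >= threshold else "Negative"
    return "{:.2f} % : {}".format(prob * 100, label)


print(format_prediction(0.0))   # 50.00 % : Positive
print(format_prediction(-1.0))  # low probability, labelled Negative
```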
- You can test the trained model using the following command:
curl -X POST https://us-central1-nlp-api-252209.cloudfunctions.net/sentiment \
-H 'Content-Type: application/json' \
-d '{"input":"YOUR INPUT IN KOREAN"}'
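The same request can be made from Python using only the standard library. This is a sketch that mirrors the curl call above. The helper names build_request and send are made up for the example, and since the response schema is not documented here, the body is returned as raw text.

```python
import json
import urllib.request

API_URL = "https://us-central1-nlp-api-252209.cloudfunctions.net/sentiment"


def build_request(text):
    """Create the POST request without sending it."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(text, timeout=10):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Build (but do not send) a request to inspect what would go over the wire.
req = build_request("노잼 뻔한 스토리 뻔한 결말...")
print(req.get_method(), req.full_url)
```

Calling send("YOUR INPUT IN KOREAN") performs the actual network call, which only works if the endpoint is still deployed.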