This is a machine learning algorithum to detect emotions (Angry - Happy - Sad - Neutral - Love) from humans' posts.
Kindly use this deployed model to test: https://emotion-analysis-from-text.herokuapp.com/
you have to install python > 3.7.0, you can download python for windows from here: https://www.python.org/downloads/windows/
pip install --upgrade virtualenv
virtualenv envname
cd envname/
cd bin/
activate
cd ..
cd ..
pip install -r requirements.txt
streamlit run app.py
python train.py
python predict.py --text use_your_text
First I used dataset about English Twitter posts with six basic emotions: anger, fear, joy, love, sadness, and surprise.
The authors constructed a set of hashtags to collect a separate dataset of English tweets from the Twitter API belonging to six basic emotions.
Dataset link: https://www.kaggle.com/parulpandey/emotion-dataset
Second dataset is a collection of tweets annotated with the emotions behind them with 13 emotions
Dataset link: https://www.kaggle.com/pashupatigupta/emotion-detection-from-text
I used a merged version between the two datasets to provide a data set with the following emotion (Angry - Happy - Sad - Neutral - Love)
I merged the two datasets to get a full dataset with the four emotions and you can find the code of merging in
Final Approach/create new data set .ipynb
and result is saved to Final Approach/merged_dataset
The result dataset in conatin :
1- train.csv consists of: 16200 records
2- test.csv consists of: 2200 records
3- val.cs conists of: 2200 records
The dataset labels is as the following {0:'sad', 1:'happy', 2:'love', 3:'anger', 4:'anger', 5:'surprise', 6:'neutral')
1-Remove any HTML tags in the text by using function clean_html
in the code
2-Convert all text to lowercase using function convert_lower
in the code
3- Remove any tags in posts like @name using function cleaning_tags
in the code
4-Remove punctuations in posts using function cleaning_punctuations
in the code
5-Remove any URLS in text using function cleaning_URLs
in the code
6- Remove any number in text using function cleaning_numbers
in the code
7- Remove any stop words in text like (the - this - any -etc.)
8- Stem each word in the text which mean reducing a word to its word stem that affixes to suffixes and prefixes
9- using TFIDF algorithum to to transfer the text to numerical vectors in which the mahcine learning model can under stand it
I made a mchine learning model on the first dataset to detect 6 emotions (Sad - Happy - Love - Anger - Fear - Surprise)
I used the first dataset and train a Lazzycalssifier on cloud to test 27 mahcine learning models for me,
and it give me that the following three models is the best ones:
1- PassiveAggressiveClassifier which gave me accuracy = 87.29%
2- LogisticRegression which gave me accuracy = 85.5%
3-XGBClassifier which gave me accuracy = 87.56%
In this solution the XGBClassifier seem to be the best model for the problem
you can find the model in the following folder: /First Trial Without Neutral CLass - Accuracy 88%/first classical model .ipynb
The problem with the first model was it detect 5 emotions but now Neutral emotion between them so after mergeing two datasets,
I get a data wit 7 emotions but as tha result of mergeing is Unbalanced data I had to drop out the (Fear - Surprise ) emotions,
so this model will predict only the following emotions ( Sad - Happy - Love - Anger - Neutral )
also I run lazzy classfier and give me the same 3 model which are the following:
1- PassiveAggressiveClassifier which gave me accuracy = 91.15%
2- LogisticRegression which gave me accuracy = 85.87%
3-XGBClassifier which gave me accuracy = 89.54%
as we can see the accuracy is increased a little and the XGBClassifier is doing well with this problem.
you can find the code in the following folder: Final Approach/Best Classical model.ipynb
Also I saved the wieghts of the model for future use you can find it in the following folder Final Approach/finalized_model.sav
I tried to build a deep learning model hopping to increase accuracy of the model
I used keras library to build my deep learning archticture and searched for a good archticture and found a one
suitable for this problem and use it but it give me accuracy about 88.60% only and this was disappointing for me
you can find the model in the following folder Final models/deep learnig model.ipynb
the second classical machine learning which is working on Passive Aggressive Classfier algorithum was the best model between all models which gave me the best accuracy which is :91.15%
it works very good to detect the following emotions ( Sad - Happy - Love - Anger - Neutral )
so I used this model for deployment and for final results
I used Heroku cloud application platform to deploy a Streamlit application to test the model on it
you will find the model here : https://emotion-analysis-from-text.herokuapp.com/
Datasets References :
1- https://www.kaggle.com/parulpandey/emotion-dataset
2-https://www.kaggle.com/pashupatigupta/emotion-detection-from-text
Article that help me to clean the data :
https://www.analyticsvidhya.com/blog/2021/06/twitter-sentiment-analysis-a-nlp-use-case-for-beginners/
Deep learning model that I have used:
https://www.kaggle.com/muratkarakurt/emotion-detect-comment-97