BachelorThesis - Sentiment analysis on Amazon product reviews

Summary

Text mining has become a crucial tool for companies that want to know their customers' opinions. By deriving high-quality information from large volumes of text, it is possible to understand how market preferences evolve beyond what sales statistics show. The goal of this bachelor thesis is therefore to perform accurate sentiment analysis on Amazon product reviews. Three review datasets (ebooks, toys and video games), provided by Kaggle, a collaborative data science platform, are the starting point for extracting and quantifying affective states with natural language processing techniques. Supervised learning algorithms, including deep learning approaches, are used to predict the overall sentiment and the usefulness of a product review. In addition, a topic-based categorization is carried out to classify unseen reviews into one specific product type.

Goals

Prediction of the dominant sentiment behind each review: Each review has its own overall field, so we can evaluate the accuracy of our model by comparing its output with the real value.
Prediction of the helpfulness of a review: This case is similar to the previous one, but here we predict how helpful a given review is based on its body of text. The helpful field lets us evaluate the accuracy of this model as well.
Topic categorization of a review: The goal is to determine the topic, i.e. the product type, of unseen reviews.
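
The evaluation idea behind the first two goals can be sketched as follows. This is a minimal illustration, not the thesis code: it assumes that overall is a 1 to 5 star rating and that helpful is a pair of [helpful votes, total votes], and the rating thresholds, the 0.5 helpfulness threshold, the function names and the toy reviews are all hypothetical.

```python
from typing import Optional, Sequence


def sentiment_label(overall: float) -> str:
    """Map a star rating to a coarse sentiment class (thresholds are assumed)."""
    if overall >= 4:
        return "positive"
    if overall <= 2:
        return "negative"
    return "neutral"


def helpfulness_label(helpful: Sequence[int], threshold: float = 0.5) -> Optional[bool]:
    """Turn [helpful_votes, total_votes] into a binary label.

    Returns None when nobody voted, because the ratio is undefined.
    """
    up_votes, total_votes = helpful
    if total_votes == 0:
        return None
    return up_votes / total_votes >= threshold


def accuracy(predicted: Sequence, actual: Sequence) -> float:
    """Fraction of predictions that match the real values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy reviews invented for this example; "text" is a hypothetical key name.
reviews = [
    {"text": "Great game, hours of fun", "overall": 5.0, "helpful": [3, 4]},
    {"text": "Broke after two days", "overall": 1.0, "helpful": [0, 0]},
    {"text": "It is fine, nothing special", "overall": 3.0, "helpful": [1, 5]},
]

real_sentiment = [sentiment_label(r["overall"]) for r in reviews]
model_output = ["positive", "negative", "positive"]  # stand-in for a classifier
print(accuracy(model_output, real_sentiment))
```

The same accuracy function applies to the helpfulness task once reviews without votes (label None) have been filtered out.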

Usage

The code is organized as follows:

Datasets folder
Pickled (or prestored) variables folder
Non deep learning approaches folder:
    Data Visualization.py
    OverallPrediction.py
    HelpfulnessPrediction.py
    TopicPrediction.py
Deep learning approaches folder:
    MLP folder: Overall, Helpfulness and Topic Prediction Python files based on MLP
    CNN folder: Overall, Helpfulness and Topic Prediction Python files based on CNN

Please note that most of this code is not yet ready to be run in a user-friendly way, and it may raise errors depending on your setup. However, it contains all the methods and resources used to accomplish the goals of this project.

The most important script here is "OverallSentimentPrediction":

OverallSentimentPrediction: This script lets the user test several precomputed classifiers on the three classification tasks. First, the user chooses the classification task and the desired classifier. Then, the program expects a text review, which it classifies accordingly. Finally, the user can check whether the predicted result is correct.

The pickled variables inside the pickled_vars folder can be downloaded through this Google Drive folder. This second link also includes the final report of the project and a summarized presentation. Finally, the video demo below shows the main features of the program.
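
The load-then-classify flow described above can be sketched roughly as below. This is not the actual OverallSentimentPrediction script: the file name, the word-weight "classifier" and the helper functions are hypothetical stand-ins, chosen so that the example runs with the standard library alone.

```python
import pickle
import re
import tempfile
from pathlib import Path

# Hypothetical stand-in for a precomputed classifier: per-class word weights.
TOY_MODEL = {
    "positive": {"great": 1.2, "love": 1.0, "fun": 0.8},
    "negative": {"broken": 1.3, "boring": 1.0, "refund": 0.9},
}


def load_classifier(path: Path) -> dict:
    """Load a pickled object from disk. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def tokenize(text: str) -> list:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify(model: dict, text: str) -> str:
    """Return the class whose word weights sum highest for this text."""
    tokens = tokenize(text)
    scores = {
        label: sum(weights.get(token, 0.0) for token in tokens)
        for label, weights in model.items()
    }
    return max(scores, key=scores.get)


# Simulate the pickled_vars folder with a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    model_path = Path(folder) / "toy_sentiment_model.pkl"  # hypothetical name
    with open(model_path, "wb") as handle:
        pickle.dump(TOY_MODEL, handle)
    model = load_classifier(model_path)

review = "Great fun, the kids love it"
prediction = classify(model, review)
print(f"Predicted sentiment: {prediction}")
```

Because unpickling can execute arbitrary code, files such as the downloaded pickled variables should only be loaded when they come from a source you trust.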

langzippkk / amazon-sentiment-analysis
