amazon-sentiment-analysis's Introduction

BachelorThesis - Sentiment analysis on Amazon product reviews

Summary

Text mining has proved to be a crucial tool for companies that want to know their customers' opinions. By deriving high-quality information from large volumes of text, it is possible to understand how market preferences evolve beyond what sales statistics show. For this reason, the goal of this bachelor thesis is to perform an accurate sentiment analysis on Amazon product reviews. Three review datasets (ebooks, toys and video games) are the starting point for extracting and quantifying affective states with natural language processing techniques. The datasets are provided by Kaggle, a collaborative data science platform. Supervised learning algorithms, including deep learning approaches, are employed to predict the overall sentiment and the usefulness of a product review. In addition, a topic-based categorization is carried out to classify unseen reviews into one specific product type.

Goals

  1. Prediction of the dominant sentiment behind each review: Each review has its own overall field, so we can evaluate the accuracy of our model by comparing its output with that real value.

  2. Prediction of the helpfulness of a review: This case is similar to the previous one, but here we predict how helpful a given review is based on its body text. The helpful field can be used to evaluate the model's accuracy as well.

  3. Topic categorization of a review: The goal is to determine the topic, i.e. the product type, of unseen reviews.
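The evaluation described in goals 1 and 2 can be sketched as follows. This is a minimal illustration and not the code used in the thesis: the reviewText key, the rating thresholds, the 0.5 helpfulness threshold and the [helpful_votes, total_votes] layout of the helpful field are all assumptions, and predict stands for whichever trained classifier is being evaluated.

```python
from typing import Callable, Dict, List, Optional, Sequence


def overall_to_sentiment(overall: float) -> str:
    """Map the `overall` rating to a coarse label (thresholds are assumed)."""
    if overall >= 4:
        return "positive"
    if overall <= 2:
        return "negative"
    return "neutral"


def helpful_to_label(helpful: Sequence[int], threshold: float = 0.5) -> Optional[str]:
    """Turn the `helpful` field into a label.

    Assumes `helpful` is [helpful_votes, total_votes]; reviews without
    votes carry no signal, so they yield None.
    """
    helpful_votes, total_votes = helpful
    if total_votes == 0:
        return None
    ratio = helpful_votes / total_votes
    return "helpful" if ratio >= threshold else "not helpful"


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of predictions that match the real labels."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equally long")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def evaluate_overall(reviews: List[Dict], predict: Callable[[str], str]) -> float:
    """Compare the model output with the label derived from `overall`."""
    actual = [overall_to_sentiment(r["overall"]) for r in reviews]
    predicted = [predict(r["reviewText"]) for r in reviews]  # hypothetical key
    return accuracy(predicted, actual)


# Tiny hand-written sample, only to show the call shape.
reviews = [
    {"reviewText": "Great book", "overall": 5.0},
    {"reviewText": "Broke after a day", "overall": 1.0},
    {"reviewText": "It is fine", "overall": 3.0},
]
```

Calling evaluate_overall(reviews, predict) with any function that maps a review text to one of the three labels returns the share of reviews whose predicted sentiment matches the overall field. The same accuracy helper would apply to helpfulness once the helpful field has been turned into labels.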

Usage

The structure of the code is organized as follows:

  • Datasets folder

  • Pickled (or prestored) variables folder

  • Non-deep-learning approaches folder:

  1. Data Visualization.py

  2. OverallPrediction.py

  3. HelpfulnessPrediction.py

  4. TopicPrediction.py

  • Deep learning approaches folder:

  1. MLP folder: Overall, Helpfulness and Topic Prediction Python files based on MLP

  2. CNN folder: Overall, Helpfulness and Topic Prediction Python files based on CNN

Please note that most of this code is not yet ready to be run in a user-friendly way, and it may raise errors depending on your setup. However, it contains all the methods and resources that were used to accomplish the goals of this project.

The most important script here is "OverallSentimentPrediction":

OverallSentimentPrediction: This script lets the user test several precomputed classifiers in the three classification tasks. First, the user chooses the classification task along with the desired classifier. Then, the program expects a text review, which is classified accordingly. Finally, the user can check whether the predicted result is correct. The pickled variables inside the pickled_vars folder can be downloaded through this Google Drive folder. A second link also provides the final report of the project and a summarized presentation. The video demo shows the main features of the program.
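The workflow described above (choose a task, load a precomputed classifier, classify one review, check the result) could look roughly like the sketch below. It is not the actual script: the task names, the function names and the assumption that each pickled classifier exposes a scikit-learn-style predict method taking a list of texts are all hypothetical.

```python
import pickle
from typing import Any, Optional, Tuple

# Assumed identifiers for the three classification tasks.
TASKS = ("overall", "helpfulness", "topic")


def load_pickled(path: str) -> Any:
    """Load a prestored variable, e.g. a trained classifier.

    Unpickling can execute arbitrary code, so only load files you trust.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)


def classify_review(classifier: Any, review_text: str) -> Any:
    """Classify one review.

    Assumes `classifier.predict` accepts a list of raw texts and returns
    one label per text (a pipeline with its own vectorizer, for example).
    """
    return classifier.predict([review_text])[0]


def run_once(
    classifier: Any,
    task: str,
    review_text: str,
    expected: Optional[Any] = None,
) -> Tuple[Any, Optional[bool]]:
    """Run one prediction and, if the user supplies the true label,
    report whether the prediction was correct."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    predicted = classify_review(classifier, review_text)
    is_correct = None if expected is None else predicted == expected
    return predicted, is_correct
```

A caller would first obtain a classifier with load_pickled, pointing at a file from the pickled_vars folder, and then pass it to run_once together with the chosen task and the review text.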

amazon-sentiment-analysis's People

Contributors

enricmartos
