Amazon_Vine_Analysis

ETL and sentiment analysis of Amazon reviews with AWS, PySpark, PostgreSQL, and NLP.

Analysis Overview

In this project, an analysis is performed on the Amazon Vine program to check whether there is a bias toward favorable reviews from Vine members. To complete the analysis, PySpark is used to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into PostgreSQL (accessed through pgAdmin). Several metrics are then calculated. The analysis focuses on US reviews of video games.

Resources

Results

Paid and unpaid DataFrames are created after the data is cleaned and sorted. The first 20 rows of each are shown below:

Paid DataFrame

Unpaid DataFrame

A count of each DataFrame yielded 90 paid (Vine) reviews and 37,831 unpaid reviews.

Paid and Unpaid Review Count

The number of 5-star reviews was then calculated for each group: 44 paid and 14,704 unpaid.

Five Star Reviews

The percentage of 5-star reviews was 48.89% for paid and 38.87% for unpaid.

Five Star Review Percentage
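The percentages can be reproduced from the reported counts. The following is a minimal sketch in plain Python rather than the PySpark code used in the project; the function name `five_star_percentage` is hypothetical and only the four counts come from the results above.

```python
def five_star_percentage(five_star_count: int, total_count: int) -> float:
    """Return the share of 5-star reviews as a percentage, rounded to 2 decimals."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return round(100 * five_star_count / total_count, 2)


# Counts reported in the Results section (hard-coded here for illustration).
paid_pct = five_star_percentage(44, 90)
unpaid_pct = five_star_percentage(14704, 37831)

print(f"Paid 5-star share:   {paid_pct}%")
print(f"Unpaid 5-star share: {unpaid_pct}%")
```

In the project itself, the same ratio would presumably be computed from DataFrame counts instead of hard-coded numbers.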

Summary

The dataset suggests some bias toward positive paid reviews: the share of 5-star reviews is about 10 percentage points higher for paid reviews than for unpaid ones (48.89% versus 38.87%). However, there are only 90 paid reviews, compared with more than 37,000 unpaid reviews. To put this into perspective, there is one paid review for roughly every 420 unpaid ones, so a larger sample of paid reviews would make the analysis more reliable.
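The comparison can be checked with a few lines of arithmetic. This sketch is not part of the original analysis: the variable names are hypothetical, and the standard-error line is an added illustration (a simple binomial approximation, assumed here) of why 90 paid reviews is a thin sample.

```python
import math

# Counts reported in the Results section.
paid_total, paid_five_star = 90, 44
unpaid_total, unpaid_five_star = 37831, 14704

paid_share = paid_five_star / paid_total
unpaid_share = unpaid_five_star / unpaid_total

# Gap between the two groups, in percentage points.
gap_pp = 100 * (paid_share - unpaid_share)

# Number of unpaid reviews per paid review.
unpaid_per_paid = round(unpaid_total / paid_total)

# Assumption: binomial standard error of the paid share, in percentage points.
paid_se_pp = 100 * math.sqrt(paid_share * (1 - paid_share) / paid_total)

print(f"Gap: {gap_pp:.2f} percentage points")
print(f"Unpaid reviews per paid review: {unpaid_per_paid}")
print(f"Approximate standard error of the paid share: {paid_se_pp:.2f} points")
```

Under this approximation, the uncertainty in the paid share alone is several percentage points, which is the same order of magnitude as the gap itself. That is consistent with the caution above about the small paid sample.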

To further improve the analysis, we might study another review category besides video games, such as electronics, to help mitigate the difference in sample sizes.
