
hadoop_sentiment_analysis's Introduction

Sentiment Analysis using Hadoop

  • This program performs sentiment analysis on large Yelp datasets of customer reviews from Kaggle, using MapReduce and the Hadoop distributed system.
  • Each record is read as one long string and then tokenized to obtain the Business ID, Review ID, Review text, and Star rating from the Yelp reviews dataset.
  • The program checks the accuracy of a restaurant's star rating by running sentiment analysis on its written reviews and producing a predicted star rating for each review. It then calculates the average difference between predicted and actual stars for each business. Finally, it averages those per-business differences across all businesses. Because each business contributes a single value to that final average, businesses with only a few reviews count for more than they would if every review were weighted equally.
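The tokenizing step described above can be sketched as follows. This is a minimal illustration, not the project's code: the source does not state the delimiter or the field order, so the tab separator, the column layout, and the names `Review` and `parse_record` are all assumptions.

```python
from typing import NamedTuple


class Review(NamedTuple):
    """The four fields the README says are extracted from each record."""
    business_id: str
    review_id: str
    text: str
    stars: float


def parse_record(line: str, sep: str = "\t") -> Review:
    # Assumed layout (hypothetical): business_id, review_id, stars, text.
    # Splitting at most 3 times keeps any separators inside the review text.
    business_id, review_id, stars, text = line.rstrip("\n").split(sep, 3)
    return Review(business_id, review_id, text, float(stars))


record = parse_record("b1\tr1\t4\tGreat food, friendly staff\n")
```

Putting the free-text field last is a deliberate choice in this sketch: it lets the parser tolerate separators that appear inside a review.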

Phase 1:

  • Mapper: emit the Review text, using the Review ID as the key.
  • Reducer: perform sentiment analysis on the array of Review text for each business, then output, for each Review ID, the predicted star rating, the actual star rating, and the Business ID.
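Phase 1 might look roughly like the in-memory simulation below. The source does not say which sentiment method is used, so the tiny word lists and the `predict_stars` scoring rule are purely hypothetical stand-ins. Carrying the Business ID and the actual star rating in the mapper's value is also an assumption, made so that the reducer can output them.

```python
from collections import defaultdict

# Hypothetical lexicon; the real project may use any sentiment method.
POSITIVE = {"great", "good", "friendly", "delicious"}
NEGATIVE = {"bad", "slow", "rude", "cold"}


def phase1_map(review):
    """Emit the review keyed by Review ID."""
    business_id, review_id, text, stars = review
    yield review_id, (business_id, text, stars)


def predict_stars(text):
    """Map a word-count sentiment score onto a 1 to 5 star scale."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return max(1, min(5, 3 + score))


def phase1_reduce(review_id, values):
    """Output (business ID, predicted stars, actual stars) per Review ID."""
    for business_id, text, stars in values:
        yield review_id, (business_id, predict_stars(text), stars)


def run_phase1(reviews):
    # Stand-in for Hadoop's shuffle: group mapper output by key.
    grouped = defaultdict(list)
    for review in reviews:
        for key, value in phase1_map(review):
            grouped[key].append(value)
    output = []
    for key in sorted(grouped):
        output.extend(phase1_reduce(key, grouped[key]))
    return output


reviews = [
    ("b1", "r1", "Great food, friendly staff", 4),
    ("b1", "r2", "Slow service and cold food", 3),
]
phase1_output = run_phase1(reviews)
```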

Phase 2:

  • Mapper: emit the predicted star rating and actual star rating, using the Business ID as the key.
  • Reducer: for each review, calculate the difference between the predicted and actual star ratings, then average the differences for each Business ID and output the result as (Business ID, starDiff).
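A sketch of Phase 2 under stated assumptions: the input record shape matches a plausible Phase 1 output, and the difference is taken as signed (predicted minus actual). The source does not say whether the difference is signed or absolute, so treat that choice as illustrative.

```python
from collections import defaultdict


def phase2_map(record):
    """Re-key each Phase 1 record by Business ID."""
    review_id, (business_id, predicted, actual) = record
    yield business_id, (predicted, actual)


def phase2_reduce(business_id, values):
    """Average the per-review star differences for one business."""
    # Assumption: signed difference; abs() would be the other reading.
    diffs = [predicted - actual for predicted, actual in values]
    yield business_id, sum(diffs) / len(diffs)


def run_phase2(records):
    # Stand-in for Hadoop's shuffle: group mapper output by key.
    grouped = defaultdict(list)
    for record in records:
        for key, value in phase2_map(record):
            grouped[key].append(value)
    output = []
    for key in sorted(grouped):
        output.extend(phase2_reduce(key, grouped[key]))
    return output


phase1_output = [
    ("r1", ("b1", 5, 4)),
    ("r2", ("b1", 1, 3)),
    ("r3", ("b2", 4, 2)),
]
star_diffs = dict(run_phase2(phase1_output))
```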

Phase 3:

  • Mapper: send every starDiff value to a single reducer.
  • Reducer: calculate the average of all the starDiff values across the Business IDs.
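Phase 3 can be sketched as below. Routing every value to one reducer through a constant key is an assumption about how "map all the starDiff to a reducer" is implemented; the key name `"all"` is hypothetical.

```python
def phase3_map(record):
    """Drop the Business ID and emit starDiff under one constant key."""
    business_id, star_diff = record
    yield "all", star_diff


def phase3_reduce(key, values):
    """Average every per-business starDiff into one overall figure."""
    values = list(values)
    yield key, sum(values) / len(values)


def run_phase3(records):
    # All mapper output shares one key, so a single reduce call sees it all.
    values = [value for record in records for _, value in phase3_map(record)]
    (_, overall), = phase3_reduce("all", values)
    return overall


# Suppose b1 averaged 1.0 over three reviews and b2 averaged 3.0 over one.
star_diffs = [("b1", 1.0), ("b2", 3.0)]
overall = run_phase3(star_diffs)
```

In this example the overall value is 2.0, whereas weighting the same four reviews equally would give (1 + 1 + 1 + 3) / 4 = 1.5. That gap is the effect described in the introduction: a business with few reviews counts as much as one with many.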

hadoop_sentiment_analysis's People

Contributors

aakarshs, bovojon
