BuzzMining

Final Project for Real-time and Big Data Analytics

Designed an application that lets users enter query words and returns a popularity analysis and a summary of public attitudes toward them, based on real-time data crawled from Reddit, Twitter and Yelp.

Implemented a spider program in Python with the Scrapy framework to crawl the websites, and used Java, MapReduce and Pig to clean the crawled data. The end results include graphs showing popularity trends, pie charts showing opinion distribution, a list of the top words associated with the searched word, and a prediction of future popularity.
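
As a rough illustration of the "top words associated with the searched word" result, the following sketch counts the words that appear in the same post as the query. It is not taken from the project code: the function name `top_associated_words`, the stop-word list and the whitespace tokenization are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical stop-word list, used only for this example.
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to", "in", "i"}


def top_associated_words(query, posts, n=3):
    """Return the n most common words that co-occur with the query in a post."""
    query = query.lower()
    counts = Counter()
    for post in posts:
        words = post.lower().split()
        if query in words:
            # Count every other non-stop word in a post that mentions the query.
            counts.update(w for w in words if w != query and w not in STOP_WORDS)
    return counts.most_common(n)
```

For example, `top_associated_words("pizza", posts, 1)` returns the single word that most often shares a post with "pizza", together with its count.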

Please see the following report for more details:

https://drive.google.com/open?id=0B_bKdJl2aPq_Qnd6ZFRRY2NWMlE

Run Configuration:

spark_twitter.py parses the Twitter data using Spark.

yelp_merge.py, yelp_parse_business_list.py and yelp_parse_tip_list.py parse the Yelp data.

parse_reddit.cpp parses the Reddit data.

Sentiment.java, SentimentMapper.java and SentimentReducer.java, in the folder named MapReduce, perform the main sentiment analysis on the parsed data using MapReduce.
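
To show the general map/reduce shape such a sentiment job can take, here is a minimal pure-Python sketch. It is not the logic in Sentiment.java, SentimentMapper.java or SentimentReducer.java: the word lists, the function names and the scoring rule are assumptions chosen only to keep the example small.

```python
from collections import defaultdict

# Hypothetical word lists; the project's actual lexicon is not described here.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "terrible"}


def map_sentiment(query, post):
    """Map step: emit (label, 1) for one post that mentions the query."""
    words = post.lower().split()
    if query.lower() not in words:
        return []  # the post does not mention the query, so emit nothing
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return [(label, 1)]


def reduce_sentiment(pairs):
    """Reduce step: sum the counts emitted for each label."""
    totals = defaultdict(int)
    for label, count in pairs:
        totals[label] += count
    return dict(totals)


def run_job(query, posts):
    """Apply the map step to every post, then reduce the emitted pairs."""
    emitted = []
    for post in posts:
        emitted.extend(map_sentiment(query, post))
    return reduce_sentiment(emitted)
```

The per-label totals returned by `run_job` are the kind of numbers an opinion-distribution pie chart could be drawn from.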

PostProcess.java processes the MapReduce output and generates the report.

run.sh is the shell script that runs the whole project and prints the report to standard output.

To run the shell script, put Sentiment.jar, which is built from Sentiment.java, SentimentMapper.java and SentimentReducer.java, in the same folder as run.sh.

Please put PostProcess.java in the same folder as run.sh.

To search Reddit data, put the data file named input_r in HDFS.

To search Yelp data, put the data file named input_y in HDFS.

To search Twitter data, put the data file named input_t in HDFS.
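
The file-naming convention above can be captured in a small helper. The sketch below only builds an `hdfs dfs -put` command and does not run it. The helper names and the default destination directory `"."` are assumptions, since the HDFS directory that run.sh reads from is not stated here.

```python
# Input file names that the instructions above associate with each source.
INPUT_FILES = {
    "reddit": "input_r",
    "yelp": "input_y",
    "twitter": "input_t",
}


def input_file_for(source):
    """Return the expected input file name for a data source."""
    try:
        return INPUT_FILES[source.lower()]
    except KeyError:
        raise ValueError(f"unknown source: {source!r}") from None


def hdfs_put_command(source, hdfs_dir="."):
    """Build (but do not run) an `hdfs dfs -put` command for the source's file.

    hdfs_dir is an assumption: the directory run.sh expects is not documented.
    """
    return ["hdfs", "dfs", "-put", input_file_for(source), hdfs_dir]
```

The returned list can be passed to `subprocess.run` on a machine where the Hadoop command-line tools are installed.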
