RealTimeAnalyticsSystem

Dynamic visualization of tweets is a popular topic. This project focuses on dynamically visualizing tweets that contain the worldwide top N hashtags, where N can be any number (for example 1, 10, or 20).

Goal

Real-time dynamic visualization of tweets that contain worldwide top hashtags.

Design:

The project starts from a Storm topology that calculates the top N hashtags. A new bolt then performs a streaming join between those top N hashtags and the real-time tweets, so that only tweets containing at least one top N hashtag are visualized.

Technology stack

Area | Technology
Front-End | HTML, CSS, D3.js, JavaScript
Back-End | Flask (Python)
Distributed Real-Time Computation System | Apache Storm (Java)
In-Memory Caching / Datastore | Redis
Other APIs Used | Twitter streaming via the Twitter4J API

Storm Topology Used:

ParseTweetBolt: receives every tweet emitted by the TweetSpout, parses it to find hashtags, and emits each hashtag it finds (if any).
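The parsing step can be illustrated with a minimal Python sketch. This is not the project's Java bolt; the regular expression, the lowercasing, and the function name `extract_hashtags` are assumptions made for illustration only.

```python
import re

# Assumed pattern: a '#' followed by one or more word characters.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags(tweet_text):
    """Return the hashtags found in a tweet, lowercased, without the '#'.

    An empty list means the tweet has no hashtags, so nothing would be emitted.
    """
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(tweet_text)]


if __name__ == "__main__":
    for tag in extract_hashtags("Trying out #Storm with #Redis today"):
        print(tag)
```

Lowercasing is a design choice in this sketch so that "#Storm" and "#storm" count as the same hashtag; the real bolt may or may not normalize case.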

Count HashtagBolt: keeps a HashMap for counting every hashtag (the key is the hashtag and the value is the number of times it has appeared in a tweet); it receives a hashtag from the preceding bolt and updates the map accordingly.
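As a rough illustration of this counting step, the sketch below uses a Python `Counter` in place of the Java HashMap. The class name `HashtagCounter` and the `record` method are hypothetical and do not appear in the project.

```python
from collections import Counter


class HashtagCounter:
    """Keeps a running count per hashtag (hashtag -> number of appearances)."""

    def __init__(self):
        self.counts = Counter()

    def record(self, hashtag):
        """Increment the count for one received hashtag and return the
        updated (hashtag, count) pair, which a bolt would emit downstream."""
        self.counts[hashtag] += 1
        return hashtag, self.counts[hashtag]


if __name__ == "__main__":
    counter = HashtagCounter()
    for tag in ["storm", "redis", "storm"]:
        print(counter.record(tag))
```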

IntermediateRankingsBolt: the one used in the course with no modifications.

TotalRankingsBolt (total-ranker): the one used in the course with no modifications.
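The two ranking bolts are taken from the course as-is, so their code is not described here. The general idea of ranking in two stages (partial top N lists that are then merged into one global top N) might look like the sketch below. It assumes each partial ranking is a list of (hashtag, count) pairs; the helper names `top_n` and `merge_rankings` are hypothetical.

```python
import heapq


def top_n(counts, n):
    """Return the n (hashtag, count) pairs with the highest counts, best first."""
    return heapq.nlargest(n, counts.items(), key=lambda item: item[1])


def merge_rankings(partial_rankings, n):
    """Merge several partial rankings into a single global top n.

    If a hashtag appears in more than one partial ranking, this sketch keeps
    its highest count (an assumption, not the course's documented behavior).
    """
    merged = {}
    for ranking in partial_rankings:
        for hashtag, count in ranking:
            merged[hashtag] = max(merged.get(hashtag, 0), count)
    return top_n(merged, n)


if __name__ == "__main__":
    partial_a = top_n({"storm": 5, "redis": 2, "java": 1}, 2)
    partial_b = top_n({"python": 4, "d3": 3}, 2)
    print(merge_rankings([partial_a, partial_b], 3))
```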

TweetsWithTopHashtagsBolt (top-tweets): this bolt receives data from both the TweetSpout and the TotalRankingsBolt. If the data comes from the TotalRankingsBolt, the bolt replaces its internal Rankings object with the one received, so that it always holds the latest hashtag rankings. If the data comes from the TweetSpout, the bolt checks whether the tweet contains one or more hashtags and whether at least one of them is in the Rankings object: if so, it emits the tweet to the ReportBolt; otherwise the tweet is discarded. The tweet is emitted along with the font size used to display it: the higher the hashtag ranks, the bigger the font (this value is normalized against the maximum rank to avoid overly large fonts).
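The filter and font sizing described above can be sketched in Python as follows. This is only an approximation of the idea, under several assumptions: rankings are modeled as a hashtag-to-count dictionary, the font range (12 to 48 pixels) is invented for the example, and the names `TopTweetFilter`, `on_rankings`, `on_tweet`, and `font_size_for` are hypothetical.

```python
MIN_FONT_PX = 12  # assumed smallest font size
MAX_FONT_PX = 48  # assumed largest font size


def font_size_for(tweet_hashtags, rankings):
    """Return a font size for a tweet, or None if it has no ranked hashtag.

    The best-ranked hashtag's count is divided by the highest count in the
    rankings, so the result never exceeds MAX_FONT_PX.
    """
    matched = [rankings[tag] for tag in tweet_hashtags if tag in rankings]
    if not matched:
        return None
    ratio = max(matched) / max(rankings.values())
    return round(MIN_FONT_PX + ratio * (MAX_FONT_PX - MIN_FONT_PX))


class TopTweetFilter:
    """Joins the rankings stream with the tweet stream."""

    def __init__(self):
        self.rankings = {}

    def on_rankings(self, rankings):
        # Replace the stored rankings with the latest ones received.
        self.rankings = dict(rankings)

    def on_tweet(self, text, hashtags):
        # Emit (text, font size) for matching tweets; discard the rest.
        size = font_size_for(hashtags, self.rankings)
        return None if size is None else (text, size)


if __name__ == "__main__":
    bolt = TopTweetFilter()
    bolt.on_rankings({"storm": 10, "redis": 5})
    print(bolt.on_tweet("Learning #storm", ["storm"]))
    print(bolt.on_tweet("Unrelated #java", ["java"]))
```

Tweets that arrive before any rankings have been received are discarded in this sketch, since the stored rankings start out empty.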

ReportBolt: this Bolt publishes the data to the Redis instance.

Instructions for Project Execution:

Install Vagrant (see the link below)

https://www.vagrantup.com/docs/installation/

Install VirtualBox (see the link below)

https://www.virtualbox.org/manual/ch01.html

Create a Twitter app (see the link below)

http://docs.inboundnow.com/guide/create-twitter-application/

Open the TopNTweetTopology file, which is located at the path below:

AnalyticsSystem/src/jvm/udacity/storm

Locate the following code in the source:

// now create the tweet spout with the credentials
TweetSpout tweetSpout = new TweetSpout( "", "", "", "" );

Then enter the following values in the string literals, in this order:

Consumer Key

Consumer Secret

Access Token

Access Token Secret

Then navigate to the project root directory in a terminal.

Execute the commands below:

vagrant up

vagrant ssh

cd /vagrant/AnalyticsSystem

mvn package

storm jar target/udacity-storm-lesson3_stage5-0.0.1-SNAPSHOT-jar-with-dependencies.jar udacity.storm.TopNTweetTopology

Open another terminal.

Then navigate to the project root directory in that terminal.

Execute the commands below:

vagrant up

vagrant ssh

cd /vagrant/viz

python app.py

This launches the visualization, which is built with Flask, a Python-based micro-framework.

Now open a browser and go to http://localhost:5000

Visualization output:

Future Scope:

Display more tweets in the visualization, rather than a single dynamic tweet that contains the top N hashtags.
