w205_ex2's Introduction

MIDS W205 Exercise 2

Lei Yang - 2015/12/12

System Requirements:

  1. Python libraries streamparse, tweepy, and psycopg2 are installed and properly configured.
  2. Postgres is running and accessible as the postgres user.
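
A quick way to confirm the first requirement is to check that the three libraries can be found by the interpreter. The snippet below is only a sketch: it assumes Python 3.4 or later (for importlib.util.find_spec), and the helper name missing_modules is made up for this illustration and is not part of the repo.

```python
import importlib.util

# Libraries named in the system requirements above.
REQUIRED = ["streamparse", "tweepy", "psycopg2"]


def missing_modules(names):
    """Return the names that cannot be located on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("missing libraries: " + ", ".join(missing))
    else:
        print("all required libraries are importable")
```

This only checks that the modules are installed; it does not verify that Twitter credentials or the Postgres connection are configured.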

Deployment steps:

  1. as w205, clone the repo:
$ git clone git@github.com:leiyang-mids/w205_ex2.git
  2. as w205, under /setup, create the Postgres database and table:
$ python create_db.py
  3. the terminal should show:

[w205@ip-172-31-6-39 setup]$ python create_db.py
creating tcount database in postgres ...
database tcount is successfully created!
creating tweetwordcount table in tcount ...
table tweetwordcount is successfully created!
postgres setup completed, you are good to go, happy streaming :)
  4. as w205, under /EX2Tweetwordcount, start streaming:
$ sparse run
  • note: to avoid flooding the log, the count for each word is logged only once every 100 counts, so it may take several seconds before the first count shows up
  • e.g. 37262 [Thread-41] INFO backtype.storm.task.ShellBolt - ShellLog pid:4265, name:count-bolt weather: 100
  5. to stop streaming, press Ctrl+C at any time.
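
The log throttling mentioned in the note above can be illustrated with a small standalone counter. This is a sketch of the idea only, not the repo's count-bolt: the class name ThrottledCounter and its add method are hypothetical, and it leaves out the streamparse and Postgres parts entirely.

```python
from collections import Counter

LOG_EVERY = 100  # per the note above: one log line per 100 counts of a word


class ThrottledCounter:
    """Counts words and produces a log message only on every Nth occurrence."""

    def __init__(self, log_every=LOG_EVERY):
        self.counts = Counter()
        self.log_every = log_every

    def add(self, word):
        """Count one occurrence; return a message on every log_every-th one, else None."""
        self.counts[word] += 1
        n = self.counts[word]
        if n % self.log_every == 0:
            return "%s: %d" % (word, n)
        return None


counter = ThrottledCounter()
# Feed the same word 250 times and keep only the calls that produced a message.
messages = [m for m in (counter.add("weather") for _ in range(250)) if m]
print(messages)  # ['weather: 100', 'weather: 200']
```

With this scheme a word that has been seen fewer than 100 times never appears in the log, which is why the first count can take a while to show up.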

Checking results (under /serving_scripts):

  • check the counts of all words:
$ python finalresults.py 
  • check a specific word:
$ python finalresults.py weather 
  • check the histogram for a specified range:
$ python histogram.py 600 1000 
  • perform customized query on Postgres:

[w205@ip-172-31-6-39 ~]$ psql -U postgres
psql (8.4.20)
Type "help" for help.
postgres=# \c tcount
psql (8.4.20)
You are now connected to database "tcount".
tcount=# select * from tweetwordcount order by count desc limit 20
tcount-# ;
  word   | count
---------+-------
 like    |   700
 one     |   610
 love    |   524
 will    |   443
 amp     |   430
 know    |   421
 want    |   390
 weather |   370
 see     |   360
 time    |   340
 people  |   331
 good    |   330
 make    |   310
 new     |   309
 thank   |   308
 day     |   282
 much    |   282
 need    |   277
 back    |   274
 really  |   270
(20 rows)
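
The same kinds of lookups can also be run from Python with psycopg2. The sketch below is not the code of finalresults.py or histogram.py: the function names are hypothetical, the inclusive range and the sort orders are assumptions, and the connection settings (local server, postgres user, no password) are guesses based on the psql session above. Only the database name tcount, the table tweetwordcount, and its word and count columns come from this README.

```python
def build_count_query(word=None):
    """Return (sql, params) for all word counts, or for one word if given."""
    if word is None:
        return "SELECT word, count FROM tweetwordcount ORDER BY word", ()
    return "SELECT word, count FROM tweetwordcount WHERE word = %s", (word,)


def build_range_query(low, high):
    """Return (sql, params) for words whose count lies in [low, high] (assumed inclusive)."""
    sql = ("SELECT word, count FROM tweetwordcount "
           "WHERE count BETWEEN %s AND %s ORDER BY count DESC")
    return sql, (low, high)


def run_query(sql, params=()):
    """Execute a query against tcount and return all rows."""
    import psycopg2  # imported lazily so the builders above work without it

    # Assumption: local server, postgres user, no password required.
    conn = psycopg2.connect(dbname="tcount", user="postgres")
    try:
        cur = conn.cursor()
        cur.execute(sql, params)  # parameters are passed separately, never formatted into the SQL
        return cur.fetchall()
    finally:
        conn.close()
```

For example, run_query(*build_range_query(600, 1000)) would return the rows whose count falls between 600 and 1000, provided the connection assumptions hold on your machine.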
