rdrreddit

Materials in support of this post: http://toddwschneider.com/posts/the-reddit-front-page-is-not-a-meritocracy/

There are 3 main components to the repo:

1. Rails application that grabs the top 100 items from reddit every 5 minutes

The app is not intended to be used as a web server, just as a clock process and delayed job worker. You can run it with:

bundle exec foreman start -f Procfile.clockandworker

The clock dumps a blob of serialized text into the reddit_observations table every 5 minutes, then a delayed job worker processes each of those blobs into the posts and observations tables. Some additional methods cache a few attributes on those tables and fetch data from the Imgur API -- these methods are run manually from the Rails console.
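The two-stage flow described above (store the raw blob first, parse it later) can be sketched in plain Ruby. This is only an illustration, not the app's actual code: the class, struct, and field names are hypothetical, the in-memory arrays stand in for the three database tables, and JSON is an assumed serialization format, since the README says only "serialized text".

```ruby
require "json"

# Hypothetical in-memory stand-ins for the reddit_observations, posts,
# and observations tables. The real column names are not known here.
RawObservation = Struct.new(:id, :observed_at, :payload, :processed)
Post = Struct.new(:reddit_id, :title)
Observation = Struct.new(:reddit_id, :observed_at, :rank, :score)

class Pipeline
  attr_reader :raw_observations, :posts, :observations

  def initialize
    @raw_observations = []
    @posts = {}
    @observations = []
  end

  # Clock step: store the whole listing as one serialized blob, unparsed.
  # JSON is an assumption made for this sketch.
  def record_snapshot(listing, observed_at)
    raw = RawObservation.new(@raw_observations.size + 1, observed_at,
                             JSON.generate(listing), false)
    @raw_observations << raw
    raw
  end

  # Worker step: turn each unprocessed blob into one post per distinct
  # item and one observation per item per snapshot.
  def process_pending
    @raw_observations.reject(&:processed).each do |raw|
      JSON.parse(raw.payload).each_with_index do |item, index|
        id = item.fetch("id")
        @posts[id] ||= Post.new(id, item.fetch("title"))
        @observations << Observation.new(id, raw.observed_at, index + 1,
                                         item.fetch("score"))
      end
      raw.processed = true
    end
  end
end

# Two snapshots taken 5 minutes apart, with made-up items.
pipeline = Pipeline.new
t0 = Time.utc(2014, 9, 15, 12, 0)
pipeline.record_snapshot([{ "id" => "a1", "title" => "First", "score" => 100 },
                          { "id" => "b2", "title" => "Second", "score" => 90 }], t0)
pipeline.record_snapshot([{ "id" => "b2", "title" => "Second", "score" => 150 },
                          { "id" => "a1", "title" => "First", "score" => 120 }], t0 + 300)
pipeline.process_pending
```

Keeping the raw blob separate from the parsed rows means the clock only has to write what it fetched, and parsing can happen later in the worker.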

2. R scripts for data analysis

reddit_analysis.R does the heavy lifting

3. Postgres database dump file

rdr_seed.dump contains data from the reddit top 100 between September 15 and October 31, 2014.

It includes only the posts and observations tables -- the raw content in the reddit_observations table would take up too much space, and none of the analysis depends on that table anyway. You can restore the database on your local machine with pg_restore (you have to install Postgres first if you haven't yet):

pg_restore --verbose --clean --no-acl --no-owner -h localhost -d rdrreddit_development /path/to/rdr_seed.dump

The dump file is about 25 MB compressed, and will take up 175 MB on disk once fully restored.
