Pinterest Crawler

How to run

These rake tasks create new files or overwrite existing JSON files, so back up any old data first. Depending on the task, the results are written to boards.json, pins.json, or users.json.
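Since the tasks overwrite their output, a small helper can copy the existing result files aside before each run. This is only a sketch: the `backup_results` name and the `backup-<timestamp>` directory layout are assumptions and not part of this repository.

```ruby
require "fileutils"

# Result files named in this README; adjust if your tasks write elsewhere.
RESULT_FILES = %w[boards.json pins.json users.json].freeze

# Copies any existing result files from dir into a timestamped backup
# directory (hypothetical layout) and returns the list of copied paths.
def backup_results(dir = ".", stamp = Time.now.strftime("%Y%m%d-%H%M%S"))
  existing = RESULT_FILES.map { |name| File.join(dir, name) }
                         .select { |path| File.file?(path) }
  return [] if existing.empty?

  backup_dir = File.join(dir, "backup-#{stamp}")
  FileUtils.mkdir_p(backup_dir)
  existing.map do |path|
    target = File.join(backup_dir, File.basename(path))
    FileUtils.cp(path, target)
    target
  end
end
```

Calling `backup_results` from the project root before a crawl leaves the old JSON files untouched in the backup directory.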

Boards & Pins

Get all the boards and pins from the user mdoroudi. Replace mdoroudi with whatever username you want. This creates two result files: pins.json and boards.json.

$ rake crawl:pins_boards:from_seed seed=mdoroudi

Get the first 50 pins from the main page. This creates one result file: pins.json.

$ rake crawl:pins_boards:pins_from_homepage 

From the first page, get the users behind the first 50 pins and crawl their boards and pins. This creates two result files: pins.json and boards.json.

$ rake crawl:pins_boards:from_homepage_deep

Users

Given a user slug, get all of its followers and followings, and for each of those get their followers and followings. The limit right now is 500 users.

$ rake crawl:users:from_seed seed=mdoroudi
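One way to picture this task is as a breadth-first walk over the follower/following graph that stops once the user limit is reached. The sketch below is an illustration and not the crawler's actual code: `crawl_users` and the `neighbors` block are hypothetical names, and the block stands in for whatever fetches a user's followers and followings from Pinterest.

```ruby
require "set"

# Breadth-first walk over a follower/following graph, starting at seed.
# `neighbors` is a hypothetical block returning the follower and following
# slugs for one user; a real crawler would fetch these over HTTP.
def crawl_users(seed, limit: 500, &neighbors)
  seen = Set.new([seed])
  queue = [seed]
  until queue.empty?
    user = queue.shift
    neighbors.call(user).each do |other|
      # Stop as soon as the cap is reached, even mid-list.
      return seen.to_a if seen.size >= limit
      next if seen.include?(other)

      seen << other
      queue << other
    end
  end
  seen.to_a
end
```

With an in-memory hash such as `graph = { "a" => %w[b c] }`, calling `crawl_users("a") { |u| graph.fetch(u, []) }` visits users in the order they are discovered, and the `seen` set keeps a user from being crawled twice when two people follow each other.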

Load data into your mysql database

To analyze the data further, you might want to load it into a MySQL database (right now this only works for pins and boards).

Create Tables

Before creating tables, make sure you have a config/database.yml file that looks roughly like this, with your own info filled in:

Database

adapter: mysql2
encoding: utf8
host: localhost
database: pinterest
user: root
password: 

Also create your database. In my case it's called pinterest:

> create database pinterest;
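Before running the table tasks, it can help to confirm that config/database.yml parses and has the expected keys. The following is a minimal sketch: `load_db_config` and `REQUIRED_KEYS` are hypothetical names, and the key list is taken from the sample config above, not from the project's code.

```ruby
require "yaml"

# Keys shown in the sample config; password is allowed to be blank.
REQUIRED_KEYS = %w[adapter encoding host database user].freeze

# Parses database.yml content and returns the settings hash,
# raising ArgumentError if a required key is missing or empty.
def load_db_config(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  missing = REQUIRED_KEYS.select { |key| config[key].to_s.empty? }
  raise ArgumentError, "missing keys: #{missing.join(', ')}" unless missing.empty?

  config
end
```

A typical call would be `load_db_config(File.read("config/database.yml"))`.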

Tables

This process creates the following three tables:

  • users
  • pins
  • boards

$ rake create_tables:all

Load data

Loads the JSON data into the corresponding tables.

$ rake load_data:all 
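A quick sanity check before loading is to confirm that each result file parses and to see how many records it holds. This sketch assumes each file contains a top-level JSON array, which this README does not confirm; `record_count` is a hypothetical helper.

```ruby
require "json"

# Returns the number of records in a result file, assuming (hypothetically)
# that the file holds a top-level JSON array.
def record_count(path)
  data = JSON.parse(File.read(path))
  raise ArgumentError, "#{path}: expected a JSON array" unless data.is_a?(Array)

  data.size
end
```

For example, `record_count("pins.json")` raises `JSON::ParserError` on a truncated file, which is easier to diagnose than a failed database load.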

TODO / Work in Progress

  • User: creating following/follower relationship and changing the user_crawler code to respect it (following_followers_rel branch)
  • User: add has_many pins & has_many boards
  • User: add a rake task to load them into the database, just like pins and boards
  • Pins & Boards: add belongs_to user, remove username from table (following_followers_rel branch)
  • Pins: work on is_video
  • Bring code/datastructures/graph.rb here so it can be used
