allgithub

Crawling GitHub data for https://github.com/anvaka/pm/

usage

Prerequisites:

  1. Make sure Redis is installed and running on the default port (6379).
  2. Register a GitHub token and set it in the GH_TOKEN environment variable.
  3. Install the crawler:
git clone https://github.com/anvaka/ghcrawl
cd ghcrawl
npm i

Now we are ready to index.

Find all users with more than 2 followers

This step uses the GitHub search API to go through all users who have more than two followers. At the moment there are more than 400k such users.

Each search request returns up to 100 records per page, so covering 400,000 users takes 400,000 / 100 = 4,000 requests. The search API is rate limited to 30 requests per minute, so indexing takes about 4,000 / 30 ≈ 133 minutes, a little over two hours:

node findUsersWithFollowers.js
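
The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch using the figures quoted in the text (400,000 users, 100 records per page, 30 requests per minute); `estimateCrawl` is a hypothetical helper, not something the crawler exports.

```javascript
// Rough duration estimate for a paginated, rate-limited crawl.
function estimateCrawl({ totalRecords, perPage, requestsPerMinute }) {
  const requests = Math.ceil(totalRecords / perPage);
  const minutes = requests / requestsPerMinute;
  return { requests, minutes, hours: minutes / 60 };
}

// Figures taken from the text, not fetched from GitHub.
const searchEstimate = estimateCrawl({
  totalRecords: 400000,
  perPage: 100,
  requestsPerMinute: 30,
});

console.log(searchEstimate.requests, 'requests');
console.log(searchEstimate.minutes.toFixed(0), 'minutes');
```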

Find all followers

Now that we have all users who have more than two followers, let's index those followers. The bad news is that we have to make at least one request per user. The good news is that the rate limit is 5,000 requests per hour, so the estimated amount of work is 400,000 / 5,000 = 80, that is, at least 80 hours:

node indexUserFollowers.js
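
To stay within an hourly quota, one simple approach is to space requests evenly. The sketch below is an illustration under that assumption and is not how indexUserFollowers.js is known to work; `crawlSequentially` and its `fetchFollowers` callback are hypothetical names.

```javascript
// Minimum pause between requests so that an hourly quota is not exceeded.
function delayBetweenRequestsMs(requestsPerHour) {
  return Math.ceil(3600000 / requestsPerHour);
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Calls fetchFollowers(user) for each user in turn, pausing after each call.
// fetchFollowers is supplied by the caller and returns a promise.
async function crawlSequentially(users, fetchFollowers, requestsPerHour, wait = sleep) {
  const delay = delayBetweenRequestsMs(requestsPerHour);
  const results = new Map();
  for (const user of users) {
    results.set(user, await fetchFollowers(user));
    await wait(delay);
  }
  return results;
}

// With the figures from the text: 5,000 requests per hour, 400,000 users.
const followerDelayMs = delayBetweenRequestsMs(5000);
const followerHours = 400000 / 5000;
console.log(followerDelayMs, 'ms between requests,', followerHours, 'hours in total');
```

At 5,000 requests per hour this works out to one request every 720 ms, which matches the 80 hour estimate for 400,000 single-request users.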

Time to get the graph

Now that we have all users indexed, we can construct the graph:

node makeFollowersGraph.js > github.dot
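
The output is a Graphviz DOT file. As a rough illustration of what such a file contains, the sketch below serialises follower relationships as a directed graph. The input shape (`followersByUser`) and the edge direction (follower -> followed user) are assumptions made for this example; the actual output of makeFollowersGraph.js may differ.

```javascript
// Serialises follower relationships as a Graphviz DOT digraph.
// followersByUser maps a login to the list of logins following it (assumed shape).
function toDot(followersByUser) {
  const lines = ['digraph G {'];
  for (const [user, followers] of Object.entries(followersByUser)) {
    for (const follower of followers) {
      // Quote node ids: logins may contain hyphens, which DOT does not allow unquoted.
      lines.push(`  ${JSON.stringify(follower)} -> ${JSON.stringify(user)};`);
    }
  }
  lines.push('}');
  return lines.join('\n');
}

console.log(toDot({ anvaka: ['some-user', 'another-user'] }));
```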

Layout

Convert the graph to binary format. The --max-old-space-size=4096 flag raises V8's old-generation heap limit to 4096 MB:

node --max-old-space-size=4096 ./toBinary.js
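
The layout that toBinary.js writes is not described here. Purely as an illustration of why a binary form is more compact than DOT text, the sketch below maps each login to an integer id and stores edges as a flat Int32Array of (from, to) pairs. The function name `edgesToBinary` and this encoding are assumptions for the example, not the format ngraph.native is documented to read.

```javascript
// Encodes edges as integer pairs plus a label table.
// edges is an array of [fromLogin, toLogin] pairs.
function edgesToBinary(edges) {
  const ids = new Map();
  const labels = [];
  const idOf = name => {
    if (!ids.has(name)) {
      ids.set(name, labels.length);
      labels.push(name);
    }
    return ids.get(name);
  };

  const links = new Int32Array(edges.length * 2);
  edges.forEach(([from, to], i) => {
    links[2 * i] = idOf(from);
    links[2 * i + 1] = idOf(to);
  });
  return { labels, links };
}

const encoded = edgesToBinary([['b', 'a'], ['c', 'a']]);
console.log(encoded.labels, Array.from(encoded.links));
```

Each edge then costs 8 bytes regardless of login length, and `Buffer.from(encoded.links.buffer)` could be passed to `fs.writeFileSync` to store it.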

Then use ngraph.native for faster graph layout.

license

MIT
