GitHub Awards

GitHub Awards gives your ranking on GitHub by language and by location (city, country and worldwide) based on the number of stars of your repos.

How does it work?

In order to calculate your ranking on GitHub we need to:

  • Get all GitHub users with their location
  • Geocode their location
  • Get all GitHub repositories with language and number of stars

With this information we can compute your ranking for a given language in a given city.

Step 1: Get all users and repositories

There are over 10 million users and 15 million repositories on GitHub, so we cannot simply call the GitHub API once for each user and each repository.

However, the GitHub list API returns 100 results at a time with basic information.

With this you can fetch up to 500k users or repositories per hour (100 results per request, at GitHub's limit of 5,000 authenticated requests per hour). This is enough to get the entire list of users and repositories with basic information (username, repo name, etc).
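The crawl described above amounts to walking a cursor-paginated listing until it runs dry. Here is a minimal sketch of that loop, not the project's actual Rake task. `crawl_all`, `fetch_page` and the fake data are hypothetical names introduced for illustration; the fetcher stands in for an HTTP call that passes the last seen ID as a cursor, in the style of GitHub's `since` parameter.

```ruby
# Walk a cursor-paginated listing until it is exhausted.
# `fetch_page` is a hypothetical callable: given the last seen ID, it returns
# the next batch of records (hashes with an :id key), or an empty array.
def crawl_all(fetch_page, start_after: 0)
  records = []
  cursor = start_after
  loop do
    batch = fetch_page.call(cursor)
    break if batch.empty?

    records.concat(batch)
    cursor = batch.last[:id] # resume after the highest ID seen so far
  end
  records
end

# Stand-in for the real API: 250 fake users served 100 at a time.
FAKE_USERS = (1..250).map { |id| { id: id, login: "user#{id}" } }.freeze

calls = 0
fake_fetch = lambda do |since|
  calls += 1
  FAKE_USERS.select { |user| user[:id] > since }.first(100)
end

users = crawl_all(fake_fetch)
puts "fetched #{users.size} users in #{calls} requests"
# prints "fetched 250 users in 4 requests" (the 4th request returns an empty page)
```

At 500k records per hour, this kind of loop would need roughly 20 hours for 10 million users and roughly 30 hours for 15 million repositories.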

The Rake tasks are:

rake user:crawl
rake repo:crawl

Now we need detailed information such as location, language, and number of stars.

Step 2: Use Google BigQuery to get details about active users and repositories

GitHub Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.

The GitHub Archive dataset is public. With Google BigQuery we can filter it to keep only the latest event for each repository and user. Unfortunately, GitHub Archive events only start in 2011, so we won't get ranking information for users and repositories that have been inactive since 2011.
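To make "only the latest event for each repo" concrete, here is a small in-memory sketch of that filter in Ruby. It is not the BigQuery query itself. The field names (`:repo`, `:created_at`, `:stars`) and the sample events are assumptions made for illustration, not the actual GitHub Archive schema.

```ruby
require "time"

# Keep only the most recent event for each key (e.g. the repository name).
# Field names are hypothetical, not the real GitHub Archive schema.
def latest_event_per(events, key)
  events
    .group_by { |event| event[key] }
    .transform_values do |group|
      group.max_by { |event| Time.parse(event[:created_at]) }
    end
end

SAMPLE_EVENTS = [
  { repo: "alice/app", created_at: "2013-05-01T10:00:00Z", stars: 3 },
  { repo: "alice/app", created_at: "2014-02-11T08:30:00Z", stars: 12 },
  { repo: "bob/lib",   created_at: "2012-09-20T16:45:00Z", stars: 7 }
].freeze

latest_event_per(SAMPLE_EVENTS, :repo).each do |repo, event|
  puts "#{repo}: #{event[:stars]} stars as of #{event[:created_at]}"
end
```

The same helper works for users by passing a different key, assuming the events carry a user field.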

  • Query for users:

users.sql

  • Query for repositories:

repos.sql

We can then download the results as JSON, parse them, and fill in the missing information about users and repositories.

The Rake tasks are:

rake user:parse_users
rake repo:parse_repos
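The parse-and-fill step could look roughly like the sketch below. It assumes the export holds one JSON object per line and that records are matched on a `login` field. `fill_missing`, the field names and the sample data are all hypothetical, not taken from the project's tasks.

```ruby
require "json"

# Merge detail rows (one JSON object per line) into records that so far only
# hold basic fields. Field names here are hypothetical.
def fill_missing(records_by_login, json_lines)
  json_lines.each_line do |line|
    line = line.strip
    next if line.empty?

    details = JSON.parse(line)
    record = records_by_login[details["login"]]
    next unless record # skip users we never crawled

    # Only fill fields that are still blank; never overwrite existing data.
    details.each { |field, value| record[field] = value if record[field].nil? }
  end
  records_by_login
end

SAMPLE_EXPORT = <<~JSON
  {"login": "alice", "location": "Paris, France"}
  {"login": "carol", "location": "Berlin"}
JSON

crawled = {
  "alice" => { "login" => "alice", "location" => nil },
  "bob"   => { "login" => "bob",   "location" => nil }
}
fill_missing(crawled, SAMPLE_EXPORT)
puts crawled["alice"]["location"] # prints "Paris, France"
```

Filling only blank fields is a design choice for this sketch; a real import might instead prefer the most recent value.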

We now have user locations, plus the language and number of stars of each repository. To get country and world ranks we need to geocode the user locations.

Step 3: Geocoding user locations

Location on GitHub is a plain text field, and about 1 million profiles have a location set. Free geocoding APIs usually have strict rate limits. The first step is to geocode only distinct locations, which leaves about 100k locations to geocode. A solution to speed up the geocoding is to use a combination of:

The Rake task is:

rake user:geocode_locations
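Geocoding only distinct locations can be sketched as a cache keyed on the location string. The version below is an illustration under assumptions, not the project's task: `geocoder` is a hypothetical callable standing in for a rate-limited geocoding API, and the whitespace and case normalization is an extra step not mentioned above.

```ruby
# Collapse case and whitespace so trivially different spellings share a key.
# This normalization is an assumption of the sketch.
def normalize_location(location)
  location.strip.downcase.gsub(/\s+/, " ")
end

# Geocode each distinct location once, then share the result across users.
# `geocoder` is a hypothetical callable: location string -> result or nil.
def geocode_distinct(users, geocoder)
  cache = {}
  users.each_with_object({}) do |user, results|
    raw = user[:location]
    next if raw.nil? || raw.strip.empty?

    key = normalize_location(raw)
    cache[key] = geocoder.call(key) unless cache.key?(key)
    results[user[:login]] = cache[key]
  end
end

FAKE_GAZETTEER = {
  "paris, france" => { city: "Paris", country: "France" },
  "berlin"        => { city: "Berlin", country: "Germany" }
}.freeze

PROFILES = [
  { login: "alice", location: "Paris, France" },
  { login: "bob",   location: "  paris,  FRANCE " },
  { login: "carol", location: "Berlin" },
  { login: "dave",  location: nil }
].freeze

lookups = 0
counting_geocoder = lambda do |location|
  lookups += 1
  FAKE_GAZETTEER[location]
end

located = geocode_distinct(PROFILES, counting_geocoder)
puts "#{located.size} users located with #{lookups} geocoder calls"
# prints "3 users located with 2 geocoder calls"
```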

We now have all the information we need to compute rankings.

Step 4: Compute rankings by language and by location (city/country/world)

To get rankings we first calculate a score for each user in each language using this formula:

sum(stars) + (1.0 - 1.0/count(repositories))
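A worked example may help in reading the formula. The Ruby sketch below applies it to one user's repositories in a single language; `language_score` is a hypothetical helper name. One reading of the second term is that it acts as a tie-breaker: it is always below 1, so it never outweighs a single star, but among users with the same star total it favors the one with more repositories.

```ruby
# Score for one user in one language: total stars plus a bonus strictly
# below 1 that grows with the number of repositories.
def language_score(stars_per_repo)
  raise ArgumentError, "need at least one repository" if stars_per_repo.empty?

  stars_per_repo.sum + (1.0 - 1.0 / stars_per_repo.size)
end

puts language_score([15])               # one repo:    15 + 0.0 = 15.0
puts language_score([10, 5])            # two repos:   15 + 0.5 = 15.5
puts language_score([5, 5, 5]).round(2) # three repos: 15 + 2/3, about 15.67
```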

Then we use the PostgreSQL ROW_NUMBER() window function to rank each user against other developers with repositories in the same language and in the same location (by city, by country or worldwide).
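For readers less familiar with window functions, the sketch below emulates in plain Ruby what a `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY score DESC)` query computes. It is an illustration, not the project's SQL. The row fields and the `row_numbers` helper are hypothetical.

```ruby
# Number rows from 1 within each partition, highest score first.
# Row fields (:login, :language, :city, :score) are hypothetical.
def row_numbers(rows, partition_by:)
  rows
    .group_by { |row| row.values_at(*partition_by) }
    .flat_map do |_partition, group|
      group
        .sort_by { |row| -row[:score] }
        .each_with_index
        .map { |row, index| row.merge(rank: index + 1) }
    end
end

SAMPLE_SCORES = [
  { login: "alice", language: "ruby", city: "paris",  score: 120.5 },
  { login: "bob",   language: "ruby", city: "paris",  score: 300.0 },
  { login: "carol", language: "ruby", city: "berlin", score: 42.0 },
  { login: "dave",  language: "go",   city: "paris",  score: 7.0 }
].freeze

# City ranking: partition by language and city.
row_numbers(SAMPLE_SCORES, partition_by: %i[language city]).each do |row|
  puts "#{row[:language]}/#{row[:city]}: rank #{row[:rank]} #{row[:login]}"
end

# World ranking: partition by language only.
world = row_numbers(SAMPLE_SCORES, partition_by: %i[language])
```

As with ROW_NUMBER(), users with equal scores still get distinct consecutive ranks, in no guaranteed order.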

OK, now we have the ranking of every GitHub user :)

To speed up queries based on user ranks, we create a table holding all the ranking information. With everything in a single table we can index it properly and get acceptable response times when we query it from a web application.

The query to create the language_rankings table can be found here:

rank.sql

Step 5: VOILA! Look up your ranking and have fun :)

Next steps:

  • GitHub connect
  • Manually refresh your information
  • Automating data update
  • Improve UI

Contributing:

  • Fork it (https://github.com/vdaubry/github-awards/fork)
  • Create your feature branch (git checkout -b my-new-feature)
  • Commit your changes (git commit -am 'Add some feature')
  • Push to the branch (git push origin my-new-feature)
  • Create a new Pull Request

License

This project is available under the MIT license. See the license file for more details.
