
giggity's Introduction

Giggity - grab hierarchical data about a github organization, user, or repo

Get information about an organization, user, or repo on GitHub. All data is stored in a JSON file, organized as a tree of dictionaries for easy database transfer or data analysis. Everything goes through the GitHub API, with or without authentication (authentication is highly recommended).

Warning: The GitHub API still supports v3 (which this script uses), but GitHub is shifting to GraphQL in v4. Here is a post on migrating from v3. I don't know whether I will update this script before v3 is deprecated, so beware.

Setup

pip3 install giggity

or clone the repository and run:

pip3 install -r requirements.txt

Operation

giggity.py [-h] [-v] [-a] [-u] [-o] [-O OUTPUT] path

positional arguments:
  path                  name of organization or user (or url of repository)

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         increase output verbosity
  -a, --authenticate    allows github authentication to avoid ratelimiting
  -u, --user            denotes that given input is a user
  -o, --org             denotes that given input is an organization
  -O OUTPUT, --outfile OUTPUT
                        location to put generated json file

Example of Scraping a User

python3 giggity.py -a -O needmorecowbell.json -v -u needmorecowbell
  • This asks for authentication credentials, puts the program into verbose mode, scrapes GitHub for the user needmorecowbell, then writes the results to needmorecowbell.json.

Example of Scraping an Organization

python3 giggity.py -a -o github -O github.json
  • This asks for authentication, scrapes the organization named github on GitHub, then writes the results to github.json.

Giggity as a Module

  • giggity can also be used as a module. All data is stored within orgTree as a nested dict.
from giggity import giggity

g = giggity("username","password")
data = g.getUsers("organization-name", followers=True)

print("List of users in organization: ")
for user, info in data.items():
    print(user)

data = g.getEmails("username", verbose=True) # Get any emails found

Other examples of how to use giggity are available in the util folder.

Example Output

Running python3 giggity.py -a -u geohot -O output.json produces an output.json that contains:

{
    "emails": [
        "[email protected]",
        "other emails taken out of example"
    ],
    "names": [
        "Charles Ellis",
        "George Hotz"
    ],
    "repos": {
        "ORB_SLAM2": {
            "created_at": "2017-04-08T00:21:13Z",
            "description": "ORBSLAM2 running on Mac OS X cause I was on a plane and bored and maybe useful for someone?",
            "fork": true,
            "name": "ORB_SLAM2",
            "updated_at": "2018-10-22T23:51:28Z",
            "url": "https://github.com/geohot/ORB_SLAM2"
        },
        { Many more repositories cut out for the example },
        "xnu-deps-linux": {
            "created_at": "2013-10-02T00:36:29Z",
            "description": "Mig/cctools for Linux combo makefile thingy",
            "fork": true,
            "name": "xnu-deps-linux",
            "updated_at": "2016-05-01T16:04:45Z",
            "url": "https://github.com/geohot/xnu-deps-linux"
        }
    }
}
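Because the output is a plain tree of dictionaries, it can be inspected with nothing more than the standard json module. The sketch below assumes only the top-level keys shown in the example above (emails, names, repos); the summarize helper and the trimmed sample data are hypothetical and not part of giggity.

```python
import json


def summarize(tree):
    """Return (email count, sorted repo names, sorted names of forked repos)."""
    repos = tree.get("repos", {})
    forks = sorted(name for name, info in repos.items() if info.get("fork"))
    return len(tree.get("emails", [])), sorted(repos), forks


# Trimmed stand-in for output.json; the email address is a placeholder.
sample = json.loads("""
{
  "emails": ["user@example.com"],
  "names": ["George Hotz"],
  "repos": {
    "ORB_SLAM2": {"fork": true, "name": "ORB_SLAM2"},
    "xnu-deps-linux": {"fork": true, "name": "xnu-deps-linux"}
  }
}
""")

email_count, repo_names, forked = summarize(sample)
print(email_count, repo_names, forked)
```

For a real run, replace the inline sample with json.load() on the generated file.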

giggity's People

Contributors

adi928, mikemeliz, needmorecowbell


giggity's Issues

Publish on pypi

This is necessary so I can start using this module in the hamburglar project without having to request it in some weird way.

Include a depth of followers option

There is a getFollowers() function already in giggity. I would like to add the ability to include followers, then followers of followers, and so on to a depth of x. This would be added as a separate nested dictionary added as a key to the user level called followers.

In this way we should be able to find mutual followers to a certain organization or user.
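One possible shape for this, as a rough sketch: recurse over followers up to the requested depth and track who has already been expanded, so follower cycles do not loop forever. The function name follower_tree and the fetch_followers callable (login -> list of follower logins) are hypothetical; in giggity the lookup would presumably be backed by the existing getFollowers(), whose exact signature is not assumed here.

```python
def follower_tree(user, depth, fetch_followers, _seen=None):
    """Build a nested dict of followers, `depth` levels deep.

    Each follower maps to {"followers": {...}} so the result can be stored
    under a "followers" key at the user level.
    """
    seen = _seen if _seen is not None else {user}
    if depth <= 0:
        return {}
    tree = {}
    for follower in fetch_followers(user):
        if follower in seen:
            # Already expanded elsewhere: record the edge, but do not recurse.
            tree[follower] = {"followers": {}}
            continue
        seen.add(follower)
        tree[follower] = {
            "followers": follower_tree(follower, depth - 1, fetch_followers, seen)
        }
    return tree


# Fake follower graph standing in for API calls.
graph = {"a": ["b", "c"], "b": ["c", "a"], "c": []}
result = follower_tree("a", 2, lambda login: graph.get(login, []))
print(result)
```

Mutual followers could then be found by intersecting the key sets of two such trees.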

Add a license

I personally recommend Apache 2.0, but you need to have one.

Have a write to database feature (sqlite possibly?)

Write the JSON file to a database. I think it would be best if the database lived in an actual file, so SQLite seems like the way to go, although I would be fine with a MySQL database too.

This would mean there would need to be a setup script for making the database.
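With SQLite the setup step can be small, since Python ships the sqlite3 module. A minimal sketch, assuming the repo fields shown in the README's example output (name, url, description, fork, created_at, updated_at); the table name repos and the write_repos helper are hypothetical.

```python
import sqlite3


def write_repos(tree, db_path=":memory:"):
    """Create the repos table if needed and insert or replace one row per repo."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS repos (
               name TEXT PRIMARY KEY,
               url TEXT,
               description TEXT,
               fork INTEGER,
               created_at TEXT,
               updated_at TEXT
           )"""
    )
    rows = [
        (
            repo.get("name", key),
            repo.get("url"),
            repo.get("description"),
            int(bool(repo.get("fork"))),
            repo.get("created_at"),
            repo.get("updated_at"),
        )
        for key, repo in tree.get("repos", {}).items()
    ]
    conn.executemany("INSERT OR REPLACE INTO repos VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


conn = write_repos({"repos": {"x": {"name": "x", "url": "u", "fork": True}}})
print(conn.execute("SELECT name, fork FROM repos").fetchall())
```

Passing a file path instead of ":memory:" keeps the data on disk, and CREATE TABLE IF NOT EXISTS means no separate setup script is strictly required for this one table.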

Fix the copyright in your license

It currently reads

"Copyright [yyyy] [name of copyright owner]"

Unless your name is "[name of copyright owner]", you should probably change this.

remove unnecessary keys in tree

Many of the keys are just links to other API endpoints. Most of them are not useful, so leaving them out would make the tree cleaner. Here is an example of what might get deleted:

This is edited from the results given from a request to the repo endpoint of the api

Repo Structure (This is a fairly liberal list, I think it could probably be cut down more)

{
    "id": 59394602,
    "name": "AaronConversation",
    "full_name": "needmorecowbell/AaronConversation",
    "private": false,
    "owner": {
      "login": "needmorecowbell"
    },
    "html_url": "https://github.com/needmorecowbell/AaronConversation",
    "description": "Old code from a fun postal service experience",
    "fork": false,
    "url": "https://api.github.com/repos/needmorecowbell/AaronConversation",
    "forks_url": "https://api.github.com/repos/needmorecowbell/AaronConversation/forks",
    "created_at": "2016-05-22T04:20:41Z",
    "updated_at": "2016-05-22T04:21:56Z",
    "pushed_at": "2016-05-22T04:21:55Z",
    "size": 9,
    "stargazers_count": 0,
    "watchers_count": 0,
    "language": "Java",
    "has_issues": true,
    "has_projects": true,
    "has_downloads": true,
    "has_wiki": true,
    "has_pages": false,
    "forks_count": 0,
    "open_issues_count": 0,
    "license": null,
    "forks": 0,
    "open_issues": 0,
    "watchers": 0,
    "default_branch": "master"
 }
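One way to do the pruning without maintaining a list of key names, sketched under the assumption that an "endpoint link" is any string value starting with https://api.github.com/. The helper name drop_endpoint_links is hypothetical. Human-facing links such as html_url survive because they point at github.com rather than the API host.

```python
API_PREFIX = "https://api.github.com/"


def drop_endpoint_links(node):
    """Recursively remove dict entries whose value is an API endpoint URL."""
    if isinstance(node, dict):
        return {
            key: drop_endpoint_links(value)
            for key, value in node.items()
            if not (isinstance(value, str) and value.startswith(API_PREFIX))
        }
    if isinstance(node, list):
        return [drop_endpoint_links(item) for item in node]
    return node


repo = {
    "name": "AaronConversation",
    "html_url": "https://github.com/needmorecowbell/AaronConversation",
    "url": "https://api.github.com/repos/needmorecowbell/AaronConversation",
    "forks_url": "https://api.github.com/repos/needmorecowbell/AaronConversation/forks",
    "owner": {
        "login": "needmorecowbell",
        "url": "https://api.github.com/users/needmorecowbell",
    },
}
pruned = drop_endpoint_links(repo)
print(pruned)
```

An explicit allow-list of keys would be stricter; this version only targets the links the issue complains about.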

Add tests

Testing makes sure that what you think your code does is actually what it does. It also helps to document your code.

Kenneth Reitz (i.e. the guy behind requests, which you use in this project) wrote an article on testing & Python, which I recommend.

Depth missing

[root@vm giggity]# python3.5 giggity.py -O username.json -v -u username   


   __ _(_) __ _  __ _(_) |_ _   _ 
  / _` | |/ _` |/ _` | | __| | | |
 | (_| | | (_| | (_| | | |_| |_| |
  \__, |_|\__, |\__, |_|\__|\__, |
  |___/   |___/ |___/       |___/ 


Traceback (most recent call last):
  File "giggity.py", line 272, in <module>
    g = giggity()
  File "giggity.py", line 10, in __init__
    self.depth = depth
NameError: name 'depth' is not defined

depth isn't in the argument list of __init__, even though the caller passes it:

class giggity():

    def __init__(self, auth_usr="", auth_pss=""):
        self.depth = depth

if (args.depth):
    g = giggity(user, psswd, int(args.depth))
else:
    g = giggity(user, psswd)

It seems depth was added in commit 9a9bf0f, but the argument list was not updated.
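A likely fix, sketched here rather than taken from the repository: accept depth as a keyword argument with a default, so both call sites in the snippet above keep working. The default value of 1 and the auth_usr/auth_pss attribute names are assumptions.

```python
class giggity:
    def __init__(self, auth_usr="", auth_pss="", depth=1):
        # depth=1 is an assumed default; the real project may prefer another value.
        self.auth_usr = auth_usr
        self.auth_pss = auth_pss
        self.depth = depth


# Both call styles from the traceback context now construct cleanly.
g_default = giggity("user", "psswd")
g_deep = giggity("user", "psswd", int("3"))
print(g_default.depth, g_deep.depth)
```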

ratelimit errors aren't handled appropriately

GitHub limits you to 5000 requests per hour when authenticated and 60 per hour when unauthenticated. Giggity should be able to check for the different error messages and handle them appropriately. This is now an important issue because of how many queries are being made.

For example:

Abuse and rate-limit responses look like this, and should be handled by having the script check back every 15 minutes to see whether it can start scraping again.

HTTP/1.1 403 Forbidden
Content-Type: application/json; charset=utf-8
Connection: close
{
  "message": "You have triggered an abuse detection mechanism and have been temporarily blocked from content creation. Please retry your request again later.",
  "documentation_url": "https://developer.github.com/v3/#abuse-rate-limits"
}

HTTP/1.1 403 Forbidden
Date: Tue, 20 Aug 2013 14:50:41 GMT
Status: 403 Forbidden
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1377013266
{
   "message": "API rate limit exceeded for xxx.xxx.xxx.xxx. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)",
   "documentation_url": "https://developer.github.com/v3/#rate-limiting"
}
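The two responses above can be told apart with a small pure function, which keeps the decision logic testable without hitting the API. This is a sketch: seconds_to_wait and its parameters are hypothetical names, the header keys are the ones shown in the example response, and the 15-minute fallback comes from the suggestion above.

```python
import time


def seconds_to_wait(status, headers, body, now=None, abuse_backoff=15 * 60):
    """Return how long to sleep before retrying; 0 means not rate limited."""
    if status != 403:
        return 0
    now = time.time() if now is None else now
    message = body.get("message", "").lower()
    if "abuse" in message:
        # Abuse detection gives no reset time, so fall back to a fixed wait.
        return abuse_backoff
    if headers.get("X-RateLimit-Remaining") == "0":
        reset = headers.get("X-RateLimit-Reset")
        if reset is None:
            return abuse_backoff
        # X-RateLimit-Reset is a Unix timestamp for when the window reopens.
        return max(0, int(reset) - int(now))
    return 0


limited_headers = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1377013266"}
wait = seconds_to_wait(
    403, limited_headers, {"message": "API rate limit exceeded"}, now=1377013200
)
print(wait)
```

A caller would sleep for the returned number of seconds and then retry the same request. With the requests library, response.status_code, response.headers, and response.json() would supply the three arguments.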

start a wiki

Needs more attention than just a readme at this point. A wiki would be a good addition to clearly detail the project's features.

generate visualization of network

I want to be able to generate a graphic of the pulled organization. Think Maltego: something with the org as the head node, branching down. The layout should be fairly obvious for an org and its members, but it would be interesting to see interconnected relationships, such as mutual followers of the same individual or repo. Because of how many results might exist, I think color-coded dots with popups showing the details of each node would be a good way of handling it.

Something that looks a little like this --

Provide example usage output

In your README.md, I'm expecting to see both example input as well as example output.

In essence, you want to make sure that your target audience has no doubts about how they would integrate it into their workflow.
