needmorecowbell / giggity

Wraps github api for openly available information about an organization, user, or repo

License: Apache License 2.0

Python 96.88% Shell 3.12%

giggity's Introduction

Giggity - grab hierarchical data about a github organization, user, or repo

Get information about an organization, user, or repo on GitHub. All data is stored in a JSON file, organized as a tree of dictionaries for easy database transfer or data analysis. Everything goes through the GitHub API, with or without authentication (authentication is highly recommended).

Warning: The GitHub API still supports v3 (which this script uses), but GitHub is shifting to GraphQL in v4. Here is a post on migrating from v3. I don't know if I will update this script before v3 is deprecated, so beware.

Setup

pip3 install giggity

or clone the repository and run:

pip3 install -r requirements.txt

Operation

giggity.py [-h] [-v] [-a] [-u] [-o] [-O OUTPUT] path

positional arguments:
  path                  name of organization or user (or url of repository)

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         increase output verbosity
  -a, --authenticate    allows github authentication to avoid ratelimiting
  -u, --user            denotes that given input is a user
  -o, --org             denotes that given input is an organization
  -O OUTPUT, --outfile OUTPUT
                        location to put generated json file

Example of Scraping a User

python3 giggity.py -a -O needmorecowbell.json -v -u needmorecowbell

This asks for authentication credentials, puts the program into verbose mode, scrapes GitHub for the user needmorecowbell, then writes the results to needmorecowbell.json.

Example of Scraping an Organization

python3 giggity.py -a -o github -O github.json

This asks for authentication, scrapes the organization named github on GitHub, then writes the results to github.json.

Giggity as a Module

giggity can also be used as a module -- all data is stored within orgTree as a nested dict.

from giggity import giggity

g = giggity("username","password")
data = g.getUsers("organization-name", followers=True)

print("List of users in organization: ")
for user, info in data.items():
    print(user)

data = g.getEmails("username", verbose=True) # Get any emails found

Other examples of how to use giggity are available in the util folder.

Example Output

Running python3 giggity.py -a -u geohot -O output.json produces an output.json that contains:

{
    "emails": [
        "[email protected]",
        "other emails taken out of example"
    
    ],
    "names": [
        "Charles Ellis",
        "George Hotz"
    ],
    "repos": {
        "ORB_SLAM2": {
            "created_at": "2017-04-08T00:21:13Z",
            "description": "ORBSLAM2 running on Mac OS X cause I was on a plane and bored and maybe useful for someone?",
            "fork": true,
            "name": "ORB_SLAM2",
            "updated_at": "2018-10-22T23:51:28Z",
            "url": "https://github.com/geohot/ORB_SLAM2"
        },
        { Many more repositories cut out for the example },
        "xnu-deps-linux": {
            "created_at": "2013-10-02T00:36:29Z",
            "description": "Mig/cctools for Linux combo makefile thingy",
            "fork": true,
            "name": "xnu-deps-linux",
            "updated_at": "2016-05-01T16:04:45Z",
            "url": "https://github.com/geohot/xnu-deps-linux"
        }
    }
}
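
As a minimal sketch of how the generated file might be consumed, assuming the layout shown above (top-level emails, names, and repos keys), the following lists the repositories marked as forks. The function name forked_repos, the inline sample, and the "example-repo" entry are illustrative and not part of giggity.

```python
import json

def forked_repos(tree):
    """Return the names of repos marked as forks, sorted alphabetically."""
    repos = tree.get("repos", {})
    return sorted(name for name, info in repos.items() if info.get("fork"))

# Inline sample mirroring the structure above ("example-repo" is made up).
# In practice, load the real file instead:
#     with open("output.json") as fh:
#         tree = json.load(fh)
sample = json.loads("""
{
    "emails": [],
    "names": ["George Hotz"],
    "repos": {
        "ORB_SLAM2": {"fork": true, "name": "ORB_SLAM2"},
        "example-repo": {"fork": false, "name": "example-repo"}
    }
}
""")

print(forked_repos(sample))
```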

giggity's Issues

Publish on pypi

This is necessary so I can start using this module in the hamburglar project without having to request it in some weird way.

Include a depth of followers option

There is a getFollowers() function already in giggity. I would like to add the ability to include followers, then followers of followers, and so on to a depth of x. The result would be stored as a separate nested dictionary under a followers key at the user level.

In this way we should be able to find mutual followers of a certain organization or user.
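
One way the depth-limited traversal could look, as a rough sketch: the helper follower_tree and its get_followers parameter are hypothetical. The signature of giggity's own getFollowers() is not shown here, so the lookup is passed in as a plain callable that returns a list of logins.

```python
def follower_tree(user, get_followers, depth):
    """Build a nested dict of followers down to `depth` levels.

    get_followers is any callable that returns a list of logins for a user
    (an assumption; it stands in for the real API lookup).
    """
    if depth <= 0:
        return {}
    return {
        follower: {"followers": follower_tree(follower, get_followers, depth - 1)}
        for follower in get_followers(user)
    }

# Fake follower graph used only for illustration.
graph = {"a": ["b", "c"], "b": ["c"], "c": []}
tree = follower_tree("a", lambda login: graph.get(login, []), depth=2)
print(tree)
```

The depth bound also guarantees termination when two users follow each other, at the cost of fetching the same user more than once.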

Add a license

I personally recommend Apache 2.0, but you need to have one.

Have a write to database feature (sqlite possibly?)

Write json file to database. I think it would be best if this was stored in an actual file, so sqlite seems like the way to go, although I would be fine with a mysql database too.

This would mean there would need to be a setup script for making the database.
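
A minimal sketch of what such a setup-and-write step might look like with Python's built-in sqlite3 module. The table layout, the function name write_repos, and the choice to keep the full record as a JSON string are all assumptions made for illustration.

```python
import json
import sqlite3

def write_repos(conn, owner, tree):
    """Create the repos table if needed and store one row per repo."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS repos ("
        "owner TEXT, name TEXT, url TEXT, fork INTEGER, raw TEXT, "
        "PRIMARY KEY (owner, name))"
    )
    rows = [
        (owner, name, info.get("url"), int(bool(info.get("fork"))), json.dumps(info))
        for name, info in tree.get("repos", {}).items()
    ]
    # INSERT OR REPLACE keeps repeated runs from duplicating rows
    conn.executemany("INSERT OR REPLACE INTO repos VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # pass a file path for a persistent database
tree = {"repos": {"ORB_SLAM2": {"url": "https://github.com/geohot/ORB_SLAM2", "fork": True}}}
count = write_repos(conn, "geohot", tree)
print(count, conn.execute("SELECT name, fork FROM repos").fetchall())
```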

Fix the copyright in your license

It currently reads

"Copyright [yyyy] [name of copyright owner]"

Unless your name is "[name of copyright owner]", you should probably change this.

remove unnecessary keys in tree

Many of the keys are just links to other endpoints. Few of them are useful, so it would make things cleaner to leave them out. Here is an example of what might get deleted:

This is edited from the results given from a request to the repo endpoint of the api

Repo Structure (This is a fairly liberal list, I think it could probably be cut down more)

{
    "id": 59394602,
    "name": "AaronConversation",
    "full_name": "needmorecowbell/AaronConversation",
    "private": false,
    "owner": {
      "login": "needmorecowbell"
    },
    "html_url": "https://github.com/needmorecowbell/AaronConversation",
    "description": "Old code from a fun postal service experience",
    "fork": false,
    "url": "https://api.github.com/repos/needmorecowbell/AaronConversation",
    "forks_url": "https://api.github.com/repos/needmorecowbell/AaronConversation/forks",
    "created_at": "2016-05-22T04:20:41Z",
    "updated_at": "2016-05-22T04:21:56Z",
    "pushed_at": "2016-05-22T04:21:55Z",
    "size": 9,
    "stargazers_count": 0,
    "watchers_count": 0,
    "language": "Java",
    "has_issues": true,
    "has_projects": true,
    "has_downloads": true,
    "has_wiki": true,
    "has_pages": false,
    "forks_count": 0,
    "open_issues_count": 0,
    "license": null,
    "forks": 0,
    "open_issues": 0,
    "watchers": 0,
    "default_branch": "master"
 }
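
A possible way to do the pruning, sketched under one assumption: a key counts as "just a link" when its name ends in _url, with html_url kept because it points at the human-readable page. That rule and the name prune_link_keys are illustrative, not a decision made in this issue.

```python
def prune_link_keys(record, keep=("html_url",)):
    """Return a copy of record without keys ending in '_url', except those in keep."""
    pruned = {}
    for key, value in record.items():
        if key.endswith("_url") and key not in keep:
            continue
        # Recurse so nested objects such as "owner" are cleaned as well
        pruned[key] = prune_link_keys(value, keep) if isinstance(value, dict) else value
    return pruned

repo = {
    "name": "AaronConversation",
    "html_url": "https://github.com/needmorecowbell/AaronConversation",
    "forks_url": "https://api.github.com/repos/needmorecowbell/AaronConversation/forks",
    "owner": {
        "login": "needmorecowbell",
        "followers_url": "https://api.github.com/users/needmorecowbell/followers",
    },
}
print(prune_link_keys(repo))
```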

Add tests

Testing makes sure that what you think your code does is actually what it does. It also helps to document your code.

Kenneth Reitz (i.e. the guy behind requests, which you use in this project) wrote an article on testing & Python, which I recommend.

god's sake

Glenn Quagmire--giggity

Depth missing

[root@vm giggity]# python3.5 giggity.py -O username.json -v -u username   


   __ _(_) __ _  __ _(_) |_ _   _ 
  / _` | |/ _` |/ _` | | __| | | |
 | (_| | | (_| | (_| | | |_| |_| |
  \__, |_|\__, |\__, |_|\__|\__, |
  |___/   |___/ |___/       |___/ 


Traceback (most recent call last):
  File "giggity.py", line 272, in <module>
    g = giggity()
  File "giggity.py", line 10, in __init__
    self.depth = depth
NameError: name 'depth' is not defined

depth isn't in the constructor's argument list.

class giggity():

    def __init__(self, auth_usr="", auth_pss=""):
        self.depth = depth

if (args.depth):
    g = giggity(user, psswd, int(args.depth))
else:
    g = giggity(user, psswd)

It seems depth was added here, but the argument list was not updated: 9a9bf0f
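
One possible fix, shown as a sketch: accept depth as a third constructor parameter so that both call sites above, and the bare giggity() call from the traceback, work. The default of 1 is an assumption; the project does not state what the default depth should be.

```python
class giggity():

    def __init__(self, auth_usr="", auth_pss="", depth=1):
        # depth=1 is an assumed default, not taken from the project
        self.depth = depth
        # ... the rest of the original constructor would follow here

g = giggity()                          # no depth given: falls back to the default
deep = giggity("user", "psswd", 3)     # matches giggity(user, psswd, int(args.depth))
print(g.depth, deep.depth)
```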

Add a setup.py

Kenneth Reitz again provides a good example: https://github.com/kennethreitz/setup.py

You could also go with Poetry instead, as a way to track your dependencies if you wanted to try the hip new thing.

ratelimit errors aren't handled appropriately

GitHub limits you to 5000 requests per hour when authenticated and 60 per hour when unauthenticated. Giggity should check for the different error messages and handle them appropriately. This is now an important issue because of how many queries are being made.

For example:

Abuse and rate-limited responses look like the following. They should be handled by having the script check every 15 minutes whether it can start scraping again.

HTTP/1.1 403 Forbidden
Content-Type: application/json; charset=utf-8
Connection: close
{
  "message": "You have triggered an abuse detection mechanism and have been temporarily blocked from content creation. Please retry your request again later.",
  "documentation_url": "https://developer.github.com/v3/#abuse-rate-limits"
}

HTTP/1.1 403 Forbidden
Date: Tue, 20 Aug 2013 14:50:41 GMT
Status: 403 Forbidden
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1377013266
{
   "message": "API rate limit exceeded for xxx.xxx.xxx.xxx. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)",
   "documentation_url": "https://developer.github.com/v3/#rate-limiting"
}
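
A sketch of how the two responses above might be told apart. The function name seconds_until_retry is hypothetical; the header names come from the second response, and X-RateLimit-Reset is read as a UTC epoch timestamp in seconds. For simplicity every 403 is treated as a limit response, which a real implementation would want to refine by also checking the message field.

```python
import time

def seconds_until_retry(status_code, headers, now=None, fallback=15 * 60):
    """Return how long to wait before retrying, or 0 if the request was not limited."""
    if status_code != 403:
        return 0
    now = time.time() if now is None else now
    if headers.get("X-RateLimit-Remaining") == "0" and "X-RateLimit-Reset" in headers:
        # Ordinary rate limit: wait until the reset timestamp
        return max(0, int(headers["X-RateLimit-Reset"]) - int(now))
    # No usable rate limit headers (as in the abuse response): wait 15 minutes
    return fallback

limited = {"X-RateLimit-Limit": "60", "X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1377013266"}
print(seconds_until_retry(403, limited, now=1377013200))
print(seconds_until_retry(403, {}))
```

The caller would sleep for the returned number of seconds and then retry the request.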

start a wiki

Needs more attention than just a readme at this point. A wiki would be a good addition to clearly detail the project's features.

create simple accessor methods for commonly used queries

for example...

get members of organization (ie the code in the readme just put into a method in giggity)
get list of followers to user or organization

request only gets 30 results at a time

Here is a solution to the issue: https://stackoverflow.com/questions/30656761/github-search-api-only-return-30-results

The request needs to be made and page incremented until there are no more results left.
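
The loop described above could be sketched as follows. fetch_all and fetch_page are hypothetical names; fetch_page stands in for whatever makes the actual request, for example a requests.get call that passes page and per_page as query parameters (the GitHub API accepts a per_page of up to 100).

```python
def fetch_all(fetch_page, per_page=100):
    """Collect results from a paginated endpoint until an empty page comes back."""
    results = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        if not batch:
            break
        results.extend(batch)
        page += 1
    return results

# Stand-in for the API: 250 fake items served in pages.
items = list(range(250))

def fake_page(page, per_page):
    start = (page - 1) * per_page
    return items[start:start + per_page]

print(len(fetch_all(fake_page)))
```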

generate visualization of network

I want to be able to generate a graphic of the pulled organization. Think Maltego: something with the org as the head node, branching down. This should be fairly straightforward for an org and its members, but it would be interesting to see interconnected relationships like mutual followers of the same individual or repo. Because of how many results might exist, I think color-coded dots with popups showing the details of each node would be a good way of handling it.

Something that looks a little like this --

Provide example usage output

In your README.md, I'm expecting to see both example input as well as example output.

In essence, you want to make sure that your target audience has no doubts about how they would integrate it into their workflow.