ghexport's Introduction

Export your GitHub personal data: issues, PRs, comments, followers and followings, etc.

Note: this only deals with metadata. If you want a download of actual git repositories, I recommend using python-github-backup.

Setting up

  1. The easiest way is pip3 install --user git+https://github.com/karlicoss/ghexport.

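    Alternatively, clone the repository with git clone --recursive (or, in an existing checkout, run git pull && git submodule update --init), then install it from the repository root with pip3 install --editable . (the trailing dot is the path to install).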

  2. To use the API, you need a personal access token, which you can create in your GitHub settings. The token needs the repo scope.

Exporting

Usage:

Recommended: create a secrets.py file that holds your API parameters, e.g.:

token = "TOKEN"

After that, use:

python3 -m ghexport.export --secrets /path/to/secrets.py

That way you type less and have control over where you keep your plaintext secrets.

Alternatively, you can pass parameters directly, e.g.

python3 -m ghexport.export --token <token>

However, this is verbose and prone to leaking your keys/tokens/passwords in shell history.

You can also import ghexport.export as a module and call the get_json function directly to get the raw JSON.
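As a rough sketch of the module route, the helper below calls get_json and writes the result to a file. The README only names the function, so the token keyword argument is an assumption (it mirrors the --token flag and the secrets.py example), and save_export is a hypothetical helper that is not part of ghexport.

```python
import json
from pathlib import Path


def save_export(path, token, get_json=None):
    """Fetch the raw export and write it to `path` as JSON (hypothetical helper)."""
    if get_json is None:
        # Imported lazily so the helper can be exercised without ghexport installed.
        from ghexport.export import get_json
    # Assumption: get_json accepts the same parameters as secrets.py, i.e. token=...
    data = get_json(token=token)
    out = Path(path)
    out.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return out
```

Passing a stand-in for get_json (any callable that accepts token=...) makes it possible to try the helper without touching the network.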

I highly recommend checking exported files at least once just to make sure they contain everything you expect from your export. If not, please feel free to ask or raise an issue!

Extra export options

  • you can control which data is exported via the --include option (see --help for available fields)

    By default, all data will be included in the export.

  • you can include or exclude repository traffic data via --include-repos-traffic or --exclude-repos-traffic.

    Currently it’s included by default.

    You might want to exclude it if you have some issues with traffic API endpoint (it tends to be flakier than other endpoints).
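If you drive the export from a script, the options above can be assembled programmatically. This is a minimal sketch: build_export_command and run_export are hypothetical helpers, and only the --secrets and --exclude-repos-traffic flags come from this README.

```python
import subprocess
import sys


def build_export_command(secrets_path, exclude_traffic=False):
    """Assemble the argv for ghexport.export (hypothetical helper)."""
    cmd = [sys.executable, "-m", "ghexport.export", "--secrets", str(secrets_path)]
    if exclude_traffic:
        # Skip the traffic endpoint, which the README describes as flakier.
        cmd.append("--exclude-repos-traffic")
    return cmd


def run_export(secrets_path, output_path, exclude_traffic=False):
    """Run the export and write its stdout (the JSON) to output_path."""
    cmd = build_export_command(secrets_path, exclude_traffic)
    with open(output_path, "wb") as out:
        # check=True raises CalledProcessError if the export exits non-zero.
        subprocess.run(cmd, stdout=out, check=True)
```

Building the command as a list rather than a shell string avoids quoting problems with paths that contain spaces.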

API limitations

WARNING: the GitHub API limits how much of certain data you can retrieve. For events, for example, you can only get those from the past 90 days, and no more than 300 of them.

I highly recommend exporting regularly and keeping old exports. An easy way to do that is a command like this:

python3 -m ghexport.export --secrets /path/to/secrets.py >"export-$(date -I).json"

Or you can use arctee, which automates this.

To get data older than 90 days, you can request a manual export in your account settings.
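Because each export only reaches back a limited window, older snapshots hold events that newer ones no longer contain. Below is a minimal sketch of combining several snapshots. It assumes each export file is a JSON object with an "events" list whose items carry a unique "id"; those key names and the merge_events helper are assumptions for illustration, so check them against your own export first.

```python
import json
from pathlib import Path


def merge_events(export_paths):
    """Combine events from several export files, dropping duplicates by id."""
    merged = {}
    # Date-stamped names such as export-2020-01-01.json sort chronologically.
    for path in sorted(Path(p) for p in export_paths):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumption: events live under the "events" key of the export.
        for event in data.get("events", []):
            # A later file overwrites an earlier copy of the same event.
            merged[event["id"]] = event
    return list(merged.values())
```

Keying on the event id means overlapping snapshots can be merged repeatedly without producing duplicates.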

Known Issues

The requests module (and therefore PyGithub, which depends on it) sometimes seems to fail to log in if a ~/.netrc file is present, see here for context.

Using the data

You can use ghexport.dal (stands for “Data Access/Abstraction Layer”) to access your exported data, even offline. I elaborate on the motivation behind it here.

  • the main use case is importing it as a Python module for programmatic access to your data.

    You can find some inspiration in the my. package that I’m using as an API to all my personal data.

  • to test it against your export, simply run: python3 -m ghexport.dal --source /path/to/export
  • you can also try it interactively: python3 -m ghexport.dal --source /path/to/export --interactive

Example output:

Your events:
Counter({'PushEvent': 181,
         'WatchEvent': 27,
         'CreateEvent': 22,
         'IssueCommentEvent': 20,
         'PullRequestEvent': 15,
         'IssuesEvent': 5,
         'DeleteEvent': 5,
         'ForkEvent': 3,
         'PullRequestReviewCommentEvent': 1})
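A tally like the one above can be reproduced from your own code. The sketch below counts events by their "type" field. The DAL class, its constructor taking a list of export files, and its events() method are assumptions about ghexport.dal that this README does not spell out, so verify them against the module source. count_event_types and demo are hypothetical helpers.

```python
from collections import Counter


def count_event_types(events):
    """Tally events by their "type" field."""
    return Counter(event["type"] for event in events)


def demo(export_path):
    """Print the tally for one export file (relies on an assumed DAL API)."""
    # Assumption: ghexport.dal exposes DAL(sources) with an events() method
    # that yields dict-like events.
    from ghexport.dal import DAL

    dal = DAL([export_path])
    print("Your events:")
    print(count_event_types(dal.events()))
```

count_event_types works on any iterable of dict-like events, so it can also be fed events loaded directly from the exported JSON.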

Contributors

karlicoss, seanbreckenridge