github-repo-info

This is a set of Python scripts for getting repository metadata from the GitHub API and then doing things with it (described below). The GitHub REST API is very powerful, and this simple use case barely scratches the surface.

get_gh_data.py requests data from the GitHub API and saves a set of data CSV files listing repositories, licenses, and topics. Only this separate script accesses the GitHub API, so it is not necessary to keep hitting the API while working on the other scripts.

qry_gh_data.py reads the data CSV files and writes another set of CSV files that serve as reports (easily loaded into LibreOffice Calc or Excel).

topics_md.py reads the data CSV files and writes Markdown files that use collapsed sections to show Repositories by License and Repositories by Topic. It can also insert those sections into another Markdown document, such as a README.md file.


get_gh_data.py

This script uses the PyGithub module to access the GitHub API.

A personal access token is required to use the GitHub API. You can provide the path to a file containing the token using the --key-file parameter. The file must be formatted as follows:

key="YOUR_TOKEN_HERE"

If you do not provide a --key-file argument, the script looks for the file ~/KeepLocal/get_gh_data-settings.txt (where ~/ expands to the user's home directory).
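A minimal sketch of reading the token, assuming the file contains a line in the key="..." form shown above and falling back to the default path when no file is given. The function name read_token and the tolerance for extra whitespace around the = sign are assumptions for illustration; the actual script may parse the file differently.

```python
import re
from pathlib import Path

# Default location described above; "~" expands to the user's home directory.
DEFAULT_KEY_FILE = Path("~/KeepLocal/get_gh_data-settings.txt").expanduser()

# Matches a line of the form: key="YOUR_TOKEN_HERE"
# (allowing optional whitespace is an assumption, not confirmed by the source)
KEY_LINE = re.compile(r'^\s*key\s*=\s*"([^"]*)"\s*$')


def read_token(key_file=None):
    """Return the token from key_file, or from DEFAULT_KEY_FILE if none is given."""
    path = Path(key_file).expanduser() if key_file else DEFAULT_KEY_FILE
    for line in path.read_text(encoding="utf-8").splitlines():
        match = KEY_LINE.match(line)
        if match:
            return match.group(1)
    raise ValueError(f'No key="..." line found in {path}')
```

Lines that do not match the pattern are skipped, so the file can hold other settings; a file with no key line raises ValueError rather than returning an empty token.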

The screenshot below shows the scopes I selected. It may be possible to use a more restrictive repo scope if only public repository data is requested.[1] However, I wanted the retrieved data to include private repositories as well.

[Screenshot: creating a personal access token]

Command-Line Usage

usage: get_gh_data.py [-h] [-k KEYFILE]

Queries the GitHub API for metadata about a user's repositories and saves it
into CSV files.

optional arguments:
  -h, --help            show this help message and exit
  -k KEYFILE, --key-file KEYFILE
                        Name of the file containing the GitHub Personal Access
                        Token needed to query the API.
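The sketch below illustrates the general idea behind get_gh_data.py: authenticate with the token, list the user's repositories through PyGithub, and flatten each one into a CSV row. It is not the script's actual code. The helper names, the column set, and the single repos.csv output file are hypothetical (the real script writes several data files), and it assumes a PyGithub version recent enough to provide Auth.Token (1.59 or later).

```python
import csv

# Hypothetical column set for illustration.
FIELDS = ["name", "private", "language", "license", "topics"]


def repo_row(repo):
    """Flatten one repository into a dict for csv.DictWriter.

    Works with PyGithub Repository objects or anything with the same attributes.
    """
    lic = repo.license  # None when GitHub detects no license
    return {
        "name": repo.name,
        "private": repo.private,
        "language": repo.language or "",  # language can be None
        "license": lic.spdx_id if lic else "",
        "topics": " ".join(repo.topics),
    }


def write_repos_csv(repos, out_file):
    """Write one CSV row per repository and return the number of rows."""
    count = 0
    with open(out_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for repo in repos:
            writer.writerow(repo_row(repo))
            count += 1
    return count


def fetch_and_save(token, out_file="repos.csv"):
    """Query the GitHub API; needs PyGithub installed and network access."""
    # Imported here so the helpers above can be used (and tested) without PyGithub.
    from github import Auth, Github

    gh = Github(auth=Auth.Token(token))
    # With no arguments, get_repos() lists every repo the token can see,
    # including private ones and those from collaborations or organizations.
    return write_repos_csv(gh.get_user().get_repos(), out_file)
```

Separating the row-flattening from the API call mirrors the design described at the top of this README: the network access happens once, and everything downstream works from plain CSV.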

qry_gh_data.py

This script reads the data CSV files and writes the following CSV files to an output sub-directory:

  • repos-langs.csv
  • repos-private.csv
  • repos-public.csv
  • repos-public-md.csv
  • repos-topics.csv
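As an illustration of the kind of report qry_gh_data.py produces, here is a hypothetical sketch of writing repos-public.csv and repos-private.csv. It assumes the data CSV has a private column containing True or False (as Python's csv module writes booleans); the function name, the real column names, and the report contents may all differ from the actual script.

```python
import csv
from pathlib import Path


def split_by_visibility(data_csv, out_dir):
    """Write repos-public.csv and repos-private.csv from one data CSV.

    Assumes (hypothetically) a 'private' column holding 'True' or 'False'.
    Returns (public_count, private_count).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(data_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fields = reader.fieldnames  # reuse the input columns for the reports
        rows = list(reader)

    public = [r for r in rows if r["private"] != "True"]
    private = [r for r in rows if r["private"] == "True"]

    for name, subset in (("repos-public.csv", public), ("repos-private.csv", private)):
        with open(out_dir / name, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fields)
            writer.writeheader()
            writer.writerows(subset)
    return len(public), len(private)
```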

topics_md.py

This script reads the data CSV files and writes the following Markdown files to an output sub-directory:

  • repos-by-license.md
  • repos-by-topic.md
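GitHub-flavored Markdown renders collapsed sections with the HTML details and summary elements. The following sketch shows how a Repositories by Topic listing could be built that way; the input shape (pairs of repository name and topic list) and the function name repos_by_topic_md are assumptions, not the script's actual interface.

```python
from collections import defaultdict


def repos_by_topic_md(repos):
    """Render 'Repositories by Topic' as collapsed <details> sections.

    repos: iterable of (repo_name, [topics]) pairs (a hypothetical input shape).
    """
    by_topic = defaultdict(list)
    for name, topics in repos:
        for topic in topics:
            by_topic[topic].append(name)

    lines = ["## Repositories by Topic", ""]
    for topic in sorted(by_topic):
        lines.append("<details>")
        lines.append(f"<summary>{topic}</summary>")
        lines.append("")  # blank line so GitHub renders the Markdown list inside
        lines.extend(f"- {name}" for name in sorted(by_topic[topic]))
        lines.append("")
        lines.append("</details>")
        lines.append("")
    return "\n".join(lines)
```

A repository with several topics appears under each of them, which is usually what a reader browsing by topic expects.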

Writing into another Markdown file

The Repositories by License and Repositories by Topic Markdown text can also be inserted into another Markdown document, such as a README.md file.

If you use the --insert-into parameter, the script looks in the target file for specific HTML comment tags that serve as begin and end markers for the inserted sections, as shown below. The tags must appear exactly as shown, including the underscores, and each must be on a separate line with no other text. If the tags are not found, the document is not changed. It is also not necessary to include both sections.

...
<!-- Begin_Repositories_by_Topic -->
  ('Repositories by Topic' section inserted/replaced here.)
<!-- End_Repositories_by_Topic -->
...
<!-- Begin_Repositories_by_License -->
  ('Repositories by License' section inserted/replaced here.)
<!-- End_Repositories_by_License -->
...
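The marker handling described above can be sketched as a pure function: find the Begin and End comment lines, replace everything between them, and return the document unchanged if either marker is missing. The name insert_section is hypothetical, and this version tolerates leading or trailing whitespace on the marker lines, which the real script may not.

```python
def insert_section(doc, name, section):
    """Replace the lines between Begin_<name> and End_<name> markers.

    Markers must be alone on their own lines. If either marker is missing,
    the document is returned unchanged.
    """
    begin = f"<!-- Begin_{name} -->"
    end = f"<!-- End_{name} -->"
    lines = doc.splitlines()
    stripped = [line.strip() for line in lines]
    try:
        i = stripped.index(begin)
        j = stripped.index(end, i + 1)  # End must come after Begin
    except ValueError:
        return doc  # a marker is missing (or End comes before Begin)
    # Keep both marker lines; swap out only what sits between them.
    new_lines = lines[: i + 1] + section.splitlines() + lines[j:]
    return "\n".join(new_lines) + ("\n" if doc.endswith("\n") else "")
```

Because the markers themselves are kept, running the script again replaces the previously inserted section instead of appending a second copy.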

Command-Line Usage

usage: topics_md.py [-h] [--insert-into INTO_FILE] [-o OUTDIR]

Reads GitHub repository metadata from CSV files saved by get_gh_data.py and
writes Markdown files listing 'Repositories by Topics' and 'Repositories by
License'. Can also insert those as sections into another Markdown document
(such as a README.md).

optional arguments:
  -h, --help            show this help message and exit
  --insert-into INTO_FILE
                        Optional. File in which to insert the Markdown
                        sections.
  -o OUTDIR, --output-to OUTDIR
                        Directory in which to create output files. Optional.
                        By default the output is written to a directory named
                        'output' under the current working directory.
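A possible argparse setup that reproduces the usage text above, including the default 'output' directory under the current working directory. The dest names into_file and outdir are inferred from the metavars (INTO_FILE, OUTDIR) shown in the help output; the function name get_args and the rest of the structure are assumptions.

```python
import argparse
from pathlib import Path


def get_args(argv=None):
    """Parse topics_md.py options; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(
        prog="topics_md.py",
        description="Reads GitHub repository metadata from CSV files saved by "
        "get_gh_data.py and writes Markdown files listing 'Repositories by "
        "Topics' and 'Repositories by License'.",
    )
    parser.add_argument(
        "--insert-into",
        dest="into_file",
        help="Optional. File in which to insert the Markdown sections.",
    )
    parser.add_argument(
        "-o",
        "--output-to",
        dest="outdir",
        help="Directory in which to create output files. Optional.",
    )
    args = parser.parse_args(argv)
    # Default: a directory named 'output' under the current working directory.
    args.outdir = Path(args.outdir) if args.outdir else Path.cwd() / "output"
    return args
```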

Notes

If you are using Visual Studio Code, the Rainbow CSV extension is very helpful for seeing the individual data fields when viewing CSV files in the editor. When trying to spot the fields in plain text starts making your eyes cross, it's much quicker than opening the file in Calc or Excel.

Footnotes

  1. TODO: Explore access scopes in more detail.

Contributors

wmelvin