gitstractor's Introduction

GitStractor - Git Repository Analysis Tool

Project by Matt Eland (@IntegerMan)

This project is built for extracting commit, author, and file data from local git repositories in order to visualize repository history and trends to provide insight to software development teams and their stakeholders.

Here are a few examples of the types of visualizations that can be generated using GitStractor:

Stacked bar chart of # of commits per month by day of week: Accessible AI Blog Posts by Month

Tree map of the # of commits per file: GitStractor # Commits by File

Scatter plot of files in each commit: GitStractor # Files per Commit

Disclaimer

This project is in technical preview and will have bugs, inaccuracies, and numerous other issues. Use at your own risk.

Additionally, this project is intended to help you get a sense of the overall trends in your source code. It should not be used for performance evaluation purposes, as its data is not yet known to be reliable and git history is not a good indicator of an individual's performance.

Usage Instructions

GitStractor currently has two components:

  • A command line tool for extracting data from a local git repository
  • A Jupyter notebook for visualizing the extracted data

Data is extracted using the GitStractor-Extract tool. You can build this by opening GitStractor\GitStractor.sln in Visual Studio and setting GitStractor-Extract as the startup project. See Extracting Commit Data below for more information.

Once data is extracted, you can visualize it in a Jupyter Notebook. See Visualizing Commit Data in Jupyter Notebooks below for more information.

Extracting Commit Data

To get started, you'll need to run the GitStractor project in the GitStractor\GitStractor.sln solution.

You can either run the program from the command line once you've built it, or you can customize launchSettings.json to meet your needs.

Some common usage scenarios:

Extract git information from the git repository at C:\Dev\Interactive and store the resulting CSV files in C:\GitStractor

GitStractor-Extract --source C:\Dev\Interactive --destination C:\GitStractor

Extract git information from the git repository at C:\Dev\Interactive and store the resulting CSV files in C:\GitStractor. Ignore .gif, .txt, .json, and .d.ts files.

GitStractor-Extract -s C:\Dev\Interactive -d C:\GitStractor --ignore .gif,.txt,.json,.d.ts
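To make the effect of an ignore list concrete, here is a minimal Python sketch of how a comma-separated extension list could be applied to file names. It is an illustration only, not GitStractor's actual implementation: the function names `parse_ignore_list` and `is_ignored` are hypothetical, and the case-insensitive suffix matching is an assumption.

```python
def parse_ignore_list(value: str) -> tuple[str, ...]:
    """Split a comma-separated list such as '.gif,.txt' into a tuple of extensions."""
    return tuple(ext.strip().lower() for ext in value.split(",") if ext.strip())


def is_ignored(path: str, ignored: tuple[str, ...]) -> bool:
    """Return True when the path ends with any of the ignored extensions."""
    # Matching on the end of the path (rather than only the last dot-suffix)
    # lets a multi-part extension such as '.d.ts' match 'index.d.ts'.
    # Case-insensitive comparison is an assumption made for this sketch.
    return path.lower().endswith(ignored)


ignored = parse_ignore_list(".gif,.txt,.json,.d.ts")
print(is_ignored("src/types/index.d.ts", ignored))  # True
print(is_ignored("src/app.ts", ignored))            # False
```

Note that `.d.ts` only excludes TypeScript declaration files; ordinary `.ts` files are still analyzed.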

Extract git information from the git repository at C:\Dev\Interactive and store the resulting CSV files in C:\GitStractor. Do not analyze commits that are off the current branch. In this mode merge commits are analyzed instead, which may misattribute commits to the user who merged them, but extraction is considerably faster.

GitStractor-Extract -s C:\Dev\Interactive -d C:\GitStractor --includebranches false

Extract git information from the git repository at C:\Dev\Interactive and store the resulting CSV files in C:\GitStractor. Uses an authormap.json file to rename or merge together users. This is handy when you have users that have committed under different E-Mail addresses or changed their names.

GitStractor-Extract -s C:\Dev\Interactive -d C:\GitStractor --authormap C:\Dev\AuthorMap.json

An AuthorMap.json file should be structured like this:

[
    {
        "name": "Matt Eland",
        "emails": ["[email protected]", "[email protected]"]
    },
    {
        "name": "GitStractor",
        "emails": ["[email protected]"],
        "bot": true
    }
]
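The following Python sketch shows one way a file with this structure could be turned into an e-mail lookup for renaming and merging authors. It is a sketch under stated assumptions rather than GitStractor's own logic: the helper names, the example.com addresses, and the case-insensitive e-mail matching are all hypothetical, and `bot` is assumed to default to false when omitted.

```python
import json

# Hypothetical sample in the documented shape; the addresses are placeholders.
SAMPLE_AUTHOR_MAP = """
[
    {"name": "Matt Eland", "emails": ["matt@example.com", "matt.work@example.com"]},
    {"name": "GitStractor", "emails": ["bot@example.com"], "bot": true}
]
"""


def build_author_lookup(author_map_json: str) -> dict[str, dict]:
    """Map each listed e-mail address to its preferred name and bot flag."""
    lookup = {}
    for entry in json.loads(author_map_json):
        for email in entry.get("emails", []):
            lookup[email.lower()] = {
                "name": entry["name"],
                "bot": entry.get("bot", False),  # assumed default when omitted
            }
    return lookup


def resolve_author(name: str, email: str, lookup: dict[str, dict]) -> tuple[str, bool]:
    """Return (display name, is_bot), falling back to the commit's own name."""
    match = lookup.get(email.lower())
    return (match["name"], match["bot"]) if match else (name, False)


lookup = build_author_lookup(SAMPLE_AUTHOR_MAP)
print(resolve_author("M. Eland", "matt.work@example.com", lookup))
print(resolve_author("Someone Else", "other@example.com", lookup))
```

Both addresses listed for the first entry resolve to the same display name, which is what merges their commits into a single author in the extracted data.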

Visualizing Commit Data in Jupyter Notebooks

To view the full range of data visualizations in Jupyter Notebooks, open the GitStractor.ipynb Jupyter notebook in the Notebooks folder.

Once there, change the project_name variable to reflect your project and change data_dir to the directory where your GitStractor .csv files from the previous step are located.

From there, run the notebook from top to bottom to generate recommendations.
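Before running every cell, it can help to confirm that data_dir really contains the extracted files. The sketch below lists each .csv file in a directory along with its header row, using only the Python standard library. The function name `summarize_csv_files` is hypothetical, and no particular file or column names are assumed; the path shown is the destination used in the extraction examples.

```python
import csv
from pathlib import Path


def summarize_csv_files(data_dir: str) -> dict[str, list[str]]:
    """Return {file name: header columns} for every .csv file in data_dir."""
    summary = {}
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            # The first row is taken as the header; an empty file yields [].
            summary[csv_path.name] = next(csv.reader(handle), [])
    return summary


data_dir = r"C:\GitStractor"  # use the same value you set in the notebook
for file_name, columns in summarize_csv_files(data_dir).items():
    print(f"{file_name}: {', '.join(columns)}")
```

If nothing is printed, the directory is empty or the path is wrong, and the notebook will not find any data to load.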

In order to work with Jupyter Notebooks, I recommend you install Anaconda and VS Code.

What's Next?

Future efforts on this project will focus on:

  • Creating a desktop application for extracting and visualizing data
  • Expanding the range of visualizations available in the Jupyter notebook
  • Improving the user experience pulling data from larger repositories
  • Adding machine learning capabilities for commit classification and clustering

If you'd like to submit a feature request or view the current backlog, please visit the GitHub Issues tab.

Contact

Contact Matt Eland for general questions and feedback.

Please open an issue for enhancement requests and bug reports.

gitstractor's People

Contributors

dependabot[bot], integerman

gitstractor's Issues

Mark authors as no longer on project

I'd like to be able to specify via authormap or auto-detect contributors who are not active on a project (no commits in 3 months) so that we can highlight areas that are owned by people who are no longer on the project.
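As a rough illustration of the auto-detect idea, the Python sketch below flags authors whose most recent commit is older than a cutoff. It is not part of GitStractor: the function name, the (author, date) input shape, and the sample data are hypothetical, and "3 months" is approximated as 90 days.

```python
from datetime import datetime, timedelta


def find_inactive_authors(commits, as_of, inactive_after=timedelta(days=90)):
    """Return sorted names of authors with no commit inside the recent window.

    commits is an iterable of (author, committed_at) pairs.
    """
    last_seen = {}
    for author, committed_at in commits:
        # Keep only the most recent commit date per author.
        if author not in last_seen or committed_at > last_seen[author]:
            last_seen[author] = committed_at
    cutoff = as_of - inactive_after
    return sorted(author for author, seen in last_seen.items() if seen < cutoff)


# Hypothetical sample data.
commits = [
    ("Alice", datetime(2023, 1, 10)),
    ("Bob", datetime(2023, 5, 20)),
    ("Alice", datetime(2023, 2, 1)),
]
print(find_inactive_authors(commits, as_of=datetime(2023, 6, 1)))  # ['Alice']
```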

Updated Jupyter Notebook for Displaying Commit Information

I want a good Jupyter Notebook that works with standardized GitStractor files to visualize information about commit activity.

Existing notebooks do this, but not on the current file format and not with sufficient contextual information.

Power BI Template - Commits

The Power BI template needs to be updated to work with more recent commit formats so people who don't want to use Jupyter Notebooks can use the application.

Ignore Certain Files

I want to be able to provide a series of regular expressions or file extensions that a filename will be evaluated against. If any of these match, the file will not be tracked in the files or commit files observers. This will help me ignore irrelevant files during the review process.

Extract - User Mapping

When mapping long projects, it's normal to see multiple authors with slightly different names or E-Mail addresses. Presently these are aggregated separately leading to messy reports and inaccurate aggregation metrics.

I'd like the ability to provide a JSON mapping from E-Mail addresses to new E-Mail addresses. Additionally, it'd be nice to be able to customize the display name of the user when using this approach so the preferred name is used.

Updated Jupyter Notebook for Displaying Author Information

I want a good Jupyter Notebook that works with standardized GitStractor files to visualize information about commit authors.

Existing notebooks do this, but not on the current file format and not with sufficient contextual information.

Track Lines Added / Removed at a Per-Commit Level

Tracking the # lines added and removed at each commit and file commit enables a lot more interesting visualizations with scatter plots in particular and is a legacy feature I rather enjoyed. This also will help compare different author behavior profiles.

Aggregate Data for Files

I'd like to be able to use gitstractor to show aggregate information about files such as:

  • Date Created
  • Last Date Modified
  • Creating Author
  • Last Modifying Author
  • Num. Commits
  • Num. Authors

This is likely a "second-pass" dataset, or something that could be calculated from the existing files datasets.
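A second pass of this kind could look roughly like the Python sketch below, which derives the listed fields from per-file commit rows. The input shape (one (path, author, date) row per file per commit), the function name, and the output keys are assumptions made for illustration; they do not describe GitStractor's actual CSV columns.

```python
from datetime import datetime


def aggregate_files(file_commits):
    """Summarize (path, author, committed_at) rows into one record per file."""
    summary = {}
    # Process rows oldest first so the first row seen for a path is its creation.
    for path, author, committed_at in sorted(file_commits, key=lambda row: row[2]):
        entry = summary.setdefault(path, {
            "date_created": committed_at,
            "creating_author": author,
            "num_commits": 0,
            "authors": set(),
        })
        # Later rows overwrite these, leaving the most recent values.
        entry["last_modified"] = committed_at
        entry["last_modifying_author"] = author
        entry["num_commits"] += 1
        entry["authors"].add(author)
    for entry in summary.values():
        entry["num_authors"] = len(entry.pop("authors"))
    return summary


# Hypothetical sample rows.
rows = [
    ("README.md", "Bob", datetime(2023, 3, 5)),
    ("README.md", "Alice", datetime(2023, 1, 2)),
    ("src/app.py", "Alice", datetime(2023, 2, 1)),
]
for path, info in aggregate_files(rows).items():
    print(path, info["num_commits"], info["num_authors"])
```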

Build a work items CSV file

In order to track issues, bugs, and tasks mentioned in commit files, I want to make sure they're extracted to a separate WorkItems.csv file. Aggregate information for commits and filecommits should also include a count of work items by type.

Power BI Template - Authors

The Power BI template needs to be updated to work with more recent author file formats so people who don't want to use Jupyter Notebooks can use the application.

Deploy CLI Applications for Momentum

As a developer, I'd like a handy CLI app or multiple applications that would let me run the acquire and/or extract processes from the command line by downloading executable files. This should be done for the Momentum 2023 Milestone.

Power BI Template - Files

The Power BI template needs to be updated to work with more recent commit file formats so people who don't want to use Jupyter Notebooks can use the application.

GitStract CLI should report progress

When running gitstract.exe it's difficult to see how far along you are, leading the user to wonder if the program is running successfully.

A periodic indication of % complete would be helpful along with a final listing of files created.

Updated Jupyter Notebook for Displaying File Information

I want a good Jupyter Notebook that works with standardized GitStractor files to visualize information about files.

Existing notebooks do this, but not on the current file format and not with sufficient contextual information.

Remove unneeded files for Momentum

As a Matt, I want to keep the repo focused on the version being delivered at Momentum so that people can find a clear path to adoption.

GitStractor Desktop - Author Trends

I'd like to be able to visualize author trends in code from GitStractor Desktop so I can understand the ebb and flow of code activity in my projects
