
repofactor's Introduction

Finding the causes of repository bloat

This project contains a set of tools to help analyse the largest blobs (measured by "on disk" storage) in a repository.

Here is a sample sequence of commands showing typical usage:

  • Start with a clean clone of the repository that you want to analyse; a bare clone is fine. For reasonable performance, clone it onto a "local" disk on a reasonably fast Linux machine.

  • Add these tools to your PATH or use a full path to each script or executable.

  • Run these tools from the repository undergoing analysis and cleaning.

  • Work out a suitable threshold size by running generate-larger-than with experimental parameters. 50000 might be a good starting point. The size is "average bytes after compression by Git".

  • Generate a sorted list of objects with file information

    generate-larger-than 50000 | sort -k3n | add-file-info >../largeobjs.txt

  • Make a report showing the summary of each commit together with the paths which introduce the large objects, their uncompressed size and file information

    report-on-large-objects ../largeobjs.txt
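
To make the "on disk" size measure more concrete, here is a minimal sketch that lists large blobs using only stock Git plumbing. It is not how generate-larger-than is implemented, and the function name list_large_blobs is hypothetical. It relies on `git cat-file --batch-all-objects` and the `%(objectsize:disk)` format atom, which reports the bytes an object occupies in the object store. For objects stored as deltas, that figure covers only the delta, so it can differ from the "average bytes after compression" figure the tools use.

```shell
#!/bin/sh
# Sketch only: list blobs whose on-disk size is at least a threshold.
# Uses plain Git plumbing; this is NOT the implementation of generate-larger-than.

# list_large_blobs THRESHOLD
# Prints "<object id> <on-disk bytes> <uncompressed bytes>" for each blob,
# sorted by on-disk size, smallest first. (Hypothetical helper name.)
list_large_blobs() {
    threshold=$1
    git cat-file --batch-all-objects \
        --batch-check='%(objecttype) %(objectname) %(objectsize:disk) %(objectsize)' |
    awk -v min="$threshold" '$1 == "blob" && $3 >= min { print $2, $3, $4 }' |
    sort -k2n
}
```

Run from inside the repository, `list_large_blobs 50000` gives a rough cross-check of which blob ids dominate storage before the report is reviewed.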

Filtering out large blobs

  • Create a temporary work directory and export RFWORK_DIR to point to this directory (defaults to the current directory).

  • Again, run all commands from the repository being analysed.

  • From the above report, edit down a list of blob ids that can be eliminated. Call this large-objects.txt.

  • Generate a remove script

    make-remove-blobs large-objects.txt >"$RFWORK_DIR"/remove-blobs.pl
    chmod +x "$RFWORK_DIR"/remove-blobs.pl
    
  • Optionally, edit the remove script so that the same rewrite also filters out any paths that are no longer required

  • Run the filter-branch step

    run-filter-branch

  • Create a new "easy rebase" script for moving work-in-progress branches from the old history to the new history

    make-mtnh >"$RFWORK_DIR"/move-to-new-history

  • Push the rewritten refs and the rewrite-commit-map branch to all central repositories

  • Deploy move-to-new-history for users to use
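
Because the list of blob ids is edited by hand, it may be worth sanity-checking it before generating the remove script. The sketch below is not part of repofactor. It assumes the list holds one object id per line, taken from the first whitespace-separated field; this format is an assumption, not something the tools document. The helper name check_blob_list is hypothetical. It uses `git cat-file -t` to confirm that each id resolves to a blob in the current repository.

```shell
#!/bin/sh
# Sketch only: verify that every id in a hand-edited list names a blob.
# Assumption: one object id per line, in the first whitespace-separated field.

# check_blob_list FILE
# Prints a message on stderr for each entry that is missing or not a blob.
# Returns 0 if every entry is a blob, 1 otherwise. (Hypothetical helper name.)
check_blob_list() {
    status=0
    # The "|| [ -n "$id" ]" clause handles a final line without a newline.
    while read -r id _rest || [ -n "$id" ]; do
        [ -n "$id" ] || continue          # skip empty lines
        kind=$(git cat-file -t "$id" 2>/dev/null) || kind=missing
        if [ "$kind" != "blob" ]; then
            echo "not a blob: $id ($kind)" >&2
            status=1
        fi
    done <"$1"
    return "$status"
}
```

For example, `check_blob_list large-objects.txt && make-remove-blobs large-objects.txt >"$RFWORK_DIR"/remove-blobs.pl` would generate the script only when every listed id is a known blob.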

Contributors

  • hashpling
