git_filter

This is a utility for filtering a git branch.

What, how, why?

It outputs a new branch containing only the files and directories we want, across all revisions. It writes a new history that preserves only the parts relevant to the new branch. It grew out of attempts to do the same thing with git filter-branch, and then with git plumbing commands directly once git filter-branch proved too slow. On my test repository with 100k commits, the original git filter-branch based filtering took around 2 days to finish one filter set. The same job with git_filter takes a minute on the same hardware.

The input is a positive list of files and directories to be included.
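
To make the idea of a positive list concrete, the sketch below checks whether a path would be kept by an include list. It is an illustration only, not git_filter's implementation: the one-entry-per-line layout, the rule that a directory entry covers everything beneath it, and the helper name is_included are all assumptions.

```shell
#!/bin/sh
# Illustrative only: decide whether a path is covered by a positive list.
# Assumed list layout (not taken from git_filter): one file or directory
# per line; a directory entry covers everything beneath it.

is_included() {
    path=$1
    list=$2
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r entry || [ -n "$entry" ]; do
        # Skip blank lines.
        [ -n "$entry" ] || continue
        # Drop a trailing slash so "src/lib/" and "src/lib" behave the same.
        entry=${entry%/}
        case $path in
            "$entry"|"$entry"/*) return 0 ;;
        esac
    done < "$list"
    return 1
}

# Small demonstration with a throwaway list file.
list=$(mktemp)
printf '%s\n' 'src/lib/' 'README' > "$list"
for p in src/lib/a.c src/libfoo/b.c README docs/x.txt; do
    if is_included "$p" "$list"; then
        echo "keep $p"
    else
        echo "drop $p"
    fi
done
rm -f "$list"
```

Anything not named in the list, directly or through a parent directory, is dropped. That is what makes the list positive rather than an exclusion list.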

Advanced features

The tool can run several filters simultaneously. This is the normal case when you want to split a large git repository into smaller ones and need two or more disjoint sets of data which together contain the whole original repository.

git_filter leaves a lot of loose objects behind when it finishes, so it is a very good idea to repack the repository (e.g. git repack -ad) before continuing to work with the resulting repository.
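
The repack step can be wrapped in a small helper that also reports how many loose objects were cleaned up. This is a minimal sketch: loose_count and repack_and_report are hypothetical helper names, and only standard git commands (count-objects and repack) are used.

```shell
#!/bin/sh
# Sketch: repack a repository and report loose objects before and after.
# loose_count and repack_and_report are hypothetical helper names.

loose_count() {
    # "git count-objects -v" prints a "count: N" line for loose objects.
    git -C "$1" count-objects -v | awk '$1 == "count:" { print $2 }'
}

repack_and_report() {
    repo=$1
    echo "loose objects before: $(loose_count "$repo")"
    # -a packs all reachable objects into a single pack,
    # -d removes the packs and loose objects made redundant by it.
    git -C "$repo" repack -a -d -q
    echo "loose objects after: $(loose_count "$repo")"
}

# Usage: repack_and_report /path/to/repo
```

Unreachable loose objects are not packed by this command, so the count after repacking is not guaranteed to be zero.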

In addition to the new branches, git_filter outputs a .revinfo text file per branch, with one line per new revision showing its correspondence to the original revision it is derived from. The purpose of this is to allow tag information to be recreated.
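
As a rough illustration of how such a mapping could be queried, the sketch below looks up the counterpart of a commit in a .revinfo file. The exact file layout is not described here, so the sketch assumes each line holds two whitespace-separated commit hashes (one new, one original). It matches either column, so it does not depend on their order. revinfo_lookup is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: find the counterpart of a commit in a .revinfo file.
# Assumption (not confirmed by the source): each line contains two
# whitespace-separated commit hashes, one new and one original.
# The lookup matches either column, so it does not rely on their order.

revinfo_lookup() {
    file=$1
    sha=$2
    awk -v sha="$sha" '
        $1 == sha { print $2; found = 1; exit }
        $2 == sha { print $1; found = 1; exit }
        END { exit !found }
    ' "$file"
}

# Usage: revinfo_lookup <branch>.revinfo <full commit hash>
# Prints the matching hash, or returns non-zero if the commit is not listed.
```

If the real files use a different layout, only the awk field references need to change.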

For me, the purpose of git_filter was to generate final repositories which contain none of the original commits. This required some further work.

The push_clean_repos script creates a clean repository for each of the filtered branches generated by the git_filter run. Each new repository has the same name as the corresponding branch. It takes the same configuration file argument as git_filter. The newtags.py script uses the .revinfo files from git_filter and the tag information in the source repository to map the tags in the source to each of the destination repositories.

An example

I have a git repository, repo, that I want to split up. It is located in the current directory.

With a configuration file git_filter.cfg describing the split, run:

./git_filter git_filter.cfg && ./push_clean_repos git_filter.cfg

git_filter saves the necessary state (in the .git directory) to allow processing of a full history to be resumed without generating all the initial commits again. We can run it once on the entire history, then run it incrementally on new commits, and get the same result as starting from scratch each time. This results in much shorter processing times. Tell git_filter to do this by adding the option continue on the command line after the configuration file, thus:

./git_filter git_filter.cfg continue

Building the script

Just a plain

make

should be enough to build git_filter. It automatically downloads libgit2 and builds it as part of the process. It has been tested to compile on Mac (with Xcode installed) and on Ubuntu Linux. Neither of these systems had a preinstalled libgit2.

Config File Syntax

Look at the filter.cfg example; it is commented.

Config items and data

The config file parser is very simple, so a single space is the only allowed separator. Each parameter name must be exactly 4 characters, followed by a colon and a space. Lines beginning with a # are comment lines and are ignored.

  • REPO: <repo>

    The configuration file should contain one REPO tag with the location of the repository to filter.

  • REVN: [range|ref] <refspec>

    A revision specification. Either a range, e.g. master~1000..master or a (branch) reference, e.g. refs/heads/master.

  • BASE: <dir>

    A base directory for the filter file lists.

  • FILT: <name> <file>

    A space-separated name and filter file pair.

  • TPFX: <tag prefix>

    The prefix for tags and output repo names, prepended to the filter set name.
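
Putting the items above together, a configuration file might look like the one written by the sketch below. Every path, filter set name and the prefix is a placeholder. The literal keyword ref after REVN: and the use of more than one FILT line are my reading of the descriptions above rather than confirmed syntax, so the commented filter.cfg example remains the authoritative reference. write_example_cfg and check_cfg are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: write a configuration file using the items described above.
# All paths, filter set names and the prefix are placeholders.

write_example_cfg() {
    cat > "$1" <<'EOF'
# Repository to filter
REPO: /path/to/repo
# Filter the history reachable from one branch
REVN: ref refs/heads/master
# Directory holding the filter file lists
BASE: /path/to/filters
# One line per filter set: <name> <file>
FILT: libfoo libfoo.filter
FILT: app app.filter
# Prefix for tags and output repo names
TPFX: split-
EOF
}

# Check the basic line shape: every non-comment line must be a 4-letter
# name, a colon and exactly one space before the value.
# Assumes upper-case names, as in the items listed above.
check_cfg() {
    awk '
        /^#/ { next }
        !/^[A-Z][A-Z][A-Z][A-Z]: [^ ]/ { bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Usage:
#   write_example_cfg git_filter.cfg
#   check_cfg git_filter.cfg && echo "line format looks plausible"
```

The check only covers the separator rule and the name length. It does not know which names git_filter accepts.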

License

GNU GPL v2
