newren commented on July 1, 2024

Yeah, it seems to process commits kind of slowly on a clone of the freebsd/freebsd repo from GitHub. The --analyze mode tends to be much slower than doing the actual transformations (it's gathering a lot more detailed data), but I still would have expected it to complete in less time than "overnight" for a repo of that size. Note that if I skip the analysis and instead just prune the empty commits, it completes in a little over 13 minutes for me on a clone of the freebsd/freebsd repo:

$ time git filter-repo --prune-empty always
Parsed 768877 commits
New history written in 325.61 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 8c41dd32e985 Remove duplicate dbufs accounting.
Enumerating objects: 3835827, done.
Counting objects: 100% (3835827/3835827), done.
Delta compression using up to 8 threads
Compressing objects: 100% (1222739/1222739), done.
Writing objects: 100% (3835827/3835827), done.
Total 3835827 (delta 2559799), reused 3835143 (delta 2559123)
Completely finished after 797.85 seconds.

real	13m20.930s
user	11m13.120s
sys	2m5.064s

And it looks like it drops the total number of commits from 768877 to 765913 (or, restricting to HEAD, from 266441 to 265512), so it prunes out about 3000 empty commits (about 1000 if you only care about the history of HEAD).
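The "about 3000" and "about 1000" figures are just differences of the commit totals quoted above. A quick check, using only those numbers (the helper name `pruned` is made up for illustration):

```python
# Commit counts quoted above (all refs, and HEAD only), before and after
# running `git filter-repo --prune-empty always`.
counts = {
    "all refs": (768877, 765913),
    "HEAD only": (266441, 265512),
}


def pruned(before: int, after: int) -> tuple[int, float]:
    """Return (number of commits removed, percentage of the original history)."""
    removed = before - after
    return removed, 100.0 * removed / before


for label, (before, after) in counts.items():
    removed, pct = pruned(before, after)
    print(f"{label}: {removed} empty commits pruned ({pct:.2f}% of {before})")
```

In both cases well under 1% of the history turns out to be empty commits.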

On another repo of similar size, using the --analyze mode, I see:

$ time git filter-repo --analyze
Processed 5972834 blob sizes
Processed 486510 commits
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your diff.renameLimit variable to at least 7299 and retry the command.
Processed 487182 commits
Writing reports to .git/filter-repo/analysis...done.

real	31m23.658s
user	31m40.684s
sys	1m27.296s

But I'm not yet sure why the --analyze mode takes so long for the freebsd repo. Early digging suggests it's related to gathering information about renames (which only the --analyze mode does), and perhaps to the checks for whether it needs to break the "equivalence classes" for renames (if someone renames A to B but later introduces a new A, I don't want to claim that all versions of A and B were just different revisions of the same file). I'm digging in...
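To make the "equivalence class" idea concrete, here is a minimal sketch. It is not git-filter-repo's implementation: it assumes a hypothetical, strictly linear stream of add/delete/rename events (real history is a DAG with branches and merges, which this ignores), and the function name `rename_equivalence_classes` is invented for illustration.

```python
from collections import defaultdict
from itertools import count


def rename_equivalence_classes(events):
    """Group path names into 'same file' classes from a linear event stream.

    Hypothetical input format: ("add", path), ("delete", path) or
    ("rename", old, new), assumed to be in history order.
    Returns a list of sets; each set holds every path one file identity used.
    """
    next_id = count()
    live = {}                   # path currently present -> identity id
    members = defaultdict(set)  # identity id -> all paths it has ever used
    for event in events:
        kind = event[0]
        if kind == "add":
            # A (re)introduced path is treated as a brand-new file, which is
            # what breaks the class instead of merging it with older renames.
            path = event[1]
            ident = next(next_id)
            live[path] = ident
            members[ident].add(path)
        elif kind == "delete":
            live.pop(event[1], None)
        elif kind == "rename":
            old, new = event[1], event[2]
            ident = live.pop(old, None)
            if ident is None:  # unknown source: start a fresh identity
                ident = next(next_id)
                members[ident].add(old)
            live[new] = ident
            members[ident].add(new)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return [members[i] for i in sorted(members)]
```

With events `[("add", "A"), ("rename", "A", "B"), ("add", "A")]` the sketch returns `[{"A", "B"}, {"A"}]`: the reintroduced A starts its own class, whereas a plain union-find over rename pairs would lump the new A together with B.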

from git-filter-repo.

newren commented on July 1, 2024

I pushed a couple of new commits to speed things up. For the similarly sized repo I mentioned above it provides a modest improvement, dropping the overall time from 31m23.658s to 23m55.685s. For the FreeBSD repo, it provides a far more dramatic improvement, dropping the time for --analyze from takes-more-time-than-you-or-I-have-patience-for down to just 19m29.999s on my machine.
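To put the "modest improvement" on the similarly sized repo in numbers, a small helper can convert the `XmY.YYYs` durations printed by `time` in the runs above into seconds and compute the speedup. The name `parse_time` is hypothetical, and it only handles that minutes-and-seconds format.

```python
import re


def parse_time(value: str) -> float:
    """Convert a `time` duration such as '31m23.658s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


before = parse_time("31m23.658s")  # --analyze before the new commits
after = parse_time("23m55.685s")   # --analyze after the new commits
print(f"{before:.3f}s -> {after:.3f}s, about {before / after:.2f}x faster")
```

That works out to roughly a 1.3x speedup. The same conversion turns the earlier `real 13m20.930s` into about 801 seconds, close to the 797.85 seconds filter-repo itself reported.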

Thanks for the report and the testcase!

uqs commented on July 1, 2024

Whoa, that's amazing, thank you!
I aborted the old run after it had racked up 40h of runtime, and with your changes --analyze now finishes after 2000s :)