Hi there, this is not a bug but I'm wondering what runtime to expect from git-filt

Expected runtime on large repos (git-filter-repo issue, closed)

uqs commented on July 1, 2024

Expected runtime on large repos

Comments (3)

newren commented on July 1, 2024

Yeah, it seems to process commits kind of slowly on a clone of the freebsd/freebsd repo from github. The --analyze mode tends to be much slower than doing the actual transformations (it's gathering a lot more detailed data) but I still would have expected it to complete in less time than "overnight" for a repo of that size. Note that if I just skip doing the analysis and instead just prune the empty commits, it completes in a little over 13 minutes for me on a clone of the freebsd/freebsd repo:

$ time git filter-repo --prune-empty always
Parsed 768877 commits
New history written in 325.61 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 8c41dd32e985 Remove duplicate dbufs accounting.
Enumerating objects: 3835827, done.
Counting objects: 100% (3835827/3835827), done.
Delta compression using up to 8 threads
Compressing objects: 100% (1222739/1222739), done.
Writing objects: 100% (3835827/3835827), done.
Total 3835827 (delta 2559799), reused 3835143 (delta 2559123)
Completely finished after 797.85 seconds.

real	13m20.930s
user	11m13.120s
sys	2m5.064s

And it looks like it drops the overall number of commits from 768877 to 765913 (or just restricting to HEAD, it drops the overall number of commits from 266441 to 265512), so it prunes out about 3000 empty commits (1000 if you only care about the history of HEAD).
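As a rough sketch of where those figures come from: `git rev-list --count` is a standard way to count commits, and running it before and after filtering gives numbers like the ones quoted. The helper name `count_commits` is hypothetical, and the totals below are copied from this comment rather than re-measured.

```shell
#!/bin/sh
# Hypothetical helper: count commits reachable from the given revision
# arguments (for example --all or HEAD) in the current repository.
count_commits() {
    git rev-list --count "$@"
}

# Commit totals quoted in the comment above (not re-measured here).
before_all=768877;  after_all=765913
before_head=266441; after_head=265512

# Number of commits dropped by --prune-empty always.
pruned_all=$((before_all - after_all))
pruned_head=$((before_head - after_head))

echo "pruned across all refs: $pruned_all"
echo "pruned on HEAD only:    $pruned_head"
```

Inside a clone, `count_commits --all` and `count_commits HEAD` run once before and once after the filter would supply the four inputs; the exact differences work out to 2964 and 929, which is where "about 3000" and "1000" come from.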

On another repo of similar size, and using the --analyze mode, I see

$ time git filter-repo --analyze
Processed 5972834 blob sizes
Processed 486510 commitswarning: inexact rename detection was skipped due to too many files.
warning: you may want to set your diff.renameLimit variable to at least 7299 and retry the command.
Processed 487182 commits
Writing reports to .git/filter-repo/analysis...done.

real	31m23.658s
user	31m40.684s
sys	1m27.296s

But I'm not yet sure why the --analyze mode takes so long for the freebsd repo. Early digging suggests it's related to attempting to gather information about renames (which is specific to the --analyze mode) and perhaps related to the checks for whether it needs to break the "equivalence classes" for renames (if someone renames A to B, but then introduces another A later, I don't want to claim that all versions of A & B were just different revisions of the same file). I'm digging in...
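To make the "rename A to B, then introduce another A" case concrete, here is a minimal sketch that builds such a history in a scratch repository and prints it with rename detection enabled. It only uses ordinary git commands; it is not how git-filter-repo itself tracks renames, and the wrapper `g`, the file contents, and the demo identity are made up for illustration.

```shell
#!/bin/sh
# Scratch repository in a temporary directory.
repo=$(mktemp -d)

# Hypothetical wrapper: run git inside the scratch repo with a throwaway identity.
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com "$@"; }

g init -q

# Commit 1: the original file A.
printf 'original contents\nline two\nline three\n' > "$repo/A"
g add A
g commit -q -m 'add A'

# Commit 2: rename A to B (contents unchanged).
g mv A B
g commit -q -m 'rename A to B'

# Commit 3: a new, unrelated file that reuses the path A.
printf 'a completely unrelated file\n' > "$repo/A"
g add A
g commit -q -m 'introduce a new, unrelated A'

# Oldest first, with rename detection (-M) and per-file status letters.
history=$(g log --reverse --format='%s' --name-status -M)
echo "$history"
```

The path `A` shows up as an addition in both the first and the third commit, while the second commit is reported as a rename of `A` to `B`. A report that lumped every `A` and `B` into one class would wrongly treat the third commit's `A` as a later revision of the original file, which is the situation the equivalence-class check is meant to catch.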

newren commented on July 1, 2024

I pushed a couple new commits to speed things up. For the similarly-sized repo I mentioned above it provides a modest improvement, dropping the overall time from 31m23.658s to 23m55.685s. For the FreeBSD repo, it provides a far more dramatic improvement, dropping the time for --analyze from takes-more-time-than-you-or-I-have-patience-for down to just 19m29.999s on my machine.

Thanks for the report and the testcase!

uqs commented on July 1, 2024

Whoa, that's amazing, thank you!
I aborted the old run after it had racked up 40h of runtime, and with your changes --analyze now finished after 2000s :)
