Giter Club home page Giter Club logo

Comments (1)

newren avatar newren commented on June 19, 2024 3

Am I holding it wrong? I would have thought from "extracting wanted paths and their history (stripping everything else)" that the full history of the content at the given path at HEAD would be retained, even if the content at the given path at HEAD had not always been located at the given path.

Yeah, I can see how it'd be nice to have this different functionality, but it's not what was intended by the wording or what I even tried to implement. Note that the wording you quote was not for any specific command line flag, it was just generic what-can-the-tool-do wording. If you look at the actual documentation from the manual for the --path option, it states that it is for "Exact paths (files or directories) to include in filtered history.". The builtin documentation uses the same wording. I would have at a minimum left off the word "Exact" in both places if rename or copy detection was implied; the "Exact" is there just to reinforce that.

Also, this is consistent with how the rev-list, log, and fast-export git subcommands work. E.g. git log -- src/ledger/bin/app/app.cc won't show any history for other paths that this file was renamed or copied from (or for which parts of it came from). You used the --follow flag specifically, which is a big hack as even noted in the git log documentation (it mentions that it only works when a single file is specified). If rev-list/log/fast-export, etc. had a --follow option that followed renames, I could simply expose it from filter-repo, but despite the desire for such an option no one has implemented it in many years. There's some good challenges there too, e.g. we'd probably want to traverse in topological order and we may need two passes -- one to create the topological ordering, and the second to build up additional paths from renames. (A case where this might be necessary: some branch builds on top of 'master' and has some paths within the specified pathspec that came from a rename of something outside the pathspec at the time 'master' existed. If 'master' was traversed before the other branch, then we'd have already picked the more limited pathspec and miss the extra needed paths.)

But even if --follow implemented following of renames for multiple files or a directory or more, that still wouldn't necessarily be sufficient because perhaps the user needs copy detection (i.e. it wasn't a file renamed from somewhere else, rather it was copied). But with copy detection it's not as clear if you want the full history of the original; I can imagine that in some cases you would but not others.

And if we start doing either rename or copy detection, then we're moving from well-defined correct behavior to heuristics. For diffs or logs or even merges that's fine, because the results will be interpreted by a user (even in merge, if the detection is wrong, the user can fixup conflicts and make other edits). Here, we'd record the results of the heuristics in stone. That's a bit worrying to me...and it also means we'd have to open up a pile of knobs (at the very least a similarity percentage, and whether copies are wanted in addition to renames) for configuration.

All that said, I wanted something like that when I was using it too. The best compromise I came up with was to have people run 'git filter-repo --analyze' beforehand, look at the renames sub-report, and pick out additional paths by hand based on that to feed to their filter-repo run. The --analyze option still had a few caveats with the rename detection, but that was mostly fundamental to the problem. Providing it and letting the user decide what to include (though I didn't even bother with copy detection), seemed like the best option I had available.

I know that's probably not what you wanted to hear, but I hope the --analyze hint and the renames information it generates is at least helpful.

from git-filter-repo.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.