Comments (1)
Am I holding it wrong? I would have thought from "extracting wanted paths and their history (stripping everything else)" that the full history of the content at the given path at HEAD would be retained, even if the content at the given path at HEAD had not always been located at the given path.
Yeah, I can see how it'd be nice to have this different functionality, but it's not what was intended by the wording or what I even tried to implement. Note that the wording you quote was not for any specific command line flag; it was just generic what-can-the-tool-do wording. If you look at the actual documentation from the manual for the --path option, it states that it is for "Exact paths (files or directories) to include in filtered history.". The builtin documentation uses the same wording. I would have at a minimum left off the word "Exact" in both places if rename or copy detection was implied; the "Exact" is there just to reinforce that no such detection is done.
Also, this is consistent with how the rev-list, log, and fast-export git subcommands work. E.g. 'git log -- src/ledger/bin/app/app.cc' won't show any history for other paths that this file was renamed or copied from (or from which parts of it came). You used the --follow flag specifically, which is a big hack, as even the git log documentation notes (it mentions that it only works when a single file is specified). If rev-list/log/fast-export, etc. had a --follow option that followed renames, I could simply expose it from filter-repo, but despite the desire for such an option no one has implemented it in many years.
There are some real challenges there too, e.g. we'd probably want to traverse in topological order and we may need two passes: one to create the topological ordering, and a second to build up additional paths from renames. (A case where this might be necessary: some branch builds on top of 'master' and has some paths within the specified pathspec that came from a rename of something outside the pathspec at the time 'master' existed. If 'master' was traversed before the other branch, then we'd have already picked the more limited pathspec and would miss the extra needed paths.)
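To make the ordering problem concrete, here is a minimal sketch of the two-pass idea on a toy history. It is not how filter-repo or git work internally; the commit names, the rename records, and the helpers `covered` and `expand_paths` are all hypothetical, and it simplifies by keeping a single global set of paths rather than tracking names per commit.

```python
from graphlib import TopologicalSorter


def covered(path, paths):
    """True if path equals, or lies under, one of the wanted paths."""
    return any(path == p or path.startswith(p.rstrip("/") + "/") for p in paths)


def expand_paths(parents, renames, wanted):
    """Collect earlier names of the wanted paths (toy model, hypothetical).

    parents: commit -> list of parent commits
    renames: commit -> list of (old_path, new_path) introduced by that commit
    wanted:  paths of interest, as named at the branch tips
    """
    # Pass 1: order commits so every child is visited before its parents.
    # TopologicalSorter emits predecessors first, so each child is declared
    # as a predecessor of its parents.
    sorter = TopologicalSorter()
    for commit, commit_parents in parents.items():
        sorter.add(commit)
        for parent in commit_parents:
            sorter.add(parent, commit)
    order = list(sorter.static_order())

    # Pass 2: walk newest to oldest, widening the path set at each rename.
    paths = set(wanted)
    for commit in order:
        for old, new in renames.get(commit, []):
            if covered(new, paths):
                paths.add(old)
    return paths


# Toy history: A <- B on 'master', C on a topic branch built on B.
parents = {"A": [], "B": ["A"], "C": ["B"]}
renames = {
    "B": [("old/util.cc", "lib/util.cc")],
    "C": [("lib/util.cc", "src/util.cc")],
}
print(sorted(expand_paths(parents, renames, {"src"})))
```

If B were examined before C, the rename in B would look irrelevant to 'src' and old/util.cc would be missed, which is the situation described above.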
But even if --follow implemented following of renames for multiple files or a directory or more, that still wouldn't necessarily be sufficient because perhaps the user needs copy detection (i.e. it wasn't a file renamed from somewhere else, rather it was copied). But with copy detection it's not as clear if you want the full history of the original; I can imagine that in some cases you would but not others.
And if we start doing either rename or copy detection, then we're moving from well-defined correct behavior to heuristics. For diffs or logs or even merges that's fine, because the results will be interpreted by a user (even in a merge, if the detection is wrong, the user can fix up conflicts and make other edits). Here, the results of the heuristics would be set in stone in the rewritten history. That's a bit worrying to me, and it also means we'd have to open up a pile of knobs (at the very least a similarity percentage, and whether copies are wanted in addition to renames) for configuration.
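As a rough illustration of why a similarity knob matters, the sketch below scores two versions of a file with Python's difflib and compares the score to a threshold. This is only a stand-in: git uses its own similarity measure, and the function name `looks_like_rename` and the sample strings are made up for the example.

```python
from difflib import SequenceMatcher


def looks_like_rename(old_text, new_text, threshold=0.5):
    """Heuristic: treat a delete/add pair as a rename if the contents are
    at least `threshold` similar (difflib ratio, NOT git's algorithm)."""
    return SequenceMatcher(None, old_text, new_text).ratio() >= threshold


old_text = "abcd"
new_text = "abxy"  # half of the content survives the edit

# The same pair of files is or is not a "rename" depending on the knob.
print(looks_like_rename(old_text, new_text, threshold=0.5))
print(looks_like_rename(old_text, new_text, threshold=0.6))
```

In a diff or log, a borderline answer like this only changes what is displayed; in a history rewrite it would decide which files survive.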
All that said, I wanted something like that when I was using it too. The best compromise I came up with was to have people run 'git filter-repo --analyze' beforehand, look at the renames sub-report, and pick out additional paths by hand based on that to feed to their filter-repo run. The --analyze option still has a few caveats with the rename detection, but those are mostly fundamental to the problem. Providing the report and letting the user decide what to include (though I didn't even bother with copy detection) seemed like the best option I had available.
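A rough sketch of that workflow as a helper script follows. It assumes, without confirmation from this thread, that the renames report written by --analyze lists each group as a first path followed by " ->" and then the related paths on indented lines; check your own report before relying on it. The function names and the sample path old/app.cc are hypothetical, and the resulting list is meant to be reviewed by hand, not run blindly.

```python
def related_paths(report_text, wanted):
    """Return paths that share a rename group with any wanted path.

    ASSUMED report format (verify against your own renames report):
        first/name ->
            other/name
            another/name
    """
    groups = []
    for line in report_text.splitlines():
        if line.endswith(" ->"):
            groups.append([line[: -len(" ->")]])
        elif line.startswith("    ") and groups:
            groups[-1].append(line.strip())

    extra = set()
    for group in groups:
        if any(path in wanted for path in group):
            extra.update(group)
    return sorted(extra - set(wanted))


def filter_repo_args(wanted, extra):
    """Build a 'git filter-repo' command line with one --path per entry."""
    args = ["git", "filter-repo"]
    for path in list(wanted) + list(extra):
        args += ["--path", path]
    return args


# Hypothetical report contents, for illustration only.
sample_report = (
    "old/app.cc ->\n"
    "    src/ledger/bin/app/app.cc\n"
    "README ->\n"
    "    README.md\n"
)
wanted = ["src/ledger/bin/app/app.cc"]
candidates = related_paths(sample_report, wanted)
print(" ".join(filter_repo_args(wanted, candidates)))
```

The script only proposes candidates; deciding which of them really belong in the filtered history is still left to the person running the rewrite.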
I know that's probably not what you wanted to hear, but I hope the --analyze hint and the renames information it generates is at least helpful.
from git-filter-repo.