Comments (5)
I'm glad --paths-from-file worked for you, that's great.
I'm very surprised by your claim, though; I've tested that multiple times. Are you sure you provided the same paths in both cases? What OS are you on? What do your command lines look like, and what are the contents of the --paths-from-file argument you used?
In particular, if I make a couple throw-away clones of git-filter-repo itself, and run
git filter-repo --debug --path Documentation --path release/
in one of them and run
git filter-repo --debug --paths-from-file <(echo -e "Documentation\nrelease/")
in the other, then after the rewrite the two give identical results: two directories, four total files, 17 commits, and HEAD pointing to commit af917c37559c33e7c09f40b45418b44bec02ac0b. Perhaps more interesting is that in the debug output, the first contains:
path_changes=[('filter', 'match', b'Documentation'), ('filter', 'match', b'release/')]
while the latter contains
path_changes=[['filter', 'match', b'Documentation'], ['filter', 'match', b'release/']]
which reflects the fact that --paths-from-file literally was just meant to set up data structures such that it looked like --path had been invoked multiple times -- though I botched it slightly using a list instead of a tuple. However, the later code simply indexed into the data it received and thus there was no difference.
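To illustrate why the list-versus-tuple difference is harmless, here is a minimal sketch. The `describe` function is a hypothetical stand-in for the later code, not filter-repo's actual implementation; it assumes only that the consumer indexes into each entry.

```python
# Hypothetical stand-in for the later code: it only indexes into each
# entry, so it cannot tell a tuple from a list.
def describe(path_changes):
    return [(entry[0], entry[1], entry[2]) for entry in path_changes]

# Shapes copied from the two debug outputs quoted above.
from_path_flags = [('filter', 'match', b'Documentation'),
                   ('filter', 'match', b'release/')]
from_paths_file = [['filter', 'match', b'Documentation'],
                   ['filter', 'match', b'release/']]

same = describe(from_path_flags) == describe(from_paths_file)
print(same)
```

Indexing behaves identically on tuples and lists, so both inputs produce the same result here.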
from git-filter-repo.
Tried with a couple of clones of git-filter-repo, and there it behaves as expected.
Did another try with my project, also. Here is the command using --path:
git filter-repo --path excel-eksport/pom.xml --path excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelarkData.java --path excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelEksporterer.java --path excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelEksport.java --path excel-eksport/src/test/java/no/forskningsradet/exceleksport/ExcelEksportTest.java --path felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelarkData.java --path felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksporterer.java --path felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksport.java --path felles/src/test/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksportTest.java --path grupperegister/grupperegister-domene/src/main/java/no/forskningsradet/ebehandling/grupperegister/gruppe/medlemskap/ExcelEksport.java --path grupperegister/grupperegister-domene/src/test/java/no/forskningsradet/ebehandling/grupperegister/gruppe/medlemskap/ExcelEksportTest.java --path grupperegister/grupperegister-integrasjon/src/main/java/no/forskningsradet/ebehandling/grupperegister/export/ExcelEksporterer.java --force --replace-refs delete-no-add --prune-empty always
And here is the file used as input:
excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelarkData.java
excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelEksporterer.java
excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelEksport.java
excel-eksport/src/test/java/no/forskningsradet/exceleksport/ExcelEksportTest.java
felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelarkData.java
felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksporterer.java
felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksport.java
felles/src/test/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksportTest.java
grupperegister/grupperegister-domene/src/main/java/no/forskningsradet/ebehandling/grupperegister/gruppe/medlemskap/ExcelEksport.java
grupperegister/grupperegister-domene/src/test/java/no/forskningsradet/ebehandling/grupperegister/gruppe/medlemskap/ExcelEksportTest.java
grupperegister/grupperegister-integrasjon/src/main/java/no/forskningsradet/ebehandling/grupperegister/export/ExcelEksporterer.java
And this command:
git filter-repo --paths-from-file ../FILES.txt --force --replace-refs delete-no-add --prune-empty always
I'm observing the same as yesterday, the first leaves a completely empty repo, while the latter preserves the files in question.
Here is the output using --debug on the first command:
Namespace(analyze=False, blob_callback=None, commit_callback=None, debug=True, dry_run=False, email_callback=None, filename_callback=None, force=True, help=False, inclusive=True, mailmap=None, max_blob_size=0, message_callback=None, name_callback=None, no_ff=False, partial=False, path_changes=[('filter', 'match', b'excel-eksport/pom.xml --path excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelarkData.java --path excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelEksporterer.java --path excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelEksport.java --path excel-eksport/src/test/java/no/forskningsradet/exceleksport/ExcelEksportTest.java --path felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelarkData.java --path felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksporterer.java --path felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksport.java --path felles/src/test/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksportTest.java --path grupperegister/grupperegister-domene/src/main/java/no/forskningsradet/ebehandling/grupperegister/gruppe/medlemskap/ExcelEksport.java --path grupperegister/grupperegister-domene/src/test/java/no/forskningsradet/ebehandling/grupperegister/gruppe/medlemskap/ExcelEksportTest.java --path grupperegister/grupperegister-integrasjon/src/main/java/no/forskningsradet/ebehandling/grupperegister/export/ExcelEksporterer.java')], preserve_commit_encoding=False, preserve_commit_hashes=False, prune_degenerate='auto', prune_empty='always', quiet=False, refname_callback=None, refs=['--all'], replace_refs='delete-no-add', replace_text=None, reset_callback=None, source=None, state_branch=None, stdin=False, strip_blobs_with_ids=set(), subdirectory_filter=None, tag_callback=None, tag_rename=None, target=None, to_subdirectory_filter=None, use_base_name=False, version=False)
[DEBUG] Migrating refs/remotes/origin/* -> refs/heads/*
[DEBUG] Removing 'origin' remote (rewritten history will no
longer be related; consider re-pushing it elsewhere.
[DEBUG] Running: git fast-export --show-original-ids --signed-tags=strip --tag-of-filtered-object=rewrite --fake-missing-tagger --reference-excluded-parents --no-data --use-done-feature --mark-tags --reencode=yes --all
(saving a copy of the output at .git/filter-repo/fast-export.original)
[DEBUG] Running: git -c core.ignorecase=false fast-import --force --quiet
(using the following file as input: .git/filter-repo/fast-export.filtered)
Parsed 34362 commits
New history written in 3.77 seconds; now repacking/cleaning...
[DEBUG] Running: git reset --hard
[DEBUG] Running: git reflog expire --expire=now --all
[DEBUG] Running: git gc --prune=now
Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), done.
Total 1 (delta 0), reused 1 (delta 0)
Completely finished after 4.12 seconds.
It seems something is off with the argument matching: it reads only the first --path and treats everything that comes after it as that option's value.
I tried removing the space after the first --path, and suddenly it works. I think it might have something to do with me trying to solve everything in a fancy one-liner:
find excel-eksport -type f | xargs -n1 git log --format="" --name-only --follow -p -- | sort | uniq | paste -sd "~" - | sed 's/~/ --path /g' | xargs -I{} git filter-repo --debug --path {} --force --replace-refs delete-no-add --prune-empty always
The full command using --path above is just this with echo in front of git filter-repo.
It might be xargs doing something strange, but the weird part is that it fails even when I copy the echo output. I can't seem to find anything wrong with any character either.
from git-filter-repo.
Your pipeline doesn't do what you think it does. Let me demonstrate, using the following simple Python program named arguments.py:
#!/usr/bin/env python3
import sys
print("Arguments (1 per line):\n " + "\n ".join(sys.argv))
Then it'll report output like the following:
$ ./arguments.py foo bar 'quoted argument' baz
Arguments (1 per line):
./arguments.py
foo
bar
quoted argument
baz
Now we can use this to see how xargs will behave:
$ echo -e "first line\nsecond line\nthird line has more text" | xargs -I{} ./arguments.py --path {}
Arguments (1 per line):
./arguments.py
--path
first line
Arguments (1 per line):
./arguments.py
--path
second line
Arguments (1 per line):
./arguments.py
--path
third line has more text
Note that arguments.py was invoked three times, and that it didn't separate "first" and "line" into separate arguments.
Of course, this isn't quite like your example, so let's make one that's a little more like your pipeline:
$ echo -e "file1\nfile2\nfile3" | paste -sd "~" - | sed 's/~/ --path /g' | xargs -I{} ./arguments.py git filter-repo --debug --path {} --force --replace-refs delete-no-add --prune-empty always
Arguments (1 per line):
./arguments.py
git
filter-repo
--debug
--path
file1 --path file2 --path file3
--force
--replace-refs
delete-no-add
--prune-empty
always
So, this would clearly attempt to only select a file named something like 'file1 --path file2 --path file3', which of course doesn't exist.
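The same effect can be reproduced without xargs. The sketch below uses a hypothetical `parse_paths` helper built on argparse with a repeatable `--path` option. That helper is an assumption made for illustration and is not filter-repo's real parser. It compares passing the joined text as one argument (what `xargs -I{}` does) with splitting it into words first (what a shell does with unquoted text).

```python
import argparse
import shlex

def parse_paths(argv):
    """Hypothetical parser: collect every --path value into a list."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--path", action="append", default=[])
    return parser.parse_args(argv).path

joined = "file1 --path file2 --path file3"

# xargs -I{} substitutes the whole input line as a single argument.
as_one_argument = parse_paths(["--path", joined])

# A shell would split the unquoted text into separate words first.
as_split_words = parse_paths(shlex.split("--path " + joined))

print(as_one_argument)
print(as_split_words)
```

In the first case the parser sees a single path containing spaces and literal `--path` text; in the second it sees three separate paths.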
There's another hazard too: xargs without -n, and with input that doesn't combine spaces and newlines, will usually result in just one program invocation. However, depending on operating system command-line length limits and the like, it may instead invoke the program multiple times, each time with a subset of the arguments. If that happens, the first run deletes everything but the first N files, the second run deletes everything but the second N files, and so on. Since the first N files are disjoint from the second N files, you get an empty repository after the second filtering operation.
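A toy model of that hazard, using plain Python sets rather than a real repository (the file names and the `keep_only` helper are made up for illustration):

```python
def keep_only(repo_files, wanted):
    """Model one filtering run: every file not listed is dropped."""
    return repo_files & set(wanted)

repo = {"a.txt", "b.txt", "c.txt", "d.txt", "unrelated.txt"}
wanted = ["a.txt", "b.txt", "c.txt", "d.txt"]

# One invocation that receives all of the wanted paths.
one_run = keep_only(repo, wanted)

# Two invocations, as if the arguments had been split into two batches.
after_first = keep_only(repo, wanted[:2])
after_second = keep_only(after_first, wanted[2:])

print(sorted(one_run))
print(sorted(after_second))
```

The single run keeps all four wanted files, while the batched runs leave nothing, because the second batch shares no paths with what survived the first.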
So, this doesn't look like a filter-repo issue, but a shell pipeline issue.
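One way to sidestep shell word-splitting altogether is to write the paths to a file and build the command as an argument list. The following is only a sketch: `build_filter_command` is a hypothetical helper, the two paths are taken from the commands earlier in this thread, and the temporary FILES.txt location is an assumption. The command is constructed but deliberately not executed here.

```python
import os
import tempfile

def build_filter_command(paths, paths_file):
    """Write one path per line, then return the argv list for filter-repo."""
    with open(paths_file, "w", encoding="utf-8") as handle:
        for path in paths:
            handle.write(path + "\n")
    # Flags copied from the commands quoted earlier in the thread.
    return ["git", "filter-repo", "--paths-from-file", paths_file,
            "--force", "--replace-refs", "delete-no-add",
            "--prune-empty", "always"]

paths = [
    "excel-eksport/pom.xml",
    "excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelEksport.java",
]
paths_file = os.path.join(tempfile.mkdtemp(), "FILES.txt")
command = build_filter_command(paths, paths_file)
print(command)
```

Passing a list like this to `subprocess.run(command, check=True)` inside a throw-away clone would hand each element to git as exactly one argument, so no path can be merged with its neighbours.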
One quick question, though: Any reason you are repeatedly calling git log --follow ... instead of running git filter-repo --analyze and then making use of the .git/filter-repo/analysis/renames.txt file?
from git-filter-repo.
I'll go ahead and close this out. Hopefully what I wrote made sense and lets you fix up your xargs pipeline. Thanks for trying out filter-repo and providing feedback!
from git-filter-repo.
Thanks for your reply.
I was not aware of analyze and renames.txt, will try to incorporate it in my script 👍
from git-filter-repo.