Comments (5)

newren commented on August 18, 2024

I'm glad --paths-from-file worked for you, that's great.

I'm very surprised by your claim, though; I've tested that multiple times. Are you sure you provided the same paths in both cases? What OS are you on? What do your command lines look like, and what are the contents of the --paths-from-file argument you used?

In particular, if I make a couple throw-away clones of git-filter-repo itself, and run

git filter-repo --debug --path Documentation --path release/

in one of them and run

git filter-repo --debug --paths-from-file <(echo -e "Documentation\nrelease/")

in the other, then after the rewrite the two give identical results: two directories, four total files, 17 commits, and HEAD pointing to commit af917c37559c33e7c09f40b45418b44bec02ac0b. Perhaps more interesting is that in the debug output, the first contains:

path_changes=[('filter', 'match', b'Documentation'), ('filter', 'match', b'release/')]

while the latter contains

path_changes=[['filter', 'match', b'Documentation'], ['filter', 'match', b'release/']]

which reflects the fact that --paths-from-file literally was just meant to set up data structures such that it looked like --path had been invoked multiple times -- though I botched it slightly using a list instead of a tuple. However, the later code simply indexed into the data it received and thus there was no difference.
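To illustrate why the list-versus-tuple difference is harmless, here is a minimal sketch. It is not filter-repo's actual code; `describe` is a hypothetical helper that, like the later code described above, only indexes into each entry, so both shapes produce the same result:

```python
# Hypothetical stand-in for code that consumes path_changes purely by index.
def describe(path_changes):
    # entry[0], entry[1], entry[2] work the same for tuples and lists.
    return [(entry[0], entry[1], entry[2]) for entry in path_changes]

# Shape produced by repeated --path flags (tuples), as in the debug output.
from_path_flags = [('filter', 'match', b'Documentation'),
                   ('filter', 'match', b'release/')]
# Shape produced by --paths-from-file (lists), as in the debug output.
from_paths_file = [['filter', 'match', b'Documentation'],
                   ['filter', 'match', b'release/']]

same = describe(from_path_flags) == describe(from_paths_file)
print(same)
```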

from git-filter-repo.

torbjornsk commented on August 18, 2024

Tried with a couple of clones of git-filter-repo, and there it behaves as expected.

Did another try with my project, also. Here is the command using --path:
git filter-repo --path excel-eksport/pom.xml --path excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelarkData.java --path excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelEksporterer.java --path excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelEksport.java --path excel-eksport/src/test/java/no/forskningsradet/exceleksport/ExcelEksportTest.java --path felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelarkData.java --path felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksporterer.java --path felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksport.java --path felles/src/test/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksportTest.java --path grupperegister/grupperegister-domene/src/main/java/no/forskningsradet/ebehandling/grupperegister/gruppe/medlemskap/ExcelEksport.java --path grupperegister/grupperegister-domene/src/test/java/no/forskningsradet/ebehandling/grupperegister/gruppe/medlemskap/ExcelEksportTest.java --path grupperegister/grupperegister-integrasjon/src/main/java/no/forskningsradet/ebehandling/grupperegister/export/ExcelEksporterer.java --force --replace-refs delete-no-add --prune-empty always

And here is the file used as input:

excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelarkData.java
excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelEksporterer.java
excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelEksport.java
excel-eksport/src/test/java/no/forskningsradet/exceleksport/ExcelEksportTest.java
felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelarkData.java
felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksporterer.java
felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksport.java
felles/src/test/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksportTest.java
grupperegister/grupperegister-domene/src/main/java/no/forskningsradet/ebehandling/grupperegister/gruppe/medlemskap/ExcelEksport.java
grupperegister/grupperegister-domene/src/test/java/no/forskningsradet/ebehandling/grupperegister/gruppe/medlemskap/ExcelEksportTest.java
grupperegister/grupperegister-integrasjon/src/main/java/no/forskningsradet/ebehandling/grupperegister/export/ExcelEksporterer.java

And this command:
git filter-repo --paths-from-file ../FILES.txt --force --replace-refs delete-no-add --prune-empty always

I'm observing the same as yesterday, the first leaves a completely empty repo, while the latter preserves the files in question.

Here is the output using --debug on the first command:

Namespace(analyze=False, blob_callback=None, commit_callback=None, debug=True, dry_run=False, email_callback=None, filename_callback=None, force=True, help=False, inclusive=True, mailmap=None, max_blob_size=0, message_callback=None, name_callback=None, no_ff=False, partial=False, path_changes=[('filter', 'match', b'excel-eksport/pom.xml --path excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelarkData.java --path excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelEksporterer.java --path excel-eksport/src/main/java/no/forskningsradet/exceleksport/ExcelEksport.java --path excel-eksport/src/test/java/no/forskningsradet/exceleksport/ExcelEksportTest.java --path felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelarkData.java --path felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksporterer.java --path felles/src/main/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksport.java --path felles/src/test/java/no/forskningsradet/ebehandling/exceleksport/ExcelEksportTest.java --path grupperegister/grupperegister-domene/src/main/java/no/forskningsradet/ebehandling/grupperegister/gruppe/medlemskap/ExcelEksport.java --path grupperegister/grupperegister-domene/src/test/java/no/forskningsradet/ebehandling/grupperegister/gruppe/medlemskap/ExcelEksportTest.java --path grupperegister/grupperegister-integrasjon/src/main/java/no/forskningsradet/ebehandling/grupperegister/export/ExcelEksporterer.java')], preserve_commit_encoding=False, preserve_commit_hashes=False, prune_degenerate='auto', prune_empty='always', quiet=False, refname_callback=None, refs=['--all'], replace_refs='delete-no-add', replace_text=None, reset_callback=None, source=None, state_branch=None, stdin=False, strip_blobs_with_ids=set(), subdirectory_filter=None, tag_callback=None, tag_rename=None, target=None, to_subdirectory_filter=None, use_base_name=False, version=False)
[DEBUG] Migrating refs/remotes/origin/* -> refs/heads/*
[DEBUG] Removing 'origin' remote (rewritten history will no 
        longer be related; consider re-pushing it elsewhere.
[DEBUG] Running: git fast-export --show-original-ids --signed-tags=strip --tag-of-filtered-object=rewrite --fake-missing-tagger --reference-excluded-parents --no-data --use-done-feature --mark-tags --reencode=yes --all
  (saving a copy of the output at .git/filter-repo/fast-export.original)
[DEBUG] Running: git -c core.ignorecase=false fast-import --force --quiet
  (using the following file as input: .git/filter-repo/fast-export.filtered)
Parsed 34362 commits
New history written in 3.77 seconds; now repacking/cleaning...
[DEBUG] Running: git reset --hard
[DEBUG] Running: git reflog expire --expire=now --all
[DEBUG] Running: git gc --prune=now
Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), done.
Total 1 (delta 0), reused 1 (delta 0)
Completely finished after 4.12 seconds.

Seems something is off with the match: it just reads the first --path and treats everything that comes after as the value.

I tried removing the space after the first --path, and suddenly it works. Think it might have something to do with me trying to solve everything in a fancy one-liner:

find excel-eksport -type f | xargs -n1 git log --format="" --name-only --follow -p -- | sort | uniq | paste -sd "~" - | sed 's/~/ --path /g' | xargs -I{} git filter-repo --debug --path {} --force --replace-refs delete-no-add --prune-empty always

The full command using --path above is just this with echo in front of git filter-repo.

It might be xargs doing something strange, but the weird part is that it fails even when I copy the echo output. I can't seem to find anything wrong with any character either.


newren commented on August 18, 2024

Your pipeline doesn't do what you think it does. Let me demonstrate; with the following simple python program named arguments.py:

#!/usr/bin/env python3

import sys
print("Arguments (1 per line):\n  " + "\n  ".join(sys.argv))

Then it'll report output like the following:

$ ./arguments.py foo bar 'quoted argument' baz
Arguments (1 per line):
  ./arguments.py
  foo
  bar
  quoted argument
  baz

Now we can use this to see how xargs will behave:

$ echo -e "first line\nsecond line\nthird line has more text" | xargs -I{} ./arguments.py --path {}
Arguments (1 per line):
  ./arguments.py
  --path
  first line
Arguments (1 per line):
  ./arguments.py
  --path
  second line
Arguments (1 per line):
  ./arguments.py
  --path
  third line has more text

Note that arguments.py was invoked three times, and that it didn't separate "first" and "line" into separate arguments.

Of course, this isn't quite like your example, so let's make one that's a little more like your pipeline:

$ echo -e "file1\nfile2\nfile3" | paste -sd "~" - | sed 's/~/ --path /g' | xargs -I{} ./arguments.py git filter-repo --debug --path {} --force --replace-refs delete-no-add --prune-empty always
Arguments (1 per line):
  ./arguments.py
  git
  filter-repo
  --debug
  --path
  file1 --path file2 --path file3
  --force
  --replace-refs
  delete-no-add
  --prune-empty
  always

So, this would clearly attempt to only select a file named something like 'file1 --path file2 --path file3', which of course doesn't exist.
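The same effect can be modelled without a shell. The sketch below is only an approximation of `xargs -I{}` (it ignores xargs's quote handling, and `xargs_replace` is a hypothetical helper name): each input line triggers one invocation, and the whole line is substituted as a single argument.

```python
import shlex

def xargs_replace(template, input_text):
    """Approximate `xargs -I{}`: one invocation per non-empty input line,
    with the whole line substituted wherever {} appears in an argument."""
    invocations = []
    for line in input_text.splitlines():
        line = line.strip()
        if not line:
            continue
        invocations.append([arg.replace("{}", line) for arg in template])
    return invocations

template = ["git", "filter-repo", "--path", "{}", "--force"]
joined = "file1 --path file2 --path file3"  # what paste + sed produced

# The joined line lands in argv as ONE element, embedded spaces and all.
argv = xargs_replace(template, joined)[0]
print(argv)

# What was presumably intended: every path as its own --path argument.
intended = shlex.split("git filter-repo --path " + joined + " --force")
print(intended)
```

Comparing the two printed lists shows the difference: the first has a single path argument containing spaces, the second has three separate `--path` pairs.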

There's another hazard too: xargs without the -n and without a combination of spaces and newlines in the input will usually result in just one program invocation; however, depending on operating system command line length limits and such, it may instead invoke the program multiple times, each time with a subset of the arguments. If that happens, then the first time you delete everything but the first N files, the second time you delete everything but the second N files, etc. Since the first N files are disjoint from the second N files, you get an empty repository after the second filtering operation.
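A toy model of that split-invocation hazard, using plain Python sets in place of a repository (the file names and the `keep_only` helper are made up for illustration):

```python
def keep_only(repo_files, paths):
    """Model one filtering run: keep only the files matching the given paths."""
    return repo_files & set(paths)

repo = {"a.java", "b.java", "c.java", "d.java", "unrelated.txt"}
wanted = ["a.java", "b.java", "c.java", "d.java"]

# One invocation that receives every path keeps all the wanted files.
single_run = keep_only(repo, wanted)

# Two invocations, each receiving half of the paths, run back to back:
# the second run operates on what the first run left behind.
after_first = keep_only(repo, wanted[:2])
after_second = keep_only(after_first, wanted[2:])

print(sorted(single_run))
print(sorted(after_first))
print(sorted(after_second))
```

Because the two halves share no paths, the second run finds nothing to keep and the model repository ends up empty.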

So, this doesn't look like a filter-repo issue, but a shell pipeline issue.
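One way to sidestep the pipeline entirely is to write the paths to a file, one per line, and pass it via --paths-from-file, which is the form that worked earlier in this thread. A rough sketch, assuming a hypothetical helper `build_filter_command` and made-up example paths; it only builds the command and does not run it:

```python
import os
import tempfile

def build_filter_command(paths, paths_file):
    """Write one path per line (deduplicated, sorted) and return an argv
    list that hands the file to git filter-repo as a single invocation."""
    with open(paths_file, "w", encoding="utf-8") as fh:
        for path in sorted(set(paths)):
            fh.write(path + "\n")
    return ["git", "filter-repo", "--paths-from-file", paths_file, "--force"]

tmpdir = tempfile.mkdtemp()
paths_file = os.path.join(tmpdir, "FILES.txt")
command = build_filter_command(
    ["b/two.java", "a/one.java", "a/one.java"], paths_file)

with open(paths_file, encoding="utf-8") as fh:
    written = fh.read()

print(command)
print(written, end="")
```

Since each path is its own line in the file and the command is a list of separate arguments, no shell word splitting or xargs batching is involved. To actually run it, the list could be passed to `subprocess.run(command, check=True)` from inside a throw-away clone.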

One quick question, though: Any reason you are repeatedly calling git log --follow ... instead of running git filter-repo --analyze and then making use of the .git/filter-repo/analysis/renames.txt file?


newren commented on August 18, 2024

I'll go ahead and close this out. Hopefully what I wrote made sense and lets you fix up your xargs pipeline. Thanks for trying out filter-repo and providing feedback!


torbjornsk commented on August 18, 2024

Thanks for your reply.

I was not aware of analyze and renames.txt, will try to incorporate it in my script 👍

