Comments (23)

glensc commented on July 18, 2024

for now, I've found that I can recover the notes from the replace refs:

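# Each replace ref maps an old object id (the ref name) to its rewritten id (the ref target).
git for-each-ref refs/replace/ --format='%(objectname) %(objecttype) %(refname:lstrip=2)' \
| while read -r new type old; do
    # Only commits are expected here; stop on anything else.
    if [ "$type" != "commit" ]; then
        exit 1
    fi

    # Move the note from the old commit id to the new one.
    git notes copy "$old" "$new"
    git notes remove "$old"
done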

from git-filter-repo.

woopla commented on July 18, 2024

@glensc nice suggestion! I used the following to go from 24h to 1h runtime:

tail -n +2 .git/filter-repo/commit-map | git notes --ref=cvs copy --force --stdin
tail -n +2 .git/filter-repo/commit-map | awk '{print $1}' | git notes --ref=cvs remove --ignore-missing --stdin

Some of that time might actually be spent printing the 'removing note for object' messages, which could be avoided by redirecting the output to /dev/null.
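A minimal sketch of that redirect, assuming the commit-map layout the commands above rely on (one header line, then `old new` pairs). `commit_map_pairs` and `remap_notes_quietly` are hypothetical helper names. I believe the per-object messages go to stderr, but both streams of the remove step are discarded here to be safe, which also hides real errors from that step. Unlike the commands above, the sketch skips ids that did not change, so notes on untouched commits are not removed.

```shell
#!/bin/sh
# Print the "old new" pairs of a filter-repo commit-map, skipping the header line.
commit_map_pairs() {
    tail -n +2 "$1"
}

# Hypothetical helper: move notes from old to new commit ids in one notes ref.
# Usage: remap_notes_quietly <commit-map> [<notes-ref>]
remap_notes_quietly() {
    map=$1
    ref=${2:-commits}
    commit_map_pairs "$map" |
        git notes --ref="$ref" copy --force --stdin || return 1
    # Only remove notes for ids that actually changed, and discard the
    # per-object messages (this also hides errors from the remove step).
    commit_map_pairs "$map" | awk '$1 != $2 {print $1}' |
        git notes --ref="$ref" remove --ignore-missing --stdin >/dev/null 2>&1
}
```

For the case above this might be called as `remap_notes_quietly .git/filter-repo/commit-map cvs`.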

uqs commented on July 18, 2024

Hi folks, I'm investigating the use of this, but for a repo with about 350k commits that have their notes replaced, the workaround is simply too slow. I reckon it would take about 24h to complete.

ymartin59 commented on July 18, 2024

I am having trouble with the latest git filter-repo and git 2.29.
A simple "--path" run has discarded refs/notes/commits and its related tree.
Is there a reason for this behaviour, and is there a workaround?

glensc commented on July 18, 2024

You can add this to your global git config. I believe it fetches notes and leaves the clone "pristine".

Run git config -e --global and insert this:

[remote "origin"]
    fetch = +refs/notes/*:refs/notes/*
    fetch = +refs/pull/*/head:refs/remotes/origin/pr/*
    fetch = +refs/merge-requests/*/head:refs/remotes/origin/mr/*

As an aside, how does git-filter-repo decide whether a repo is clean? Does it inspect the reflog? Perhaps one could do the reverse and clear the reflog?

newren commented on July 18, 2024

Yeah, notes are a bit of a problem; they really should be handled differently as special objects because they have paths that look like hashes with a semi-random number of slashes inserted to fan them out into directories.
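To make that path layout concrete: stripping the slashes from a path in a notes tree gives back the id of the annotated object. The sketch below assumes 40-character SHA-1 ids; `note_path_to_object` is a hypothetical helper, not something filter-repo provides.

```shell
#!/bin/sh
# Hypothetical helper: map a path inside a notes tree (for example
# "20/e6/0aa8...") back to the object id it annotates.
# Prints the id and returns 0 if the path looks like a fanned-out SHA-1,
# otherwise prints nothing and returns 1.
note_path_to_object() {
    id=$(printf '%s' "$1" | tr -d '/')
    case $id in
        *[!0-9a-f]*) return 1 ;;
    esac
    [ "${#id}" -eq 40 ] || return 1
    printf '%s\n' "$id"
}
```

Feeding it the output of `git ls-tree -r --name-only refs/notes/commits` line by line would list the objects that currently carry a note.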

I suspect the easiest workaround to the --path issue would be to (1) do the filtering as normal, (2) from the filtered repository, manually fetch the notes from the original repository (git fetch original_url refs/notes/*:refs/notes/*), and finally (3) use the workarounds suggested above by @glensc or @woopla. Step 2 should be safe from the normal risks of mixing old and new history because notes have a completely independent commit history from normal commits; the only tie between notes and the normal history is that the filenames in the notes history refer to commits from the normal history.
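Steps 2 and 3 might be scripted roughly as follows. This is a sketch under assumptions rather than a tested recipe for every repository: it is meant to be run inside the already-filtered repository, it only handles the default notes ref, it reads the commit-map that filter-repo leaves behind, and it assumes pruned commits are recorded there with an all-zero new id. `restore_notes` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper, run from inside the filtered repository.
# Usage: restore_notes <original-repo-url-or-path> [<commit-map>]
restore_notes() {
    original=$1
    map=${2:-.git/filter-repo/commit-map}

    # Step 2: bring the unfiltered notes history over from the original repository.
    git fetch "$original" 'refs/notes/*:refs/notes/*' || return 1

    # Step 3: copy each note to the rewritten commit id. The first line of the
    # commit-map is a header; entries with an all-zero new id are skipped
    # (assumed to mark pruned commits).
    tail -n +2 "$map" | awk '$2 !~ /^0+$/' |
        git notes copy --force --stdin || return 1

    # Remove notes still attached to old ids, but only where the id changed.
    tail -n +2 "$map" | awk '$1 != $2 {print $1}' |
        git notes remove --ignore-missing --stdin
}
```

A call could look like `restore_notes ../original-repo`; it is probably wise to try this on a throwaway clone first.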

Sorry for taking so long to get back to this issue; the merge machinery, rename detection, directory traversal work, etc. have all taken way more time than expected.

newren commented on July 18, 2024

Right, the notes aren't discarded, they simply continue to refer to the old commit ids, i.e. they are not connected to the rewritten history. This is a known issue that I had been tracking in my personal todo list, which I probably should have opened a ticket for. Thanks for opening it, and for providing a workaround for others.

Challenges here (mostly notes to self):

  • Because of the way notes are internally structured, they will likely be emitted by fast-export before the commits that they annotate, meaning filter-repo will not have sufficient information to rewrite them to the new commit id at the time we are processing the note.
  • When I queried some of the other git developers about making fast-export do two passes over the refs, doing notes later, Peff made a few alternate suggestions:
    • Re-use the init_display_notes()/format_display_notes() machinery that the revision walking already uses
    • Add new export and import types to fast-export/fast-import specifically for notes
  • Unfortunately, format_display_notes() assumes the exact format that would be used by log, which doesn't match what fast-export/fast-import would need to use; it doesn't provide generic callbacks for formatting it some other way.
  • The logic that handles information such as the author/committer of the note is buried a few levels deep; trying to pipe that out to upper levels will require a bit of rework.

Additionally:

  • Usually all people care about with a note is the contents of the blobs, i.e. the notes they wrote for whatever commit. However, the notes are structured as full commit objects, recording the author and committer of the note, when the note was created, what notes existed before it, etc. This presents two issues: (1) Folks generally expect a fast-export | fast-import (without any filtering in the middle) to not change commit and reference ids unnecessarily, meaning we need to preserve author, committer, etc., (2) While git has made it be a bit more work to discover who authored various notes and when, people totally could be doing it, and we don't want to break them...suggesting we need to preserve author, committer, etc.
  • Having people specify author/committer etc. for each note seems like a bunch of overhead for the fast-import format, so maybe we should make it optional? But what should be used for author/committer if unspecified?
  • I could just implement a hack that moves notes to their new location, though that has a few downsides as well: (1) Invoking 2N git notes processes (where N is the number of notes) is going to be an annoying overhead, (2) This doesn't fix the fact that fast-export and fast-import don't provide the support for notes objects that we'd expect (i.e. it's making my tool better, but not making git better), (3) it wouldn't work well with things like a note-specific callback, which feels like something we should have, (4) it could get slightly confusing if our normal filters end up affecting the notes (e.g. if someone says to delete files matching [0-9a-f]{20,} they probably don't mean for notes to get deleted too).

Anyway, a proper fix probably requires a fair amount of git surgery to the notes, revision walking, fast-export, and fast-import areas of the code (and I'm not that familiar with the former two). And I'm not yet sure on the exact structure that should take.

galsi commented on July 18, 2024

> for now, I've found that I can recover the notes from the replace refs:
>
> git for-each-ref refs/replace/ --format='%(objectname) %(objecttype) %(refname:lstrip=2)' \
> | while read new type old; do
>     if [ "$type" != "commit" ]; then
>         exit 1
>     fi
>
>     git notes copy $old $new
>     git notes remove $old
> done

Hi, I am new to Git scripting. How do I ignore commits with no notes?

glensc commented on July 18, 2024

@galsi depending on the error you get, if you're using my script from above, you can just ignore the error from git notes copy and skip git notes remove:

git notes copy $old $new && git notes remove $old
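To skip commits without notes up front instead of ignoring errors, one option is to ask git which objects are annotated at all. `git notes list` prints one `<note object> <annotated object>` pair per line, so the replaced commits can be filtered against it. A sketch, with `pairs_with_notes` as a hypothetical helper name and input in `new type old` order as produced by the for-each-ref format in the script above:

```shell
#!/bin/sh
# Hypothetical helper: keep only the "new type old" lines whose old id has a note.
# $1: file with the output of `git notes list` (<note object> <annotated object>)
# $2: file with "new type old" lines from `git for-each-ref refs/replace/ ...`
pairs_with_notes() {
    awk 'NR == FNR { noted[$2] = 1; next }
         ($3 in noted)' "$1" "$2"
}

# Example wiring (not run here):
#   git notes list > /tmp/noted
#   git for-each-ref refs/replace/ \
#       --format='%(objectname) %(objecttype) %(refname:lstrip=2)' > /tmp/replaced
#   pairs_with_notes /tmp/noted /tmp/replaced |
#       while read -r new type old; do
#           git notes copy "$old" "$new" && git notes remove "$old"
#       done
```

This still runs two git processes per note, so it reduces the work to the commits that have notes but does not remove the per-commit overhead.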

galsi commented on July 18, 2024

> @galsi depending on the error you get, if you're using my script from above, you can just ignore the error from git notes copy and skip git notes remove:
>
> git notes copy $old $new && git notes remove $old

Hi, thanks for the quick answer. I am getting the following:

error: missing notes on source object 0023c2404229f355868857b3ae7bcdec33f7a9c6. Cannot copy.

Is there an option to select only the commits with notes? This could save time, as I am working on a repo with more than 125000 commits.

woopla commented on July 18, 2024

@uqs same here. It took around 24h for the same amount of commits, so your estimate is correct. I thought I'd be smart and xargs -P8 the whole thing, but there are locks in git notes add that make this a pain to handle...

glensc commented on July 18, 2024

Maybe a quicker solution (though still a workaround) would be for git-filter-repo to create a map file that could be applied in bulk, instead of two git command invocations for each replaced commit. 💸

newren commented on July 18, 2024

> Maybe a quicker solution (though still a workaround) would be for git-filter-repo to create a map file that could be applied in bulk, instead of two git command invocations for each replaced commit.

I started some work on an actual solution about a month ago, involving changes to both core git and git-filter-repo. The majority of the work is in core git, and the current work in progress can be seen on my fast-export-notes branch of the newren/git repo on github (and by work in progress I mean it won't work for anyone yet, it just has some useful changes). As filter-repo is almost completely maintained in my free time, I'll finish it when I finish it; probably in a month or two and make it part of the 2.29.0 release (i.e. not the upcoming 2.28.0 release).

That all said, git-filter-repo already creates a map file at the end of its run, and it recently became documented: see #117.

glensc commented on July 18, 2024

@ymartin59 did you even read the comments?

The second comment explains the recovery method:

* [#22 (comment)](https://github.com/newren/git-filter-repo/issues/22#issuecomment-558693518)

Fantabrain commented on July 18, 2024

> @ymartin59 did you even read the comments?
>
> the second comment explains the recovery method:
>
> * [#22 (comment)](https://github.com/newren/git-filter-repo/issues/22#issuecomment-558693518)

@glensc You've brushed off the complaint from @ymartin59 without understanding the issue.

Your workaround assumes that the notes are still present under refs/notes/commits, just targeting the wrong commit IDs. However, this will not be the case if filter-repo was run with --path to include only certain paths in the filtered repo. The reason is, a note is actually a commit object which contains the note as a file whose name is the commit ID. But such files will be excluded by --path and so the note commits will be discarded entirely.

I just ran git filter-repo --path someFile and there is literally no refs/notes/commits ref at all in the new repo.

Being that I have only a few notes, I thought I could correct this by redoing the filtration (from a new clone of the original) to include the relevant commit IDs as paths, such as:

git filter-repo --path 20e60aa8caf74c9ca4a4207ad7924f6aec0989b9 --path someFile

However, this for some reason did not work and there is still no refs/notes/commits chain in the new repo. [Edit: this was because git clone didn't even copy it; see next comment]

As such, there is no point in even running your script [where these issues are not handled first]; it will inevitably fail because the notes are actually totally gone from the repo.

I have no idea how to fix this, other than maybe filter paths by exclusion instead of inclusion, which is tedious when you're trying to make a new repo out of only one file from a parent repo with many files, deleted files, etc.

Fantabrain commented on July 18, 2024

Further to what I noted to @glensc about refs/notes/commits potentially being missing due to filter-repo being run with --path (and without --invert-paths), there is another reason the note commits may be totally missing.

git clone normally does not copy notes. It will if you use --mirror, but this also implies --bare, which is often not what you want. To overcome this, you can run the following before filter-repo:

git fetch origin refs/notes/*:refs/notes/*

Unfortunately, this causes the fresh clone check to fail, so then you have to use --force to filter-repo, which creates a potential for data loss from user error. I've noted this as a separate issue #254.

Once you have a clone that does contain the refs/notes/commits chain, then you can use the script from @glensc to transfer the notes onto the rewritten commits.

uqs commented on July 18, 2024

@Fantabrain the notes objects might be under a subdir hierarchy. Please check with git ls-tree refs/notes/commits, or just try all of the following paths.

20e60aa8caf74c9ca4a4207ad7924f6aec0989b9
20/e60aa8caf74c9ca4a4207ad7924f6aec0989b9
20/e6/0aa8caf74c9ca4a4207ad7924f6aec0989b9
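Trying all of those layouts by hand gets tedious, so the candidate paths for a commit id could be generated. A small sketch, assuming the two-hex-digits-per-directory fanout shown in the list above; `note_path_candidates` is a hypothetical helper, and checking with git ls-tree first is still the more reliable route:

```shell
#!/bin/sh
# Hypothetical helper: print the paths a note for the given object id could have
# at fanout levels 0, 1 and 2 (two hex digits per directory level).
note_path_candidates() {
    id=$1
    rest1=${id#??}          # id without its first two characters
    first=${id%"$rest1"}    # the first two characters
    rest2=${rest1#??}       # id without its first four characters
    second=${rest1%"$rest2"} # characters three and four
    printf '%s\n' "$id" "$first/$rest1" "$first/$second/$rest2"
}
```

Each printed line could then be passed to filter-repo as its own --path argument; whether that is enough to keep the notes history depends on the other issues discussed in this thread.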

Fantabrain commented on July 18, 2024

@uqs thank you for the suggestion. What I ended up doing was just getting a list of all files from filter-repo --analyze, removing from it the desired files including all the notes, and then passing it to filter-repo as the list of files to remove.

I did this before discovering that the clones didn't even contain refs/notes/commits, so I was already using subtractive path mode before anything would have worked. But I highly suspect that using this for each note would have also worked:

--path 20e60aa8caf74c9ca4a4207ad7924f6aec0989b9

I really don't think you'd need notation such as 20/e6/0aa8... or indeed that this would even work.

If you look at the paths in refs/notes/commits even just via gitk --all, you see they're just the 40 digit hashes at the root level. They don't appear to use the aa/bb/ccdd... scheme that the storage back-end uses to store objects physically. These are basically user file paths within the repo (stored the same way user files are stored), so that scheme probably has no benefit given how Git deals with such paths. That scheme is mainly intended to overcome limitations and inefficiencies of some physical filesystems, which wouldn't apply here.

uqs commented on July 18, 2024

That just means you don't have many notes objects. See this for what happens when you have 400k notes or so: https://cgit.freebsd.org/src/tree/?h=refs/notes/commits

The code that creates these additional levels is here: https://sourcegraph.com/github.com/git/git/-/blob/notes.c#L497-535

But I guess it's all moot anyway.

Fantabrain commented on July 18, 2024

@uqs Ah, ok, I would not have expected that, but I guess maybe Git's internal handling of repo paths does have its own bottlenecks on very large directories. Or else I'm not sure why they would have done that, given the added complexity. Since these paths are in permanent commits in refs/notes/commits, I don't even want to think about what happens when the "fanout" changes; either a commit that moves all the paths, or having to account for different notes being on different fanouts.

GrantEdwards commented on July 18, 2024

I've read through all of the comments, and I'm still stumped. I can't figure out if there is a solution to this problem or not. Much of it was a little over my head, so feel free to point me to the right documentation.

I'm doing a

git filter-repo --subdirectory-filter $Dir

That appears to work as expected:

+ git filter-repo --subdirectory-filter dirname1
Parsed 1201 commits
New history written in 0.91 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at b2da92c Remove library when cleaning
Enumerating objects: 251, done.
Counting objects: 100% (251/251), done.
Delta compression using up to 8 threads
Compressing objects: 100% (129/129), done.
Writing objects: 100% (251/251), done.
Total 251 (delta 122), reused 183 (delta 117), pack-reused 0
Completely finished after 1.07 seconds.

If I do a git log immediately before doing the filter-repo, every commit has a note as expected.

After the filter-repo, git log shows no notes.

I've tried the original work-around script, and all it does is print errors like this for every iteration of the loop:

error: missing notes on source object 8f278b90f20b19f40edc0a74fc7543757a89c622. Cannot copy.
Object 8f278b90f20b19f40edc0a74fc7543757a89c622 has no note

I've also tried this

git fetch $SrcRepo refs/notes/*:refs/notes/*

And that appears to work OK

+ git fetch ../orig.bare 'refs/notes/*:refs/notes/*'
remote: Enumerating objects: 2310, done.
remote: Counting objects: 100% (552/552), done.
remote: Compressing objects: 100% (414/414), done.
remote: Total 2310 (delta 285), reused 0 (delta 0), pack-reused 1758
Receiving objects: 100% (2310/2310), 221.52 KiB | 14.77 MiB/s, done.
Resolving deltas: 100% (1309/1309), done.
From ../apps-common.bare
 * [new ref]         refs/notes/commits -> refs/notes/commits

But there are still no notes when I do a git log, because I assume I'm missing the suggested step "(3) using the workarounds suggested above by @glensc or @woopla", but I don't know what workarounds are referred to in step 3.

GrantEdwards commented on July 18, 2024

> I've tried the original work-around script, and all it does is print errors like this for every iteration of the loop:
>
> [...]
>
> I've also tried this
>
> git fetch $SrcRepo refs/notes/*:refs/notes/*
>
> [...]
>
> But there are still no notes when I do a git log, because I assume I'm missing the suggested step "(3) using the workarounds suggested above by @glensc or @woopla", but I don't know what workarounds are referred to in step 3.

Of course minutes after posting, upon reading through the comments again, it became glaringly obvious that I needed to do both: fetch the refs/notes, then run the recovery script. I don't know how I didn't grok that the first couple times I read things...

hawicz commented on July 18, 2024

Fyi, here's an improvement to @woopla's commands from 2020, above, which avoids dropping notes for commits that didn't get re-written:

tail -n +2 filter-repo/commit-map | git notes copy --force --stdin
tail -n +2 filter-repo/commit-map | awk '$1 != $2 {print $1}' | git notes remove --ignore-missing --stdin
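Before running the bulk commands on a large repository, it can help to see how many commit-map entries fall into each case. The sketch below only reads the file; it assumes the layout the commands above rely on (a header line, then `old new` pairs), and it additionally assumes that pruned commits are recorded with an all-zero new id, which is worth checking against your own commit-map. `commit_map_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: classify commit-map entries without touching any notes.
# Usage: commit_map_summary <commit-map>
commit_map_summary() {
    tail -n +2 "$1" | awk '
        $2 ~ /^0+$/ { pruned++; next }      # assumed marker for pruned commits
        $1 == $2    { unchanged++; next }   # note can stay where it is
                    { rewritten++ }         # note has to move from $1 to $2
        END { printf "unchanged=%d rewritten=%d pruned=%d\n", unchanged, rewritten, pruned }'
}
```

If the `unchanged` count is large, the `$1 != $2` filter in the remove step above is what keeps those notes from being dropped.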
