

newren commented on July 18, 2024

Hi Andrea,

I can see how users might be surprised by this, but there's not actually a bug here. If you have a codebase with a lot of branches and/or tags, odds are that any given commit is part of several branches or tags. For example, with the git.git repository, if I pick some old commit I authored I can see which branches and tags it is part of:

$ git branch --contains 784f8affe4dfc8ceec93803d6c22b4b8467a4642
  source-in-pretty
  rename-perf
  big-repo-small-cherry-pick
  conflict-handling-df
  pu
  rebase-use-sequencer-by-default
  checkout-m-dirty-index
  smarter-sparse-checkout
  dir-rename-fixes
  next
  remove-filter-branch
  simple/typo-fixes
  rebase-fixes
* master

And checking which tags contain that commit has far more answers:

$ git tag --contains 784f8affe4dfc8ceec93803d6c22b4b8467a4642 | wc -l
486
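The `--contains` queries above boil down to reachability: a ref "contains" a commit when that commit can be reached from the ref's tip by following parent links. Here is a minimal Python sketch of that idea on a made-up five-commit history; the functions `reachable` and `refs_containing` are illustrative only, and git itself uses its own optimized code for this.

```python
from collections import deque

def reachable(dag, start):
    """Return every commit reachable from `start` by following parent links."""
    seen = {start}
    queue = deque([start])
    while queue:
        for parent in dag[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def refs_containing(dag, refs, commit):
    """Mimic `git branch/tag --contains`: refs whose tip can reach `commit`."""
    return sorted(name for name, tip in refs.items() if commit in reachable(dag, tip))

# Toy history: each commit maps to its list of parents (hypothetical data).
dag = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["B"]}
refs = {"refs/heads/master": "D", "refs/heads/topic": "E", "refs/tags/v1": "C"}
print(refs_containing(dag, refs, "B"))  # ['refs/heads/master', 'refs/heads/topic', 'refs/tags/v1']
print(refs_containing(dag, refs, "E"))  # ['refs/heads/topic']
```

Commit B is "on" three refs at once, none of which records where B was first made.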

It would be correct to say that this commit was part of any of those 14 branches or any of those 486 tags even though most of them did not exist at the time the commit was created (and it would not be at all surprising if the branch I was on at the time I created that commit no longer exists; in such a case, it would be impossible for fast-export to even determine the "original" branch name). Personally, I think it was a design mistake for the fast-import format to require that a reference be specified with each commit; that was totally unnecessary[1] and just leads to confusion. But, since it does require it, fast-export has to pick something, so it picks whatever branch or tag the revision parsing machinery happens to first reach that commit with. And since filter-repo's --commit-callback just tries to expose the internals of how fast-export/fast-import are working to allow people to edit, it naturally shows up with whatever fast-export picked. But the branch name isn't really relevant because in the end the branches and tags end up pointing to the right commits.
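The "whichever ref reaches it first" behavior can be illustrated with a small sketch: walk the refs in some order and let the first ref that reaches a commit claim it. This is a simplified stand-in, not fast-export's actual traversal; `label_by_first_ref` is a hypothetical name, and the only point is that the resulting label depends on walk order, not on where the commit was created.

```python
def label_by_first_ref(dag, refs):
    """Assign each commit the first ref (in list order) whose walk reaches it.

    `dag` maps commit -> list of parents; `refs` is an ordered list of
    (ref_name, tip_commit) pairs.
    """
    labels = {}
    for name, tip in refs:
        stack = [tip]
        while stack:
            commit = stack.pop()
            if commit in labels:
                continue          # already claimed by an earlier ref
            labels[commit] = name
            stack.extend(dag[commit])
    return labels

dag = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
one_order = [("refs/tags/v1", "C"), ("refs/heads/master", "D")]
other_order = [("refs/heads/master", "D"), ("refs/tags/v1", "C")]
print(label_by_first_ref(dag, one_order)["A"])    # refs/tags/v1
print(label_by_first_ref(dag, other_order)["A"])  # refs/heads/master
```

Same history, same final refs, different label for commit A.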

Not much I can do other than invent a time machine to go back and remove the design misfeature of fast-import from over a decade ago, but the format has strong backward compatibility guarantees so it can't really be changed at this point.

Does my explanation help? Did I miss anything you were trying to ask or understand?

[1] Technically, the format has redundant information between the "from/merge" directives and the branch name when you have a single-parent history, a fact they exploited to allow people to omit the "from/merge" directive to make it marginally easier to import such single-parent histories. But that added another layer of confusion: people reading the streams have a harder time figuring out what's going on, and people trying to deal with histories that have multiple root commits have a horrendously difficult time until they learn the special magical syntax to avoid the special treatment of parentless commits with an existing branch name. Also, that one lame use case means we'll be stuck with people being surprised by the issue you raised for the rest of eternity. We could have prevented all three problems by just always requiring that parents be specified and leaving the reference name out of commit objects entirely. Then people could set the branch and tag locations at the end with reset/tag directives. The format would have been SO much better this way. But I learned all that well over a decade too late to provide that feedback before the format was set in stone...
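The redundancy described in the footnote can be modeled in a few lines. This is a toy simulation of the documented fast-import rule that an omitted `from` makes the ref's current tip the new commit's first parent; the tuple encoding and the name `infer_parents` are invented for illustration and are not real stream syntax. It reproduces the multiple-root-commit trap, and shows that a `reset` of the ref before the second root commit gives it no parent.

```python
def infer_parents(stream):
    """Toy model of how fast-import picks a commit's first parent.

    `stream` is a list of tuples (hypothetical encoding):
      ("commit", ref, mark, from_mark_or_None)
      ("reset", ref, from_mark_or_None)
    Returns {mark: first_parent_mark_or_None}.
    """
    tips = {}      # current tip of each ref, as tracked during the import
    parents = {}
    for cmd in stream:
        if cmd[0] == "reset":
            _, ref, from_mark = cmd
            tips[ref] = from_mark     # reset without 'from' leaves no tip
        else:
            _, ref, mark, from_mark = cmd
            # No 'from' means: the parent is whatever this ref points at now.
            parents[mark] = from_mark if from_mark is not None else tips.get(ref)
            tips[ref] = mark
    return parents

master = "refs/heads/master"
# Two unrelated root commits, both emitted on the same ref without 'from'.
naive = [("commit", master, ":1", None), ("commit", master, ":2", None)]
print(infer_parents(naive))  # {':1': None, ':2': ':1'}  (:2 is NOT a root)
fixed = [("commit", master, ":1", None), ("reset", master, None),
         ("commit", master, ":2", None)]
print(infer_parents(fixed))  # {':1': None, ':2': None}
```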

from git-filter-repo.

puntopaz commented on July 18, 2024

Hi newren,

Thank you for the detailed reply.
Your explanation was crystal clear and really helped me to understand why this happens.
It also matches what I had already figured out (the first part, about one commit being part of multiple branches/tags).

Unfortunately, it does not help with my problem (creating a separate repo from a subfolder of my main repo while keeping branches/tags and history).
Basically, I need the information "in which branch was this commit originally made".
That way I can replicate the commit in the separate repo and then use the hash of the new commit as the reference for a submodule in the main repo.
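For the submodule step, what is actually needed is a mapping from each old commit to its rewritten counterpart rather than a branch name. git-filter-repo records such a mapping in `.git/filter-repo/commit-map`. The parser below is a sketch under stated assumptions (a one-line header, then `old new` pairs, with an all-zero new ID meaning the commit was pruned); check these against the file your git-filter-repo version writes. `parse_commit_map` is a hypothetical helper, not part of git-filter-repo.

```python
def parse_commit_map(text):
    """Return {old_id: new_id} from the text of a git-filter-repo commit-map.

    Assumptions (verify against your version): the first line is a header,
    every other line holds two whitespace-separated commit IDs, and an
    all-zero new ID marks a commit that was pruned away.
    """
    mapping = {}
    for line in text.splitlines()[1:]:   # skip the header line
        fields = line.split()
        if len(fields) != 2:
            continue                      # ignore blank or unexpected lines
        old, new = fields
        if set(new) != {"0"}:             # keep only commits that survived
            mapping[old] = new
    return mapping

# Made-up sample in the assumed layout (40-hex-digit SHA-1 IDs).
sample = "\n".join([
    "old" + " " * 38 + "new",
    "a" * 40 + " " + "b" * 40,
    "c" * 40 + " " + "0" * 40,
])
print(len(parse_commit_map(sample)))  # 1
```

In a real run you would pass it the contents of `.git/filter-repo/commit-map` from the split repo, look up the rewritten ID for a given main-repo commit, and record that ID as the submodule's gitlink, for example with `git update-index --cacheinfo 160000,<new-id>,<path>`.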

Do you know of a reliable way to get this information? I have tried various git commands, but so far I've had no luck. The best command I found is a variation of 'git log --graph'.

If you use the command:
git --no-pager log --graph --oneline --no-abbrev-commit --decorate=full --first-parent
then you get an output like:

* d8948bb270da5ebfe74fb347a24511f9531ee5b8 (HEAD -> refs/heads/master) rev5: Commit E
* 3e87df641ee663dc99577a07625852659204cfb2 rev4: Commit D
* 78b2fd2fc5d6f7ef09fda209631ba25f94f322cb (tag: refs/tags/TAG_XXX) rev3: Commit C
* 818c90f84eb9f06c04534e3fc00e4597d2c9fc00 rev2: Commit B
* 87d8fe34a1f712048a903cb1e614a7dc73c0e29e rev1: Commit A

so git 'knows' the original branch (in some way).

So why does 'git log' implicitly return the right ref, while 'fast-export' does not? (rhetorical question)
I really think git should return the same info regardless of the command you use.


newren commented on July 18, 2024

> Basically, I need the information "in which branch was this commit originally made".

It is not possible to get that information with git, except in an extraordinarily narrow circumstance. The only case where it could be possible is if reflogs are turned on and set to never expire in every copy of the repository that ever existed (reflogs are off by default in bare repositories, and even when they are on they expire by default after a certain amount of time), and no one ever deleted any branches (seems quite unlikely), and no one ever deleted any copy of the repository (also seems unlikely), and no one ever worked with a detached HEAD (because then there is no active branch), including never running a rebase (seems really unlikely), and people didn't generate the same commit on different branches (this is actually a pretty safe assumption, but I mention it for completeness; people can do that by sharing patches and commit info and then both entering it on different machines while on different branches), AND you have access to all the information in every single one of those copies and are willing to do a significant amount of legwork to sleuth through all the reflogs. If ALL of those conditions are met, then you can determine which branch name was the active one when each commit was originally created.
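To make the "legwork" concrete, here is a minimal sketch of one step of that sleuthing: scanning branch reflog text (the files under `.git/logs/refs/heads/`) for the entry where `git commit` moved a branch to a given commit. It assumes the usual reflog line layout (old ID, new ID, identity, timestamp, timezone, then a tab and a message) and that entries written by `git commit` have messages starting with "commit". It cannot see deleted branches, expired entries, other clones, or commits made on a detached HEAD, which is exactly why the answer is unreliable. `branches_that_created` is a hypothetical helper.

```python
def branches_that_created(reflogs, oid):
    """Return branches whose reflog shows `oid` arriving via `git commit`.

    `reflogs` maps a branch name to the text of its reflog file.
    """
    found = []
    for branch, text in reflogs.items():
        for line in text.splitlines():
            header, _, message = line.partition("\t")
            fields = header.split()
            # fields[0] is the old ID, fields[1] the new ID the ref moved to.
            if len(fields) >= 2 and fields[1] == oid and message.startswith("commit"):
                found.append(branch)
                break
    return found

# Made-up reflog contents for three branches of one repository.
A, B, Z = "1" * 40, "2" * 40, "0" * 40
reflogs = {
    "main": f"{Z} {A} Dev <dev@example.com> 1700000000 +0000\tcommit (initial): first\n",
    "topic": f"{A} {B} Dev <dev@example.com> 1700000100 +0000\tcommit: second\n",
    "copy": f"{Z} {B} Dev <dev@example.com> 1700000200 +0000\tbranch: Created from topic\n",
}
print(branches_that_created(reflogs, B))  # ['topic']
```

Even here "copy" points at the same commit; only the reflog message distinguishes it, and that message disappears as soon as the entry expires.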

> Do you know of a reliable way to get this information?

I can assure you that there isn't one. I still don't understand why you need it, though.

> I have tried various git commands, but so far I've had no luck. The best command I found is a variation of 'git log --graph'.
>
> If you use the command:
> git --no-pager log --graph --oneline --no-abbrev-commit --decorate=full --first-parent
> then you get an output like:
>
> * d8948bb270da5ebfe74fb347a24511f9531ee5b8 (HEAD -> refs/heads/master) rev5: Commit E
> * 3e87df641ee663dc99577a07625852659204cfb2 rev4: Commit D
> * 78b2fd2fc5d6f7ef09fda209631ba25f94f322cb (tag: refs/tags/TAG_XXX) rev3: Commit C
> * 818c90f84eb9f06c04534e3fc00e4597d2c9fc00 rev2: Commit B
> * 87d8fe34a1f712048a903cb1e614a7dc73c0e29e rev1: Commit A
>
> so git 'knows' the original branch (in some way).

This doesn't imply that git knows the original branch at all. All this output says is that master currently points to rev5, rev5 contains the other commits in its history, and TAG_XXX points to rev3.
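What `--decorate` shows can be reduced to a lookup from ref tips to commits. The sketch below uses a simplified output format and a hypothetical function name, `decorate`; notice that it has no input for "original branch" at all. The labels are fully determined by where refs point right now.

```python
def decorate(first_parent_chain, refs):
    """Render simplified `git log --oneline --decorate` lines.

    `refs` maps a ref name to the commit it currently points at. There is no
    "original branch" input: labels depend only on where refs point now.
    """
    lines = []
    for commit in first_parent_chain:
        names = sorted(name for name, tip in refs.items() if tip == commit)
        suffix = f" ({', '.join(names)})" if names else ""
        lines.append(f"* {commit}{suffix}")
    return lines

chain = ["rev5", "rev4", "rev3", "rev2", "rev1"]
refs = {"refs/heads/master": "rev5", "refs/tags/TAG_XXX": "rev3"}
for line in decorate(chain, refs):
    print(line)
```

Any of the histories described next would produce exactly the same refs, and therefore exactly the same output.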

rev2 could have been created on someone else's machine on a branch named "foo", then pushed to some other repo (or someone pulled directly from that machine) and possibly put on a branch named "bar". Then at some point someone merges it, as a fast-forward, and it becomes part of the "baz" branch. Then at some point someone decides that the "baz" branch looks like a better main branch than master, and forcibly resets master to point where "baz" does. And then they delete the "bar" and "baz" branches. That repo never even knew of the original "foo" branch that was on someone else's machine. I could easily come up with more complicated scenarios (even real-world ones), especially when we start involving more than one commit, and any of them would be consistent with the information shown by this git log.

> So why does 'git log' implicitly return the right ref, while 'fast-export' does not? (rhetorical question)
> I really think git should return the same info regardless of the command you use.

git log doesn't attempt to label all the commits; it only labels the commits that some ref (a branch, a tag, HEAD) currently points to. You can try to force it to label all commits, e.g.
git log --graph --oneline --all --source
When I do that in git-filter-repo, it labels all of the oldest commits with "refs/remotes/origin/rebase-i-autosquash-rebase-merges-fails", which, from a "which branch was this commit originally part of" perspective, is totally crazy. I created that branch more like a tag, to mark a place I wanted to test later, and never created any commits while I had that branch checked out. If you run the same git command on your repo, you will probably find that it labels rev1 and rev2 not with "refs/heads/master" but with "refs/tags/TAG_XXX". So, in that sense, log and fast-export are being consistent.


puntopaz commented on July 18, 2024

Hi newren,

When I started the repo splitter script I was just expecting all the 'commit.branch' info to be something like 'refs/heads/XXX', and that started this whole issue.
I wrote the whole 'create-branch-in-subrepo' part of the script based on that (wrong) assumption, but with your explanation I now understand better and should be able to rewrite and fix that part.
I hope to have the script working in a few days.
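Code that consumes `commit.branch` (in git-filter-repo this is the full ref name, as bytes) can avoid the `refs/heads/` assumption by classifying the ref explicitly. The helper `describe_ref` below is hypothetical, not part of git-filter-repo, and whatever it returns still only names the ref fast-export happened to use, not the branch the commit was created on.

```python
def describe_ref(ref: bytes):
    """Classify a full ref name such as git-filter-repo's commit.branch.

    Returns (kind, short_name) without assuming the ref is a branch.
    """
    prefixes = [
        (b"refs/heads/", "branch"),
        (b"refs/tags/", "tag"),
        (b"refs/remotes/", "remote-tracking branch"),
    ]
    for prefix, kind in prefixes:
        if ref.startswith(prefix):
            return kind, ref[len(prefix):].decode("utf-8", "replace")
    return "other", ref.decode("utf-8", "replace")

for ref in [b"refs/heads/master", b"refs/tags/TAG_XXX", b"refs/notes/commits"]:
    print(describe_ref(ref))
# ('branch', 'master')
# ('tag', 'TAG_XXX')
# ('other', 'refs/notes/commits')
```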

If you don't mind, I will post it here; you can then use it as a contribution if you like it.

Thank you again,
Andrea


newren commented on July 18, 2024

Hi Andrea,

Cool, glad to hear you seem to be on track. I'd be happy to take a look and perhaps even include it in contrib/.

I'll go ahead and close this one out, but feel free to post whatever you come up with if you want.

Cheers,
Elijah

