I have two repository histories that were duplicated and mangled (through interaction with, and migration around, SVN; not my choice). I have both repositories as remotes in the same temporary maintenance repository. They share a few hundred commits' worth of history, and then the "old" one continues for a few dozen more commits on a few branches. I need to fast-forward the "new" tree up to the state of the old one. Because of the mangling, however, the shared commits are not recognised as the same history, despite having identical content.
I would like a way to tell git "These two commits are identical, despite having different authors" (author ID was confused in translation). If possible, I would then really like if it could traverse the two remote trees and make that association for every node with identical content. This would mean I could then manually mark "commit 1" on both, and have it do the rest. Otherwise I would need to manually mark the root of every divergence (wouldn't be too bad, but would prefer not to).
I tried using graft points, which is nearly what I want: gitk shows the history I expect, but when I pushed back to the main (new) repository, the push dragged along the couple of hundred duplicate commits. It's also a bit annoying to set up, since I have to do it for a not-yet-merged child node.
I found https://stackoverflow.com/a/973403/372757 and think that it will work: I will merely need to rebase the old commits onto the new repository, once for each branch.
Nonetheless, I would still like to know if my original request is possible.
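For reference, a rebase of that kind might look roughly like the sketch below. It builds a throwaway repository so it can be run as-is; the branch names `old` and `new`, the identities, and the file contents are all made up for illustration, and a real repository would use the remote-tracking branches instead.

```shell
# Throwaway repository standing in for the maintenance repo (all names are hypothetical).
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/old

# Helper: run git in the demo repo under a given (made-up) identity.
git_as() {
  who=$1; shift
  GIT_AUTHOR_NAME="$who" GIT_AUTHOR_EMAIL="$who@example.com" \
  GIT_COMMITTER_NAME="$who" GIT_COMMITTER_EMAIL="$who@example.com" \
  git -C "$repo" "$@"
}

# "old" history: one shared commit (with the mangled author) plus one extra commit.
echo one > "$repo/file.txt"
git_as mangled add file.txt
git_as mangled commit -q -m "shared commit"
shared_old=$(git -C "$repo" rev-parse HEAD)
echo two >> "$repo/file.txt"
git_as mangled commit -q -a -m "extra work only in old"

# "new" history: same content as the shared commit, but a different author,
# so it is a different commit object.
shared_new=$(git_as alice commit-tree -m "shared commit" "$shared_old^{tree}")
git -C "$repo" branch new "$shared_new"

# Replay everything after the shared commit on "old" onto the tip of "new".
git_as alice rebase -q --onto new "$shared_old" old

# "new" can now be fast-forwarded to the rebased branch.
git_as alice checkout -q new
git_as alice merge -q --ff-only old

parent=$(git -C "$repo" rev-parse 'old^')
```

After the rebase, the extra commit's parent is the shared commit from the "new" history, so only the genuinely new commits would need to be pushed.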
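Git has a pretty strict definition of what an "identical commit" is, and it probably doesn't match what you have in mind. For two commits to be identical, all of the following must be the same: the tree (the full content of every file), the parent commit(s), the author name, email, and timestamp, the committer name, email, and timestamp, and the commit message.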
All of these things are either directly or indirectly used in generating the SHA1 hash for the new commit, and thus a commit won't be identical unless it's truly identical.
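To see which fields feed into the hash, it can help to print a raw commit object. The sketch below uses a scratch repository with a made-up identity, dumps the commit with `git cat-file`, and then re-hashes those same bytes with `git hash-object` to show that the commit ID is derived from nothing else.

```shell
# Scratch repository with a single commit (identity and file are hypothetical).
repo=$(mktemp -d)
git init -q "$repo"
echo hello > "$repo/file.txt"
git -C "$repo" add file.txt
GIT_AUTHOR_NAME="Alice" GIT_AUTHOR_EMAIL="alice@example.com" \
GIT_COMMITTER_NAME="Alice" GIT_COMMITTER_EMAIL="alice@example.com" \
git -C "$repo" commit -q -m "Initial commit"

# The raw commit object: a tree line, parent lines (none for a root commit),
# author and committer lines, a blank line, then the message.
commit_body=$(git -C "$repo" cat-file -p HEAD)
printf '%s\n' "$commit_body"

# Hashing exactly those bytes as a commit object reproduces the commit ID.
head_id=$(git -C "$repo" rev-parse HEAD)
recomputed=$(git -C "$repo" cat-file commit HEAD | git -C "$repo" hash-object -t commit --stdin)
```

Changing any one of those lines, including just the author, yields a different ID, and every descendant commit changes with it because each one records its parent's ID.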
That said, and possibly more to the point of your question: when Git creates a new commit, any file or tree that is byte-for-byte identical to an object already in the database (because another commit had it in exactly the same state) is not stored again. The new commit simply points to the existing objects.
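This sharing also gives a rough way to pair up commits with identical content across two histories: compare their tree IDs. The sketch below is only an illustration under assumptions; the branch names `old` and `new`, the identities, and the scratch files are invented. It creates two commits that differ only in author and then joins the two histories on tree ID.

```shell
# Scratch area: a demo repository plus two temporary map files (names are hypothetical).
work=$(mktemp -d)
repo="$work/repo"
git init -q "$repo"
echo hello > "$repo/file.txt"
git -C "$repo" add file.txt
GIT_AUTHOR_NAME="mangled" GIT_AUTHOR_EMAIL="mangled@example.com" \
GIT_COMMITTER_NAME="mangled" GIT_COMMITTER_EMAIL="mangled@example.com" \
git -C "$repo" commit -q -m "Initial commit"
commit_a=$(git -C "$repo" rev-parse HEAD)

# A second commit built from the very same tree, but with a different author.
commit_b=$(GIT_AUTHOR_NAME="Alice" GIT_AUTHOR_EMAIL="alice@example.com" \
  GIT_COMMITTER_NAME="Alice" GIT_COMMITTER_EMAIL="alice@example.com" \
  git -C "$repo" commit-tree -m "Initial commit" "$commit_a^{tree}")
git -C "$repo" branch old "$commit_a"
git -C "$repo" branch new "$commit_b"

# Different commit IDs, one shared tree object.
tree_a=$(git -C "$repo" rev-parse "$commit_a^{tree}")
tree_b=$(git -C "$repo" rev-parse "$commit_b^{tree}")

# Pair commits from both histories by tree ID: "<tree> <old commit> <new commit>".
git -C "$repo" log --format='%T %H' old | sort > "$work/old.map"
git -C "$repo" log --format='%T %H' new | sort > "$work/new.map"
pairs=$(join "$work/old.map" "$work/new.map")
printf '%s\n' "$pairs"
```

Treat the pairing as a heuristic: two unrelated commits in one history can share a tree (a revert, for example, restores an earlier state), so any match is worth checking before relying on it.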
If only the author information differs between two branches, they are still different sequences of commits, even if the file contents match entirely. You can use git filter-branch or git rebase to rewrite a branch and fix the information as you go, but that produces a whole new set of commits. The trees and file objects can stay the same, provided you change nothing other than commit messages, times, or author/committer names. Note, however, that if other work (by you or others) is already based on the existing branch, such a rewrite can require a significant amount of cleanup.
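As a rough illustration of such a rewrite, the sketch below uses `git filter-branch --env-filter` on a scratch repository to replace a made-up import identity with a made-up real one. The names and email addresses are assumptions for the example. Recent Git versions print a warning recommending alternatives to filter-branch; the `FILTER_BRANCH_SQUELCH_WARNING` variable only silences it here.

```shell
# Scratch repository with one commit by a placeholder "import" identity (hypothetical).
repo=$(mktemp -d)
git init -q "$repo"
echo hello > "$repo/file.txt"
git -C "$repo" add file.txt
GIT_AUTHOR_NAME="svn-import" GIT_AUTHOR_EMAIL="svn-import@example.com" \
GIT_COMMITTER_NAME="svn-import" GIT_COMMITTER_EMAIL="svn-import@example.com" \
git -C "$repo" commit -q -m "Initial commit"

before_commit=$(git -C "$repo" rev-parse HEAD)
before_tree=$(git -C "$repo" rev-parse 'HEAD^{tree}')

# Rewrite the author on every commit reachable from HEAD that matches the old email.
GIT_COMMITTER_NAME="Alice" GIT_COMMITTER_EMAIL="alice@example.com" \
FILTER_BRANCH_SQUELCH_WARNING=1 \
git -C "$repo" filter-branch -f --env-filter '
  if [ "$GIT_AUTHOR_EMAIL" = "svn-import@example.com" ]; then
    GIT_AUTHOR_NAME="Alice"
    GIT_AUTHOR_EMAIL="alice@example.com"
    export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL
  fi
' -- HEAD >/dev/null 2>&1

after_commit=$(git -C "$repo" rev-parse HEAD)
after_tree=$(git -C "$repo" rev-parse 'HEAD^{tree}')
after_author=$(git -C "$repo" log -1 --format='%an <%ae>')
```

The commit ID changes while the tree ID stays the same, which is the behaviour described above: new commit objects, reused content objects.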