Delete all git branches which don't add diff over master

Question

To take this question a level further: is there a way of deleting all branches that would have no diff if rebased on master?

While the answer to that question works for a standard merge workflow, it doesn't pick up branches that have been 'Squashed & Merged' upstream. Because those branches have been squashed before merging, they don't have the same commit hashes, and so git doesn't realize that they have 'effectively' been merged:

$ git branch feature -d
error: The branch 'feature' is not fully merged.
If you are sure you want to delete it, run 'git branch -D feature'.

But if we rebase the branch first, git deletes it without complaint:

$ git checkout feature
Switched to branch 'feature'
Your branch is based on 'origin/feature', but the upstream is gone.
  (use "git branch --unset-upstream" to fixup) 

$ git rebase develop
First, rewinding head to replay your work on top of it...

$ git checkout develop
Switched to branch 'develop'
Your branch is up-to-date with 'origin/develop'.

$ git branch -d feature
Deleted branch feature (was 72214c7).

So, is there a way of a) scanning the branches and checking which ones are safe to delete, and b) deleting those?

Maximilian · Accepted Answer

The git-delete-squashed tool offers a way of doing this: https://github.com/not-an-aardvark/git-delete-squashed

...using this bash script:

git checkout -q master && git for-each-ref refs/heads/ "--format=%(refname:short)" | while read branch; do mergeBase=$(git merge-base master $branch) && [[ $(git cherry master $(git commit-tree $(git rev-parse $branch^{tree}) -p $mergeBase -m _)) == "-"* ]] && git branch -D $branch; done
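The one-liner is dense, so here is a rough sketch of the same idea split into two steps: first scan, then delete. The function names is_squashed_into and list_squashed_branches are made up for illustration and are not part of git-delete-squashed. The sketch assumes that the base branch exists locally and, unlike the one-liner, it skips the base branch itself.

```shell
# Sketch only: function names are illustrative, not part of git-delete-squashed.

# Succeeds when the combined change on <branch> since its merge base with
# <base> already appears in <base> as one equivalent patch.
is_squashed_into() {
    sq_base=$1
    sq_branch=$2
    sq_merge_base=$(git merge-base "$sq_base" "$sq_branch") || return 1
    sq_tree=$(git rev-parse "$sq_branch^{tree}") || return 1
    # Throwaway commit: the branch's final tree, parented on the merge base.
    sq_commit=$(git commit-tree "$sq_tree" -p "$sq_merge_base" -m _) || return 1
    # git cherry marks a commit with "-" when <base> holds an equivalent patch.
    case $(git cherry "$sq_base" "$sq_commit") in
        "-"*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Prints every local branch other than <base> that passes the check above.
list_squashed_branches() {
    ls_base=$1
    git for-each-ref --format='%(refname:short)' refs/heads/ |
    while read -r ls_branch; do
        [ "$ls_branch" = "$ls_base" ] && continue
        is_squashed_into "$ls_base" "$ls_branch" && printf '%s\n' "$ls_branch"
    done
    return 0
}
```

Running list_squashed_branches master only prints names, so the list can be reviewed before each branch is passed to git branch -D.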

torek · Answer

You specifically mentioned using rebase, which makes the problem more difficult in the presence of squash "merges". There is an easier way to try it, which may suffice, or may not.

Let's take a look at a workflow that sometimes uses squash "merge"¹ instead of actual merges or rebasing, and maybe sometimes uses rebasing.

We start with some chain of commits in some repository, probably a centralized one on some big server we'll call origin:

...--A--B   <-- branch

If it's centralized on the server, we clone it and wind up calling this origin/branch in our own repository:

...--A--B   <-- origin/branch

Now we git checkout branch and start working, and we make some new commit(s) of our own:

...--A--B   <-- origin/branch
         \
          C--D   <-- branch

Maybe ours aren't enough, or we get feedback when we push commit-chain C--D and make a pull request out of it, so we add another commit or two:

...--A--B   <-- origin/branch
         \
          C--D--E   <-- branch

While all this is going on, the repository on origin is potentially also acquiring new commits, so that by the time we're really ready and have pushed C-D-E as a pull request, we might even have this on origin:

...--A--B--F   <-- branch

In any case, whoever controls origin (perhaps directly, through a clicky GUI interface on GitHub, or perhaps indirectly, through their own repository) eventually takes our C-D-E chain and puts it into the repository on origin. They do so by making a new, single commit, which we'll call CDE to show that it does what C-D-E does, and putting that into the sequence on origin, so that they now have:

...--A--B--CDE--F--G   <-- branch

or:

...--A--B--F--CDE--G   <-- branch

or similar.

We now git fetch this to bring our repository up to date, giving us:

...--A--B--F--CDE--G   <-- origin/branch
         \
          C--D--E   <-- branch

or similar.

On the other hand, maybe the keeper of origin keeps our individual commits but rebases them themselves, so that the upstream now has:

...--A--B--F--C'--D'--E'--G   <-- branch

and we wind up with:

...--A--B--F--C'--D'--E'--G   <-- origin/branch
          \
           C--D--E   <-- branch

¹I like to put quotes around "merge" for git merge --squash since this uses the merge machinery, but does not make a merge commit. Using git revert, git cherry-pick, and even git apply, we often wind up using the merge machinery, but people don't call those "merges"! There seems to be something about the fact that the top level command is spelled git merge --squash that leads people to call this "merging". Perhaps if the top level command were git gimme people would call this "gimming"? :-)

Since "squash" is a perfectly good verb of its own, though, I think it would be nice to just call this "squashing", and refer to these as commits that have been "squashed".

The ultimate goal

The goal here is to delete our branch branch if and only if there's some commit sequence CDE or C'-D'-E' or some such, in our upstream origin/branch, that means that our original C-D-E chain is no longer needed.

The problem is that we don't know what the person or people controlling the upstream have done, because they never told us. (How rude! :-) )

There are any number of things we can try.

Method 1: rebase

We could try just running git rebase, rebasing our branch onto our origin/branch. If—this is a big "if"—they, whoever they are, actually copied our C-D-E chain to a C'-D'-E' chain, our Git will probably² find that the upstream origin/branch has our three commits, and will therefore drop them from our rebase. If it does, we will get this:

...--A--B--F--C'--D'--E'--G   <-- branch, origin/branch

and we will know it is safe to delete our label branch. But if they squashed instead of copying our commits, our Git won't drop our C-D-E. It will instead try to apply them (with git apply or with git cherry-pick) one at a time. If we are lucky, it will discover that each one reduces to nothing at all, and after three manual "skip" steps we will get this:

...--A--B--F--CDE--G   <-- branch, origin/branch

If we are not lucky, we will get merge conflicts and have to live through Merge Hell until we realize that, oh hey, commit CDE equals the summary of our three commits and we should just drop them.

²Our Git will figure this out on its own if and only if the patch IDs (see the git patch-id documentation) match.
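Before actually rebasing, it is possible to ask Git which of our commits it would consider already copied. The following is a small sketch built on git cherry, which does the same patch-ID comparison; the function name all_commits_copied is made up for illustration. It only covers the copied (C'-D'-E') case and will report failure for a squashed upstream.

```shell
# Sketch only: all_commits_copied is an illustrative name.
# Usage: all_commits_copied <upstream> <branch>
# Succeeds when every commit on <branch> that is missing from <upstream>
# has a patch-equivalent commit there, so a rebase would drop them all.
all_commits_copied() {
    cp_out=$(git cherry "$1" "$2") || return 2
    # Lines starting with "+" are commits with no equivalent in <upstream>.
    if printf '%s\n' "$cp_out" | grep -q '^+'; then
        return 1
    fi
    return 0
}
```

For example, all_commits_copied origin/branch branch && git branch -D branch deletes the label only when nothing on it is unique by patch ID.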

Method 2: merge

We could try merging. This relies on merge bases and endpoints. The merge base of our branch and our origin/branch is commit B. Our Git will diff B vs E, and B vs G. Then our Git will try to combine the two sets of changes.

The resulting merged files will either match G, in which case everything in our C-D-E is already included, or will not, in which case ... well, there are two possibilities.

Maybe G deliberately undid something we did—perhaps G is a revert of D, for instance, or a partial revert. Let's say we added the line Woohoo! in the middle of file README.txt. Someone took it back out because it was inappropriate, so now README.txt in commit G matches README.txt in commit B. D'oh!

Well, when Git compares B vs G, it won't see the added line. When Git compares B vs E, it will see the added line. So Git will put it in, thinking we want it. Woohoo! But maybe we did not want it after all. D'oh!

All in all, though, it looks like merging is a better strategy than rebasing, because it handles both the squashed case (C-D-E in branch becomes CDE in origin/branch) and the rebased case (C-D-E in branch becomes C'-D'-E' in origin/branch). But it leaves us with an annoying merge commit. If ??? is the mystery sequence that may include CDE or C'-D'-E', we go from this:

...--A--B---???---G   <-- origin/branch
          \
           C--D--E   <-- branch

to this:

...--A--B---???---G   <-- origin/branch
          \        \
           C--D--E--M  <-- branch

which leaves us a merge commit M. We can now compare M vs G to see if there's an extra line in README.txt or whatever (woohoo!), and based on that, decide whether to delete branch. If the two match exactly, it's safe enough to delete branch, as long as we don't care about the precise details of the C-D-E sequence. If not, we must think about the difference.
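One way to run this comparison without leaving M on branch is to do the merge on a detached HEAD and throw it away afterwards. This is a sketch of that variation, not a different method: the function name merge_adds_nothing is made up, it merges in the opposite direction (our branch into the upstream tip), and it assumes a clean working tree.

```shell
# Sketch only: merge_adds_nothing is an illustrative name.
# Usage: merge_adds_nothing <upstream> <branch>
# Returns 0 when merging <branch> into <upstream> yields exactly the tree
# of <upstream>, 1 when the merge adds something, 2 on conflicts or errors.
# Assumes a clean working tree.
merge_adds_nothing() {
    mg_upstream=$1
    mg_branch=$2
    git checkout -q --detach "$mg_upstream" || return 2
    if git merge -q --no-edit "$mg_branch" >/dev/null 2>&1; then
        # HEAD is now M (or a fast-forward); compare it with the upstream tip.
        git diff --quiet "$mg_upstream" HEAD
        mg_result=$?
    else
        git merge --abort >/dev/null 2>&1
        mg_result=2
    fi
    git checkout -q - || return 2   # go back to where we started
    return "$mg_result"
}
```

A result of 1 is the "Woohoo!" case: something in our branch is not in the upstream tip, and a human has to decide whether it still matters.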

Method 3: squash

Instead of making regular merge M, we could just squash. This uses the merge machinery in the same way, but then:

- Forces us to make the final commit, as if we had run git merge --no-commit.
- Makes that last commit, once we run git commit, as a regular, non-merge commit.

That is, we get the exact same tree as with Method 2 (merge), but a different commit graph:

...--A--B---???---G   <-- origin/branch
          \
           C--D--E--S  <-- branch

As before, we just want to git diff the two commits—the two branch tips—and see if there is extra stuff like our README.txt change (now we must think about it) or not (now we can safely delete branch).
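Since git merge --squash stops before committing, the comparison can also be done against the staged result, without ever creating S. A minimal sketch follows, again on a detached HEAD at the upstream tip and assuming a clean working tree; the function name squash_adds_nothing is made up for illustration.

```shell
# Sketch only: squash_adds_nothing is an illustrative name.
# Usage: squash_adds_nothing <upstream> <branch>
# Returns 0 when squashing <branch> onto <upstream> stages no change,
# 1 when it stages something, 2 on conflicts or errors.
# Assumes a clean working tree, because the trial squash is discarded
# with git reset --hard.
squash_adds_nothing() {
    ts_upstream=$1
    ts_branch=$2
    git checkout -q --detach "$ts_upstream" || return 2
    if git merge -q --squash "$ts_branch" >/dev/null 2>&1; then
        # Nothing is committed yet: the staged result plays the role of S.
        git diff --cached --quiet HEAD
        ts_result=$?
    else
        ts_result=2
    fi
    git reset -q --hard HEAD        # discard the trial squash
    git checkout -q - || return 2   # go back to where we started
    return "$ts_result"
}
```

When the function returns 1, running git merge --squash by hand and inspecting git diff --cached shows exactly what the branch would still add.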

Conclusion

There's no particular reason to prefer any of these, except that the merge or squash method makes everything happen in one step, and works when the people running the upstream repository also squashed. Use whatever works best for you.

None will make the entire problem go away, because, well, "Woohoo! D'oh!" What will make the whole problem go away is if your upstream people, the ones running the repository on origin, tell you when they have copied or squashed your commits (assuming, of course, you trust / can believe them).
