What problem does git subtree solve? When and why should I use that feature?
I've read that it is used for repository separation. But why would I not just create two independent repositories instead of sticking two unrelated ones into one?
This GitHub tutorial explains how to perform Git subtree merges.
I kind of know how to use it, but not when (use cases) and why, and how it relates to git submodule. I'd use submodules when I have a dependency on another project or library.
You should be careful to note explicitly what you are talking about when you use the term 'subtree' in the context of git as there are actually two separate but related topics here:  
git-subtree and git subtree merge strategy.
Both subtree-related concepts effectively allow you to manage multiple repositories in one. This contrasts with git-submodule, where only metadata is stored in the root repository (in the form of .gitmodules) and you must manage the external repositories separately.
git subtree merge strategy is basically the more manual method using the commands you referenced.
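As a rough sketch of that manual method, here is the usual sequence of plumbing commands run against two throwaway repositories. The repository layout, the remote name Bproject, the directory dir-B and the file names are all assumptions made up for illustration; the --allow-unrelated-histories, --no-rebase and --no-edit flags are only there because recent Git versions otherwise refuse the merge or stop to ask.

```shell
#!/bin/sh
set -e

# Identity for the throwaway commits made by this script.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)   # scratch directory, left in place for inspection

new_repo() {        # create an empty repository on branch "main"
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/main
}

# Hypothetical "other project" B.
new_repo "$work/B"
cd "$work/B"
echo v1 > lib.txt
echo "B readme" > README
git add .
git commit -q -m "B: initial"

# The super project that will carry B in a subdirectory.
new_repo "$work/super"
cd "$work/super"
echo app > app.txt
git add app.txt
git commit -q -m "super: initial"

# 1. Name the other project as a remote and fetch it.
git remote add -f Bproject "$work/B"
# 2. Record a merge of B's history without touching our files yet.
git merge -s ours --no-commit --allow-unrelated-histories Bproject/main
# 3. Read B's tree into the subdirectory dir-B/.
git read-tree --prefix=dir-B/ -u Bproject/main
# 4. Conclude the merge.
git commit -q -m "Merge B project as our subdirectory"

# Later: B moves on, and we pull the update with the subtree strategy.
cd "$work/B"
echo v2 > lib.txt
git commit -q -am "B: update"
cd "$work/super"
git pull -q --no-rebase --no-edit -s subtree Bproject main

cat dir-B/lib.txt
```

The point to notice is that every step is an ordinary Git command; nothing records that dir-B/ came from Bproject except the remote you added locally.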
git-subtree is a wrapper shell script to facilitate a more natural syntax. It is actually still part of contrib and not fully integrated into git with the usual man pages. The documentation is instead stored alongside the script.
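To give a feel for that syntax, here is a small sketch using add, pull and split against throwaway repositories. The paths, the prefix vendor/lib and the branch name lib-only are assumptions for illustration, and because the script lives in contrib it may not be installed with every Git distribution.

```shell
#!/bin/sh
set -e

# Identity for the throwaway commits made by this script.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)   # scratch directory, left in place for inspection

new_repo() {        # create an empty repository on branch "main"
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/main
}

# Hypothetical library that stands in for an upstream project.
new_repo "$work/lib"
cd "$work/lib"
echo v1 > lib.txt
git add lib.txt
git commit -q -m "lib: initial"

# The super project.
new_repo "$work/super"
cd "$work/super"
echo app > app.txt
git add app.txt
git commit -q -m "super: initial"

# Import the library, with its history, under vendor/lib/.
git subtree add -P vendor/lib "$work/lib" main

# Upstream changes...
cd "$work/lib"
echo v2 > lib.txt
git commit -q -am "lib: update"

# ...and are pulled into the same prefix.
cd "$work/super"
git subtree pull -P vendor/lib -m "Merge lib updates" "$work/lib" main

# Extract the subdirectory's history onto its own branch.
git subtree split -P vendor/lib -b lib-only

# The split branch has the library files at its root,
# without the vendor/lib/ prefix.
git ls-tree --name-only lib-only
```

Compared with the manual method, the prefix is passed explicitly on every command, so Git does not have to guess where the subtree lives.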
Here is the usage info:
    NAME
    ----
    git-subtree - Merge subtrees together and split repository into subtrees

    SYNOPSIS
    --------
    [verse]
    'git subtree' add   -P <prefix> <commit>
    'git subtree' add   -P <prefix> <repository> <ref>
    'git subtree' pull  -P <prefix> <repository> <ref>
    'git subtree' push  -P <prefix> <repository> <ref>
    'git subtree' merge -P <prefix> <commit>
    'git subtree' split -P <prefix> [OPTIONS] [<commit>]

I have come across a pretty good number of resources on the subject of subtrees, as I was planning on writing a blog post of my own. I will update this post if I do, but for now here is some relevant information to the question at hand:
Much of what you are seeking can be found on this Atlassian blog by Nicola Paolucci; the relevant section is below:
Why use subtree instead of submodule?
There are several reasons why you might find subtree better to use:
- Management of a simple workflow is easy.
- Older version of git are supported (even before v1.5.2).
- The sub-project’s code is available right after the clone of the super project is done.
- subtree does not require users of your repository to learn anything new, they can ignore the fact that you are using subtree to manage dependencies.
- subtree does not add new metadata files like submodules does (i.e. .gitmodule).
- Contents of the module can be modified without having a separate repository copy of the dependency somewhere else.
In my opinion the drawbacks are acceptable:
- You must learn about a new merge strategy (i.e. subtree).
- Contributing code back upstream for the sub-projects is slightly more complicated.
- The responsibility of not mixing super and sub-project code in commits lies with you.
I would agree with much of this as well. I would recommend checking out the article as it goes over some common usage.
You may have noticed that he has also written a follow up here where he mentions an important detail that is left off with this approach...
git-subtree currently fails to include the remote!
This shortsightedness is probably due to the fact that people often add a remote manually when dealing with subtrees, but that remote isn't stored in git either. The author details a patch he has written to add this metadata to the commit that git-subtree already generates. Until this makes it into the official git mainline, you could do something similar by modifying the commit message or storing the information in another commit.
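One way to approximate this by hand, sketched under assumptions: amend the commit that git subtree add creates and append a line naming the remote, while keeping the message git-subtree generated. The subtree-remote: label is a made-up convention for this example, not something git-subtree reads, and the local path stands in for a real remote URL.

```shell
#!/bin/sh
set -e

# Identity for the throwaway commits made by this script.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)       # scratch directory, left in place for inspection
upstream="$work/lib"    # stands in for the real remote URL

new_repo() {            # create an empty repository on branch "main"
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/main
}

new_repo "$upstream"
cd "$upstream"
echo v1 > lib.txt
git add lib.txt
git commit -q -m "lib: initial"

new_repo "$work/super"
cd "$work/super"
echo app > app.txt
git add app.txt
git commit -q -m "super: initial"

git subtree add -P vendor/lib "$upstream" main

# Keep the generated message (it carries the git-subtree-* lines)
# and append our own note as an extra paragraph.
git commit -q --amend \
  -m "$(git log -1 --format=%B)" \
  -m "subtree-remote: $upstream"

git log -1 --format=%B
```

Anyone reading the log can then recover where the subtree came from, though they still have to pass that location to git subtree pull themselves.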
I find this blog post very informative as well. The author adds a third subtree method he calls git-stree to the mix. The article is worth a read, as he does a pretty good job of comparing the three approaches. He gives his personal opinion of what he does and doesn't like and explains why he created the third approach.
This topic shows both the power of git and the segmentation that can occur when a feature just misses the mark.  
I personally have built up a distaste for git-submodule, as I find it more confusing for contributors to understand. I also prefer to keep ALL of my dependencies managed within my projects to facilitate an easily reproducible environment without trying to manage multiple repositories. git-submodule, however, is currently much better known, so it is good to be aware of it, and depending on your audience that may sway your decision.
First off: I believe your question tends to get strongly opinionated answers and may be considered off-topic here. However, I don't like that SO policy and would push the border of being on-topic a bit outward, so I'd like to answer instead and hope others do as well.
On the GitHub tutorial that you pointed to there's a link to How to use the subtree merge strategy which gives a viewpoint on advantages/disadvantages:
Comparing subtree merge with submodules
The benefit of using subtree merge is that it requires less administrative burden from the users of your repository. It works with older (before Git v1.5.2) clients and you have the code right after clone.
However if you use submodules then you can choose not to transfer the submodule objects. This may be a problem with the subtree merge.
Also, in case you make changes to the other project, it is easier to submit changes if you just use submodules.
Here's my viewpoint based on the above:
I often work with folks (=committers) who are not regular git users; some still (and will forever) struggle with version control. Educating them about how to use the subtree merge strategy is basically impossible. It involves the concepts of additional remotes, merging and branches, and then mixing it all into one workflow. Pulling from upstream and pushing upstream is a two-stage process. Since branches are difficult for them to understand, this is all hopeless.
With submodules it's still too complicated for them (sigh) but it is easier to understand: It's just a repo within a repo (they are familiar with hierarchy) and you can do your pushing and pulling as usual.
Providing simple wrapper scripts is easier imho for the submodule workflow.
For large super-repos with many sub-repos the point of choosing not to clone data of some sub-repos is an important advantage of the submodules. We can limit this based on work requirements and disk space usage.
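A small sketch of that selective checkout, with made-up repositories named docs and assets: a fresh clone of the super project gets only the submodule it asks for. The protocol.file.allow=always setting is there only because recent Git versions refuse to clone submodules from local paths by default; with ordinary remote URLs you would not need it.

```shell
#!/bin/sh
set -e

# Identity for the throwaway commits made by this script.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)   # scratch directory, left in place for inspection

new_repo() {        # create an empty repository on branch "main"
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/main
}

# Two hypothetical sub-repos.
for name in docs assets; do
  new_repo "$work/$name"
  echo "$name" > "$work/$name/README"
  git -C "$work/$name" add README
  git -C "$work/$name" commit -q -m "$name: initial"
done

# The super project references both as submodules.
new_repo "$work/super"
cd "$work/super"
echo app > app.txt
git add app.txt
git commit -q -m "super: initial"
git -c protocol.file.allow=always submodule add "$work/docs" docs
git -c protocol.file.allow=always submodule add "$work/assets" assets
git commit -q -m "Add docs and assets submodules"

# A plain clone brings .gitmodules but leaves both submodule
# directories empty...
git clone -q "$work/super" "$work/clone"
cd "$work/clone"

# ...and we fetch only the one we need.
git -c protocol.file.allow=always submodule update --init docs

ls docs
ls assets
```

With a subtree there is no equivalent switch: the sub-project's files and history arrive with every clone of the super project.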
Access control might be different. Haven't had this issue yet, but if different repos require different access controls, effectively banning some users from some sub-repos, I wonder if that's easier to accomplish with the submodule approach.
Personally I'm undecided what to use myself. So I share your confusion :o]