Git/Branching & merging

Branching is supported in most VCSes. For example, Subversion makes a virtue of “cheap copying”—namely, that creating a new branch does not mean making a copy of the whole source tree, so it is fast. Git’s branching is just as fast. However, where Git really comes into its own is in merging between branches, in particular, reducing the pain of dealing with merge conflicts. This is what makes it so powerful in enabling collaborative software development.

Why Branch?

There are many reasons for creating multiple branches in a Git repo.

You may have branches representing “stable” releases, which continue to get incremental bug fixes but no (major) new features. At the same time, you may have multiple “unstable” branches representing various new features being proposed for the next major release, and being worked on in parallel, perhaps by different groups. Those features which are accepted will need to be merged into the branch for the next stable release.
You can create your own private branches for personal experiments. Later, if the code becomes sufficiently interesting to tell others about, you may make those branches public. Or you could send patches to the maintainer of the upstream public branch, and if they get accepted, you can pull them back down into your own copy of the public branch, and then you can retire or delete your private branch.

You may, in fact, want to add updates to different branches at different times. Switching between branches is easy.

Branching

View your branches

Use git branch with nothing else to see what branches your repository has:

$ git branch
* master

The branch called "master" is the default main line of development (newer Git installations and hosting services may be configured to name it "main" instead). You can rename it if you want, but it is customary to use the default. When you commit some changes, those changes are added to the branch you have checked out, in this case master.

Create new branches

Let's create a new branch we can use for development - call it "dev":

$ git branch dev
$ git branch
  dev
* master

This only creates the new branch; it does not switch to it, so HEAD stays where it was. You can see from the * that the master branch is still what you have checked out. You can now use git checkout dev to switch to the new branch.

Alternatively, you can create a new branch and check it out all at once with

$ git checkout -b newbranch
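The commands above can be tried safely in a throwaway repository. The following is only a sketch under some assumptions: the temporary directory, the file name file.txt and the identity "Example User" are made up for illustration, and the initial branch is forced to be called master so that the result does not depend on local configuration.

```shell
#!/bin/sh
set -eu

# Throwaway repository in a temporary directory (illustrative setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "first version"

git branch dev                            # create only; HEAD stays on master
after_create=$(git symbolic-ref --short HEAD)

git checkout -q dev                       # now actually switch
after_checkout=$(git symbolic-ref --short HEAD)

git checkout -q -b newbranch              # create and switch in one step
after_b=$(git symbolic-ref --short HEAD)

echo "$after_create -> $after_checkout -> $after_b"
git branch                                # lists dev, master and newbranch
```

The three recorded names show that git branch alone leaves you on master, while git checkout (with or without -b) moves HEAD to the named branch.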

Delete a branch

To delete a branch, again use git branch, but this time with the -d option. Git will not delete the branch you currently have checked out, so switch to another branch first.

$ git branch -d <name>

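If the branch hasn't been fully merged into the branch you have checked out (master, here), then this will fail. The exact wording of the message varies between Git versions: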

$ git branch -d foo
error: The branch 'foo' is not a strict subset of your current HEAD.
If you are sure you want to delete it, run 'git branch -D foo'.

Git's complaint saves you from possibly losing your work in the branch. If you are still sure you want to delete the branch, use git branch -D <name> instead.

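Sometimes a lot of local branches have already been merged and have become useless. To avoid deleting them one by one, check out your main branch and run the following. It deletes every branch that git branch --merged reports as merged into the current branch; the grep filters out the current branch itself, which is marked with *. Using -d rather than -D keeps Git's safety check in place:

git branch --merged | grep -v '\*' | xargs git branch -d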
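To see the difference between -d and -D without risking real work, something like the following sketch can be run. The branch names merged-a, merged-b and unmerged-topic, the file name and the identity are all hypothetical, and the repository lives in a temporary directory.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "first version"

# Two branches with nothing new on them: they count as merged into master.
git branch merged-a
git branch merged-b

# One branch with a commit that master does not have.
git checkout -q -b unmerged-topic
echo "experiment" >> file.txt
git commit -q -am "experimental change"
git checkout -q master

# Bulk-delete everything already merged into the current branch.
git branch --merged | grep -v '\*' | xargs git branch -d
after_cleanup=$(git branch | tr -d ' *' | sort | tr '\n' ' ')

# The safe delete refuses to drop unmerged work ...
if git branch -d unmerged-topic 2>/dev/null; then
    safe_delete=allowed
else
    safe_delete=refused
fi

# ... while -D deletes it regardless.
git branch -D unmerged-topic
remaining=$(git branch | tr -d ' *')
echo "safe delete was $safe_delete; remaining: $remaining"
```

The bulk cleanup leaves unmerged-topic alone, which is the reason for preferring -d in scripts: only an explicit -D discards commits that exist nowhere else.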

Pushing a branch to a remote repository

When you create a local branch, it won't automatically be kept in sync with the server. Unlike branches obtained by pulling from the server, simply calling git push isn't enough to get your branch pushed to the server. Instead, you have to explicitly tell git to push the branch, and which server to push it to:

$ git push origin <branch_name>

Deleting a branch from the remote repository

To delete a branch that has been pushed to a remote server, use the following command:

$ git push origin :<branch_name>

This syntax isn't intuitive, but what's going on here is you're issuing a command of the form:

$ git push origin <local_branch>:<remote_branch>

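and leaving the <local_branch> position empty, which tells Git to overwrite the remote branch with nothing, that is, to delete it. Git also accepts the more readable form git push origin --delete <branch_name> for the same purpose.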
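Pushing and deleting a remote branch can be rehearsed without a real server by using a local bare repository as the remote. This is a sketch under assumptions: the directory names server.git and clone, the branch name topic and the identity are invented for the example.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
git init -q --bare "$work/server.git"     # stand-in for the remote server
git init -q "$work/clone"
cd "$work/clone"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/server.git"
echo "hello" > file.txt
git add file.txt
git commit -q -m "first version"
git branch topic

# Push the local branch explicitly, naming both the remote and the branch.
git push -q origin topic
pushed=$(git ls-remote --heads origin topic)

# Delete it again with the empty-source refspec described above.
git push -q origin :topic
after_delete=$(git ls-remote --heads origin topic)

echo "after push:   ${pushed:-<nothing>}"
echo "after delete: ${after_delete:-<nothing>}"
```

git ls-remote asks the remote directly, so the second call printing nothing confirms that the branch is gone on the "server" even though the local branch topic still exists.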

Merging

Branching is a core concept of a DVCS, but without good merging support, branches would be of little use.

git merge myBranch

This command merges the given branch into the current branch. If the current branch is a direct ancestor of the given branch, a fast-forward merge occurs: no new commit is created, and the current branch head is simply moved forward to point at the tip of the given branch. In other cases, a merge commit is recorded that has both the previous commit and the given branch tip as parents. If there are any conflicts during the merge, it will be necessary to resolve them by hand before the merge commit is recorded.
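The two outcomes described above can be observed in a scratch repository. The sketch below assumes default merge settings; the branch names topic and other, the file names and the identity are hypothetical.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > file.txt
git add file.txt
git commit -q -m "first version"

# Case 1: master has not moved since topic branched off.
git checkout -q -b topic
echo "topic work" > topic.txt
git add topic.txt
git commit -q -m "work on topic"
git checkout -q master
git merge -q topic
if [ "$(git rev-parse master)" = "$(git rev-parse topic)" ]; then
    first_merge=fast-forward              # master now points at topic's tip
else
    first_merge=merge-commit
fi

# Case 2: both branches gain commits of their own.
git checkout -q -b other
echo "other work" > other.txt
git add other.txt
git commit -q -m "work on other"
git checkout -q master
echo "more" >> file.txt
git commit -q -am "work on master"
git merge -q --no-edit other
second_parents=$(git cat-file -p HEAD | grep -c '^parent ')

echo "first merge: $first_merge; parents of second merge: $second_parents"
```

In the first case master and topic end up naming the same commit, while in the second the new commit on master lists two parents, which is what distinguishes a real merge commit from a fast-forward.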

Handling a Merge Conflict

Sooner or later, if you’re doing regular merges, you will hit a situation where the branches being merged will include conflicting changes to the same source lines. How you resolve this situation will be a matter of judgement (and some hand-editing), but Git provides tools you can use to try to get an insight into the nature of the conflict(s), and how best to resolve them.

Real-world examples of merge conflicts tend to be nontrivial. Here we will try to create a very simple, albeit artificial, example, to try to give you some flavour of what is involved.

Let us start with a repo containing a single Python source file, called test.py. Its initial contents are as follows:

#!/usr/bin/python3
#+
# This code doesn't really do anything at all.
#-

def func_common()
    pass
#end func_common

def child1()
    func_common()
#end child1

def child2()
    func_common()
#end child2

def some_other_func()
    pass
#end some_other_func

Commit this file to the repo, with a commit message saying something like “first version”.

Now create a new branch and switch to it, using the command

git checkout -b side-branch

(This second branch is to simulate work being done on the same project by another programmer.) Edit the file test.py, and simply swap the definitions of the functions child1 and child2 around, equivalent to applying the following patch:

diff --git a/test.py b/test.py
index 863611b..c9375b3 100644
--- a/test.py
+++ b/test.py
@@ -7,14 +7,14 @@ def func_common()
     pass
 #end func_common
 
-def child1()
-    func_common()
-#end child1
-
 def child2()
     func_common()
 #end child2
 
+def child1()
+    func_common()
+#end child1
+
 def some_other_func()
     pass
 #end some_other_func

Commit the update to the branch side-branch with a message like “swap a pair of functions around”.

Now switch back to the master branch:

git checkout master

This will also put you back to the previous version of test.py, since that was the last (in fact only) version committed to that branch.

On this branch, we now rename the function func_common to common, equivalent to the following patch:

diff --git a/test.py b/test.py
index 863611b..088c125 100644
--- a/test.py
+++ b/test.py
@@ -3,16 +3,16 @@
 # This code doesn't really do anything at all.
 #-
 
-def func_common()
+def common()
     pass
-#end func_common
+#end common
 
 def child1()
-    func_common()
+    common()
 #end child1
 
 def child2()
-    func_common()
+    common()
 #end child2
 
 def some_other_func()

Commit this change to the master branch, with a message like “rename func_common to common”.

Now, try to merge in the change you made on side-branch:

git merge side-branch

This should immediately fail, with a message like

Auto-merging test.py
CONFLICT (content): Merge conflict in test.py
Automatic merge failed; fix conflicts and then commit the result.

Just to check what git-status(1) reports:

On branch master
You have unmerged paths.
  (fix conflicts and run "git commit")

Unmerged paths:
  (use "git add <file>..." to mark resolution)

        both modified:      test.py

no changes added to commit (use "git add" and/or "git commit -a")
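For readers who would rather script the steps so far than type them, the following sketch rebuilds the same situation in a temporary directory. The helper functions emit and write_test_py are invented here purely to generate the three variants of test.py; the identity is also made up, and the generated file ends with one extra blank line compared with the listing above.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: print one function in the style of test.py.
emit() {  # emit <function name> <body line>
    printf 'def %s()\n    %s\n#end %s\n\n' "$1" "$2" "$1"
}

# Hypothetical helper: write test.py with the given name for the common
# function and the given order of the two child functions.
write_test_py() {  # write_test_py <common name> <first child> <second child>
    {
        printf '#!/usr/bin/python3\n#+\n'
        printf "# This code doesn't really do anything at all.\n#-\n\n"
        emit "$1" "pass"
        emit "$2" "$1()"
        emit "$3" "$1()"
        emit some_other_func "pass"
    } > test.py
}

write_test_py func_common child1 child2
git add test.py
git commit -q -m "first version"

git checkout -q -b side-branch
write_test_py func_common child2 child1
git commit -q -am "swap a pair of functions around"

git checkout -q master
write_test_py common child1 child2
git commit -q -am "rename func_common to common"

# The merge is expected to stop with a conflict, so do not let it abort the script.
if git merge side-branch >/dev/null 2>&1; then
    merge_result=clean
else
    merge_result=conflict
fi
unmerged=$(git diff --name-only --diff-filter=U)
echo "merge result: $merge_result; unmerged paths: $unmerged"
```

git diff --name-only --diff-filter=U lists only the paths that are still unmerged, which is a compact alternative to reading the full git status output.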

If we look at test.py now, it should look like

#!/usr/bin/python3
#+
# This code doesn't really do anything at all.
#-

def common()
    pass
#end common

<<<<<<< HEAD
def child1()
    common()
#end child1

=======
>>>>>>> side-branch
def child2()
    common()
#end child2

def child1()
    func_common()
#end child1

def some_other_func()
    pass
#end some_other_func

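Note the sections marked “<<<<<<< HEAD” ... “=======” ... “>>>>>>> side-branch”: the part between the first two markers comes from HEAD, the branch we are merging onto (master, in this case), while the part between the last two markers comes from the branch we are merging from (side-branch, in this case). Here the second part is empty, because side-branch has no child1 at that position. Note also that the second copy of child1, further down and outside the markers, still calls func_common: the text outside the markers is not guaranteed to be correct either.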

Assuming we know exactly what the code does, we can carefully fix up all the conflicting/duplicated parts, remove the markers, and continue the merge. But perhaps this is a large project, and no single person, not even the project leader, fully understands every corner of the code. In this case, it is helpful to at least narrow down the set of commits that lead directly to the conflict, in order to get a handle on what is going on. There is a command that you can use, git log --merge, which is designed specifically to be used during a merge conflict, for just this purpose. In this example, I get output something like this:

$ git log --merge
commit 9df4b11586b45a30bd1e090706e3ff09692fcfa7
Author: Lawrence D'Oliveiro <ldo@geek-central.gen.nz>
Date:   Thu Apr 17 10:44:15 2014 +0000

    rename func_common to common

commit 4e98aa4dbd74543d7035ea781313c1cfa5517804
Author: Lawrence D'Oliveiro <ldo@geek-central.gen.nz>
Date:   Thu Apr 17 10:43:48 2014 +0000

    swap a pair of functions around
$

Now, as project leader, I can look further at just those two commits, and figure out that the nature of the conflict is really quite simple: one branch has swapped the order of two functions, while the other has changed the name of another function being referenced within the rearranged code.
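The behaviour of git log --merge is easy to check on a much smaller conflict. The sketch below is not the test.py example: it assumes a one-line file called file.txt, made-up commit messages and a made-up identity, and uses --format=%s to print only the subject lines.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"
git config user.email "user@example.com"
echo "greeting = hello" > file.txt
git add file.txt
git commit -q -m "first version"

git checkout -q -b side-branch
echo "greeting = hi" > file.txt
git commit -q -am "side: change greeting"

git checkout -q master
echo "greeting = hey" > file.txt
git commit -q -am "master: change greeting"

# Both branches changed the same line, so this stops with a conflict.
git merge side-branch >/dev/null 2>&1 || true

# Only commits from either side that touch a conflicted file are listed.
subjects=$(git log --merge --format=%s)
echo "$subjects"
```

The commit "first version" does not appear in the output, because it is common to both branches and so cannot be the cause of the conflict.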

Another useful command is git diff --merge, which shows a 3-way diff between the state of the source file in the staging area, and the versions from the parent branches:

$ git diff --merge
diff --cc test.py
index c9375b3,863611b..088c125
--- a/test.py
+++ b/test.py
@@@ -3,18 -3,18 +3,18 @@@
  # This code doesn't really do anything at all.
  #-
  
--def func_common()
++def common()
      pass
--#end func_common
- 
- def child2()
-     func_common()
- #end child2
++#end common
  
  def child1()
--    func_common()
++    common()
  #end child1
 
+ def child2()
 -    func_common()
++    common()
+ #end child2
+ 
  def some_other_func()
      pass
  #end some_other_func
$

Here you see, in the first two columns of each line, “+” and “-” characters indicating lines added/removed with respect to the two branches, or a space indicating no change.

Armed with this information, I can approach the problem of fixing up the conflicted file with a bit more confidence, creating the following merged version of test.py:

#!/usr/bin/python3
#+
# This code doesn't really do anything at all.
#-

def common()
    pass
#end common

def child2()
    common()
#end child2

def child1()
    common()
#end child1

def some_other_func()
    pass
#end some_other_func

Just to recheck, after doing git add test.py on the above fixed version, but before committing, do another git diff --merge, which should produce output like:

diff --cc test.py
index c9375b3,863611b..088c125
--- a/test.py
+++ b/test.py
@@@ -3,18 -3,18 +3,18 @@@
  # This code doesn't really do anything at all.
  #-
  
--def func_common()
++def common()
      pass
--#end func_common
- 
- def child2()
-     func_common()
- #end child2
++#end common
  
  def child1()
--    func_common()
++    common()
  #end child1
  
+ def child2()
 -    func_common()
++    common()
+ #end child2
+ 
  def some_other_func()
      pass
  #end some_other_func

And what does git status say?

On branch master
All conflicts fixed but you are still merging.
  (use "git commit" to conclude merge)

Changes to be committed:

        modified:   test.py

Now when you do as it says and run git commit, Git concludes the merge by recording a merge commit.
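The full resolve, add, commit cycle can be sketched end to end on a small conflict. As before, this is an illustration under assumptions rather than the test.py example: the one-line file file.txt, its contents, the chosen resolution and the identity are all invented.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"
git config user.email "user@example.com"
echo "greeting = hello" > file.txt
git add file.txt
git commit -q -m "first version"

git checkout -q -b side-branch
echo "greeting = hi" > file.txt
git commit -q -am "side: change greeting"

git checkout -q master
echo "greeting = hey" > file.txt
git commit -q -am "master: change greeting"

git merge side-branch >/dev/null 2>&1 || true   # stops with a conflict

# Resolve by hand: replace the conflicted file with the agreed content.
echo "greeting = hey, hi" > file.txt
git add file.txt                          # marks the conflict as resolved
git commit -q --no-edit                   # concludes the merge

parents=$(git cat-file -p HEAD | grep -c '^parent ')
leftover=$(git status --porcelain)
echo "parents of new commit: $parents"
```

The new commit has two parents, one from each branch, and git status reports nothing further to do. If a merge turns out to be more than you want to deal with, git merge --abort, run before committing, returns the working tree to its state before the merge started.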

“The Stupid Content Tracker”

The git(1) man page summarizes Git as “the stupid content tracker”. It is important to understand what “stupid” means in this case: Git does not use elaborate algorithms to try to resolve merge conflicts automatically; instead, it concentrates on displaying just the relevant information, so that human intelligence can resolve the conflict. Linus Torvalds has famously said that he wouldn’t trust his code to such elaborate merge conflict-resolution systems, which is why he deliberately designed Git to be “stupid”, and therefore reliable.