Understanding Darcs/Patch theory and conflicts

Conflicts

Up to now, we have only dealt with merging patches that do not conflict with each other. The next question of interest is how darcs should behave when they do.

Consider the previous darcs hackathon example, where as usual, Arjan decides that the shopping list needs some beer. In this scenario, Ganesh decides that you can't live on apples and cookies alone and records a patch adding "pasta" to the s_list file. Now he wants to know what Arjan is up to, and so pulls the beer patch into his repository, but oh no! Arjan and Ganesh's patches conflict! How should darcs behave here?

The darcs answer is that both patches cancel each other out so that neither of them has any effect. The resulting shopping list has neither beer nor pasta. This might sound alarming, but it's not as bad as you might think. Darcs does not silently delete your code. After canceling the two patches, it adds a third patch into your working directory which indicates both sides of the conflict so that you can select the one that you want. So any resolution you apply is a third patch which depends on the two conflicting ones. If you did darcs whatsnew on Ganesh's repository at this point, what you would get is something like this:

v v v v v v 
beer
-----------
pasta
^ ^ ^ ^ ^ ^

How do we know we have a conflict?

It is intuitively obvious that Arjan's patch conflicts with Ganesh's, but intuition is useless if it does not translate into actual Haskell code. The first issue is thus that of knowing that we have a conflict in the first place.

All of this boils down to commutation. We have a conflict if commutation is not defined for the two patches. Let us briefly revisit the merge process described in the previous chapter. When Ganesh tries to pull Arjan's patch in, he tries to adapt the patch to his context by performing the following sequence: invert his own patch, apply Arjan's patch $A$, commute the inverted patch with Arjan's patch, and discard the evil step sister of his inverted patch. As we know, inverting patches is easy. Ganesh's patch is inverted into something which removes "pasta" from line 3 of the s_list file. On the other hand, when we try to commute that against Arjan's patch, we have a failure.

Why? Simply because it is how we define commutation between the two types of patches. For instance, both Ganesh's and Arjan's patches are hunk patches. The commutation of two hunk patches of the same file is defined in darcs using Haskell code very similar to the following (simplified from PatchCommute.lhs):

commuteHunk :: FileName -> (FilePatchType, FilePatchType) -> Maybe (Patch, Patch)
commuteHunk f (p1@(Hunk line2 old2 new2), p2@(Hunk line1 old1 new1))
  | line1 + lengthnew1 < line2 = Just ...
  | line1 + lengthnew1 == line2 && nonZero = Just ...
  | line2 + lengthold2 < line1 = Just ...
  | line2 + lengthold2 == line1 && nonZero = Just ...
  | otherwise = Nothing
  where nonZero = lengthold2 /= 0 && lengthold1 /= 0 && lengthnew2 /= 0 && lengthnew1 /= 0 
        lengthnew1 = length new1
        lengthnew2 = length new2
        lengthold1 = length old1
        lengthold2 = length old2

Only four cases are defined. Since p2 is bound to the hunk at line1 and p1 to the hunk at line2, the first two cases cover the situation where p2 occurs in an earlier part of the file than p1 (even bumping up against it, as in the second case). The latter two cases cover the reverse situation (p1 is in an earlier part of the file than p2). However, the case where p1 and p2 overlap does not fall into any of these possibilities, so commuteHunk returns Nothing. Thus we have a conflict on our hands.
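To make the four cases concrete, here is a minimal, self-contained sketch of hunk commutation on a toy Hunk type. It is not the darcs source: the Hunk type, the omission of the file name, and the results filled in after each guard are assumptions made for illustration. Unlike the excerpt, the pair here is written in application order (the hunk at line1 is applied first, which the numbering in the excerpt suggests but does not state), and the result is the pair in the swapped order.

```haskell
-- Toy hunk (an assumption, not the darcs type): starting line (1-based),
-- the lines it removes, and the lines it adds in their place.
data Hunk = Hunk Int [String] [String]
  deriving (Eq, Show)

-- Commute "first, then second" into "second', then first'".
-- Returns Nothing when the two hunks overlap, i.e. when they conflict.
commuteHunk :: (Hunk, Hunk) -> Maybe (Hunk, Hunk)
commuteHunk (Hunk line1 old1 new1, Hunk line2 old2 new2)
  | line1 + lengthnew1 < line2             = Just firstIsAbove
  | line1 + lengthnew1 == line2 && nonZero = Just firstIsAbove
  | line2 + lengthold2 < line1             = Just secondIsAbove
  | line2 + lengthold2 == line1 && nonZero = Just secondIsAbove
  | otherwise                              = Nothing
  where
    -- The second hunk sits below the first, so once it is applied before the
    -- first hunk its line number must be shifted back by the first hunk's
    -- net change in length.
    firstIsAbove =
      ( Hunk (line2 - lengthnew1 + lengthold1) old2 new2
      , Hunk line1 old1 new1 )
    -- The second hunk sits above the first, so the first hunk must be
    -- shifted by the second hunk's net change in length.
    secondIsAbove =
      ( Hunk line2 old2 new2
      , Hunk (line1 + lengthnew2 - lengthold2) old1 new1 )
    nonZero = lengthold2 /= 0 && lengthold1 /= 0
           && lengthnew2 /= 0 && lengthnew1 /= 0
    lengthnew1 = length new1
    lengthnew2 = length new2
    lengthold1 = length old1
    lengthold2 = length old2

-- Ganesh's inverted patch removes "pasta" at line 3;
-- Arjan's patch adds "beer" at line 3.
pastaVersusBeer :: Maybe (Hunk, Hunk)
pastaVersusBeer = commuteHunk (Hunk 3 ["pasta"] [], Hunk 3 [] ["beer"])
```

In this sketch pastaVersusBeer evaluates to Nothing: both hunks touch line 3, neither of the strict inequalities holds, and the two "bumping" cases are ruled out because nonZero is false.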

Forced commutation

Now that we know we have a conflict, we need to deal with it in a sane manner. We not only want to deal with the conflict at hand, but deal with it in a way which allows the conflict to propagate cleanly across an entire sequence of patches. Well, darcs is based on commutation, so in order to keep things running smoothly, we need to make sure that things continue to commute. So, we're going to define a secondary forced commutation operation that we only use when there is a conflict.

Recall the definition of commutation from the previous chapter:

$XY \leftrightarrow Y_{1}X_{1}$

The forced commutation is going to do something similar, but with a very odd twist. Instead of patches $Y_{1}$ and $X_{1}$ performing the same change as their respective ancestors $Y$ and $X$, forced commutation is going to give us patches, each of which makes the change that the other patch does. That is, normal commutation wants $Y_{1}$ to do roughly the same thing as $Y$, but forced commutation makes it do the same thing as $X$.

operation	effect of $Y_{1}$	effect of $X_{1}$
normal commutation	$Y$	$X$
forced commutation	$X$	$Y$

Effects

As a side note, we're going to need a little terminology to keep ourselves from tripping over our tongues. It's not very convenient to always talk about one patch making the same change as another patch, which is something we will be referring to a lot. So let us compress things a little bit. Instead of saying that patch $Y_{1}$ makes the same change as $X$, let us simply say that the effect of $Y_{1}$ is $X$. It is the same idea, but with slightly smoother terminology.

Forced commutation in merging

Let us see what the implications of this are for Ganesh and Arjan. We want to commute the inverse of Ganesh's patch ($B^{-1}$) against Arjan's patch. Since the two patches conflict, we have to resort to forced commutation, which produces two patches ${A_{1}}$ and ${B_{1}}^{-1}$ with the following bizarre properties:

the effect of $A_{1}$ is $B^{-1}$; it removes Ganesh's "pasta" from the shopping list.
likewise, the effect of ${B_{1}}^{-1}$ is $A$; it adds Arjan's "beer" to the shopping list.

Merging a conflict through forced commutation

This is all very convenient, because if I may remind you, what we're really after is cancelling out the patches. If we do the standard merging technique of simply removing ${B_{1}}^{-1}$ (so we don't add the beer after all), we will have successfully undone Ganesh's pasta patch. The merge is complete!
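The cancellation can be checked on a toy model. The sketch below is only an illustration under stated assumptions: a patch is represented by the name of its ancestor plus its effect as a function on the list of lines, the starting list is taken to be just "apples" and "cookies", and Patch, forceCommute, insertLine, deleteLine and applyAll are all hypothetical names rather than anything from darcs.

```haskell
-- Toy patch (an assumption): the name of the patch it descends from, and its
-- effect on the lines of the file.
data Patch = Patch
  { ancestor :: String
  , effect   :: [String] -> [String]
  }

-- Insert a line so that it becomes line n (1-based).
insertLine :: Int -> String -> [String] -> [String]
insertLine n s ls = before ++ [s] ++ after
  where (before, after) = splitAt (n - 1) ls

-- Delete line n (1-based).
deleteLine :: Int -> [String] -> [String]
deleteLine n ls = before ++ drop 1 after
  where (before, after) = splitAt (n - 1) ls

-- Forced commutation of "x, then y" into "y1, then x1":
-- y1 descends from y but has the effect of x, and vice versa.
forceCommute :: (Patch, Patch) -> (Patch, Patch)
forceCommute (x, y) =
  ( Patch (ancestor y) (effect x)
  , Patch (ancestor x) (effect y) )

-- Apply a sequence of patches from left to right.
applyAll :: [Patch] -> [String] -> [String]
applyAll ps ls = foldl (\acc p -> effect p acc) ls ps

shoppingList :: [String]
shoppingList = ["apples", "cookies"]

patchA, patchB, patchBinv :: Patch
patchA    = Patch "A"    (insertLine 3 "beer")   -- Arjan's patch
patchB    = Patch "B"    (insertLine 3 "pasta")  -- Ganesh's patch
patchBinv = Patch "B^-1" (deleteLine 3)          -- inverse of Ganesh's patch

-- Ganesh's repository after the merge: B followed by A_1, with the other
-- product of the forced commutation discarded.
mergedList :: [String]
mergedList = applyAll [patchB, a1] shoppingList
  where (a1, _b1inv) = forceCommute (patchBinv, patchA)
```

Here mergedList comes out equal to shoppingList: the patch descended from Arjan's $A$ has the effect of $B^{-1}$, so applying it after Ganesh's $B$ leaves neither beer nor pasta on the list.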

Marking the conflict

But wait! We can't just leave things undone. How is the poor developer supposed to know if there is a conflict, if darcs handles them by undoing things? The answer is that we're not going to stop here. Undoing the conflict is a very important first step, as we will see in further detail below. Look at it this way. We know there was a conflict, because of the way commutation was defined, and we know which patches were involved in the conflict. So whenever this happens, we first undo everything, and then inspect the contents of the conflicting patches, and use that to create a new conflict-marking patch.
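As a rough illustration of that last step, the following sketch builds the marked-up lines from the contents of the two conflicting patches. It simply imitates the sample darcs whatsnew output shown earlier in this chapter; conflictMarker is a hypothetical helper, and the exact marker lines and the order of the two sides in real darcs may differ.

```haskell
-- Hypothetical helper: wrap the lines added by each side of the conflict in
-- marker lines, imitating the sample output shown earlier in this chapter.
conflictMarker :: [String] -> [String] -> [String]
conflictMarker pulled local =
  ["v v v v v v"] ++ pulled ++ ["-----------"] ++ local ++ ["^ ^ ^ ^ ^ ^"]

-- The lines that a conflict-marking patch for the beer/pasta conflict
-- would add to Ganesh's copy of the shopping list.
markedUp :: [String]
markedUp = conflictMarker ["beer"] ["pasta"]
```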

FIXME:insert image here showing the conflict-marking patch

Darcs 2

:TODO: introduce this section

The exponential merge problem

Unfortunately, the darcs 1 merge algorithm has the property that certain merges, including ones that people have run into in real life, take time exponential in the size of the conflict (measured in number of conflicting patches). This is why some users would do a darcs pull and, inexplicably, darcs would just sit there and hang...

So how does the new darcs 2 fix this problem? What's going on under the hood?

Conflictors

The notion of conflictors is essentially that we would have special patches that contain a list of the patches they conflict with.
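One way to picture this, purely as a sketch and not the representation darcs 2 actually uses, is a patch type with a second constructor that carries the list of patches it conflicts with. The names Patch, Normal, Conflictor and conflictsWith below are hypothetical.

```haskell
-- Hypothetical patch type: either an ordinary patch, or a conflictor that
-- records the patches it conflicts with alongside the patch itself.
data Patch prim
  = Normal prim
  | Conflictor [prim] prim
  deriving (Eq, Show)

-- The patches a given patch is known to conflict with.
conflictsWith :: Patch prim -> [prim]
conflictsWith (Normal _)        = []
conflictsWith (Conflictor cs _) = cs

-- Arjan's beer patch as seen from Ganesh's repository, where it conflicts
-- with the pasta patch (patches are stood in for by plain descriptions).
beerInGaneshRepo :: Patch String
beerInGaneshRepo = Conflictor ["add pasta"] "add beer"
```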

Use of Generalised Algebraic Datatypes to improve code safety

See also Haskell/GADT and http://wiki.darcs.net/Ideas/GADTPlan

Current research
