File Merge: How did we get ourselves into this mess

There are several reasons why we may need to merge two modified versions of a file:

  • When using "edit-merge-commit" (sometimes called "optimistic locking"), it is possible for two developers to edit the same file at the same time.
  • Even if we use "checkout-edit-checkin", we may allow multiple checkouts, resulting once again in the possibility of two developers editing the same file.
  • When merging between branches, we may have a situation where the file has been modified in both branches.

In other words, this mess only happens when people are working in parallel.  If we serialize the efforts of our team by never branching and never allowing two people to work on a module at the same time, we can avoid ever facing the need to merge two versions of a file.

However, we want our developers to work concurrently.  Think of your team as a multithreaded piece of software, each developer running in its own thread.  The key to high performance in a multithreaded system is to maximize concurrency.  Our goal is to never have a thread that is blocked waiting on some other thread.

So we embrace concurrent development, but the threading metaphor continues to apply.  Multithreaded programming can sometimes be a little bit messy, and the same can be said of a multithreaded software team.  There is a certain amount of overhead involved in things like synchronization and context switching.  This overhead is inevitable.  If your team is allowing concurrent development to happen, it will periodically face a situation where two versions of a file need to be merged into one.

In rare cases, the situation can be properly resolved by simply choosing one version of the file over the other.  However, most of the time, we actually need to merge the two versions to create a new version. 

What do we do about it?
Let's carefully state the problem as follows:  We have two versions of a file, each of which was derived from the same common ancestor.  We sometimes call this common ancestor the "original" file.  Each of the other versions is merely the result of someone applying a set of changes to the original.  What we want to create is a new version of the file which is conceptually equivalent to starting with the original and applying both sets of changes.  We call this process "merging".
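
To make this definition concrete, here is a minimal sketch of the idea in Python.  It is only an illustration, not how any particular SCM tool works.  It assumes a simplified "file" that is nothing more than a set of named settings held in a dict.  The function name `three_way_merge` and the names `mine` and `theirs` are hypothetical, and `None` is assumed to mean "this setting is absent".

```python
def three_way_merge(original, mine, theirs):
    """Merge two dicts that were both derived from `original`.

    Returns (merged, conflicts). A key lands in `conflicts` when both
    sides changed it in different ways, so no automatic answer exists.
    Assumption: None is never a real value; it means "key is absent".
    """
    merged, conflicts = {}, []
    for key in sorted(set(original) | set(mine) | set(theirs)):
        base = original.get(key)
        a = mine.get(key)
        b = theirs.get(key)
        if a == b:        # both untouched, or both made the same change
            value = a
        elif a == base:   # only "theirs" changed this key
            value = b
        elif b == base:   # only "mine" changed this key
            value = a
        else:             # both changed it, and differently
            conflicts.append(key)
            continue
        if value is not None:   # None means the key was deleted
            merged[key] = value
    return merged, conflicts


original = {"a": 1, "b": 2, "c": 3}
mine = {"a": 10, "b": 2, "c": 3}     # changed "a"
theirs = {"a": 1, "b": 2, "d": 4}    # deleted "c", added "d"
merged, conflicts = three_way_merge(original, mine, theirs)
```

In this run `merged` contains both sets of changes and `conflicts` is empty.  The important point is that the original is required: without it, there is no way to tell which side changed a value and which side merely left it alone.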

The difficulty of doing this merge varies greatly for different types of files.  How would we perform a merge of two Excel spreadsheets?  Two PNG images?  Two files which have digital signatures?  In the general case, the only way to merge two modified versions of a file is to have a very smart person carefully construct a new copy of the file which properly incorporates the correct elements from each of the other two.

However, in software and web development there is a special case which is very common.  As luck would have it, most source code files are plain text files with an average of fewer than 80 characters per line.  Merging files of this kind is vastly simpler than the general case.  Many SCM tools contain special features to assist with this sort of merge.  In fact, in a majority of these cases, the two files can be automatically merged without requiring the manual effort of a developer.
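
The reason plain text is so much easier can be sketched in a few lines of Python.  The example below treats each file as a list of lines, uses the standard library's `difflib.SequenceMatcher` to find what each side changed relative to the original, and then applies both sets of edits if they do not collide.  This is a simplified sketch under those assumptions, not the algorithm of any specific SCM tool; the function names `edits` and `merge_lines` are hypothetical, and a real tool would mark the conflicting region instead of simply giving up.

```python
import difflib


def edits(original, modified):
    """Return the set of (start, end, new_lines) edits to `original`."""
    matcher = difflib.SequenceMatcher(a=original, b=modified, autojunk=False)
    return {
        (i1, i2, tuple(modified[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    }


def merge_lines(original, mine, theirs):
    """Return the merged list of lines, or None if the edits collide."""
    merged, pos, last_start = [], 0, -1
    # The set union drops edits that both sides made identically.
    for start, end, new_lines in sorted(edits(original, mine) | edits(original, theirs)):
        # Overlapping ranges, or two different edits starting at the
        # same line, cannot be combined automatically.
        if start < pos or start == last_start:
            return None
        merged.extend(original[pos:start])   # unchanged lines before the edit
        merged.extend(new_lines)             # the edit itself
        pos, last_start = end, start
    merged.extend(original[pos:])            # unchanged tail
    return merged


original = ["a", "b", "c", "d", "e"]
mine = ["a", "B", "c", "d", "e"]          # changed line 2
theirs = ["a", "b", "c", "d", "E", "f"]   # changed line 5, added a line
result = merge_lines(original, mine, theirs)
```

Here the two developers touched different lines, so `result` contains both changes.  If both had rewritten line 2 in different ways, `merge_lines` would return `None`, which is the case where a person has to step in.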

An example
Let's call our two developers Jane and Joe.  Both of them have retrieved version 4 of the same file and both of them are working on making changes to it.

One of these developers will check in before the other one.  Let's assume it is Jane who gets there first.  When Jane tries to check in her changes, nothing unusual will happen.  The current version of the file is 4, and that was the version she had when she started making her changes.  In other words, version 4 was her baseline for these changes.  Since her baseline matches the current version, there is no merge necessary.  Her changes are checked in, and a new version of the file is created in the repository.  After her checkin, the current version of

About the author

TechWell Contributor
