An Agile Perspective on Branching and Merging


More on the Merging Problem
Working in a Private Workspace may result in conflicts that need resolving. With more than one person doing development, and in some cases even with a single developer on a project, merging is required on a regular basis, particularly when you perform continuous integration: two people both check out the same version of the same file; they both make changes; and one person checks in first. Now the second person has to merge their changes with the first person's.

This is the same problem as merging between two branches. The difference is the length of time of the "divergence": the longer two codelines stay apart, the larger the set of divergent files between them grows, and hence the greater the likelihood of concurrent edits (and the resulting need for a merge).
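The concurrent-edit scenario can be sketched as a file-level three-way comparison against the common starting version. This is only an illustration under simple assumptions: the function name, the snapshot dictionaries, and the file names are hypothetical, and real SCM tools compare content at a much finer granularity.

```python
def classify_changes(base, ours, theirs):
    """Classify every file by who changed it since the common base version.

    Each argument maps a file name to its content (hypothetical snapshots).
    """
    result = {}
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            # Neither side changed the file, or both made the identical change.
            result[name] = "unchanged" if o == b else "same change"
        elif o == b:
            result[name] = "take theirs"   # only the other codeline changed it
        elif t == b:
            result[name] = "take ours"     # only our codeline changed it
        else:
            result[name] = "needs merge"   # concurrent edits to the same file
    return result


# Hypothetical snapshots: the shared starting point and two divergent copies.
base = {"a.c": "v1", "b.c": "v1", "c.c": "v1"}
ours = {"a.c": "v2-ours", "b.c": "v1", "c.c": "v2-ours"}
theirs = {"a.c": "v1", "b.c": "v2-theirs", "c.c": "v2-theirs"}

print(classify_changes(base, ours, theirs))
```

Only files changed on both sides need a real merge. The longer the two copies diverge, the more files tend to land in that last category, whether the copies are two private workspaces or two branches.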

So, avoiding branching does NOT eliminate the merging problem. It only somewhat decreases the likelihood. What it does decrease more than slightly is the resulting complexity of the merge when it does happen. The more activity that has taken place in parallel on the same sets of files, the more complicated it may be to merge those files. If both codelines are very active, this will occur a lot. Most of the time this is not the case. Instead, only one of the codelines is very active (the one that works on the "latest stuff") and the other codeline is for the occasional repair that has to happen in the legacy version (and propagate forward).

The same holds true for most workarounds people come up with to avoid branching (for example, conditional code without version branching). If you don't branch and instead handle the variation with run-time code, you have to have separate run-time configuration settings for each case. If it is done with compile-time conditional code, then you have to have separate build configuration settings for each case and know how to enable or disable them accordingly (and you still have to do two builds, even though you didn't branch).
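A minimal sketch of the run-time workaround shows where the complexity goes. The `RuntimeConfig` class, its `use_legacy_rounding` setting, and the `invoice_total` function are hypothetical names chosen for illustration, not taken from any particular project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeConfig:
    # Hypothetical setting: selects which variant of the behavior runs.
    use_legacy_rounding: bool


def invoice_total(amounts, config):
    """Both variants live in one codeline, selected by a run-time setting."""
    total = sum(amounts)
    if config.use_legacy_rounding:
        return int(total)        # legacy behavior: truncate to whole units
    return round(total, 2)       # latest behavior: keep two decimal places


# One configuration per case has to be defined, shipped, and tested.
LEGACY = RuntimeConfig(use_legacy_rounding=True)
LATEST = RuntimeConfig(use_legacy_rounding=False)

print(invoice_total([10.25, 5.5], LEGACY))
print(invoice_total([10.25, 5.5], LATEST))
```

No branch was created, but every variation point now carries a conditional, and each supported case needs its own configuration and its own test run.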

If instead we branch, we may have to do two builds, but we can execute the exact same sequence of steps (procedures) to do the build in either codeline, because we will typically have "branched" the build tools and associated configuration files too. So the procedure to build either codeline would in fact be less complex than the one needed to avoid branching, because avoiding branching "defers" the configuration decision from checkout time (on a new branch) to compile time (with a different set of compilation flags). Therefore, we don't feel that avoiding branching avoids complexity. It's a question of choosing which part of the rug to "sweep the complexity" under. If you fear branching (or merging) and choose never to sweep it under the earlier portion of the rug, that is the choice you make, but the complexity still rears its ugly head in some form or another. The key is to recognize the choices, stop being afraid of branching, and learn to use it wisely: taming the branching beast. This process of education is part of our mission as agile SCM-istas!

The Refactoring Blues
Having said the above, one major complication for branching and merging, and yet also paradoxically a reason for branching, is refactoring. Agile teams want to refactor, and indeed need to be able to do so, to preserve a clean codebase that is more easily maintainable in the future. Simple refactoring does not change the functionality of the code but may change its textual form substantially. This can make it very difficult to promote a maintenance change (a bug fix or perhaps an enhancement) to a class in a newer version when that class has been totally rewritten there because of refactoring.

Refactoring becomes even more problematic when dealing with files that are renamed. This has particularly come to the fore with the rise of Java, where the class name must be the same as the name of the file in the file system (possibly a questionable design decision, but one we now have to live with).

Some IDEs, starting with IntelliJ and now including Eclipse and others, offer a number of refactorings from within the IDE, including the renaming of classes. This can result in a number of changes throughout the application wherever that class is explicitly referenced in other classes. All of them are handled by the IDE, including automatic checkouts and renames in the underlying SCM tool if that tool's plug-in supports it. Handling renames across branches is one of those areas where SCM tools differ in their level of support, so make sure you are using one that supports it well, or at least well enough.
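To illustrate what "supporting renames across branches" involves, here is a minimal sketch that assumes the tool records each rename as an old-path to new-path pair. The function names and the Java file paths are hypothetical, and actual tools track rename history in their own ways.

```python
def resolve_path(path, renames):
    """Follow recorded renames (old path -> new path) to a file's current name."""
    seen = set()
    while path in renames and path not in seen:
        seen.add(path)              # guard against a cyclic rename record
        path = renames[path]
    return path


def retarget_fix(changed_paths, renames):
    """Map the paths touched by a maintenance fix onto the newer codeline."""
    return {old: resolve_path(old, renames) for old in changed_paths}


# Hypothetical rename history recorded on the newer codeline.
renames = {
    "src/Invoice.java": "src/BillingInvoice.java",
    "src/BillingInvoice.java": "src/billing/Invoice.java",
}

print(retarget_fix(["src/Invoice.java", "src/Util.java"], renames))
```

A tool without such a record sees the fix as a change to a file that no longer exists on the newer codeline, and someone has to work out by hand where the change belongs.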

The problem becomes worse if you have serious parallel development going on. Suppose 3.5 is your base release and you have two major enhancements, A and B, in development. The business doesn't know whether A will turn into 3.6 and B into 3.7, or the other way around. Or perhaps both will merge together into 3.6. Until the modules (which share a common database and rely on some common classes) are close to complete and have a concretely planned release cycle, it's hard to refactor any of the base classes, because doing so creates a dependency on the two being released together, or on one being released before the other.

Note that the results of this type of requirement and the associated overheads need to be made visible to management. Managers often don't fully realize the maintenance overhead associated with keeping such options open, and can blithely agree to client requests without considering the consequences (perhaps the client should be paying more for the requested flexibility, or less flexibility should be allowed). And yet in spite of this, we have all personally experienced many projects that have been refactoring during parallel development for more than a decade, and this would not have been possible without branching (along with good tool support and knowledge of how to use those tools). Using strict code ownership and file-locking would have made things too unbearably slow to be productive. With sound parallel development practices that group together modifications made toward the same purpose as logical (rather than physical) units of change, the projects had little or no hassle and no waiting around for other folks to check something back in to release a file-lock.

This highlights the fact that the most successful parallel development shops don't do parallelism at the granularity of single files or classes/methods. They do it at the level of projects, subprojects, and tasks. If you do it at the file/class level of granularity, it becomes too complex to track. If you do it at the project/task granularity, it all fits into a single change-flow hierarchy. The "Streamed Lines" paper is clear about the requirement for project-oriented versus file-oriented locking. Otherwise it looks as if all those practices are referring to individual files, and that seems crazy. This also raises the need for a version control tool that can not only do branching, but that lets you use symbolic names for branches (not just for revisions on the branch, but for the branch itself). If the tool tracks both divergence (branch-points) and convergence (merge-points), then merging is no longer the nightmare that everyone always assumes it must be. In fact it's often quite trivial. And some of the more modern tools even provide automated and intelligent merging support to make it a breeze most of the time.
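Why tracking branch-points and merge-points helps can be sketched with a small revision graph. The sketch assumes each revision records its parent revisions (two parents for a merge); the revision names and helper functions are hypothetical, and this is one simple way to find a common ancestor rather than any specific tool's algorithm.

```python
def ancestors(parents, rev):
    """Return rev plus every revision reachable through parent links."""
    seen, stack = set(), [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c in ancestors(parents, other) for other in common if other != c)
    }


# Mainline m1..m3, with a branch f1..f2 created from m1 (the branch-point).
history = {"m1": [], "m2": ["m1"], "m3": ["m2"], "f1": ["m1"], "f2": ["f1"]}
print(merge_bases(history, "m3", "f2"))

# Record a merge-point: m4 merges f2 into the mainline; the branch continues to f3.
history.update({"m4": ["m3", "f2"], "f3": ["f2"]})
print(merge_bases(history, "m4", "f3"))
```

Before the merge is recorded, the common starting point is the original branch-point `m1`. Once the merge-point `m4` is recorded, the next merge only has to consider what changed after `f2`, which is why repeated merges between the same two codelines stay small.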



About the author


AgileConnection is a TechWell community.
