Language-Aware SCM

are you able to deal with C# or Java code specifically?”. In that sense, delivering a new generation of SCM able to handle diffs and merge conflicts based on “understanding” the underlying code would be simply “meeting customer expectations”.

Note: the Eclipse IDE is already able to display an outline of the diffs between files showing which methods have been modified, added and so on.

Displaying a language-aware diff

Once the code has been correctly parsed, and the “code unit” differences have been calculated (not a trivial operation depending on the level of depth you want to achieve), the next step is to render the “language-aware” differences correctly.

My opinion is that we’ll end up with “combined views” where some sort of class/method layout will be combined with a traditional “text-based” diff, so that the developer can choose which view he wants to focus on. (This reminds me the UML class diagrams embedded in some development tools: they’re good to some extent, especially in communicating the “big picture”. But in the end, the code is the best “representation” (like in DSLs) of the inner workings of a class.)

A simple visual outline like the following could help indicate that one method has been modified, one renamed and moved, one deleted, and a new method added.

3

Remember that a change as simple as this one can be extremely hard to follow with a conventional diff tool nowadays, if the code blocks are non-trivial (or not especially short).

Then the “method2” differences could be easily expanded and most likely displayed in conventional “text diff” format:

4

Time to merge

It is clear that diff tools can greatly benefit from “code-aware” technology, but the real benefit will be during “merge time”. Merging is more and more common these days, thanks to better version control systems able to deal with an unlimited number of branches and able to correctly handle merge tracking. But when the problem is at the “file level”, after the candidates have been correctly identified, you’re still left with your conventional 3-way merge.

One of the easiest things a “language-aware” merge can do is to transparently handle added code. Here’s a simple scenario to illustrate the idea: you add a method at the end of the file and another developer adds a different method at the end of the same file. A conventional merge tool will detect it as a conflict, since the same line of code has been modified by two contributors. Obviously, such a conflict could be automatically resolved by a “language-aware” merge tool without any risk of error.

Another simple example illustrates the power of this kind of merging. Let’s take a look at the following C# code:

5

One developer modified the “using” area by introducing a new “using” statement. The second developer added the same “using” at a different location.

There’s no way to easily resolve this conflict with a conventional “text based” diff tool, but it would be trivial for a “language-aware” tool: it would go through the “using” statements, find a duplicate one, and keep just one of them -- all without user intervention.

While this exampleis perhaps too simple, it shows the wide range of benefits that “code parsing” brings to the SCM arena.

Shortcomings

Two shortcomings of code-aware tools immediately come to my mind:

    • Parsing: right now code-agnostic diff and merge tools are able to deal with any text file. A “language-aware” system would have to develop support for each specific language, which is not rocket science but it is still much more expensive than developing a “one size fits all” solution.
    • Broken code: