If the road to hell is paved with good intentions, the road traveled by failed software projects is paved with broken builds. As a QA engineer, I used to find myself with wasted days, either stalled because there was no build to test or because the build I had just installed was seriously nonfunctional. After becoming a developer, I learned the masochistic joys of getting the latest code only to find that I could no longer build. From more experienced developers, I learned the survival technique of working with my own little sandbox of code that was days or weeks out of touch with the repository. This worked well enough until that moment we all dreaded: system integration.
I'm happy to report that my life is not like that any more. On my current team, everyone works with current code, the developers don't suffer from merge hell, and the builds are functional. There are a lot of factors that go into reaching this happy state of affairs, but a lot of credit goes to an unlikely hero: an automated continuous integration (CI) system that has been tailored for our process.
Integration in Action
Here's how CI works. It is time to implement a new feature. A single command allows me to fetch any changed code, compile the code, and run the associated unit tests, all while I'm up getting a cup of licorice tea. As expected, the latest version compiles and passes the tests. As I work, I write both code and unit tests. When I feel I've finished, I rerun the same single command so that I have any changes other people have made while I was working. Oh look, a compilation failure. Someone changed the interface to one of the methods I’m using, but that's a quick fix. After I make the fix, I once again issue the command. The code now compiles and passes the tests on my machine, so I submit the changes to the version control system and get another cup of tea. On my return, I check my inbox and find an email telling me something's wrong with the changes I submitted. Doh! I forgot to tell the version control system about a new file I'd created. I add the file, return to email, and a few minutes later find a message reassuring me that the code now compiles and all the unit tests pass on an official build machine.
I return to other tasks. After about an hour, I notice out of the corner of my eye that the green lava lamp has stopped bubbling, and the red lamp has turned on. That means there's something wrong on the system test build machine. Checking email, I find that one of our system tests has failed. When I take a closer look, I notice that, in addition to my change, three other files have changed since the test last passed. Given the test that failed, I'm pretty sure the problem code isn't mine. A quick review of the test with the other developer and the tester, and we've got the problem nailed down. An hour later, the green lava lamp is back on—CI's work here is done. I leave, confident that tonight's build will proceed without incident.
Not New but Different
The idea of a "daily build and smoke test," which I'll simply call the nightly build, became established when Microsoft's engineering practices started being published in the mid-90s. In Dynamics of Software Development, Jim McCarthy covers the process in "Tip #32: If you build it, it will ship," while Steve McConnell devotes a chapter of Rapid Development to it as one of his documented best practices. Doing an automated, public, formal, highly visible build every single day is a fantastic practice and a great remedy for many of the ills I had experienced earlier in my career. The practice is such an indicator of project health that McCarthy describes it as the heartbeat of the project.
As with many other recognized best practices, Extreme Programming took the idea one step further and came up with continuous integration. When describing the practice on xprogramming.com, Ron Jeffries writes, "We say daily builds are for wimps: XP teams build multiple times per day." Once a day and multiple times a day may seem like only a subtle distinction, but in my experience, there is a world of difference between a nightly build and a continuous one.
Nightly builds generate deliverables. They result in something tangible: something for QA to test, something for product managers to review, and something to reassure team members that they are in fact building a product. Because of the external audience, the nightly build is a formal event, a mini-milestone that your team should hit without fail. Breaking the nightly build is something that generates blame and often consequences, such as becoming the build mother or being awarded a dunce cap.
CI is different. Its builds don't need durable build products to be worthwhile. They are a way for a developer to have a conversation with the system, to get reassurance that he has done his part, at least for now. And with a CI build, the cycle time is short, the number of affected parties small, and the cost of failure low. This change in the cost of failure makes for a significant change in behavior—if you’ll let it. I've met people who want CI failures to be a shaming event, similar to what happens when the nightly build breaks. But given the nature of a CI build, does this make sense?
Consider when programs were recorded on punch cards and handed over to be entered into the system and run in batch mode. A syntax error in this environment was a very expensive loss of time, and significant effort was spent to resolve any mistakes. It made sense for code to be as perfect as possible at the first compile attempt. With a modern IDE, who worries about syntax errors as an expensive loss of time?
This implies that a CI build should be tuned to surface failure feedback as quickly as possible, but this feedback is not a management tool; it's an enabling tool. It allows the developer to take responsibility for each check-in in a way that isn't possible (or at least not cost effective) in the absence of such a system. There have been numerous times that I’ve found developers staying late, waiting to get their feedback emails from the system, their reassurances that they didn't break anything. Tracking the failures caused by each individual would only discourage the behavior of frequent check-ins, which you want to promote.
This isn't to say the formal nightly build should be abandoned—do both! The CI feedback means that problems are likely detected before the nightly build. The pending nightly build gives some added weight to fixing the problems immediately "so the nightly build will pass."
Putting the Pieces in Place
If you want to build continuously, what is the first ingredient you need—the one that is irreplaceable? You need a reliable way to get the latest code, execute the build, and have the build report—whether it failed or was successful—all by running a single command. A script that can do this should be considered an essential tool for a development team—yet it is often missing.
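In the Java world this article lives in, that single command is often just an Ant target that chains the steps together. Here is a minimal sketch of the idea; the project name, paths, and CVS usage are illustrative assumptions, not details from the article:

```xml
<!-- build.xml: one command ("ant ci") updates, compiles, and tests. -->
<project name="myproject" default="ci">

  <!-- Fetch the latest code from version control (CVS assumed here). -->
  <target name="update">
    <cvs command="update -d" dest="."/>
  </target>

  <!-- Compile everything that changed. -->
  <target name="compile" depends="update">
    <javac srcdir="src" destdir="classes"/>
  </target>

  <!-- Run the unit tests; haltonfailure makes failure unmistakable. -->
  <target name="test" depends="compile">
    <junit haltonfailure="true">
      <classpath path="classes"/>
      <batchtest>
        <fileset dir="classes" includes="**/*Test.class"/>
      </batchtest>
    </junit>
  </target>

  <target name="ci" depends="test"/>
</project>
```

The point is not the particular tasks but that a developer (or a CI system) can run the whole cycle, and learn whether it passed, with one command.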
The second element you'll need is a method of introducing the system to the group. The details of the plan are less significant than how well it is adapted to the local team environment. The first time I rolled out a continuous integration system, it was a team decision: we described exactly how we wanted the system to function before doing any work to implement it. But some teams don't introduce the system formally at all. At the other end of the spectrum, a friend of mine introduced the concept by running a system on his machine to pull down all the changes, compile and run some tests, and inform him of any failures. With the system in place, he knew when new code introduced a problem, and he let the appropriate people know. Before long, people wanted to know how he was always on top of these things, and the system became part of the formal build process.
The Canonical CI Build
The remainder of this article discusses continuous integration in terms of a specific framework for a continuous build process: the open source project CruiseControl. It is characteristic of frameworks, as opposed to reusable components, to provide an inversion of control, where the framework defines the canonical application-processing steps as well as an extension mechanism to provide explicit hooks through a stable interface. True to form, CruiseControl defines the stages of a project build and requires a configuration file (typically named config.xml) to specify the implementations that should be used at each stage. These implementations are registered as plug-ins for the given project, and any configuration data they require is provided in the same configuration file.
When the build begins, CruiseControl first reloads the configuration file and rereads the project settings. CruiseControl then has an optional bootstrapping step, where a plug-in can do any preparation needed before the actual build is invoked. Following the bootstrapping, CruiseControl performs a mandatory modification check. If none of the plug-ins report a modification, the build loop terminates and the project returns to its idle state. If a modification is detected, CruiseControl invokes a builder. The builder executes and returns an XML log file indicating the success or failure of the build attempt. After this log is returned, there is the opportunity to merge into the main log file any XML files produced by the build. Finally, CruiseControl has an optional publishing step where any configured publishers can use the contents of the log file to notify the team of the build results. The log file is stored in the directory designated for that project.
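Those stages map directly onto elements of config.xml. The following is a sketch from memory; element and attribute names may vary across CruiseControl versions, and all hosts and paths are illustrative:

```xml
<cruisecontrol>
  <project name="myproject">
    <!-- Optional bootstrapping before the build is invoked. -->
    <bootstrappers>
      <currentbuildstatusbootstrapper file="logs/buildstatus.txt"/>
    </bootstrappers>

    <!-- Mandatory modification check; a change here triggers a build. -->
    <modificationset quietperiod="60">
      <cvs localworkingcopy="checkout/myproject"/>
    </modificationset>

    <!-- The builder, invoked when modifications are detected. -->
    <schedule interval="300">
      <ant buildfile="build.xml" target="ci"/>
    </schedule>

    <!-- Merge auxiliary XML (e.g. JUnit results) into the main log. -->
    <log dir="logs/myproject">
      <merge dir="test-results"/>
    </log>

    <!-- Publishers notify the team of the result. -->
    <publishers>
      <htmlemail mailhost="smtp.example.com"
                 returnaddress="build@example.com"/>
    </publishers>
  </project>
</cruisecontrol>
```

Each child element names a plug-in; swapping an implementation at any stage is a matter of editing this file, not changing the framework.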
The functionality provided by CruiseControl out of the box includes support for multiple source code management systems, the ability to execute builds with Ant or Maven, a JMX (Java Management Extensions) interface for managing projects, and plug-ins for publishing results and artifacts via email, FTP, secure copy, and Lotus Instant Messaging (Sametime). In addition to all this build-time functionality, there is a JSP Web application for reporting against the logs for historical build results. This Web application will report information from Ant's javac task, JUnit output, Checkstyle violations, Javadoc errors, and more.
With the provided functionality, it takes no custom code to get a basic CI build up and running. If I run such a build, I know it will detect compile or unit test errors within a few minutes of checking in, that I’ll be notified with an informative email, and that I’ll be able to peruse the Web interface to look at historical results. Now it is time to consider a couple of common adaptations to local conditions.
Time Is of the Essence
The first such adaptation is what to do when the build/compile/test cycle starts taking too much time. "Too much" is a subjective metric, but as a rule of thumb I like to have a developer get a response within fifteen minutes of a check-in. There is a time threshold beyond which automated builds move from being a continuous integration build to being effectively a daily build, and that threshold is much less than twenty-four hours. If you pass this threshold, the developer receives the build feedback after she has moved on to a new task. Then, responding to the system becomes an interruption of a new activity rather than closure on the previous one. When I find the cycle time growing too long, I turn to two basic strategies: doing less work in each build cycle, and spreading the work across more machines.
An example of the first strategy is to do incremental compilation. While the dogma that a nightly build should be done from scratch is sound, it is typically an acceptable trade-off for the quick incremental builds to compile a more limited set. At a previous company, we were able to cut a significant chunk off our compile time by doing our incremental builds with Javamake as opposed to Ant's javac task. A second example is to omit the build steps required for a full deliverable but unnecessary for generating the compile and test feedback: generating documentation, building installers, making or signing JARs, and other similar tasks.
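That second example can be expressed directly in the build script: the CI target depends only on the steps needed for feedback, while the nightly target does everything. A hypothetical Ant sketch (target names are assumptions):

```xml
<!-- Quick CI build: just enough for compile-and-test feedback. -->
<target name="ci" depends="compile,test"/>

<!-- Full nightly build: clean slate plus all deliverable steps. -->
<target name="nightly"
        depends="clean,compile,test,javadoc,jar,installer"/>
```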
As the number of tests grows, especially long-running system tests, it becomes increasingly difficult to fit them all within the target time period. When I've had this happy problem, I've found the best solution is to add a second build machine—machine cycles are cheap; opportunity is fleeting. Try to divide the tests into "quick tests" and "slow tests," and have two continuous integration loops running throughout the day. This sort of segmentation not only provides the quick feedback that developers will wait around to get but also gives the benefit of running those system tests as frequently as possible. At Agitar, we have three feedback cycles: a fifteen-minute cycle for compilation, unit tests, and smoke tests (quick whole-product tests); a one- to two-hour feedback cycle for our system tests and tests using our product, Agitator, on itself; and our twenty-four-hour nightly build feedback cycle where we do a complete build and an exhaustive test cycle that takes approximately eight hours to complete.
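One simple way to make that quick/slow split is by filename convention in the build script, so that each integration loop runs its own subset of the tests. This is a sketch; the `*SystemTest` naming convention is an assumption for illustration:

```xml
<!-- Quick loop: all unit tests, excluding long-running system tests. -->
<target name="quick-tests" depends="compile">
  <junit haltonfailure="true">
    <classpath path="classes"/>
    <batchtest>
      <fileset dir="classes"
               includes="**/*Test.class"
               excludes="**/*SystemTest.class"/>
    </batchtest>
  </junit>
</target>

<!-- Slow loop, run on the second build machine: system tests only. -->
<target name="slow-tests" depends="compile">
  <junit haltonfailure="true">
    <classpath path="classes"/>
    <batchtest>
      <fileset dir="classes" includes="**/*SystemTest.class"/>
    </batchtest>
  </junit>
</target>
```

Each CruiseControl project then points its builder at the appropriate target, and the two loops run on their own schedules.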
Seeing What You Want to See
Getting continuous feedback is wonderful and addictive, but people get jaded quickly and pretty soon they want more than a one-size-fits-all view of the world. This is where the extensibility of the framework shines, with a range of customization strategies, from simply adding hyperlinks to adding support for entirely new tools.
An example that shows the range of customization possibilities—and the value of a reusable framework—is how we've incorporated the Agitator results in our reporting. Agitator is a developer’s testing tool that allows the developer to define assertions and then "agitates" the class under test while evaluating the assertions to see if they hold true. The primary problem we wanted to solve was to notify our developers if one of their classes failed agitation and to let them easily get the result files to debug the problem. In addition to testing the code, we also generate a dashboard that reflects the progress of the team toward reaching its testing goals, so a secondary interest was to make it easy for managers to view the dashboard results for each build. We also wanted to publish our developer report so that each developer could tell at a glance how many classes he owns, his progress in testing them, and any failures that surfaced in the run. Finally, we wanted an attention-getting way of informing everyone of the current state of the build.
First, the support for JUnit was used as a guide for giving developers feedback on failures. The JUnit support works by incorporating the XML result files and then displaying them via XSL. The mechanics of merging the XML, performing the transform, and displaying the result are all parts of the common framework, so all we had left to do was to produce our results in XML and create the XSL to display them.
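The XSL half of that work might look something like the following sketch. The result format here is hypothetical, assuming each agitated class appears in the merged log as a class element with name and status attributes:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Render each agitated class as a table row in the build report. -->
  <xsl:template match="/cruisecontrol">
    <table>
      <xsl:for-each select="//agitation/class">
        <tr>
          <td><xsl:value-of select="@name"/></td>
          <td><xsl:value-of select="@status"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

Because the merging and transform mechanics belong to the framework, a stylesheet like this is the only display code a new result type requires.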
The second step was to publish the artifacts of our build, which included both the test management dashboard and the data files that developers needed to debug the test failures. Again, the heavy lifting comes for free—CruiseControl's artifact publisher handled most of the details and the website gave a rudimentary method to browse all the artifacts for a build. But while this default interface was serviceable, it didn't recognize the fundamental role both the dashboard and the failure results play in our process. So in place of the default link to browse the artifacts, we added two direct links, one to the dashboard of test results and the other to the failures data, segmented by developer.
The next step was to figure out how to send our developer testing report. This report was already being generated during the build by our tool, but we wanted it to be emailed to each developer. At the time, CruiseControl, while it had various email publishers, lacked a way to publish the contents of a file as an email. Our solution was to reuse the common email publisher base class and add just the functionality we needed: the part that indicates which file to publish and that handles generating the email message from the file contents.
The final piece was publicizing the latest build status. To get the most out of our build system, we wanted to make the results very visible so that everyone would be on the same page, and we also wanted something that was both fun and cool. In the end, we decided on lava lamps. We bought some X10 home automation equipment and two lava lamps, and wrote a bit of Java code to control the lamps based on the build result. Now when the build is broken, the red lamp is on; when the build is passing, the green lamp is on. The result is undeniably cool and surprisingly effective. We're all very motivated to keep the lamp green.
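The heart of that Java code is just a mapping from build result to X10 commands. This is a minimal sketch of the idea, not the actual implementation: the class name, the lamp addresses A1 and A2, and the command strings are all assumptions, and the real system would send these commands through an X10 controller module rather than print them.

```java
import java.util.List;

// Sketch: map a build result to the X10 commands that set the lamps.
public class BuildLamps {
    static final String GREEN = "A1"; // X10 address of green lamp (assumed)
    static final String RED   = "A2"; // X10 address of red lamp (assumed)

    // Switch one lamp off and the other on, depending on the result.
    public static List<String> commandsFor(boolean buildPassed) {
        return buildPassed
            ? List.of(RED + " OFF", GREEN + " ON")
            : List.of(GREEN + " OFF", RED + " ON");
    }

    public static void main(String[] args) {
        System.out.println(commandsFor(true));
        System.out.println(commandsFor(false));
    }
}
```

A publisher plug-in wired into CruiseControl's publishing stage would call something like `commandsFor` with the build outcome after every loop.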
With this last piece in place, I finished what I set out to do: integrate our test results from Agitator in the CruiseControl report in such a way as to feel completely natural for our workflow. This should be the goal of any such effort: Make the tool conform to your process, not your process to the tool. The advantage of using a flexible framework like CruiseControl is that you get the benefit of the custom fit, but your effort is spent only in customization, not in reinventing the wheel.