DevOps: Changing the Software Testing Game


In software, our specialties, like testing or programming, may be the surface game, but usually there are other games going on. If what we need to build is vague or not thought through, the game suddenly gets a lot harder. For years, I’ve thought of project management and requirements as the meta-game of software delivery. The “shift-left” push seems laser-focused on setting things up to prevent rework in a meaningful way.

Meanwhile, another small revolution has been happening in software. It’s the DevOps movement, and it promises to be as influential as—possibly more influential than—project management and good requirements ever were in programming and testing.

Testing like It’s 1999

At the turn of the century, we had one chance to “get it right” when creating software, which was burned on a physical disk and shipped in the mail. Get it wrong, and your company would end up on the front page of the newspaper—which people still actually read—until new disks came in the mail, which could take a week or more.

Because mean time to recovery (MTTR) was so long and so expensive, companies had to play defense by increasing mean time between failures (MTBF). Extreme Programming, refactoring, and test-driven development were brand new and not yet widely practiced, so for the most part, testers got bad code and went through many cycles of test-report-retest before release.

Over the next decade we lost the physical disks and moved to the web, which meant a fix could take a day or two. More often than not the holdup for releasing a fix was testing, which was really a tradeoff of risk and time. If we calculate overall risk as risk exposure (badness) multiplied by the length of the time the defect was on the website, then reducing MTTR also reduces risk, arguably decreasing the value of testing.
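To make that arithmetic concrete, here is a minimal sketch of the risk calculation described above. The function name, the cost units, and every number are hypothetical and chosen only for illustration; real exposure is rarely this easy to quantify.

```python
def defect_risk(exposure_per_hour: float, hours_live: float) -> float:
    """Overall risk = risk exposure (badness per hour) x time the defect is live."""
    return exposure_per_hour * hours_live


# Hypothetical defect costing 100 "badness units" for every hour it is live.
EXPOSURE = 100.0

disks_by_mail = defect_risk(EXPOSURE, 7 * 24)  # about a week to recover
web_fix = defect_risk(EXPOSURE, 2 * 24)        # a day or two to recover
fast_deploy = defect_risk(EXPOSURE, 1)         # an hour to recover

print(disks_by_mail, web_fix, fast_deploy)  # 16800.0 4800.0 100.0
```

The exposure never changes in this sketch. Only the recovery time does, and the total risk falls with it.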

By the time the web was mainstream and people were using social media, we began to hear about the “Facebook effect”: simply throwing features out that might not even work and letting customers sort it out. This happened particularly with software that was free for users. Gmail, for example, was in beta for its first five years and could be buggy, yet MTTR was so low that few people noticed. Most of the time the problem would be fixed before you saw it, and if you saw the problem, it would be fixed the next time you checked mail.

The test community, of course, laughed along. “That will never work at my bank,” we said, or “at my insurance company,” or “for software people pay real money for.”

Then along came DevOps, which pushed MTTR from a couple of weeks in Scrum to mere hours or minutes.

The Swiss Cheese Model

Imagine risks on a software project are gaps, like holes in a block of Swiss cheese. What we tried to do in the nineties was have one release, protected by one process—a long, painful test cycle. It’s not quite a law of science that as we try to improve test coverage, the cost of narrowing the gap by half doubles, but something like that does happen on software projects. We try to make cheese with no holes in it, but covering those gaps gets more expensive over time, which leads to longer test cycles and slower releases. Believe it or not, I once worked on a software project that “improved the process” by changing the ship period from monthly to quarterly!

Now imagine a different approach to risk. Instead of relying on testing alone to increase MTBF, we try a variety of techniques, like hard examples up front, test-driven development, code craft, isolating code into components, and an early release to a beta group. We also get better at detecting problems earlier, and focus on MTTR with feature flags that are easy to turn on or off, quick deploys, and so on.
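For readers who have not worked with feature flags, here is a minimal sketch of the idea. The FeatureFlags class, the flag name, and the pricing rules are all hypothetical; a real system would typically read flags from shared configuration or a flag service rather than from memory.

```python
class FeatureFlags:
    """In-memory flag store, for illustration only."""

    def __init__(self):
        self._flags = {}

    def set(self, name, enabled):
        self._flags[name] = enabled

    def is_enabled(self, name):
        # Unknown flags default to off, so new code stays dark until enabled.
        return self._flags.get(name, False)


def legacy_total(cart_cents):
    # Existing behavior: flat shipping fee (hypothetical amount).
    return cart_cents + 500


def new_total(cart_cents):
    # Hypothetical new rule: free shipping on carts of 5000 cents or more.
    return cart_cents if cart_cents >= 5000 else cart_cents + 500


def checkout_total(flags, cart_cents):
    # The flag decides at run time which code path a request takes.
    if flags.is_enabled("free_shipping"):
        return new_total(cart_cents)
    return legacy_total(cart_cents)


flags = FeatureFlags()
print(checkout_total(flags, 6000))   # 6500: flag off, legacy path
flags.set("free_shipping", True)
print(checkout_total(flags, 6000))   # 6000: flag on, new path
flags.set("free_shipping", False)    # problem found? turn it off, no redeploy
print(checkout_total(flags, 6000))   # 6500: back on the legacy path
```

The last three lines are the point: recovery is a configuration change, which is about as low as MTTR gets.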

Each method is a slice of cheese with different holes. With enough layers, we get the protection of classic test approaches with much less cost—plus we can wean off classic regression testing. Testing still continues all the time, for both new development and other risks, but now it can be a measured investment based on emergent risks, in production or not. By combining ops (deploys, monitoring) with programming (automated builds, automated virtual servers) and testing (risk management), we get something that is more than the sum of its parts.
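The layering argument can be put in rough numbers. The sketch below assumes each layer misses defects independently of the others, meaning the holes do not line up, which is an idealization. The miss rates are invented for illustration and are not measurements from any project.

```python
from math import prod


def escape_probability(miss_rates):
    """Chance a defect slips through every layer, assuming independent layers."""
    return prod(miss_rates)


# One long, expensive test cycle that misses 5% of defects (hypothetical).
single_layer = escape_probability([0.05])

# Five cheap layers that each miss half of all defects (hypothetical):
# examples up front, TDD, component isolation, beta release, monitoring.
five_layers = escape_probability([0.5, 0.5, 0.5, 0.5, 0.5])

print(single_layer)  # 0.05
print(five_layers)   # 0.03125
```

Under these assumptions, five mediocre slices beat one very good slice. If the layers share blind spots, the real number will be worse, which is why it helps to pick techniques whose holes fall in different places.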

A Future Together

I recently had a conversation with a director of test who was concerned that DevOps would “disrupt” testing the same way the web disrupted print media. His instinct was to dig in his heels and fight.

My advice was quite the opposite. Don’t fight. Lead the charge. DevOps success requires a series of components that integrate a bit like puzzle pieces, from on-demand virtual servers by branch to hooking automated checks into the build system, feature flags, staged rollouts, and continuous monitoring. Help the organization realize how to manage all of this (from planning to execution and metrics) and you’ll never have to worry about having a job.
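Staged rollouts are one of those puzzle pieces that is easier to reason about with a small example. This is a minimal sketch of percentage-based bucketing, assuming users are identified by a string ID. The function name, feature name, and user IDs are hypothetical, and hashing into 100 buckets is just one common way to do it.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Return True if this user falls inside the first `percent` of 100 buckets."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < percent


users = [f"user-{n}" for n in range(1000)]
for percent in (1, 10, 50, 100):
    enabled = sum(in_rollout(u, "new-checkout", percent) for u in users)
    print(f"{percent:>3}% target -> {enabled} of {len(users)} users enabled")
```

Because the bucket depends only on the feature and the user ID, a user who sees the feature at 10 percent still sees it at 50 percent. Pair each step up with monitoring, and each stage becomes another slice of cheese.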

While the emphasis of the testing organization might change a bit, it will grow to include build management and the management of crowdsourced testing, infrastructure, and support. That’s not a bad thing. Don’t get me wrong; I’m still all in on software testing. Someone still needs to eat the dots and avoid the ghosts. It’s just that the game can get a whole lot easier.

Let’s play together.

User Comments

Daniel Albuschat

It's very important to distinguish two kinds of bugs: bugs that make stuff simply "not work" until you fix them, and bugs that break things in an irrecoverable way, such as deleting data or making data inconsistent. We still need good QA processes to catch bugs of the latter kind. I find it important to mention this because otherwise the reader could get the impression that the goal is to do less and less testing altogether. But the reality is that it just needs to better align with the risks.

August 21, 2016 - 4:32am

AgileConnection is a TechWell community.