As I reached for the coffeepot, I noticed a carefully printed sign taped to the cabinet. It read:
- Don't press the Brew button more than once.
- Please make sure the urn is aligned under the brew basket.
- Please ensure the urn is empty before brewing.
The sign struck me because it was such an elegant description of the known risks involved in making coffee. Clearly this organization had experienced coffee disasters in the past. Someone had compiled a list of the most common causes of brew failures and created a checklist of tests. What a wonderful set of guidelines!
Part of the power of such a checklist is its brevity. The author of the coffee-making checklist could have also added items such as "make sure there's coffee in the basket" and "remember to use a filter," but then the most critical items would be lost in the verbosity. It's obvious to even a coffee-making novice that coffee is an essential part of the process. So it seemed to me as I was reading this carefully crafted checklist that the point wasn't to catch every possible error. Instead, I think the author probably thought carefully about:
- What seems to go wrong most often?
- What errors are difficult to see at first glance, and thus require concentration to prevent?
- What causes the most damage when it happens?
I also noticed that the items the author chose to address involved subtleties of the coffee maker interface and the interaction between the urn and the coffee maker. These weren't obvious errors. They were "gotchas": things you learn about this particular coffee maker only after painful experience. They were eloquently worded. They also worked. During my weeklong visit at this organization, I didn't see a single coffee brew failure. This was an organization that knew how to learn from mistakes.
Of course, it's easier to create such a checklist for a comparatively simple mechanical process, like making a pot of coffee, than it is for the complex process of building software. It takes much more care to construct similar guidelines for software. Yet it's a worthwhile exercise. By identifying the top five or so things that go wrong in various activities, we might just prevent the most common errors and save everyone a lot of time.
A checklist for beginning a test might look like this:
Important! Before you begin your test:
- Make sure all delivered files have the correct version.
- Set up your test environment to emulate the real-world environment as closely as possible.
- Ensure the system is in a known state before you begin testing.
Now there's plenty that you might add to this list. Adding a few things specific to your environment would be a good idea. But remember the power of brevity. The more items you add to a list, the more likely someone using the list will inadvertently skip over items, inviting disaster.
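The first item on the pre-test checklist, confirming that delivered files have the correct version, is the kind of specific action that can be partly scripted. Here is a minimal sketch in Python. It assumes you already have two mappings from file name to version string: one for what you expected and one for what was delivered. The function name `version_mismatches` and the shape of both mappings are hypothetical, chosen only for illustration; how you obtain version information depends on your environment.

```python
from __future__ import annotations


def version_mismatches(expected: dict[str, str],
                       delivered: dict[str, str]) -> list[str]:
    """Return one message per file that is missing or has the wrong version.

    Both arguments map a file name to a version string (an assumed format).
    An empty result means every expected file was delivered as expected.
    """
    problems = []
    for name, want in sorted(expected.items()):
        got = delivered.get(name)
        if got is None:
            problems.append(f"{name}: missing (expected {want})")
        elif got != want:
            problems.append(f"{name}: found {got}, expected {want}")
    return problems
```

An empty list means the check passed. Anything else is a reason to stop before the test begins.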
When constructing a list like this, try filling in the blanks in these sentences:
- If anything is going to go wrong here, it's most likely ________________
- The top three most damaging kinds of failures would be: 1) ____ 2) _____ 3) ______
- The three most common causes of failures are: 1) ____ 2) _____ 3) ______
The answers to these questions form the basis for your checklist. Let's see how this would apply to the process of releasing software after it has been tested.
If anything is going to go wrong here, it's most likely that the software that ships isn't the software that was tested.
The top three most damaging kinds of failures would be:
- Shipping a virus.
- Failing to install and load.
- Breaking other software (e.g., the operating system) on install or uninstall.
The three most common causes of failures are:
- Not verifying that the final release matches the last build tested.
- Failing to virus-check the final release.
- Untested configurations.
In the end, my checklist for releasing software looks like this:
Important! Before releasing software, please:
- Compare the last tested build against the release build to verify that they match exactly.
- Run at least one, preferably two, up-to-date virus checkers on the final release.
- Install, launch, and uninstall the final release on all supported operating systems.
Notice that all the items on this list are specific actions. It's tempting to include general guidelines such as "ensure the software does not corrupt data" on the list. Yet doing so would weaken the list. This is a prerelease checklist, intended to tame last-minute release chaos. It's not an all-encompassing list of things to test. Presumably someone tested the software before we decided to ship it.
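Because the release checklist items are specific actions, some of them lend themselves to automation. As one illustration, comparing the last tested build against the release build could be sketched in Python as below. This assumes each build is a single file, such as an installer or archive, and treats "match exactly" as "have the same SHA-256 digest". The function names `sha256_of` and `builds_match` are hypothetical, not part of any existing tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so large builds need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(tested: Path, release: Path) -> bool:
    """True only if both files have the same SHA-256 digest."""
    return sha256_of(tested) == sha256_of(release)
```

A release script might call `builds_match(Path("tested/setup.exe"), Path("release/setup.exe"))` (both paths are placeholders) and refuse to continue if the result is false. A build made up of many files would need the same comparison applied to every file.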
Finally, it's important to note that different organizations will have different lists. Your checklists will reflect your requirements, your software's quirks, and your organization's historical weaknesses. Your context is unique; your checklists should be too.
Developing software is certainly more complex than brewing coffee, but both require attention to detail in order to avoid large messes. Just as the organization I visited learned from its coffee-brewing fiascos, all of us can use our past experience with failures to prevent trouble from brewing in the future.