Every tester and test manager with a few years’ experience lives with the memory of at least one important bug her testing didn’t find.
Two That Got Away
I think especially about two particular bugs that slipped through testing I was responsible for. One, from the first large-scale systems integration test I managed, made it to production—at least in pilot. My client was a retail company implementing a new reward-points scheme for frequent shoppers. My test team had run an extensive test with every scenario we could think of to try to induce incorrect calculations of customer points and points balances. The bugs we found were fixed and retested, the project was declared a great success, and the systems were deployed to production.
Everything went fine with the first group of stores to go online. Customers were awarded points for purchases as expected, and their points balances were accurately incremented with new points and decremented for the products they returned and the points they redeemed for reward purchases. But as more stores joined the pilot and data traffic increased between the stores and the central office points system, some wrongly calculated customer balances began to show up. Luckily, the errors favored the customer, so the retailer faced no regulatory trouble for shortchanging customers. Still, it was a nasty job to find and fix the problem and then clean up the data throughout the integrated production systems—particularly the aggregated data in the financial systems, where customer reward points were accounted for as a liability.
Could we have found this problem if we’d tested for a long enough time with sufficient data volumes? Perhaps. My customer, the retailer, hadn’t wanted to spend either money or time on load testing. But even though production data volumes exposed the problem, I have always suspected that we wouldn’t have found it if we had done a traditional load test. The performance test team would not have been looking for functional bugs and, unless the system had actually failed, they likely wouldn’t have found any.
The other bug I wonder about was, in fact, detected during performance testing—but purely by accident. My team (again on the vendor side) had completed functional testing of a new custom-developed bank teller application. Then the bank’s own performance test team did a load test, with scripts based on scenarios provided by the project business analysts. One of the BAs was in the room as the automated load scripts ran, and she happened to notice a display of numbers different from those she expected at that step. To everyone’s horror, the teller system had returned details of Customer B’s account in response to an account query for Customer A.
It was a showstopper. Without question, the bank had to delay the planned system implementation until the bug was fixed. It took time to pinpoint the bug. The prime suspect was a caching error, but no one could find one in the teller application. Nor could anyone reproduce the problem, in spite of rerunning the load scripts repeatedly and in different combinations. Eventually, a clever programmer hypothesized, and subsequently proved, that the caching bug was actually in an infrastructure system underlying our custom application. Though that system was in widespread production use in many business domains, the problem had never before been reported.
I’ve thought long and hard about what those bugs mean for my test strategies. (I wonder, too, about the other ones that got away that I don’t know about.)
How can we find the functional issues that manifest only under load?
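One partial answer is to build functional assertions into the load harness itself, so correctness is checked on every response rather than noticed by a lucky observer in the room. Below is a minimal Python sketch of that idea. Everything here is hypothetical: `query_account` stands in for the real system call under test, and the account data is fabricated for illustration. The point is the shape of the harness, not the fake service behind it.

```python
import concurrent.futures

# Hypothetical account store standing in for the system under test.
ACCOUNTS = {f"ACCT-{i}": {"owner": f"Customer {i}", "balance": 100 * i}
            for i in range(50)}

def query_account(account_id):
    """Stand-in for the real teller query; a correct fake here."""
    return {"account_id": account_id, **ACCOUNTS[account_id]}

def load_test_with_functional_checks(workers=20, iterations=200):
    """Drive concurrent account queries and verify that each response
    belongs to the account actually requested -- the check that, in the
    teller story, was made only by accident when a BA happened to be
    watching the screen."""
    mismatches = []

    def one_query(i):
        account_id = f"ACCT-{i % len(ACCOUNTS)}"
        response = query_account(account_id)
        # Functional assertion inside the load test: did we get back
        # the customer we asked for?
        if response["account_id"] != account_id:
            mismatches.append((account_id, response))

    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        # Exhaust the map so every query completes before we report.
        list(pool.map(one_query, range(iterations)))
    return mismatches

if __name__ == "__main__":
    print(f"cross-customer mismatches: {len(load_test_with_functional_checks())}")
```

Against a correct service this reports zero mismatches; against the buggy cache in the story, any leak of Customer B's data into Customer A's query would be caught and logged on the spot, instead of depending on someone recognizing unexpected numbers on a screen.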