We were a small dot-com, scrambling to get our newest Web marketplace ready for release in just over a week. In the two months since I had started there as the sole tester on staff, I had reviewed the site design, written a usage-based testing model, and begun testing on my own. The pressure was building toward "Go Live" and it was becoming clear that, on my own, I couldn't test the product thoroughly before release. Nine days before deployment, at the Friday staff meeting, Greg, the CTO, turned to me and said, "Dude, it's all up to you now. The people you see in this room are your test team. We're all going to test this thing together for this final week. We need you to tell us when, where, and how." In the room at that time was pretty much the entire company: six developers, a database expert, two interface designers, a network engineer, the receptionist, the office manager, Greg, and me. My test team. Our mission: to find and fix as many bugs as possible, in one week.
After the meeting, I huddled with Dave, the development lead, and in about an hour we created the plan of attack. Testing would start on Monday. One developer was designated the "EMT" (Emergency Medical Technician), the guy who would restore the system after the testers had crashed it, and who would fix any bugs that were stopping people from testing. The rest of the team was assigned to test Monday through Wednesday, three straight days (we ended up rolling one or two developers off testing after two days to help fix bugs). We would spend Thursday fixing bugs, then take Friday and Saturday (weekend gone out the window) to check bug fixes and finalize the build. Sunday we would deploy. Monday we would Go Live.
With such an aggressive schedule, how could we organize the testing effort quickly, and be sure we covered the whole product without duplicating effort? Dave and I decided we would use the set of sixty or so use cases I had already created, and get all the testers to run each one.
These use cases, the way I wrote them, were not the same as test scripts. They did not specify which buttons to click in which sequence or what the exact expected output should be from a specified series of inputs. Instead, they described, using generalizations, the way the system was designed to behave, under certain broad conditions. Each one captured a distinct aspect of system functionality and gave the minimum information required to exercise it.
For example, Figure 1 shows what a use case of this type for "checking Web email" might look like.
Use cases become tests by turning them into this question, where the title of the use case fills in the blank: "How can I be sure that a user who is ______________ [in this case, checking Web email] will do so successfully?" Two testers may test from a use case using very different thought processes. Depending on their individual experience, they will look for and find different bugs. A tester with training in Web interface design, for instance, may test while thinking about style sheets, tables, graphics, and controls, looking for mismatching colors, graphic elements that resize improperly, or dead links. Someone with training in developing the Java code behind the email delivery may have a window open to the database where the email is stored, to check the character-by-character formatting of the message, or to see how large an email can fit in the display window. Or they may monitor server load during a large file download. All of these are legitimate tests, and they all fall under the use case "checking Web email."
Most test teams assign different tasks to each tester in order to reduce redundant test effort. We felt that we had a group of people diverse enough, and a set of tests thorough yet generic enough, that we could assign everyone the same tasks, and the differences in the testers themselves would translate into different tests, eliminating duplicate effort.
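The scheme described above is simple enough to sketch in a few lines of Python. This is only an illustration, not anything we actually wrote at the time: the use-case titles and tester roles below are invented examples, and the point is the structure, turning each use-case title into the charter question from the fill-in-the-blank template, then assigning the full charter list to every tester rather than partitioning it.

```python
# Sketch of the charter scheme: every tester runs every charter.
# Titles and roles below are hypothetical examples for illustration.

use_cases = [
    "checking Web email",
    "posting an item for sale",
    "searching the marketplace",
]

def charter_question(title: str) -> str:
    """Fill the use-case title into the guiding test question."""
    return f"How can I be sure that a user who is {title} will do so successfully?"

testers = ["developer", "interface designer", "office manager"]

# Everyone gets the same task list; the variation in coverage comes
# from the testers' different backgrounds, not from task partitioning.
assignments = {t: [charter_question(u) for u in use_cases] for t in testers}

for tester, charters in assignments.items():
    print(f"{tester}: {len(charters)} charters, starting with: {charters[0]}")
```

The contrast with the usual approach is visible in the last line: a partitioning scheme would split `use_cases` across `testers`, while here each tester's list is identical and the diversity lives in how each person interprets the charter.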
I asked the testers to start at one point or another in the list of use cases, loop through the whole list, and finish only when they had tested each use case in at least a couple of different ways. If there was more time for testing, I asked them to return to whatever appeared to be an especially promising hunting ground for bugs and keep looking. Once they found a bug, they were to write up a bug report in our bug tracking tool (which I demonstrated for those who had never used it before, particularly the network guy and the office staff). Even Greg, our CTO, setting a great example, tested with everyone else and found a couple of wonderful, subtle bugs.
Everyone used the same set of materials to guide their tests, so they could kibitz back and forth: "Hey, have you done test 40 yet? What did you try?" "Come here and look at this. Is this right?" Morale was high, and the team loved the camaraderie. The developers tested aspects of the system that they had not seen and had only heard blasphemous rumors about, and said they enjoyed learning the whole system and seeing how it fit together as a business process. Plus, they razzed each other when they found errors in each other's code. Everyone kept me very busy answering questions and helping to investigate and write bug reports on hard-to-describe or hard-to-reproduce bugs.
As Dave and I had hoped, the errors people found reflected their training and experience. The developers found bugs like incorrect system calls, situations where state got out of whack, performance bottlenecks, and memory leaks. (They also found the incentive to make several testability upgrades to the system, from which we benefited on many later projects.) The graphics people discovered dead links, incorrectly applied style sheets, and error messages with improper formatting. And the office manager and the receptionist found spelling errors, usability problems, and design flaws that novices see and experts don't.
The team wrote more than 300 bug reports in three days of testing. Only 10% of them were duplicates. We fixed all the high-priority ones and re-tested the fixes, and on Sunday, as scheduled, we deployed the site for a happy customer. Everyone felt that the product had been thoroughly shaken out, and the deployment and Go-Live process, while not buttery-smooth, was successful. We looked back and were astonished that, by working in unison, we had accomplished so much good testing in so little time, and had so much fun doing it!
I am convinced that what made the testing work, in terms of avoiding duplicate bug reports (and, by inference, duplicate testing effort), was the generic nature of the tests. The use cases functioned as charters for the testers, constraining their attention to a particular aspect of the system without restricting their freedom to exercise it as cruelly as their imaginations allowed. Each member of the team brought a different level of interest, background, and creativity to the tests, and so each one performed a different series of activities and noticed different responses from the system. Under these circumstances, there was very little chance that anyone would find the same bugs as anyone else.
If I had to do it over again, I would pair the testers up and encourage them to work in teams. This would allow the testers to keep better testing notes, and let me track their work more closely. Also, paired testing, in my experience and in others', is more than twice as effective at finding bugs as the same two testers working alone. Two minds working together bring a synergistic power that two minds working apart cannot match.
If you haven't tried testing with broad charters, with a variety of testers running all the charters independently or in pairs, try it out. You might be pleasantly surprised to find that your team's built-in variations in testing styles can generate the coverage you need with less effort, more creativity, and more fun.