More Reliable Software Faster and Cheaper

How Software Reliability Engineering Can Help Testers

no longer were in the position of the end-of-process dumping ground.

I will illustrate the SRE process with Fone Follower, an example adapted from an actual project at AT&T. I have changed the name and certain details to keep the explanation simple and protect proprietary data. Subscribers to Fone Follower call and enter, as a function of time, the phone numbers to which they want to forward their calls. Fone Follower forwards a subscriber's incoming calls (voice or fax) from the network according to the program the subscriber entered. Incomplete voice calls go to the subscriber's pager (if the subscriber has one) and then, if unanswered, to voice mail. If the subscriber does not have a pager, incomplete voice calls go directly to voice mail.
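To make the Fone Follower behavior concrete, here is a minimal sketch of the voice-call routing rules just described. It assumes a hypothetical representation of the subscriber's program as hourly time windows mapped to phone numbers. The `Subscriber` class, `route_voice_call`, and the sample numbers are all illustrative, not taken from the actual AT&T system. Fax calls, which are simply forwarded according to the program, are not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Subscriber:
    # Hypothetical form of the forwarding program a subscriber enters:
    # (start_hour, end_hour, phone_number) entries, end hour exclusive.
    schedule: list[tuple[int, int, str]] = field(default_factory=list)
    pager: Optional[str] = None  # None if the subscriber has no pager

    def forward_number(self, hour: int) -> Optional[str]:
        """Return the number programmed for this hour, if any."""
        for start, end, number in self.schedule:
            if start <= hour < end:
                return number
        return None

def route_voice_call(sub: Subscriber, hour: int, answered: bool,
                     pager_answered: bool = False) -> list[str]:
    """Return the destinations tried, in order, for one incoming voice call."""
    path: list[str] = []
    target = sub.forward_number(hour)
    if target is not None:
        path.append(target)
        if answered:
            return path  # call completed at the forwarded number
    # Incomplete voice call: try the pager first, if the subscriber has one.
    if sub.pager is not None:
        path.append("pager")
        if pager_answered:
            return path
    # No pager, or the page went unanswered: the call goes to voice mail.
    path.append("voice mail")
    return path
```

For example, a subscriber with a pager whose forwarded call goes unanswered would see the path `["555-0100", "pager", "voice mail"]` when the page is also unanswered.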

List Associated Systems
The first activity is to list all the systems associated with the product that for various reasons must be tested independently. These are generally of two types:

  1. base product and variations
  2. supersystems

Variations are versions of the base product that you design for different environments. For example, you may design a product for both Windows and Macintosh platforms. Supersystems are combinations of the base product or variations with other systems, where customers view the reliability or availability of the base product or variation as that of the combination.
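One way to keep track of the associated systems is a small list in which each entry is a separate test target. The sketch below assumes hypothetical system names built on the Windows/Macintosh example and an invented supersystem pairing the product with a billing system. The `AssociatedSystem` record and `list_associated_systems` helper are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AssociatedSystem:
    name: str
    kind: str                            # "base", "variation", or "supersystem"
    combined_with: tuple[str, ...] = ()  # other systems in a supersystem

def list_associated_systems(base: str, variations: list[str],
                            supersystems: dict[str, list[str]]) -> list[AssociatedSystem]:
    """Enumerate every system that must be tested independently."""
    systems = [AssociatedSystem(base, "base")]
    systems += [AssociatedSystem(v, "variation") for v in variations]
    systems += [AssociatedSystem(name, "supersystem", tuple(others))
                for name, others in supersystems.items()]
    return systems

# Hypothetical names; each entry would get its own test effort.
systems = list_associated_systems(
    "Product",
    ["Product for Windows", "Product for Macintosh"],
    {"Product + billing system": ["billing system"]},
)
for s in systems:
    print(s.kind, s.name)
```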

Implement Operational Profiles
An operation is a major system logical task, which returns control to the system when complete. Some illustrations from Fone Follower are Phone number entry, Process fax call, and Audit a section of the phone number database. An operational profile is a complete set of operations with their probabilities of occurrence. Table 1 shows an illustration of an operational profile from Fone Follower.

When implementing SRE for the first time, some software practitioners are initially concerned about possible difficulties in determining occurrence rates. Experience indicates that this is usually not a difficult problem. Software practitioners are often unaware of all the use data that exists, because it typically resides on the business side of the house. Occurrence rate data is often available or can be derived from a previous release or a similar system. New products are not usually approved for development unless a business case study has been made, and such a study must typically estimate occurrence rates for the use of various functions to demonstrate profitability. One can collect data from the field, and if all else fails, one can usually make reasonable estimates of expected occurrence rates. In any case, even if there are errors in the estimated occurrence rates, having an operational profile is far better than not having one at all.
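Once occurrence rates are in hand, turning them into an operational profile is a matter of dividing each rate by the total. The sketch below uses the three Fone Follower operations named earlier with hypothetical rates; these are not the values in Table 1. The helper name `operational_profile` is illustrative.

```python
def operational_profile(occurrence_rates: dict[str, float]) -> dict[str, float]:
    """Turn per-operation occurrence rates into occurrence probabilities."""
    total = sum(occurrence_rates.values())
    if total <= 0:
        raise ValueError("total occurrence rate must be positive")
    profile = {op: rate / total for op, rate in occurrence_rates.items()}
    # List operations from most to least probable.
    return dict(sorted(profile.items(), key=lambda item: item[1], reverse=True))

# Hypothetical occurrence rates (operations per hour); NOT the Table 1 values.
rates = {
    "Process fax call": 300.0,
    "Phone number entry": 600.0,
    "Audit a section of the phone number database": 100.0,
}
profile = operational_profile(rates)
for op, p in profile.items():
    print(f"{p:.2f}  {op}")
```

Because only the ratios matter, the rates can be in any consistent unit (per hour, per day, per month) without changing the resulting probabilities.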

Table 1. Fone Follower Operational Profile

Once you have developed the operational profile, you can employ it, along with criticality information, to allocate unit test resources among modules to cut schedules and costs. But its main use is in the system test phase, as we will see shortly.
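The article does not spell out a formula for combining the profile with criticality, so the following is only a minimal sketch under stated assumptions. Each module's weight is taken to be the sum, over the operations that exercise it, of occurrence probability times a criticality factor (default 1.0), and a fixed integer budget of unit test hours is then split among modules in proportion to those weights using largest-remainder rounding. The module names, the operation-to-module mapping, and the criticality factor of 2.0 for fax calls are all hypothetical.

```python
import math

def module_weights(profile: dict[str, float],
                   criticality: dict[str, float],
                   uses: dict[str, list[str]]) -> dict[str, float]:
    """Weight each module by the criticality-weighted probability of operations using it."""
    weights: dict[str, float] = {}
    for op, prob in profile.items():
        factor = criticality.get(op, 1.0)  # assumed default: ordinary criticality
        for module in uses.get(op, []):
            weights[module] = weights.get(module, 0.0) + prob * factor
    return weights

def allocate(budget: int, weights: dict[str, float]) -> dict[str, int]:
    """Split an integer budget proportionally, using largest-remainder rounding."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    shares = {m: budget * w / total for m, w in weights.items()}
    alloc = {m: math.floor(s) for m, s in shares.items()}
    leftover = budget - sum(alloc.values())
    # Give the remaining units to the modules with the largest fractional parts.
    by_remainder = sorted(shares, key=lambda m: shares[m] - alloc[m], reverse=True)
    for m in by_remainder[:leftover]:
        alloc[m] += 1
    return alloc

profile = {"Phone number entry": 0.6, "Process fax call": 0.3,
           "Audit a section of the phone number database": 0.1}
criticality = {"Process fax call": 2.0}  # hypothetical
uses = {"Phone number entry": ["entry_ui", "number_db"],
        "Process fax call": ["call_router"],
        "Audit a section of the phone number database": ["number_db"]}
hours = allocate(100, module_weights(profile, criticality, uses))
print(hours)
```

Largest-remainder rounding guarantees that the integer allocations add up exactly to the budget, which plain rounding of each share does not.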

Define "Just Right" Reliability
To define the "just right" level of reliability for a product, you must first interpret exactly what "failure" means for the product. Note that a failure is any departure of system behavior in execution from user needs; it is NOT a fault or a bug, which is a defect in system implementation that causes the failure when executed.

The second step in defining the "just right" level of reliability is to choose a common measure for all failure intensities. A failure intensity is simply the number of failures per natural or time unit. A natural unit is a unit other than time that is related to the amount of processing performed by a software-based product.
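As a worked illustration of a failure intensity and of putting intensities on a common measure, the sketch below computes failures per natural unit and converts that to failures per hour at a given processing rate. It assumes telephone calls as the natural unit for a system like Fone Follower, and both the failure count and the load of 100,000 calls per hour are hypothetical numbers.

```python
def failure_intensity(failures: int, exposure: float, per: float = 1.0) -> float:
    """Failures per `per` units of exposure (natural units such as calls, or hours)."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return failures * per / exposure

def per_natural_to_per_hour(fi_per_unit: float, units_per_hour: float) -> float:
    """Convert failures per natural unit to failures per hour at a given processing rate."""
    return fi_per_unit * units_per_hour

# Hypothetical test data: 3 failures observed while processing 1,500,000 calls.
fi_per_million_calls = failure_intensity(3, 1_500_000, per=1_000_000)
fi_per_call = failure_intensity(3, 1_500_000)
# Hypothetical load of 100,000 calls per hour.
fi_per_hour = per_natural_to_per_hour(fi_per_call, 100_000)
print(fi_per_million_calls, fi_per_hour)
```

Here the 3 failures in 1,500,000 calls work out to 2 failures per million calls; at the assumed load that is roughly 0.2 failures per hour.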

About the author

John D. Musa

John D. Musa is one of the creators of the field of software reliability engineering (SRE) and is widely recognized as the leader in reducing it to practice. He currently teaches a two-day course, More Reliable Software Faster and Cheaper, worldwide to organizations that want to deploy the SRE practice. He also consults with a wide variety of clients. He is principal author of the widely acclaimed pioneering book Software Reliability and author of the practically oriented Software Reliability Engineering. Elected IEEE Fellow in 1986 for his many seminal contributions, he was recognized in 1992 as the leading contributor to testing technology. His leadership has been recognized by every edition of Who's Who in America since 1990 and by American Men and Women of Science. He has more than 30 years of diversified practical experience as a software practitioner and manager. He has published more than 100 papers and given more than 200 major presentations. You can reach him at

AgileConnection is one of the growing communities of the TechWell network.
