More Reliable Software Faster and Cheaper

How Software Reliability Engineering Can Help Testers

no longer were in the position of the end-of-process dumping ground.

I will illustrate the SRE process with Fone Follower, an example adapted from an actual project at AT&T. I have changed the name and certain details to keep the explanation simple and protect proprietary data. Subscribers to Fone Follower call and enter, as a function of time, the phone numbers to which they want to forward their calls. Fone Follower forwards a subscriber's incoming calls (voice or fax) from the network according to the program the subscriber entered. Incomplete voice calls go to the subscriber's pager (if the subscriber has one) and then, if unanswered, to voice mail. If the subscriber does not have a pager, incomplete voice calls go directly to voice mail.
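To make the forwarding rules concrete, here is a minimal Python sketch of how an incoming voice call might be routed. It is not Fone Follower's actual implementation. The `Subscriber` class, the hour-of-day forwarding program, and `route_voice_call` are hypothetical, and the network is stubbed out with a callback and a flag. The sketch also assumes that a call arriving at an hour with no forwarding entry counts as incomplete.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical outcome labels used only in this sketch.
CONNECTED = "connected"
ANSWERED_PAGE = "answered page"
VOICE_MAIL = "voice mail"


@dataclass
class Subscriber:
    # Forwarding program: hour of day (0-23) -> forwarding number.
    # Modeling "as a function of time" by hour is an assumption.
    program: Dict[int, str]
    has_pager: bool


def route_voice_call(sub: Subscriber, hour: int,
                     number_answers: Callable[[str], bool],
                     page_answered: bool) -> str:
    """Route one incoming voice call (network behavior is stubbed)."""
    number: Optional[str] = sub.program.get(hour)
    if number is not None and number_answers(number):
        return CONNECTED
    # Incomplete call: page the subscriber if they have a pager...
    if sub.has_pager and page_answered:
        return ANSWERED_PAGE
    # ...otherwise, or if the page goes unanswered, use voice mail.
    return VOICE_MAIL
```

Each distinct path through this function (answered, answered on page, sent to voice mail, with or without a pager) is the kind of variation that later shows up as a separate operation in an operational profile.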

List Associated Systems
The first activity is to list all the systems associated with the product that for various reasons must be tested independently. These are generally of two types:

  1. base product and variations
  2. supersystems

Variations are versions of the base product that you design for different environments. For example, you may design a product for both Windows and Macintosh platforms. Supersystems are combinations of the base product or variations with other systems, where customers view the reliability or availability of the base product or variation as that of the combination.

Implement Operational Profiles
An operation is a major system logical task, which returns control to the system when complete. Examples from Fone Follower include "Phone number entry," "Process fax call," and "Audit a section of the phone number database." An operational profile is a complete set of operations with their probabilities of occurrence. Table 1 shows an illustration of an operational profile from Fone Follower.

When implementing SRE for the first time, some software practitioners worry that occurrence rates will be hard to determine. Experience indicates that this is usually not a difficult problem. Software practitioners are often unaware of how much usage data exists, because it typically resides on the business side of the house. Occurrence rate data is often available or can be derived from a previous release or a similar system. New products are not usually approved for development unless a business case study has been made, and such a study must typically estimate occurrence rates for the use of various functions to demonstrate profitability. One can also collect data from the field, and if all else fails, one can usually make reasonable estimates of expected occurrence rates. Converting rates to the probabilities in an operational profile is then simple: divide each operation's occurrence rate by the total occurrence rate of all operations. In any case, even if the occurrence rate estimates contain errors, having an operational profile is far better than having none at all.
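The conversion from occurrence rates to an operational profile is simple arithmetic, shown here as a minimal Python sketch. The occurrence rates below are invented for illustration (they are not the Table 1 data), and `operational_profile` is a hypothetical helper. Listing operations from most to least likely makes the high-use operations easy to spot.

```python
from typing import Dict


def operational_profile(occurrence_rates: Dict[str, float]) -> Dict[str, float]:
    """Turn per-operation occurrence rates into occurrence probabilities.

    Each probability is the operation's rate divided by the total rate over
    all operations; the result is ordered from most to least likely.
    """
    total = sum(occurrence_rates.values())
    if total <= 0:
        raise ValueError("total occurrence rate must be positive")
    probabilities = {op: rate / total for op, rate in occurrence_rates.items()}
    return dict(sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical occurrence rates (operations per hour); NOT the Table 1 data.
rates = {
    "Phone number entry": 900.0,
    "Audit a section of the phone number database": 300.0,
    "Process fax call": 1800.0,
}
for op, p in operational_profile(rates).items():
    print(f"{op:<48} {p:.3f}")
```

Because only the ratios matter, the rates can be in any consistent unit (per hour, per day, per million calls) without changing the resulting profile.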

Table 1. Fone Follower Operational Profile

Once you have developed the operational profile, you can employ it, along with criticality information, to allocate unit test resources among modules to cut schedules and costs. But its main use is in the system test phase, as we will see shortly.
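One simple way to combine use and criticality when dividing unit test effort is sketched below in Python. This proportional rule is an illustrative assumption, not necessarily the exact allocation procedure SRE prescribes. The module names, the operation-to-module mapping, the criticality weights, and both helper functions are hypothetical. A module's use is taken as the total probability of the operations that exercise it, and test hours are split in proportion to use multiplied by criticality.

```python
from typing import Dict, List


def module_use(profile: Dict[str, float],
               modules_by_operation: Dict[str, List[str]]) -> Dict[str, float]:
    """Probability that a randomly chosen operation exercises each module."""
    use: Dict[str, float] = {}
    for operation, probability in profile.items():
        for module in modules_by_operation.get(operation, []):
            use[module] = use.get(module, 0.0) + probability
    return use


def allocate_unit_test_hours(total_hours: float,
                             use: Dict[str, float],
                             criticality: Dict[str, float]) -> Dict[str, float]:
    """Split test hours in proportion to use x criticality (default weight 1.0)."""
    weights = {m: u * criticality.get(m, 1.0) for m, u in use.items()}
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one module needs a positive weight")
    return {m: total_hours * w / total_weight for m, w in weights.items()}


# Hypothetical profile, module mapping, and criticality weights.
profile = {
    "Process fax call": 0.6,
    "Phone number entry": 0.3,
    "Audit a section of the phone number database": 0.1,
}
modules_by_operation = {
    "Process fax call": ["call_routing", "number_db"],
    "Phone number entry": ["number_db"],
    "Audit a section of the phone number database": ["number_db"],
}
criticality = {"call_routing": 2.0}  # number_db falls back to weight 1.0
hours = allocate_unit_test_hours(
    110.0, module_use(profile, modules_by_operation), criticality)
print(hours)
```

Summing probabilities per module is valid here because each operation run is one of a set of mutually exclusive operations, so the sum is the chance that a given run touches the module.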

Define "Just Right" Reliability
To define the "just right" level of reliability for a product, you must first interpret exactly what "failure" means for the product. Note that a failure is any departure of system behavior in execution from user needs; it is NOT a fault or a bug, which is a defect in system implementation that causes the failure when executed.

The second step in defining the "just right" level of reliability is to choose a common measure for all failure intensities. A failure intensity is simply the number of failures per natural or time unit. A natural unit is a unit other than time that is related to the amount of processing performed by a software-based product.
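The failure intensity arithmetic can be sketched in a few lines of Python. The failure counts, call volume, and load figure below are hypothetical, and `failure_intensity` is an illustrative helper. Using calls as the natural unit for Fone Follower is an assumption made for the example. Converting from a natural-unit basis to a time basis requires knowing how many natural units the system processes per unit time.

```python
def failure_intensity(failures: int, exposure: float, per: float = 1.0) -> float:
    """Failures per `per` units of exposure.

    `exposure` may be measured in natural units (e.g., calls handled) or in
    time units (e.g., execution hours); `per` rescales the result, e.g.
    per=1000 gives failures per thousand units.
    """
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return failures * per / exposure


# Hypothetical test data: 3 failures observed while handling 60,000 calls.
per_kcall = failure_intensity(3, 60_000, per=1000)   # failures per 1,000 calls
# Converting to a time basis needs the load; assume 12,000 calls per hour.
per_hour = per_kcall * (12_000 / 1000)               # failures per hour
print(per_kcall, per_hour)
```

Picking one of these bases and using it for every failure intensity of the product (and its associated systems) is what makes the measures directly comparable.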

AgileConnection is a TechWell community.
