More Reliable Software Faster and Cheaper

How Software Reliability Engineering Can Help Testers

will be used in Guide Test.

Guide Test
The last activity involves guiding the product's system test phase and release. For software that you develop, track reliability growth as you attempt to remove faults. Then you certify the supersystems, which simply involves accepting or rejecting the software in question. You also use certification test for any software that you expect customers to acceptance test.

To track reliability growth, input failure data that you collect in Execute Test to a reliability estimation program such as CASRE (for information, access http://members.aol.com/johndmusa/CASRE.htm). Normalize the data by multiplying by the failure intensity objective in the same units. Execute this program periodically and plot the FI/FIO ratio as shown in Figure 4 for Fone Follower. If you observe a significant upward trend in this ratio, you should determine and correct the causes. The most common causes are system evolution, which may indicate poor change control, and changes in test selection probability with time, which may indicate a poor test process.
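To see why normalizing helps, consider the following minimal Python sketch. It is not how CASRE works: CASRE fits reliability growth models, whereas this sketch substitutes a crude moving-window estimate purely for illustration. The function names, the window size, and the sample numbers are all hypothetical. The point it shows is that once you multiply failure times by the failure intensity objective, the number of failures per unit of normalized time is already the FI/FIO ratio.

```python
from typing import List


def normalize_failure_times(failure_times: List[float], fio: float) -> List[float]:
    """Multiply each cumulative failure time by the failure intensity
    objective (FIO). Both must use the same units, for example hours and
    failures per hour. The result is dimensionless."""
    return [t * fio for t in failure_times]


def windowed_fi_fio_ratio(normalized_times: List[float], window: int = 5) -> List[float]:
    """Crude FI/FIO estimate (an assumption of this sketch, not CASRE's
    method): for each failure, divide the number of failures in the most
    recent window by the normalized time those failures spanned."""
    ratios = []
    for end in range(window, len(normalized_times)):
        span = normalized_times[end] - normalized_times[end - window]
        if span <= 0:
            raise ValueError("failure times must be strictly increasing")
        ratios.append(window / span)
    return ratios


if __name__ == "__main__":
    # Hypothetical cumulative failure times in hours, with a hypothetical
    # FIO of 0.5 failures per hour.
    times_hours = [2.0, 4.0, 8.0, 16.0, 32.0]
    normalized = normalize_failure_times(times_hours, fio=0.5)
    for ratio in windowed_fi_fio_ratio(normalized, window=2):
        print(f"FI/FIO ratio: {ratio:.2f}")
```

In this made-up data the gaps between failures keep widening, so the printed ratio falls from one estimate to the next. A ratio that rises over successive estimates would be the upward trend that calls for investigation.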

If you find you are close to your scheduled test completion date but have an FI/FIO ratio substantially greater than 0.5, you have three feasible options: defer some features or operations, rebalance your major quality characteristic objectives, or increase work hours for your organization. When the FI/FIO ratio reaches 0.5, you should consider release as long as essential documentation is complete and you have resolved outstanding high severity failures (you have removed the faults causing them).

Figure 4. Plot of FI/FIO Ratio for Fone Follower

For certification test, you first normalize failure data by multiplying by the failure intensity objective. The unit "Mcalls" is millions of calls. Plot each new failure as it occurs on a reliability demonstration chart as shown in Figure 5. Note that the first two failures fall in the Continue region. This means that there is not enough data to reach an accept or reject decision. The third failure falls in the Accept region, which indicates that you can accept the software, subject to the levels of risk associated with the chart you are using.

Figure 5. Reliability Demonstration Chart Applied to Fone Follower
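The regions of a reliability demonstration chart can also be computed rather than read off a plot. The sketch below assumes the chart is built from a standard sequential probability ratio test for a constant failure rate, with a supplier risk alpha, a consumer risk beta, and a discrimination ratio gamma. The default values (0.1, 0.1, and 2), the function name, and the sample failure data are assumptions chosen for illustration; they are not read from Figure 5. Use the risk levels that match the chart you actually adopt.

```python
import math


def demo_chart_region(failure_number: int, normalized_time: float,
                      alpha: float = 0.1, beta: float = 0.1,
                      gamma: float = 2.0) -> str:
    """Classify a failure on a reliability demonstration chart.

    failure_number  -- 1 for the first failure, 2 for the second, ...
    normalized_time -- failure time multiplied by the FIO (dimensionless)
    alpha, beta     -- assumed supplier and consumer risk levels
    gamma           -- assumed discrimination ratio (must exceed 1)
    """
    a = math.log(beta / (1.0 - alpha))
    b = math.log((1.0 - beta) / alpha)
    n_term = failure_number * math.log(gamma)
    accept_boundary = (a - n_term) / (1.0 - gamma)
    reject_boundary = (b - n_term) / (1.0 - gamma)
    if normalized_time >= accept_boundary:
        return "Accept"
    if normalized_time <= reject_boundary:
        return "Reject"
    return "Continue"


if __name__ == "__main__":
    fio = 4.0  # illustrative FIO in failures per Mcalls
    failure_times_mcalls = [0.1875, 0.3125, 1.25]  # illustrative data
    for n, t in enumerate(failure_times_mcalls, start=1):
        print(n, t * fio, demo_chart_region(n, t * fio))
    # prints:
    # 1 0.75 Continue
    # 2 1.25 Continue
    # 3 5.0 Accept
```

With these illustrative numbers the first two failures fall in the Continue region and the third in the Accept region, the same pattern described for Fone Follower. Tightening alpha and beta moves the boundaries apart, so you need more test time before you can reach a decision.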

A Proven, Standard, Widespread Best Practice
Software reliability engineering is a proven, standard, widespread best practice. As one example of the proven benefit of SRE, AT&T applied SRE to two different releases of a switching system, International Definity PBX. Customer-reported problems decreased by a factor of ten, the system test interval decreased by a factor of two, and total development time decreased thirty percent. No serious service outages occurred in two years of deployment of thousands of systems in the field.

SRE has been an AT&T Best Current Practice since May 1991. McGraw-Hill published an SRE handbook in 1996. SRE has been a standard of the American Institute of Aeronautics and Astronautics since 1993, and IEEE standards are currently under development. There have been more than fifty published articles by users of SRE (see my website, http://members.aol.com/johndmusa), and the number continues to grow. Since practitioners do not generally publish very frequently, the actual number of users is probably many times the above number.

James Tierney, in a keynote speech at the 8th International Symposium on Software Reliability Engineering, reported the results of a late 1997 survey that showed that Microsoft had applied software reliability engineering in fifty percent of its software development groups, including projects such as Windows and Word. The benefits they observed were increased test coverage, improved estimates of amount of test required, useful metrics that helped them establish ship criteria, and improved specification reviews.

SRE is highly correlated with attaining Levels 4

About the author

John D. Musa

John D. Musa is one of the creators of the field of software reliability engineering (SRE) and is widely recognized as the leader in reducing it to practice. He currently teaches a two-day course, More Reliable Software Faster and Cheaper, worldwide to organizations that want to deploy the SRE practice. He also consults with a wide variety of clients. He is principal author of the widely acclaimed pioneering book Software Reliability and author of the practically oriented Software Reliability Engineering. Elected IEEE Fellow in 1986 for his many seminal contributions, he was recognized in 1992 as the leading contributor to testing technology. His leadership has been recognized by every edition of Who's Who in America since 1990 and by American Men and Women of Science. He has more than 30 years of diversified practical experience as a software practitioner and manager. He has published more than 100 papers and given more than 200 major presentations. You can reach him at j.musa@ieee.org.
