Software Measurement Programs

Leveraging metrics basics into a functional measurement plan
Better Software Magazine
Volume-Issue: 1999-03
Summary:

A metrics program is any planned activity in which you use measurement to meet some specific goal. If you do not have a clear technical goal for a metrics program, then you are almost certainly not ready for such a program. Here's how to design a measurement program that leads to decisions and actions.

If your organization makes a conscious effort to record information about its software defects, then you may be pleasantly surprised to know that you already have in place the basics of a metrics program. However, it is likely that you are either not making much use of the information or, even worse, getting misleading results from it. First, let's take a look at the use of fault data in data-collection programs.
Starting Point: Distinguishing Between Faults and Failures

Typically, you might see the kind of data shown in the first two or four columns of Table 1, based on a real sample of modules from a major system.

In this case, the development organization was quite rigorous in its approach to recording defects. Every defect discovered during independent testing and in operation was traced to a specific software module. The organization wanted to identify the problem modules among the hundreds in the system. The raw defect data (column 2) suggests that modules Q and L are the problem modules. In many cases this is as far as your data will allow you to go.

Yet looking a bit deeper reveals a very different story. First, recovering the module size data in thousands of lines of code (KLOC) and taking it into account (columns 3 and 4) immediately explains the problem with module Q: it is big. We might now conclude that A and L are the problem modules because they have the most defects per line of code. However, when the defects are split between those discovered by testers pre-release (column 5) and those that caused customer-reported problems post-release, the picture is completely different. The problem modules post-release are actually C and P.
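The three rankings discussed above (raw defect count, defects normalized by size, and post-release defects alone) can be sketched in a few lines of Python. This is only an illustration: the module names `m1` to `m5` and all of the figures are hypothetical and are not the values from Table 1, and the `ModuleRecord` type and `top_by` helper are names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ModuleRecord:
    """One row of a defect log (hypothetical layout, not Table 1 itself)."""
    name: str
    kloc: float         # module size in thousands of lines of code
    pre_release: int    # defects found by testers before release
    post_release: int   # defects behind customer-reported problems

    @property
    def total_defects(self) -> int:
        return self.pre_release + self.post_release

    @property
    def defects_per_kloc(self) -> float:
        return self.total_defects / self.kloc


def top_by(records: List[ModuleRecord],
           key: Callable[[ModuleRecord], float],
           n: int = 2) -> List[str]:
    """Return the names of the n modules that score highest on `key`."""
    ranked = sorted(records, key=key, reverse=True)
    return [r.name for r in ranked[:n]]


# Made-up figures, chosen only to show how the ranking can shift.
records = [
    ModuleRecord("m1", kloc=40.0, pre_release=58, post_release=2),
    ModuleRecord("m2", kloc=5.0, pre_release=29, post_release=1),
    ModuleRecord("m3", kloc=10.0, pre_release=4, post_release=12),
    ModuleRecord("m4", kloc=2.0, pre_release=9, post_release=1),
    ModuleRecord("m5", kloc=8.0, pre_release=3, post_release=9),
]

by_raw = top_by(records, key=lambda r: r.total_defects)
by_density = top_by(records, key=lambda r: r.defects_per_kloc)
by_post = top_by(records, key=lambda r: r.post_release)

print("Most defects overall: ", by_raw)
print("Most defects per KLOC:", by_density)
print("Most post-release:    ", by_post)
```

With these invented numbers each view nominates a different pair of modules, which is the same effect the table illustrates: the answer to "which modules are the problem?" depends on which column you rank by.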

Most people who collect fault data prior to release are really interested in using it to predict the number of post-release failures. With our assumptions, Table 1 highlights just how bad the pre-release fault data is at predicting post-release failures at the module level. More generally, there is now very good empirical evidence that pre-release faults are a bad predictor of post-release faults. Figure 1 shows the results of a recent study of a major telecommunication software system. The modules with the most pre-release faults generally had very few post-release faults. Conversely, the genuinely failure-prone modules generally had low numbers of pre-release faults. For this system some 80% of pre-release faults occurred in modules which had no post-release faults. Similar results have been observed for other systems.
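If you already record pre-release and post-release fault counts per module, the kind of figure quoted above (the share of pre-release faults that fall in modules with no post-release faults) is straightforward to compute for your own system. The sketch below assumes the data is available as one `(pre_release, post_release)` pair per module; the function name and the sample counts are hypothetical and are not taken from the study.

```python
from typing import Iterable, Tuple


def share_of_pre_release_faults_in_clean_modules(
        counts: Iterable[Tuple[int, int]]) -> float:
    """Fraction of all pre-release faults found in modules that went on
    to have zero post-release faults.

    `counts` holds one (pre_release_faults, post_release_faults) pair
    per module.
    """
    counts = list(counts)
    total_pre = sum(pre for pre, _ in counts)
    if total_pre == 0:
        raise ValueError("no pre-release faults recorded")
    clean_pre = sum(pre for pre, post in counts if post == 0)
    return clean_pre / total_pre


# Made-up counts for five modules: (pre-release, post-release).
hypothetical_counts = [(30, 0), (20, 0), (10, 0), (15, 4), (5, 6)]

share = share_of_pre_release_faults_in_clean_modules(hypothetical_counts)
print(f"{share:.0%} of pre-release faults were in modules "
      "with no post-release faults")
```

A high value here is a warning sign that pre-release fault counts are telling you more about where testing effort went than about where customers will experience failures.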

A high number of pre-release faults may simply be explained by good testing (rather than poor quality), and a low number of post-release faults may simply be explained by low operational usage. In fact, the relationship between faults and failures is not at all straightforward. There is very strong empirical evidence that most failures experienced by software systems are caused by a very tiny proportion of the residual faults. Conversely, most residual faults are benign in the sense that they will very rarely lead to failures in operation. This is shown in Figure 2.

In 1983, Ed Adams of IBM published the results of an extended empirical study into the relationship between faults and failures in nine large systems over many years of operation. He found remarkably consistent results between the nine systems. For example, in each case around 34% of the known residual faults led to failures whose mean time to occurrence was over 5,000 years. In practical terms, such failures were probably only ever observed once by a single user (out of many thousands of users over several years). Conversely, the big faults, those which cause the frequent failures

About the author

Norman Fenton

Norman Fenton is Professor of Computing Science at the Centre for Software Reliability, City University, London and also Managing Director of Agena Ltd, a company specializing in risk management for critical computer systems. He is a Chartered Engineer who previously held academic posts at University College Dublin, Oxford University and South Bank University where he was Director of the Centre for Systems and Software Engineering. He has been project manager and principal researcher in many major collaborative projects. His recent and current projects cover the areas of: software metrics; safety critical systems assessment; Bayesian nets for systems' assessment; software reliability tools. Professor Fenton has written several books on software metrics and related subjects.