We all want our software projects to be successful. Like well-planned journeys, we want them to stay on schedule, meet our expectations, remain relatively problem free, and lead us to our intended destination.
Our project crew might have the right mix of people (skillful developers, solid testers, and a good management team), working on what appears to be a reasonable schedule, toward the desired combination of features. But the difference between arriving in Albuquerque and arriving in Algiers lies in using some metrics along the way to help us keep the project on target.
I want to share with you some ways to get the most out of the numbers you track, so you can fully understand where the project is and where it needs to go.
In order to get value from the metrics we use, they must be meaningful. What do I mean by meaningful software metrics? I define them as metrics that effectively measure the project's state and progress, inform the decisions we make, and communicate the same message to everyone who reads them.
That last item is especially important. We analyze a lot of numbers and produce reports to share with members of the team, as well as other people in our company. Will they interpret our metrics in the same way that those of us close to the project do? Will they ask the right questions or make the right assumptions?
I find that the software metrics I track are most useful in two ways: (1) investigation and (2) control and management. I discover new things about the software, and I look for familiar trends that help me make decisions to keep the team moving toward its project goals.
Investigation

When I use metrics for investigation, I discover things about the project: what is going on, and why. I can slice up the numbers in various ways to get different perspectives on what is being coded and tested, and how well. Throughout the project lifecycle I investigate its state and progress using open bug counts, find rates, and similar measures.
Calculating metrics for discovery is especially useful during the beginning and middle phases of the project, and it can also help during the final phases to confirm that things are on track. I use them to investigate how the software is progressing: its current state and its trends. It's not enough to just run the numbers; I must analyze them and find the meaning behind what I see. Once I understand the context, I want to convey that context to the people who will view the metrics. They need that framework so they can understand the project's state as well, and make reasonable decisions or reach reasonable conclusions.
Let's look at a common metric: the bug find rate. This is usually calculated as the number of new bugs found each day or in each build. Find rate reports are widely used to track a number of things. We can see how quickly bugs are being found and where. During our investigation of the bug find rates we can slice them by severity, component, and type. They can also be tracked over time by build and by date. Each time we look at a different aspect of the find rate, we need to ask a number of questions.
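To make the slicing concrete, here is a minimal sketch of computing find rates from raw bug records. The record fields, dates, and component names here are hypothetical illustrations, not taken from any real bug tracking system:

```python
from collections import Counter
from datetime import date

# Hypothetical bug records; the field names and values are illustrative only.
bugs = [
    {"found": date(2003, 10, 28), "component": "PRN", "severity": "critical"},
    {"found": date(2003, 10, 28), "component": "SLV", "severity": "minor"},
    {"found": date(2003, 10, 29), "component": "PRN", "severity": "major"},
    {"found": date(2003, 10, 29), "component": "PRN", "severity": "critical"},
]

def find_rate(bug_records, key):
    """Count new bugs grouped along any dimension: date, component, severity..."""
    return Counter(bug[key] for bug in bug_records)

by_date = find_rate(bugs, "found")          # bugs found each day
by_component = find_rate(bugs, "component") # the same bugs, sliced another way
```

The same `find_rate` call answers each "slice" question; only the grouping key changes, which is why one set of raw records can feed many different views of the project.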
For example, look at Figure 1, a graph of the find rate for the DiamondX software project. Here we see a find rate trend for an entire project over time. When you look at this chart, what questions come to mind? Do we know enough about this project to come to any useful conclusions about what is significant? If you are the QA Manager who created this chart, you might have a pretty good idea about the find rate in the context of what you know about the project.
There may be some things about specific numbers, however, for which you'll need more information. When I run metric reports, I go through a series of questions to help me come to the most correct conclusion about what I see happening. (For a sample list, see the Metrics Analysis Questions sidebar.)
I may even find that I need more information in order to answer the questions. Figure 1, for example, seems to show a downturn in the project's bug find rate. That's an assumption that I want to test. What could explain the movement in the graph? Does it mean that the software is improving? To help me figure out the answer I might ask the following five questions:
1. Has there been a shift in QA personnel that could account for fewer bugs being found in the last week?
2. What kinds of tests are being run?
3. Have there been significant changes in the code that is being developed?
4. Is all of the software really stabilizing, or just part of it?
5. What kinds of bugs are being found?
Figure 2, a stacked area graph, shows the same information that we saw in Figure 1 (the total daily find rate), but breaks that information down into the four project components (for this example, named ARB, PRN, SLV, and TRM). Here we see the PRN component's find rate is slowing, while the rates for the other three components are fairly steady. That is, on 10/28 eight bugs were found for the PRN, while only two were found on 11/8.
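The relationship between Figure 1's totals and Figure 2's breakdown is simple arithmetic: the per-component counts sum to the project-wide find rate. A small sketch makes this explicit (only the two PRN figures, eight on 10/28 and two on 11/8, come from the text; the other counts are invented for illustration):

```python
# Daily find counts per component. Only the PRN numbers for 10/28 and 11/08
# appear in the article; the rest are made-up placeholders.
daily_finds = {
    "10/28": {"ARB": 3, "PRN": 8, "SLV": 2, "TRM": 3},
    "11/08": {"ARB": 3, "PRN": 2, "SLV": 2, "TRM": 3},
}

def project_totals(daily):
    """Collapse the per-component breakdown back into the Figure 1 view."""
    return {day: sum(counts.values()) for day, counts in daily.items()}
```

With steady invented counts for ARB, SLV, and TRM, the whole drop in the project total is attributable to PRN, which is exactly the message the stacked area graph conveys at a glance.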
Why the change? When we do our analysis, we might find that this is due to test issues, development issues, or a combination of both. Without doing more analysis, we could easily reach an incorrect conclusion. By doing a little more investigation we should be able to get a clear picture of the meaning behind the metric.
From the testing side, the differences may have to do with the types of tests being conducted during this two-week period, or with testing personnel issues. If the PRN component has an automated test suite that is run every day, and the tests don't change, this could be an explanation, because we are not testing anything new. If true, would additional tests turn up more bugs? Another explanation could be that one of the testers on the PRN component had an emergency appendectomy on November 1, meaning that the component is now under-tested. What about the allocation of testers per component? There could be several testers on PRN, and only one tester each on SLV and TRM. Would moving a tester from PRN to SLV and/or TRM make a difference? If you move personnel around, the graph may change; knowing that will help you better interpret the numbers.
On the development side, it may be that the PRN component is less complex than the others. Maybe the code was finished and tested before the other components, with most of its bugs fixed early in October. If this is the case, the trend rate for PRN actually shows the software stabilizing, but it also shows there is still work to do on the others to get them in the same shape.
If we only looked at the find rate for the whole project (Figure 1) without seeing it broken down by component, we would see the find rate for the whole project beginning to decrease. But we might have missed a significant message: that not all of the software is progressing at the same rate. Once we know that, we can find out why. This is one example of how understanding what's affecting the numbers can help us manage the project going forward.
Once I have discovered the meaning behind the numbers, it would be prudent to add context information to the graph so the information's significance is clear to anyone reading the chart. Then they don't have to guess, and risk coming to a different conclusion. Changes in the trend lines and absolute calculations should be annotated to show the events that affected them (a note, for example, that one of the two testers on the PRN component has been out since November 1, or that personnel were redeployed to different parts of the project).
Control and Management
Calculating metrics for control helps me manage the project direction: Is the project where I think it ought to be? Throughout the project I calculate metrics to help me understand how it's progressing. Once I have found good interpretations of the metrics I'm using, they help me to steer the project where I want it to go.
Let's look at another example of how I can interpret metrics in ways that help to manage and control the project. In the last phases of a project I begin to look for familiar patterns in the metrics, which help me understand if the project is on track. For example, I expect the find rate to begin to go down over time, approaching zero. When I see this trend I would like to assume that the software is stabilizing and improving.
But this may or may not be the case. Why not? What could cause the trend to go down and yet not confirm that the software is improving?
Many of us have found that the trends of open bugs, bug find rates, and bug fix rates in the last phases of a project help project managers to discover important trends in the progression of the project. They are usually calculated by build or by date, and can be sliced again by component, severity, and type. In the example below, I use the following three equations:
open bugs = [(current open bugs + daily find rate) - daily fix rate] (Note: "Current open bugs" is the total number of bugs open at the start of each day.)
bug find rate = total number of new bugs found each day of testing
bug fix rate = total number of bugs Development declares are fixed each day (Note: Once bug fixing is in full swing, a moving average of bugs fixed can be useful too.)
By looking at all three of these measures together, we can see how we are progressing toward our goal of zero open bugs. For example, at 9 a.m. today the bug tracking system told me that there was a total of 90 open bugs (current open bugs). By 6 p.m., 16 new bugs have been added to the bug tracking system, and developers have declared that they have fixed 12 bugs today. Therefore, the open bug calculation is: [(90 + 16) - 12] = 94. So tomorrow we'll use 94 as the number in our calculation for "current open bugs."
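This running calculation is easy to express in code. Here is a minimal sketch of the open-bug equation, using the numbers from the example above (the function name is mine, not from the article):

```python
def open_bugs(current_open, found_today, fixed_today):
    """open bugs = (current open bugs + daily find rate) - daily fix rate."""
    return (current_open + found_today) - fixed_today

# The worked example from the text: 90 bugs open at 9 a.m.,
# 16 new bugs found and 12 declared fixed by 6 p.m.
tomorrow_start = open_bugs(90, 16, 12)  # 94 becomes tomorrow's "current open bugs"
```

Run daily, each result feeds the next day's calculation as "current open bugs," which is what produces the open-bug trend line plotted in the figures.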
I already have a mental picture of patterns to use as a baseline in what I want to achieve for each metric. The daily number of open bugs and bugs found should begin to edge toward zero as the project nears its code freeze and ship dates. The daily bug fix rate should show up as a fairly steady line that is close to horizontal once coding is complete and developers turn their full attention to fixing bugs. Figure 3, the FrameLenz Final Phase chart, shows us ideal trends for the project's last days. This is the expected mental picture-what I want to see in order to ship FrameLenz in February.
(Some definitions are important to understand the baseline in Figure 3. The goal for the FrameLenz project is to have zero open bugs when it ships. When this chart was created it may have been counting only critical bugs: the only ones we intend to fix before shipping. Because we need to be consistent in the things we measure, the open bug rate and the find rate should both consist of only critical bugs. The fix rate would reflect only the effort to fix those bugs. If the bugs being counted included all severities, would the ideal trend lines look different?)
In this ideal baseline we see that by January 26 the find rate has dropped below the fix rate; we are fixing bugs faster than we are finding them. According to our baseline trends in Figure 3, we should be on the right track to shipping FrameLenz on time, right?
Actually, we don't know at this point.
We could say it is likely, but we don't know whether the find rate will reach zero and all of the open bugs will be fixed in time to ship on schedule. Still, Figure 3 shows the ideal goal, so it's fine for our ideal baseline trend to decline toward zero. I should note here that this example uses a business application, for which it's neither realistic nor practical to expect zero bugs in the software when it ships. Even though zero is our goal, this isn't a life-critical application, which is often held to a higher standard of quality. (Good thing!)
Figure 4 is the actual graph of the open bugs, find rates, and fix rates for FrameLenz, and it doesn't match the ideal in Figure 3. The fix rate lags behind the find rate. That is, Figure 4 shows that testers are finding bugs faster than developers are fixing them. Since there were already open bugs in the queue that needed to be fixed, it seems that the team isn't fixing bugs fast enough.
Is this true? Or are testers still finding too many bugs this late in the project? On January 26, the number of open bugs remaining is 39. If we find no more bugs, those 39 can be fixed in six days, based on an average fix rate of 7.5 bugs per day. That's perfect, right? But suppose we continue to find bugs every day (that is, the find rate stays the same or goes up), or a particularly difficult problem takes all of a developer's time for several days (meaning the fix rate goes down). Then the open bug count will not reach zero in time.
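The projection is worth sketching in code, because doing so makes its assumptions visible: the six-day estimate holds only if no new bugs arrive and the fix rate stays steady. A hedged sketch (the function name and interface are my own, not from the article):

```python
import math

def days_to_zero(open_count, avg_fix_rate, daily_find_rate=0):
    """Project days until the open bug count reaches zero, assuming the
    find and fix rates stay constant. Returns None when bugs arrive as
    fast as (or faster than) they are fixed, so the backlog never shrinks."""
    net_burn = avg_fix_rate - daily_find_rate
    if net_burn <= 0:
        return None  # no projected ship date under these assumptions
    return math.ceil(open_count / net_burn)

days_to_zero(39, 7.5)     # the article's six-day estimate, assuming no new finds
days_to_zero(39, 7.5, 8)  # None: still finding faster than fixing
```

Even a modest steady find rate stretches the projection dramatically, which is why a declining find rate matters as much as a strong fix rate in the final phase.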
Clearly we need to do something to help get our project back on track. What are the questions we could ask that would help us understand the current trends? Before deciding on a course of action, we should go back into investigation mode to make sure we understand what the numbers are telling us about the current situation. What can we change to effect positive results in the project (and, therefore, the trends) so that they begin to look more like our ideal in Figure 3?
In our example, let's assume one of our project goals was to fix all critical bugs in the software before shipping it. Given our data in Figure 4, we have a number of options, centered around (a) raising the fix rate, (b) lowering the find rate, and (c) changing the ship date.
Raise the fix rate, by enlisting more developers to help fix bugs. Be careful with this one. Additional developers must be able to come up to speed quickly. If not, you may be adding more new bugs and not really increasing your fix rate enough. Perhaps the FrameLenz developers are only working part-time to fix bugs, spending much of their time coding new features for the next release. In this case, reassigning them to bug fixing full time may do the trick.
Lower the find rate, thereby lowering the open bug rate. Be careful here, too: the aim is fewer bugs in the software, not less looking. A find rate that drops because we scaled back testing tells us nothing about quality.
Change the ship date. Shipping on time might be risky. The project needs more time to get to the desired level of quality. If bug triage is already in place and the bugs calculated in Figure 4 are the ones we must fix to meet our project goal, we run the risk of shipping critical bugs in our software.
These options involve time, resources, and product. A change in one of them will affect the other two. Once one of the options is chosen, it will have an impact on the trend lines going forward-and the change may be quite dramatic. Once again, it's prudent to make some notation on the published graphs so the audience can understand the numbers in context.
Finding the Meaning
In our FrameLenz and DiamondX examples, we have seen that the trends in even the simplest metrics can be more complex once we understand what is going on behind the numbers. Those projects' charts raised questions in our minds about whether what we saw matched what was actually occurring on the project.
Meaningful metrics help bridge that gap between perception and reality. They can clearly show us the current state and trends in the software, and whether our project is following its flight plan. But for our metrics graphs to be effective, we should analyze the reasons behind the numbers we see, and add that context before we distribute them. (And remember that just because we can calculate certain metrics doesn't mean that all of them are useful in all situations.) It's important to pick and choose the methods that give a complete picture, and that suggest the most useful plan of action to get us to our destination. STQE