There are as many ways to measure a project as there are ways to build it. Unfortunately, many of these metrics are useless. Eric Ries calls them "vanity metrics" because they look good and make you feel good but offer little in the way of actionable value.
Whatever your feelings on metrics, at the end of the day, organizations will expect and want them. With the yardstick of "helping the team to self-reflect and improve" and the caveat "your mileage may vary," here are my four go-to metrics for an agile team, along with some experiences on their effectiveness.
Four Interlocking Team Measures
Why are there four? If you measure only one key metric, it is easy to get tunnel vision. Whether it's the teams focusing on just making that metric better (often by gaming the system) or management using the single measure to drive all decisions, you can end up with a product or organization that looks good but is really driving off a cliff.
Likewise, with as many as ten metrics it is more likely that different parts of the organization will focus on different metrics, driving a wedge into the efforts to align the organization. Humans best handle three to five concepts at a time, so four main metrics seemed like the optimal dashboard.
Cycle Time
Cycle time is your direct connection to productivity. The shorter the cycle time, the more things are getting done in a given timebox.
You measure this from when work starts to when the feature is done. In software terms, I tend to think of this as "hands on keyboard" time. Measuring cycle time is best done automatically via your agile lifecycle tool of choice, though even measuring with a physical task board will give you useful data.
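As a minimal sketch of that measurement (the timestamp format and field names are my own assumptions, not from any particular agile lifecycle tool), cycle time is just the elapsed time between when work starts and when the story is done:

```python
from datetime import datetime

def cycle_time_days(started_at: str, finished_at: str) -> float:
    """Elapsed 'hands on keyboard' time for one story, in days."""
    start = datetime.fromisoformat(started_at)
    done = datetime.fromisoformat(finished_at)
    return (done - start).total_seconds() / 86400

# Example: a story started Monday morning, finished Thursday afternoon.
print(round(cycle_time_days("2023-05-01T09:00", "2023-05-04T15:00"), 1))  # 3.2
```

With timestamps exported per story, averaging these values per sprint gives the trend line to watch.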
Escaped Defects
This measure is the connection between customer satisfaction and the team. The lower the defect rate, the more satisfied the customer is likely to be with the product. With a high escaped defect rate, even the most awesome product is going to have a lot of unsatisfied customers.
You measure this by counting the problems (bugs, defects, etc.) found in the product after it has been delivered to the user. Until a story is done, it is still in process, so focusing on the story's execution is preferable to tracking in-progress defects.
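A sketch of the counting rule, assuming you can pull defect records with a reported date (the record shape here is hypothetical): anything found before delivery is just in-process work, and only what surfaces afterward counts as escaped.

```python
from datetime import date

def escaped_defects(defects, release_date):
    """Count defects reported after the release shipped; issues
    found before release are in-process work, not escapes."""
    return sum(1 for d in defects if d["reported"] > release_date)

release = date(2023, 6, 1)
bugs = [
    {"id": 101, "reported": date(2023, 5, 28)},  # caught before release
    {"id": 102, "reported": date(2023, 6, 3)},   # escaped
    {"id": 103, "reported": date(2023, 6, 10)},  # escaped
]
print(escaped_defects(bugs, release))  # 2
```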
Planned-to-Done Ratio
This metric is a way to measure predictability. If a team commits to thirty stories and only delivers nine, the product owner has about a 30 percent chance of getting what they want. If, on the other hand, the team commits to ten stories and delivers nine, the PO has roughly a 90 percent chance of getting what they want.
Measuring is a simple exercise of documenting how much work the team commits to doing at the start of the sprint versus how much they have completed at the end of the sprint.
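The arithmetic is trivial, which is part of the metric's appeal. A minimal sketch, using the two examples from the text:

```python
def planned_to_done(committed: int, completed: int) -> float:
    """Fraction of committed stories finished by sprint end."""
    return completed / committed if committed else 0.0

# The examples from the text: 9 of 30 versus 9 of 10 committed stories.
print(f"{planned_to_done(30, 9):.0%}")  # 30%
print(f"{planned_to_done(10, 9):.0%}")  # 90%
```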
Happiness
This is the team "health" metric. It creates awareness that puts the other three metrics into better context. If all the other metrics are perfect and happiness is low, then the team is probably getting burned out, fast.
Build this into your sprint retrospectives. Open every retrospective with the team writing their happiness scores on whatever scale you choose. Track these numbers from sprint to sprint to see the trends.
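Since the raw number matters less than the trend, one way to watch it (a sketch; the 1-to-5 scale and sample scores are illustrative) is to track the sprint-to-sprint deltas:

```python
def happiness_trend(scores):
    """Sprint-to-sprint change in the retro happiness score;
    a run of negative deltas is the early warning sign."""
    return [round(b - a, 1) for a, b in zip(scores, scores[1:])]

# Average retro score per sprint, on a 1-5 scale (sample data).
scores = [4.2, 4.0, 3.6, 3.1]
print(happiness_trend(scores))  # [-0.2, -0.4, -0.5]
```

Three negative deltas in a row, as here, is worth a conversation even if the absolute score still looks acceptable.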
Why These Metrics?
Cycle time and escaped defects are highly quantifiable and well understood across industries. Smaller numbers mean you are delivering a higher quality product, faster. I originally added the planned-to-done ratio primarily because it was something the teams could have an immediate and real impact on, so this fulfilled the "early wins" idea. It becomes useful long term in mapping predictability, which helps in forecasting. The happiness metric is the “human factor,” which lets us gauge the overall team health.
The first three measures form a self-supporting triangle that prevents gaming the system. If you crash your cycle time, then defects will almost certainly go up. A high planned-to-done ratio can be great, unless cycle time is through the roof, showing the team is getting very little done per sprint. Finally, by layering happiness over the rest, you can see the human side of the equation. A low happiness score is nearly always a sign of underlying problems and can be a leading indicator of something else.
You may be wondering about velocity. I track velocity also, but I think it has a very specific place. The four team metrics are for the team to reflect upon during a retrospective, with an eye toward getting better.
Velocity, on the other hand, is a measure the team uses during sprint planning. Its only use is as a rough gauge to how much work to take on in the next sprint. It also can be horribly misused if shared up the management chain—there are better ways to predict when a team will be done or how effective it is.
When measuring velocity, I measure both the story point and story count velocity. By doing this, I find the team has built-in checks and balances to their workload. For example, let's say the team has a three-sprint average of 50 story points and ten stories. If their next sprint is 48 points and nine stories, then they are probably going to finish all the work. If they exceed one of the numbers—say, doing 48 points but twenty stories (a bunch of small ones)—then the sprint might be at risk, as that's a lot of context switching. And if they exceed both numbers—say, committing to 70 points and fifteen stories—then this is a clear warning flag, and a good coach might want to touch base with the team to make sure they are confident that they can do better than their rolling average.
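The checks described above can be sketched as a simple comparison against the two rolling averages (the messages and the strict thresholds are my own framing, not a formal rule):

```python
def sprint_risk(avg_points, avg_stories, planned_points, planned_stories):
    """Compare a planned sprint against the rolling averages for
    both story points and story count."""
    over_points = planned_points > avg_points
    over_count = planned_stories > avg_stories
    if over_points and over_count:
        return "warning: exceeds both rolling averages"
    if over_points or over_count:
        return "at risk: exceeds one rolling average"
    return "likely fine"

# Three-sprint average: 50 points, 10 stories.
print(sprint_risk(50, 10, 48, 9))   # likely fine
print(sprint_risk(50, 10, 48, 20))  # at risk: exceeds one rolling average
print(sprint_risk(50, 10, 70, 15))  # warning: exceeds both rolling averages
```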
Metrics in Action
These charts are based on real data and are a snapshot about eighteen months into an agile transformation. I tend to stick with a six-month rolling window because if you go much beyond that, things have changed so much as to be irrelevant to what the team is doing or working on now.
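A minimal sketch of that rolling window, assuming each sprint record carries an end date (the data shape is hypothetical) and approximating six months as 180 days:

```python
from datetime import date, timedelta

def rolling_window(sprints, today, months=6):
    """Keep only sprints that ended within roughly the last six
    months; older data reflects a team and product that no longer
    resemble the current ones."""
    cutoff = today - timedelta(days=months * 30)
    return [s for s in sprints if s["ended"] >= cutoff]

sprints = [{"name": "Sprint 1", "ended": date(2023, 1, 15)},
           {"name": "Sprint 12", "ended": date(2023, 7, 1)}]
print([s["name"] for s in rolling_window(sprints, date(2023, 8, 1))])  # ['Sprint 12']
```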
The spike in cycle time represents the team moving to a new project and the ramp-up time as they got used to the new work while going through a series of organizational changes.
This graph shows a fairly typical curve for teams that have moved to cross-functional roles and automated testing. With everyone in the team jointly responsible for the story and the quality and a greater focus on test automation, we see a dramatic drop in defects found in the product after release.
This team lost its ScrumMaster, which impacted its overall performance, as reflected in the first sprint's data. In the second sprint, an experienced ScrumMaster came in to help. The early dips represent the team getting used to a new set of norms, and the later dips were a result of changes in the program that reduced the clarity of the team's backlog.
These data show how the support of a ScrumMaster improved the team’s overall health. The graph also reflects that the churn in the product and organization impacted the team’s happiness later on.
Based on these graphs, the first thing I'd plan is to engage with the team and listen to what's been going on in the last couple of sprints. The dips in the planned-to-done ratio and the happiness metric are enough to tell me there might be something going on. The low cycle time and escaped defects would lead me to suspect the problems were external to the team.
The real challenges were coming from a chaotic product strategy that had the team bouncing among priorities. The volatility of the backlog changes led to lower-quality stories. The team was mature enough to stop when they dug into a story they didn't understand and shift to work they did understand. This lowered the planned-to-done ratio, because not all the committed work could be finished, while cycle time stayed low because they worked only on things they understood well.
Try These Team Metrics
These are the team metrics I've had the most luck with. Their interrelationship prevents gaming one measure without impacting others. They provide useful data to the team for retrospective improvement, and they are meaningful to leadership and help with forecasting.
If you’re interested in trying these metrics out, you can use the Team Dashboard pack I’ve created in Google Docs by downloading it here.
Great article; your metrics will be put into use immediately. I have one question regarding the expected trend and prediction information in the team metrics. I see the expected trend averaging the last two completed story points, and I saw the note to average the worst three completed story points for the worst-case prediction. However, the expected prediction and worst-case trend seem to be nonfunctioning equations. Are these useful, and if so, how can they be put to use?
My apologies for totally missing this question.
You've found a place where I'm in the process of trying a new formula. Previously I used a rolling, last-three-sprint average. I'm trying to move to a mean average using the best and worst sprints. Right now the formula is not working in the Google Doc. My apologies for the confusion.
Great article! Hopefully all teams will have a set of metrics but they must be visible to be useful. See my post on Creating a Culture Change with Visual Management
I've found the Planned to Done metric can be helpful for getting teams who aren't meeting their commitments back on track. However, this metric can introduce a lot of anxiety and unnecessary introspection, especially when the problem is outside of the team's control. I'd recommend an alternative: the SAFe Program Predictability Measure. The benefit of this measure is that it's a ratio of the business value delivered, not just team output. To get this, the business/PO is involved at the start to set value and during the demo to assess value. Now the larger team can have discussions about the internal and external reasons for not meeting expectations.
Interesting idea, I can definitely see value in this once the organization has moved into being able to apply business value to their stories. Do you see it being valuable in early stages when the product owner may not even be fully engaged and is still doing just rank order prioritization?
Great article. I really like the planned to done ratio metric. I wonder how "escaped defects" are measured. Is this supposed to be only based on customer complaints? Plenty of studies show that dissatisfied customers often do not complain. They either find a workaround, stop using a feature, or abandon the product altogether. Should any issues found before release and not fixed be part of the "escaped defects"? I am sure that in any software application there is a growing number of such issues because adding features always trumps fixing existing flaws.
Hi, Tim. Sorry, I totally missed this question when you posted it.
Escaped Defects is any issue found after the sprint has ended. Good agile teams should be practicing Definition of Done and any issues found during the sprint are either addressed or the product owner determines they don't need to be fixed for the work to be "done". At that point, any future work is a new story. A key thing here is that we don't have defects or bugs during a sprint. We just have incomplete work on the User Story.
Once a sprint has ended, then any issue found is an escaped defect. It doesn't matter if test finds it or a customer reports it, it is a defect.
Escaped defects is a very incorrect metric. One simple reason is that you don't know how many defects will be found in the field over the coming years. For enterprise software or small products, we think we know what defects escaped. For large products, we really don't know.
This metric, among others, has been out of favor in the testing community for a long time: http://kaner.com/pdfs/metrics2004.pdf
Of course, there are others who do follow escaped defects. I would strongly advise against it.
That hasn't been my experience. I've worked in customer support and in program management, either end of the development chain, in large enterprise organizations. If you set up things the right way, you can easily trace a field defect back to your engineering teams. Earlier in the value stream (after sprint end and before ship) it's even easier to trace a defect back to a specific team. If you are doing continuous integration, then it becomes almost automatic based on who made the last commit.
What would you suggest as a way to trace the quality of the work the team does? I'm always looking for better ways.
How do you know what the defects in the field are? If it is only based on customer complaints you are missing a good chunk. Only about one third of those customers who have a reason to complain about a defect will do so. The others either live with the quirkiness, stop using the broken feature, or drop the product entirely due to quality issues.
One option is to ask customers where they encounter quality issues or a bad user experience, but such surveys tend to have low return rates and often only yield responses from those who would have complained anyway.
One might argue that an issue no customer complains about is not really a defect, but even a spelling error or a misaligned UI control adds to the overall impression of the product.
Great article, great ideas.
I just created an 'escaped-bugs-query' for our last release...
(And I'm thinking about how to get better results in measuring our happiness :-))