IT organizations and, in particular, application development departments, are increasingly under pressure to provide performance and compliance metrics to justify annual spend. Unfortunately, many metrics campaigns collapse under their own weight. There's no shortage of things to measure, the scope is greater than first appears, measurement needs to happen frequently, and collection is a distraction to traditional software delivery operations. By comparison, Agile processes are uniquely well suited to metrics, providing measurements transparently and consistently as an extension of day-to-day operations. Framed in a scorecard, information collected during an agile project provides a comprehensive analysis of delivery excellence at the project, program and department levels.
Metrics campaigns soon face the problem of having too many things to measure: technical analyses of code, defect counts, time to deliver, and so forth. Without structure, the data collection doesn’t tell the whole story, or some measures receive greater weight than others. The result is a confusing or distorted picture of a software development organization.
The scope of a metrics exercise is also deceptively large. While development excellence measured through code quality is important and easy to measure, it gives only a partial picture of delivery. The rest of the lifecycle, including project management and requirements gathering, while not as easily measured, has just as great an influence on results.
Furthermore, metrics must be constantly collected, analyzed and reported. Beyond code quality measures, traditional methodologies don't lend themselves to measurement beyond "done/not-done" at a coarse grain of work. Detailed process data collection is highly subjective and an encumbrance and, consequently, is inconsistently executed by development organizations.
Agile projects intrinsically provide a complete set of metrics which are transparently generated and collected. Unit tests, code quality analysis and other technical measures are automated and integrated into continuous build (when using tools such as CruiseControl). The common unit of work – the story – is the foundation for fine-grained measures of project management, quality and functional compliance. Iterative execution gives cadence to reporting, allowing data to be reported frequently. Coupled with automated tools, development performance data collection is an extension of, not encumbrance to, Agile project delivery operations.
Getting Information out of Data
With scope, breadth and consistency settled, all that’s left to do is shape development performance data into information that can support decision making. To paint a picture of "delivery excellence" (i.e., excellence across the software development lifecycle from inception to post-production support), the team must categorize and normalize metrics into a common scale. An overall delivery excellence rating can then be defined and performance consistently analyzed and trended.
The first step is categorization. One model is to analyze four delivery dimensions: development excellence, quality, business responsiveness and project management.
Development excellence consolidates objective measures of code. Again, there is no shortage of instrumentation and this exercise alone can create a confusing picture. It’s helpful to divide development measures into two categories:
- Toxicity is an analysis of undesirable characteristics of the code. Analyses of violations (e.g., anti-patterns), code duplication, design quality ratings (using tools such as JDepend) and many more define how toxic a code-base is. A lower score (zero violations, zero code duplication, etc.) is preferable.
- Hygiene is an analysis of desirable characteristics of the code. Components might include the percentage of the code base covered by unit tests or the extent to which code complies with object-oriented (OO) standards. A higher score (degree of OO compliance, test coverage) is preferable.
It is possible to have a high degree of desirable attributes while also having a high degree of undesirable attributes. This categorization prevents any single measure from having a distorting influence on development excellence.
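To make this concrete, here is a minimal sketch of why the two categories are reported separately. The metric names and sub-scores are hypothetical; each is assumed to already be normalized onto a common -2 to +2 scale:

```python
# Hypothetical normalized sub-scores (-2..+2 scale) for one code base.
toxicity = {"anti_patterns": 1, "duplication": -1, "design_quality": 0}
hygiene = {"test_coverage": 2, "oo_compliance": 1}

def category_score(scores):
    """Average the sub-scores so no single measure dominates."""
    return sum(scores.values()) / len(scores)

# Reporting the two categories side by side keeps strong hygiene
# from masking a toxic code base (and vice versa).
tox = category_score(toxicity)  # 0.0
hyg = category_score(hygiene)   # 1.5
```

Averaging within each category, rather than netting toxicity against hygiene, preserves the insight that a well-tested code base can still be riddled with duplication.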
Quality metrics provide an indication of defects that escape development as well as an indication of functional compliance to specification. This underscores a particular strength of Agile processes: iteratively executed story-based QA provides feedback on technical and functional compliance with greater frequency than other methodologies. As a result, quality problems are exposed much earlier in a project, reducing the probability of substantial re-work or compliance blowback as the target delivery date draws near.
Business responsiveness measures make business and IT alignment highly visible. Requirements in agile projects are captured as stories, and stories are an expression of business value. Thus, the team can index the degree of business value targeted and business value delivered. Variance in this measure suggests an alignment problem between business and IT.
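Because stories carry an expression of business value, the index itself is simple arithmetic. The story-point figures below are hypothetical:

```python
# Hypothetical story-point tallies for one iteration.
value_targeted = 40   # points committed at iteration planning
value_delivered = 34  # points accepted by the business

# The responsiveness index; a ratio well below 1.0 flags a
# business/IT alignment problem worth investigating.
responsiveness = value_delivered / value_targeted  # 0.85
variance = value_targeted - value_delivered        # 6 points
```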
It is also important to rate project management. In waterfall or ad-hoc projects, project management assessment is Boolean (e.g., was the release date met or not?) as opposed to being a continuous assurance of delivery. While variance of delivery phases (including development, integration and certification) can be analyzed on an agile project, story-based requirements and iterative planning and reporting uniquely allow Agile teams to provide accurate forecasting. Successfully executed, this creates a high degree of transparency throughout the delivery process. On Agile projects, the question changes from "was the date made or not?" to "what was the accuracy of goal-setting?" This exposes potential deficiencies in requirements gathering, estimating, and planning.
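"Accuracy of goal-setting" can be measured per iteration rather than once at the release date. A sketch, using invented per-iteration figures:

```python
# Hypothetical (planned, delivered) story points per iteration.
iterations = [(40, 36), (40, 38), (42, 41)]

# Forecasting accuracy for each iteration; a trend toward 1.0
# shows the team's estimating and planning are calibrating.
accuracy = [delivered / planned for planned, delivered in iterations]
```

A flat or worsening trend points back at deficiencies in requirements gathering, estimating, or planning, iteration by iteration instead of at the end of the project.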
The metrics scorecard consolidates performance data and shows an overall rating of delivery, a rating by category and details within each. Because each metric will be reported in a different unit of measure (time, money, scores, percentages, etc.) the data must be normalized into a common scale (such as -2 to +2) before it can be consolidated into a scorecard. To form the scale and individual rating, it is also necessary to define scoring thresholds. For example, unit test coverage above 80 percent might be considered an excellent result. The threshold for a score of "2" would therefore be coverage at or above 80 percent, a "1" would be awarded for coverage between 60 percent and 80 percent, a "0" for coverage between 40 percent and 60 percent, and so forth. A project reporting 75 percent code coverage would score "1" on this scale. In the same way, simple heuristics can be defined for each individual measure, and aggregate scores calculated.
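The threshold heuristic described above can be sketched in a few lines. The boundary values are the hypothetical ones from the coverage example:

```python
def normalize(value, bounds):
    """Map a raw metric onto the -2..+2 scale.
    `bounds` are ascending lower bounds for scores -1, 0, +1, +2."""
    score = -2
    for b in bounds:
        if value >= b:
            score += 1
    return score

# Unit test coverage thresholds: 80%+ scores +2, 60-80% scores +1,
# 40-60% scores 0, and so on down the scale.
coverage_bounds = [20, 40, 60, 80]
normalize(75, coverage_bounds)  # -> 1
```

With every measure reduced to the same -2 to +2 scale, category and overall ratings fall out as simple aggregates of the normalized scores.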
The resulting scorecard will paint a complete picture of a project, program or department performance relative to organizational or industry baselines.
In the early stages of a metrics campaign not all of this data will be available, either because there is no history or because other development methodologies don’t allow for this type of analysis. The absence of the baseline does not negate the value of collection: over time a baseline will emerge by virtue of collecting this data.
The resulting scorecard provides a thorough analysis of the internal business processes of a software development organization. From this analysis, hotspots as well as an overall picture of Business/IT alignment become highly visible.
About the Author
Ross J. Pettit has over fifteen years' experience delivering complex development projects and managing multi-national operations as a developer, manager, and consultant. He holds a BS in Management Information Systems and an MBA. He is currently consulting to global clients implementing Agile practices as a Client Principal with ThoughtWorks.