This month we take an "agile" slant on metrics for CM, including the CM process itself. Agility is supposed to be people-centric and value-driven. So any metrics related to agility should, at least in theory, provide some indication of the performance and effectiveness of the value-delivery system, and how well it supports the people collaborating to produce that value. We borrow heavily from the concepts of Lean Production (and a little from the Theory of Constraints, a.k.a. TOC). Let's see where it takes us.
Back to Basics: Agility Is People-Centric and Value-Driven
I'm a big fan of rhyme and reason for metrics, and I value the essence of Victor Basili's GQM paradigm (Goal-Question-Metric) in providing the rationale behind a metric. One of the more famous quotes from TOC founder Eliyahu Goldratt goes something like "Tell me how you will measure me, and I will tell you how I will behave!" That's a very appropriate source for this column because Agile methods borrow heavily from the concepts of Lean and TOC, so it makes sense to see what they have to say on the subject of metrics for Agile CM environments.
Agility, at its core, is predicated on the rapid delivery & production of business value via the power of the people who produce it! In terms of what to do, and what to stop doing, both Lean and TOC have a similar and relentless focus:
- Lean focuses ruthlessly upon "flow", and the elimination of waste (or what it calls "muda")
- TOC focuses on "throughput", and the elimination of constraints/bottlenecks
To the extent that "flow" and "throughput" are talking about the same thing, both Lean and TOC seem (almost) perfectly aligned. The question is, what is the "flow" whose "throughput" needs to be maximized and optimized? David J. Anderson in his book and related articles suggests that Agile, Lean and TOC converge and complement each other when we consider business-value as the "thing" whose flow and throughput are to be optimized. The first of the principles behind the agile manifesto states:
"Our highest priority is to satisfy the customer through early and continuous delivery of valuable software."
Likewise, the first principle of the Declaration of Interdependence (written for agile and adaptive project management) states:
"We increase return on investment by making continuous flow of value our focus."
That pretty much settles it for me: the overarching goal is the continuous delivery of working software that realizes maximal business value and ROI. So the ideal for a metric about the CM process would be to actually measure and report the overall ROI of CM to the business. This elusive CM ROI measurement is highly sought after by many CM practitioners; some might even call it a "silver bullet" for proving the worth of CM. We won't be quite so ambitious in this column, but we will look at how to measure aspects of CM for their impact on the overall value delivery system.
Cumulative Flow and Throughput Accounting
How do we attempt to measure business value and its flow for product development? For software development it would mean measuring the "value" of the software that was delivered. And that value is determined by how much the customer will pay for it at the time of delivery -- such value can depreciate over time, making time-to-market (lead-time) a critical concern in product feature definition and development. David Anderson actually proposes a throughput accounting system for software management and also writes of managing lean software development with cumulative flow.
I'll let readers pore through those articles rather than repeat them here. The basic gist is still the same, namely treating software/product development as a continuous flow of value through a value-delivery stream and applying basic throughput accounting (from TOC) and Lean techniques (such as cumulative flow diagrams).
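To make the idea of a cumulative flow diagram a little more concrete, here is a minimal sketch of the data behind one. It assumes a hypothetical log recording the day each request entered each workflow state; the state names are our own illustration, not taken from Anderson's articles. Each state becomes one band of the diagram: the vertical distance between bands shows how much work is queued in between, and the horizontal distance approximates cycle-time.

```python
# Hypothetical workflow states, in the order a request moves through them.
STATES = ["arrived", "approved", "in_progress", "delivered"]


def cumulative_flow(entries, days):
    """Return {state: [count for each day]}, where each count is the number
    of items that had entered that state by the end of that day.

    `entries` maps an item id to {state: day_entered} (days start at 0);
    every state used must appear in STATES.
    """
    flow = {state: [0] * days for state in STATES}
    for history in entries.values():
        for state, entered in history.items():
            for day in range(entered, days):
                flow[state][day] += 1
    return flow


# Three hypothetical requests observed over five days.
entries = {
    "R1": {"arrived": 0, "approved": 1, "in_progress": 2, "delivered": 4},
    "R2": {"arrived": 1, "approved": 3},
    "R3": {"arrived": 2},
}
flow = cumulative_flow(entries, days=5)
# Gap between the top ("arrived") and bottom ("delivered") bands = work-in-progress.
wip = [a - d for a, d in zip(flow["arrived"], flow["delivered"])]
```

Plotting each list in `flow` as a stacked area over the days yields the diagram itself.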
CM in the Value-Stream
Now is the time to discuss the role of CM in the value-delivery stream of software development. Lean uses a lot of Japanese terms, and Mary Poppendieck once told me in an email that CM was vitally important in Lean: if it had been given a single-word name in the "Lean vocabulary," she thinks that word would probably translate to "order" (others in the CM world might prefer the term "integrity").
CM ensures product integrity by establishing & maintaining an orderly flow of the value stream and its evolution. That value stream begins with the process for assessing, approving, and authorizing requests (for projects/products and their changes); and it ends with the delivery of value that has been realized by reliably integrating and assembling it in repeatable, reproducible and traceably transparent fashion!
So two of the most prominent places where CM plays a role in the value stream are the front-end process for managing requests, and the back-end process for establishing baselines and managing their build/integration/assembly. And of course if we are delivering value "continuously" then we are also continuously performing request management/prioritization as well as build/integration management. Let's take a look at how some of the concepts of Lean apply to these portions of the Agile development process and the metrics that are suggested by them!
Being ever so Takt-full
One very obvious metric well aligned with Lean is the average end-to-end cycle-time for realizing the value of a request. The clock starts ticking when the request first arrives and stops when the implemented solution for that request is delivered to the requesting customer. ITIL circles might recognize this as resolution cycle-time. Closely related are the average resolution-rate (requests resolved per unit of time) and the average arrival-rate of requests. The intent is to apply these metrics at the end-to-end scale to close the loop with the customer.
These can additionally be applied internally to a CM group managing requests to build/integrate/release a version of the product or system. When multiple levels of integration are required, this can (and probably should) be applied to the entire integration pipeline, starting from the lowest-level integration/build request and ending when the complete (top-level) product or system has been baselined and released.
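As a rough illustration, the end-to-end measures can be computed from nothing more than each request's arrival and delivery timestamps. This is a minimal sketch that assumes a hypothetical export of (arrived, delivered) pairs from a request-tracking system, with `None` for requests still open and all arrivals falling inside the observation window; the dates are made up.

```python
from datetime import datetime
from statistics import mean


def request_flow_metrics(requests, period_days):
    """Average end-to-end cycle-time (days), arrival-rate and resolution-rate
    (requests/day) for (arrived, delivered) timestamp pairs observed over
    `period_days`. `delivered` is None for requests that are still open."""
    cycle_times = [
        (delivered - arrived).total_seconds() / 86400
        for arrived, delivered in requests
        if delivered is not None
    ]
    return {
        "avg_cycle_time_days": mean(cycle_times) if cycle_times else None,
        "arrival_rate": len(requests) / period_days,
        "resolution_rate": len(cycle_times) / period_days,
    }


# Hypothetical requests logged during a 10-day window.
requests = [
    (datetime(2007, 3, 1), datetime(2007, 3, 6)),  # delivered after 5 days
    (datetime(2007, 3, 2), datetime(2007, 3, 5)),  # delivered after 3 days
    (datetime(2007, 3, 8), None),                  # still open
]
metrics = request_flow_metrics(requests, period_days=10)
```

The same function applies unchanged to build/integration/release requests handled inside a CM group; only the source of the timestamps differs.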
Those metrics are probably nothing new to veteran practitioners. However, another closely related concept in Lean might very well be, and that is the concept of Takt time. Blogger Peter Abilla writes many informative articles about Lean concepts and theory (see his series of articles on Lean/TPS and queueing theory). Abilla describes Takt-time as follows:
Takt time comes from a German word 'takt', which means rhythm or beat. Takt time is not the same thing as Cycle Time or Lead Time, though Takt Time has a very real relationship to both. Takt Time must be measured through a simple calculation:
(Takt Time) = (Net Available Time per Day / Customer Demand per Day)
Takt time is measured as (Time/Piece), not the other way around. This is important because the operator knows that he or she only has so much time per x. ... to a large measure, Takt Time is about the maintenance of 'Flow', which is very important in queueing efficiency.
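Abilla's formula is simple enough to compute directly. The sketch below assumes a hypothetical CM group with 450 working minutes per day receiving 15 build/release requests per day; the function name is our own.

```python
def takt_time(net_available_time_per_day, customer_demand_per_day):
    """Takt time as (time per piece): net available time / customer demand."""
    if customer_demand_per_day <= 0:
        raise ValueError("customer demand must be positive")
    return net_available_time_per_day / customer_demand_per_day


# Hypothetical CM group: 450 working minutes a day, 15 requests a day.
takt = takt_time(450, 15)  # minutes available per request
print(takt)  # 30.0
```

If the average time to handle one request is longer than the takt time, demand outpaces capacity and the request queue grows; if it is comfortably shorter, flow can be maintained.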
The 'Batch' is back!
Also from Lean production come the notions of process-batches and transfer-batches (especially batch processing-time versus batch transfer-time). The time to "process" a batch (batch processing time) is the amount of time it takes to create or produce the elements in the batch. The batch "transfer-time" is the amount of time it takes to transfer one or more "batches" into the value-stream. A "transfer batch" may encompass more than one "processed batch", but the alleged ideal is when the transfer batch-size is equal to the process batch-size (a concept called single-piece flow). When the batch transfer-time approaches the batch processing-time, then single-piece-flow no longer produces optimal results.
One of the conclusions of Lean is that small "batch-sizes" are key to optimizing the flow of the value-delivery stream. For software development, this is basically saying iterative & incremental development utilizing evolutionary delivery is a critical success factor (nothing new here). A "batch" can take many forms: a "batch" of features for an iteration, a "batch" of changes for a build, a "batch" of modifications for a change, etc. We'll discuss applicability to some of those a bit later in this article.
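The trade-off between batch size and transfer-time can be explored with a deliberately simplified model. This is our own assumption for illustration, not a formula from the Lean literature: batches are handled one at a time, each batch takes `size * process_time` to produce plus a fixed `transfer_time` to hand off, and an item is only delivered when its whole batch is transferred.

```python
def batch_schedule(n_items, batch_size, process_time, transfer_time):
    """Simulate sequential batches; return (mean delivery time, total time)."""
    clock = 0.0
    delivery_times = []
    remaining = n_items
    while remaining > 0:
        size = min(batch_size, remaining)
        # Produce every piece in the batch, then pay the fixed transfer cost.
        clock += size * process_time + transfer_time
        delivery_times.extend([clock] * size)
        remaining -= size
    return sum(delivery_times) / n_items, clock


def best_batch_size(n_items, process_time, transfer_time):
    """Batch size that minimizes mean delivery time under this model."""
    return min(
        range(1, n_items + 1),
        key=lambda b: batch_schedule(n_items, b, process_time, transfer_time)[0],
    )


print(best_batch_size(12, 1.0, 0.0))  # 1: with free transfers, single-piece flow wins
print(best_batch_size(12, 1.0, 4.0))  # 8: a costly transfer favors larger batches
```

Under this model, single-piece flow is optimal only while transfers are cheap; as the per-batch transfer cost grows relative to the per-piece processing time, the best batch size grows with it.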
Measures of Continuous and Reliable Flow
The word "continuous" keeps popping up in discussions of Agile development. We have continuous integration, continuous flow, continuous value-delivery, continuous testing/review, etc. In most cases, these activities aren't truly continuous; they are just always ongoing with very high frequency. But continuous has another meaning here besides ultra-high frequency; it also means always "ready," as in always available, serviceable, or of "shippable quality." This relates to being "agile" in response to changes in the (business) environment.
So in addition to measuring and monitoring the "frequency" of those things that are supposed to be continuous (like integration), concepts and measures of system availability, reliability and survivability also apply:
- If the build is broken, it is akin to a "system outage" of the integration codeline used by developers
- If the request/order processing queue is blocked for any reason, the development & implementation process may starve from a lack of timely requirements input
Identifying Waste and Bottlenecks
Measuring value and its "flow" isn't enough. If we wish to improve and optimize that process, we need insight into what is impeding the flow of value and what is not adding value (and hence producing waste). According to Mary and Tom Poppendieck, the seven forms of "muda", or waste, in lean production map to the following seven forms of software waste:
- Extra/Unused features (Overproduction)
- Partially developed work not released to production (Inventory)
- Intermediate/unused artifacts (Extra Processing)
- Seeking Information (Motion)
- Escaped defects not caught by tests/reviews (Defects)
- Waiting (including Customer Waiting)
- Handoffs (Transportation)
So to the extent that we can identify or measure these in our value-stream, we can identify opportunities for improvement:
- Waiting, handoffs, and information seeking can often be identified by measuring time spent in or between various "states" of change/request workflows in a corresponding change/request tracking system.
- Decision-making time for assessment, acceptance, authorization, prioritization, and project/release selection is also a form of waiting or motion in the change/request management system.
- Inventory can be measured by looking at existing "work-in-progress" that has not yet been integrated or synchronized.
Of course, defining and instituting such workflows must be done in such a way as to minimize the "friction" upon the value-creation process.
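Here is a minimal sketch of measuring time spent in each workflow state. It assumes the change/request tracking system can export a chronologically sorted list of (request id, timestamp, new state) transitions; the state names, such as `awaiting_info`, are hypothetical. Large totals in states like "awaiting information" or "awaiting decision" point at waiting, handoff, and information-seeking waste.

```python
from collections import defaultdict
from datetime import datetime


def time_in_states(transitions):
    """Total hours spent in each workflow state across all requests.

    `transitions` is a chronologically sorted list of
    (request_id, timestamp, new_state) tuples. Time in a state runs until
    that request's next transition; a request's current (last) state is
    still open-ended and is not counted.
    """
    last_seen = {}               # request_id -> (timestamp, state)
    totals = defaultdict(float)  # state -> hours
    for req, ts, state in transitions:
        if req in last_seen:
            prev_ts, prev_state = last_seen[req]
            totals[prev_state] += (ts - prev_ts).total_seconds() / 3600
        last_seen[req] = (ts, state)
    return dict(totals)


# Hypothetical history for one change request.
log = [
    ("CR-1", datetime(2007, 5, 1, 9), "submitted"),
    ("CR-1", datetime(2007, 5, 1, 13), "awaiting_info"),
    ("CR-1", datetime(2007, 5, 2, 9), "approved"),
    ("CR-1", datetime(2007, 5, 2, 11), "in_progress"),
]
hours = time_in_states(log)
```

Here most of the elapsed time (20 of 26 hours) was spent waiting for information, which is exactly the kind of signal worth drilling into.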
There are other applicable Lean concepts and measures, as well:
- "Kanban" or "Pull-systems" and their measurements
- Measurements from Queuing Theory (including those already mentioned) such as average arrival-rate, service-rate, utilization-rate, average wait-time, etc.
- Process Cycle Efficiency
- Traceability, Visibility and the Order of Pipeline events
- Supplier, Input, Process, Output & Customer (SIPOC) Metrics
Those are all interesting applications to investigate. For the remainder of this article we will focus on specific application of concepts already mentioned to version-control, and change-request management.
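Of the measures listed above, Process Cycle Efficiency is perhaps the easiest to sketch: the fraction of total lead-time spent on value-adding work. The sketch assumes per-state hours such as those a tracking system could report, and which states count as "value-adding" is a judgment call we make here purely for illustration.

```python
def process_cycle_efficiency(hours_by_state, value_adding_states):
    """Value-adding time divided by total elapsed (lead) time."""
    total = sum(hours_by_state.values())
    if total == 0:
        raise ValueError("no elapsed time recorded")
    value_added = sum(
        h for state, h in hours_by_state.items() if state in value_adding_states
    )
    return value_added / total


# Hypothetical lead-time breakdown (hours) for one change request.
hours = {"submitted": 4.0, "awaiting_info": 20.0, "approved": 18.0, "in_progress": 6.0}
pce = process_cycle_efficiency(hours, {"in_progress"})
print(f"{pce:.1%}")  # 12.5%
```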
Request-queues as the "Value-Supply-Chain" for Development
For the part of CM that is viewed as “the front-end,” the value-supply chain is the request/change approval pipeline. Lean's view of queueing theory, including Little's law, applies here. There would be an overall cycle-time from concept-to-realization of a request/change, plus some drill-down into its segments in order to measure, identify & eliminate the various forms of software waste:
- Throughput of the request-queue is the [average] number of requests per unit of time. So the number of requests that arrive, are approved, or are closed all relate to different aspects of throughput:
- the arrival rate tells us the demand, the approval rate tells us how frequently we are supplying (“feeding”) actionable requests to development,
- and the closure rate tells us how frequently we are delivering realized value to the customer.
Note that the "value" associated with each request is not part of the equation, just the rate at which requests flow through these critical value-transformation points in the system: when it is first conceptualized, when it is understood & authorized, and when it is finally realized.
- Process Batch-size is all the requests that are decided upon in a single CCB or governance meeting, ...
- Transfer Batch-size would be the number of requests we allow to be queued-up (submitted) prior to approving them (possibly for a CCB meeting, or maybe for a particular release, iteration, or build/milestone). It is similar to a backlog (but not quite the same thing).
- Processing-time is the average approval-time of a request from the time it arrives up until it is accepted and authorized. And ...
- Transfer-time is the [average] time between approval of a request and when development work commences to implement the request. Sometimes a request is approved, but still waits a long time before it is prioritized high enough to be selected for and targeted to a release, iteration or build/milestone. Transfer-time can also be the time between when a change-task is allocated and when it is actively being worked on.
- Takt time in this case would relate to response time. Expressed in Abilla's (Time/Piece) terms, it would be the available time in a given day/hour divided by the [average] number of requests the team could approve in that period if they didn't have to wait around for information from other stakeholders.
- System outage would occur if the approval pipeline is broken, and we are somehow unable to respond to customer demand by supplying development with approved requests to implement.
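One possible way to compute several of these front-end measures is sketched below. It assumes each request can be exported with the day it arrived, was approved, had work started, and was closed (`None` if that hasn't happened yet); the field names are hypothetical. The Little's law line (average queue length = arrival rate × average time in queue) only holds approximately, and only if the approval queue is roughly stable over the window.

```python
from statistics import mean


def request_queue_metrics(requests, window_days):
    """Front-end request-queue measures over an observation window.

    Each request is a dict with day numbers for 'arrived', 'approved',
    'started' and 'closed' (None if that event hasn't happened yet).
    """
    def rate(event):
        return sum(1 for r in requests if r[event] is not None) / window_days

    processing = [r["approved"] - r["arrived"]
                  for r in requests if r["approved"] is not None]
    transfer = [r["started"] - r["approved"]
                for r in requests if r["started"] is not None]
    metrics = {
        "arrival_rate": rate("arrived"),    # demand
        "approval_rate": rate("approved"),  # actionable work fed to development
        "closure_rate": rate("closed"),     # realized value delivered
        "avg_processing_time": mean(processing) if processing else None,
        "avg_transfer_time": mean(transfer) if transfer else None,
    }
    # Little's law (L = lambda * W): expected number of requests awaiting approval.
    if metrics["avg_processing_time"] is not None:
        metrics["expected_in_approval"] = (
            metrics["arrival_rate"] * metrics["avg_processing_time"])
    return metrics


# Hypothetical requests observed over a 10-day window.
requests = [
    {"arrived": 0.0, "approved": 2.0, "started": 5.0, "closed": 8.0},
    {"arrived": 1.0, "approved": 2.0, "started": 3.0, "closed": None},
    {"arrived": 4.0, "approved": 7.0, "started": None, "closed": None},
    {"arrived": 6.0, "approved": None, "started": None, "closed": None},
]
m = request_queue_metrics(requests, window_days=10)
```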
If any of our readers have other ideas about what the above would correspond to for request/change management (or disagree with any of the above), we’d like to hear from you. We don’t claim to have perfect knowledge or perfect understanding on how to translate these concepts into request-management terms and we appreciate any feedback that helps us to learn & improve.
Integration Codelines as "Streams" of Development Change-Flow
A codeline that is used to integrate changes and baseline the results can be viewed as a value-delivery stream of sorts. The codeline is the medium through which software development changes flow in a value-stream. The efficiency and throughput of the system are not based so much on the customer-assigned business-value for the functionality being developed: that can change at any time, along with any other business condition or priority.
So the efficiency and throughput of a codeline have more to do with the rate at which changes flow through the codeline. The value of those changes is assured or promoted to the extent that the quality of the changes can be assured, but ensuring the changes implemented are valued is a function more of the request/order management pipeline than of the development and integration pipeline. This view of a codeline as a stream through which integrated changes flow is supported by Laura Wingerd's book Practical Perforce: Channeling the Flow of Change in Software Development Collaboration (indeed, it is supported by the very title of the book).
If we regard a codeline as a production system in this manner, its availability to the team is a critical resource. If the codeline is unavailable, it represents a "network outage" and critical block/bottleneck of the flow of value through the system. This relates to the above as follows:
- Throughput of the codeline is the [average] number of change "transactions" per unit of time. In this case we'll use hours or days. So the number of change-tasks committed per day or per hour is the throughput (note that the "value" associated with each change is not part of the equation, just the rate at which changes flow through the system).
- Process Batch-size is all the changes made for a single change-task to "commit" and ...
- Transfer Batch-size would be the number of change-tasks we allow to be queued-up (submitted) prior to integrating (merging, building & testing) the result. Note that if we target only one change-task per integration attempt, then we are basically attempting single-piece flow.
- Processing-time is the average duration of a development-task from the time it begins up until it is ready-to-commit. And ...
- Transfer-time is the time it takes to transfer (merge) and then verify (build & test) the result. We could also call this the overall integration time (or the integration-request cycle-time).
- Takt time in this case would regard the developers as the "customers." Expressed as (Time/Piece), it would be the available time in a given day/hour divided by the [average] number of changes the team could complete in that period if they didn't have to wait around for someone else's changes to be committed.
- System outage would occur if the codeline/build is broken. The codeline could also be unavailable for other reasons, such as the corresponding network, hardware, or version-control tool being "down," but for now let's just assume that outages are due to failure of the codeline to build and/or pass its tests (we can call these "breakages" rather than "outages").
- MTTR (Mean-time to repair) is the average time to fix codeline "breakage," and ...
- MTBF (Mean-time between failures) is the average time between "breakages" of the codeline
Note that if full builds (rather than incremental builds) are used for verifying commits, then build-time is independent of the number of changes. Also note that it might be useful to capture the [average] number of people blocked by a "breakage," as well as the number of people recruited (and total effort expended) to fix it. That will help us determine the severity (cost) of the breakage, and whether we're better off having the whole team fix it, or just one person (ostensibly the person who broke it), or somewhere in between (maybe just the set of folks who are blocked).
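MTTR and MTBF can usually be derived from the integration-build history alone. The sketch below assumes a chronologically sorted list of (time in hours, passed) results for the integration codeline, and treats MTBF as the mean up-time from a repair to the next breakage, so that availability comes out as MTBF / (MTBF + MTTR). Both the input format and that convention are our assumptions.

```python
from statistics import mean


def codeline_reliability(builds):
    """Return (MTTR, MTBF, availability) for an integration codeline.

    `builds` is a chronologically sorted list of (time_in_hours, passed).
    A breakage starts at the first failing build and ends at the next
    passing build; up-time runs from a passing state to the next breakage.
    """
    repairs, uptimes = [], []
    broken_since = None
    up_since = builds[0][0] if builds and builds[0][1] else None
    for t, passed in builds:
        if not passed and broken_since is None:    # codeline just broke
            broken_since = t
            if up_since is not None:
                uptimes.append(t - up_since)
        elif passed and broken_since is not None:  # codeline just repaired
            repairs.append(t - broken_since)
            broken_since, up_since = None, t
    mttr = mean(repairs) if repairs else 0.0
    mtbf = mean(uptimes) if uptimes else None
    availability = mtbf / (mtbf + mttr) if mtbf else None
    return mttr, mtbf, availability


# Hypothetical integration-build results: (hours since start, passed?)
builds = [(0.0, True), (10.0, False), (12.0, False), (13.0, True),
          (30.0, True), (40.0, False), (43.0, True)]
mttr, mtbf, availability = codeline_reliability(builds)
```

Multiplying each repair time by the number of people blocked would give a rough person-hours cost per breakage, if that count is captured.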
Nested Synchronization and Harmonic Cadences
Don Reinertsen observes:
"The practical economics of different processes may demand different batch sizes and different cadences. Whenever we operate coupled processes using different cadences it is best to synchronize these cadences as harmonic multiples of the slowest cadence. You can see this if you consider how you would synchronize the arrival of frequent commuter flights with less frequent long haul flights at an airline hub."
In their latest book Implementing Lean Software Development: From Concept to Cash, Mary and Tom Poppendieck advise using continuous integration (with test-driven development) and nested synchronization instead of infrequent, big-bang integration. Both this and Reinertsen's statement above apply directly to Agile CM environments and how we might measure key indicators related to these factors:
- Harmonic cadences address nested synchronization of integration/build frequencies, both in the case of
- different types of builds (private build, integration build, release build), and ...
- different levels of builds (component builds, product-builds)
- and also in the case of connected supplier/consumer "queues" where builds or components are received from an (internal or external) supplier and incorporated into our own product/components builds.
As mentioned earlier, measuring the overall cycle-times for each of these types/levels of build, as well as the overall cycle-time to progress through all the build types/levels from beginning to end, can be important measures for an Agile CM environment. Drilling down further to identify waste/bottlenecks could entail measuring the segments of each of those cycle-times that correspond to waiting, information-seeking, decision-making, and integration/build server processing time and utilization.
- In the arena of change/request management, harmonic cadences would also address release cycle planning for a product line whose products are assembled from multiple (internal and external) component releases. One can think of an overall portfolio/project and governance/CCB strategy as encompassing progressively larger levels of scale: requirement, change, use-case, feature, iteration, project/release, product/program, etc. ITIL service management addresses the various levels of these across an enterprise. Nested synchronization and harmonic cadences address how to integrate the management of networks of queues and sub-queues diverging and converging across all impacted elements of an enterprise architecture, its applications and services, and the products the enterprise uses them to produce.
- Nested synchronization would seem to apply to branching structures where development branches feed into integration/release branches and their relation to mainline branches, and the direction and frequency with which changes get merged or propagated across codelines. Here again the cycle-time for "transferring" changes from one branch to the next (as well as through all the branching levels) can be a useful measurement for ensuring "flow."
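A quick way to check Reinertsen's advice against a project's actual cadences is to test whether each cadence period divides evenly into the slowest one. The cadence names and periods below are hypothetical, and `math.lcm` requires Python 3.9 or later.

```python
import math


def out_of_harmony(periods):
    """Names of cadences whose period does not divide the slowest period."""
    slowest = max(periods.values())
    return sorted(name for name, p in periods.items() if slowest % p != 0)


# Hypothetical cadence periods, in hours.
cadences = {
    "integration_build": 4,
    "nightly_build": 24,
    "iteration": 240,
    "supplier_component_drop": 168,  # weekly deliveries from a supplier
    "release": 720,
}
misaligned = out_of_harmony(cadences)             # ['supplier_component_drop']
all_align_every = math.lcm(*cadences.values())    # hours until every cadence coincides
```

In this example, the weekly supplier drops only line up with every other cadence once every 5040 hours, which is the commuter-versus-long-haul-flight problem from the quote in CM terms.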
Of course, when you can manage without the "nesting", that is ideal for continuous integration. Continuous integration together with test-driven development seems to approximate what Lean calls one piece flow. An article from Strategos discusses when one-piece flow is and isn't applicable.
In the context of SCM, particularly continuous integration and TDD, one piece flow would correspond to developing the smallest possible testable behavior, then integrating it once it is working, and then doing the next elementary "piece", and so on. This is typically bounded by:
- the time it takes to [correctly] code the test and the behavior
- the time it takes to synchronize (merge) your workspace (sandbox) with the codeline prior to building & testing it, and ...
- the time it takes to verify (build & test) the result
So measuring each of the above (rather fine-grained) could also prove useful -- provided that it could be done unobtrusively. If one has instituted and automated a repeatable process for creating reproducible builds, then the measures can often be performed or triggered automatically during the automated process (or the data can be generated and logged for subsequent collection and analysis).
Note that working in such extremely fine-grained increments might not always work well if the one-piece-flow cycle-time is dominated by the time to synchronize (e.g., rebase) or to build & test, or if it usually has a substantially disruptive or destabilizing effect on the codeline. In those cases the time/cost "hit" is more or less the same regardless of the size/duration of the change: the penalty per "batch" is roughly the same for a batch-size of one piece as for a larger batch-size. It then makes sense to develop (and "transfer") in larger increments ("batch-sizes") before integrating and committing changes to the codeline. Hence making some basic measurements to monitor these values can indicate when to stop striving for one-piece flow, or when to revisit the build process to eliminate bottlenecks!
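Monitoring this can be as simple as tracking what share of each cycle goes to synchronizing and verifying rather than coding. The sketch assumes per-cycle minutes harvested from workspace-update and build logs; the 50% threshold is a hypothetical tuning knob, not a rule from Lean.

```python
from statistics import mean

# Hypothetical threshold, not a Lean rule: above this share of each cycle spent
# merging, building and testing, single-piece flow is probably not paying off.
OVERHEAD_LIMIT = 0.5


def integration_overhead(cycles):
    """Mean share of each one-piece-flow cycle spent synchronizing (merge/rebase)
    and verifying (build & test) rather than coding the test and behavior.

    `cycles` is a list of dicts with minutes spent in 'code', 'sync', 'verify'.
    """
    return mean(
        (c["sync"] + c["verify"]) / (c["code"] + c["sync"] + c["verify"])
        for c in cycles
    )


# Hypothetical timings for two recent cycles.
cycles = [
    {"code": 20, "sync": 5, "verify": 15},
    {"code": 10, "sync": 10, "verify": 20},
]
share = integration_overhead(cycles)
advice = ("consider larger increments or a faster build"
          if share > OVERHEAD_LIMIT else "keep striving for one-piece flow")
```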
Experiences with Version Control-related Metrics
We've seen one or more of the following being gathered (not necessarily all of them on the same project):
- mean & median commit frequency
- average commit "granularity" (delta-lines-of-code per commit)
- average number of changed files per commit
- average number of lines changed/added/removed per file changed per commit
- average number of files with merge-conflicts per commit
- average time to resolve merge-conflicts per commit
- average build-time for a private build (usually an incremental build)
- build-time for an integration build
- average integration-test cycle-time (time to run all the tests)
- change volatility report (a report of the most frequently modified files over a given time period)
- change collision report (a report of files that had the most parallel development and/or checkout-lock-contention during a given time)
Granted, the last two items above aren't really metrics (they're reports), but hopefully you get the idea.
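Several of these can be computed from the commit history without asking developers for any extra input. This is a minimal sketch assuming the history has already been exported from the version-control tool into a hypothetical list of commits, each with a timestamp (hours) and the lines changed per file; commit "frequency" is reported here as the interval between commits.

```python
from collections import Counter
from statistics import mean, median


def commit_metrics(commits):
    """`commits` is a chronologically sorted list of dicts with 'time' (hours)
    and 'files' mapping each changed path to the number of lines changed."""
    times = [c["time"] for c in commits]
    gaps = [b - a for a, b in zip(times, times[1:])]
    # Change volatility: how many commits touched each file.
    volatility = Counter(path for c in commits for path in c["files"])
    return {
        "mean_gap_hours": mean(gaps),
        "median_gap_hours": median(gaps),
        "avg_lines_per_commit": mean(sum(c["files"].values()) for c in commits),
        "avg_files_per_commit": mean(len(c["files"]) for c in commits),
        "most_volatile": volatility.most_common(1),
    }


# Hypothetical commit history.
commits = [
    {"time": 0.0, "files": {"parser.c": 40, "parser.h": 5}},
    {"time": 2.0, "files": {"parser.c": 12}},
    {"time": 3.0, "files": {"lexer.c": 30, "parser.c": 8}},
    {"time": 7.0, "files": {"main.c": 3}},
]
m = commit_metrics(commits)
```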
Metrics regarding change-task size and commit frequency are relative rather than absolute metrics. We want to observe the trend over time so we might notice when the interval between integrations went from ~2hrs last iteration to ~3hrs this iteration while the change-task size stayed mostly the same; or when the task-size increased as well, in which case we might ask whether we were integrating too frequently before, or are integrating too infrequently now, and why.
Build-time and test-time are important by themselves as absolute numbers because they give us a lower bound on the minimum time required to commit changes. They might also help us decide, for example, whether we should try to do full builds instead of incremental ones.
Combine all of those above and they basically give us an idea of approximately how many changes might be committed (or be prevented from being serially committed) during the time it takes to build & test. This might help us decide when/if to switch from single-threaded commits to multi-threaded or vice-versa, or whether to go from running partial builds/tests to more complete builds/tests.
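A back-of-the-envelope version of that estimate multiplies the commit rate by the build+test window. The sketch below additionally assumes commit attempts arrive roughly at random (a Poisson process), which is our simplifying assumption; the figures are hypothetical.

```python
import math


def commits_during_verification(commits_per_hour, build_minutes, test_minutes):
    """Expected commit attempts arriving while one build+test cycle runs, and
    the chance that at least one arrives (assuming Poisson arrivals)."""
    window_hours = (build_minutes + test_minutes) / 60
    expected = commits_per_hour * window_hours
    p_at_least_one = 1 - math.exp(-expected)
    return expected, p_at_least_one


# Hypothetical team: 3 commits/hour, 12-minute build, 18-minute test run.
expected, p_wait = commits_during_verification(3, 12, 18)
```

Here 1.5 commits are expected per verification cycle, so with strictly single-threaded commits someone would be waiting most of the time (about 78%), a hint that it may be time to reconsider the commit policy or shorten the build/test cycle.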
The information about the number of conflicts and the time to resolve them can also feed into the above, and also gives us some "trends" about how disjoint change-tasks typically are. If there is an increasing trend of parallelism, then maybe we need to ask whether we are doing a good job of defining the boundaries of a use-case (story), or of splitting up the work for a story effectively.
If certain files are more prone to parallel development than others, it may be an indication of a "code smell" and perhaps some restructuring should be done to minimize dependencies and/or checkout contention and merge conflicts. (See several of the techniques described in John Lakos' book Large-Scale C++ Software Design.) If certain files are more frequently edited than others, then we may want to look closely at which of those files significantly impact build-time (for incremental/dirty builds). It too may be an indication that the file should be split up and/or its code refactored, not necessarily because of checkout-contention, but to minimize rebuild time (which brings to mind more techniques from Lakos' book).
In general, these metrics would be used to help us find/regain our team's "sweet spot" to dynamically adjust the successful balance between integration frequency, commit-frequency, change granularity, build/test "completeness", and when and how often to run full and complete builds and test-suites. A few of them would also help raise a yellow flag about situations where refactoring (or process improvements) might help minimize merging effort and build-times.
(Im)Parting Words of Advice: Principles and Guidelines for Gathering CM Metrics
In this article, we introduced a lot of potentially new terms and concepts from Lean and the Theory of Constraints and explored how they apply to SCM. Many possible measurements were proposed without much time devoted to which ones will prove most useful or when to attempt measuring them (you probably don't want to start with all of them at once). Giving such advice can be very perilous without being situationally aware of the individual parameters and constraints of the project and team. We instead attempt to give some concise "meta-advice" to those attempting this sort of thing.
Two almost diametrically opposed concerns for CM metrics in an agile environment are that:
- On the one hand, Agile's insistence on very small/tight feedback loops with ultra-high frequency makes for a lot of fine-grained details to attempt to measure in order to gain insight into what's going on
- On the other hand, Agile's insistence on "individuals and interactions over processes and tools" makes it extremely challenging to collect such metrics transparently and unobtrusively.
So we need to find an appropriate balance between the value of the metrics and the cost of gathering them. After all, we don't want to create waste (muda) as a result of our metric process itself.
Value-Up! Agile metrics need to reward and reinforce the mental shift from the traditional “work-down” attitude to the newer “value-up” attitude espoused by Sam Guckenheimer in his book Software Engineering with Microsoft Visual Studio Team System. Primary measures promote the notion that:
“Only deliverables that the customer values (working software, completed documentation, etc.) count. You need to measure the flow of the work streams by managing queues that deliver customer value and treat all interim measures skeptically.”
Measure Up! Don’t use metrics to measure individuals in a way that compares their performance to others or isolates the value of their contributions from the rest of the team. The last of the seven principles of Lean software development tells us to “Optimize across the whole.” When measuring value or performance, it is often better to measure at the next level-up. Look at the big-picture because the integrated whole is greater than the sum of its decomposition into parts. Metrics on individuals and subparts often create suboptimization of the whole, unless the metric connects to the “big picture” in a way that emphasizes the success of the whole over any one part. Mishkin Berteig writes the following regarding “Measure Up!”
“In a single-team situation this means that individuals are measured and rewarded based on team performance (their sphere of influence). In a multi-team environment, that means that the group of teams should be measured as a group and compensated as a group. This will encourage all teams to work towards the success of the overall project. I personally believe there is some room for individual-based compensation, but the way it is handled needs to be done so that it does not encourage sub-optimal behavior.”
Measure Wisely! Applying the principles of “measure up!” and “value-up” may sound great in theory, but it still doesn’t tell you what to measure. It may give some general advice, but is lacking in specific criteria. Deborah Hartmann and Robin Dymond wrote a paper on Appropriate Agile Measurement that gets into more specifics about criteria and checklists to use when selecting measurements for agile projects and teams.
- Keep them lightweight - start light and see what happens. If you are getting interesting results, consider some deeper investigations into a particular area.
- Review and discard when appropriate - Metrics often have a shelf life - what was useful at one point in your development cycle or state of your development process can become irrelevant or not cost-effective later on. Some metrics are always useful, some have temporary value only.
- Make the tools do the work - dig metrics out of log files or by analyzing your repository rather than by forcing developers to enter extra keywords and fields. Ask yourself the question: "What are the minimum things developers need to do to get their work done?", and "What can we gather without extra input?" As Joel Spolsky puts it: "People add so many fields to their bug databases to capture everything they think might be important that entering a bug is like applying to Harvard. End result: people don't enter bugs, which is much, much worse than not capturing all that information."
- Finally, don't abuse your metrics! Use them in a trustworthy and transparent way. If you use metrics to punish (or reward) individuals, it can utterly kill collaboration & cooperation. People will quickly learn to give you what you measure (as per the quote from Eliyahu Goldratt at the start of this article), and management will end up rewarding mercenary motion over palpable progress. If instead you collect and publish the metrics in a spirit of openness, collaboration, cooperation, and continuous improvement, they can add tremendous value!
We close by noting that some have described CM as the part of the software development process that is most like assembly-line manufacturing. It is in this vein that we offer the following quote:
"There are five golden metrics that really matter: total cost, total cycle time, delivery performance, quality and safety. All others are subordinate."
--from Manufacturing’s Five Golden Metrics