the build process to eliminate bottlenecks!
Experiences with Version Control-related Metrics
We've seen one or more of the following being gathered (not necessarily all of them on the same project):
- mean & median commit frequency
- average commit "granularity" (delta-lines-of-code per commit)
- average number of changed files per commit
- average number of lines changed/added/removed per file changed per commit
- average number of files with merge-conflicts per commit
- average time to resolve merge-conflicts per commit
- average build-time for a private build (usually an incremental build)
- build-time for an integration build
- average integration-test cycle-time (time to run all the tests)
- change volatility report (a report of the most frequently modified files over a given time period)
- change collision report (a report of files that had the most parallel development and/or checkout-lock-contention during a given time)
Granted, the last two items above aren't really metrics (they're reports), but hopefully you get the idea.
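Several of the commit-oriented metrics above fall out of simple arithmetic over the commit history. As a minimal sketch (the commit records here are made-up sample data, not output from any particular version-control tool), computing mean/median commit frequency and average granularity might look like:

```python
from statistics import mean, median
from datetime import datetime

# Hypothetical commit records: (timestamp, delta-lines-of-code, files changed).
# In practice these would be extracted from the version-control log.
commits = [
    (datetime(2024, 1, 1, 9, 0), 120, 4),
    (datetime(2024, 1, 1, 11, 30), 45, 2),
    (datetime(2024, 1, 1, 14, 0), 200, 7),
    (datetime(2024, 1, 2, 10, 0), 80, 3),
]

# Intervals between consecutive commits, in hours.
times = sorted(t for t, _, _ in commits)
intervals = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]

print(f"mean commit interval:   {mean(intervals):.1f} h")
print(f"median commit interval: {median(intervals):.1f} h")
print(f"avg granularity:        {mean(l for _, l, _ in commits):.1f} lines/commit")
print(f"avg files per commit:   {mean(f for _, _, f in commits):.1f}")
```

The mean/median split matters here: one overnight gap can inflate the mean interval while the median still reflects the typical working rhythm.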
Metrics regarding change-task size and commit frequency are relative rather than absolute metrics. We want to observe the trend over time so we might notice, for example, when integration frequency went from ~2hrs last iteration to ~3hrs this iteration while change-task size stayed mostly the same. Or perhaps task size increased as well, in which case we might ask whether we were integrating too frequently before, or are integrating too infrequently now, and why.
Build-time and test-time are important by themselves as absolute numbers because they give us a lower bound on the minimum time required to commit changes. They might also help us decide whether we should try to do full builds instead of incremental ones, just as an example.
Combined, the metrics above give us an idea of approximately how many changes might be committed (or be prevented from being serially committed) during the time it takes to build and test. This might help us decide when/if to switch from single-threaded commits to multi-threaded or vice versa, or whether to go from running partial builds/tests to more complete builds/tests.
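The single-threaded vs. multi-threaded commit decision reduces to a back-of-the-envelope ratio. A sketch, using assumed numbers rather than figures from any real project:

```python
# Assumed inputs (not from any real project):
commit_interval_min = 20   # avg minutes between commits on the team
build_and_test_min = 45    # integration build + test cycle time

# How many commits arrive, on average, while one build/test cycle runs.
commits_per_cycle = build_and_test_min / commit_interval_min
print(f"~{commits_per_cycle:.2f} commits arrive per build/test cycle")

# A ratio well above 1 means committers queue up behind a serialized
# commit line: consider multi-threaded commits, or a faster
# (partial/incremental) build and test pipeline.
```

Here 45/20 = 2.25, so roughly two committers would be waiting at any given time if commits are serialized.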
The information about the number of conflicts and the time to resolve them can also feed into the above, plus it gives us some trends regarding how disjoint change-tasks typically are. If there is an increasing trend of parallelism, then maybe we need to ask if we are doing a good job of defining the boundaries of a use-case (story), or of splitting up the work for a story effectively.
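Tracking that trend can be as simple as watching the conflict rate per iteration. A small sketch, with invented per-iteration numbers purely for illustration:

```python
# Hypothetical per-iteration data: (iteration, commits with merge
# conflicts, total commits). Invented numbers for illustration only.
iterations = [("iter-1", 2, 40), ("iter-2", 5, 42), ("iter-3", 9, 38)]

conflict_rates = {}
for name, conflicted, total in iterations:
    conflict_rates[name] = conflicted / total
    print(f"{name}: {conflicted / total:.0%} of commits hit a merge conflict")

# A rising rate suggests change-tasks overlap more than they should:
# time to revisit how stories are bounded and how their work is split up.
```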
If certain files are more prone to parallel development than others, it may be an indication of a "code smell" and perhaps some restructuring should be done to minimize dependencies and/or checkout contention and merge conflicts. (See several of the techniques described in John Lakos' book Large Scale C++ Design.) If certain files are more frequently edited than others, then we may want to look closely at which of those files significantly impact build-time (for incremental/dirty builds). It too may be an indication that the file should be split up and/or its code refactored, not necessarily because of checkout-contention, but to minimize rebuild time (which brings to mind more techniques from Lakos' book).
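Both the change-volatility and change-collision reports can be derived from the per-commit file lists. A minimal sketch, assuming each history entry is simply the set of files touched by one commit (the file names are hypothetical):

```python
from collections import Counter
from itertools import combinations

# Hypothetical history: each entry is the set of files one commit touched.
history = [
    {"core/api.c", "core/util.c"},
    {"core/api.c", "ui/view.c"},
    {"core/api.c", "core/util.c", "ui/view.c"},
    {"docs/readme.txt"},
]

# Change volatility: the most frequently modified files.
volatility = Counter(f for commit in history for f in commit)
print("most volatile files:", volatility.most_common(2))

# Change collision (proxy): file pairs that repeatedly change together,
# a rough indicator of coupling and parallel-development hot spots.
pairs = Counter(p for commit in history
                for p in combinations(sorted(commit), 2))
print("top co-changed pairs:", pairs.most_common(2))
```

Files near the top of both lists are the prime candidates for the restructuring Lakos describes, since they concentrate both merge contention and rebuild cost.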
In general, these metrics would be used to help us find/regain our team's "sweet spot" to dynamically adjust the successful balance between integration frequency, commit-frequency, change granularity, build/test "completeness", and when and how often to run full and complete builds and test-suites. A few of them would also help raise a yellow flag about situations where refactoring (or process improvements) might help minimize merging effort and build-times.
Parting Words of Advice: Principles and Guidelines for Gathering CM Metrics
In this article, we introduced a lot of potentially