Kanban System Design

The kanban software development community can be traced back to Agile2007 in Washington DC. At that conference, a number of people were talking about the different approaches to development they were using. Chris Matts was talking about “real options” and “feature injection,” Arlo Belshee was talking about “naked planning,” and David Anderson was talking about kanban. All three had some similarities, which inspired a group of people to go away, experiment themselves, and share their experiences. The name the group chose to use as an identity was “Kanban.”

Kanban is the Japanese word for visual card, and it can have a number of interpretations with respect to software development. Firstly, it could be used to refer to the index card commonly used by agile teams. Secondly, it could be used to refer to an agile team’s task board, or story board. Finally, it could be used to refer to the whole system within which an agile team works.

In his book Toyota Production System [1], Taiichi Ohno says, “The two pillars of the Toyota production system are just-in-time and automation with a human touch, or autonomation. The tool used to operate the system is kanban.” With this perspective, a kanban system for software development refers to the whole system, and not simply the tool or the board. The community chose to name the systemic approach after the tool that inspired much of the thinking.

Systems Thinking
Viewing kanban as a systemic approach leads to systems thinking. Systems can be thought of as being made up of elements, which interact to meet a purpose. They are more than the sum of the parts, and the system’s purpose is crucial in determining the system’s behavior.

John Seddon, in his Vanguard Method [2], says that the way we think about systems usually defines the system, which then defines performance. In order to improve a system, we should change our mind-set by first understanding the purpose of the system. This allows us to define measures that help create knowledge about whether the system is meeting that purpose. Finally, we can design a method to enable the system to meet that purpose.

Kanban is a way of designing a method, and generating metrics, in order to improve capability to meet a purpose. The remainder of this article discusses five aspects of a kanban system: workflow, visualization, work in process, cadence, and learning. These aspects are not practices to be followed, but leverage points that help thinking about the method used to change and improve an organization’s delivery capability.

Workflow
Workflow is the understanding of how business or customer value travels through the kanban system. The agile community recognized that software development is a knowledge creation activity that includes randomness and variation. The “inspect and adapt” mantra is the response: it makes the impact visible so that the feedback can be used to learn and respond accordingly. We can take this further by understanding the mathematics and science behind the randomness and variation and exploiting this to our advantage.

Recognizing the workflow through an activity such as value stream mapping can give us this additional transparency, which we can use to influence our process. Value stream mapping is so called because its focus is on understanding how units of value flow through a system. For software development, this can be generalized into how units of value expand into smaller units of work, which then collapse back to deliver the original value.

For example, a specific benefit to a customer may expand into a number of features, which may expand into various user stories, which may then each expand into tasks. The tasks subsequently collapse back together to realize the user stories, which realize the features, which ultimately realize the benefit.
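The expand and collapse idea can be sketched as a small tree, where a parent is realized only once everything it expanded into has been realized. This is a minimal illustration, not a prescribed model: the class name WorkItem, its methods, and the example item names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkItem:
    """A unit of value or work (hypothetical model for illustration)."""
    name: str
    children: List["WorkItem"] = field(default_factory=list)
    done: bool = False  # only meaningful for leaf items, such as tasks

    def leaves(self) -> List["WorkItem"]:
        # The fully expanded units of work at the bottom of the tree.
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def is_realized(self) -> bool:
        # A leaf is realized when done; a parent collapses back to
        # "realized" only when all of its children are realized.
        if not self.children:
            return self.done
        return all(child.is_realized() for child in self.children)


# benefit -> feature -> user story -> tasks (names are made up)
benefit = WorkItem("faster checkout", [
    WorkItem("saved payment details", [
        WorkItem("store a card", [
            WorkItem("encrypt card number"),
            WorkItem("persist card token"),
        ]),
    ]),
])

print(benefit.is_realized())  # nothing is done yet
for task in benefit.leaves():
    task.done = True
print(benefit.is_realized())  # all tasks done, so the benefit is realized
```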

Understanding the workflow thus consists of knowing what we consider to be of value and what expand and collapse points there are to create feedback and deliver that value through the system. It can be thought of as understanding the system structure, which may be a network rather than linear. Making our model of this system structure explicit helps us to deliver the value more effectively through the network.

Another way of thinking about discovering a workflow is to view it as process archaeology. A process often has many layers, and by digging through those layers we can surface what is really going on. This will typically involve talking to the team members about how they really work, and it will often result in something other than what was expected, as problems that were previously hidden are surfaced.

Common items to look for in a workflow include queues, batches, and failure demand. Queues and batches are points in the workflow where the work is waiting rather than being processed. Queues are where work builds up because there is not enough capacity to process it, and batches are where work is held so that it can be handed over and processed in a large volume. Failure demand is work that results from not doing something, or not doing something right. Rather than optimizing a value system for failure demand, the failure demand itself should be avoided.

Visualization
Visualization is the means by which we can understand the work and the workflow. It creates a powerful management tool that is visual, interactive, and persistent, and that shares a mental model of the system structure.

In a recent TED Talk [3], Tom Wujec explains how this works when he talks about three ways that the brain creates meaning. First, visualization creates a mental model because of the way different areas of the brain process different visual inputs such as shape, size, and location. Second, interaction enriches the mental model further through engagement. Finally, persistence allows the mental model to be part of an augmented memory that can evolve over time.

This leads to the idea of boundary objects. Brian Marick wrote an introductory paper [4] in which he talks about communities of practice and communities of interest. A community of practice is formed around a work discipline, while a community of interest is formed around a common problem or concern. Communities of interest are made up of members of different communities of practice. A boundary object provides a means for communities of interest to communicate across their different practices, and a visualization, through creating a shared mental model, can be a boundary object. This is because the mental model is created collectively and collaboratively, and helps clarify the meaning of what the visualization is representing.

Marick lists several properties of a boundary object that can be useful to bear in mind when building a visualization. It should be a common point of reference for the community of interest; represent different meanings to different members of the community; help translation between the meanings; support coordination and alignment of the work within the community; be a plastic working agreement that evolves as the community learns; and address different concerns of the community members simultaneously.

Another set of ideas relevant to visual management is that raised by Dan Pink when he talks about the surprising science of motivation. In his book Drive [5] he says that rather than the carrot and stick approach of extrinsic motivation, a better approach is intrinsic motivation, which consists of three elements: autonomy, mastery, and purpose. Autonomy, or the “desire to direct our own lives,” is achieved when team members can see what needs doing, understand the working agreements, and choose themselves what they should do. Mastery, or “the urge to get better and better at something that matters,” is achieved through being able to interact with the visualization to evolve and improve it. Purpose, or “the yearning to do what we do in the service of something larger than ourselves,” is achieved when the persistence of the visualization makes it clear what the value of the work is and why it is being done.

A visualization consists of multiple pieces of data. In the classic book The Visual Display of Quantitative Information [6], Edward Tufte introduces a set of principles for the effective display of data, and it is insightful to review some of these ideas.

Tufte talks about a number of different types of graphical designs. Time series is probably the most common, where time is along the horizontal axis and another data type along the vertical. This is probably the least relevant design, because a system visualization is typically a snapshot of the current status. Similarly, a space time narrative, which tells a story in a spatial dimension over time, may not be the most obvious choice. It does raise the question of visualizing the narrative of the work over time though, which could be interesting. Maps also introduce some different ideas. What would a visualization look like if it showed the terrain of a project and where each piece of work was on that terrain? The most common form of visualization is probably a relational one, where the two axes show different types of information, such as scope and status.

Most of Tufte’s book is spent discussing ways of improving the way that data is presented; specifically, maximizing data ink, reducing chart junk, and improving data density. Data ink is the ink that actually represents data. While physical visualizations generally use more than just ink, the principle holds true for making sure that as far as possible, anything on a board should hold information. The corollary to this is that anything that isn’t data ink is chart junk. Grids, redundant data, or decorations and embellishments for aesthetics may create noise, which masks the real story. Finally, data density is the amount of data within the given space. The eye can take in a high precision of detail, so by maximizing the data ink and being clever with multi-functioning graphical elements, it is possible to visualize many dimensions in a small space.

A kanban system visualization is what Tufte would call a multivariate display, with the variables typically being the usual project management details, but also including the concerns of any member of a system’s community of interest. As a starter, there are the popular “iron triangle” variables of scope, time, resource, and quality. Other common variables are things like priority, status, issues, risks, constraints, dependencies, and assumptions. More recently, teams have been talking in terms of variables such as capacity and demand, not to mention value and other economic aspects.

To visualize all these variables we can use a number of techniques. Properties such as size, color, format, location, and alignment can all create multi-functioning graphical elements to achieve a high data density, while for a physical visualization, material and texture can add further depth.

Work In Process
Work in process (WIP), and the way it is limited, is the means by which we can create a pull system which balances capacity and demand through a value network.

In a pull system, work is processed through being signalled, rather than being scheduled. This is what avoids a build-up of inventory and enables work to flow through the system as capacity allows. Kuroiwa-san, an ex-Toyota manager, used the analogy of a chain of paperclips in a talk at Agile Japan in 2009. Pushing the paperclips will inevitably cause them to pile up, whereas pulling them will result in them moving smoothly.

Applying this to a software development workflow means that upstream work can be made available, but it is the team members’ responsibility to decide when they are able to take it. The act of taking, or pulling, the work, is a signal for the more upstream work to be processed. However, when work is available but not being pulled, then production upstream will gradually throttle down to avoid any pile-up. With a push system on the other hand, work will be scheduled and handed downstream regardless of whether there is capacity to process it or not.
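The difference between pulling and pushing can be sketched in a few lines. This is a simplified illustration that assumes each stage of the workflow holds a queue of items and an explicit WIP limit; the names Stage, pull, and push are hypothetical and do not come from any particular tool.

```python
from collections import deque


class Stage:
    """One step in a workflow with an explicit WIP limit (hypothetical model)."""

    def __init__(self, name, wip_limit):
        self.name = name
        self.wip_limit = wip_limit
        self.items = deque()

    def has_capacity(self):
        return len(self.items) < self.wip_limit


def pull(upstream, downstream):
    """Move one item only if the downstream stage signals spare capacity.

    Returns the item that moved, or None if nothing was pulled.
    """
    if upstream.items and downstream.has_capacity():
        item = upstream.items.popleft()
        downstream.items.append(item)
        return item
    return None


def push(upstream, downstream):
    """Hand one item downstream regardless of capacity, for contrast."""
    if upstream.items:
        item = upstream.items.popleft()
        downstream.items.append(item)
        return item
    return None


analysis = Stage("analysis", wip_limit=5)
analysis.items.extend(["A", "B", "C", "D", "E"])
development = Stage("development", wip_limit=2)

# Development tries to pull five times but only has room for two items.
for _ in range(5):
    pull(analysis, development)

print(list(development.items))  # the two items that were pulled
print(list(analysis.items))     # the rest wait upstream
```

With push in place of pull, all five items would end up in development, piling up beyond its limit of two.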

WIP has an impact on productivity, inventory, and teamwork, and by being aware of WIP, and reducing and limiting it, we can improve a kanban system.

Productivity can be measured in terms of cycle time and throughput of valuable units of work. Cycle time is the length of time to complete a process, and throughput is the amount of output from a process in a given period of time. Cycle time and throughput are both improved by decreasing WIP. A simple example of this effect is CPU load, where application performance goes down as CPU load increases. The effect can be explained by looking at Little’s Law for Queuing Theory:

Cycle Time = Number of Things in Process / Average Completion Rate

Little’s Law tells us that to improve cycle time, there are two options: reduce the number of things in process or improve the average completion rate. Of the two, reducing the number of things in process is the easier, and once that is under control, then the more challenging changes to improve completion rate can be applied.
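As a worked illustration of the two options, the short sketch below applies the formula above to made-up numbers. The function name cycle_time and the figures are assumptions for the example only.

```python
def cycle_time(things_in_process, completion_rate):
    """Little's Law: cycle time = things in process / average completion rate."""
    if completion_rate <= 0:
        raise ValueError("completion rate must be positive")
    return things_in_process / completion_rate


# Hypothetical team: 12 items in process, completing 3 items per week.
baseline = cycle_time(12, 3)      # 4.0 weeks

# Option 1: halve the number of things in process.
less_wip = cycle_time(6, 3)       # 2.0 weeks

# Option 2: improve the completion rate from 3 to 4 items per week.
faster = cycle_time(12, 4)        # 3.0 weeks

print(baseline, less_wip, faster)
```

In this example, halving WIP has a larger effect on cycle time than a one-third improvement in completion rate, and it is usually the easier change to make.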

A further understanding can come from Traffic Flow Theory:

Flow = Speed * Density

Traffic jams occur as traffic density increases and traffic speed decreases. However, when traffic density decreases, speed only increases to a point (which should be the speed limit). As a result, there is a point at which decreasing WIP below a certain density will reduce throughput.
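To see why flow peaks at an intermediate density, the sketch below plugs a speed model into the formula above. It assumes Greenshields' linear model, in which speed falls linearly from a free-flow speed (standing in for the speed limit) to zero at jam density. That model and all of the numbers are assumptions for illustration and are not taken from the article.

```python
def speed(density, free_flow_speed=100.0, jam_density=120.0):
    """Speed under Greenshields' linear model (an assumed model).

    Speed equals free_flow_speed on an empty road and drops to zero
    when density reaches jam_density.
    """
    density = min(max(density, 0.0), jam_density)
    return free_flow_speed * (1 - density / jam_density)


def flow(density, free_flow_speed=100.0, jam_density=120.0):
    """Flow = Speed * Density."""
    return speed(density, free_flow_speed, jam_density) * density


for density in (20, 60, 100):
    print(density, round(flow(density), 1))
```

Under these assumptions flow is highest at a density of 60, which is half the jam density. Both the sparser road (20) and the more congested road (100) carry less traffic, which mirrors the point that WIP can be too low as well as too high.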

Another factor in improving cycle time and throughput is multitasking. Reducing multitasking is beneficial for two primary reasons.

First, time is lost to context switching per task, so fewer tasks means less time lost. Gerald Weinberg, in his book Quality Software Management: Systems Thinking [7], suggests that 20 percent of time is lost per additional task. Thus one task can consume 100 percent of the time available; two tasks will consume 40 percent each, with 20 percent lost to context switching; three tasks will consume 20 percent each, with 40 percent lost to context switching; and so on.

Second, performing tasks sequentially yields results sooner. For example, if two tasks each need a week of effort, interleaving them means neither is finished until around the end of the second week, whereas working on them one at a time delivers the first after one week.
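The context-switching arithmetic can be reproduced with a small function. This sketch simply applies the rule as quoted above, 20 percent lost per additional task, as a linear formula. The function name and the use of whole percentages are choices made for the example.

```python
def time_split(num_tasks, loss_per_extra_task=20):
    """Return (percent available per task, percent lost to context switching).

    Applies the rule quoted in the text: each additional task beyond the
    first costs loss_per_extra_task percent of the total time.
    """
    if num_tasks < 1:
        raise ValueError("need at least one task")
    lost = min(100, loss_per_extra_task * (num_tasks - 1))
    per_task = (100 - lost) / num_tasks
    return per_task, lost


for n in (1, 2, 3):
    per_task, lost = time_split(n)
    print(f"{n} task(s): {per_task:.0f}% each, {lost}% lost")
```

For one, two, and three tasks this gives 100, 40, and 20 percent per task, matching the figures in the text.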

Dr. Eliyahu Goldratt introduced the idea of throughput accounting in his business novel The Goal [8]. Throughput accounting suggests that the business goal is to make a profit, and that this is determined by work in process, operating expense, and throughput. Profit is increased by decreasing work in process, decreasing operating expense, and increasing throughput.

Any features we have developed but not yet released can be considered inventory. Therefore, as well as helping to improve cycle time and increase throughput, limiting work in process also helps to increase profit by reducing inventory. In his keynote at Agile2009, Alistair Cockburn also introduced the idea that, for software development, the unit of inventory is the unvalidated decision [9]. By limiting WIP we are focussing on getting feedback on fewer decisions sooner.

Finally, by having fewer work items in process, the team is able to focus more on the larger goals and less on individual tasks, thus encouraging a swarming effect and enhancing teamwork. Limiting WIP like this can seem unusual for teams, and there is often a worry that team members will be idle because they have no work to do but are unable to pull any new work. The following guidelines, in priority order, can be useful to help in this situation.

  1. Work directly on existing work to progress it
  2. Collaborate with team members on existing work to remove a bottleneck
  3. Begin working on new work if capacity is available
  4. Find some other useful work

When team members have to find some other useful work, “bubbles of slack” are formed around the work. This creates opportunities for improvement without needing to schedule them with techniques such as “gold cards.” This can be work that won’t create any work downstream but will improve future productivity, and that can be paused as soon as existing kanban slots become available. Investigative work such as technology spikes, refactoring or tool automation, and personal development or innovation type work are all activities that might help the team in the future.

Cadence
Cadence is the mechanism that teams use to establish a reliable and dependable capability. A consistent cadence demonstrates a predictable capacity and gives some confidence in coordinating the upcoming work when it is being triggered rather than scheduled.

Vanilla agile timeboxing is one specialized form of cadence. It is a metronomic cadence with a single tick. All the main process events are based around this single tick, which occurs on the timebox boundaries. In addition, the unit of work, commonly user stories, should be small enough to be scheduled into the timebox and subsequently completed in the same timebox. However, while user stories in process can be limited within a timebox, they don’t always fit into one exactly. Further, while releases can occur at the end of each timebox, user stories are only potentially shippable product increments, but may not be coherent product increments.

The various events can be decoupled, however, such that they happen separately at different rhythms. This creates a polyrhythmic cadence, more like a drum circle, where each drum represents a different event. The rhythm is more complex than the single tick of a metronome and can be more varied. Units of work can be larger minimal marketable features (MMFs), which, while needing to be as small as possible, are not constrained by being required to fit into a timebox. Instead, an MMF is able to flow over a number of process events while it delivers a releasable coherent product increment. Prioritizing, planning, reviewing, retrospection, and releasing all still happen regularly, but because they are decoupled, they can happen independently, at differing rates, which may provide more freedom in creating a natural process.

A cadence is usually “harmonic,” in that there is a neat overlap between the different rhythms, and it generally keeps a regular “time signature” to create consistency. However, it does not have to be, and a look at some definitions of cadence from dictionary.com can show why.

  • In music, the ending of a phrase, perceived as a rhythmic or melodic articulation or a harmonic change or all of these; in a larger sense, a cadence may be a demarcation of a half-phrase, of a section of music, or of an entire movement
  • Music. A progression of chords moving to a harmonic close, point of rest, or sense of resolution.
  • The flow or rhythm of events, esp. the pattern in which something is experienced: the frenetic cadence of modern life.

Thus cadence is what gives a team a feeling of demarcation, progression, resolution, or flow. It is a pattern that allows the team to know what they are doing and when it will be done. For very small or mature teams, this cadence could be complex, arrhythmic, or syncopated. However, it is enough to allow a team to make reliable commitments because recognizing their cadence allows them to understand their capability or capacity.

The appropriate cadence for a team will be influenced by their transaction and coordination costs. Transaction costs are those associated with performing an activity. For example, the cost of making a release is a transaction cost. Coordination costs are those associated with the logistics of an activity. For example, the cost of getting people together to manage a release is a coordination cost.

Thinking in terms of transaction and coordination costs can provide the basis for establishing an appropriate cadence for the various events such as prioritization, planning, reviewing, retrospection, and releasing. Focusing on reducing these costs can subsequently allow the cadence to change as delivery capability improves.

The end goal of reducing costs and improving cadence is to be able to quickly, reliably, and frequently release valuable software. In doing so, we can help to further reduce costs. David Anderson uses the example of over-ground and under-ground trains. Over-ground trains, which run less frequently, tend to require more planning by looking at a timetable and travelling to the station at the right time to avoid unnecessary waiting. Under-ground trains, which run more frequently, tend to require less planning because it is safe to turn up and catch a train quickly. Thus by releasing quickly, reliably, and frequently, we can reduce the need for much of the planning overhead.

Once a cadence has been established and a delivery capability understood through measuring cycle time and throughput, then reliability can be achieved. Rather than making promises or setting targets, information can be given about the likelihood of certain timescales. When a team pulls in a piece of work, it is able to forecast that it should be delivered within a known time period, with known percentage likelihood, based on historical performance. For example, a team can measure its capability to deliver features within fifteen days, 80 percent of the time.
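One simple way to derive such a statement from historical data is to take a percentile of past cycle times. The sketch below uses the nearest-rank method, which is an assumed choice since the article does not prescribe one. The function name and the sample history are hypothetical.

```python
import math


def cycle_time_forecast(cycle_times, likelihood_percent):
    """Smallest observed cycle time met by at least the given percentage
    of past items (nearest-rank percentile)."""
    if not cycle_times:
        raise ValueError("need at least one historical cycle time")
    if not 0 < likelihood_percent <= 100:
        raise ValueError("likelihood_percent must be in (0, 100]")
    ordered = sorted(cycle_times)
    rank = math.ceil(likelihood_percent * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical cycle times, in days, for the last ten features delivered.
history = [4, 6, 7, 8, 9, 11, 12, 15, 21, 30]

days = cycle_time_forecast(history, 80)
print(f"80 percent of features were delivered within {days} days")
```

With this made-up history, eight of the ten features took fifteen days or fewer, so the team could say it delivers within fifteen days 80 percent of the time.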

By releasing frequently, to a known cadence, with a well-understood capability, a team can build trust that it is delivering to its full capacity.

Learning
Learning is how a team constantly develops a kanban system’s capability to meet its purpose. A kanban system should continuously improve to create an economy of flow, rather than an economy of scale, with the ultimate goal being to eliminate the kanban system.

In their book Learning to See [10], Mike Rother and John Shook use the phrase “Flow where you can, pull when you must.” A kanban system allows the work to be pulled, but in order to really achieve flow, the team members should be always looking for ways to keep the work moving, rather than keeping themselves busy.

One approach to continuous improvement is to reduce WIP limits. When a kanban system appears to be working smoothly, lowering a WIP limit is analogous to lowering the waterline. It will expose the rocks, and new bottlenecks and constraints will be discovered. As a result, teams can work to remove the new bottlenecks and constraints until work is flowing through the system smoothly again.

Another approach to continuous improvement is through retrospectives and other spontaneous change events (sometimes known as kaizen). When teams naturally refine and grow their capability, they often discover that they consistently have free space on their kanban board. This is a sign that they can retrospectively lower their WIP limits as a result of an improvement.

These two approaches can be related to the states described by Mihaly Csikszentmihalyi in his book Flow: The Psychology of Optimal Experience [11]. Pre-emptively reducing a WIP limit is equivalent to moving a team through a state of anxiety, where the skills required are greater than the current ability. Retrospectively reducing a WIP limit is equivalent to moving a team through a state of boredom, where the ability becomes greater than the skills required. Both paths are valid and can be used in context.

Csikszentmihalyi’s description of flow is just one model that can be used to help us understand kanban systems in order to learn and improve. Applying different models, such as Theory of Constraints, Queuing Theory, and Network Theory, can give different perspectives and insights that help us continuously improve.

Implications
Viewing kanban systems from these aspects creates a meta-language to help describe and think about any process. Kanban is not a methodology, but something that can be applied to an existing way of working to understand it from the perspectives of workflow, visualization, work in process, cadence, and learning.

As a simple example, it is possible to describe the typical agile timebox in terms of limiting work in process. Don Reinertsen gave me the analogy of a bucket of water as being a container for work in process. If the bucket is being continuously filled with water, then there are two approaches to preventing the bucket from overflowing. The first is the equivalent of a timebox. If we know the rate at which the water fills the bucket, then we can set a cadence to empty it before it overflows. The second is the equivalent of setting explicit WIP limits. If we have a mechanism to signal when the bucket is nearly full, then we can use that to empty it before it overflows.
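The bucket analogy can be played out as a tiny simulation. This is an illustrative sketch only: the function names, the bucket capacity, and the inflow figures are all made up, and the bursty inflow is an added scenario to show what happens when the fill rate is not actually known.

```python
def run_bucket(inflows, capacity, should_empty):
    """Fill a bucket tick by tick and empty it whenever should_empty says so.

    Returns (overflowed, number_of_times_emptied).
    """
    level = 0
    overflowed = False
    empties = 0
    for tick, inflow in enumerate(inflows, start=1):
        level += inflow
        if level > capacity:
            overflowed = True
        if should_empty(tick, level):
            level = 0
            empties += 1
    return overflowed, empties


def timebox(period):
    """Empty on a fixed cadence, like the end of a timebox."""
    return lambda tick, level: tick % period == 0


def wip_signal(threshold):
    """Empty when the level signals that the bucket is nearly full."""
    return lambda tick, level: level >= threshold


steady = [2] * 10            # known, constant fill rate
bursty = [2, 2, 6, 2, 2, 2]  # one unexpected surge

for name, inflows in (("steady", steady), ("bursty", bursty)):
    print(name, "timebox:", run_bucket(inflows, 10, timebox(5)))
    print(name, "signal: ", run_bucket(inflows, 10, wip_signal(8)))
```

With the steady inflow both policies avoid overflow. With the surge, the fixed cadence empties the bucket too late, while the signal still triggers in time.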

These aspects can be used as levers, adjustable in either direction, to tune a process. This is different from describing a process in terms of practices that are more like knobs to be dialled up to ten (or eleven). The current configuration of the levers can be used to describe the current location of a team’s process on its journey of continuous improvement, a bit like a trail marker identifies a location on a forest path.

Having a wide range of configurations of processes, using these aspects of a kanban system, means that we can employ different processes in different contexts, and work to improve those processes as we improve the underlying contexts. Using a ski slope metaphor, we can begin with a “nursery slope” process for an immature team or organization that requires lots of safeguards in place due to low skill level, and, over time, move the team toward an “off piste” process when the team or organization is very mature and requires much less safety due to its high skill level.

Being able to begin with a “nursery slope” process and move toward an “off piste” process creates an evolutionary style of introducing change. This is in contrast to a revolutionary style of jumping straight into the implementation of a new process. An evolutionary approach is appropriate for contexts where there is strong resistance or where a revolutionary change will highlight more issues than it is possible to resolve effectively. Large enterprises, with legacy technologies, complex architectures, and political silos, may struggle to make the leap to having multi-skilled, cross-functional teams delivering production code every few weeks.

Whatever approach is taken, it should be remembered that method is only a means to achieving purpose and measuring capability toward that purpose. Rather than focusing on being lean or agile, which may (and should) lead to being successful, we should focus on becoming successful, which will probably involve being lean or agile. The end goal is to be successful, and a kanban system is a means to that end, not an end in itself. To finish with a quote from The Toyota Way [12] by Jeffrey Liker, “kanban is something you strive to get rid of, not to be proud of.”

References

[1] Ohno, Taiichi. Toyota Production System: Beyond Large-scale Production.

[2] Seddon, John. Freedom from Command and Control: A Better Way to Make the Work Work.

[3] https://www.youtube.com//watch?v=wPFA8n7goio

[4] http://www.exampler.com/testing-com/writings/marick-boundary.pdf

[5] Pink, Daniel. Drive: The Surprising Truth About What Motivates Us.

[6] Tufte, Edward. The Visual Display of Quantitative Information.

[7] Weinberg, Gerald. Quality Software Management, Vol. 1: Systems Thinking.

[8] Goldratt, Eliyahu. The Goal: A Process of Ongoing Improvement.

[9] http://alistair.cockburn.us/get/2754

[10] Rother, Mike and John Shook. Learning to See: Value Stream Mapping to Add Value and Eliminate Muda.

[11] Csikszentmihalyi, Mihaly. Flow: The Psychology of Optimal Experience.

[12] Liker, Jeffrey. The Toyota Way: 14 Management Principles from the World's Greatest Manufacturer.

AgileConnection is a TechWell community.
