Revisiting Refactoring


Refactoring is one of the cornerstones of agile technical practices. It is the mechanism that allows the design and architecture of a system to evolve over time. It is one third of the red-green-refactor loop and the core of test-driven development (TDD). But does it really deliver on its promises? If you and your team are diligent about writing tests and refactor mercilessly, will your software evolve well and easily? Is the cost of refactoring always small and affordable? Refactoring is not a silver bullet; it is sometimes painful and expensive, so we cannot rely on it always coming at a limited cost. However, any design, no matter how appropriate today, will be inappropriate tomorrow as the requirements change, and refactoring is our best tool for evolving our designs and architectures.

Refactoring Defined
To "refactor" code means to change the structure of the code without changing its behavior. In all but the most simple of programs, this practically means that there are tests that act as a ‘safety-net' to let you know when your design changes have changed behavior. Refactoring cannot practically occur without tests. Therefore all teams that want to change/evolve their design must have automated tests to enable the refactoring.

So why refactor? In essence, we refactor to make things better. When the design no longer fits the problem at hand - whether because the requirements have changed or because we have a better understanding of the problem - it pays to change the design to better match the problem. This makes the code more readable for others and therefore easier to understand, and readability matters because we spend far more time reading code than writing it.

Theory: Refactoring Leads to Good Design
Refactoring, as mainly used in the Agile community, is almost always found within the red-green-refactor loop of Test Driven Development (TDD): write a failing test (red), make it pass (green), then clean up the code (refactor), as the sketch below illustrates. It has been claimed that this simple process will lead to designs and architectures that evolve into just what you need, and that if you build only what the requirement at hand demands, you will be able to add the missing functionality later by refactoring, at a small cost.
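Concretely, one pass through the loop might look like the following sketch in Java with JUnit (the Counter example is illustrative):

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class CounterTest {

        // RED: write this test first. Counter does not exist yet,
        // so the code does not even compile - the test "fails".
        @Test
        public void incrementsFromZero() {
            Counter counter = new Counter();
            counter.increment();
            assertEquals(1, counter.value());
        }
    }

    class Counter {
        // GREEN: the simplest code that passes - even a hard-coded
        // "return 1;" would do on the first pass.
        //
        // REFACTOR: with the test green, clean up the design (here,
        // replacing the hard-coded value with real state) and rerun
        // the test to confirm the behavior is unchanged.
        private int count = 0;

        void increment() {
            count++;
        }

        int value() {
            return count;
        }
    }

But is this necessarily true?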

No, it is not - not always. In fact, evolutionary designs and architectures can be just as bad as, if not worse than, upfront designs and architectures. Teams - especially large ones - have a tendency to reinvent the wheel in different parts of a system as different solutions to similar problems evolve independently.

Moreover, refactoring is a greedy algorithm, [i] which means that we make locally optimal choices - via YAGNI (You Ain't Gonna Need It) [ii] - in the hope of finding the global optimum. Greedy algorithms are notorious for getting stuck in local minima, as the sketch below shows. Likewise, refactoring keeps your code and design clean - but not optimal. Every once in a while large refactorings need to be made, and they are not easy.
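The textbook illustration of this failure mode is greedy coin change. With denominations of 4, 3, and 1, always grabbing the largest coin that fits makes change for 6 as 4 + 1 + 1 - three coins - while the global optimum is 3 + 3 - two coins. A minimal sketch in Java:

    import java.util.ArrayList;
    import java.util.List;

    public class GreedyCoinChange {

        // Greedy choice: always take the largest coin that still fits.
        // Assumes the denominations are sorted in descending order.
        static List<Integer> change(int amount, int[] coins) {
            List<Integer> result = new ArrayList<>();
            for (int coin : coins) {
                while (amount >= coin) {
                    result.add(coin);
                    amount -= coin;
                }
            }
            return result;
        }

        public static void main(String[] args) {
            // Prints [4, 1, 1] - three coins - even though the
            // global optimum for 6 with coins {4, 3, 1} is [3, 3].
            System.out.println(change(6, new int[] {4, 3, 1}));
        }
    }

Escaping the local optimum means revisiting earlier choices rather than making another locally best one - the code-level analogue of a large refactoring.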

Large Refactorings are Hard
Large refactorings are significantly more difficult than the standard, catalogued refactorings, and they should not be underestimated. For example, retrofitting fine-grained security control into a system that was not designed with security in mind will be a time-consuming and difficult task. But they can be done if a safety-net of tests has been written for the application.

If, on the other hand, there were no tests, this type of retrofitting would be significantly more expensive, if not impossible. This is why many traditional applications - written without a safety-net of tests - are full of duplicate code: developers cut and paste code instead of modifying the design because of the prohibitive cost and danger of doing so.

So it seems we're damned if we do and damned if we don't. What is the solution? Since designs do become stale and we do tend to over-generalize, we should always have a safety-net of tests that allows us to respond to change over time. At the same time, we need to get away from pure YAGNI. There are times when we know, with a large degree of certainty, the requirements of the application upfront - and the greater our certainty in the overall requirements, the greater the amount of initial design that is warranted.

Different Viewpoints
Here are some current viewpoints on the effectiveness of refactoring and TDD:

Jim Coplien, in "Religion's Newfound Restraints on Progress," takes aim at testing and test driven development. [iii] Since Refactoring is an integral part of TDD, the comments on TDD implicitly apply to refactoring Coplien states:

Integration and system testing have long been demonstrated to be the least efficient way of finding bugs. Recent studies (Siniaalto and Abrahamsson) of TDD show that it may have no benefits over traditional test-last development and that in some cases has deteriorated the code and that it has other alarming (their word) effects. The one that worries me the most is that it deteriorates the architecture. And acceptance testing is orders of magnitude less efficient than good old-fashioned code inspections, extremely expensive, and comes too late to keep the design clean.

... TDD, about engaging the code. TDD is a methodology. TDD is about tools and processes over people and interactions. Woops. ...

James Shore, in "The Agile Engineering Shortfall," acknowledges that there is a shortfall of TDD when taken by itself as a practice. However:

There is an agile engineering shortfall. Short planning horizons necessitate alternative engineering practices. The problem isn't with the practices, though--good agile engineering practices exist and work well. The problem is with agile methods that don't provide agile engineering practices, and with teams that adopt a small subset of the agile engineering practices (typically, just TDD). It's unfortunate, but no surprise, that they run into trouble as a result.

Kent Beck, in his new book Implementation Patterns, writes:

Three values that are consistent with excellence in programming are communication, simplicity, and flexibility. While these three sometimes conflict, more often they complement each other. The best programs offer many options for future extension, contain no extraneous elements, and are easy to read and understand.

... Flexibility can come at the cost of increased complexity.

... Choose patterns that encourage flexibility and bring immediate benefits. For patterns with immediate costs and only deferred benefits, often patience is the best strategy. Put them back in the bag until they are needed. Then you can apply them in precisely the way they are needed.

Of course, to put the new functionality in later, when it is needed, you will need to change the design - i.e., refactor.

Conclusion
Refactoring is a greedy algorithm - it gets stuck in local minima. Large refactorings are very expensive, many times prohibitively so. Therefore, pure reliance on refactoring, without some upfront design and architecture, is costly and suboptimal.

At the same time, no matter how good an upfront design and architecture is, it will grow out of date and become a poor fit. The development community is not a fortune-telling community; traditional upfront design and architecture over-generalizes in the wrong places. Generalization and flexibility by design are costly and do not come for free. A pure upfront design technique always fails over time.

As of yet, there is no silver bullet. We know that designs and architectures will change as they fail to meet the requirements of a changing world - so we will have to refactor. At the same time, we need to realize that not all refactoring is easy or cheap. Where does that leave us? It leaves us doing the best we can: each of us individually finding the balance between YAGNI and upfront design.

Good judgment comes from experience - and experience comes from bad judgment. Refactoring is only as good as the developer performing it. For that matter, a design or architecture is only as good as the developer creating it. The difference is that with refactoring we have the option of learning from our mistakes and changing the design. It is not perfect - and we should be aware of its shortfalls - but it is better than the alternative.


About the Author

Amr Elssamadisy is a software development practitioner at Gemba Systems, helping both small and large development teams learn new technologies, adopt and adapt appropriate Agile development practices, and focus their efforts to maximize the value they bring to their organizations. Gemba focuses on issues such as personal agility, team-building, communication, feedback, and all of the other soft skills that distinguish excellent teams. Amr's technical background and experience (going back to 1994) in C/C++, Java/J2EE, and .NET allow him to appreciate the problems of, and support, development teams 'in the trenches.' Amr is also the author of Patterns of Agile Practice Adoption: The Technical Cluster, an editor for the AgileQ at InfoQ, a contributor to the Agile Journal, and a frequent presenter at software development conferences.


[i] http://en.wikipedia.org/wiki/Greedy_algorithm

[ii] You Ain't Gonna Need It (YAGNI) tells you, the developer, not to design for any requirement that is not at hand. It tells you that if you write tests diligently (usually via TDD) you will be able to refactor the code later and add the necessary complexity.

[iii] http://www.artima.com
