Tracer bullet software development (TBSD) is a method based on the concept of tracer ammunition, which allows a shooter to follow the path of a bullet toward a target and adjust his aim as needed. Software tracer bullets combined with canned data (hard-coded data returned from an API that simulates production data) allow you to begin product demonstrations very early in the development cycle. TBSD is also an effective way to build scalable software systems that are easily refactored and tested, and it decouples your teams’ efforts, allowing them to work in parallel.
I once heard a story about a large retailer and how it discovers what sells. The retailer buys anything that might sell, puts it on the shelves, and waits to see what happens. The idea is similar to how World War II battleships aimed their big guns: they didn't spend hours lining up the first shot. They fired a shot, studied how far it missed the target, adjusted, and fired again. Usually the third shot hit the target. That's the same idea behind TBSD.
TBSD consists of six steps:

1. Identify System Objects
2. Propose Interfaces
3. Implement Interfaces
4. Connect Interfaces
5. Add Functionality
6. Refactor, Refine, Repeat

Let's consider each step.
1. Identify System Objects
When we identify system objects, we're trying to define the major parts of our system. We want these objects to be fairly large, but not enormous. They're usually some variant of client, server, data analysis, data access, and database, which maps loosely to the well-known Model-View-Controller paradigm. Our goal is to identify the large-grained system objects that encapsulate major system functions and can be cleanly isolated from one another.
We don't want to get overly specific at this stage, which is why these are large objects. On the other hand, getting too large and putting "data analysis" in the same system object as "login" can result in clunky designs and poor performance, so I usually put intensive data analysis into its own object.
Most system objects share a collection of well-defined attributes:
System Objects Can Stand Alone
I prefer to consider each system object its own box and then add an interface and networking layer to enable the objects to communicate with each other. Why? It enforces developer discipline. On a Friday night when we want to go home but we need some data from the database, most of us will go straight to the database instead of adding the proper code to our data access layer. Sure, we know it's a hack, but it gets us home more quickly. Over time, these quick "fixes" create a messy code base that is very brittle. However, if we do things properly, when the database code is rewritten or the schema changes, only the data access layer will require modification. But in a brittle system where developers have bypassed the database encapsulation layer, changing the database schema will break the entire system. We used to call this "spaghetti code." Today's term is "big ball of mud." Either way, we want to avoid this maintenance nightmare at all costs.
When each system object is a separate object, we can't cheat. If the only way to the database is through the data access layer, then we can be sure it will always be used.
System Objects Have Recognizable, Cohesive Functionality
Just like traditional object-oriented code, system objects encapsulate a set of cohesive functionality. Some system objects handle data analysis; others handle database storage; others handle the user interface. Some are wrappers around pieces of complex functionality.
System Objects Have A Clean Boundary
System objects are defined by their interfaces, called APIs. In order to create a set of APIs, you must define two aspects: what the system object does and how to access that functionality.
2. Propose Interfaces
At this point, we're not talking about Java or C# interfaces. First, we must define the basic types of information passed into or out of the system objects. A client might need an application server interface called login or upload_a_data_set. The data access layer would have interfaces like fetch_a_data_set. The interface points between two system objects would be discussed, negotiated, and defined by the teams responsible for the two objects.
This step often is a series of negotiations. The client team might ask for an interface to retrieve some data, but the application server group might point out that the data could easily be fetched by a slight change to an existing interface. Each team tries to find the best way to get the information it needs from the layers above and below it.
Each interface is documented as the system object's API. Each object has its own API that is used by other team members early in the product-development cycle. This creates a clean, usable API, rather than a complicated API that looks good on paper but in practice is difficult to use. As you define the APIs and start to use them, you’ll discover inefficiencies and overlaps in your choices and clean them up as you go. In a normal product cycle we don’t get this type of feedback until our code is in use—when it's usually too late to change what’s already in the field.
3. Implement Interfaces
Implementing the interfaces means putting "just enough" code behind each API to make it appear real. For instance, the fetch_a_data_set interface in the data access layer looks like this:
return new_data_set(1, 2, 3, ..., 99);
As another example, when the client calls the login interface on the application server, we'll return just enough information to make it appear we have really logged in to the system. Perhaps we return true or a security token. Our client develops code against the API, and it appears to work.
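As a rough sketch of what "just enough" code might look like, here are minimal Java stubs. The type and interface names (SecurityToken, DataSet, ApplicationServer, DataAccessLayer) are placeholders invented for illustration, and the sketches that follow reuse them:

record SecurityToken(String value) { }
record DataSet(String name, double[] points) { }

interface ApplicationServer {
    SecurityToken login(String userName, String password);
}

interface DataAccessLayer {
    DataSet fetch_a_data_set(String setName);
}

class CannedApplicationServer implements ApplicationServer {
    public SecurityToken login(String userName, String password) {
        // Ignore the credentials entirely; a fixed, obviously fake token is
        // enough for the client team to code and demo against.
        return new SecurityToken("CANNED-TOKEN-0001");
    }
}

class CannedDataAccessLayer implements DataAccessLayer {
    public DataSet fetch_a_data_set(String setName) {
        // Hard-coded values stand in for the real database.
        return new DataSet(setName, new double[] { 1, 2, 3, 99 });
    }
}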
As we proceed, the system looks like it is entirely composed of mock objects—fake objects created by a programmer to copy the behavior of real objects. Each system object has a complete set of APIs that all return canned data.
4. Connect Interfaces
When each interface returns canned data, the system appears to run. The system objects now invoke one another through their interfaces as if the system were live. This is a huge milestone. Now, each team can use system objects as if they were complete. This allows each team to move forward independently.
Sometimes a team may need a specific type or format of data, but that's no problem; simply ask for richer canned data. If a client team is working on a specific type of graph, like a scatter plot, it might need some randomized data. In listing 1, the interface returning the canned data adds some simple branching logic to return more than one set of canned data.
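A rough sketch of that kind of branching logic, reusing the placeholder types from the earlier sketch; the set names, the fixed seed, and the canned values are all invented for illustration:

import java.util.Random;

class BranchingCannedDataAccessLayer implements DataAccessLayer {
    public DataSet fetch_a_data_set(String setName) {
        // Branch on the requested set name to hand back different canned data.
        if (setName.startsWith("scatter")) {
            // Randomized canned points for the scatter-plot work; the fixed
            // seed keeps demos repeatable.
            Random random = new Random(42);
            double[] points = new double[1_000];
            for (int i = 0; i < points.length; i++) {
                points[i] = random.nextDouble() * 100.0;
            }
            return new DataSet(setName, points);
        }
        if (setName.startsWith("empty")) {
            // A canned edge case: no data at all.
            return new DataSet(setName, new double[0]);
        }
        // Default canned set: small and predictable.
        return new DataSet(setName, new double[] { 1, 2, 3, 99 });
    }
}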
I've built interfaces that returned five or six different sets of canned data. Remember, our goal is to enable each team to operate as if it were coding in a production environment long before the database schema has been stabilized or real data is populating that database. The sooner we can run the system in a production-like environment, the sooner we’ll identify several classes of problems. We'll find that our numerical analysis libraries aren't good enough for the customer, our communications layer can't handle our data sets quickly enough, our new graphing library just doesn’t look good, or that we missed an interface. These are problems that often are not found until you run the code against production-level data, but our canned data can get us very close to this ideal.
By creating our system objects and letting them interact, we prove that our communication layers actually work together. If your team decides to have a Visual Basic client talking to a Ruby on Rails server with a NetKernel Java data analysis back end, you'll know at this point whether these technologies can communicate with one another the way you thought they would.
If possible, put your code into a production operating environment at this step. Discover problems with firewalls, round robin DNS, and load-balancing routers. Put your servers on different subnets, run different operating systems, or separate them by a low-bandwidth WAN. Again, the idea is to catch issues before you've coded them too deeply into your product.
A friend of mine was on a team that worked on a project for nearly two years before discovering that the client-server CORBA communications layer had to communicate through a firewall. This became a huge problem for this project, and team members had to rewrite large portions of their code. With TBSD, we’ll catch these issues earlier by putting our tracer bullet code into a network topology that looks like our production environment and watching what works and what breaks. I love the Pragmatic Programmers' idea "Fail early." Here we apply that concept to our entire system, not just to its code.
We'll find algorithmic bottlenecks as well. It's much better to realize before the final round of testing that our code, or a third-party analytics library, is too slow. I'd rather discover it in the first few weeks of a project so I can focus on getting the performance I need. This approach shouldn't be abused as an excuse for a premature-optimization boondoggle; the point is to recognize early that a given library or approach to a problem is unacceptably slow. Sometimes this testing will discover that a third-party scientific library isn't producing acceptable results. Catch these incompatibilities early, when it's still relatively easy to get the correct libraries in place.
5. Add Functionality
Now we'll start replacing our canned data with real code—one interface at a time. In our data access layer, we'll gradually put in code that talks to the real database and returns real data as shown in listing 2. Leave the canned data in place as long as you need it. In fact, leaving it in production code isn't a problem as long as everyone knows the name of the canned data. You can use that data for later testing if you need it.
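As a rough sketch of this transition, here's what a JDBC-backed implementation might look like, with invented table and column names and the placeholder types from the earlier sketches:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

class RealDataAccessLayer implements DataAccessLayer {
    private final DataSource dataSource; // connection pool, wired up elsewhere

    RealDataAccessLayer(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public DataSet fetch_a_data_set(String setName) {
        // The old canned data stays reachable under a well-known name,
        // handy for demos and later testing.
        if ("canned_default".equals(setName)) {
            return new DataSet(setName, new double[] { 1, 2, 3, 99 });
        }
        // Real code now sits behind the same, unchanged interface.
        String sql = "SELECT value FROM data_points WHERE set_name = ? ORDER BY position";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, setName);
            try (ResultSet rs = stmt.executeQuery()) {
                List<Double> values = new ArrayList<>();
                while (rs.next()) {
                    values.add(rs.getDouble("value"));
                }
                double[] points = new double[values.size()];
                for (int i = 0; i < points.length; i++) {
                    points[i] = values.get(i);
                }
                return new DataSet(setName, points);
            }
        } catch (SQLException e) {
            throw new RuntimeException("Could not fetch data set " + setName, e);
        }
    }
}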
Be very careful not to change your interfaces as you add the real code. The interface is your contract with your teammates. That specific API is what everyone agreed to use. If something needs to be modified, discuss it with your teammates first. Never let a compile or test failure be your team’s notification that you changed an interface.
6. Refactor, Refine, Repeat
Tracer bullet software development is an iterative process. Don't try to get everything right the first time—it won't happen. It takes far too long to aim perfectly, so instead take your best shot, and get the team moving ahead. Then, as you start to realize where you can improve, make another pass through.
Sometimes We Adjust Our System Objects
I once worked on a project in which we had a "server" object. We quickly realized that having login and data_analysis routines in the same system object was a mistake. The data analysis was swamping the entire server, and the login calls were timing out. We quickly split our "server" object into two objects: an "application server" and a "number cruncher." Later, this decoupling helped us scale the project across machines.
On another project I went too far in the other direction and defined too many system objects. They were overly granular, and we were wasting too much time defining the interface points. After the second day of documenting the APIs, we realized that it was too painful, so we stepped back and rolled eight system objects into one.
By staying lightweight and changing often, you help eliminate the need to hit the target perfectly the first time through. If we know that adding or removing system objects will happen frequently, then it's OK to get "close enough" to the target with our first attempt. If we think our first set of system objects can’t change, then we'll spend days or weeks trying to get them perfect the first time.
Other Times, We Rewrite The Functionality Behind An Interface
When the code behind an interface runs too slowly or produces incorrect results, we may have to rewrite it completely. But as long as the API doesn't change, the adjacent system objects shouldn't be aware that anything is different and should continue to work correctly.
For example, we might retool a system object by adding a cache. In most cases the interface doesn't change at all. The code just runs much faster. Remember that these sorts of benefits are only available because each system object is truly independent. But if one system object reaches into another object's implementation, changing the code will break other system objects and the entire system will fail.
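A sketch of that kind of retooling: a simple in-memory cache wrapped around the existing implementation, behind the same fetch_a_data_set API. The cache policy here is an assumption; a real system would also worry about invalidation and memory use:

import java.util.concurrent.ConcurrentHashMap;

class CachingDataAccessLayer implements DataAccessLayer {
    private final DataAccessLayer backend; // the existing, slower implementation
    private final ConcurrentHashMap<String, DataSet> cache = new ConcurrentHashMap<>();

    CachingDataAccessLayer(DataAccessLayer backend) {
        this.backend = backend;
    }

    public DataSet fetch_a_data_set(String setName) {
        // Only the first request for a set reaches the backend; callers see
        // the same API and simply get their answers faster.
        return cache.computeIfAbsent(setName, backend::fetch_a_data_set);
    }
}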
Sometimes We Change Our Interfaces
We're not always going to get the interfaces right the first time. The best example is when we coded our login interface and forgot that the interface needed more than a user name and password. Sometimes it also needs a Windows NT authentication domain name. When you spot something that needs changing, be sure to tell your teammates you're changing it.
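Sketching that change with the placeholder types from earlier (the ntDomain parameter name is invented):

interface ApplicationServerV2 {
    // Old form, kept as a comment for contrast:
    //   SecurityToken login(String userName, String password);

    // New form: the Windows NT authentication domain is now required. Every
    // caller breaks, so the team hears about the change before it lands.
    SecurityToken login(String ntDomain, String userName, String password);
}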
As our project moved forward, we realized that the application had to process a lot more data than we originally anticipated and that our databases couldn't read the data fast enough. There were millions of very small database lookups taking place. We tweaked, tuned, and cached, but we couldn’t read the data from the disk fast enough to create a responsive client application.
We finally retooled our database access system object using database clustering. It routed all the writes to the main database, but cycled reads among a number of read databases. We used as many as sixteen read databases and then were able to read the data fast enough. However, no code outside of our database access object was changed. The beauty of the cleanly encapsulated system object was that the other system objects calling fetch_a_data_set(String set_name) had no idea how the data was being retrieved. The call just returned the data faster. It was amazing.
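A rough sketch of the idea behind that retooling, hidden entirely inside the database access object; the DataSource wiring and the round-robin counter are assumptions for illustration:

import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import javax.sql.DataSource;

class ClusteredConnections {
    private final DataSource writeDb;        // the main database takes all writes
    private final List<DataSource> readDbs;  // the pool of read databases
    private final AtomicInteger nextRead = new AtomicInteger();

    ClusteredConnections(DataSource writeDb, List<DataSource> readDbs) {
        this.writeDb = writeDb;
        this.readDbs = readDbs;
    }

    Connection forWrite() throws SQLException {
        return writeDb.getConnection();
    }

    Connection forRead() throws SQLException {
        // Cycle the reads across the read databases, one request at a time.
        int index = Math.floorMod(nextRead.getAndIncrement(), readDbs.size());
        return readDbs.get(index).getConnection();
    }
}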
But then we noticed that our data analysis code had become the bottleneck. We were returning data faster than it could be analyzed. So we took the same approach and used multiple machines for our number crunchers as well. When a request to analyze the data came in from the client, we split the request into much smaller units and sent one unit to each number cruncher. Because our other system objects were only accessing the code via the defined APIs, no code outside the system object had to be changed.
We were able to take a single-machine application and make it run across an arbitrary number of computers for data analysis and for database access. And we didn't have to change any of our core data analysis code. Touching that code could have introduced all kinds of bugs, but tracer bullets let us keep the code untouched.
Tracer bullets are an excellent way to let your teams work in parallel while creating robust, scalable systems that are well-tested. The early product demonstrations you'll be able to make will help you deliver what your customers really want. It’s a great way to write software that you can use right along with your favorite development methodology.
Canned data is hard-coded data returned from an API that simulates production data. As you create canned data, make it as close as possible to your anticipated production data values. Create data that's just as complicated, random, and large as a production data set.
Canned data is used to exercise various parts of your system long before the entire system is in place and functioning, so it must look real. If in production we'll have a data set with a million data points, write a bit of code to generate a million data points and return that in a result set. This will test our communication paths and algorithm implementations long before we've finalized our database schema, created our data access layer, populated the database with sample data, and implemented processing algorithms. Make your canned data look real and you'll catch a variety of problems much earlier in the development cycle.
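A rough sketch of generating production-sized canned data rather than a tidy handful of values; the size, distribution, and outlier rate here are assumptions:

import java.util.Random;

class BigCannedData {
    // Build a million-point canned data set so communication paths and
    // algorithms get exercised against realistically large, messy input.
    static DataSet cannedMillionPoints(String setName) {
        Random random = new Random(7);
        double[] points = new double[1_000_000];
        for (int i = 0; i < points.length; i++) {
            boolean outlier = random.nextInt(100) == 0;
            points[i] = outlier
                    ? random.nextDouble() * 1_000_000.0     // occasional extreme value
                    : random.nextGaussian() * 50.0 + 500.0; // typical value
        }
        return new DataSet(setName, points);
    }
}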
A mock object is a fake object created by a programmer to copy the behavior of a real object. This can be invaluable when dealing with a scarce resource (like an expensive piece of hardware), a slow resource (like a large database), a changing resource (like a current temperature input), or when the code you want to invoke hasn’t been written yet. When you create a mock object, you copy the object’s method signatures and return canned data. Mock objects run very fast and can be used to create tests that run quickly with very few external dependencies.
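A minimal sketch of a hand-rolled mock, reusing the placeholder DataAccessLayer interface from the earlier sketches:

class MockDataAccessLayer implements DataAccessLayer {
    public DataSet fetch_a_data_set(String setName) {
        // Same method signature as the real object, but no database, no
        // network, and no waiting: just canned data, so tests run fast and
        // in isolation.
        return new DataSet(setName, new double[] { 1, 2, 3, 99 });
    }
}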