You've carefully planned the testing for your product to ensure that it meets the requirements, and you've probably included cases for stress conditions as well. But have you tested how your database application will behave if run on a much slower network than your testing environment? How does your communication system function when running over a network that drops a good fraction of its traffic? What will a user see when accessing your Web page over a slow dial-up link?
You may have created tests for cases where something like a broken network connection occurs between your product's tiers; however, networks usually don't stop abruptly—most often they crawl to a stammering halt over a few hours before dying. Furthermore, poor network conditions usually appear intermittently, sometimes surfacing during peak usage times or when someone in the office twists a cable the wrong way. In any event, you need to ensure that in less than ideal network circumstances your application either continues working or exits gracefully.
Enter NIST Net
You can create such conditions in your testing lab with a network emulation tool called NIST Net from the National Institute of Standards and Technology, an agency of the U.S. Government. With NIST Net, you can control the maximum bandwidth, drop rate, duplication rate, and delay between several computers running on a TCP/IP network. In this article we’ll look at a brief summary of how the product works, and how you can use it to create “real-world” conditions in a controlled, predictable manner.
NIST Net, like several other government software products, resides in the public domain and is, therefore, free for the taking. You may distribute, dissect, or use any bit you want for your own purposes. However, if the package doesn't work the way you expect (or at all), you may have no recourse other than to no longer use the software. Happily, NIST Net has performed as well as or better than some commercial products of late.
This tool works with the router in the Linux kernel to manipulate network traffic, so the computer running NIST Net will also serve as a router between your main network and the machines whose traffic you wish to control. While the software should work with any Linux distribution, the authors of NIST Net specifically recommend using the Slackware or RedHat distributions of Linux. Currently, NIST Net only works on 2.0.xx Linux kernels, so Slackware 3.9 or earlier, or RedHat 5.2 or earlier, is required. An update to the current 2.2.xx kernels is coming soon. No distribution of NIST Net currently exists for Windows NT or Windows 98.
Using the Software
As you can tell from the installation process (see the "Getting NIST Net Running" sidebar), NIST Net contains three parts: a kernel module, a command line interface, and an X-windows graphical interface. To use the software, you communicate with the kernel module via either the graphical or command line programs. With both systems, you can feed the same information to the kernel module; I prefer the X-windows interface because I find it much easier to use.
You begin the process by specifying a list of source and destination IP addresses or hostnames. For each source/destination pair, you then enter the network characteristics (described in the next section) you want to control. After entering all of your data, you notify NIST Net of your changes by clicking on the "Update" button.
If, at this point, you expect network havoc (as defined by your parameters) and still don’t notice changes, you have encountered one useful (yet sometimes frustrating) feature of the product: Until you turn NIST Net "on," none of your changes will take effect. Unfortunately, the interface shows no visual indication of the on/off state of the product; but the nice part about this behavior is that you can set up a complicated network scenario and turn it on and off with just a button click.
Given a source and destination IP address or host name, you can control the maximum bandwidth, drop rate, duplication rate, and delay of the traffic between them.
Kicking the Tires
The best way to see NIST Net in action is to begin pinging a host on your test network from your main network. By doing so, you will be sending traffic across the Linux router and into the clutches of NIST Net.
For instance, running ping with no changes would result in output like:
Reply from 18.104.22.168: bytes=32 time<10ms TTL=128
However, introducing a delay of 300 milliseconds would result in output like the following:
Reply from 22.214.171.124: bytes=32 time=300ms TTL=128
Not very interesting on the face of it, but consider a typical dial-up connection to the Internet, which has a drop rate of about 5% and a delay of about 350 milliseconds with a packet delay range of about 300 milliseconds, and you will begin to see a more interesting result:
Reply from 126.96.36.199: bytes=32 time=500ms TTL=128
Reply from 188.8.131.52: bytes=32 time=430ms TTL=128
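To get a feel for what those numbers mean before touching the lab network, the dial-up scenario above can be sketched as a small simulation. This is only an illustration, not NIST Net's own algorithm: the function name simulate_pings is hypothetical, and the sketch assumes the "delay range" spreads replies uniformly around the mean delay, which the tool may not do.

```python
import random


def simulate_pings(count, delay_ms, range_ms, drop_rate, seed=None):
    """Return one entry per ping: a round-trip time in ms, or None if dropped."""
    rng = random.Random(seed)
    results = []
    for _ in range(count):
        if rng.random() < drop_rate:
            results.append(None)  # dropped packet: ping would report a timeout
            continue
        # Assumption: delays fall uniformly within +/- range_ms / 2 of the mean.
        rtt = delay_ms + rng.uniform(-range_ms / 2, range_ms / 2)
        results.append(max(0.0, rtt))
    return results


if __name__ == "__main__":
    # Dial-up figures from the text: 5% drop, 350 ms delay, 300 ms range.
    for rtt in simulate_pings(10, delay_ms=350, range_ms=300, drop_rate=0.05, seed=1):
        print("Request timed out." if rtt is None else f"time={rtt:.0f}ms")
```

Under that assumption every reply lands between 200 and 500 milliseconds, which is consistent with the two sample replies shown above.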
Now that you have a feel for what you can do with NIST Net, study your own software for communication points that you can begin to stress and strain. Obviously, if your testing subject uses a database server, or stores data in shared files, you have a good place to begin testing. For instance, how does your application behave when running over a PPP connection? Most dial-up users will have a connection with at least a 300-millisecond delay (with a standard deviation of about 150 milliseconds) and a drop rate below 10%. Using NIST Net you can create this network in your lab and find problems such as database connection timeouts and ill-behaving visual controls before your users do. And you can quickly reconstruct the scenario, to verify that the software changes you've made to compensate for poor network conditions work as expected.
Another worthwhile test strategy is reducing the bandwidth to emulate a slower network. A 100-Mbps Ethernet network is typical for many technology companies, but it may not be what a significant number of your customers use. Furthermore, the increasing number of slower networks in homes and small offices (some as slow as 1.5 Mbps) means something appearing to work very quickly in your lab (pre-NIST Net) appears downright lethargic to some customers.
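A quick back-of-the-envelope calculation shows how large the gap can be. The sketch below assumes an ideal link with no protocol overhead, latency, or loss, and the 1 MB payload is a hypothetical figure chosen only for illustration.

```python
def transfer_seconds(size_bytes, link_mbps):
    """Ideal time to move size_bytes over a link rated at link_mbps megabits/s."""
    bits = size_bytes * 8
    return bits / (link_mbps * 1_000_000)


if __name__ == "__main__":
    payload = 1_000_000  # hypothetical 1 MB response from a server
    for label, mbps in [("100 Mbps lab Ethernet", 100), ("1.5 Mbps home link", 1.5)]:
        print(f"{label}: {transfer_seconds(payload, mbps):.2f} s")
```

Even in this idealized model, a transfer that takes under a tenth of a second in the lab takes more than five seconds on the slower link.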
Finally, be vigilant for more subtle communication dependencies: DNS lookups, validation with a security server, or remote procedure calls (DCOM/CORBA). After you take a good look at the various ways that your application is dependent on the availability of network services, you will be surprised at the less-than-robust behavior when you begin to limit access to services you'd taken for granted.
Making a Testing Plan
You may find the best return on investment by creating several scenarios representing a graduated decline in network reliability, and creating several separate scenarios for declining network performance levels (see Table 1). At the same time, decide on the test cases that will be run under the different scenarios.
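One way to keep such a plan manageable is to write the scenarios down as data and pair each one with the test cases it applies to. The sketch below is only an illustration of that bookkeeping: every scenario name, every number, and every test case name is a hypothetical placeholder, not a value taken from Table 1.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scenario:
    name: str
    delay_ms: int
    drop_percent: float
    bandwidth_kbps: int


# Placeholder values: a graduated decline in reliability...
RELIABILITY_SCENARIOS = [
    Scenario("clean", 0, 0.0, 100_000),
    Scenario("lossy", 0, 5.0, 100_000),
    Scenario("very lossy", 0, 20.0, 100_000),
]
# ...and a separate graduated decline in performance.
PERFORMANCE_SCENARIOS = [
    Scenario("fast LAN", 0, 0.0, 100_000),
    Scenario("slow office link", 50, 0.0, 1_500),
    Scenario("dial-up", 350, 0.0, 56),
]
TEST_CASES = ["login", "save record", "run report"]  # hypothetical names


def build_plan(scenarios, test_cases):
    """Pair every scenario with every test case, scenario-major order."""
    return list(product(scenarios, test_cases))


if __name__ == "__main__":
    plan = build_plan(RELIABILITY_SCENARIOS + PERFORMANCE_SCENARIOS, TEST_CASES)
    for scenario, case in plan:
        print(f"{scenario.name:>16}: {case}")
```

Running every case under every scenario is the simplest choice; a real plan would likely prune the combinations that add no information.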
However, to truly benefit from this (or any) test plan, you need to create it early in the target software's lifecycle. "Designing-in" provisions to compensate for poor network communications early in the engineering lifecycle—rather than cobbling together a solution two weeks into testing—is the best way to create a robust product. In other words, failure states and cases should be thought through and called out in the requirements, not discovered during testing. Since setting up network conditions in NIST Net requires little work, you can use it to test and verify different approaches to error handling during the design phase.
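As one example of an error-handling approach you might evaluate this way, the sketch below retries a failing network operation with an increasing wait between attempts and then gives up with a clear error. It is a minimal illustration, not a recommendation from the NIST Net authors; the helper name call_with_retries and its default values are assumptions.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run operation(); on OSError, wait and retry with exponential backoff.

    OSError covers socket timeouts and connection failures in Python 3.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Wait 0.5 s, 1 s, 2 s, ... before the next attempt.
                sleep(base_delay * 2 ** attempt)
    # Fail gracefully with one clear error instead of hanging or crashing.
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Wrapping a database query or file read in such a helper, then emulating a lossy link, shows quickly whether the chosen number of attempts and wait times are realistic for the conditions your users face.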
NIST Net Limitations
NIST Net works on the network layer of the OSI networking reference model, meaning the software only knows about packets, not about the packet's payload or how it is transmitted. If you need to emulate packet corruption, you'll have to write your own tool or look elsewhere. NIST Net also does not work with Novell's IPX/SPX suite; so if your testing plan calls for simulating poor conditions using this protocol stack, NIST Net would not be the solution for you.
Furthermore, since NIST Net functions on the network layer, it has a very pronounced effect when working with UDP, but a diminished effect with TCP. Why? Because TCP includes compensation for the poor conditions emulated with NIST Net. High-loss conditions simply result in slower transmission speeds as TCP attempts to retransmit missing packets. Send-and-forget protocols like UDP don't provide for retransmission, so NIST Net dropping or delaying packets will have a more noticeable effect; the network protocol doesn't know how to compensate for the missing data. If your testing subject, however, is the protocol stack, NIST Net would be great for testing and verifying that TCP works as expected.
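The difference can be put in rough numbers with a deliberately simplified model. The sketch below assumes each packet is dropped independently with the same probability and ignores lost acknowledgments, timers, and congestion control, so it is not a description of how a real TCP stack behaves; both function names are hypothetical.

```python
def udp_delivery_fraction(drop_rate):
    """Fraction of send-and-forget datagrams that arrive: nothing is resent."""
    return 1.0 - drop_rate


def expected_transmissions(drop_rate):
    """Mean sends per packet when a sender retransmits until one copy arrives."""
    if not 0 <= drop_rate < 1:
        raise ValueError("drop_rate must be in [0, 1)")
    return 1.0 / (1.0 - drop_rate)


if __name__ == "__main__":
    for rate in (0.05, 0.10, 0.50):
        print(
            f"drop {rate:.0%}: datagrams delivered {udp_delivery_fraction(rate):.0%}, "
            f"retransmitting sender averages {expected_transmissions(rate):.2f} sends per packet"
        )
```

In this model the retransmitting sender still delivers everything and pays only in extra sends and time, while the send-and-forget application loses data outright.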
(Does all of this network terminology make you queasy? The network resource information accessible through the WebInfolink feature at the end of this article provides links to help you understand the basics of networking.)
NIST Net includes no reporting of what it did to what packets when, preventing you from correlating an application failure with a specific network failure. Since NIST Net randomly manipulates packets within the supplied parameters, reproducing the exact same sequence of events cannot be done. And without a log from NIST Net you will be limited to conveying the conditions emulated, not the precise sequence of events.
NIST Net is a powerful tool, allowing you to emulate network conditions seldom occurring in your lab but nonetheless prevalent in the real world. It does require some work to get installed and running, such as reconfiguring your physical network, adjusting your routing tables, and properly configuring a Linux kernel. In return, NIST Net allows you to reproduce the same network conditions easily, so you can re-create the conditions in which your application fails, easing diagnosis and repair.
Most importantly, you should learn how to use this tool during the early phases of an application's design. That's the best way to ensure the engineering of adequate error detection and recovery before you begin testing.