Four Paths To Predictable, Repeatable, Reusable Test Data

Better Software Magazine
Volume-Issue: 2010-04
Summary:

Modern applications operate in highly integrated environments, and critical systems rely on massive amounts of data that likely contain sensitive information. Discover useful strategies for preparing your baseline, handling interfaces, designing input data, and planning for output results.

Designing test data is difficult work. You must understand the requirements, design the test cases, and establish an environment in which test data can be created, maintained, and reused efficiently. However, I’ve found the rewards can be spectacular: a 90 percent productivity improvement for manual testing, increased test coverage, and foundational support for test automation.

Many testers spend the majority of their time locating the data they need for every test case or creating or conditioning the data if it is not available. Test data is typically taken from a copy or subset of production data that is frequently modified, making the contents unpredictable and the test data unreliable. The result is that only a fraction of the possible tests can be executed in the time allocated. However, with an inventory of well-designed and readily available test data, testers can execute orders of magnitude more tests in the same time.

Test data can be organized in four categories, each of which requires a unique creation strategy. If you address each category correctly, your test data will be predictable, repeatable, and reusable.

Baseline
One approach to creating the ideal test environment is to start with a set of empty databases so that all data is loaded under strict control. My experience is that this is not effective. Gone are the days when databases contained data only as numbers and character strings. Today, in addition to that data, databases are complex repositories of procedures, pointers, and rules. Except in the case of an initial implementation of a new application, trying to build a database from scratch for each testing cycle is fraught with difficulty. 

The most common source of baseline test data is production data. That data represents reality and is generally available in large volumes. But production data changes constantly. If the test environment is refreshed every time there is a production database change, then test repeatability is lost.

The best strategy is to start with a copy or subset of production data and then condition it to meet your needs. First, you must ensure that no confidential or sensitive information is contained in the test database. Many laws and regulations require that names, addresses, Social Security numbers, account numbers, and medical information (basically anything that can tie a real person to private information) not be revealed to anyone without a need to know, and testers don’t need to know. This typically requires scrambling or obfuscating protected fields.
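The scrambling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed method: the field names (name, ssn, account), the MASKING_KEY value, and the helper functions are all hypothetical. The sketch uses a keyed hash so that the same input always produces the same masked value, which keeps records that share an account number consistent with each other after masking.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be stored outside the test environment.
MASKING_KEY = b"example-masking-key"


def mask_digits(value: str, key: bytes = MASKING_KEY) -> str:
    """Replace every ASCII digit with a keyed pseudo-random digit.

    Punctuation and length are preserved, so a masked value still fits
    the same column and passes the same format checks.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    masked = []
    used = 0
    for ch in value:
        if ch in "0123456789":
            masked.append(str(digest[used % len(digest)] % 10))
            used += 1
        else:
            masked.append(ch)
    return "".join(masked)


def mask_name(value: str, key: bytes = MASKING_KEY) -> str:
    """Replace a name with a stable, meaningless label."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "Person-" + digest[:8]


def obfuscate_record(record: dict) -> dict:
    """Return a copy of the record with the protected fields masked."""
    masked = dict(record)  # leave the caller's record untouched
    masked["name"] = mask_name(record["name"])
    masked["ssn"] = mask_digits(record["ssn"])
    masked["account"] = mask_digits(record["account"])
    return masked


if __name__ == "__main__":
    sample = {"name": "Jane Doe", "ssn": "123-45-6789",
              "account": "0012345678", "balance": 250.0}
    print(obfuscate_record(sample))
```

Non-sensitive fields such as the balance pass through unchanged, so the business conditions the tests depend on are preserved. A keyed hash is one possible choice; it is not a substitute for whatever masking standard your organization or regulator requires.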

One key to efficiency is to perform this obfuscation process as infrequently as possible. Extract updates of production data only when there is no alternative, because it means you have to repeat the conditioning of your test data.

Next, you must develop a strategy for aging the test data. This means keeping the same relative difference between any stored dates and the system date. This is important because many business rules use time: birth dates, expiration dates, shipping dates, due dates, and so forth. You must either change the system date back or roll the stored dates forward; otherwise, incorrect business rules could be triggered, making the test outcome unpredictable.
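Rolling the stored dates forward amounts to adding the same offset to every date field: the gap between the date the baseline was captured and the current system date. Here is a minimal Python sketch of that idea. The function name, the record layout, and the field names are assumptions made for illustration.

```python
from datetime import date


def age_record(record: dict, baseline_date: date, today: date,
               date_fields: list) -> dict:
    """Shift each listed date field by the time elapsed since the baseline.

    After the shift, every stored date sits at the same distance from
    `today` as it originally did from `baseline_date`.
    """
    shift = today - baseline_date  # a timedelta
    aged = dict(record)            # leave the caller's record untouched
    for field in date_fields:
        if aged.get(field) is not None:
            aged[field] = aged[field] + shift
    return aged


if __name__ == "__main__":
    invoice = {"id": 17, "due_date": date(2010, 4, 15), "paid_date": None}
    aged = age_record(invoice,
                      baseline_date=date(2010, 4, 1),
                      today=date(2010, 5, 1),
                      date_fields=["due_date", "paid_date"])
    print(aged["due_date"])  # 2010-05-15
```

In this example the invoice was due 14 days after the baseline date and is still due 14 days after the new system date, so a rule such as "flag invoices due within 30 days" behaves the same in every test cycle. A fixed-day shift changes the day of the week and can cross month-end boundaries, so rules that depend on those may need a different offset, such as a whole number of weeks.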

So, how do you keep your test data current? By applying changes to the data in small batches but keeping the majority of the contents fixed. Treat your test data baseline like the valuable asset it is by continually adding new content that supports expanding test coverage. 

Finally, establish an archive-and-restore procedure to enable you to preserve and later refresh the baseline. You may want more than one version of the baseline to allow you to advance the data to the next state, so testing can

About the author

Linda Hayes

Linda G. Hayes is a founder of Worksoft, Inc., developer of next-generation test automation solutions. Linda is a frequent industry speaker and award-winning author on software quality. She has been named one of Fortune magazine's People to Watch and one of the Top 40 Under 40 by Dallas Business Journal. She is a regular columnist and contributor to StickyMinds.com and Better Software magazine, as well as a columnist for Computerworld and Datamation, author of the Automated Testing Handbook, and co-editor, with Alka Jarvis, of Dare To Be Excellent, a book on best practices in the software industry. You can contact Linda at lhayes@worksoft.com.
