No Data? Make Your Own!
Perhaps you don’t have access to a good set of real-world data. That means you’ve got to roll up your sleeves and figure out how to create something useful. This can be extraordinarily daunting, particularly if you think you’re going to need huge amounts of data.
Make no bones about it: Rolling your own data will be time consuming. You’ll need to sit down with your entire team and figure out what scenarios you want to create data for, and you need to be extremely lean and realistic about what you’re going to bite off. If your system is new, do you really need to generate a data set simulating five or ten years’ worth of data? No! You don’t even know if your system will live that long, so why try to solve that sort of problem?
Brainstorm realistic data scenarios, and ensure that you’re covering all of the system’s use cases. I once worked on a data creation effort where we initially missed several small but important use cases that had a definite impact on the overall performance. Avoid this by getting multiple team members involved in the exercise, and run back through your list of features to ensure you’re covering everything.
A great source for ensuring you’re covering all those use cases is your functional automation suite, assuming you have one. Many UI-level automation suites are organized around functional areas, so you’ve got a great starting point for your coverage. Those same functional tests are also a good starting point for actually generating your data. Reconfigure your suites so they don’t delete the content they create, then start looping through iterations and watch your data set grow. You’ll need to do some post-processing to spread dates across a realistic range, but an 80 percent solution is better than nothing.
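Here’s a minimal sketch of that date post-processing step in Python. It assumes each generated record is a dictionary with a hypothetical `created` timestamp field and that the records are listed in the order your tests created them. It spreads timestamps uniformly across a window you choose, which is a simplification: real usage usually clusters around business hours and recent dates.

```python
import random
from datetime import datetime, timedelta


def spread_dates(records, start, end, seed=42):
    """Assign each record a 'created' timestamp between start and end.

    'created' is a hypothetical field name; use whatever your schema calls it.
    Timestamps are drawn uniformly, then sorted, so records created later by
    the test loop also receive later dates.
    """
    rng = random.Random(seed)  # fixed seed makes the output repeatable
    span_seconds = (end - start).total_seconds()
    stamps = sorted(
        start + timedelta(seconds=rng.uniform(0, span_seconds)) for _ in records
    )
    for record, stamp in zip(records, stamps):
        record["created"] = stamp
    return records
```

Sorting the generated timestamps before assigning them keeps the data chronologically consistent, so IDs and dates still increase together.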
If you don’t have functional tests in place already, then step back and look for the easiest way to create your content. Do you have web services that support content creation, or stored procedures in your database that could do the job? A modest effort with your developers can give you one-off tools that build the initial data you need.
Keep in mind that the tools you’re building for this effort only need to be simple, lightweight scripts, batch files, or tiny programs. Moreover, you will likely end up stringing together a series of small tools to do the job for you. Don’t invest huge amounts of time in this. Build just enough to get you moving, nothing more.
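To illustrate how small these tools can be, here’s a Python sketch that creates content through a web service. The endpoint URL, the JSON field names (`title`, `body`, `authorId`), and the helper functions are all hypothetical; swap in whatever your system actually exposes. The `send` parameter lets you dry-run the script, or replace the HTTP call with one that invokes a stored procedure.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute your system's content-creation service.
API_URL = "http://localhost:8080/api/posts"


def make_payloads(count, author_ids):
    """Build simple post payloads, rotating through a small pool of authors."""
    return [
        {
            "title": f"Load test post {i}",
            "body": "Lorem ipsum " * 20,
            "authorId": author_ids[i % len(author_ids)],
        }
        for i in range(count)
    ]


def post_json(url, payload):
    """POST one JSON payload and return the HTTP status code.

    urllib raises HTTPError for 4xx/5xx responses, so failures surface loudly.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def seed(count, author_ids, send=post_json, url=API_URL):
    """Create `count` posts and return how many got a 2xx response.

    `send` is swappable so the script can be dry-run without a server.
    """
    created = 0
    for payload in make_payloads(count, author_ids):
        if 200 <= send(url, payload) < 300:
            created += 1
    return created
```

Run it once with a fake `send` to check the payloads, then point it at a test environment. Because each piece is a plain function, it’s easy to chain with other small tools, such as a script that adjusts dates afterward.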
One Data Set Isn't Enough
If your system is large or complex enough, you’ll likely want several sets of data in different sizes and shapes. For example, perhaps you’re working with a platform that offers both forums and blogs. Having separate data sets skewed heavily in each direction, plus a balanced mix, can help reveal whether you’ll run into performance issues in environments where usage leans one way or the other.
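One lightweight way to keep track of those variations is to describe each data set as a profile and derive item counts from it. This Python sketch follows the forums-and-blogs example; the sizes and ratios are made-up placeholders, not recommendations.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class DataSetProfile:
    name: str
    total_items: int
    forum_share: float  # fraction of items that are forum posts; the rest are blog posts

    def counts(self):
        forum_posts = round(self.total_items * self.forum_share)
        return {"forum_posts": forum_posts, "blog_posts": self.total_items - forum_posts}


# Placeholder sizes and shapes; pick values that match your own system.
SIZES = {"small": 10_000, "large": 1_000_000}
SHAPES = {"forum-heavy": 0.9, "blog-heavy": 0.1, "balanced": 0.5}

# Every combination of size and shape becomes its own data set.
PROFILES = [
    DataSetProfile(f"{size}-{shape}", total, share)
    for (size, total), (shape, share) in product(SIZES.items(), SHAPES.items())
]

for profile in PROFILES:
    print(profile.name, profile.counts())
```

Feeding each profile’s counts into your generation tools gives you a repeatable family of data sets rather than a single one.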
Above, I said it’s irresponsible to test with only an empty or nearly empty data set. Note that I specifically said “only.” Actually, an empty database—or whatever your initial default data set is—can give you some interesting results about your system, and it’s tremendously easy to set up!
Mix of Use Cases
Creating a realistic load test scenario means setting up more than just three or four use cases and throwing a thousand virtual users against those scripts. You need to plan ahead and create use cases that model actual system usage as closely as possible.
For existing systems, your web server logs are a great source of modeling information for this kind of activity. Your logs show how users actually access your system. Your load testing tools may even be able to generate load test scenarios directly from these logs, which is a huge benefit!
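If your tools can’t import logs directly, a short script can still turn them into a usage mix. This Python sketch assumes logs in the Common Log Format (or the Combined format, which extends it) and a hypothetical mapping from URL prefixes to use cases. In practice you’d also want to filter out requests for static assets such as images and stylesheets before computing shares.

```python
import re
from collections import Counter

# Matches the quoted request line in Common/Combined Log Format entries,
# e.g. "GET /forums/123 HTTP/1.1"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')

# Hypothetical mapping from URL prefix to use case; adjust to your site's routes.
USE_CASES = [
    ("/forums", "browse_forums"),
    ("/blogs", "read_blogs"),
    ("/search", "search"),
]


def classify(path):
    """Map a request path to a use case name, or 'other' if nothing matches."""
    for prefix, use_case in USE_CASES:
        if path.startswith(prefix):
            return use_case
    return "other"


def usage_mix(log_lines):
    """Return each use case's share of all parsed requests; unparseable lines are skipped."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[classify(match.group("path"))] += 1
    total = sum(counts.values())
    return {use_case: n / total for use_case, n in counts.items()} if total else {}
```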
If you’re working with a new system without any history of real-world usage, then you’ll need to go back to brainstorming with your team. Think of all the various routes through the system your users might traipse, and keep in mind that many users won’t start at your site’s home page; they’ll land on other areas of your site through external sources such as links from articles or search engine results.
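Once you’ve listed those routes, you can weight them so each virtual user follows a path in roughly the proportion you expect. This Python sketch uses `random.Random.choices` for weighted selection; the scenario names, entry pages, and weights are hypothetical placeholders, and most load testing tools have their own way of expressing the same idea.

```python
import random

# Hypothetical scenarios: (name, entry page, relative weight).
# Note that most entry points are not the home page.
SCENARIOS = [
    ("home_then_browse", "/", 40),
    ("search_engine_to_article", "/blogs/some-article", 35),
    ("external_link_to_forum_thread", "/forums/thread/123", 25),
]


def pick_scenario(rng):
    """Pick one (name, entry page) pair, weighted by the relative weights."""
    options = [(name, page) for name, page, _ in SCENARIOS]
    weights = [weight for _, _, weight in SCENARIOS]
    return rng.choices(options, weights=weights, k=1)[0]


def assign_scenarios(virtual_users, seed=7):
    """Assign a scenario to each virtual user; the seed makes runs repeatable."""
    rng = random.Random(seed)
    return [pick_scenario(rng) for _ in range(virtual_users)]
```

Seeding the random generator makes a run repeatable, which helps when you compare results before and after a change.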