There is a popular saying: On the Internet, the competition is only a mouse click away. It's a fundamental fact of the Internet that poor performance leads to dissatisfied users, and that dissatisfied users may abandon a Web site and never return. Predicting how a particular Web site will respond to a specific load is a real challenge. Since Web sites are complex systems, with hardware, software, and networking components from many different vendors and with widely different performance profiles, it's virtually impossible to predict how a given system will behave under load. The only reliable way to gain insight into a system's scalability is to perform a load test in which the volume and characteristics of the anticipated traffic are simulated as realistically as possible.
In this article, I will guide you through the three basic steps necessary to design highly realistic and accurate Web site load tests: understanding the nature of the load, estimating target load levels, and documenting your design.
Understanding the Nature of the Load
The first step in designing a Web site load test is to understand as accurately and as objectively as possible the nature of the load that must be generated.
Unfortunately, it is very easy, even for experienced, well-meaning, and hard-working test engineers, to design and develop load tests that don't even come close to matching real loads. I believe that the primary reason for this is that Internet load testing is a relatively new (and therefore poorly understood and documented) branch of testing. Even if we ignore the recently added Internet component, load testing has always been one of the more esoteric and less covered types of testing, with relatively few experts to go around. Considering the amount of learning and setup it takes to plan and execute a Web site load test, and the tremendous time pressures of most Internet-based projects, it is no surprise that most test engineers are happy to generate any load and get back any results.
However, I firmly believe that if you have already committed the significant time and expense required for load testing, the additional effort required to make it realistic is minor compared to the significant increase in confidence in the results. The first step toward realism is to understand and characterize the anticipated load as objectively and systematically as possible. For this task, your most important resource will be your Web site's log files, and your most valuable tool will be a log file analyzer.
Log Files and Log Analyzers
Log files are a real gold mine for load test design (see "Mining Gold from Server Logs" by Karen Johnson in the January/February 2001 issue of STQE for more on this topic). Every time a visitor comes to your Web site, the details of that visit are recorded (provided that logging is enabled on the server). Along with the visitor's IP address, the log file records the date and time each file or page was accessed, the size of the file or page that was accessed, a code that indicates whether the access was successful, and other information about the visitor's browser, operating system, search engine, etc. This gold mine of data, however, is stored in a machine-readable format that must be processed to extract the information you need. Fortunately, there are many public domain and commercial log file analyzers available online; search for "log file analyzer" using your favorite search engine to see which one best fits your needs and budget.
Most of the information you need to extract during log file analysis revolves around user sessions. The concept of the user session, which is defined as a sequence of related page views by a unique visitor, is central to Web site load testing, so let's look at it in some detail.
The typical Web site is visited by a wide range of users with a wide range of objectives. Of all the visitors to an e-commerce site, for example, some may be there to browse, some others to buy, and others to check the status of their order. Even when they share a common objective, like buying a book, different users will behave in different ways. Some will move from one page to another in rapid succession, barely taking any time to read the content; others will take their time with each page and move more slowly and deliberately. Some will browse and read reviews for several books before deciding to buy one, while others will go straight for the purchase page.
Understanding this wide range of actions and behaviors is critical for load test design, since a well-designed load test should replicate these actions and behaviors as accurately as possible. The best way to capture the nature of Web site load is to identify and track, using a log analyzer, a set of key user session variables that are applicable and relevant to your Web site traffic. Variables that you should always track include the number of sessions, the session length (pages per session), the session duration, and the page request distribution.
In addition to these basic metrics, however, you also should track any variables that are likely to make a nontrivial difference in the load; use your intuition and talk to the people who developed the system to determine what these additional variables should be.
One can easily suspect that, for example, a seven-page session that results in a purchase is going to create more load on the Web site than a seven-page session that involves only browsing. A browsing session might only involve the serving of static pages, while a purchase session will involve a number of elements, including the inventory database, the customer database, a credit card transaction with verification going through a third-party system, and a notification email. A single purchase session might put as much load on some of the system's resources as twenty browsing sessions.
Similar reasoning may apply to purchases from new vs. returning users. A new user purchase might involve a significant amount of account setup and verification, something existing users don't require. The database load created by a single new user purchase may equal that of five purchases by existing users, so you should differentiate the two types of purchases.
Once you have determined what user session variables you need to track, use the log analyzer to find out the range and distribution of values for these variables. Most log analyzers will automatically report information on basic metrics (such as session length or session duration). For some of your custom metrics, however, you may need to supply your own interpretation of the data. In order to find out the number of sessions that result in a purchase, for example, you might need to leverage the fact that all sessions that result in a completed order include the page "order_confirmation.html." If the log file reveals that out of 5,500 sessions, the order confirmation page was invoked 220 times, you can deduce that 4% of sessions resulted in a purchase and factor that percentage into your test design.
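A minimal sketch of this calculation, assuming the log data has already been grouped into per-session page lists (a hypothetical format; in practice a log analyzer or a script over the raw access log would produce something similar):

```python
# Sketch: derive the purchase rate from per-session page lists.
# The session data layout below is hypothetical, chosen to mirror
# the 220-out-of-5,500 example in the text.

def purchase_rate(sessions, marker="order_confirmation.html"):
    """Fraction of sessions whose page list contains the marker page."""
    if not sessions:
        return 0.0
    hits = sum(1 for pages in sessions if marker in pages)
    return hits / len(sessions)

# 220 purchase sessions out of 5,500 total, as in the example above
sessions = ([["home.html", "cart.html", "order_confirmation.html"]] * 220
            + [["home.html", "product.html"]] * 5280)
print(f"{purchase_rate(sessions):.0%} of sessions resulted in a purchase")
```

The same pattern works for any custom metric that can be recognized by the presence of a marker page in a session.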
For each of the session variables you decide to track you should, at the very least, know the average value, but it is worthwhile and often necessary to go well beyond that. The average number of pages per session, for example, may turn out to be 4.0; but if that average is the result of 50% of the users visiting just the home page and 50% of the users visiting seven pages, a load test in which all sessions are four pages long would match the average without being realistic.
You could use statistical tools such as the standard deviation, but I believe that the best way to characterize most Web load testing variables is with a discrete distribution, which records the percentage of sessions at each value. In the pages-per-session example above, the distribution would be 50% of sessions at one page and 50% at seven pages.
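A discrete distribution is also easy to sample from when it is time to generate simulated sessions. A sketch, using the 50/50 pages-per-session example as the (illustrative) weights:

```python
import random

# Illustrative distribution: half the sessions are single-page visits,
# half are seven pages long, giving the 4.0 average from the text.
pages_per_session = {1: 0.50, 7: 0.50}

def sample_session_lengths(dist, n, seed=42):
    """Draw n session lengths according to a discrete distribution."""
    rng = random.Random(seed)
    lengths = list(dist)
    weights = [dist[k] for k in lengths]
    return rng.choices(lengths, weights=weights, k=n)

lengths = sample_session_lengths(pages_per_session, 10_000)
print(sum(lengths) / len(lengths))  # close to the 4.0 average, but bimodal
```

A load generator driven by such samples matches not just the average but the shape of the observed traffic.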
The amount of detail and precision with which you track these variables depends on the complexity of the Web site, as well as the time you have to design, develop, execute, and analyze the results. Due to space constraints, the examples used in this article are necessarily simple, but if you are dealing with a megasite like Amazon.com or Schwab.com you will probably need to track more variables, and track them even more accurately.
Page Request Distribution
One user session variable that deserves special consideration is the page request distribution. This critical metric tells you what pages are being requested, and in what proportions. Developing a page request distribution is a two-step process: first you define a set of page categories, and then you calculate the percentage of page requests in each category.
You need to define a set of discrete page categories because, for most Web sites, you couldn't possibly list all the pages, and because in terms of load many pages are equivalent. If all product description pages, for example, contain a couple of pictures and several lines of text, the load they present to the Web site will be very similar, even if the products they depict are very different.
By the same token, a home page and a purchase page are going to present very different loads. The home page may be a purely static page (and will probably be cached in memory), so it will take the server very little effort to respond to the request. The purchase page, on the other hand, will undoubtedly require database access and, possibly, communication with several other subsystems. A histogram, as shown at the top of Figure 1, is a great way to visualize and communicate page request distribution.
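Once each page has been mapped to a category, computing the distribution itself is straightforward. A sketch with a hypothetical category mapping and request list:

```python
from collections import Counter

# Hypothetical mapping from individual pages to load-equivalent categories.
categories = {
    "index.html": "home",
    "prod_1.html": "product",
    "prod_2.html": "product",
    "buy.html": "purchase",
}

def page_distribution(requests, categories):
    """Fraction of requests falling into each page category."""
    counts = Counter(categories[r] for r in requests)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

requests = ["index.html", "prod_1.html", "prod_2.html",
            "prod_1.html", "buy.html"]
dist = page_distribution(requests, categories)
```

The resulting fractions are exactly the bars of the histogram: here 60% of requests fall into the product category, 20% each into home and purchase.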
Figure 1, which shows our histogram along with three blocks of related data, is an example of my effort to integrate and keep all this information together. To do so, I coined the term and developed the concept of the Web site Usage Signature, with the (somewhat unfortunate but definitely memorable) acronym WUS. A WUS is a table that displays all the key metrics and data related to Web site usage in a single, easy-to-use form.
What to Do If You Don't Have Log Files
Every time I explain the concept and application of the WUS, most people nod their heads in violent agreement but, inevitably, somebody raises the question: "This is great if you have Web logs, but how do you use the WUS on a pre-launch site?"
This is an excellent question, and fortunately there is a relatively simple answer that satisfies most people. Before you announce the site to the entire world, plan a much smaller beta release and use the logs from those sessions to come up with a first approximation of your WUS. The WUS from a beta test with as few as 100-200 users will give you a pretty good idea of how people are going to use and navigate your site. Since most organizations already plan alpha and beta tests for functionality and usability testing, getting some log data should not be difficult.
Estimating Target Load Levels
The next step in designing the load test is to understand the volume patterns, and to determine what load levels your Web site might be subjected to (and must therefore be tested for). There are four key variables that must be understood in order to estimate target load levels: overall traffic growth, peak load levels, peak ramp-up time, and peak sustain time.
My metric of choice for measuring load levels is user sessions per unit of time (e.g., user sessions per hour or user sessions per week). I prefer this metric to the many other options, such as concurrent users or page views. The concept of user sessions is easily understood by technical and nontechnical people at all levels of the organization and is tracked by most log file analyzers.
Estimating Overall Traffic Growth
Estimating the growth of the overall amount of anticipated traffic is important, because the magnitude of the peak levels is going to be proportional to the magnitude of the overall traffic. If an overall traffic of, say, 100,000 sessions per week results in a peak of 1,500 sessions per hour, you might expect that an overall traffic of 200,000 sessions per week would double that peak to 3,000 sessions per hour.
Overall traffic growth is typically derived from two sources: historical growth data and sales/marketing projections. In many cases, you'll use both. To estimate this number, take the current weekly traffic (e.g., 100,000 sessions per week) and ask the appropriate people in your company (e.g., CEO, VP of Marketing, VP of Sales) what they anticipate in terms of month-to-month growth. Ask them, too, if they anticipate any unusually high volumes due to special sales or marketing events. From this conversation you might learn that the company anticipates 20% month-to-month growth, and that a new product launch two months from now is expected to draw an additional 200,000 visitors to the Web site during the week of the launch. Combining this information, you can estimate that the Web site will have to handle a volume of approximately 344,000 visitors the week of the launch (144,000 by compounding the 20% month-to-month growth over two months, plus the additional 200,000 from the new product launch).
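The arithmetic behind this estimate is simple enough to sketch; all figures come from the example above:

```python
# Sketch of the launch-week volume estimate from the text.
current_weekly = 100_000      # sessions per week today
monthly_growth = 0.20         # anticipated month-to-month growth
months_to_launch = 2
launch_bump = 200_000         # extra visitors during the launch week

# Compound the monthly growth, then add the one-time launch traffic.
organic = current_weekly * (1 + monthly_growth) ** months_to_launch
launch_week_total = organic + launch_bump
print(round(organic), round(launch_week_total))  # 144000 344000
```

Note that the growth compounds: two months at 20% gives a factor of 1.44, not 1.40.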
Estimating Peak Levels
Once you have an estimate of overall traffic growth, you'll need to estimate the peak level you might expect within that overall volume. This is necessary because Web traffic is rarely uniformly distributed, and most Web sites exhibit very noticeable peaks in their volume patterns. Fortunately, you can use the ratios between overall traffic and peak number of sessions to make educated guesses about target volume levels. Typically, there are a few points in time (one or two days out of the week, or a couple of hours each day) when the traffic to the Web site is highest. A weather information Web site, for example, may experience its highest traffic on Fridays and Saturdays, as people make plans for their weekend. An online trading site will usually experience its highest traffic around the market's opening and at the market's close.
Suppose you have traffic log data for all of 2000, and you want to estimate when the peak volumes will occur in 2001. The monthly levels will show if there is a clear seasonality in volume pattern (e.g., typical consumer e-commerce sites will peak in the October-December time frame, in preparation for the holidays). Weekly levels, on the other hand, will show if there is a pattern related to days of the week (e.g., online brokers might experience peak volumes on Mondays and Fridays). At the end of this analysis you should compile the data in a volume patterns table, as seen in Table 1.
This volume patterns table makes it pretty clear that for this site, if the volume patterns hold for the year 2001, peak volumes will be experienced on Saturdays between 10:00 and 11:00 a.m. in the month of December. You can now use the ratio of peak to average volume to set target peak levels for your load test.
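The peak-to-average scaling can be sketched as follows; the observed figures reuse the hypothetical numbers from earlier in the article:

```python
# Sketch: scale a historical peak-to-average ratio to a projected volume.
observed_weekly = 100_000        # historical sessions per week
observed_peak_hourly = 1_500     # historical peak sessions per hour

hours_per_week = 7 * 24
# How many times busier than average was the historical peak hour?
peak_ratio = observed_peak_hourly / (observed_weekly / hours_per_week)

projected_weekly = 344_000       # launch-week estimate from the text
projected_peak_hourly = peak_ratio * projected_weekly / hours_per_week
print(round(projected_peak_hourly))
```

Here the historical peak hour ran at 2.52 times the average hour, which scales the launch-week projection to a target peak of roughly 5,160 sessions per hour.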
You might also think about tracking and recording lowest volume periods; that information would come in very handy for scheduling a load test on the production site. Running your load tests between 2:00 and 3:00 a.m. on Tuesdays, for example, might allow you to run tests while causing the minimum amount of disruption to your users.
Estimating Peak Ramp-up and Sustain Time
Estimating how quickly target peak levels will be reached, and for how long they will be sustained, is almost as critical as estimating the magnitude of those levels. Let's look at design considerations for two different kinds of sites.
An online stock trading site will experience an extremely sharp peak in usage when triggered by certain timed events, like the opening of the stock market. In a matter of minutes this site could go from handling virtually no trades at all to juggling thousands of trades. A load test for such a site should be designed to ramp up to the target level extremely rapidly, say in five to ten minutes. A more gradual ramp-up is not going to duplicate the stresses experienced under real load, and will result in overestimating the site's capacity.
An online clothing retailer, on the other hand, might experience a much more gradual buildup to a lunchtime peak, since its users' actions are not synchronized to a specific event. Load tests for this site will be different from those for our stock trading site: they should be designed to ramp up over a period of one or two hours. A faster ramp-up would cause stress levels higher than what the site will experience under a real load, resulting in an underestimation of the site's capacity.
The duration of the peak is also very important: a Web site that deals very well with a peak level for five or ten minutes may crumble if that same load level is sustained for longer.
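One simple way to make ramp-up and sustain time explicit in the design is to specify each load profile as a list of (minute, sessions-per-hour) points and interpolate between them. A sketch with hypothetical figures for the two sites just described:

```python
def load_at(profile, minute):
    """Linearly interpolate the target load at a given minute."""
    for (m0, l0), (m1, l1) in zip(profile, profile[1:]):
        if m0 <= minute <= m1:
            return l0 + (l1 - l0) * (minute - m0) / (m1 - m0)
    return profile[-1][1]  # hold the last level past the final point

# Sharp, event-driven ramp (trading site): full peak within 10 minutes,
# then sustained for an hour. Gradual ramp (retailer): two hours to peak.
trading = [(0, 0), (10, 12_000), (70, 12_000)]
retailer = [(0, 2_000), (120, 12_000), (180, 12_000)]
```

Both profiles reach the same 12,000 sessions/hr peak; what differs, and what the test must reproduce, is how fast they get there and how long they stay.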
Documenting the Design
Once you have a good understanding of the nature and volumes of the anticipated load, you've already done the bulk of the work, and you can design the load test with ease, confidence, and plenty of information to justify your decisions.
The key elements of a load test design are the test objectives, the pass/fail criteria, the test scripts, and the load testing scenarios.
Remember that most people these days are too busy to read anything longer than a couple of pages, so you should try to make the design document as simple and direct as possible.
In the test objectives section you should explicitly mention what you are trying to accomplish with the load test, and what actions are going to be taken based on its results. For example:
Load Test Objective
The objective of this load test is to determine whether the Web site, as currently configured, will be able to handle the 12,000 sessions/hr peak load level anticipated for the coming holiday season. If the system fails to scale as anticipated, the results will be analyzed to identify the bottlenecks, and the test will be run again after the suspected bottlenecks have been addressed.
A test should have some predefined pass or fail criteria associated with it. Unfortunately, most load tests are executed without a clear understanding of what constitutes success or failure. Make sure that you, and everyone else in the organization, discuss what results will be acceptable and document the outcome of the discussion. For example:
Pass/Fail Criteria
The load test will be considered a success if the Web site handles the target load of 12,000 sessions/hr while maintaining the average page response times defined below. The page response time will be measured over T1 lines and will represent the elapsed time between a page request and the time the last byte is received:
Next, you should document the types of scripts you plan to use in the load test. The number of scripts you need will be determined by the complexity of the Web site and your work on the WUS. Since in most cases the user sessions follow just a few navigation patterns, you will not need hundreds of individual scripts to achieve realism-if you choose carefully, a dozen scripts will take care of most Web sites. But choosing them carefully requires some work, since you want this combination of scripts to be able to closely replicate the WUS, especially the page distribution.
The best way to arrive at the right combination of scripts and their relative frequencies is a two-step process. First, you'll make an educated guess about the number and types of scripts that will be needed. Second, you'll set up a spreadsheet to calculate page distributions and other key target variables, using an iterative process in which you adjust the various parameters and, if needed, create additional scripts until you get a distribution that matches your target.
For the first step, each script should be assigned a name and described in terms of how it will navigate the Web site. For example, the sequence Home-->Product Information(1)-->Product Information(2)-->Exit describes a three-page script that visits the home page, then two product information pages, and then terminates. The script names and their descriptions should be listed in a script table for easy reference, as shown in Table 2.
For the second step, the sample spreadsheet shown in Table 3 can be used to calculate the target page distribution based on the types of pages hit by each script, and the relative frequency with which each script will be executed.
The first half of the spreadsheet lists each of the scripts, along with its relative frequency of execution and how many times the script will request each page type. In the second half, we calculate the relative number of page views that we will obtain if we execute this set of scripts with this relative frequency, and then compare that to our target page distribution. You should try to keep the difference between the resulting and target page distribution to 5% or less. For pages that will impose a particularly heavy load (e.g., order entry and credit card confirmation) you should aim for a difference of no more than 1% or 2%, since small percentage differences for these pages will have a major impact on the load.
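The spreadsheet logic is simple enough to sketch in code. The script names, frequencies, page counts, and targets below are all hypothetical stand-ins for the real entries in Table 3:

```python
# Hypothetical script mix: relative frequency and page requests per run.
scripts = {
    "Browse":   (0.60, {"home": 1, "product": 3}),
    "Search":   (0.25, {"home": 1, "search": 2, "product": 1}),
    "Purchase": (0.15, {"home": 1, "product": 2, "order": 1}),
}
target = {"home": 0.30, "product": 0.50, "search": 0.12, "order": 0.08}

def resulting_distribution(scripts):
    """Page distribution produced by running the scripts at these frequencies."""
    totals = {}
    for freq, pages in scripts.values():
        for page, count in pages.items():
            totals[page] = totals.get(page, 0.0) + freq * count
    grand = sum(totals.values())
    return {page: hits / grand for page, hits in totals.items()}

dist = resulting_distribution(scripts)
for page, goal in target.items():
    gap = abs(dist.get(page, 0.0) - goal)
    print(page, "adjust scripts" if gap > 0.05 else "ok")
```

With this mix the product pages come out roughly nine points over target, so you would lower the Browse frequency or add a script with fewer product pages, recalculate, and iterate until every gap is within tolerance.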
Finally, scripts should be combined to describe a load testing scenario. A basic scenario includes the scripts that will be executed, the percentages in which those scripts will be executed, and a description of how the load will be ramped up. Table 4 shows a sample scenario for a load test with three script types and four load levels.
If you have the time and resources, you should plan and design multiple scenarios to simulate a range of different peak volumes your site might experience. Make sure that each scenario has its own clearly stated objectives and pass/fail criteria.
I believe that, to be worth anything, Web site load tests must reproduce the anticipated loads as accurately and realistically as possible. To do that, you will need to study previous load patterns and design test scenarios that closely recreate them. This task will require a serious amount of hard work, intelligence, intuition, and communication skills. But the extra hours will pay off: the work will force you to learn new technologies, tools, and ways of thinking that, in the end, will greatly expand your skill set and significantly increase your marketability and career options.