Within the working range of processor utilization, linear models can often be used instead of queuing models for modern multi-processor machines (this is less true for single-processor machines). If there are no bottlenecks, throughput (the number of requests per unit of time) and processor utilization should increase in proportion to the workload (for example, the number of users), while response time should grow only insignificantly. If we don't see this, a bottleneck exists somewhere in the system and we need to find it.
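As a rough illustration of the linearity check described above, the following Python sketch compares measured throughput against a proportional expectation derived from the lightest load level. The function name, the 10% tolerance, and the sample numbers are hypothetical, chosen only for illustration; they are not taken from any measurement in this article.

```python
def first_nonlinear_step(loads, throughputs, tolerance=0.10):
    """Return the index of the first load level whose throughput falls more
    than `tolerance` below the proportional expectation, or None if all
    levels stay within tolerance.

    Assumption: the lightest load level (index 0) is free of bottlenecks,
    so its throughput per user can serve as the linear baseline.
    """
    per_user = throughputs[0] / loads[0]
    for i, (load, tput) in enumerate(zip(loads, throughputs)):
        expected = per_user * load
        if tput < expected * (1.0 - tolerance):
            return i
    return None


# Hypothetical measurements: users and requests per second.
sample_loads = [100, 300, 500, 700]
sample_throughputs = [10.0, 30.0, 49.0, 55.0]

print(first_nonlinear_step(sample_loads, sample_throughputs))  # prints 3
```

A result of None suggests throughput scales with the workload across all tested levels; an index points at the load level where a bottleneck may start to show. The same comparison could be applied to processor utilization.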
For example, a simple queuing model was built using TeamQuest's modeling tool for a specific workload executing on a four-way server. It was simulated at eight different load levels (step 1 through step 8, where step 1 represents a 100-user workload and each subsequent step adds 200 users, so that step 8 represents 1,500 users). Figures 1, 2, and 3 show the throughput, response time, and CPU utilization from the modeling effort.
Figure 1: Throughput
Figure 2: Response time
Figure 3: CPU utilization
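To make the setup above more concrete, here is a minimal Python sketch of the eight load steps together with a textbook M/M/c queue for a four-processor server. This is not the TeamQuest model and does not reproduce the figures: the open M/M/c approximation, the per-user request rate (0.02 requests per second), and the service time (0.1 seconds) are all assumptions made purely for illustration.

```python
import math


def load_steps(first=100, increment=200, steps=8):
    """User counts for each step: 100, 300, ..., 1500 with the defaults."""
    return [first + increment * i for i in range(steps)]


def erlang_c(servers, offered_load):
    """Probability that an arriving request has to wait in an M/M/c queue."""
    if offered_load >= servers:
        return 1.0  # saturated: every request waits
    rho = offered_load / servers
    below = sum(offered_load ** k / math.factorial(k) for k in range(servers))
    top = offered_load ** servers / (math.factorial(servers) * (1.0 - rho))
    return top / (below + top)


def mmc_response_time(arrival_rate, service_time, servers):
    """Mean response time (service plus queueing delay) of an M/M/c queue."""
    offered = arrival_rate * service_time
    if offered >= servers:
        return math.inf  # no steady state at or beyond saturation
    wait = erlang_c(servers, offered) * service_time / (servers - offered)
    return service_time + wait


# Hypothetical workload parameters (assumptions, not from the article).
REQUESTS_PER_USER_PER_SEC = 0.02
SERVICE_TIME_SEC = 0.1
PROCESSORS = 4  # the four-way server

for step, users in enumerate(load_steps(), start=1):
    rate = users * REQUESTS_PER_USER_PER_SEC
    utilization = rate * SERVICE_TIME_SEC / PROCESSORS
    response = mmc_response_time(rate, SERVICE_TIME_SEC, PROCESSORS)
    print(f"step {step}: {users} users, "
          f"utilization {utilization:.1%}, response {response:.4f} s")
```

Under these assumed parameters, throughput and utilization rise in direct proportion to the number of users, while response time stays close to the service time at low utilization and grows as the processors fill up. Different assumptions would shift where that growth becomes noticeable.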