do some research to get this information. For example, within the AWS cloud, shared resources include the network and the disk subsystem. This means that you will want to pay particular attention throughout your testing to the performance metrics that those resources affect, because other customers who run instances on the same physical server can reduce the availability of those resources and thereby degrade your users' performance experience.
There are some protective mechanisms that you can put in place to limit the risk of excessive resource utilization by other customers' instances. For example, AWS offers different levels of I/O, including a very high I/O option (10 Gigabit Ethernet) with its cluster computing instance option. While you may still share that I/O controller with other users, the increased bandwidth offers significantly more protection from typical spikes in activity. Additionally, the vendor may offer options that restrict the number of users sharing your server or cluster of servers, possibly including an option in which you are the exclusive user.
The key thing to be wary of is that the cloud is a highly shared platform, so your application's performance will depend on the activity of other users sharing the underlying physical infrastructure. This demand will be largely unpredictable in your public cloud production environment and is an ongoing risk that must be mitigated.
Having an extensive baseline of performance data from testing in the environment will allow you to protect against some of this risk. One of your most powerful tools for establishing this baseline is to run multiple longevity tests. A longevity test is a series of tests performed at a steady and controlled load rate over an extended period of time (eight hours or more) that allows you to closely monitor the behavior and performance data of your application. Although you are monitoring your own application's behavior and performance metrics during these tests, you are also indirectly monitoring the behavior of the other users who are sharing server resources with you. For example, if a metric with a long, stable baseline suddenly spikes for a period of time and then levels back off, you may have just indirectly measured the effect of another user taking up a significant amount of network bandwidth or briefly utilizing a large portion of the disk subsystem. As long as your application performance metrics have not exceeded their performance requirements during the longevity test, you can reasonably assume that, over time, this activity will average out and continue to result in acceptable performance of your application.
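As a rough illustration of reading a single longevity test this way, the following Python sketch flags intervals in which a metric rises well above its own baseline and separately checks whether the performance requirement was ever exceeded. The function name, the sample format (elapsed time paired with a response time in milliseconds), and the spike rule (median plus a multiple of the median absolute deviation) are all assumptions made for this example; they are not part of any vendor tool or of a prescribed method.

```python
from statistics import median


def find_spikes(samples, requirement_ms, k=5.0):
    """Flag spike intervals in one longevity run (hypothetical helper).

    samples: list of (elapsed_time, response_time_ms) in time order.
    requirement_ms: the response-time requirement for this metric.
    k: how many median absolute deviations above the baseline
       counts as a spike (an assumed, tunable value).
    """
    values = [v for _, v in samples]
    baseline = median(values)
    # Median absolute deviation: a spread estimate that the spikes
    # themselves do not inflate the way a standard deviation would.
    mad = median(abs(v - baseline) for v in values)
    threshold = baseline + k * max(mad, 1e-9)

    spikes = []          # list of (first_time, last_time) per spike
    start = last = None
    for t, v in samples:
        if v > threshold:
            if start is None:
                start = t
            last = t
        elif start is not None:
            spikes.append((start, last))
            start = None
    if start is not None:
        spikes.append((start, last))

    return {
        "baseline_ms": baseline,
        "spike_threshold_ms": threshold,
        "spikes": spikes,
        # The run is acceptable only if no sample broke the requirement.
        "requirement_met": max(values) <= requirement_ms,
    }
```

A spike that appears in the result while `requirement_met` stays true corresponds to the case described above: something, possibly another tenant, briefly consumed a shared resource, but your application stayed within its requirements.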
Of course, you will not be able to definitively determine that any brief periods of degradation of your application under load are caused by another user’s spike in activity, but that is why you need to run many longevity tests and build up tens of hours of raw performance data over an extended period of time. With all of this data, you should be able to conclude that any net effect of other users’ activity on your shared cloud resources will be of minimal consequence to your application’s performance. The key here is having the collective visibility and analysis over an extended period of time from which to draw these conclusions.
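One way to obtain that collective view is to pool the raw samples from every longevity run and summarize them together. The Python sketch below does this under several assumptions that are specific to the example: each run is a list of response times in milliseconds collected at a fixed sampling interval, the 95th percentile is computed with a simple nearest-rank rule, and `tolerated_fraction` is a hypothetical cutoff for how much time over the requirement you are willing to accept.

```python
import math


def summarize_runs(runs, requirement_ms, sample_interval_s=60,
                   tolerated_fraction=0.01):
    """Pool several longevity runs into one summary (hypothetical helper).

    runs: list of runs, each a list of response times in milliseconds.
    requirement_ms: the response-time requirement for this metric.
    sample_interval_s: seconds between samples (assumed constant).
    tolerated_fraction: assumed acceptable share of samples over the
        requirement; choose a value that fits your own requirements.
    """
    pooled = sorted(v for run in runs for v in run)
    n = len(pooled)
    if n == 0:
        raise ValueError("no samples to summarize")

    over = sum(1 for v in pooled if v > requirement_ms)
    # Nearest-rank 95th percentile: the smallest sample that is at
    # least as large as 95% of all pooled samples.
    p95 = pooled[math.ceil(n * 95 / 100) - 1]

    return {
        "total_hours": n * sample_interval_s / 3600,
        "fraction_over_requirement": over / n,
        "p95_ms": p95,
        "acceptable": over / n <= tolerated_fraction,
    }
```

With three eight-hour runs sampled once a minute, for example, `total_hours` would be 24.0, and a handful of samples above the requirement would show up as a small `fraction_over_requirement` rather than as a single alarming data point.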
Every vendor will have different virtual instance types from which to choose, so research how your vendor defines its types. The instance types on AWS are further categorized into instance families, so there are two dimensions to consider. Grouping types into families allows for fine-grained selection of computing power