Latency and throughput implications for the Cloud

The key considerations for any website are latency and throughput. These two parameters matter enormously to web designers because the site's response time and its ability to handle large volumes of traffic directly affect the user experience and the loyalty of returning users.

What are these two parameters and why are they significant? Before looking at latency we need to understand the response time of a web application. Ideally this is the time between the receipt of an HTTP request and the emission of the corresponding response. Unfortunately, for any web site hosted on the World Wide Web, the delay the user perceives is much larger than the application's response time alone. This extra delay is the latency of the web site, and it is primarily due to propagation and transmission delays on the internet. There are many contributors to this latency, from the DNS lookup to the bandwidth of the links along the path.
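To make these contributors concrete, here is a small Python sketch that times two of them: the DNS lookup and the TCP connection set-up. The host www.example.com is only a placeholder; substitute any site you want to probe.

```python
# Time two latency contributors: DNS resolution and TCP connect.
# www.example.com is a placeholder host, used purely for illustration.
import socket
import time

HOST, PORT = "www.example.com", 80

# DNS lookup: resolve the hostname to an IPv4 address.
t0 = time.perf_counter()
info = socket.getaddrinfo(HOST, PORT, socket.AF_INET, socket.SOCK_STREAM)
dns_ms = (time.perf_counter() - t0) * 1000
ip_port = info[0][4]  # (ip_address, port)

# TCP connect: one network round trip to the resolved address.
t0 = time.perf_counter()
with socket.create_connection(ip_port, timeout=5):
    connect_ms = (time.perf_counter() - t0) * 1000

print(f"DNS lookup: {dns_ms:.1f} ms, TCP connect: {connect_ms:.1f} ms")
```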

Throughput, on the other hand, is the maximum number of simultaneous queries or transactions the web application is capable of handling, usually measured in transactions per second (tps) or queries per second (qps).

A good way to understand response time and throughput is the oft-used example of a retail store handling customers. Assume there are 5 counter clerks, each taking 1 minute to check out a customer. As the number of customers in the store increases, the throughput rises from 1 customer/minute to a maximum of 5 customers/minute. Since each cashier completes a checkout in 1 minute, the response time for a customer is 1 minute. Now assume a 6th customer arrives to check out while the 5 clerks are busy serving other customers: he or she will have to wait, say 1 minute, before being served. The response time for that customer is therefore 1 minute (waiting) + 1 minute (service) = 2 minutes; it has increased from 1 minute to 2 minutes. As more customers queue up to check out, the wait in the queue grows, and the response time grows with it. Clearly the throughput cannot exceed 5 customers/minute, while the response time increases non-linearly once customers enter the store faster than the clerks can check them out.
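The store example can be captured in a few lines of code. The Python sketch below is my own toy illustration: all customers arrive at the same instant, each is assigned to the earliest-free clerk, and the response time of the 6th customer jumps from 1 minute to 2 exactly as described.

```python
# Toy model of the checkout example: 5 clerks, 1 minute per customer,
# all customers arriving at the same instant (t = 0).
import heapq

CLERKS = 5
SERVICE_MIN = 1.0

def response_times(num_customers):
    """Return each customer's response time (queue wait + service)."""
    free_at = [0.0] * CLERKS        # time at which each clerk next frees up
    heapq.heapify(free_at)
    times = []
    for _ in range(num_customers):
        start = heapq.heappop(free_at)   # earliest-available clerk
        finish = start + SERVICE_MIN
        heapq.heappush(free_at, finish)
        times.append(finish)             # arrival at t=0, so response = finish
    return times

print(response_times(5))  # [1.0, 1.0, 1.0, 1.0, 1.0] -- nobody waits
print(response_times(6))  # the 6th customer waits: [..., 2.0]
```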

This is precisely the behavior of web applications. As traffic to a web site increases, throughput rises roughly linearly until it reaches a throughput “plateau”; beyond this point, adding load leaves throughput saturated at that level. Response time, on the other hand, is low at low traffic, but it starts to increase non-linearly with load and continues to climb as the load maxes out system resources like the CPU and memory.
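Extending the same toy model, the sketch below sweeps the offered load past the clerks' capacity of 5 customers/minute and prints the resulting throughput and mean response time: throughput flattens at the plateau while response time grows without bound. (The arrival pattern and rates are made up for illustration.)

```python
# Sweep the offered load past capacity to show the throughput plateau
# and the non-linear rise in response time. A deterministic toy model,
# not a rigorous queueing simulation.
import heapq

def simulate(arrival_rate, servers=5, service=1.0, horizon=1000.0):
    """Customers arrive every 1/arrival_rate minutes; returns
    (throughput per minute, mean response time in minutes)."""
    free_at = [0.0] * servers           # when each clerk next frees up
    heapq.heapify(free_at)
    interarrival = 1.0 / arrival_rate
    finishes, responses = [], []
    t = 0.0
    while t < horizon:
        start = max(t, heapq.heappop(free_at))   # earliest-free clerk
        finish = start + service
        heapq.heappush(free_at, finish)
        finishes.append(finish)
        responses.append(finish - t)             # wait + service
        t += interarrival
    completed = sum(1 for f in finishes if f <= horizon)
    return completed / horizon, sum(responses) / len(responses)

for rate in (2, 4, 5, 6, 8):
    tput, resp = simulate(rate)
    print(f"offered {rate}/min -> throughput {tput:4.1f}/min, "
          f"mean response {resp:8.1f} min")
```

Once the offered load exceeds 5/minute, throughput stays pinned at 5/minute while the mean response time explodes, which is exactly the plateau-and-blow-up shape described above.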

When deploying applications on the cloud, latency and throughput are the key considerations for determining the kind of computing resources needed. Assuming the web application has already been optimized and performance tuned, what remains is to load test it on the cloud using different CPU instance sizes. For example, load test the application on a small CPU instance and plot the response times and throughput as the load increases. Then deploy the same web application on a medium instance and plot the response times and the throughput plateau there as well.
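A real load test would use a dedicated tool such as JMeter or Apache Bench, but the idea can be sketched in a few lines of Python. The sketch below assumes a hypothetical application URL (http://my-app.example.com/) and uses the third-party requests library; for each concurrency level it fires a fixed batch of requests through a thread pool and reports throughput and mean latency, which is exactly the data needed for the plots above.

```python
# Minimal load-test sketch. The URL is hypothetical; point it at the
# application deployed on the instance under test.
import time
import concurrent.futures
import requests  # third-party: pip install requests

URL = "http://my-app.example.com/"   # placeholder application URL
REQUESTS_PER_LEVEL = 200

def timed_get(_):
    """Issue one GET and return its latency in seconds."""
    start = time.perf_counter()
    requests.get(URL, timeout=10)
    return time.perf_counter() - start

for concurrency in (1, 5, 10, 20, 50):
    t0 = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_get, range(REQUESTS_PER_LEVEL)))
    elapsed = time.perf_counter() - t0
    print(f"concurrency {concurrency:3d}: "
          f"throughput {REQUESTS_PER_LEVEL / elapsed:6.1f} req/s, "
          f"mean latency {1000 * sum(latencies) / len(latencies):7.1f} ms")
```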

Now the choice between a small CPU instance and a medium CPU instance can be made as follows. Assume the web application is required to have a response time of ‘t’ seconds. From the load-test plots we determine the corresponding traffic-handling capacity at that response time: say ‘c’ for the small CPU instance and ‘C’ for the medium CPU instance. If the web site has to handle a total traffic of T, then the number of instances needed in each case is

n = (T/c) + 1 for the small CPU instance, and

N = (T/C) + 1 for the medium CPU instance.

Now we compute the relative costs of the small and medium CPU instances and identify which is more economical. For example, if r1 is the cost per hour of a small CPU instance and R1 is the cost per hour of a medium CPU instance, we choose

the small CPU instance if r1*n < R1*N (per hour),

and the medium CPU instance if R1*N < r1*n.
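Putting the sizing and cost rules together, here is a small Python sketch of the arithmetic. The capacities and hourly rates are made-up numbers purely for illustration; plug in the figures from your own load tests and your provider's price list.

```python
# Sizing and cost comparison from the formulas above.
# All numbers are made up, for illustration only.
T = 500.0              # total traffic the site must handle (requests/sec)
c, r1 = 60.0, 0.05     # small instance: capacity at target response time, $/hour
C, R1 = 150.0, 0.11    # medium instance: capacity at target response time, $/hour

n = int(T // c) + 1    # small instances needed:  n = (T/c) + 1
N = int(T // C) + 1    # medium instances needed: N = (T/C) + 1

small_cost = n * r1    # r1 * n per hour
medium_cost = N * R1   # R1 * N per hour
print(f"{n} small instances:  ${small_cost:.2f}/hour")
print(f"{N} medium instances: ${medium_cost:.2f}/hour")
print("Choose the", "small" if small_cost < medium_cost else "medium",
      "CPU instance")
```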

Hence the choice of CPU instance and the configuration of the web application on the cloud depend on appropriate performance tuning and proper load testing on the cloud. Do also read my other posts on latency, namely “The Many faces of latency” and “The Anatomy of Latency”.

Also see latency and throughput in action in the following series of posts:

– Bend it like Bluemix, MongoDB with autoscaling – Part 1

– Bend it like Bluemix, MongoDB with autoscaling – Part 2

– Bend it like Bluemix, MongoDB with autoscaling – Part 3

Find me on Google+