Profiting from a cloud deployment

Cloud computing offers enterprises and organizations a veritable bag of goodies. For one, it provides utility-style computing, the ability to grow and shrink with changing loads, zero upfront costs and so on. The benefits of cloud computing are many, but does it all add up to profit for an enterprise? That is the critical question that needs to be answered.

This post takes a look at what it takes for a cloud deployment to be profitable for an organization.

The critical parameters for any web application are latency and throughput. A well-designed web application, whether it is an e-retail site or an ad-serving application, will try to minimize the latency, or response time, while at the same time maximizing the throughput. For any application, while the latency can be kept within specified limits, the throughput will tend to plateau at a certain level and will not increase with increasing traffic. Utilizing a larger instance can raise this throughput plateau somewhat, but the reality is that throughput flattens as the traffic increases.

A typical cloud application will be made up of several compute instances, database instances, DNS services etc. Cloud usage is billed by the hour. Hence we can represent the cost of a cloud deployment as follows:

Cost (cloud deployment) = m * compute instance + n * database instance + o * network bytes + P

Where P = cost of DNS + Elastic IPs + other costs.

This can be represented by the formula

C = a * D * t

where C is the cost of the cloud deployment, D is the cost per hour of the deployment, 'a' is a constant and 't' is the time in hours.

Let us assume that for the cloud deployment we get a throughput of T.

The revenue of a web application, whether it is an e-commerce site, an e-ticketing site or an ad-serving engine, depends on the throughput: the larger the throughput, the larger the revenue and hence the profit. We can then write the revenue 'R' as

R (revenue) = k * T * t

where k is a constant of proportionality. In other words, the revenue is proportional to the throughput.

Hence, to determine the profitability of a particular cloud deployment, we need to compare the cost of the deployment against the revenue arising from a given throughput. As long as the cost of the deployment is less than the revenue arising from the throughput, the deployment will be profitable. This can be represented pictorially as below.

The graph clearly shows that for a profitable deployment

d/dt (k * T * t) > d/dt (a * D * t), or

k * T > a * D

Hence, as can be seen from the picture, as long as the slope of the cumulative deployment cost is less than the slope of the cumulative revenue, the deployment will be profitable.
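
To make the comparison concrete, here is a minimal sketch in Python; the rates k, T, a and D below are illustrative placeholders rather than measured values.

```python
# Sketch: compare the revenue slope (k * T) against the cost slope (a * D).
# All numbers below are illustrative placeholders, not real tariffs.

def is_profitable(k, T, a, D):
    """Return True if cumulative revenue grows faster than cumulative cost.

    k : revenue earned per unit of throughput per hour
    T : sustained throughput of the deployment
    a : overhead multiplier on the hourly deployment cost
    D : hourly cost of the deployment (compute + database + network + P)
    """
    return k * T > a * D

# Example: 0.002 revenue units per transaction at 500 tps
# against a deployment costing 0.9/hour with an overhead factor of 1.1.
print(is_profitable(k=0.002, T=500, a=1.1, D=0.9))  # True -> profitable
```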


Managing Multi-Region Deployments

If there is one lesson from this year’s major Amazon EC2 outage, it is “don’t deploy all your application instances in a single region”. The outage clearly demonstrated that entire regions are not immune to disasters. It has thus become imperative for designers and architects to deploy applications spanning major regions. Currently there are 4 major regions – US-West, US-East, Europe and APAC.

Both fundamentally and from a strategic point of view, it makes sense to deploy web applications in different regions, e.g. both in US-East and US-West. This builds into the application a certain amount of geographical resiliency. In this way you are protected from major debacles like the Amazon EC2 outage of April 2011 or a possible meteor crashing and burning in one of the data centers.

Deploying instances in different regions is almost like minimizing risk by diversifying your portfolio. The design of the application, besides including other methods of fault tolerance, should also incorporate geographical resilience.

Currently Amazon’s ELB does not support load balancing across regions. The ELB can only distribute traffic among instances in different availability zones of a region. The solution is to go for other DNS services like UltraDNS, DNSMadeEasy or DynDNS.

These DNS services provide GeoIP-based load balancing that can distribute traffic based on the region from which it originated. Besides balancing the load based on origin, GeoIP-based traffic distribution has the added benefit of routing requests to the application instance closest to the point of origin, thus reducing latency.

The GeoIP-based traffic distributor can route traffic to the closest region. An Amazon ELB can then distribute the traffic internally among the instances within that region. For a look at some typical problems in multi-region cloud deployments, do look at my post “Cache-22”.
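
As a rough illustration of the idea (this is not the actual API of UltraDNS, DNSMadeEasy or DynDNS, and the endpoint names are hypothetical), the routing decision a GeoIP-based distributor makes can be sketched as follows.

```python
# Sketch of GeoIP-style routing: pick the regional ELB endpoint closest
# to the caller's region, then let that region's ELB spread the load
# across its availability zones. Endpoint names are hypothetical.

REGIONAL_ENDPOINTS = {
    "us-east": "elb-us-east.example.com",
    "us-west": "elb-us-west.example.com",
    "europe":  "elb-eu.example.com",
    "apac":    "elb-apac.example.com",
}

def route_request(client_region: str) -> str:
    """Return the regional load balancer endpoint for the client's region.

    Falls back to us-east when the region cannot be resolved,
    mimicking a default record in the DNS service.
    """
    return REGIONAL_ENDPOINTS.get(client_region, REGIONAL_ENDPOINTS["us-east"])

print(route_request("europe"))   # elb-eu.example.com
print(route_request("unknown"))  # elb-us-east.example.com (default)
```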

Figure: Deploying across regions


Latency, throughput implications for the Cloud

The key considerations for any website are latency and throughput. These two parameters are extremely important to web designers as the response time of the web site and the ability to handle large amounts of traffic are directly related to the user experience and the loyalty of returning users.

What are these two parameters and why are they significant? Before looking at latency we need to understand the response time of the web application. Ideally this can be defined as the time between the receipt of an HTTP request and the emission of the corresponding response. Unfortunately, for any web site hosted on the World Wide Web, the delay the user experiences is much larger than the response time alone. This additional delay is the latency of the web site and is primarily due to propagation and transmission delays on the internet. There are many contributors to this latency, from the DNS lookup to the link bandwidth.

Throughput, on the other hand, represents the maximum number of simultaneous queries or transactions per second that the web application is capable of handling. This is usually measured in transactions per second (tps) or queries per second (qps).

A good way to understand response time and throughput is an oft-used example of a retail store handling customers. Assume that there are 5 counter clerks, each taking 1 minute to check out a customer. As the number of customers in the store increases, the throughput increases from 1 customer/minute to a maximum of 5 customers/minute. Since each cashier can process a customer in 1 minute, the response time for the customer is 1 minute. If a 6th customer enters and needs to check out while the 5 counter clerks are busy processing 5 other customers, he or she will have to wait, say, 1 minute. The response time for this customer is then 1 minute (waiting) + 1 minute (servicing) = 2 minutes. As further customers arrive to check out, the length of the wait in the queue increases, and hence so does the response time. Clearly the throughput cannot increase beyond 5 customers/minute, while the response time increases non-linearly as customers enter the store faster than they can be checked out by the counter clerks.

This is precisely the behavior of web applications. When the traffic to a web site increases, the throughput increases linearly and finally reaches a throughput “plateau”. After this point, as the load is increased further, the throughput remains saturated at this level. The response time, on the other hand, is low at low traffic but starts to increase non-linearly with increasing load, and continues to increase as the system maxes out resources like CPU and memory.
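
The checkout analogy can be captured in a few lines of Python. The sketch below uses the clerk count and service time from the example above; the arrival rates are illustrative. It shows the throughput flattening at 5 customers/minute while the response time keeps growing once arrivals exceed capacity.

```python
# Sketch of the checkout analogy: 5 clerks, 1 minute per customer.
# Throughput saturates at the number of clerks; beyond that the queue
# (and hence the response time) keeps growing. Arrival rates are illustrative.

CLERKS = 5          # parallel servers (counter clerks)
SERVICE_MIN = 1.0   # minutes needed to check out one customer

def throughput(arrival_rate):
    """Customers actually served per minute: capped by the clerks."""
    return min(arrival_rate, CLERKS / SERVICE_MIN)

def response_time(arrival_rate, window_min=10):
    """Rough average response time (minutes) over a fixed observation window.

    Customers arriving faster than they can be served pile up in a queue;
    the average queued customer waits for roughly half the final backlog.
    """
    backlog = max(arrival_rate - throughput(arrival_rate), 0) * window_min
    avg_wait = (backlog / 2) * SERVICE_MIN / CLERKS
    return SERVICE_MIN + avg_wait

for rate in (2, 5, 8, 12):  # customers arriving per minute
    print(rate, throughput(rate), round(response_time(rate), 1))
```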

When deploying applications on the cloud, latency and throughput are key considerations in determining the kind of computing resources needed. Assuming the web application has been optimized and performance tuned, the next step is to load test the application on the cloud using different CPU instance types. For example, assume that the application is load tested on a small CPU instance: we need to plot the response times and the throughput with increasing loads. Similarly, we deploy the web application on a medium instance and plot the response times and the throughput plateau on the medium instance.

The choice between a small CPU instance and a medium CPU instance can then be made as follows. Assuming that the requirement of the web application is a response time of ‘t’ seconds, we determine the corresponding traffic-handling capacity: say ‘c’ for the small CPU instance and ‘C’ for the medium CPU instance. If the web site has to handle a total traffic of T, we then determine the number of instances needed in each case. For the

small CPU instance it will be n = (T/c) + 1

and for

the medium CPU instance it will be N = (T/C) + 1.

Now we compute the relative costs of the small and medium CPU instances and identify which is more economical. For example, if r1 is the cost per hour of the small CPU instance and R1 is the cost per hour of the medium CPU instance, we choose

the small CPU instance if r1 * n < R1 * N (per hour)

while, on the other hand, if R1 * N < r1 * n we choose the medium instance.
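
A minimal sketch of this calculation is given below, using a ceiling in place of the (T/c) + 1 rounding above; the capacities and hourly rates are illustrative figures, not actual AWS prices.

```python
import math

# Sketch: choose between small and medium instances for a target load T.
# Capacities (c, C) and hourly rates (r1, R1) are illustrative, not AWS prices.

def instances_needed(total_traffic, capacity_per_instance):
    """Number of instances needed to serve total_traffic, rounded up."""
    return math.ceil(total_traffic / capacity_per_instance)

def cheaper_option(T, c, r1, C, R1):
    """Return which instance type serves traffic T at lower hourly cost."""
    n = instances_needed(T, c)      # small instances needed
    N = instances_needed(T, C)      # medium instances needed
    small_cost, medium_cost = r1 * n, R1 * N
    return ("small", n, small_cost) if small_cost <= medium_cost else ("medium", N, medium_cost)

# 10,000 req/s target; small handles 800 req/s at 0.10/hr, medium 2,000 req/s at 0.22/hr.
print(cheaper_option(T=10_000, c=800, r1=0.10, C=2_000, R1=0.22))  # ('medium', 5, 1.1)
```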

Hence the determination of which CPU instance to use and how to configure the web application on the cloud depends on appropriate performance tuning and proper load testing on the cloud. Do also read my other posts on latency, namely “The Many Faces of Latency” and “The Anatomy of Latency”.

Also see latency and throughput in action in the following series of posts

– Bend it like Bluemix, MongoDB with autoscaling – Part 1

– Bend it like Bluemix, MongoDB with autoscaling – Part 2

– Bend it like Bluemix, MongoDB with autoscaling – Part 3


The Business of Cloud Computing

Cloud Computing is the spanking new paradigm in the world of computing. The key differentiator of this technology is that the enterprise only pays for the amount of resources used – be it CPUs, memory or databases. While it does away with capital expenditure for organizations by providing a utility model of pricing, it results in recurring operating expenses. The important thing, however, is that the cloud grows and shrinks according to demand, and hence the cost to the organization depends on the traffic it handles. While web-based applications are prime candidates for the cloud, other equally eligible candidates are batch processing jobs, nightly builds and CPU-intensive analytics. For these other types of applications, unlike web applications, a reasonable estimate can be made of the resources needed and an appropriate choice made on the cloud.

This article looks at web applications, where the traffic on the site can be seasonal and can vary during the day. Besides, web sites should be capable of handling bursty traffic with enormous loads at particular intervals.

The important consideration for web sites is to ensure that the application is truly optimized and scales horizontally. While it may appear that scale-out will happen for any reasonably designed application, the issue is that as the number of hits on the web site increases, the response time rises steeply while the number of transactions per second plateaus at some particular load level and does not increase after that. In other words, for a certain CPU instance configuration the peak transactions per second will reach a particular limit and cannot be increased any further. However, the cloud also provides a key component, namely the load balancer along with auto scaling, which creates new instances when this threshold is reached.
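
The scale-out trigger can be sketched as a simple rule: when the measured throughput approaches the fleet’s plateau, add an instance behind the load balancer. The plateau and utilization threshold below are illustrative numbers, not those of any particular auto-scaling service.

```python
# Sketch of an auto-scaling rule: add instances behind the load balancer
# when the fleet is running close to its throughput plateau.
# The plateau and utilization threshold are illustrative numbers.

PLATEAU_TPS = 400            # peak transactions/sec one instance can sustain
SCALE_OUT_UTILIZATION = 0.8  # add capacity once 80% of the plateau is reached

def desired_instance_count(current_tps, current_instances):
    """Return how many instances the fleet should have for current_tps."""
    capacity = current_instances * PLATEAU_TPS
    if current_tps > capacity * SCALE_OUT_UTILIZATION:
        return current_instances + 1    # scale out before saturation
    return current_instances

print(desired_instance_count(current_tps=700, current_instances=2))  # 3
print(desired_instance_count(current_tps=500, current_instances=2))  # 2
```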

What are the business considerations that need to be taken while designing for the cloud?

One needs to be conservative in choosing the instance type. While larger instances provide better performance, they also cost more. Hence the instance type should be large enough and no larger. It would be wasteful to use extremely large instances where the last instance handles only a fraction of the total traffic while costing a lot more.

The analogy is that if 16 units of work have to be performed, it is better to have small CPU instances capable of handling 3 units of work each, requiring a total of 6 instances (6 * 3 = 18 > 16), rather than large CPU instances capable of handling 5 units of work each, requiring a total of 4 instances (5 * 4 = 20 > 16). The second option results in more wasted processing power.
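
The arithmetic in the analogy can be checked directly; the numbers are the ones from the paragraph above.

```python
import math

# Check the waste in the 16-units-of-work analogy from the text.
WORK_UNITS = 16

def waste(units_per_instance):
    """Excess capacity left over after provisioning whole instances."""
    instances = math.ceil(WORK_UNITS / units_per_instance)
    return instances, instances * units_per_instance - WORK_UNITS

print(waste(3))  # (6, 2)  -> six small instances, 2 units of spare capacity
print(waste(5))  # (4, 4)  -> four large instances, 4 units of spare capacity
```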

Assume that the upfront cost to the organization of hosting the website in-house is ‘P’, and that this cost amortized over a period of 1 year is ‘p’ per hour. Further, if the instance cost is ‘c’ per hour, ‘n’ is the number of instances needed to support the projected demand, and the revenue to the organization hosting the website is ‘r’ per 1000 hits, then a cloud deployment will make business sense when

(r_h – n * c_h) – p_h > 0, where the subscript h denotes the value per hour

As long as this expression is positive, the organization will profit. However, as the traffic increases and the throughput of the website plateaus, the enterprise will hit a ‘window of diminishing returns’.

However, if the performance of the application is poor and the number of instances needed to support the traffic is disproportionately large, the above expression becomes negative and results in a loss to the organization:

(r_h – n * c_h) – p_h < 0
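
A small sketch of this per-hour check is given below, with placeholder figures for the traffic, the revenue per 1000 hits, the instance cost and the amortized in-house cost.

```python
# Sketch: per-hour profitability check for a cloud deployment.
# All figures are placeholders, not real tariffs or traffic data.

def hourly_margin(hits_per_hour, revenue_per_1000_hits, n_instances,
                  instance_cost_per_hour, amortized_inhouse_cost_per_hour):
    """Revenue per hour minus cloud cost per hour minus amortized in-house cost."""
    revenue = hits_per_hour / 1000 * revenue_per_1000_hits
    cloud_cost = n_instances * instance_cost_per_hour
    return revenue - cloud_cost - amortized_inhouse_cost_per_hour

# 200,000 hits/hour earning 0.05 per 1000 hits, on 8 instances at 0.10/hour,
# against an amortized in-house cost of 2.00/hour.
margin = hourly_margin(200_000, 0.05, 8, 0.10, 2.00)
print(margin > 0, round(margin, 2))  # True 7.2
```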

Hence deployment to the cloud, besides requiring a strong technical background, also needs sound business sense in order to reap the benefits of the cloud.


Cloud Computing – Design Considerations

Cloud Computing is definitely turning out to be the proverbial carrot for enterprises to host their applications on the public cloud. The cloud promises many benefits to its users. Cloud Computing obviates the need for upfront capital expenses on computing infrastructure, real estate and maintenance personnel, and allows for scaling up or scaling down as demand on the application fluctuates.

While the advantages are many, migrating an application onto the cloud is no trivial task. The cloud is essentially composed of commodity servers. The cloud creates multiple instances of the application and runs them on the same or on different servers. The benefit of executing in parallel is that the same task can be completed faster. The cloud thus offers enterprises the ability to quickly scale to handle increasing demands.

But the process of deploying applications on the cloud requires that the application be re-architected to take advantage of this parallelism that the cloud provides, and handling parallelization is no simple task. The key attributes that need to be handled by distributed systems are consistency and availability. If there are variables that need to be shared across the parallel instances, then the application must make special provisions to handle this and ensure consistency. Similarly, the application must be designed to handle failures.

Applications that are intended to be deployed on the cloud must be designed to scale out rather than scale up. Scaling up refers to the process of adding more horsepower by way of faster CPUs, more RAM and faster throughput. Applications that are to be deployed on the cloud, however, need the ability to scale out, or scale horizontally, where more servers are added without any change in the processing horsepower of each server. Designing for horizontal scalability is the key to cloud computing architectures.

One of the key principles to keep in mind while designing for the cloud is to ensure that the application is composed of loosely coupled processes, preferably based on SOA principles. While a multi-threaded architecture, where resources are shared through mutexes, works in monolithic applications, such an architecture is of no help when there are multiple instances of the same application running on different servers. How does one maintain consistency of a shared resource across instances? This is a tough problem to solve. Ideally the application should be thread safe and should be based on a shared-nothing kind of architecture. One technique is to use the queues that the cloud provides as a means of sharing data across instances, though this may impact the performance of the system. Another method is to use ‘memcached’ as a distributed cache, which has been used successfully by Facebook, Twitter, LiveJournal, Zynga and others deployed on the cloud. Still another approach is the Map-Reduce model, where work is partitioned across instances by ‘map’ and the results are consolidated by ‘reduce’.
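
The memcached approach can be sketched as follows; this is a minimal illustration using the python-memcached client, assuming a memcached server is already running at 127.0.0.1:11211, and the counter key is purely illustrative.

```python
# Minimal sketch of sharing state across application instances through
# memcached, using the python-memcached client. Assumes a memcached
# server is reachable at 127.0.0.1:11211; the key name is illustrative.
import memcache

mc = memcache.Client(["127.0.0.1:11211"])

def record_visit():
    """Bump a counter shared by every instance, instead of keeping the
    count in local (instance-private) memory."""
    mc.add("visits", 0)        # creates the key only if it is absent
    return mc.incr("visits")   # increment visible to all instances

def current_visits():
    """Any instance can read the shared value."""
    return mc.get("visits")

print(record_visit(), current_visits())
```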

Another key consideration is the need to support availability requirements. Since the cloud is made up of commodity hardware, there is every possibility of servers failing. The application must be designed with inbuilt resilience to handle such failures. This could be by designing an active-standby architecture or by providing for checkpointing so that the application can restart from some known previous point.

Hence, while cloud computing is the way to go in the future, there is a need to carefully design the application so that full advantage of the cloud can be taken.