Cache-22

If you want performance you need to partition data. If you partition data you will not get performance! That sounds clever, but is it true? Well, it can be true if the architecture of your application is naïve.

The problem I am describing here arises when data has to be partitioned across multiple geographical regions. Partitioning essentially spreads data among several servers, resulting in fast accesses. But when the data is spread across large geographical distances, the result is significant network latency, and that is something that cannot be avoided.

Memcached is a common technique for keeping frequently read data in in-memory caches, preventing repeated trips to the database. Memcached reads data through “gets” and updates data through “sets”. Data is accessed by a key, which is hashed to one of several participating servers; memcached thus distributes the data among the servers in its server list. Reads and writes are O(1) and extremely fast. This works fine as long as the servers belong to a single region or the data center is in the same region: network latencies will be low and the latency of the application will not be severely affected.
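As a rough illustration of how a key is hashed to one server in the list, here is a minimal Python sketch. The server names and the naive modulo hashing are simplifications for illustration (real memcached clients typically use consistent hashing), not the actual client implementation.

```python
import hashlib

# Hypothetical cache servers in a single data center
SERVERS = ["cache1:11211", "cache2:11211", "cache3:11211"]

# Toy in-process stand-ins for the per-server caches
caches = {server: {} for server in SERVERS}

def server_for(key: str) -> str:
    """Hash the key and pick one server from the list (naive modulo scheme)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

def cache_set(key: str, value) -> None:
    caches[server_for(key)][key] = value        # a "set" lands on exactly one server

def cache_get(key: str):
    return caches[server_for(key)].get(key)     # a "get" hashes to that same server

cache_set("user:42", {"name": "Alice"})
print(server_for("user:42"), cache_get("user:42"))
```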

Now consider a situation where the memcached servers have to be distributed across multi-region data centers. While this is an excellent scheme for Disaster Recovery (DR), it introduces its own set of problems.

Memcached will hash the entire data set and distribute it over the entire server list. “Gets” that cross from one geographical region to another will therefore incur significant latency. Since the laws of physics mandate that nothing can exceed the speed of light, we are stuck with appreciable latencies for inter-region reads and writes. So while a multi-region deployment provides geographical resiliency, it introduces latency and degraded throughput.

So what is the solution? One possible solution is to replicate the data in all the regions. One technique that I can think of is to have the application implement “local reads & global writes”. This technique provides the AP part of the CAP theorem, which states that it is impossible for a distributed application to completely provide Consistency, Availability and Partition tolerance at the same time. The “local reads & global writes” method assures availability and partition tolerance while providing eventual consistency.

In this technique, updates are written to the local servers and, at the same time, asynchronously to all the other data centers; the writes are hence global in nature. An update does not wait for all the remote writes to complete before moving on. Reads, however, are always local, so latency stays low because data is served from the nearest region.
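A minimal sketch of the “local reads & global writes” idea, with plain dicts standing in for per-region memcached pools and a background thread standing in for the asynchronous remote write; it illustrates the pattern rather than a production client.

```python
import threading

LOCAL_REGION = "us-east"
# Hypothetical per-region cache pools; plain dicts stand in for memcached pools
REGIONS = {"us-east": {}, "us-west": {}, "eu-west": {}}

def _replicate(pool: dict, key: str, value) -> None:
    pool[key] = value            # in reality: a network "set" to a remote region's pool

def global_write(key: str, value) -> None:
    """Update the local region synchronously; replicate to the rest asynchronously."""
    REGIONS[LOCAL_REGION][key] = value
    for region, pool in REGIONS.items():
        if region != LOCAL_REGION:
            threading.Thread(target=_replicate, args=(pool, key, value)).start()

def local_read(key: str):
    """Reads are always served from the nearest (local) region's pool."""
    return REGIONS[LOCAL_REGION].get(key)

global_write("session:9", "active")
print(local_read("session:9"))   # low-latency local read; remote copies converge eventually
```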

Since writes are asynchronous, the data will be “eventually consistent” rather than “strongly consistent”, but this is a tradeoff that many applications can accept. Ideally the “local reads & global writes” technique should be combined with a quorum protocol to ensure that you read your own writes.

The application could use a quorum protocol such that R + W > N, where R is the number of copies consulted on a read, W is the number of copies that must acknowledge a write, and N is the total number of copies kept across the memcached server list.
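A quick worked example of the R + W > N condition, with purely illustrative numbers:

```python
N = 3   # copies of each key kept across the server list
W = 2   # copies that must acknowledge a write
R = 2   # copies consulted on a read

# R + W > N guarantees every read set overlaps every write set,
# so at least one copy returned on a read carries the latest write.
assert R + W > N
```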

Similar techniques are used in Cassandra, CouchDB and other distributed databases.

With the “local reads & global writes” technique it is possible to keep latencies within reasonable limits, since reads are served based on proximity. Replicating the data to all regions also ensures that eventually all regions have a consistent view of the data.


Eliminating the Performance Drag

Nothing is more important to designers and architects of web applications than performance. Everybody dreams of applications that can roar and spit fire! The attributes most sought after in large-scale web applications are low latency and high throughput. Performance is money, so it is really important to work the design hard and arrive at the best possible one.

Squeezing performance from your application is truly an art. The challenges and excitement are somewhat similar to what a race car designer or an aeronautical engineer experiences in reducing the drag on the machine while increasing the thrust of the engine. Understanding the physics of program execution is a rare skill.

This post will attempt to touch upon some key aspects that have to be looked at a little more closely to wring the maximum performance from your application.

The Store Clerk Pattern: This is an oft-quoted analogy to describe the relation between latency and throughput. In this example the application can be likened to a retail store with store clerks checking out customers who queue up with their purchases. Assume that there are 5 store clerks and each takes 1 sec to process a customer. If there are 5 customers, the response time for each customer will be 1 sec and a total throughput of 5 customers/sec can be achieved. However, if a 6th customer enters, he/she will have to wait 1 sec in the queue while the first 5 customers are being handled by the store clerks. For this customer the response time will be 1 sec (waiting) + 1 sec (processing), or a total of 2 secs. As more customers queue up, the wait times and hence the latency keep increasing, while the throughput plateaus at 5 customers/sec and cannot go above that.
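The same arithmetic as a tiny sketch; the clerk count and service time are the figures from the example above, and the formula is a simplification that assumes all clerks start serving at the same instant.

```python
import math

clerks = 5            # parallel "clerks" (think: worker threads)
service_time = 1.0    # seconds to process one customer

def response_time(position: int) -> float:
    """Response time for the customer arriving at this position in the queue,
    assuming all clerks start a fresh service cycle at the same moment."""
    wait = math.ceil(max(0, position - clerks) / clerks) * service_time
    return wait + service_time

print(response_time(5))          # 1.0 s: served immediately
print(response_time(6))          # 2.0 s: waits one full service cycle
print(clerks / service_time)     # throughput plateaus at 5 customers/sec
```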

The first point of attack in improving performance is to identify the store clerk pattern in your application. Identify where your application has a queue of incoming requests and a thread pool that processes them. The latency and throughput are governed by the number of parallel threads (the store clerks) and the wait times in the queue. One naïve technique is to increase the number of threads in the pool or the number of pools, but this may be limited by CPU and system constraints. What is extremely important is to identify what contributes to the processing of each request. While processing, do threads need to access and retrieve data? Do they have to make API or SQL calls? Identify the worst-case performance of a thread and determine whether it can be improved by a different algorithm. So the key in the store clerk pattern is a) to optimize the threads in the pool and b) to improve the worst-case performance of processing a request.
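As a rough sketch of the “store clerk” shape in code, here is a bounded worker pool using Python’s standard library; the pool size and the simulated work are placeholders to experiment with, not tuned values.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def handle_request(request_id: int) -> str:
    """Stand-in for per-request work: parsing, SQL/API calls and so on.
    This is the worst-case path that is worth profiling and improving."""
    time.sleep(0.01)                     # simulated processing time
    return f"request {request_id} done"

# The pool is the set of "store clerks"; the executor's internal queue is the checkout line.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(handle_request, range(20)))

print(len(results))                      # 20 requests processed by 5 workers
```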

Resource Contention: This is another area of the application that needs to be looked at very closely. It is quite likely that data is being shared by many threads, and access to shared data involves locks and waits. Identify and determine the worst-case wait for threads. Is your application read-heavy and write-light, or write-heavy and read-light? In the former case it may be worthwhile to use a reader-writer locking scheme in which any number of readers can read simultaneously, while a write, which happens occasionally, locks the resource and makes all reader threads wait. If the application is write-heavy then other alternatives, like message-based locking, could be used. Clearly, thread waits can be a drain on performance.
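For the read-heavy case, a reader-preference read/write lock along these lines is one option. This is a bare-bones sketch for illustration only; under sustained read traffic a writer can starve, so production code would normally use a vetted implementation.

```python
import threading

class ReadWriteLock:
    """Reader-preference lock: many concurrent readers, exclusive writers."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0

    def acquire_read(self):
        with self._cond:
            self._readers += 1           # readers never wait for one another

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake any waiting writer

    def acquire_write(self):
        self._cond.acquire()             # held for the whole write, so new readers block here
        while self._readers > 0:         # wait for readers already in flight to drain
            self._cond.wait()            # (new readers may still slip in here: reader preference)

    def release_write(self):
        self._cond.notify_all()
        self._cond.release()

shared_data = {}
lock = ReadWriteLock()

def reader():
    lock.acquire_read()
    try:
        _ = shared_data.get("config")    # many readers can be here at once
    finally:
        lock.release_read()

def writer():
    lock.acquire_write()
    try:
        shared_data["config"] = "updated"   # exclusive access
    finally:
        lock.release_write()
```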

Algorithmic changes: If there are modules that perform an enormous number of insertions, updates or deletions on data in memory then this has to be looked at closely. Determine the type of data structures or STL containers being used. The solution is to re-organize the data so that the most common operation happens much more efficiently, ideally approaching O(1). Maybe the data needs to be organized as a hash map of lists, or a hash map pointing to n-ary trees instead of a list of lists. This requires deep thought and careful analysis to identify the approach that gives the lowest possible times for the most common operation.
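A tiny before-and-after illustration of this kind of reorganization, replacing a linear scan over a list of records with a hash map keyed on the lookup field (the record layout here is made up):

```python
# Before: O(n) scan for every lookup
orders = [("ord-1", 250), ("ord-2", 99), ("ord-3", 410)]   # hypothetical records

def find_slow(order_id):
    for oid, amount in orders:
        if oid == order_id:
            return amount
    return None

# After: reorganize once into a hash map; each lookup is then O(1) on average
orders_by_id = {oid: amount for oid, amount in orders}

def find_fast(order_id):
    return orders_by_id.get(order_id)

assert find_slow("ord-2") == find_fast("ord-2") == 99
```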

From Relational to NoSQL: Though the decision to move from an RDBMS to a NoSQL database like Cassandra or CouchDB would normally be driven by scalability, the ability to partition data horizontally and hash the key makes accesses, updates and deletions really fast, and this is an avenue worth looking into.

Caching: This is a widely used technique to reduce frequent SQL queries to the database. Data that is commonly used can be cached in memory; one such technique is to use memcached. Memcached caches data across several servers. Access to data is through hashing and is O(1). If there is a miss in the memcached servers, the data is fetched with a SQL query. Access is through simple get/set methods in which the key is hashed to identify the server on which the data is stored.
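The usual shape of this cache-aside flow, sketched with a plain dict standing in for the memcached pool and a placeholder query_database function in place of the real SQL call:

```python
cache = {}   # stand-in for a memcached pool

def query_database(key):
    """Placeholder for the real SQL query that runs only on a cache miss."""
    return f"row-for-{key}"

def get_with_cache(key):
    value = cache.get(key)           # O(1) hashed lookup, the "get"
    if value is None:                # miss: fall back to the database
        value = query_database(key)
        cache[key] = value           # "set" so the next read is served from memory
    return value

print(get_with_cache("product:17"))   # miss -> database
print(get_with_cache("product:17"))   # hit  -> cache
```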

Profiling : The judicious use of profiling tools  is extremely important in  optimizing performance. Tools like valgrind truly help in identifying bottlenecks. Other tools also help in monitoring thread pools and identifying where resource contention is taking place. It may also be worthwhile to timestamp different modules and collect data over several thousand runs, average them and pin-point trouble spots.

These are some techniques that can be used for optimizing performance. However, improving performance beyond a point will really depend on being able to visualize the application in execution and divine the problem hot spots.


To Hadoop, or not to Hadoop

Published in Telecom Asia, Jul 23, 2012 – To Hadoop or not to Hadoop

To Hadoop, or not to Hadoop: that is the question. In many of my discussions I find that Hadoop, with its implementation of Map-Reduce, crops up time and again. To many, Map-Reduce is the panacea for all kinds of performance evils. It appears that somehow using Map-Reduce in your application will magically transform it into a high-performing, screaming application.

The fact is that the Map-Reduce algorithm is applicable only to a certain class of problems. Ideally it is suited to what is commonly referred to as the “embarrassingly parallel” class of problems. These are problems that are inherently parallel, for example the creation of inverted indices from crawled web documents.

Map-Reduce is an algorithm that was popularized by Google. The term map-reduce actually originates from Lisp, in which the “map” function takes a list of arguments and performs the same operation on all of them, and “reduce” then applies a common criterion to pick a reduced set of values from this list. Google uses Map-Reduce to create an inverted index. An inverted index basically provides a mapping of a word to the list of documents in which it occurs. This typically happens in two stages. A set of parallel “map” tasks take documents as input, parse them and emit a sequence of (word, document id) pairs; in other words, the map takes as input a key-value pair (k1, v1) and maps it into an intermediate (k2, v2) pair. The reduce tasks take the (word, document id) pairs, reduce them, and emit a (word, list {document id}). Clearly applications like the inverted index make sense for the Map-Reduce algorithm, since several map tasks can work in parallel on separate documents. Other typical applications are counting the occurrences of words in documents or the number of times a web URL has been hit in a traffic log.
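Here is a toy, single-process rendering of those two stages; the documents are made up, and a real Hadoop job would of course distribute the map and reduce tasks across many machines:

```python
from collections import defaultdict

documents = {                      # hypothetical crawled documents
    "doc1": "the quick brown fox",
    "doc2": "the lazy dog",
}

# Map: (k1, v1) = (doc_id, text) -> intermediate (k2, v2) = (word, doc_id) pairs
def map_phase(doc_id, text):
    for word in text.split():
        yield (word, doc_id)

# Shuffle + Reduce: group by word to emit (word, list of documents containing it)
inverted_index = defaultdict(list)
for doc_id, text in documents.items():
    for word, d in map_phase(doc_id, text):
        inverted_index[word].append(d)

print(dict(inverted_index))        # e.g. {'the': ['doc1', 'doc2'], ...}
```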

The key point in all these classes of problems is that they can be handled in parallel. Tasks that can execute independently, besides being inherently parallel, are eminently suitable for Hadoop processing. These tasks also work on extraordinarily large data sets, which is another criterion for Hadoop-worthy applications.

Hadoop uses a large number of commodity servers to execute the algorithm. A complementary technology along with Hadoop is the Hadoop Distributed File System (HDFS). The HDFS is a storage system in which the input data is partitioned across several servers. Google uses the Google File System (GFS) for its inverted index and page ranking algorithm.

Typical applications that are prime candidates for Hadoop are those that have to operate on terabytes of data. The additional requirement is that the application can run some sort of transformation or “map” algorithm on the data independently and produce an intermediate result for the “reduce” part of the algorithm, which then applies some criterion to the intermediate sets to produce a much smaller, aggregated output.

However, several real-world applications do not fall into this category where we can parallelize the execution. For example, an e-retail application which allows users to search for books and electronic products, add them to a shopping cart and finally make the purchase is, in my opinion, not really suited for Hadoop, as each individual transaction is separate and typically has its own unique sequential flow. An ad-serving application is also not ideally suited for Hadoop, since each individual transaction has its own individual flow in time.

However, on closer look we can see that certain aspects of such applications are conducive to the Hadoop-based Map-Reduce algorithm. For example, the e-retail application will have tens of thousands of electronic products and books from different vendors, each with their own product id. We could use Hadoop to pre-process these large amounts of data, classify them and create smaller data sets which the e-retail or other application can use. Hadoop is a clear winner when large data sets have to be searched, sorted or have subsets selected from them.

In these kinds of applications Hadoop has a clear edge over other approaches as it can really crunch data. Hadoop is also resilient to failures and is based on the principle of “data locality”, which allows a “map” or “reduce” task to use data stored locally on its server or on a neighboring machine.

Hence, while Hadoop is no silver bullet for all types of applications, with due diligence we can identify the aspects of an application that can be crunched by Hadoop.


The Future is C-cubed: Computing, Communication and the Cloud

We are on the verge of the next great stage of technological evolution. The trickle of different trends clearly points to what I would like to term C-cubed (C3), representing the merger of computing technologies, communication advances and the cloud.

There are no surprises in this assessment. Clearly it does not fall into the category of chaos theory’s “butterfly effect”, where a seemingly unrelated cause has a far-reaching effect: the proverbial fluttering of a butterfly in Puerto Rico being enough to cause an earthquake in China.

The C-cubed future that seems very probable is based on advances in mobile broadband, advances in communication and the emergence of cloud computing. A couple of years back Scott McNealy of Sun Microsystems believed that “the network is the computer”. Now, with the introduction of Google’s Chromebook, this trend will soon catch on. In fact I can easily visualize a ubiquitous device which I would like to call the “cloudbook”.

The cloudbook would be a device that resembles a tablet like the iPad or the PlayBook but carries little or no hard disk. Local storage would be through USB devices or SD cards, which these days come with large capacities in the range of 80 GB and above. The cloudbook would have no operating system; it would simply have a bootstrap program which would allow the user to choose from several different operating systems (OS), namely Windows, Linux, Solaris, Mac OS etc., which would execute on the cloud. All applications would be executed directly on the cloud, and the user would also store all his programs and data on the cloud. Some amount of offline storage would be possible in portable storage devices like memory sticks, SD cards etc.

The cloudbook will be a ubiquitous device. It will access the internet through mobile broadband, whether over a GPRS, WCDMA or LTE connection. With the blazing speeds of 56 Mbps promised by LTE, accessing the public cloud for executing programs and storing data is extremely feasible, and access should be almost instantaneous. Using mobile broadband for access and the cloud for computing and storage will be the trend of the future.

Besides its use for computing, the cloudbook will also be used for making voice and video calls. This is the promise of IP Multimedia Subsystem (IMS) technology. IMS is a technology that has been waiting in the wings for quite some time. It envisages an all-IP core network that will be used for transporting voice, data and video. As the speeds of the IP pipes become faster and the algorithms to iron out QoS issues are worked out, the complete magnificence of the IMS vision will become a reality and high-speed video applications will become commonplace.

The cloudbook will use the WCDMA (3G) network to make voice and video calls. The 3G RNC or the 4G eNodeBs will enable the transmission and reception of voice, data or video to and from the core network. LTE networks will either use Circuit Switched Fallback (CSFB) or VoLTE (Voice over LTE) to carry voice and video over the 3G network or over the Evolved Packet Core (EPC). In the future, high-speed video-based calls and applications will be extremely prevalent, and a device like the cloudbook will increase the user experience manifold.

Besides this, IMS also envisions Application Servers (AS) spread across the network providing other services like video-on-demand and real-time multiplayer gaming. It is clear that these AS may actually be instances running on the public cloud.

Hence the future clearly points to a marriage of computing, communication and the cloud, where each has a symbiotic relationship with the others. The network can be visualized as one large ambient network of IMS Call Session Control Function servers (CSCFs), virtualized servers on the cloud and Application Servers (AS).

Mobile broadband will become commonplace and all computing and communication will be through 3G or 4G networks.

The future is almost here and the future is C-cubed (C3)!!!

Published in Telecom Asia, Jul 8 2011 – The Future is C-cubed


Managing Multi-Region Deployments

If there is one lesson from this year’s major Amazon EC2 outage, it is “don’t deploy all your application instances in a single region”. The outage has clearly demonstrated that entire regions are not immune to disasters. It has thus become imperative for designers and architects to deploy applications spanning major regions. Currently there are 4 major regions: US-West, US-East, Europe and APAC.

Both fundamentally and from a strategic point of view, it makes sense to deploy web applications in different regions, for example both in US-East and US-West. This builds into the application a certain amount of geographical resiliency. In this way you are protected from major debacles like the Amazon EC2 outage in April 2011, or a possible meteor crashing and burning in one of the data centers.

Deploying instances in different regions is almost like minimizing risk by diversifying your portfolio. The design of the application, besides including other methods of fault tolerance, should also incorporate geographical resilience.

Currently Amazon’s ELB does not support load balancing across regions. The ELB can only distribute traffic among instances in different availability zones of a region. The solution is to go for other DNS services like UltraDNS, DNSMadeEasy or DynDNS.

These DNS services provide GeoIP-based load balancing that can distribute traffic based on the region from which it originates. Besides balancing the load by origin, GeoIP-based traffic distribution has the added benefit of directing the user to the application deployment closest to the point of origination, thus reducing latencies.

The GeoIP-based traffic distributor can distribute traffic to the closest region. An Amazon ELB can then internally distribute the traffic among the instances within that region. For a look at some typical problems in multi-region cloud deployments do look at my post “Cache-22”.

[Figure: Deploying across regions]


Designing a Scalable Architecture for the Cloud

The promise of the cloud is unlimited computing power and storage capacity coupled with a pay-per-use policy. This makes the cloud particularly irresistible for hosting web applications and applications whose demand varies periodically. In order to take full advantage of the cloud the application must be designed for optimum performance. Though the cloud provides resources on demand, a badly designed application can hog resources and prove to be extremely expensive in the long run.

One of the first requirements for deploying applications on the cloud is that they should be scalable. Scalability denotes the ability to handle increasing traffic simply by adding more computing resources of the same kind, rather than adding resources with greater horsepower. This is also referred to as scaling horizontally.

Assuming that the application has been sufficiently profiled and tuned for high performance, there are certain key considerations that need to be taken into account while deploying on the cloud – public or private. Some of them are the ability to scale on demand, high availability, resiliency and sufficient safeguards against failures.

Given these requirements, a scalable design for the cloud can be viewed as being made up of the following 5 tiers:

The DNS tier – In this tier the user’s domain is hosted on a DNS service like UltraDNS or Route 53. These DNS services distribute DNS lookups geographically. This results in connecting to a DNS server that is geographically closer to the user, thus speeding up DNS lookup times. Moreover, since the DNS lookups are distributed geographically, this also builds geographic resiliency as far as DNS lookups are concerned.

Load Balancer-Auto Scaling Tier – This tier is responsible for balancing the incoming traffic among compute instances in the cloud. The load balancing may be based on a simple round-robin technique or on the actual CPU utilization of the individual instances. Typically at this layer we should also have an auto-scaling policy which adds more instances when the traffic to the application rises above a threshold and terminates instances when it falls below a specific threshold.

Compute-Instance Tier – This layer hosts the actual application in individual compute instances on the cloud. It is assumed that the application has been tuned for maximum performance. The choice of small, medium or large CPU should be based on the traffic handling capacity of the instance type versus the cost/hr of the instance.

Cache Tier – This is an important layer in a cloud application where there are multiple instances. The cache tier provides a distributed cache for all the instances. With a distributed caching system like memcached it is possible to share global data between instances. The memcached application uses a consistent-hashing technique to distribute data among a set of participating servers. Consistent hashing allows for handling server crashes and new servers joining the cache layer; a short sketch of this scheme appears right after the tier list.

Database Tier – The database tier is one of the most critical layers of the application. At a minimum the database should be configured in an active-standby mode. Ideally the active and standby should be in different availability zones, to better handle disasters in a particular zone. Another consideration is to have separate read replicas that handle reads to the database while the primary database handles the write operations.
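As mentioned for the cache tier above, here is a bare-bones sketch of consistent hashing; real memcached clients typically add many virtual nodes per server to even out the key distribution, which is omitted here for brevity.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Minimal hash ring: a key maps to the first server clockwise from its hash."""

    def __init__(self, servers):
        self._ring = sorted((_hash(s), s) for s in servers)

    def server_for(self, key: str) -> str:
        h = _hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["cache-a:11211", "cache-b:11211", "cache-c:11211"])
print(ring.server_for("session:1234"))
# When a server crashes or joins, only the keys in its arc of the ring move,
# unlike a naive modulo scheme where most keys would be remapped.
```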

Besides the above considerations, it is always good to host the web application in different availability zones, thus safeguarding against disasters in any particular zone.

Working with Amazon’s EBS, ELB and Route 53

Here are some key learnings to get going with Amazon’s Elastic Block Storage (EBS), Elastic Load Balancer (ELB) and Route 53, which is Amazon’s DNS service.

Amazon’s EBS: Amazon’s Elastic Block Storage provides persistent storage for your applications. It is extremely useful when migrating from a small/medium instance to a large/extra-large instance. The EBS is akin to a hard disk. The steps needed to migrate are:

– Create an EBS volume from your snapshot of your small/medium instance

– Launch a large instance

– Attach your EBS volume to your large instance (for e.g. /dev/sda2)

– Open a ssh window to your large instance

– Create a test directory (/home/ec2-user/test)

– Mount your volume (mount /dev/sda2 /home/ec2-user/test)

– Copy all your files and directories to their appropriate location

– Unmount the mounted volume (umount /dev/sda2)

– Now you have all the files from your medium instance

– Detach the volume

Amazon’s ELB: The key thing about Amazon’s ELB is that the load balancer created (my-load-balancer-nnnn-abc.amazon.com) actually maps to a set of IP addresses internally. Amazon suggests CNAMEing a subdomain to point to the ELB for better performance. Another important thing to understand about Amazon’s ELB is that it performs significantly better if user requests come from different IPs rather than from a single machine. So a performance tool that simulates users from multiple IPs will give a better throughput; the alternative is to run the performance tool from multiple machines.

Amazon’s Route 53: Route 53 is Amazon’s DNS service. Route 53 distributes your domain’s records across multiple geographical zones, enabling quicker DNS lookups. To use Route 53 you need to

– create a hosted zone for your domain (for e.g http://www.mydomain.com) in Route 53

– migrate all your A, MX, CNAME resource records from your current registered domain to Route 53.

Since Route 53 is distributed it speeds up name lookups. Currently updates to Route 53 are made through dnscurl.pl, a Perl script; however, there are good GUI tools that make the job very simple.

This should get you started on the EBS, ELB and Route 53. Do also take a look at my post “Managing multi-region deployments“.


The Many Faces of Latency

Nothing is more damaging to a website than poor response times. Latency is probably the most serious issue that web application developers have to contend with. Whether it is a retail application or an e-ticketing application, poor response times play havoc with the user experience. Latency has many faces, each contributing in a small way to the overall response time of the application. This article looks at some of the key culprits that contribute to a website’s latency.

Link Latencies: This is one of the major contributors. The link speed from the user’s computer to the website plays a major role. For applications hosted on the public cloud it makes sense to deploy in multiple availability zones dispersed geographically. This ensures that people across the globe reach the website through a cloud deployment closest to them. Besides, with the recent Amazon EC2 outage it definitely makes sense to deploy across availability zones, promoting geographical resiliency in the application. Dispersing the application geographically helps in connecting the user with the least number of intervening hops, thus reducing response times.

DNS Latencies: This is another area that needs to be focused on. DNS lookups can be fairly expensive, so it makes sense to speed them up by using DNS services that provide additional name servers across geographical regions. Some examples are Amazon’s Route 53 and UltraDNS.

Load Balancer Latencies: Typical cloud deployments have multiple instances behind a load balancer. Depending on what algorithm the load balancer adopts for balancing the incoming traffic, it is definitely going to contribute to the latency. Amazon’s Elastic Load Balancer, for instance, is itself a set of participating IPs.

Application Latencies: Once the load balancer sends the request to the web application, the time spent in the logic processing the request is a key contributor. This latency is within the control of the developer, so it makes sense to bring it down to the absolute minimum.

Web Page Rendering Latencies: A poorly designed web page can also result in large latencies. A web page that needs to download a lot of items before it can be rendered will definitely affect the user’s experience, so it is necessary to design an efficient page that renders quickly. A standard technique is to use a Content Delivery Network (CDN) to deliver content. CDNs typically distribute content across multiple servers dispersed geographically, and the content server selected for delivery is based on user proximity, i.e. the fewest number of hops. Major players in CDNs are Akamai, EdgeCast and Amazon’s CloudFront.

These are the many aspects that contribute to overall latency. The focus should be on trying to optimize all of these areas while deploying a web application, whether in a hosted network or on the public cloud.


Latency, throughput implications for the Cloud

The key considerations for any website are latency and throughput. These two parameters are extremely important to web designers as the response time of the web site and the ability to handle large amounts of traffic are directly related to the user experience and the loyalty of returning users.

What are these two parameters and why are they significant? Before looking at latency we need to understand the response time of the web application. Ideally this could be defined as the time between the receipt of the HTTP request and the emission of the corresponding response. Unfortunately, any web site hosted on the World Wide Web suffers a lot more delay than the response time alone. This additional delay is the latency of the web site and is primarily due to propagation and transmission delays on the internet. There are many contributors to this latency, from the DNS lookup to the link bandwidth.

Throughput on the other hand represents the maximum simultaneous queries or transactions per second that the web application is capable of handling. This is usually measured as transactions-per-second (tps) or queries-per-second (qps).

A good way to understand response time and throughput is an oft-used example of a retail store handling customers. Assuming that there are 5 counter clerks who take 1 minute to check out a customer, we can readily see that as the number of customers in the store increases, the throughput increases from 1 customer/minute to a maximum of 5 customers/minute. Since each cashier processes a customer in 1 minute, the response time for the customer is 1 minute. If a 6th customer enters and needs to check out while the 5 counter clerks are busy processing 5 other clients, he/she will have to wait, say, 1 minute; the response time for this customer will be 1 minute (waiting) + 1 minute (servicing) = 2 minutes. The response time increases from 1 minute to 2 minutes. As further clients arrive to check out, the length of the wait in the queue, and hence the response time, will increase. Clearly the throughput cannot increase beyond 5 customers/minute, while the response time increases non-linearly as clients enter the store faster than they can be checked out by the counter clerks.

This is precisely the behavior of web applications. When the traffic to a web site is increased, the throughput increases linearly and finally reaches a throughput “plateau”; after this point, as the load is increased further, the throughput remains saturated at this level. The response time, on the other hand, is low at low traffic but starts to increase non-linearly with increasing load and continues to increase as the application maxes out system resources like the CPU and memory.

When deploying applications on the cloud, latency and throughput are key considerations in determining the kind of computing resources needed. Assuming the web application has been optimized and tuned for optimum performance, what needs to be done is to load test the application on the cloud using different CPU instance types. For example, load test the application on a small CPU instance and obtain the response time and throughput plots with increasing load; then deploy the web application on a medium instance and plot the response times and throughput plateaus for the medium instance.

Now the choice between a small CPU instance and a medium CPU instance can be made as follows. Assume that the web application is required to have a response time of ‘t’ seconds, and that at this response time the traffic-handling capacity is ‘c’ for the small CPU instance and ‘C’ for the medium CPU instance. If the web site has to handle a total traffic of T, then the number of instances needed is, for the

small CPU instance, n = (T/c) + 1

and for

the medium CPU instance, N = (T/C) + 1.

Now we compute the relative costs of the small and medium CPU instances and identify which is more economical. For example, if r1 is the cost per hour of the small CPU instance and R1 is the cost per hour of the medium CPU instance, we choose

the small CPU instance if r1*n < R1*N (per hour),

or the medium instance if R1*N < r1*n.
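Putting the formula and the cost comparison together as a small helper; the capacities and prices below are made-up numbers purely for illustration:

```python
import math

def instances_needed(total_traffic, capacity_per_instance):
    """n = (T / c) + 1, rounded down to whole instances before adding one."""
    return math.floor(total_traffic / capacity_per_instance) + 1

T = 10_000                      # required requests/sec for the site (assumed)
c, r1 = 800, 0.08               # small instance: capacity and $/hour (assumed)
C, R1 = 2_600, 0.24             # medium instance: capacity and $/hour (assumed)

n, N = instances_needed(T, c), instances_needed(T, C)
small_cost, medium_cost = r1 * n, R1 * N

print(f"small : {n} instances -> ${small_cost:.2f}/hour")
print(f"medium: {N} instances -> ${medium_cost:.2f}/hour")
print("choose", "small" if small_cost < medium_cost else "medium")
```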

Hence the determination of which CPU instance to use, and the configuration of the web application on the cloud, will depend on appropriate performance tuning and proper load testing on the cloud. Do also read my other posts on latency, namely “The Many Faces of Latency” and “The Anatomy of Latency”.

Also see latency and throughput in action in the following series of posts

– Bend it like Bluemix, MongoDB with autoscaling – Part 1

– Bend it like Bluemix, MongoDB with autoscaling – Part 2

– Bend it like Bluemix, MongoDB with autoscaling – Part 3


Cloud Computing – Show me the money!

Published in Telecom Lead – Cloud Computing – Show me the money!

A lot has been said about the merits of cloud computing and how it is going to be the technological choice of most enterprises in the not so distant future. But the key question that is bound to keep cropping up in the higher echelons of the enterprise is whether the cloud makes good business sense. While most know that cloud computing adopts a pay-per-use model, similar to regular utilities like electricity and water, and does away with upfront infrastructure costs for the organization, the nagging question for most senior management is whether cloud computing is a prudent choice in the long term.

This is not an easy question to answer and depends on a multitude of factors. The alternative to cloud computing is to have an in-house infrastructure of servers, hardware and software, software licenses, broadband links, firewalls etc. All these will form the Capital Expenditure (CAPEX) for the organization. In addition to these expenses will be the Operational Expenditures (OPEX) of real estate to house the equipment, power supply systems, cooling systems, maintenance personnel, annual maintenance contracts (AMC) etc which will be recurring expenses for the organization.

Cloud computing does away completely with the procurement of hardware, software, databases, licenses etc., and an enterprise should be able to host its application in a couple of hours, provided it knows ahead of time the resources the application will need.

Hence, while the upfront and running costs of maintaining a data center are high in comparison to the zero upfront costs of deploying on the cloud, the steeper operational costs of the cloud will eventually catch up with those of the in-house infrastructure.

Depending on how well the application is designed, the point at which the cumulative running costs of the cloud break even with the in-house data center can be made to occur a couple of years after the application is deployed. Assuming that the break-even happens in 3 years, the advantage of cloud deployment is that the enterprise does not have to worry about equipment obsolescence, upgrading of software and so on, not to mention the depreciation of the equipment costs.

Moreover, cloud technology is extremely useful to enterprises planning to deploy applications for which it is difficult to forecast the traffic that will hit them. Where the traffic may be intermittent, bursty or seasonal, the cloud makes perfect business sense since it can scale up or scale down depending on the traffic.

Some typical applications which are prime candidates for the cloud are CRM software, office tools, testing tools, online retail stores, webmail etc.

One possible worry for the enterprise will be security concerns when deploying to the public cloud. In such situations the organization can adopt a hybrid strategy where sensitive data is hosted in in-house data centers while the main application is hosted on a public cloud.

Hence, in most situations, cloud deployments do have a definite edge for certain key applications of the enterprise.
