Bend it like Bluemix, MongoDB with auto-scaling – Part 3

In this last post of the series, I test the performance of the Bluemix app with MongoDB against concurrent queries and deletes, with auto-scaling turned on. Before starting this series of tests I moved the Overload policy a couple of notches higher, so that it scales out if memory utilization exceeds 75% for 120 secs and scales in if it drops below 30% for 120 secs (up from the earlier 55% memory utilization threshold), as shown below.
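In effect the policy amounts to the following (a hypothetical Python sketch of the settings, purely for illustration; the actual policy is configured through the Bluemix Auto-Scaling service UI):

# Hypothetical representation of the Overload policy used for these tests;
# the real policy is configured in the Bluemix Auto-Scaling service UI
overload_policy = {
    "metric": "memory",
    "scale_out": {"threshold_pct": 75, "duration_secs": 120, "add_instances": 1},
    "scale_in":  {"threshold_pct": 30, "duration_secs": 120, "remove_instances": 1},
}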

[Screenshot: Auto-scaling Overload policy settings]

The code for the bluemixMongo app can be forked from DevOps at bluemixMongo or cloned from GitHub at bluemix-mongo-autoscale. The multi-mechanize scripts can be downloaded from GitHub at multi-mechanize. Before starting the testing I checked the current number of documents inserted by the concurrent inserts (see Bend it like Bluemix, MongoDB using Auto-scaling – Part 2). The total number, as determined by checking the logs, was 1380.

[Screenshot: Log output showing 1380 documents inserted]

Sure enough, with the scaling policy change, after 2 minutes the number of instances dropped from 3 to 2.

[Screenshot: Instances scaled down from 3 to 2]

1. Querying the bluemixMongo app with Multi-mechanize

The Multi-mechanize Python script used for querying the bluemixMongo app simply invokes the app's userlist URL (resp = br.open("http://bluemixmongo.mybluemix.net/userlist/")).

v_user.py

import mechanize
import time
import random

# multi-mechanize instantiates Transaction and calls run() repeatedly
class Transaction(object):
    def __init__(self):
        self.custom_timers = {}

    def run(self):
        # create a Browser instance
        br = mechanize.Browser()
        # don't bother with robots.txt
        br.set_handle_robots(False)
        # start the timer
        start_timer = time.time()
        # Display 5 random documents
        resp = br.open("http://bluemixmongo.mybluemix.net/userlist/")
        assert("Example Mongo Page" in resp.get_data())
        # stop the timer
        latency = time.time() - start_timer
        self.custom_timers["Userlist"] = latency
        # think-time
        r = random.uniform(1, 2)
        time.sleep(r)
        self.custom_timers['Example_Timer'] = r

The configuration setup for this script creates 2 sets of 10 concurrent threads

config.cfg
[global]
run_time = 300
rampup = 0
results_ts_interval = 10
progress_bar = on
console_logging = off
xml_report = off
[user_group-1]
threads = 10
script = v_user.py
[user_group-2]
threads = 10
script = v_user.py

The corresponding userlist.js for querying the app is shown below. Here the query is constructed as a regular expression combining a random letter and a random number, matched against the FirstName field. The query is also limited to 5 documents.

function(callback)
{
    // Display a random set of 5 records based on a regular expression
    // made with a random letter and a random number
    var randnum = Math.floor((Math.random() * 10) + 1);
    var alpha = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','X','Y','Z'];
    var randletter = alpha[Math.floor(Math.random() * alpha.length)];
    var val = randletter + ".*" + randnum + ".*";
    // Limit the display to 5 documents
    collection.find({"FirstName": new RegExp(val)}).limit(5).toArray(function(err, items) {
        if (err) {
            console.log(err + " Error getting items for display");
        }
        else {
            res.render('userlist', {
                "userlist": items
            }); // end res.render
        } // end else
        db.close(); // Ensure that the open connection is closed
    }); // end toArray function
    callback(null, 'two');
}

2. Running the userlist query

The following screenshot shows the userlist query being executed concurrently with Multi-mechanize. Note that the number of instances drops down to 1.

[Screenshot: Concurrent userlist queries; instances scaled down to 1]

3. Deleting documents with Multi-mechanize

The multi-mechanize script for deleting a document is shown below. This script calls the URL with resp = br.open("http://bluemixmongo.mybluemix.net/remuser"). No values need to be entered into the form and the 'submit' is simulated.

v_user.py

import mechanize
import time

# multi-mechanize instantiates Transaction and calls run() repeatedly
class Transaction(object):
    def __init__(self):
        self.custom_timers = {}

    def run(self):
        # create a Browser instance
        br = mechanize.Browser()
        # don't bother with robots.txt
        br.set_handle_robots(False)
        br.addheaders = [("User-agent", "Mozilla/5.0Compatible")]
        # start the timer
        start_timer = time.time()
        # submit the request
        resp = br.open("http://bluemixmongo.mybluemix.net/remuser")
        #resp = br.open("http://localhost:3000/remuser")
        resp.read()
        # stop the timer
        latency = time.time() - start_timer
        # store the custom timer
        self.custom_timers["Load_Front_Page"] = latency
        # think-time
        time.sleep(2)
        # select first (zero-based) form on page
        br.select_form(nr=0)
        # set form fields - no values are needed for the delete
        br.form["firstname"] = ""
        br.form["lastname"] = ""
        br.form["mobile"] = ""
        # start the timer
        start_timer = time.time()
        # submit the form
        resp = br.submit()
        resp.read()
        print("Removed")
        # stop the timer
        latency = time.time() - start_timer
        # store the custom timer
        self.custom_timers["Delete"] = latency
        # think-time
        time.sleep(2)

The config file is set to start 2 sets of 10 concurrent threads and execute for 300 secs.

config.cfg

[global]
run_time = 300
rampup = 0
results_ts_interval = 10
progress_bar = on
console_logging = off
xml_report = off
[user_group-1]
threads = 10
script = v_user.py
[user_group-2]
threads = 10
script = v_user.py

deleteuser.js

This Node.js snippet does a findOne() and then removes the returned document with 'justOne' set to true.

collection.findOne(function(err, item) {
    // Delete just a single record
    collection.remove(item, {justOne: true}, function(err, doc) {
        if (err) {
            // If it failed, return error
            res.send("There was a problem removing the information from the database.");
        }
        else {
            // If it worked, redirect to userlist
            res.location("userlist");
            // And forward to success page
            res.redirect("userlist");
        }
    });
});
collection.find().toArray(function(err, items) {
    console.log("Length =----------------" + items.length);
    db.close(); // Ensure that the open connection is closed
});
callback(null, 'two');

4. Running the deleteuser multi-mechanize script

The output of the script executing, and the reduction in the number of instances because of the change in the memory utilization policy, is shown below.

[Screenshot: deleteuser script executing; instances scaling down]

5. Multi-mechanize

As mentioned in the previous posts, the multi-mechanize commands are executed as follows.

To create a new project
multimech-newproject.exe userlist

This will create 2 folders a) results b) test_scripts and the file c) config.cfg. The v_user.py needs to be updated as required.

To run the script
multimech-run.exe userlist

The details of the response times for the query are shown below.

[Plot: All_Transactions_response_times_intervals]

More details on latency and throughput for the queries and the deletes are included in the results folder of multi-mechanize.

6. Autoscaling

The details of the auto-scaling service are shown below

a. Scaling Metric Statistics

[Screenshot: Scaling Metric Statistics]

b. Scaling history

[Screenshot: Scaling history]

7. Monitoring and Analytics (M & A)

The output from M & A is shown below

 

a. Performance Monitoring

[Screenshot: Performance monitoring]

b. Log Analysis output

The log analysis gives a detailed report on the calls made to the app, the console log output and other useful information.

[Screenshot: Log analysis output]

This series of 3 posts, Bend it like Bluemix, MongoDB with auto-scaling, demonstrated the ability of the cloud to expand and shrink based on load. An important requirement for Cloud Architects is to design applications that can scale horizontally without impacting performance, while keeping costs optimal. The real challenge in auto-scaling is the need to make the application truly distributed, as opposed to the monolithic architectures we are generally used to. I hope to write another post on creating a simple distributed application later.

Hasta la Vista!

Also see
1.  Bend it like Bluemix, MongoDB with autoscaling – Part 1
2. Bend it like Bluemix, MongoDB with autoscaling – Part 2

You may like :
a) Latency, throughput implications for the cloud
b) The many faces of latency
c) Brewing a potion with Bluemix, PostgreSQL & Node.js in the cloud
d)  A Bluemix recipe with MongoDB and Node.js
e) Spicing up IBM Bluemix with MongoDB and NodeExpress
f) Rock N’ Roll with Bluemix, Cloudant & NodeExpress

Disclaimer: This article represents the author’s viewpoint only and doesn’t necessarily represent IBM’s positions, strategies or opinions

Bend it like Bluemix, MongoDB using Auto-scaling – Part 2!

This post takes off from my previous post Bend it like Bluemix, MongoDB using Auto-scale – Part 1! In this post I generate traffic using Multi-Mechanize, a performance test framework, and check out the auto-scaling on Bluemix, besides doing a rudimentary check on the latency and throughput of this test application. In particular, I generate concurrent threads which insert documents into MongoDB.

Note: As mentioned in my earlier post, this is more of a prototype, which is typical when architecting cloud applications. Clearly I have not optimized my cloud app (bluemixMongo) for maximum efficiency. Also this is a simple 2 tier application with a rudimentary Web interface and a NoSQL DB at the back end. This is more of a Proof of Concept (PoC) for the auto-scaling service on Bluemix.

As earlier mentioned, the bluemixMongo app is a modification of my earlier post Spicing up a IBM Bluemix cloud app with MongoDB and NodeExpress. The bluemixMongo cloud app that was used for this auto-scaling test can be forked from DevOps at bluemixMongo or from GitHub at bluemix-mongo-autoscale. The Multi-mechanize config file, scripts and results can be found at GitHub in multi-mechanize.

The document to be inserted into MongoDB consists of 3 fields – Firstname, Lastname and Mobile. To simulate the insertion of records into MongoDB I created a Multi-Mechanize script that generates a random combination of letters and numbers for the First and Last names and a random 9 digit number for the mobile. The code for this script is shown below.

1. The snippet below measures the latency for loading the 'New User' page

v_user.py

import mechanize
import time
import random
import string

# multi-mechanize instantiates Transaction and calls run() repeatedly
class Transaction(object):
    def __init__(self):
        self.custom_timers = {}

    def run(self):
        # create a Browser instance
        br = mechanize.Browser()
        # don't bother with robots.txt
        br.set_handle_robots(False)
        print("Rendering new user")
        br.addheaders = [("User-agent", "Mozilla/5.0Compatible")]
        # start the timer
        start_timer = time.time()
        # submit the request
        resp = br.open("http://bluemixmongo.mybluemix.net/newuser")
        #resp = br.open("http://localhost:3000/newuser")
        resp.read()
        # stop the timer
        latency = time.time() - start_timer
        # store the custom timer
        self.custom_timers["Load Add User Page"] = latency
        # think-time
        time.sleep(2)
        # run() continues in the snippet below

The script also measures the time taken to submit the form containing the Firstname, Lastname and Mobile

# select first (zero-based) form on page
br.select_form(nr=0)
# Create random Firstname
a = (''.join(random.choice(string.ascii_uppercase) for i in range(5)))
b = (''.join(random.choice(string.digits) for i in range(5)))
firstname = a + b
# Create random Lastname
a = (''.join(random.choice(string.ascii_uppercase) for i in range(5)))
b = (''.join(random.choice(string.digits) for i in range(5)))
lastname = a + b
# Create a random mobile number
mobile = (''.join(random.choice(string.digits) for i in range(9)))
# set form field
br.form["firstname"] = firstname
br.form["lastname"] = lastname
br.form["mobile"] = mobile
# start the timer
start_timer = time.time()
# submit the form
resp = br.submit()
print("Submitted.")
resp.read()
# stop the timer
latency = time.time() - start_timer
# store the custom timer
self.custom_timers["Add User"] = latency

2. The config.cfg file is set up to generate 2 asynchronous thread pools of 10 threads each for about 400 seconds

config.cfg
[global]
run_time = 400
rampup = 0
results_ts_interval = 10
progress_bar = on
console_logging = off
xml_report = off
[user_group-1]
threads = 10
script = v_user.py
[user_group-2]
threads = 10
script = v_user.py

3. The code to add a new user in the app (adduser.js) uses the ‘async’ Node module to enforce sequential processing.

adduser.js

async.series([
    function(callback)
    {
        collection = db.collection('phonebook', function(error, response) {
            if (error) {
                return; // Return immediately on error
            }
            else {
                console.log("Connected to phonebook");
            }
        });
        callback(null, 'one');
    },
    function(callback)
    {
        // Insert the record into the DB
        collection.insert({
            "FirstName": FirstName,
            "LastName": LastName,
            "Mobile": Mobile
        }, function(err, doc) {
            if (err) {
                // If it failed, return error
                res.send("There was a problem adding the information to the database.");
            }
            else {
                // If it worked, redirect to userlist - Display users
                res.location("userlist");
                // And forward to success page
                res.redirect("userlist");
            }
        });
        collection.find().toArray(function(err, items) {
            console.log("**************************>>>>>>>Length =" + items.length);
            db.close(); // Make sure that the open DB connection is closed
        });
        callback(null, 'two');
    }
]);

4. To check out auto-scaling, the instance memory was kept at 128 MB. The scale-up policy was memory based, triggered when the memory of the instance exceeded 55% of 128 MB for 120 secs. Scale-up based on CPU utilization was set to happen when the utilization exceeded 80% for 300 secs.

[Screenshot: Setting the auto-scaling policy]

5. Check the auto-scaling policy

[Screenshot: The configured auto-scaling policy]

6. Initially, as can be seen, there is just a single instance

[Screenshot: A single instance running]

7. At around 48% of the script, with around 623 transactions, the number of instances increases by 1. Note that the available memory decreases from 640 MB to 512 MB (640 MB – 128 MB).

[Screenshot: Second instance added; available memory at 512 MB]

8. At around 1324 transactions another instance is added

Note: Bear in mind

a) The memory threshold was artificially brought down to 55% of 128 MB.
b) The app itself is not optimized for maximum efficiency

[Screenshot: Third instance added]

9. The Metric Statistics tab for the Autoscaling service shows this memory breach and the trigger for autoscaling

[Screenshot: Metric Statistics showing the memory breach]

10. The Scaling history tab for the Auto-scaling service displays the scale-ups and scale-downs, along with the policy rules based on which the scaling happened

[Screenshot: Scaling history]

11. The response times and throughput are captured in the results folder of the Multi-mechanize tool.

The multi-mechanize commands are executed as follows.

To create a new project
multimech-newproject.exe adduser

This will create 2 folders a) results b) test_scripts and the file c) config.cfg. The v_user.py needs to be updated as required.

To run the script
multimech-run.exe adduser

12. The results are shown below

a) Load Add User Page (Latency)

[Plot: Load Add User Page_response_times_intervals]

b) Load Add User Page (Throughput)

[Plot: Load Add User Page_throughput]

c) Add User (Latency)

[Plot: Add User_response_times_intervals]

d) Add User (Throughput)

[Plot: Add User_throughput]

The detailed results can be seen at GitHub at multi-mechanize

13. Check the Monitoring and Analytics Page

a) Availability

[Screenshot: Availability]

b) Performance monitoring

[Screenshot: Performance monitoring]

So once auto-scaling happens, the application can be fine-tuned for performance. Obviously one could do it the other way around too.

As can be seen, adding NoSQL databases like MongoDB, Redis, Cloudant DB etc. to a Bluemix app is quite straightforward. Setting up the auto-scaling policy is also painless, as seen above.

Of course the real challenge in cloud applications is to make them distributed and scalable while keeping the applications themselves lean and mean!


Also see
1.  Bend it like Bluemix, MongoDB with autoscaling – Part 1
3. Bend it like Bluemix, MongoDB with autoscaling – Part 3

You may like :
a) Latency, throughput implications for the cloud
b) The many faces of latency
c) Brewing a potion with Bluemix, PostgreSQL & Node.js in the cloud
d)  A Bluemix recipe with MongoDB and Node.js
e) Spicing up IBM Bluemix with MongoDB and NodeExpress
f) Rock N’ Roll with Bluemix, Cloudant & NodeExpress

g) Design principles of scalable, distributed systems

Disclaimer: This article represents the author’s viewpoint only and doesn’t necessarily represent IBM’s positions, strategies or opinions

Designing a Scalable Architecture for the Cloud

The promise of the cloud is unlimited computing power and storage capacity coupled with a pay-per-use policy. This makes the cloud particularly irresistible for hosting web applications and applications whose demand varies periodically. In order to take full advantage of the cloud, the application must be designed for optimum performance. Though the cloud provides resources on-demand, a badly designed application can hog resources and prove to be extremely expensive in the long run.

One of the first requirements for deploying applications on the cloud is that they should be scalable. Scalability denotes the ability to handle increasing traffic simply by adding more computing resources of the same kind, rather than adding resources with greater horsepower. This is also referred to as scaling horizontally.

Assuming that the application has been sufficiently profiled and tuned for high performance, there are certain key considerations that need to be taken into account while deploying on the cloud – public or private. Some of them are being able to scale on demand, providing for high availability and resiliency, and having sufficient safeguards against failures.

Given these requirements, a scalable design for the Cloud can be viewed as being made up of the following 5 tiers

The DNS tier – In this tier the user domain is hosted on a DNS service like UltraDNS or Route 53. These DNS services distribute the DNS lookups geographically. This results in connecting to a DNS server that is geographically closer to the user, thus speeding up DNS lookup times. Moreover, since the DNS lookups are distributed geographically, it also builds geographic resiliency as far as DNS lookups are concerned.

Load Balancer-Auto Scaling Tier – This tier is responsible for balancing the incoming traffic among the compute instances in the cloud. The load balancing may be based on a simple round-robin technique or on the actual CPU utilization of the individual instances. Typically at this layer we should also have an auto-scaling policy which will add more instances if the traffic to the application increases above a threshold, or terminate instances when the traffic falls below a specific threshold.
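As a toy illustration of the round-robin technique (this is not any particular cloud's load balancer, just a sketch with made-up instance addresses):

from itertools import cycle

# Hypothetical instance addresses; a real load balancer would also track instance health
instances = cycle(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])

def route_request(request):
    # Round-robin: each request goes to the next instance in the rotation
    return next(instances)

for i in range(6):
    print(route_request("GET /"))   # cycles through instances 1, 2, 3, 1, 2, 3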

Compute-Instance Tier – This layer hosts the actual application in individual compute instances on the cloud. It is assumed that the application has been tuned for maximum performance. The choice of small, medium or large CPU should be based on the traffic handling capacity of the instance type versus the cost/hr of the instance.

Cache Tier – This is an important layer in a cloud application where there are multiple instances. The cache tier provides a distributed cache for all the instances. With a distributed caching system like memcached it is possible to share global data between instances. Memcached clients use a consistent-hashing technique to distribute data among a set of participating servers. The consistent hashing method allows for handling of server crashes and new servers joining the cache layer.
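A minimal sketch of consistent hashing of the kind memcached clients use is shown below (my own illustrative implementation, not memcached's actual code; the server names are made up):

import bisect
import hashlib

class ConsistentHashRing:
    # A toy consistent-hash ring; virtual nodes smooth out the key distribution
    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self.ring = []   # sorted list of (hash, server) pairs
        for server in servers:
            self.add(server)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server):
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash("%s#%d" % (server, i)), server))

    def remove(self, server):
        self.ring = [(h, s) for (h, s) in self.ring if s != server]

    def get(self, key):
        # The first ring position clockwise of the key's hash owns the key
        idx = bisect.bisect(self.ring, (self._hash(key), ""))
        if idx == len(self.ring):
            idx = 0
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
print(ring.get("user:1234"))      # the same key always maps to the same server
ring.remove("cache2:11211")       # only keys owned by cache2 get remapped
print(ring.get("user:1234"))

When a server crashes or joins, only the keys falling in its arc of the ring move, which is why consistent hashing handles churn in the cache layer gracefully.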

Database Tier – The database tier is one of the most critical layers of the application. At a minimum the database should be configured in an active-standby mode. Ideally it is always better to have the active and standby in different availability zones to better handle disasters in a particular zone. Another consideration is to have separate read replicas that handle reads while the primary database handles the write operations.

Besides the above considerations, it is always good to host the web application across different availability zones, thus safeguarding against disasters in any particular zone.

The Anatomy of Latency

Latency is a measure of the time delay experienced in a system. In data communications, latency would be measured as the round-trip delay between sending a packet and receiving the response from the destination. In the world of web applications latency is the response time of a web site. In web applications latency is dependent on both the round trip time on the communication link and the processing time of the application. Hence we could say that

latency = round trip time + processing time, where the round trip time is twice the one-way transit time

The round trip time is probably less susceptible to increasing traffic than the time taken to process the increased loads. The processing time of the application is particularly pernicious in that it is highly susceptible to changing traffic. This article tries to analyze why the latency or response times of web applications typically increase with increasing traffic. While the latency increases exponentially as the traffic increases, the throughput increases to a point and then finally starts to drop substantially. The ideal situation for all internet applications is to have the ability to scale horizontally, allowing the application to handle increasing traffic simply by adding more commodity servers while keeping response times within acceptable limits. However in the real world this never happens.

The price of Latency

Latency hurts business. Amazon found that every 100 ms of latency cost them 1% of sales. Similarly Google realized that a 0.5 second increase in search results dropped search traffic by 20%. Latency really matters. Reactions to bad response times in web sites range from minor annoyance to complete frustration and loss of users and business.

The cause of processing latency

One of the fundamental requirements of scalable systems is that they should be loosely coupled. The application needs to have a modular architecture with well defined interfaces to the other modules. Ideally, applications which have been designed with fairly efficient processing times, of the order of O(log n) or O(n log n), will be relatively immune to changing loads, though they will be impacted by changes in the number of data elements. So the algorithms adopted by the applications themselves do not contribute to the increasing response times under increased traffic. What, then, really is the performance bottleneck behind increasing latencies and decreasing throughput at higher loads?

Contention – the culprit

One of the culprits behind the deteriorating response times is thread locking and resource contention. Assuming that the application has been designed with Reader-Writer locks or a message queue based synchronization mechanism, the time spent waiting for resources to become free as traffic increases will result in degraded performance.

Let us assume that the application is read-heavy, write-light and has implemented a Reader-Writer synchronization mechanism. Further let us assume that a write-thread locks a resource for 250 ms. At low loads we could have 4 such threads, each locking the resource for 250 ms, for a total span of 1 s. Hence in 1 s there can be a maximum of 4 threads, each of which has held a write lock for 250 ms. In this interval all reader threads will be forced to wait. When the traffic load is low, the number of reader threads waiting for the lock to be released will be small and will not have much impact, but as the traffic increases, the number of threads waiting for the lock to be released will increase. Since a write lock takes a finite amount of time to complete processing, we cannot go over 4 write threads per second at the given CPU speed.
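The toy simulation below illustrates the effect (my own sketch, using a plain Lock rather than a true Reader-Writer lock, so the readers here also serialize against each other; the numbers are purely illustrative):

import threading
import time

lock = threading.Lock()        # stands in for the writer side of a Reader-Writer lock
wait_times = []
stats_lock = threading.Lock()

def writer():
    # Each writer holds the resource for 250 ms - at most 4 writes per second
    with lock:
        time.sleep(0.25)

def reader():
    start = time.time()
    with lock:                 # readers block while a writer holds the lock
        waited = time.time() - start
    with stats_lock:
        wait_times.append(waited)

def run(n_readers):
    del wait_times[:]
    threads = [threading.Thread(target=writer) for _ in range(4)]
    threads += [threading.Thread(target=reader) for _ in range(n_readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("%4d readers: average wait %.3f s" % (n_readers, sum(wait_times) / len(wait_times)))

# The time readers spend blocked grows as more of them pile up behind the writers
for n in (10, 100, 500):
    run(n)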

However as the traffic further increases, the waiting threads not only increase in number but also consume CPU and memory. This adversely impacts the writer threads, which now have fewer CPU cycles and less memory, and hence take longer to complete. This downward cycle worsens, resulting in increased response times and worsening throughput in the application.

The solution to this problem is not easy. We need to revisit the areas where the application blocks waiting for something. Locking, besides causing threads to wait, also adds the overhead of getting scheduled prior to being able to execute again. We need to minimize the time a thread holds a resource before allowing other threads access to it.


The Business of Cloud Computing

Cloud Computing is the spanking new paradigm in the world of computing. The key differentiator of this technology is that the enterprise only pays for the amount of resources used – be it CPUs, memory or databases. While it does away with capital expenditure for organizations by providing a utility model of pricing, it results in recurring operating expenses. However the important thing is that the cloud grows and shrinks according to demand, and hence the cost to the organization depends on the traffic it generates. While web based applications are prime candidates for the cloud, other equally eligible candidates are batch processing jobs, nightly builds or CPU intensive analytics. Except for web applications, a reasonable estimate can be made of the resources these workloads need, and an appropriate choice made on the cloud.

This article looks at web applications, where the traffic on the site can be seasonal and can vary during periods of the day. Besides, web sites should be capable of handling bursty traffic with enormous loads at particular intervals.

The important consideration for web sites is to ensure that the application is truly optimized and exhibits the property of scaling horizontally. While it appears that scaling out will occur for any reasonably designed application, the issue is that as the number of hits on the web site increases, the response time rises steeply while the number of transactions per second plateaus at some particular load level and does not increase after that. It can be said that for a certain CPU instance configuration the peak transactions per second will reach a particular limit and cannot be increased any further. However the cloud also provides a key component, namely the load balancer, along with auto-scaling, which creates new instances when this threshold is reached.

What are the business considerations that need to be taken while designing for the cloud?

One needs to be conservative in choosing the instance type. While larger instances provide better performance they also cost more. Hence the instance type should be large enough and no larger. It would be wasteful to use extremely large instances where the last instance handles only a fraction of the total traffic while costing a lot more.

The analogy is that if 16 units of task have to be performed, it is better to have small CPU instances capable of handling 3 units of task each, requiring a total of 6 CPUs (6 * 3 = 18 > 16), rather than large CPU instances capable of handling 5 units of task each, requiring a total of 4 large CPUs (5 * 4 = 20 > 16). The second option would result in more wasted processing power.
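A throwaway calculation makes the arithmetic explicit (the per-instance capacities are the hypothetical figures from the analogy above):

import math

def instances_needed(total_units, units_per_instance):
    # Round up - a fraction of an instance still costs a whole instance
    return math.ceil(total_units / units_per_instance)

work = 16
small = instances_needed(work, 3)   # 6 instances -> 18 units of capacity
large = instances_needed(work, 5)   # 4 instances -> 20 units of capacity
print("small: %d instances, %d units spare" % (small, small * 3 - work))   # 2 units spare
print("large: %d instances, %d units spare" % (large, large * 5 - work))   # 4 units spare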

Assume that the upfront cost to the organization for hosting the website in-house is 'P', and that this cost amortized over a period of 1 year is 'p' per hour. Further, if the instance cost is 'c' per hour, 'n' is the number of instances needed to support the projected demand, and the revenue to the organization hosting the website is 'r' per 1000 hits, then, writing r_h, c_h and p_h for the revenue, instance cost and amortized in-house cost in a given hour, a cloud deployment will make business sense when

(r_h – n * c_h) – p_h > 0

As long as this quantity is positive the organization will profit. However as the traffic increases and the throughput of the website plateaus, the enterprise will hit a 'window of diminishing returns'.

However if the performance of the application is poor and the number of instances needed to support the traffic is disproportionately large, then the above expression turns negative and results in a loss to the organization.

(r_h – n * c_h) – p_h < 0
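To make the break-even concrete, here is a small calculation using the article's symbols; the dollar figures are entirely hypothetical:

def hourly_margin(r_h, n, c_h, p_h):
    # (r_h - n * c_h) - p_h : positive means the cloud deployment profits
    return (r_h - n * c_h) - p_h

# Hypothetical numbers: $12/hr revenue, instances at $0.50/hr, $2/hr amortized in-house cost
print(hourly_margin(12.0, 4, 0.50, 2.0))    # 8.0  -> profitable
print(hourly_margin(12.0, 30, 0.50, 2.0))   # -5.0 -> too many instances, a loss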

Hence deployment to the cloud, besides requiring a strong technical background, also needs sound business sense in order to reap the benefits of the cloud.


Designing for Cloud Worthiness

Cloud Computing is changing the rules of computing for the enterprise. Enterprises are no longer constrained by the capital costs of upfront equipment purchases. Rather, they can concentrate on the application, deploy it on the cloud, and pay in a utility style based on usage. Cloud computing essentially presents a virtualized platform on which applications can be deployed.

The Cloud exhibits the property of elasticity by automatically adding more resources to the application as demand grows and shrinking the resources when the demand drops. It is this property of elasticity of the cloud and the ability to pay based on actual usage that makes Cloud Computing so alluring.

However to take full advantage of the Cloud the application must use the available cloud resources judiciously. It is important for applications that are to be deployed on the cloud to have the property of scaling horizontally. What this implies is that the application should be able to handle more transactions per second when more resources are added. For example, if the application has been designed to run in a small CPU instance of 1.7 GHz, 32-bit and 160 GB of instance storage with a throughput of 800 transactions per second, then one should be able to add 4 such instances and scale to handling 4000 transactions per second.

However there is a catch in this. How does one determine the theoretical limit of transactions per second for a single instance? Ideally we should maximize the throughput and minimize the latency for each instance prior to going to the next step of adding more instances on the cloud. One should squeeze the maximum performance from the application in the instance of choice prior to using multiple instances on the cloud. Typical applications perform reasonably well under small loads, but as the traffic increases the response time increases and the throughput starts dipping.

There is a need to run profiling tools and remove bottlenecks in the application. The standard refrain for applications to be deployed on the cloud is that they should be loosely coupled and stateless. However, most applications tend to be multi-threaded, with resource sharing in various modules. The performance impact of locks and semaphores should be given due consideration, since typically a lot of time is wasted with threads in the wait state. A suitable technique should be used for providing concurrency among threads. The application should be analyzed to determine whether it is read-heavy and write-light, or write-heavy and read-light, and suitable synchronization techniques like Reader-Writer locks, message queue based exclusion or monitors should be used.

I have found callgrind for profiling and gathering performance characteristics along with KCachegrind for providing a graphical display of performance times extremely useful.

Another important technique to improve performance is to maintain an in-memory cache of frequently accessed data. Rather than making frequent queries to the database, periodic updates from the database need to be made and stored in the in-memory cache. However, while this technique works fine with a single instance, handling in-memory caches for multiple instances in the cloud represents quite a challenge. When there are multiple instances there is a need for a distributed cache which is shared among them. Memcached is an appropriate technique for maintaining a distributed cache in the cloud.
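A minimal cache-aside sketch along these lines is shown below, using the pymemcache client (one of several Python memcached clients). It assumes a memcached server listening on localhost:11211, and fetch_user_from_db is a hypothetical stand-in for a real database query:

import json
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))   # assumes a local memcached server

def fetch_user_from_db(user_id):
    # Hypothetical stand-in for a real database query
    return {"id": user_id, "name": "user%d" % user_id}

def get_user(user_id):
    # Cache-aside: try memcached first, fall back to the DB and populate the cache
    key = "user:%d" % user_id
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    user = fetch_user_from_db(user_id)
    cache.set(key, json.dumps(user), expire=60)   # a TTL keeps the cache periodically refreshed
    return user

print(get_user(42))   # first call hits the DB; later calls within the TTL hit the cache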

Once the application has been ironed out for maximum performance the application can be deployed on the cloud and stress tested for peak loads.

Some good tools for generating loads on an application are loadUI and multi-mechanize. Personally I prefer multi-mechanize, as it uses test scripts based on Python which can easily be modified for the testing. One can simulate browser functionality to some extent with Python in multi-mechanize, which can prove useful.

Hence, while the cloud provides CPUs, memory and database resources on demand, the enterprise needs to design applications such that these resources are used judiciously. Otherwise the enterprise will not be able to reap the benefits of utility computing if it deploys inefficient applications that hog a lot of resources without appropriate revenue generating performance.

INWARDi Technologies

Cloud Computing – Design Considerations

Cloud Computing is definitely turning out to be the proverbial carrot for enterprises to host their applications on the public cloud. The cloud promises many benefits to its users. Cloud Computing obviates the need for upfront capital expenses for computing infrastructure, real estate and maintenance personnel. This technology also allows for scaling up or scaling down as demand on the application fluctuates.

While the advantages are many, migrating an application onto the cloud is no trivial task. The cloud is essentially composed of commodity servers. The cloud creates multiple instances of the application and runs them on the same or on different servers. The benefit of executing in parallel is that the same task can be completed faster. The cloud offers enterprises the ability to quickly scale to handle increasing demands.

But the process of deploying applications on to the cloud requires that the application be re-architected to take advantage of this parallelism. Handling parallelization is no simple task. The key attributes that need to be handled by distributed systems are consistency and availability. If there are variables that need to be shared across the parallel instances, then the application must make special provisions to handle this and ensure consistency. Similarly the application must be designed to handle failures.

Applications that are intended to be deployed on the cloud must be designed to scale out rather than scale up. Scaling up refers to the process of adding more horsepower by way of faster CPUs, more RAM and faster throughput. But applications that are to be deployed on the cloud need the ability to scale out, or scale horizontally, where more servers are added without any change in processing horsepower. The design for horizontal scalability is the key to cloud computing architectures.

Some of the key principles to keep in mind while designing for the cloud are to ensure that the application is composed of loosely coupled processes, preferably based on SOA principles. While a multi-threaded architecture with resource sharing through mutexes works in monolithic applications, such an architecture is of no help when there are multiple instances of the same application running on different servers. How does one maintain consistency of the shared resource across instances? This is a tough problem to solve. Ideally the application should be thread safe and based on a shared-nothing kind of architecture. One such technique is to use the queues that the cloud provides as a means of sharing across instances; however this may impact the performance of the system. Other methods include using 'memcached', which has been used successfully by Facebook, Twitter, LiveJournal, Zynga etc. on the cloud. Still another method is to use the Map-Reduce model, where the variables across instances are handled by the 'map' part and the 'reduce' part handles the consistency across instances.

Another key consideration is the need to support availability requirements. Since the cloud is made up of commodity hardware, there is every possibility of servers failing. The application must be designed with inbuilt resilience to handle such failures. This could be by designing an active-standby architecture or by providing for checkpointing, so that the application can restart from some known previous point.

Hence, while cloud computing is the way to go in the future, there is a need to carefully design the application so that full advantage of the cloud can be taken.