The Future is C-cubed: Computing, Communication and the Cloud

We are on the verge of the next great stage of technological evolution. Several different trends clearly point to what I would like to term C-cubed (C3), representing the merger of computing technologies, communication advances and the cloud.

There are no surprises in this assessment. It certainly does not fall into the category of chaos theory's "butterfly effect", where a seemingly unrelated cause has a far-reaching effect: the proverbial butterfly fluttering its wings in Puerto Rico causing an earthquake in China.

The C-cubed future that seems very probable is based on advances in mobile broadband, advances in communication and the emergence of cloud computing. Years ago Scott McNealy of Sun Microsystems declared that "the network is the computer". Now, with the introduction of Google's Chromebook, this trend will soon catch on. In fact, I can easily visualize a ubiquitous device which I would like to call the "cloudbook".

The cloudbook would be a device resembling a tablet like the iPad or the PlayBook, but it would carry little or no hard disk. Local storage would be through USB devices or SD cards, which these days come with capacities of 80 GB and above. The cloudbook would have no operating system of its own. It would simply have a bootstrap program that allows the user to choose from several different operating systems (OSs), namely Windows, Linux, Solaris, Mac OS etc., which would execute on the cloud. All applications would be executed directly on the cloud, and the user would also store all his programs and data on the cloud. Some amount of offline storage would be possible on portable storage devices like memory sticks and SD cards.

The cloudbook will be a ubiquitous device. It will access the internet through mobile broadband, over a GPRS, WCDMA or an LTE connection. With the blazing speeds of up to 100 Mbps promised by LTE, accessing the public cloud to execute programs and store data becomes entirely feasible, and access should be almost instantaneous. Using mobile broadband for access and the cloud for computing and storage will be the trend of the future.

Besides its use for computing, the cloudbook will also be used for making voice and video calls. This is the promise of the IP Multimedia Subsystem (IMS). IMS is a technology that has been waiting in the wings for quite some time. It envisages an all-IP core network for transporting voice, data and video. As the IP pipes become faster and the algorithms to iron out QoS issues are worked out, the full vision of IMS will become a reality and high-speed video applications will become commonplace.

The cloudbook will use the 3G (WCDMA) network to make voice and video calls. The 3G RNC or the 4G eNodeBs will enable the transmission and reception of voice, data and video to and from the core network. LTE networks will use either Circuit Switched Fallback (CSFB) or VoLTE (Voice over LTE) to carry voice and video over the 3G network or over the Evolved Packet Core (EPC). In the future, high-speed video calls and applications will be extremely prevalent, and a device like the cloudbook will improve the user experience manifold.

Besides this, IMS also envisions Application Servers (AS) spread across the network providing services like Video-on-Demand and real-time multiplayer gaming. It is quite likely that these AS will actually be instances running on the public cloud.

Hence the future clearly points to a marriage of computing, communication and the cloud, with each in a symbiotic relationship with the others. The network can be visualized as one large ambient network of IMS Call Session Control Function servers (CSCFs), virtualized servers on the cloud and Application Servers (AS).

Mobile broadband will become commonplace and all computing and communication will be through 3G or 4G networks.

The future is almost here and the future is C-cubed (C3)!!!

Published in Telecom Asia, Jul 8 2011 – The Future is C-cubed


Managing Multi-Region Deployments

If there is one lesson from this year's major Amazon EC2 outage, it is "don't deploy all your application instances in a single region". The outage clearly demonstrated that entire regions are not immune to disasters. It has thus become imperative for designers and architects to deploy applications spanning multiple regions. Currently there are 4 major regions: US-East, US-West, Europe and APAC.

Both fundamentally and from a strategic point of view it makes sense to deploy web applications in different regions, e.g. both in US-East and US-West. This builds a certain amount of geographical resiliency into the application. In this way you are protected from major debacles like the Amazon EC2 outage of April 2011, or a meteor crashing into one of the data centers.

Deploying instances in different regions is much like minimizing risk by diversifying your portfolio. Besides other methods of fault tolerance, the design of the application should also incorporate geographical resilience.

Currently Amazon's ELB does not support load balancing across regions; it can only distribute traffic among instances in different availability zones of a single region. The solution is to use DNS services like UltraDNS, DNSMadeEasy or DynDNS.

These DNS services provide GeoIP-based load balancing that can distribute traffic based on the region from which it originates. Besides balancing the load by origin, GeoIP-based traffic distribution has the added benefit of directing users to the application deployment closest to them, thus reducing latencies.

The GeoIP-based traffic distributor can direct traffic to the closest region, and an Amazon ELB can then internally distribute the traffic among the instances within that region. For a look at some typical problems in multi-region cloud deployments do take a look at my post "Cache-22".

[Figure: Deploying across regions (INWARDi Technologies)]


Design Principles of Scalable, Distributed Systems

Designing scalable, distributed systems involves a completely different set of principles and paradigms compared to regular monolithic client-server systems. The typical large distributed systems of Google, Facebook or Amazon are made up of commodity servers. These servers are expected to fail, suffer disk crashes, run into network issues or be struck by natural disasters.

Rather than assuming that failures and disasters will be the exception, these systems are designed assuming the worst will happen. The principles and protocols assume that failures are the rule rather than the exception, and designing to accommodate failure is the key to a good distributed, scalable system. A key consideration in a distributed system is the need to maintain consistency, availability and reliability. This, of course, is limited by the CAP theorem postulated by Eric Brewer, which states that a system can fully provide only two of consistency, availability and partition tolerance.

Some key techniques in distributed systems

Vector Clocks: An obvious issue in a distributed system with hundreds of servers is that each server's clock runs at a slightly different rate, so it is difficult to obtain a single view of global time. How does one determine causality in such a distributed system? The solution to this problem is provided by vector clocks, which build on Leslie Lamport's work on logical clocks. Vector clocks provide a way of determining the causal ordering of events. Each system maintains an array of logical timestamps, one entry per participating system, and increments its own entry whenever a local event occurs. When a system sends a message it attaches its current vector; when the receiving system gets the message it takes the element-wise maximum of its own vector and the received one, and then increments its own entry. For example, if System 2 is at [2, 15, 0] and receives an event stamped [40, 10, 5] from System 3, it updates its vector to [40, 15, 5] and then increments its own entry to get [40, 16, 5]. Comparing vectors element by element then tells us whether one event causally precedes another or whether the two are concurrent. This ensures that a partial ordering of events is maintained across systems.

Vector clocks have been used in Amazon's e-retail platform to reconcile conflicting updates. The use of vector clocks to manage consistency is described in the paper on Amazon's Dynamo architecture.
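
To make the merge-and-increment rule concrete, here is a minimal sketch of a vector clock in C. The number of processes and the event sequence are purely illustrative assumptions, not taken from Dynamo or any other system.

/* vclock.c - a minimal vector clock sketch for 3 processes */
#include <stdio.h>

#define NPROC 3

typedef struct { int clock[NPROC]; } vclock_t;

/* Local event at process `me`: increment its own component. */
void vc_tick(vclock_t *vc, int me) { vc->clock[me]++; }

/* On receiving a message carrying `msg`, take the element-wise maximum of
   the local and received vectors, then increment the receiver's own entry. */
void vc_receive(vclock_t *local, const vclock_t *msg, int me)
{
    for (int i = 0; i < NPROC; i++)
        if (msg->clock[i] > local->clock[i])
            local->clock[i] = msg->clock[i];
    local->clock[me]++;
}

int main(void)
{
    vclock_t p0 = {{0}}, p1 = {{0}};

    vc_tick(&p0, 0);            /* event on P0: clock becomes [1,0,0]       */
    vclock_t msg = p0;          /* P0 attaches its clock to the message     */
    vc_receive(&p1, &msg, 1);   /* P1 merges and ticks: clock is [1,1,0]    */

    printf("P1 clock: [%d,%d,%d]\n", p1.clock[0], p1.clock[1], p1.clock[2]);
    return 0;
}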

Distributed Hash Table (DHT): A distributed hash table uses a hash (128-bit in many systems) to distribute keys over several nodes that can be conceptually assumed to lie on the circumference of a circle, with the hash space wrapping around so that the position after the largest hash value is the smallest one. There are several algorithms used to locate keys on this conceptual circle; the Chord system is one of them. These algorithms try to reach the node that owns a key in the smallest number of hops by storing a small amount of routing data at each node. Chord maintains a finger table that allows it to reach the destination node in O(log n) hops, while other algorithms aim to reach the desired node in O(1) hops. Databases like Cassandra and Amazon's Dynamo use a consistent hashing technique; Cassandra, for instance, spreads the keys of records over distributed servers by using a 128-bit hash.
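
As an illustration of the ring lookup, here is a minimal sketch in C. A toy 32-bit hash stands in for the 128-bit hashes these systems actually use, and the server addresses are made-up assumptions.

/* ring.c - a minimal consistent-hashing ring lookup sketch */
#include <stdio.h>
#include <stdint.h>

/* Simple FNV-1a hash standing in for a real 128-bit hash. */
static uint32_t hash32(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
    return h;
}

#define NSERVERS 4
static const char *servers[NSERVERS] =
    { "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4" };

/* A key is owned by the first server clockwise from its hash position;
   if no server lies beyond the key's position, wrap around to the smallest. */
static const char *lookup(const char *key)
{
    uint32_t k = hash32(key);
    const char *best = NULL, *wrap = NULL;
    uint32_t best_pos = UINT32_MAX, wrap_pos = UINT32_MAX;

    for (int i = 0; i < NSERVERS; i++) {
        uint32_t pos = hash32(servers[i]);
        if (pos >= k && pos < best_pos) { best_pos = pos; best = servers[i]; }
        if (pos < wrap_pos)             { wrap_pos = pos; wrap = servers[i]; }
    }
    return best ? best : wrap;
}

int main(void)
{
    printf("key 'user42' -> %s\n", lookup("user42"));
    printf("key 'order7' -> %s\n", lookup("order7"));
    return 0;
}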

Quorum Protocol:  Since systems are essentially limited to choosing two of the three properties of consistency, availability and partition tolerance, tradeoffs are made based on cost, performance and user experience. Google's BigTable chooses consistency over availability, while Amazon's Dynamo chooses availability over consistency. While the CAP theorem maintains that only 2 of the 3 properties can be fully provided, it does not mean that Google's system offers no availability or that Dynamo offers no consistency. In fact, Amazon's Dynamo provides "eventual consistency", by which data becomes consistent after a period of time.

Since failures are inevitable and a number of servers will be down at any instant, writes are replicated across many servers. With the data replicated on N servers, a write is considered successful when acknowledgements arrive from a write quorum of W servers, and a read is considered successful when a read quorum of R servers responds. Typical designs use W + R > N as their design criterion, where N is the number of replicas (for example W = R = N/2 + 1); this guarantees that every read quorum overlaps the most recent write quorum, so one can read one's own writes consistently. Amazon's Dynamo uses a sloppy quorum technique, where data is replicated on the first N healthy nodes rather than strictly on the N nodes obtained through consistent hashing.
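
A minimal sketch of the quorum arithmetic, assuming N replicas per key and majority quorums; the constants are illustrative, not Dynamo's actual values.

/* quorum.c - a minimal sketch of the W + R > N quorum rule */
#include <stdbool.h>
#include <stdio.h>

#define N 5                 /* total replicas for a key           */
#define W ((N / 2) + 1)     /* write quorum: a majority of replicas */
#define R ((N / 2) + 1)     /* read quorum                        */

/* A write is declared successful once acks from W replicas arrive. */
bool write_succeeded(int acks_received) { return acks_received >= W; }

/* A read is declared successful once R replicas have responded. */
bool read_succeeded(int replies_received) { return replies_received >= R; }

int main(void)
{
    printf("N=%d W=%d R=%d, W+R > N ? %s\n",
           N, W, R, (W + R > N) ? "yes" : "no");
    printf("write with 3 acks: %s\n", write_succeeded(3) ? "ok" : "retry");
    printf("read with 2 replies: %s\n", read_succeeded(2) ? "ok" : "retry");
    return 0;
}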

Gossip Protocol: This is the preferred way for the servers in a distributed system to become aware of server crashes or of new servers joining the system. Membership changes and failure detection are performed by propagating the changes to a set of randomly chosen neighbors, who in turn propagate them to another set of neighbors. This ensures that after a certain period of time the membership view becomes consistent across the system.
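
A minimal sketch of push-style gossip, where membership views converge after a few random exchanges. The node count and number of rounds are illustrative assumptions.

/* gossip.c - a minimal push-gossip membership sketch */
#include <stdio.h>
#include <stdlib.h>

#define NODES 8

/* view[i][j] = 1 if node i believes node j is a live member */
static unsigned char view[NODES][NODES];

int main(void)
{
    /* Initially every node knows only about itself. */
    for (int i = 0; i < NODES; i++) view[i][i] = 1;

    for (int round = 0; round < 6; round++) {
        for (int i = 0; i < NODES; i++) {
            int peer = rand() % NODES;          /* pick a random neighbour   */
            for (int j = 0; j < NODES; j++)     /* peer merges node i's view */
                view[peer][j] |= view[i][j];
        }
    }

    /* After a few rounds the views converge to a consistent membership list. */
    int known = 0;
    for (int j = 0; j < NODES; j++) known += view[0][j];
    printf("node 0 knows about %d of %d nodes\n", known, NODES);
    return 0;
}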

Hinted Handoff and Merkle Trees: To handle server failures, a replica is sometimes sent to a healthy node when the node for which it was destined is temporarily down. For example, data destined for node A is delivered to node D, which keeps a hint in its metadata that the data is eventually to be handed off to node A when it becomes healthy again. Merkle trees are used to synchronize replicas between nodes; by comparing hashes from the root downwards they minimize the amount of data that needs to be transferred for synchronization.
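
A minimal sketch of what a hinted-handoff record might look like; the field names and node names are purely illustrative assumptions.

/* hint.c - a minimal hinted-handoff sketch */
#include <stdio.h>

/* Hint kept by the node that accepted a write on behalf of a downed replica. */
typedef struct {
    char key[32];
    char value[64];
    char intended_node[32];   /* replica the data really belongs to, e.g. "node-A" */
    int  delivered;           /* set once the handoff has happened */
} hint_t;

/* Called periodically: if the intended replica is reachable again,
   hand the data back and mark the hint as delivered. */
void try_handoff(hint_t *h, int intended_node_alive)
{
    if (!h->delivered && intended_node_alive) {
        printf("handing off %s=%s to %s\n", h->key, h->value, h->intended_node);
        h->delivered = 1;
    }
}

int main(void)
{
    hint_t h = { "cart:42", "3 items", "node-A", 0 };
    try_handoff(&h, 0);   /* node-A still down: nothing happens       */
    try_handoff(&h, 1);   /* node-A recovered: data handed back to it */
    return 0;
}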

These are some of the main design principles used while designing scalable, distributed systems. A related post is "Designing a scalable architecture for the cloud".


Designing a Scalable Architecture for the Cloud

The promise of the cloud is unlimited computing power and storage capacity coupled with a pay-per-use policy. This makes the cloud particularly irresistible for hosting web applications and applications whose demand varies periodically. In order to take full advantage of the cloud, the application must be designed for optimum performance. Though the cloud provides resources on demand, a badly designed application can hog resources and prove extremely expensive in the long run.

One of the first requirements for deploying applications on the cloud is that they should be scalable. Scalability denotes the ability to handle increasing traffic simply by adding more computing resources of the same kind, rather than adding resources with greater horsepower. This is also referred to as scaling horizontally.

Assuming that the application has been sufficiently profiled and tuned for high performance, there are certain key considerations that need to be taken into account while deploying on the cloud, whether public or private. Some of them are being able to scale on demand, providing for high availability and resiliency, and having sufficient safeguards against failures.

Given these requirements, a scalable design for the cloud can be viewed as being made up of the following 5 tiers:

The DNS tier – In this tier the user's domain is hosted on a DNS service like UltraDNS or Route 53. These DNS services distribute DNS lookups geographically, so users connect to a DNS server that is geographically closer to them, which speeds up DNS lookup times. Moreover, since the lookups are distributed geographically, this also builds in geographic resiliency as far as DNS lookups are concerned.

Load Balancer-Auto Scaling tier – This tier is responsible for balancing the incoming traffic among compute instances in the cloud. The load balancing may be done on a simple round-robin basis or may be based on the actual CPU utilization of the individual instances. Typically at this layer we should also have an auto-scaling policy which adds more instances when traffic to the application rises above a threshold and terminates instances when it falls below another threshold (a sketch of such a policy appears after the tier descriptions below).

Compute-Instance tier – This layer hosts the actual application in individual compute instances on the cloud. It is assumed that the application has been tuned for maximum performance. The choice of small, medium or large instances should be based on the traffic-handling capacity of the instance type versus its cost per hour.

Cache tier – This is an important layer in a cloud application that runs multiple instances, as it provides a distributed cache shared by all of them. With a distributed caching system like memcached it is possible to share global data between instances. Memcached uses a consistent-hashing technique to distribute data among a set of participating servers, which allows the cache layer to handle server crashes and new servers joining in.

Database tier – The database tier is one of the most critical layers of the application. At a minimum the database should be configured in an active-standby mode, and ideally the active and standby should be placed in different availability zones to better handle a disaster in a particular zone. Another consideration is to have separate read replicas that handle reads while the primary database handles the writes.
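
Returning to the auto-scaling policy mentioned in the Load Balancer tier above, here is a minimal sketch of a threshold-based scaling decision, independent of any particular cloud provider's API. The thresholds and instance limits are illustrative assumptions.

/* autoscale.c - a minimal threshold-based scaling decision sketch */
#include <stdio.h>

#define SCALE_UP_CPU   70.0   /* average CPU % above which we add an instance */
#define SCALE_DOWN_CPU 30.0   /* average CPU % below which we remove one      */
#define MIN_INSTANCES  2
#define MAX_INSTANCES  20

/* Given the fleet's average CPU and current size, return the desired size. */
int desired_instances(double avg_cpu, int current)
{
    if (avg_cpu > SCALE_UP_CPU   && current < MAX_INSTANCES) return current + 1;
    if (avg_cpu < SCALE_DOWN_CPU && current > MIN_INSTANCES) return current - 1;
    return current;
}

int main(void)
{
    printf("cpu=85%%, 4 instances -> %d\n", desired_instances(85.0, 4));
    printf("cpu=20%%, 4 instances -> %d\n", desired_instances(20.0, 4));
    return 0;
}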

Besides the above considerations, it is always good to host the web application across different availability zones, and preferably across regions, thus safeguarding against a disaster in any particular zone or region.

Getting started with memcached-libmemcached

Memcached is a free, high-performance, open source distributed caching system. It was designed to reduce the load of a high number of database queries by caching data in memory. Since memcached is a distributed caching system, the application data is distributed across servers. Data is inserted into and retrieved from the distributed cache using key-value pairs.

Memcached uses a consistent hashing scheme to distribute the keys across the servers. The consistent hashing algorithm handles server crashes and servers joining in by remapping only the affected keys to other servers.

This article focuses on getting started with memcached and libmemcached, and on making the process as painless as possible. After you have downloaded and installed memcached and libmemcached you are good to go.

First start 4 memcached servers
$ memcached -p 11221 &
$ memcached -p 11222 &
$ memcached -p 11223 &
$ memcached -p 11224 &

The servers start on the local host. (For the full list of options check memcached -h.)
Verify that they are running using ps -ef.

libmemcached is the C client library which can be used to connect to the memcached servers you started above.
A snippet of the libmemcached client code, client_test1.c, is shown below:
client_test1.c
….
/* key and in_value are defined in the elided portion above */
const char *server_string = "localhost:11221, localhost:11222, localhost:11223, localhost:11224";

memc = memcached_create(NULL);
servers = memcached_servers_parse(server_string);
rc = memcached_server_push(memc, servers);          /* register the 4 servers     */
rc = memcached_flush(memc, 0);                      /* start with an empty cache  */
rc = memcached_set(memc, key, strlen(key), in_value, strlen(in_value), (time_t)0, (uint32_t)0);
rc = memcached_append(memc, key, strlen(key), " the", strlen(" the"), (time_t)0, (uint32_t)0);
rc = memcached_append(memc, key, strlen(key), " people here", strlen(" people here"), (time_t)0, (uint32_t)0);
out_value = memcached_get(memc, key, strlen(key), &value_length, &flags, &rc);
printf("Out value is: %s\n", out_value);
memcached_server_list_free(servers);
free(out_value);
….

When you execute this client you should see
$ Out value is: We the people here

You can check which server the key is stored on by running
$ memdump --servers localhost:11221
fig
$ memdump --servers localhost:11222
$
This shows that the key is stored on the first server, localhost:11221.

Now assume that we store a lot more data through client_test2.c
client_test2.c
…..
const char *server_string = "localhost:11221, localhost:11222, localhost:11223, localhost:11224";

memc = memcached_create(NULL);
servers = memcached_servers_parse(server_string);
rc = memcached_server_push(memc, servers);
rc = memcached_flush(memc, 0);

/* Store 100 key-value pairs: key i maps to the value 2*i */
for (i = 0; i < 100; i++)
{
    sprintf(str, "%d", i);
    sprintf(str1, "%d", 2 * i);
    printf("String %s string1 %s\n", str, str1);
    rc = memcached_set(memc, str, strlen(str), str1, strlen(str1), (time_t)0, (uint32_t)0);
    test_true(rc == MEMCACHED_SUCCESS);   /* test_true() is libmemcached's test macro */
}

/* Read back a few keys entered by the user */
for (i = 0; i < 10; i++)
{
    printf("Input value:");
    scanf("%s", testvalue);
    printf("Value to search for %s\n", testvalue);
    value = memcached_get(memc, testvalue, strlen(testvalue), &value_length, &flags, &rc);
    test_true(rc == MEMCACHED_SUCCESS);
    printf("Value is %s\n", value);
    free(value);
}
…..

After executing this, when we dump the keys from the servers we see
$ memdump --servers localhost:11221
97
94
92
89
….
….

Similarly
$ memdump --servers localhost:11222
99
98
91
87
86
….
….

Hence the keys are hashed across the servers. A consistent-hashing lookup takes O(log n) to locate the right cache server (a search over the sorted positions on the hash ring), compared with O(1) for a naive modulo-based scheme; the payoff is that adding or removing a server remaps only a small fraction of the keys instead of nearly all of them.

Happy memcaching …


When NoSQL makes better sense than MySQL

In large web applications where performance and scalability are key concerns, a non-relational (NoSQL) database is often a better choice than the more traditional databases like MySQL, Oracle or PostgreSQL. While the traditional databases are designed to preserve the ACID (atomic, consistent, isolated and durable) properties of data, they are best suited to relatively small, frequent reads and writes.

However, when there is a need to scale the application to handle millions of transactions, the NoSQL model works better. There are several examples of such databases; among the better known are Google's BigTable, HBase, Amazon's Dynamo, CouchDB and MongoDB. These databases run on a large number of regular commodity servers, and access to the data is through get(key) or set(key, value) style APIs.

The database itself is distributed across several commodity servers. Access to the data is based on a consistent hashing scheme, for example the Distributed Hash Table (DHT) method. In this method the key is hashed to one of the servers, which can be visualized as lying on the circumference of a circle; the Chord system is one example of a DHT algorithm. Once the destination server is identified, that server does a local search in its data for the key's value. Hence the key benefit of the DHT is that it spreads the data across multiple servers rather than relying on a monolithic database with a hot standby.

The ability to distribute the data, and the queries for it, across several servers provides the key benefit of scalability. Clearly, a single database handling an enormous number of transactions will suffer performance degradation as the number of transactions increases.

However, the design of distributing data across several commodity servers has its own challenges beyond choosing an appropriate function to distribute the queries. For example, the NoSQL database has to handle new servers joining the system. Similarly, since it runs on general-purpose commodity servers, the DHT algorithm must be able to handle server crashes and failures. In a distributed system this is usually done by having the servers periodically exchange messages to update and maintain their list of active servers, a method known as the "gossip protocol".

While NoSQL databases like HBase, Dynamo etc. do not have ACID properties, they are governed by the CAP theorem. The CAP (Consistency, Availability and Partition tolerance) theorem states that it is difficult to achieve all 3 features simultaneously. To provide availability, NoSQL databases typically also replicate data across servers so that they can handle server crashes, and since data is replicated there is the issue of maintaining consistency across the servers. Amazon's Dynamo system is based on a concept called "eventual consistency", where the data becomes consistent after a few seconds; what this signifies is that there is a small interval during which it is not consistent.

Since NoSQL databases are non-relational, they do not provide the entire spectrum of SQL queries; queries based on JOINs, for instance, must be implemented by iterating in the application itself. Hence the design of any application that needs to leverage the benefits of such non-relational databases must clearly separate the data management layer from the data storage layer. By separating how data is managed from how it is stored, we can easily accrue the benefits of databases like these.

While NoSQL databases clearly have an excellent advantage over regular relational databases where high performance and scalability are key requirements, applications must be appropriately tailored to take full advantage of the non-relational and distributed nature of the database. You may also find the post "To Hadoop, or not to Hadoop" interesting.


Scaling out

Web applications have challenges that are unique to the domain of the web. The key fact differentiating internet technologies from other technologies is the need to scale up to handle sudden and large increases in traffic. While telecommunication and data communication systems also need to handle high traffic, those products can be dimensioned against some upper threshold. Typical web applications need to provide low latencies, handle large throughput and also be extremely scalable to changing demands.

The ability to scale seems trivial: it appears that one could just add more CPU horsepower and throw in memory and bandwidth in large measure to get the performance we need. Unfortunately it is not as simple as it appears. For one, adding more CPU horsepower may not necessarily result in better performance; a badly designed application will only improve marginally.

Some applications are "embarrassingly parallel": there is no dependency of one task on another. For example, if the task is to search for a particular string in documents, or to convert files from AVI to MPEG format, then this can be done in parallel. In a public cloud this could be achieved by running more instances.

However, most real-world applications are more sequential than parallel, with a lot of data dependency between the modules of the application. When multiple instances have to run in a public cloud, the design considerations can be quite daunting.

For example, assume we had a web application made up of parts 1, 2, 3 and 4.

Now let us further assume that thousands of requests come to this application and, for simplicity's sake, that for each request a counter has to be incremented. How does one keep track of the total number of requests coming to the application? The challenge is how to manage such a global counter when there are multiple instances. In a monolithic application this does not pose a problem, but with multiple instances handling web requests, each having its own copy of this counter, the design becomes a challenge.

Each instance has its own copy of the counter, which it updates based on the requests that come to it through the load balancer. However, how does one compute the total number of requests that come to all the instances?

One possible solution is the memcached approach. Memcached was developed as a solution by Danga Interactive for LiveJournal. It is a distributed caching mechanism that stores data across multiple participating servers and offers simple API calls like get(key) and set(key, value). Memcached uses a consistent hashing mechanism, a form of Distributed Hash Table (DHT), to hash each key to one of several servers, and this technique is able to handle server crashes and new servers joining the distributed cache: when a server crashes its share of the keys is remapped to the remaining servers, and a server joining in takes over some of the keys. Memcached has been used at Facebook, Zynga and LiveJournal. A shared counter for the instances above can be kept in memcached itself, as sketched below.
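
As a concrete illustration of that global counter, here is a minimal sketch using libmemcached; the server list and key name are illustrative assumptions. Every instance performs the same atomic increment against the distributed cache, so the total is maintained outside any single instance.

/* counter.c - a minimal shared-counter sketch using libmemcached */
#include <libmemcached/memcached.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *server_string = "localhost:11221, localhost:11222";
    const char *key = "total_requests";
    uint64_t total = 0;

    memcached_st *memc = memcached_create(NULL);
    memcached_server_st *servers = memcached_servers_parse(server_string);
    memcached_server_push(memc, servers);

    /* Initialise the counter once; the add fails harmlessly if another
       instance has already created the key. */
    memcached_add(memc, key, strlen(key), "0", 1, (time_t)0, (uint32_t)0);

    /* Each request handled by any instance does this one atomic increment. */
    memcached_increment(memc, key, strlen(key), 1, &total);
    printf("requests so far: %llu\n", (unsigned long long)total);

    memcached_server_list_free(servers);
    memcached_free(memc);
    return 0;
}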
