Where is the Cloud Computing bus going?

Technological innovation patterns have often repeated themselves in history, and so it is with Cloud Computing. Familiar patterns of change seem to be emerging today.

Here are some of the main trends that I see in Cloud Computing.

Advent of containers: Containers are the new hot topic in cloud computing. In hypervisor-based virtualization, each guest OS runs separately, and running a full guest OS over the hypervisor carries a lot of overhead for each of these heavyweight OSes. Containers instead use OS-level virtualization to run multiple isolated systems on a single host. Containers within a single operating system are much more efficient, being lightweight while still providing the same level of isolation, because they run on the same kernel as the host rather than booting their own. Here is an interesting article on containers: Containers, not virtual machines, are the future of the cloud.

In many ways this containers-over-VMs innovation pattern is reminiscent of the advantage of lightweight ‘threads’ over the heavier and slower ‘process’ approach in the OS world. It seems inevitable that containers will eventually score over VMs.

Open ‘something’ over proprietariness: Technology over the decades has always moved towards ‘open’ approaches over proprietary solutions. Hence, for example, we have OpenStack for creating instances and provisioning storage and networks, tasks that are handled separately today by VMware, Citrix and Hyper-V. The intent is to have a common approach in place of several disparate ones. In the networking world there is OpenFlow, which tries to provide a uniform interface to the many different standards maintained by the Ciscos, Junipers and Brocades of the world. There are also other technologies like OpenCV (computer vision processing), OpenVPN (a VPN protocol) etc. In all these cases the intent is either to unify or to provide a layer over and above the disparate approaches. I am not sure whether OpenStack will prevail; only time will tell. I personally think we will move to a level of abstraction even above that of OpenStack.

Software Defined Everything: Cloud Computing started with the need to provision computing resources through a user interface or a web portal. This was made possible thanks to virtualization. Users could now define and request computing resources. Soon this led to the need to request storage programmatically as well. The trick in storage is ‘thin provisioning’, or provisioning just enough resources to satisfy the current needs of the application; the application can then request more storage programmatically as it grows. Not to be outdone, networking followed suit: Software Defined Networking became a reality when Stanford and the University of California came up with the OpenFlow protocol. We have now entered the era of the Software Defined Datacenter. This is a dominant theme in Cloud Computing.
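To make the ‘software defined’ idea concrete, here is a minimal sketch of what programmatically requesting thinly provisioned storage could look like. The endpoint, payload fields and token below are purely hypothetical and not any particular provider's API.

```python
import requests

# Hypothetical storage API endpoint and credentials, purely for illustration.
API = "https://cloud.example.com/v1/volumes"
HEADERS = {"Authorization": "Bearer <token>"}

# Thin provisioning: ask only for what the application needs right now.
resp = requests.post(API, headers=HEADERS,
                     json={"name": "app-data", "size_gb": 10, "thin": True})
volume_id = resp.json()["id"]

# Later, the application grows the volume programmatically as demand rises.
requests.patch(f"{API}/{volume_id}", headers=HEADERS, json={"size_gb": 50})
```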

These are some of the predominant trends that are emerging in the Cloud Computing arena.

I have spent more than two decades of my career in telecom, implementing telecom protocols, starting in the mid-1980s. The mid-1980s were when digital switches started to emerge. This was followed by a spate of protocols and dizzying innovations like mobile telephony, ISDN, Intelligent Networks, Softswitch, UMTS, 3G, HSDPA, LTE etc.

I personally think that Cloud computing, to use a very frayed and hackneyed term, is at a similar ‘inflexion point’. Trends are emerging and we will soon be caught in the maelstrom of rapid change and innovation.

In this post I am going to do a Marty McFly of the ‘Back to the Future’ trilogy. I am going to set the clock of the DeLorean DMC-12 to 2020 and ‘Whoosh…..’

21 Apr 2020:

It is 21 Apr 2020 and a sunny day. Here is a look at the Cloud Computing landscape:

  • The Organization of Cloud Computing Standards (OCCS) now sets and governs the standards for all Cloud Providers of the world
  • Common APIs govern the provisioning of instances on the cloud regardless of the Cloud Provider. Instances are defined by RPE values, RAM, IOPS, load balancing (LB) and DNS requirements
  • Networking bandwidth, security and storage are also standards based
  • Enterprises use a ‘diffuse deployment’ strategy where the organization’s workloads are deployed to multiple cloud providers.
  • Workloads are Cloud Provider agnostic.
  • Enterprise applications themselves may span multiple cloud providers: for example, the e-commerce workload on Cloud Provider 1, analytics on HPC instances on Cloud Provider 2 and secure applications on the private cloud of Cloud Provider 3. Appropriate contracts are maintained between the Cloud Providers for charging for the usage.
  • Algorithms are used by enterprises to deploy workloads to cloud providers. The algorithms match the SLA and cost requirements of the application with those offered by the cloud providers, minimizing cost while meeting the SLA requirements of the application (see the sketch after this list).
  • Compute, storage and networking costs fluctuate and enterprises use algorithms to optimize the deployment of workloads. Workloads are migrated to take advantage of these price changes
  • Consolidation and acquisitions happen at an alarming pace. Cloud providers, storage, network and HPC providers also compete fiercely
  • Cloud providers are swallowed by others and some lose out. The battle scene is bloody
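Here is a minimal sketch of the kind of placement algorithm imagined above: pick the cheapest provider whose SLA meets the workload's requirement. The provider names, SLA figures and prices are entirely made up for illustration.

```python
# Hypothetical providers with an offered availability SLA (%) and an hourly
# price; the figures are invented purely for illustration.
providers = [
    {"name": "CloudProvider1", "sla": 99.95, "price_per_hour": 0.12},
    {"name": "CloudProvider2", "sla": 99.99, "price_per_hour": 0.20},
    {"name": "CloudProvider3", "sla": 99.90, "price_per_hour": 0.08},
]

def place_workload(required_sla, providers):
    """Pick the cheapest provider whose SLA meets the workload's requirement."""
    candidates = [p for p in providers if p["sla"] >= required_sla]
    if not candidates:
        raise ValueError("No provider meets the required SLA")
    return min(candidates, key=lambda p: p["price_per_hour"])

print(place_workload(99.95, providers)["name"])   # CloudProvider1
```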

Time to get back to the DeLorean. This time the clock on the DeLorean is set to 2025.

18 Sep 2025

Today it is 18 Sep 2025, and it is sunny again, coincidentally.

  • Cloud Computing is dead, mate. These days technology has moved to ‘Cloud Computing in a box’.
  • The technology of these times is ‘Haze works’, where the computation happens in the stratosphere, over the ether …

So much for looking into the future. It is now time to get back to the reality of VMs.

Presentation on Wireless Technologies – Part 2

Here is a continuation of my earlier presentation on Wireless Technologies – Part 1. These presentations trace the evolution of telecom from basic telephony all the way to the advances in LTE.


When NoSQL makes better sense than MySQL

In large web applications where performance and scalability are key concerns, a non-relational (NoSQL) database is often a better choice than more traditional relational databases like MySQL, Oracle and PostgreSQL. While the traditional databases are designed to preserve the ACID (atomic, consistent, isolated and durable) properties of data, they are suited mainly to small and frequent reads/writes.

However, when there is a need to scale the application to handle millions of transactions, the NoSQL model works better. There are several examples of such databases; among the better known are Google's BigTable, HBase, Amazon's Dynamo, CouchDB and MongoDB. These databases run on a large number of regular commodity servers. Accesses to the data are based on get(key) or set(key, value) style APIs.

The database itself is distributed across several commodity servers. Accesses to the data are based on a consistent hashing scheme, for example the Distributed Hash Table (DHT) method. In this method the key is hashed efficiently to one of the servers, which can be visualized as lying on the circumference of a circle. The Chord system is one example of a DHT algorithm. Once the destination server is identified, that server does a local search in its data for the key value. Hence the key benefit of the DHT is that it spreads the data across multiple servers rather than relying on a monolithic database with a hot standby.
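Here is a minimal sketch of consistent hashing of the kind a DHT uses, with hypothetical server names: keys and servers are hashed onto the same ring, and a key belongs to the first server clockwise from its hash.

```python
import bisect
import hashlib

# A toy consistent-hash ring: servers are placed at points on a ring of hash
# values and a key is assigned to the first server clockwise from its hash.
def ring_hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

servers = ["server-a", "server-b", "server-c"]
ring = sorted((ring_hash(s), s) for s in servers)
points = [p for p, _ in ring]

def lookup(key):
    idx = bisect.bisect(points, ring_hash(key)) % len(ring)
    return ring[idx][1]

# set(key, value) and get(key) would first locate the server with lookup(),
# then perform a local read or write on that server.
print(lookup("user:42"))      # one of the servers
print(lookup("order:1001"))   # possibly a different server
```

A production DHT would place many virtual points per server on the ring so that keys redistribute evenly when servers join or leave.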

The ability to distribute data and queries to one of several servers provides the key benefit of scalability. Clearly, having a single database handle an enormous number of transactions will result in performance degradation as the number of transactions increases.

However, the design of distributing data across several commodity servers has its own challenges, beyond choosing an appropriate function to distribute the queries. For example, the NoSQL database has to be able to handle new servers joining the system. Similarly, since the NoSQL database is based on general-purpose commodity servers, the DHT algorithm must be able to handle server crashes and failures. In a distributed system this is usually done as follows: the servers periodically convey messages to each other in order to update and maintain their list of the active servers in the database system. This is performed through a method known as a “gossip protocol”.
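Here is a minimal sketch of the gossip idea, not any particular system's protocol: each server keeps a heartbeat table for its peers and periodically merges it with that of a randomly chosen peer, so knowledge of which servers are alive spreads through the cluster.

```python
import random

# Each server's membership table maps server name -> highest heartbeat seen.
class Server:
    def __init__(self, name, peers):
        self.name = name
        self.table = {p: 0 for p in peers}
        self.table[name] = 0

    def tick(self):
        # A live server increments its own heartbeat every round.
        self.table[self.name] += 1

    def gossip_with(self, other):
        # Merge tables, keeping the highest heartbeat seen for each server.
        for name in set(self.table) | set(other.table):
            latest = max(self.table.get(name, 0), other.table.get(name, 0))
            self.table[name] = other.table[name] = latest

names = ["s1", "s2", "s3", "s4"]
servers = [Server(n, names) for n in names]

# Simulate a few gossip rounds; a server whose heartbeat stops rising for
# many rounds would eventually be declared dead by its peers.
for _ in range(5):
    for s in servers:
        s.tick()
        s.gossip_with(random.choice(servers))

print(servers[0].table)
```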

While NoSQL databases like HBase and Dynamo do not have full ACID properties, they are generally designed around the CAP theorem. The CAP (Consistency, Availability and Partition tolerance) theorem states that a distributed system cannot guarantee all three of these properties simultaneously. NoSQL databases, in order to provide availability, typically also replicate data across servers so that they can survive server crashes. Since data is replicated across servers, there is the issue of maintaining consistency across the replicas. Amazon's Dynamo system is based on a concept called “eventual consistency”, where the data becomes consistent after a short interval. What this signifies is that there is a small window during which the replicas are not consistent and reads may return stale data.
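A toy illustration of eventual consistency, and not Dynamo's actual replication protocol: a write lands on one replica immediately and reaches the others after a short delay, so a read from another replica can briefly return stale data.

```python
import threading
import time

# Three toy replicas of a single key-value pair.
replicas = [{"x": 0}, {"x": 0}, {"x": 0}]

def write(key, value, delay=0.5):
    replicas[0][key] = value                     # accept the write on one replica
    def propagate():
        time.sleep(delay)                        # replication lag
        for r in replicas[1:]:
            r[key] = value                       # replicas converge eventually
    threading.Thread(target=propagate).start()

write("x", 42)
print(replicas[1]["x"])   # likely still 0: the brief stale window
time.sleep(1)
print(replicas[1]["x"])   # 42: all replicas have converged
```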

Since NoSQL databases are non-relational, they do not provide the entire spectrum of SQL queries. Queries that would be based on JOINs in a relational model must instead be iterated over in the application. Hence the design of any application that wants to leverage the benefits of such non-relational databases must clearly separate the data management layer from the data storage layer. By separating the data management layer from how the data is stored, we can easily accrue the benefits of NoSQL databases.
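Here is a minimal sketch of such an application-side “join” over a key-value store; the keys and record layout are hypothetical, purely for illustration.

```python
# Records as they might sit in a key-value store.
orders = {
    "order:1": {"user_id": "user:1", "amount": 250},
    "order:2": {"user_id": "user:2", "amount": 100},
    "order:3": {"user_id": "user:1", "amount": 75},
}
users = {
    "user:1": {"name": "Alice"},
    "user:2": {"name": "Bob"},
}

# In SQL this would be a single JOIN; here the application iterates over the
# orders and performs a second lookup per record.
report = []
for order in orders.values():
    user = users[order["user_id"]]          # the extra get() replaces the JOIN
    report.append((user["name"], order["amount"]))

print(report)   # [('Alice', 250), ('Bob', 100), ('Alice', 75)]
```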

While NoSQL databases clearly have an excellent advantage over regular relational databases where high performance and scalability are key requirements, the applications must be appropriately tailored to take full advantage of the non-relational and distributed nature of the database. You may also find the post “To Hadoop, or not to Hadoop” interesting.


Scaling out

Web applications have challenges that are unique to the domain of the web. The key differentiating fact between internet technologies and other technologies is the need to scale up to handle sudden and large increases in traffic. While telecommunication and data communication systems also need to handle high traffic, those products can be dimensioned against some upper threshold. Typical web applications need to provide low latencies, handle large throughput and also be extremely scalable to changing demands.

The ability to scale seems trivial. It appears that one could just add more CPU horsepower and throw in memory and bandwidth in large measure to get the performance we need. But unfortunately this is not as simple as it appears. For one, adding more CPU horsepower may not necessarily result in better performance. A badly designed application will only improve marginally.

Some applications are “embarrassingly parallel”: there is no dependency of one task on another. For example, if the task is to search for a particular string in a set of documents, or to convert files from AVI to MPEG format, then it can be done in parallel. In a public cloud this could be achieved by running more instances.
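As a small illustration of an embarrassingly parallel task, here is a sketch that searches a handful of in-memory documents for a string across multiple processes; on a public cloud the same pattern would simply be spread across instances rather than processes.

```python
from multiprocessing import Pool

# Toy documents; in practice these would be files or objects in storage.
documents = {
    "doc1.txt": "the quick brown fox",
    "doc2.txt": "cloud computing at scale",
    "doc3.txt": "scaling out in the cloud",
}

def contains(item, needle="cloud"):
    # Each document is scanned independently of the others.
    name, text = item
    return name if needle in text else None

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        hits = [h for h in pool.map(contains, documents.items()) if h]
    print(hits)   # ['doc2.txt', 'doc3.txt']
```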

However, most real-world applications are more sequential than parallel. There is a lot of data dependency between the modules of the application. When multiple instances have to run in a public cloud the design considerations can be quite daunting.

For example, assume we had an application made up of parts 1, 2, 3 and 4, with dependencies between the parts.

Now let us further assume that this is a web application and thousands of requests come to it. For simplicity's sake, assume that for each request a counter has to be incremented. How does one keep track of the total requests that are coming to the application? The challenge is how to manage such a global counter when there are multiple instances. In a monolithic application this does not pose a problem, but with multiple instances handling web requests, each having its own copy of this counter, the design becomes a challenge.

Each instance has its own copy of the counter, which it will update based on the requests that come to that instance through a load balancer. However, how does one compute the total number of requests that come to all the instances?

One possible solution is the memcached approach. Memcached was developed as a solution by Danga Interactive for LiveJournal. Memcached is a distributed caching mechanism that stores data across multiple participating servers. Memcached has some simple API calls like get(key) and set(key, value). Memcached uses a consistent hashing mechanism, which hashes the key to one of the several participating servers; this is the Distributed Hash Table (DHT) method described earlier, by which the keys are distributed across the servers. The consistent hashing technique is able to handle server crashes and new servers joining the distributed cache: when a server crashes its keys are remapped to the remaining servers, and a server joining the distributed cache similarly takes over some of the keys. Memcached has been used at Facebook, Zynga and LiveJournal.
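Here is a minimal sketch of the shared-counter idea, assuming the python-memcached client and a memcached server running on localhost: every web instance increments the same key in the shared cache instead of a local copy.

```python
import memcache

# All instances point at the same memcached server (or server pool), so the
# counter is global across instances.
mc = memcache.Client(["127.0.0.1:11211"])

# Initialize the counter only if it does not exist yet.
mc.add("total_requests", 0)

def handle_request():
    # incr() is applied atomically on the memcached server, so concurrent
    # instances do not lose updates the way per-instance counters would.
    return mc.incr("total_requests")

print(handle_request())   # 1, 2, 3, ... across all instances combined
```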
