Dissecting the Cloud – Part 1

“The Cloud brings with it the promise of utility-style computing and the ability to pay according to usage.

Cloud Computing provides elasticity, or the ability to grow and shrink based on traffic patterns.

Cloud Computing does away with CAPEX and the need to buy infrastructure upfront, and replaces it with an OPEX model, and so on”.

All this is old news and has been repeated many times. But what exactly constitutes cloud computing? What brings about the above features? What are the building blocks of the cloud that enable one to realize them?

This post tries to look deeper into the innards of the Cloud to determine what the cloud really is.

Before we get to this, I would like to dwell on an analogy to understand the Cloud better.

Let us assume Mr. A owns a large building of about 15,000 sq feet, about 100 feet tall, and that he wants to rent out this building.

Now, assume that the door of this building opens to a single, large room on the inside!

Mr. X comes to rent this building. If this were the case, poor Mr. X would have to pay through his nose, presumably for the entire building, even though his requirement was for a small room of about 600 sq feet. Imagine the waste of space. Moreover, this would also result in an enormous waste of electricity; imagine the lighting needed. An inordinate amount of water would also have to be used if this single, large room needed to be cleaned. The cost of all of this would have to be borne by Mr. X.

This is clearly not a pleasant state of affairs for either Mr. X or the building’s owner, Mr. A.

The solution to this is easy. What Mr. A needs to do is partition the building into self-contained rooms (of about 600 sq feet each) with all the amenities. Each self-contained unit would need to have its own electricity and water meter.

Now Mr. A can rent rooms to different tenants based on their needs. This is a win-win situation for both Mr. A and Mr. X. The tenants only need to pay for the rooms they occupy and the electricity and water they consume.

This is exactly the principle behind cloud computing and is known as ‘virtualization’.

There are 3 computing components that one must consider: CPU, network and storage. The picture below shows the virtualization of the CPU, RAM, NIC (network card) and disk (storage).

[Figure: Server virtualization – logical view]

The Cloud is essentially made up of anywhere from 100 to 100,000 servers. Each server is akin to the large building. Running a single OS and application(s) on an entire server is a waste of computing, storage and network resources.

Virtualization abstracts the hardware, storage and network through the use of software known as the ‘hypervisor’. On top of the hypervisor several ‘guest OSes’ can run. Applications can then run on these guest OSes.

Hence, over the CPU (single, dual or multi-core) of the server, multiple guest OSes can run, each with its own set of applications.

This is similar to partitioning the large CPU resource of the server into smaller units.
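
To make the analogy concrete, here is a minimal sketch in Python of how a hypervisor conceptually carves one physical server (the building) into guest VMs (the rooms). The class names and capacity figures are illustrative assumptions of mine and do not correspond to any vendor’s API.

```python
# Illustrative sketch only: models a hypervisor carving one physical
# server into several guest VMs. Names and numbers are made up for
# illustration and do not correspond to any vendor's API.
from dataclasses import dataclass, field

@dataclass
class PhysicalServer:
    cores: int                      # total CPU cores on the host
    ram_gb: int                     # total RAM in GB
    disk_gb: int                    # total disk in GB
    vms: list = field(default_factory=list)

    def allocate_vm(self, name, cores, ram_gb, disk_gb):
        """Carve out a guest VM if enough unused capacity remains."""
        used_cores = sum(vm["cores"] for vm in self.vms)
        used_ram = sum(vm["ram_gb"] for vm in self.vms)
        used_disk = sum(vm["disk_gb"] for vm in self.vms)
        if (used_cores + cores <= self.cores and
                used_ram + ram_gb <= self.ram_gb and
                used_disk + disk_gb <= self.disk_gb):
            self.vms.append({"name": name, "cores": cores,
                             "ram_gb": ram_gb, "disk_gb": disk_gb})
            return True
        return False                # not enough capacity left on this host

# One large server ("the building") shared by several tenants ("the rooms")
host = PhysicalServer(cores=32, ram_gb=256, disk_gb=4000)
host.allocate_vm("tenant-a", cores=4, ram_gb=16, disk_gb=200)
host.allocate_vm("tenant-b", cores=8, ram_gb=64, disk_gb=500)
print(f"{len(host.vms)} guest VMs share one physical server")
```

Each tenant pays only for the vCPU, RAM and disk it is allocated, just as each tenant in the building pays only for the room, electricity and water it uses.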

There are 3 main virtualization technologies, namely VMware (ESXi), Citrix (XenServer) and Microsoft Hyper-V.

Here is a diagram showing the 3 main virtualization technologies.

[Figure: Server virtualization]

To be continued …



The dark side of the Internet

Published in Telecom Asia 26 Sep 2012 – The dark side of the internet

Imagine a life without the internet. You can’t! That’s how inextricably enmeshed the internet is in our lives. Kids learn to play “Angry Birds” on the PC before they learn to say “duh”, school children hobnob on Facebook, and many of us regularly browse, upload photos, watch videos and do a dozen other things on the internet.

So on one side of the internet is the user with a laptop, smartphone or iPad. What is on the other side, and what exactly is the Internet? The Internet is a global system of interconnected computer networks that use the TCP/IP protocol suite. The Internet, or more generally the internet, is a network of networks made up of hundreds of millions of computers.

In its early days the internet was mostly used for document retrieval, email and browsing. But with the passage of time the internet and its uses have assumed gigantic proportions. Nowadays we use the internet to search billions of documents, share photographs with our online communities, blog and stream video. So, while the early internet was populated with large computers performing these tasks, the computations of today’s internet require a substantially larger infrastructure. The internet is now powered by datacenters. Datacenters contain anywhere from hundreds to hundreds of thousands of servers. A server is a beefed-up computer designed for high performance, sans screen and keyboard. In a datacenter, servers are stacked one above another in racks.

These datacenters are capable of handling thousands of simultaneous users and delivering results in a split second. In this age of exploding data and information overload, where split-second responses and blazing throughputs are the need of the hour, datacenters fill a real need. But there is a dark side to these datacenters: they consume a lot of energy and are extremely power hungry. In fact, of the utility power supplied to a datacenter, only 6–12% is used for actual computation. The rest is either used for air conditioning or lost in power distribution.

In fact, a recent article, “Power, pollution and the Internet”, in the New York Times claims that “Worldwide, the digital warehouses use about 30 billion watts of electricity, roughly equivalent to the output of 30 nuclear power plants.” Further, the article states that “it is estimated that Google’s data centers consume nearly 300 million watts and Facebook’s about 60 million watts or 60 MW”.

For example, it is claimed that Facebook annually draws 509 million kilowatt-hours of power for its data centers (see “Estimate: Facebook running 180,000 servers”). That article further concludes “that the social network is delivering 54.27 megawatts (MW) to servers”, or approximately 60 MW to its datacenters. The other behemoths in this domain, including Google, Yahoo, Twitter, Amazon, Microsoft and Apple, all have equally large or larger data centers consuming similar amounts of energy. Recent guesstimates have placed Google’s server count at more than 1 million, consuming approximately 220 MW. Looking at the power generation capacities of power plants in India, 60 MW is between 20% and 50% of the generation capacity of some power plants, while 220 MW is the entire capacity of a medium-sized power plant (see “List of power stations in India”).
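
As a quick sanity check on the roughly 60 MW figure, spreading the quoted 509 million kWh over the hours in a year gives the average power draw. A tiny sketch of the arithmetic:

```python
# Back-of-the-envelope check of the ~60 MW figure quoted above:
# annual energy (kWh) divided by hours in a year gives average power (kW).
annual_kwh = 509_000_000          # figure quoted for Facebook's data centers
hours_per_year = 365 * 24         # 8760 hours
average_mw = annual_kwh / hours_per_year / 1000
print(f"Average draw ≈ {average_mw:.1f} MW")   # ≈ 58 MW, close to the quoted 60 MW
```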

One of the challenges these organizations face is the need to make the datacenter efficient. New techniques are constantly being tried in the ongoing battle to reduce energy consumption in a data center. These techniques are also designed to improve a data center’s Power Usage Effectiveness (PUE) rating. Google, Facebook, Yahoo and Microsoft compete to get to the lowest possible PUE in their newest data centers. Earlier datacenters averaged a PUE of around 2.0, while advanced data centers these days aim for ratings of the order of 1.22, 1.16 or lower.
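
For reference, PUE is simply the ratio of the total power drawn by the facility to the power actually delivered to the IT equipment, so a value closer to 1.0 is better. Here is a minimal sketch of that calculation; the facility figures below are made-up examples, not measured data.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    return total_facility_kw / it_equipment_kw

# An older facility: as much power lost to cooling and distribution as reaches the IT gear
print(pue(total_facility_kw=2000, it_equipment_kw=1000))   # 2.0

# A modern, efficient facility
print(pue(total_facility_kw=1160, it_equipment_kw=1000))   # 1.16
```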

In the early days of datacenter technology, air-conditioning systems cooled by brute force. Later designs segregated the aisles into hot and cold aisles to improve efficiency. Other techniques use water as a coolant along with heat exchangers. A novel technique was recently tried by Intel, in which servers were immersed in oil. While Intel claimed that this improved the PUE rating, there are questions about the viability of this method given the messiness of removing or inserting circuit boards in the servers.

Datacenters are going to proliferate in the coming days as information continues to explode. The hot new technology, “Cloud Computing”, is at its core datacenters that use virtualization, or the ability to run different OSes on the same hardware, to improve server utilization.

Clearly the thrust of technology in the days to come will be on identifying renewable sources of energy and making datacenters more efficient.

Datacenters, and the technologies that make them efficient, will become more and more prevalent as we move to a more data-driven world.


The data center paradox

In today’s globalized environment, organizations are spread geographically across the globe. Such globalization brings multiple advantages, ranging from quicker penetration into foreign markets to the cost advantage of a local workforce. It also results in the organization having data centers spread across different geographical areas. Besides, mergers and acquisitions of businesses spread across the globe result in hardware and server sprawl.

Applications on these dispersed servers tend to be silo’ed, with legacy hardware, different OSes and disparate software executing on them.

The cost of maintaining multiple data centers can be a prickly problem. There are several costs in managing a data center, chief among them operational costs, real estate costs, and power and cooling costs. Hardware and server sprawl is a real problem, and the enterprise must look for ways to solve it.

There are two techniques to manage hardware and server sprawl.

The first method is to use virtualization technologies so that hardware and server sprawl can be reduced. Virtualization abstracts the raw hardware through the use of special software called the hypervisor. Any guest OS, namely Windows, Linux or Solaris, can execute over the hypervisor. The key benefit that virtualization brings to the enterprise is that it abstracts the hardware, storage and network and creates a shared pool of compute, storage and network resources for the different applications to utilize. Hence server sprawl can be mitigated to some extent through the use of virtualization software such as VMware, Citrix XenServer, Hyper-V, etc.

The second method requires rationalization and server consolidation. This essentially means taking a hard look at the hardware infrastructure, the applications and their computing needs, and coming up with a solution in which more powerful mainframes or servers replace the existing, less powerful infrastructure. Consolidation has multiple benefits. Many distributed data centers can be replaced with a single consolidated data center built on today’s powerful multi-core, multi-processor servers. This results in greatly reduced operational costs, easier management, savings from reduced power and cooling requirements, real estate savings, and so on. Consolidation truly appears to be the “silver bullet” for server sprawl.
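
To see why consolidation is attractive, here is a rough, purely illustrative estimate of how many modern multi-core servers could absorb a sprawl of lightly utilized legacy servers. All the numbers are assumptions, not benchmarks.

```python
# Rough consolidation estimate; every figure here is an illustrative assumption.
legacy_servers = 400          # existing silo'ed servers
legacy_cores_each = 4
legacy_utilization = 0.10     # dedicated servers are typically lightly loaded

new_cores_each = 64           # modern multi-core, multi-processor server
target_utilization = 0.60     # leave headroom on the consolidated hosts

required_core_work = legacy_servers * legacy_cores_each * legacy_utilization
new_servers_needed = required_core_work / (new_cores_each * target_utilization)
print(f"Roughly {new_servers_needed:.0f} consolidated servers could replace {legacy_servers} legacy ones")
```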

However, this brings us to what I call “the data center paradox”. While a consolidated data center can do away with the operational expenses of multiple data centers, reduce power and cooling costs and save on real estate, it introduces WAN latencies. When geographically dispersed data centers are replaced with a consolidated data center in a single location, access from distant geographical areas can result in poor response times. Besides, there is also an inherent cost to data access over the WAN.

The WAN introduces latencies that are difficult to eliminate. There are technologies that can lessen the bandwidth problem to some extent; WAN optimization is one such technology.
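
To get a feel for the scale of these latencies, the sketch below estimates the best-case round-trip propagation delay over optical fibre for users far away from a single consolidated data center. The distance is an assumed example, and real round trips are longer because of routing, queuing and protocol overheads.

```python
def min_round_trip_ms(distance_km: float, fibre_speed_km_per_s: float = 200_000) -> float:
    """Best-case round-trip propagation delay over fibre (light travels at roughly 2/3 c)."""
    one_way_s = distance_km / fibre_speed_km_per_s
    return 2 * one_way_s * 1000

# e.g. users roughly 15,000 km away from the single consolidated data center
print(f"Best-case RTT ≈ {min_round_trip_ms(15_000):.0f} ms")   # ≈ 150 ms before any processing
```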

In fact, e-commerce sites and many web applications intentionally spread themselves across geographical regions to provide better response times.

So while on the one hand consolidation results in cost savings, more efficient management of a single data center, reduced power and cooling costs and real estate savings, on the other it results in WAN latencies and associated bandwidth costs.

Unless there is a breakthrough innovation in WAN technologies this will be a paradox that architects and CIOs will have to contend with.
