A Cloud medley with IBM Bluemix, Cloudant DB and Node.js

Published as a tutorial in IBM Cloudant – Bluemix tutorial and demos

Here is an interesting Cloud medley based on IBM’s Bluemix PaaS platform, Cloudant DB and Node.js. This application creates a web server using Node.js and uses REST APIs to perform CRUD operations on a Cloudant DB. Cloudant DB is a NoSQL Database as a Service (DBaaS) that can handle a wide variety of data types like JSON, full text and geo-spatial data. Documents are stored, indexed and distributed across an elastic datastore spanning racks and datacenters, and the data is replicated across datacenters. Cloudant allows one to work with self-describing JSON documents through RESTful APIs, making every document in the Cloudant database accessible as JSON via a URL.

This application on Bluemix uses REST APIs to perform the operations of inserting, updating, deleting and listing documents in the Cloudant DB. The code can be forked from DevOps Services at bluemix-cloudant, or cloned from GitHub at bluemix-cloudant.

1) Once the code is forked, the application can be deployed to Bluemix using

cf login -a https://api.ng.bluemix.net
cf push bm-cloudant -p . -m 512M

2) After this is successful, go to the Bluemix dashboard and add the Cloudant DB service. The CRUD operations can then be performed by invoking REST API calls from an appropriate REST client, such as SureUtils or Postman, in the browser of your choice.
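If you prefer the command line, the Cloudant service can also be created and bound with the cf CLI; the plan and service-instance names below are only indicative and may differ in your region or account:

cf create-service cloudantNoSQLDB Shared cloudant-books-db
cf bind-service bm-cloudant cloudant-books-db
cf restage bm-cloudant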

Here are the details of the Bluemix-Cloudant application

1) Once the Cloudant DB service has been added to the Web Starter Node.js application, we need to parse the process.env variable to obtain the URL of the Cloudant DB and the port and host to be used for the web server.

The Node.js web server is started with the port and host values obtained from process.env.

//Set up the DB connection: parse process.env for the port and host that we've been assigned
if (process.env.VCAP_SERVICES) {
    // Running on Bluemix. Parse the port and host that we've been assigned.
    var env = JSON.parse(process.env.VCAP_SERVICES);
    var host = process.env.VCAP_APP_HOST;
    var port = process.env.VCAP_APP_PORT;
    ....
}
....
// Create the web server and dispatch CRUD operations based on the REST verb
require('http').createServer(function(req, res) {
    // Insert documents
    if (req.method == 'POST') {
        insert_records(req, res);
    }
    // List documents
    else if (req.method == 'GET') {
        list_records(req, res);
    }
    // Update a document
    else if (req.method == 'PUT') {
        update_records(req, res);
    }
    // Delete a document
    else if (req.method == 'DELETE') {
        delete_record(req, res);
    }
}).listen(port, host);

2) Access to the Cloudant DB: Access to the Cloudant DB is obtained as follows.

// PouchDB serves as the local store; require it once at the top of the file
var pouchdb = require('pouchdb');
if (process.env.VCAP_SERVICES) {
    // Running on Bluemix. Parse the port and host that we've been assigned.
    var env = JSON.parse(process.env.VCAP_SERVICES);
    var host = process.env.VCAP_APP_HOST;
    var port = process.env.VCAP_APP_PORT;
    console.log('VCAP_SERVICES: %s', process.env.VCAP_SERVICES);
    // Also parse the Cloudant settings.
    var cloudant = env['cloudantNoSQLDB'][0]['credentials'];
}
// Create a local 'books' DB and point 'remote' at the 'books' DB on Cloudant
var db = new pouchdb('books'),
    remote = cloudant.url + '/books',
    opts = {
        continuous: true
    };
// Replicate the local DB to and from the remote Cloudant DB
console.log(remote);
db.replicate.to(remote, opts);
db.replicate.from(remote, opts);

Access to the Cloudant DB is obtained through the cloudant.url credential shown above.
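For reference, the parsed VCAP_SERVICES entry for the Cloudant service has roughly the shape sketched below (the values are placeholders, not real credentials); the code above simply picks out the credentials.url field:

{
  "cloudantNoSQLDB": [
    {
      "name": "Cloudant NoSQL DB-xx",
      "label": "cloudantNoSQLDB",
      "plan": "Shared",
      "credentials": {
        "username": "<user>",
        "password": "<password>",
        "host": "<user>.cloudant.com",
        "port": 443,
        "url": "https://<user>:<password>@<user>.cloudant.com"
      }
    }
  ]
}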

3) Once access to the DB is set up, we can perform CRUD operations. There are many options for the backend DB; in this application I have used PouchDB.

4) Inserting a document: To insert documents into the Cloudant DB through PouchDB we need to do the following.

var insert_records = function(req, res) {
    // Parse process.env for the port and host that we've been assigned
    if (process.env.VCAP_SERVICES) {
        // Running on Bluemix. Parse the port and host that we've been assigned.
        var env = JSON.parse(process.env.VCAP_SERVICES);
        var host = process.env.VCAP_APP_HOST;
        var port = process.env.VCAP_APP_PORT;
        console.log('VCAP_SERVICES: %s', process.env.VCAP_SERVICES);
        // Also parse the Cloudant settings.
        var cloudant = env['cloudantNoSQLDB'][0]['credentials'];
    }
    var db = new pouchdb('books'),
        remote = cloudant.url + '/books',
        opts = {
            continuous: true
        };
    // Replicate the DB to and from the remote Cloudant DB
    console.log(remote);
    db.replicate.to(remote, opts);
    db.replicate.from(remote, opts);
    // Put 3 documents into the DB
    db.put({
        author: 'John Grisham',
        Title: 'The Firm'
    }, 'book1', function(err, response) {
        console.log(err || response);
    });
    ...
    ...
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.write("3 documents inserted");
    res.end();
}; // End insert_records

The nice part about Cloudant DB is that you can access your database through a URL. The steps are shown below. Once your application is running, click on your application in the Bluemix dashboard.


Click on the Cloudant service.

Next, click on the “Launch” icon.


This should bring up the Cloudant dashboard. The database will be empty.


If you use a REST API client to send a POST call, the application will insert 3 documents.
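For instance, with curl (the URL assumes the application name from the cf push step and Bluemix's default mybluemix.net domain; substitute your own route):

curl -X POST http://bm-cloudant.mybluemix.net/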


The documents inserted can be seen by sending the GET REST API call.
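For example, with the same assumed route:

curl -X GET http://bm-cloudant.mybluemix.net/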


Since Cloudant DB lets you view your database through a URL, you can refresh the Cloudant dashboard and you should see the “books” database added. Clicking on this database, you should see the 3 documents that have been added.


If you click “Edit doc” you should see the details of the document.


5) Updating a document

The process to update a document in the database is shown below

// Update book3. Cloudant (and PouchDB) require the latest _rev of a document
// to update it, so fetch the document first and then put the new version.
db.get('book3', function(err, response) {
    if (err) {
        console.log("error " + err);
        return;
    }
    console.log(response);
    db.put({
        _id: 'book3',
        _rev: response._rev,
        author: response.author,
        Title: 'The da Vinci Code'
    }, function(err, response) {
        if (err) {
            console.log("error " + err);
        } else {
            console.log("Success " + JSON.stringify(response));
        }
    });
});

This is performed with a PUT REST API call.
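For example, with the same assumed route as before:

curl -X PUT http://bm-cloudant.mybluemix.net/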


Sending the GET REST API call again shows the updated list.


This can be further verified in the Cloudant DB dashboard for book3.


6) Deleting a document

The code to delete a document in PouchDB is shown below

// Deleting document book1: fetch it, then remove it
db.get('book1', function(err, doc) {
    db.remove(doc, function(err, response) {
        console.log(err || response);
    });
});

The REST call to delete a document is shown below.
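For example, with the same assumed route as before:

curl -X DELETE http://bm-cloudant.mybluemix.net/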



Checking the Cloudant dashboard, we see that only book2 and book3 are present and book1 has been deleted.


7) Displaying documents in the database

The code for displaying the list of documents is shown below

db.allDocs(function(err, response) {
    var val = response.total_rows;
    var details = "";
    var j = 0;
    for (var i = 0; i < val; i++) {
        db.get(response.rows[i].id, function(err, doc) {
            j++;
            details = details + JSON.stringify(doc.Title) + " by  " + JSON.stringify(doc.author) + "\n";
            // Kludge because of Node.js asynchronous handling. To be fixed - T V Ganesh
            if (j == val) {
                res.writeHead(200, {'Content-Type': 'text/plain'});
                res.write(details);
                res.end();
                console.log(details);
            }
        }); // End db.get
    } // End for
}); // End db.allDocs

If you happened to notice, I had to use a kludge to work around Node.js' idiosyncrasy in handling asynchronous calls. I was fooled by the remarkable similarity of Node.js, and hence JavaScript, to the C language into thinking that functions within functions would execute sequentially. However, I had to undergo much grief trying to get Node.js to work sequentially. I wanted to avoid the 'async' module but was unsuccessful in trying to code the callbacks myself, so the kludge! I will work this out eventually, but this workaround will have to do for now!
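For what it is worth, one way to sidestep the counter kludge without pulling in the 'async' module is PouchDB's include_docs option on allDocs, which returns all the documents in a single callback. Here is a minimal sketch, assuming the same db and res objects as in the listing code above:

// List all documents in one call instead of issuing a db.get() per row
db.allDocs({include_docs: true}, function(err, response) {
    if (err) {
        res.writeHead(500, {'Content-Type': 'text/plain'});
        res.end("Error: " + err);
        return;
    }
    var details = "";
    response.rows.forEach(function(row) {
        details = details + JSON.stringify(row.doc.Title) + " by  " + JSON.stringify(row.doc.author) + "\n";
    });
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.write(details);
    res.end();
});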

As always, you can use “Files and Logs” in the Bluemix dashboard to view any output that is written to stdout.

Note: As always, I can't tell you how useful the command 'cf logs <application name> --recent' is for debugging.

Hope you enjoyed this Cloud Medley of Bluemix, Cloudant and Node.js!

Disclaimer: This article represents the author’s viewpoint only and doesn’t necessarily represent IBM’s positions, strategies or opinions

You may also like
1. Brewing a potion with Bluemix, PostgreSQL & Node.js in the cloud
2.  A Bluemix recipe with MongoDB and Node.js
3.  Spicing up IBM Bluemix with MongoDB and NodeExpress
4.  Rock N’ Roll with Bluemix, Cloudant & NodeExpress



Introducing the Software Defined Computing Pattern

We are on the verge of a new ‘Software Defined’ revolution. The phrase ‘software defined’ refers to the ability to programmatically control computing elements, namely compute, storage and network. We are entering a bold, brave ‘software defined’ era. Before we delve into the ‘whats’ of this revolution I would like to outline the ‘whys’. What motivated this new thinking in computing?

Why ‘Software Defined’?

In the late 90s, IT infrastructure was unwieldy and unmanageable. Whenever new IT infrastructure had to be procured, there was the need to accurately size the required hardware, software, software licenses, routers, switches and storage elements. The problem in those days had to do with dimensioning: the CIO and IT managers had to be able to calculate the requisite hardware and software elements. If the estimate was too conservative, the infrastructure would be under-dimensioned and would not be able to handle the load. On the other hand, if it was over-dimensioned, hardware and software would lie idle and result in wasted resources and money. So it used to be a fine balancing act. Even if the IT managers got lucky and got the size right, it was quite likely that conditions in the enterprise would change, forcing them to take a fresh look at their infrastructure.

This problem of dimensioning IT infrastructure was effectively solved by a technology called ‘virtualization’. In the mid 1960s IBM created the CP-67 mainframe, which had the elements of virtualization. Much later, in 1998, VMware created VMware Workstation, which could run multiple operating systems (OSes). In essence, virtualization abstracts the hardware of the computer, storage and network ports through software known as the hypervisor. On top of the hypervisor, the user can run any operating system such as Windows, Linux or AIX. These OSes which run on top of the hypervisor are known as guest OSes. Virtualization technology also enables different virtual servers to share one physical server. This process, called server consolidation, helps to increase hardware utilization, load balancing and optimization of IT resources.

The ability to virtualize computer hardware triggered some major advancements in computing. Prior to virtualization, each server would run a single OS with a single application, leaving the server idle for close to 60% of the time. Virtualization made it possible for enterprises to run several OSes, each with its own application, on a single computer, so computing resources were used more effectively and efficiently.


Virtualization and the dotcom bust around the year 2000 effectively paved the way for a ‘Software Defined’ future. In other words, there was a need to control resources programmatically, aimed at more efficient utilization of those resources.

The move to the Cloud: Prior to the advent of the cloud, enterprises hosted their applications in their internal IT infrastructure using virtualization technology. With the pay-per-use, utility-style computing spearheaded by the likes of Amazon, many enterprises moved their applications to shared, multi-tenant (multiple customer), third-party hosting service providers, also known as cloud providers.

With the advent of cloud computing the software defined era made major advances. Here is the reason why. Computing as such stands on 3 main pillars: compute, storage and networking.

As mentioned earlier in the post, one of the thorny issues in procuring and managing IT infrastructure is the problem of dimensioning or right-sizing. Virtualization did solve this problem to some extent, but there was a need to provide more control to the user. This is where the ‘Software Defined’ technologies emerged. The ‘Software Defined’ paradigm is based on prudence and sound engineering judgment: the whole premise of making anything ‘software defined’ is to ensure that the resources allocated for any task (computing, storage or networking) are optimal. The idea is that resources should be allocated exactly as needed, and released into a shared, common pool when idle. Hence we have the advent of

  • Software Defined Compute
  • Software Defined Storage
  • Software Defined Network

Software Defined Compute (SDC): In the clouds these days it is possible to precisely control the computing elements that make up your application. You can choose your CPU type, CPU speed, hypervisor, OS, RAM size, disks etc. You can also provision your application to expand or contract elastically with the demands of the times rather than under-provisioning or over-provisioning; this is done through a process called auto scaling. The desired configuration can be controlled through APIs provided by the cloud provider.

Software Defined Storage (SDS): There are multiple storage technologies spanning DAS, SATA drives, SAN and NAS storage. These different storage technologies address different needs of price, storage capacity and performance. Software Defined Storage allows the user to control the type of storage needed for an application through software APIs. In storage, the initial allocation to each application is rather conservative; additional storage is assigned from a common pool of storage to the applications that need it the most. Once the storage is no longer needed it is reclaimed.

Software Defined Network (SDN): SDN is the result of pioneering efforts by Stanford University and the University of California, Berkeley, is based on the OpenFlow protocol, and represents a paradigm shift in the way networking elements operate. Software Defined Networks decouple the routing and switching of data flows from the control of those flows, which moves to a separate network element, namely the flow controller. The motivation is that the flow of data packets through the network can then be controlled programmatically, allowing multiple data streams to flow over the communication paths with each stream individually defined for speed, latency, QoS etc.

Software Defined Datacenter (SDDC): A datacenter has racks and racks of servers, storage boxes and networking equipment. A datacenter where one is able to provision, manage and operate this equipment through APIs or through programs is a Software Defined Datacenter. Imagine being able to put together a car with the body of a BMW, the interior of a Merc, the engine of a Ferrari and the electronics of a Tesla! That is what an SDDC allows you to do!

Software Defined Computing Pattern (SDCP): Once SDC, SDS and SDN reach a level of maturity, I think the next logical step will be a move to Software Defined Computing Patterns. This is what I mean by that: theoretically, we can reduce the different types of enterprise applications to a set of computing patterns, for example e-commerce, social network, email server, Web portal etc. The Software Defined Computing Pattern would allow the user to choose a computing pattern based on the enterprise application, resulting in the setting up of the appropriate computing resources, storage resources, middleware and networking elements in a cloud. The user would then need to host their applications on this environment. Here is a good link to cloud patterns.

In this context I would like to bring to your notice a parallel trend called Software Defined Architecture (SDA), a term coined by Gartner in 2014. The SDA gateway is responsible for virtualizing the internal APIs, protocols and models and exposing them to the outside as external APIs, user interfaces and resources.


The pace of progress in the last couple of years has been scorching, and the ability to solve most large problems through a Software Defined Computing Pattern is sure to happen.

The dark side of the Internet

Published in Telecom Asia 26 Sep 2012 – The dark side of the internet

Imagine a life without the internet. You can’t! That’s how inextricably enmeshed the internet is in our lives. Kids learn to play “angry birds” on the PC before they learn to say “duh”, school children hobnob on Facebook and many of us regularly browse, upload photos, watch videos and do a dozen other things on the internet.

So on one side of the internet is the user with a laptop, smartphone or iPad. So what is on the other side of the internet, and what is the Internet? The Internet is a global system of interconnected computer networks that use the TCP/IP protocol. The Internet, or more generally the internet, is a network of networks made up of hundreds of millions of computers.

In its early days the internet was mostly used for document retrieval, email and browsing. But with the passage of time the internet, and the uses of the internet, have assumed gigantic proportions. Nowadays we use the internet to search billions of documents, share photographs with our online community, blog and stream video. So, while the early internet was populated with large computers to perform these tasks, the computations of the internet of today require a substantially larger infrastructure. The internet is now powered by datacenters. Datacenters contain anywhere between hundreds and hundreds of thousands of servers. A server is a beefed-up computer designed for high performance, sans a screen and a keyboard. Datacenters contain servers stacked one above another in racks.

These datacenters are capable of handling thousands of simultaneous users and delivering results in a split second. In this age of exploding data and information overload, where split-second responses and blazing throughputs are the need of the hour, datacenters really fill the need. But there is a dark side to these data centers: they consume a lot of energy and are extremely power hungry. In fact, of the utility power supplied to a datacenter, only 6-12% is used for actual computation; the rest is either used for air conditioning or lost in power distribution.

In fact a recent article “Power, pollution and the Internet” in the New York Times claims that “Worldwide, the digital warehouses use about 30 billion watts of electricity, roughly equivalent to the output of 30 nuclear power plants.”  Further the article states that “it is estimated that Google’s data centers consume nearly 300 million watts and Facebook’s about 60 million watts or 60 MW”

For example, it is claimed that Facebook annually draws 509 million kilowatt-hours of power for its data centers (see Estimate: Facebook running 180,000 servers). This article further concludes “that the social network is delivering 54.27 megawatts (MW) to servers”, or approximately 60 MW to its datacenters. The other behemoths in this domain, including Google, Yahoo, Twitter, Amazon, Microsoft and Apple, all have equally large or larger data centers consuming similar amounts of energy. Recent guesstimates have placed Google’s server count at more than 1 million, consuming approximately 220 MW. Taking a look at the power generation capacities of power plants in India, we can see that 60 MW is between 20% and 50% of the generation capacity of many power plants, while 220 MW is the entire capacity of a medium-sized power plant (see “List of power stations in India”).
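As a rough back-of-the-envelope check on these numbers (a year has roughly 8,760 hours):

509,000,000 kWh / 8,760 h ≈ 58,100 kW ≈ 58 MW

which is consistent with the roughly 60 MW figure quoted above for the Facebook datacenters as a whole.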

One of the challenges these organizations face is the need to make the datacenter efficient. New techniques are constantly being applied in the ongoing battle to reduce energy consumption in a data center. These techniques are also designed to improve a data center’s Power Usage Effectiveness (PUE) rating. Google, Facebook, Yahoo and Microsoft compete to get to the lowest possible PUE in their newest data centers. Earlier datacenters used to average a PUE of 2.0, while advanced data centers these days aim for ratings of the order of 1.22 or 1.16 or lower.
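For context, PUE is simply the ratio of the total power drawn by the facility to the power actually delivered to the IT equipment, so lower is better and 1.0 is the theoretical ideal:

PUE = Total facility power / IT equipment power

For example, a facility drawing 1.2 MW from the grid to power 1.0 MW of servers has a PUE of 1.2, while a datacenter at the older average of 2.0 burns an extra watt on cooling and power distribution for every watt of useful computing.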

In the early days of datacenter technology the air-conditioning systems cooled by brute force. Later designs segregated the aisles into hot and cold aisles to improve efficiency. Other techniques use water as a coolant along with heat exchangers. A novel technique was used by Intel recently in which servers were dipped in oil. While Intel claimed that this improved the PUE rating, there are questions about the viability of this method considering the messiness of removing or inserting circuit boards in the servers.

Datacenters are going to proliferate in the coming days as information continues to explode. The hot new technology “cloud computing” is nothing more than datacenters that use virtualization, the ability to run different OSes on the same hardware, to improve server utilization.

Clearly the thrust of technology in the days to come will be on identifying renewable sources of energy and making datacenters more efficient.

Datacenters will become more and more prevalent on the internet, as will technologies to make them efficient, as we move to a more data-driven world.


The Next Frontier

Published in Telecom Asia – The next frontier, 21 Mar 2012

In his classic book “The Innovator’s Dilemma” Prof. Clayton Christensen of Harvard Business School presents several compelling cases of great organizations that fail because they did not address disruptive technologies, occurring in the periphery, with the unique mindset required in managing these disruptions.

In the book the author claims that when these disruptive technologies appeared on the horizon there were few takers because there were no immediate applications for them. For example, when the hydraulic excavator appeared, its performance was inferior to that of the predominant manual excavator. But in the course of time the technology behind hydraulic excavators improved significantly enough to displace the existing technology. Similarly, the 3.5-inch disk had no immediate takers among desktop computers but made its way into the laptop.

Similarly, the minicomputer giant Digital Equipment Corporation (DEC) ignored the advent of the PC era and focused all its attention on making more powerful minicomputers. This led to the ultimate demise of DEC and several other organizations in this space. The book includes several such examples of organizations that went defunct because disruptive technologies ended up cannibalizing their established technologies.

In the last couple of months we have seen technology trends pouring in. It is now accepted that cloud computing, mobile broadband, social networks, big data, LTE, smart grids and the Internet of Things will be key players in the world of our future. We are now at a point in time when serious disruption is not just possible but seems extremely likely. The IT market research firm IDC, in its Directions 2012, believes that we are on the cusp of a Third Platform that will dominate the IT landscape.

There are several technologies that have been appearing on the periphery and have gleaned only marginal interest, for example Super Wi-Fi or Whitespaces, which uses unlicensed spectrum to cover distances of up to 100 km. Whitespaces has been trialed by a few companies in the last year. Another interesting technology is WiMAX, which provides speeds of 40 Mbps over distances of up to 50 km. WiMAX’s deployment has been spotty and has not led to widespread adoption in comparison to its apparent competitor, LTE.

In the light of these technology entrants, the disruption in the near future may occur because of a paradigm shift which I would like to refer to as the “Neighborhood Area Computing (NAC)” paradigm. It appears that technology will veer towards neighborhood computing given the bandwidth congestion issues of the WAN. A neighborhood area network (NAN) will supplant the WAN for networks which address a community in a smaller geographical area.

This will lead to three main trends

Neighborhood Area Networks (NAN): Major improvements in Neighborhood Area Networks (NAN) are inevitable given the rising importance of smart grids and M2M technology in the context of WAN latencies. Residential homes of the future will have a Home Area Network (HAN) based on Bluetooth or Zigbee protocols connecting all electrical appliances. In a smart grid context the NAN provides the connectivity between the Home Area Network (HAN) of a future Smart Home and the WAN network. While it is possible that the utility HAN network will be separate from the IP access network of the residential subscriber, the more likely possibility is that the HAN will be a subnet within the home network and will connect to the NAN network.

The data generated from smart grids, M2M networks and mobile broadband will need to be stored and processed immediately through big data analytics in a neighborhood datacenter. Shorter-range technologies like WiMAX and Super WiFi/Whitespaces will transport the data to a neighborhood cloud on which Hadoop-based big data analytics will provide real-time analytics.

Death of the Personal Computer: The PC/laptop will soon give way to a cloud-based computing platform similar to Google’s Chromebook. Not only will we store all our data on the cloud (music, photos, videos), we will also use the cloud for our daily computing needs. Given the high speeds of the NAN, this should be quite feasible in the future. The cloud will remove our worries about virus attacks, patch updates and the need to buy new software. We will also begin to trust our data in the cloud as we progress into the future. Moreover, the pay-per-use model will be very attractive to consumers.

Exploding Datacenters: As mentioned above, a serious drawback of the cloud is WAN latency. It is quite likely that with increases in processing power and storage capacity, coupled with dropping prices, cloud providers will have hundreds of data centers with around 1,000 servers for each city rather than a few mega data centers with tens of thousands of servers. These data centers will address the computing needs of a community in a small geographical area. Such smaller data centers, typically in a small city, will solve two problems: they will build geographical redundancy into the cloud, and they will provide excellent performance, as NAN latencies will be significantly lower than WAN latencies.

These technologies will improve significantly and fill the need for handling high-speed neighborhood data.

The future definitely points to computing in the neighborhood.


The data center paradox

In today’s globalized environment, organizations are spread geographically across the globe. Such globalization brings multiple advantages, ranging from quicker penetration into foreign markets to the cost advantage of a local workforce. This globalization also results in the organization having data centers spread across different geographical areas. Besides, mergers and acquisitions of businesses spread across the globe result in hardware and server sprawl.

Applications on these dispersed servers tend to be siloed, with legacy hardware, different OSes and disparate software executing on them.

The cost of maintaining different data centers can be a prickly problem. There are different costs in managing a data center, chief among them operational costs, real estate costs, and power and cooling costs. Hardware and server sprawl is a real problem, and the enterprise must look for ways to solve it.

There are two techniques to manage hardware and server sprawl.

The first method is to use virtualization technologies so that hardware and server sprawl can be reduced. Virtualization techniques abstract the raw hardware through the use of special software called the hypervisor. Any guest OS, such as Windows, Linux or Solaris, can execute over the hypervisor. The key benefit that virtualization brings to the enterprise is that it abstracts the hardware, storage and network and creates a shared pool of compute, storage and network resources for the different applications to utilize. Hence server sprawl can be mitigated to some extent through the use of virtualization software such as VMware, XenApp, Hyper-V etc.

The second method requires rationalization and server consolidation. This essentially means taking a hard look at the hardware infrastructure, the applications and their computing needs, and coming up with a solution involving more powerful mainframes or servers that can replace the existing, less powerful infrastructure. Consolidation has multiple benefits. Many distributed data centers can be replaced with a single consolidated data center built on today’s powerful multi-core, multi-processor servers. This results in greatly reduced operational costs, easier management, and savings from the reduced need for power, cooling and real estate. Consolidation truly appears to be the “silver bullet” for server sprawl.

However, this brings us to what I call “the data center paradox”. While a consolidated data center can do away with the operational expenses of multiple data centers, reduce power and cooling costs and save on real estate, it introduces WAN latencies. When geographically dispersed data centers across the globe are replaced with a consolidated data center in a single location, the access times from different geographical areas can result in poor response times and latencies. Besides, there is also an inherent cost to data access over the WAN network.

The WAN introduces latencies that are difficult to eliminate. There are technologies that can lessen the bandwidth problem to some extent; the WAN optimizer is one such technology.

In fact, e-commerce sites and many web applications intentionally spread their applications across geographical regions to provide better response times.

So while on the one hand consolidation results in cost savings, more efficient management of a single data center, reduced power and cooling costs and real estate savings, on the other hand it results in WAN latencies and the associated bandwidth costs.

Unless there is a breakthrough innovation in WAN technologies this will be a paradox that architects and CIOs will have to contend with.
