Bend it like Bluemix, MongoDB with auto-scaling – Part 3

In this last post of the series, I test the performance of Bluemix and MongoDB against concurrent queries and deletes to the cloud-based app, with auto-scaling on. Before I started this series of tests I moved the Overload policy a couple of notches higher, making it scale out if memory utilization > 75% for 120 secs and scale in if memory utilization < 30% for 120 secs (up from the earlier 55% memory utilization threshold), as shown below.

[Screenshot: Updated Overload (memory) scaling policy]
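The gist of the revised policy can be summarised as a plain object – a hypothetical illustration only, not the Auto-Scaling service's actual policy format:

// Hypothetical summary of the scaling rules used for this test.
// This is an illustration only, NOT the Bluemix Auto-Scaling service's policy schema.
var scalingPolicy = {
    metric: "Memory",
    scaleOut: { thresholdPercent: 75, durationSecs: 120 },  // add an instance
    scaleIn:  { thresholdPercent: 30, durationSecs: 120 }   // remove an instance
};
console.log(JSON.stringify(scalingPolicy, null, 2));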

The code for the bluemixMongo app can be forked from Devops at bluemixMongo or cloned from GitHub at bluemix-mongo-autoscale. The multi-mechanize scripts can be downloaded from GitHub at multi-mechanize. Before starting the testing I checked the current number of documents inserted by the concurrent inserts (see Bend it like Bluemix, MongoDB using Auto-scaling – Part 2). The total number, as determined by checking the logs, was 1380.

[Screenshot: Log output showing 1380 documents]

Sure enough, with the scaling policy change, after 2 minutes the number of instances dropped from 3 to 2.

[Screenshot: Instances scaled down from 3 to 2]

1. Querying the bluemixMongo app with Multi-mechanize

The Multi-mechanize Python script used for querying the bluemixMongo app simply invokes the app's userlist URL (resp = br.open("http://bluemixmongo.mybluemix.net/userlist/")).

v_user.py

def run(self):
    # create a Browser instance
    br = mechanize.Browser()
    # don't bother with robots.txt
    br.set_handle_robots(False)
    # start the timer
    start_timer = time.time()
    # print("Display userlist")
    # Display 5 random documents
    resp = br.open("http://bluemixmongo.mybluemix.net/userlist/")
    assert("Example Mongo Page" in resp.get_data())
    # stop the timer
    latency = time.time() - start_timer
    self.custom_timers["Userlist"] = latency
    r = random.uniform(1, 2)
    time.sleep(r)
    self.custom_timers['Example_Timer'] = r

The configuration setup for this script creates 2 sets of 10 concurrent threads

config.cfg
[global]
run_time = 300
rampup = 0
results_ts_interval = 10
progress_bar = on
console_logging = off
xml_report = off
[user_group-1]
threads = 10
script = v_user.py
[user_group-2]
threads = 10
script = v_user.py

The corresponding userlist.js for querying the app is shown below. Here the query is constructed as a regular expression from a random first name, made up of a random letter and a random number. The query is also limited to 5 documents.

function(callback)
{
    // Display a random set of 5 records based on a regular expression made with a random letter and number
    var randnum = Math.floor((Math.random() * 10) + 1);
    var alpha = ['A','B','C','D','E','F','G','H','I','J','K','L','M',
                 'N','O','P','Q','R','S','T','U','V','W','X','Y','Z'];
    var randletter = alpha[Math.floor(Math.random() * alpha.length)];
    var val = randletter + ".*" + randnum + ".*";
    // Limit the display to 5 documents
    collection.find({"FirstName": new RegExp(val)}).limit(5).toArray(function(err, items) {
        if (err) {
            console.log(err + " Error getting items for display");
        }
        else {
            res.render('userlist', { "userlist" : items }); // end res.render
        } // end else
        db.close(); // Ensure that the open connection is closed
    }); // end toArray function
    callback(null, 'two');
}

2. Running the userlist query

The following screenshot shows the userlist query being executed concurrently with Multi-mechanize. Note that the number of  instances also drops down to 1

[Screenshot: Userlist queries executing concurrently; instances drop to 1]

3. Deleting documents with Multi-mechanize

The multi-mechanize script for deleting a document is shown below. This script calls the URL with resp = br.open("http://bluemixmongo.mybluemix.net/remuser"). No values need to be entered into the form and the 'submit' is simulated.

v_user.py
def run(self):
    # create a Browser instance
    br = mechanize.Browser()
    # don't bother with robots.txt
    br.set_handle_robots(False)
    br.addheaders = [("User-agent", "Mozilla/5.0Compatible")]
    # start the timer
    start_timer = time.time()
    # submit the request
    resp = br.open("http://bluemixmongo.mybluemix.net/remuser")
    #resp = br.open("http://localhost:3000/remuser")
    resp.read()
    # stop the timer
    latency = time.time() - start_timer
    # store the custom timer
    self.custom_timers["Load_Front_Page"] = latency
    # think-time
    time.sleep(2)
    # select first (zero-based) form on page
    br.select_form(nr=0)
    # set form fields
    br.form["firstname"] = ""
    br.form["lastname"] = ""
    br.form["mobile"] = ""
    # start the timer
    start_timer = time.time()
    # submit the form
    resp = br.submit()
    resp.read()
    print("Removed")
    # stop the timer
    latency = time.time() - start_timer
    # store the custom timer
    self.custom_timers["Delete"] = latency
    # think-time
    time.sleep(2)

config.cfg

The config file is set to start 2 sets of 10 concurrent threads and execute for 300 secs

[global]
run_time = 300
rampup = 0
results_ts_interval = 10
progress_bar = on
console_logging = off
xml_report = off
[user_group-1]
threads = 10
script = v_user.py
[user_group-2]
threads = 10
script = v_user.py

deleteuser.js

This Node.js snippet fetches a single document with findOne() and removes it with the 'justOne' option set to true.

collection.findOne(function(err, item) {
    // Delete just a single record
    collection.remove(item, {justOne: true}, function(err, doc) {
        if (err) {
            // If it failed, return error
            res.send("There was a problem removing the information from the database.");
        }
        else {
            // If it worked redirect to userlist
            res.location("userlist");
            // And forward to success page
            res.redirect("userlist");
        }
    });
});
collection.find().toArray(function(err, items) {
    console.log("Length =----------------" + items.length);
    db.close();
});
callback(null, 'two');

4. Running the deleteuser multimechanize script

The output of the script execution, and the reduction in the number of instances due to the change in the memory utilization policy, is shown below.

[Screenshot: deleteuser script output; instance count reduced]

5. Multi-mechanize

As mentioned in the previous posts, the multi-mechanize commands are executed as follows:
To create a new project
multimech-newproject.exe userlist
This will create 2 folders a) results b) test_scripts and the file c) config.cfg. The v_user.py needs to be updated as required

To run the script
multimech-run.exe userlist

The details of the response times for the query are shown below.

[Graph: All_Transactions_response_times_intervals]

 

More details on latency and throughput for the queries and the deletes are included in the results folder of multi-mechanize.

6. Autoscaling

The details of the auto-scaling service are shown below

a. Scaling Metric Statistics

[Screenshot: Scaling Metric Statistics]

b. Scaling history

[Screenshot: Scaling history]

7. Monitoring and Analytics (M & A)

The output from M & A is shown below

 

a. Performance Monitoring

[Screenshot: Performance Monitoring]

b. Log Analysis output

The log analysis gives a detailed report on the calls made to the app, the console log output and other useful information.

[Screenshot: Log Analysis output]

This series of 3 posts, Bend it like Bluemix, MongoDB with auto-scaling, demonstrated the ability of the cloud to expand and shrink based on load. An important requirement for cloud architects is to design applications that can scale horizontally without impacting performance while keeping costs optimum. The real challenge with auto-scaling is the need to make the application truly distributed, as opposed to the monolithic architectures we are generally used to. I hope to write another post on creating a simple distributed application later.

Hasta la Vista!

Also see
1.  Bend it like Bluemix, MongoDB with autoscaling – Part 1
2. Bend it like Bluemix, MongoDB with autoscaling – Part 2

You may like :
a) Latency, throughput implications for the cloud
b) The many faces of latency
c) Brewing a potion with Bluemix, PostgreSQL & Node.js in the cloud
d)  A Bluemix recipe with MongoDB and Node.js
e) Spicing up IBM Bluemix with MongoDB and NodeExpress
f) Rock N’ Roll with Bluemix, Cloudant & NodeExpress

Disclaimer: This article represents the author’s viewpoint only and doesn’t necessarily represent IBM’s positions, strategies or opinions

Bend it like Bluemix, MongoDB using Auto-scaling – Part 2!

This post takes off from my previous post Bend it like Bluemix, MongoDB using Auto-scale – Part 1! In this post I generate traffic using Multi-Mechanize, a performance test framework, and check out auto-scaling on Bluemix, besides doing some rudimentary checks on the latency and throughput of this test application. In this particular post I generate concurrent threads which insert documents into MongoDB.

Note: As mentioned in my earlier post, this is more of a prototype and not the typical situation when architecting cloud applications. Clearly I have not optimized my cloud app (bluemixMongo) for maximum efficiency. Also, this is a simple 2-tier application with a rudimentary Web interface and a NoSQL DB at the back end. This is more of a Proof of Concept (PoC) for the auto-scaling service on Bluemix.

As mentioned earlier, the bluemixMongo app is a modification of my earlier post Spicing up an IBM Bluemix cloud app with MongoDB and NodeExpress. The bluemixMongo cloud app that was used for this auto-scaling test can be forked from Devops at bluemixMongo or from GitHub at bluemix-mongo-autoscale. The Multi-mechanize config file, scripts and results can be found at GitHub in multi-mechanize.

The document to be inserted into MongoDB consists of 3 fields – Firstname, Lastname and Mobile. To simulate the insertion of records into MongoDB I created a Multi-Mechanize script that generates a random combination of letters and numbers for the first and last names and a random 9-digit number for the mobile. The code for this script is shown below.

1. The snippet below measures the latency for loading the 'New User' page.

v_user.py
def run(self):
    # create a Browser instance
    br = mechanize.Browser()
    # don't bother with robots.txt
    br.set_handle_robots(False)
    print("Rendering new user")
    br.addheaders = [("User-agent", "Mozilla/5.0Compatible")]
    # start the timer
    start_timer = time.time()
    # submit the request
    resp = br.open("http://bluemixmongo.mybluemix.net/newuser")
    #resp = br.open("http://localhost:3000/newuser")
    resp.read()
    # stop the timer
    latency = time.time() - start_timer
    # store the custom timer
    self.custom_timers["Load Add User Page"] = latency
    # think-time
    time.sleep(2)

The script also measures the time taken to submit the form containing the Firstname, Lastname and Mobile

# select first (zero-based) form on page
br.select_form(nr=0)
# Create random Firstname
a = (''.join(random.choice(string.ascii_uppercase) for i in range(5)))
b = (''.join(random.choice(string.digits) for i in range(5)))
firstname = a + b
# Create random Lastname
a = (''.join(random.choice(string.ascii_uppercase) for i in range(5)))
b = (''.join(random.choice(string.digits) for i in range(5)))
lastname = a + b
# Create a random mobile number
mobile = (''.join(random.choice(string.digits) for i in range(9)))
# set form field
br.form["firstname"] = firstname
br.form["lastname"] = lastname
br.form["mobile"] = mobile
# start the timer
start_timer = time.time()
# submit the form
resp = br.submit()
print("Submitted.")
resp.read()
# stop the timer
latency = time.time() - start_timer
# store the custom timer
self.custom_timers["Add User"] = latency

2. The config.cfg file is set up to generate 2 asynchronous thread pools of 10 threads for about 400 seconds.

config.cfg
[global]
run_time = 400
rampup = 0
results_ts_interval = 10
progress_bar = on
console_logging = off
xml_report = off
[user_group-1]
threads = 10
script = v_user.py
[user_group-2]
threads = 10
script = v_user.py

3. The code to add a new user in the app (adduser.js) uses the ‘async’ Node module to enforce sequential processing.

adduser.js
async.series([
    function(callback)
    {
        collection = db.collection('phonebook', function(error, response) {
            if (error) {
                return; // Return immediately
            }
            else {
                console.log("Connected to phonebook");
            }
        });
        callback(null, 'one');
    },
    function(callback)
    {
        // Insert the record into the DB
        collection.insert({
            "FirstName" : FirstName,
            "LastName" : LastName,
            "Mobile" : Mobile
        }, function (err, doc) {
            if (err) {
                // If it failed, return error
                res.send("There was a problem adding the information to the database.");
            }
            else {
                // If it worked, redirect to userlist - Display users
                res.location("userlist");
                // And forward to success page
                res.redirect("userlist");
            }
        });
        collection.find().toArray(function(err, items) {
            console.log("**************************>>>>>>>Length =" + items.length);
            db.close(); // Make sure that the open DB connection is closed
        });
        callback(null, 'two');
    }
]);

4. To check out auto-scaling, the instance memory was kept at 128 MB. The scale-up policy was memory based, triggered when the memory of the instance exceeded 55% of 128 MB (about 70 MB) for 120 secs. Scale-up based on CPU utilization was set to happen when the utilization exceeded 80% for 300 secs.

[Screenshot: Scaling policy – memory threshold at 55% of 128 MB]

5. Check the auto-scaling policy

[Screenshot: Auto-scaling policy]

6. Initially as seen there is just a single instance

[Screenshot: A single instance running]

7. At around 48% of the script run, with around 623 transactions, the number of instances is increased by 1. Note that the available memory decreases by 128 MB, from 640 MB to 512 MB.

[Screenshot: Instance count increased to 2; available memory at 512 MB]

8. At around 1324 transactions another instance is added

Note: Bear in mind

a) The memory threshold was artificially brought down to 55% of 128 MB.
b) The app itself is not optimized for maximum efficiency.

[Screenshot: A third instance added at around 1324 transactions]

9. The Metric Statistics tab for the Autoscaling service shows this memory breach and the trigger for autoscaling

[Screenshot: Metric Statistics showing the memory breach]

10. The Scaling history Tab for the Auto-scaling service displays the scale-up and scale-down and the policy rules based on which the scaling happened

[Screenshot: Scaling history]

11. The response times and throughput are captured in the results folder of the Multi-mechanize tool.

The multi-mechanize commands are executed as follows
To create a new project
multimech-newproject.exe adduser
This will create 2 folders a) results b) test_scripts and the file c) config.cfg. The v_user.py needs to be updated as required

To run the script
multimech-run.exe adduser

12. The results are shown below

a) Load Add User page (Latency)

[Graph: Load Add User Page_response_times_intervals]

b) Load Add User (Throughput)

[Graph: Load Add User Page_throughput]

c) Add User (Latency)

[Graph: Add User_response_times_intervals]

d) Add User (Throughput)

[Graph: Add User_throughput]

The detailed results can be seen at GitHub at multi-mechanize

13. Check the Monitoring and Analytics Page

a) Availability

[Screenshot: Availability]

b) Performance monitoring

[Screenshot: Performance monitoring]

So once auto-scaling happens, the application can be fine-tuned for performance. Obviously one could do it the other way around too.

As can be seen, adding NoSQL databases like MongoDB, Redis, Cloudant DB etc. to a Bluemix app is straightforward. Setting up the auto-scaling policy is also painless, as seen above.

Of course the real challenge in cloud applications is to make them distributed and scalable while keeping the applications themselves lean and mean!

Also see
1.  Bend it like Bluemix, MongoDB with autoscaling – Part 1
3. Bend it like Bluemix, MongoDB with autoscaling – Part 3

You may like :
a) Latency, throughput implications for the cloud
b) The many faces of latency
c) Brewing a potion with Bluemix, PostgreSQL & Node.js in the cloud
d)  A Bluemix recipe with MongoDB and Node.js
e) Spicing up IBM Bluemix with MongoDB and NodeExpress
f) Rock N’ Roll with Bluemix, Cloudant & NodeExpress

g) Design principles of scalable, distributed systems

Disclaimer: This article represents the author’s viewpoint only and doesn’t necessarily represent IBM’s positions, strategies or opinions

A Cloud medley with IBM Bluemix, Cloudant DB and Node.js

Published as a tutorial in IBM Cloudant – Bluemix tutorial and demos

Here is an interesting Cloud medley based on IBM's Bluemix PaaS platform, Cloudant DB and Node.js. This application creates a Webserver using Node.js and uses REST APIs to perform CRUD operations on a Cloudant DB. Cloudant DB is a NoSQL Database as a Service (DBaaS) that can handle a wide variety of data types like JSON, full text and geo-spatial data. The documents are stored, indexed and distributed across an elastic datastore spanning racks and datacenters, and replicated across datacenters. Cloudant allows one to work with self-describing JSON documents through RESTful APIs, making every document in the Cloudant database accessible as JSON via a URL.

This application on Bluemix uses REST APIs to perform the operations of inserting, updating, deleting and listing documents on the Cloudant DB. The code can be forked from Devops at bluemix-cloudant. The code can also be cloned from GitHub at bluemix-cloudant.

1) Once the code is forked the application can be deployed on to Bluemix using

cf login -a https://api.ng.bluemix.net
cf push bm-cloudant -p . -m 512M

2) After this is successful, go to the Bluemix dashboard and add the Cloudant DB service. The CRUD operations can be performed by invoking REST API calls using an appropriate REST client like SureUtils or Postman in the browser of your choice.
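As a rough sketch of how these calls line up with the app (assuming the app name bm-cloudant pushed above resolves to bm-cloudant.mybluemix.net, and using the method-to-operation mapping implemented in the request handler shown further below), the four operations could also be exercised from Node.js:

// Hypothetical client sketch: exercising the app's CRUD routes with plain HTTP requests.
// The hostname is assumed from the 'cf push bm-cloudant' step above; the app dispatches
// on the HTTP method (POST/GET/PUT/DELETE), not on the path.
var http = require('http');

function call(method) {
    var req = http.request({ hostname: 'bm-cloudant.mybluemix.net', path: '/', method: method },
        function(res) {
            var body = '';
            res.on('data', function(chunk) { body += chunk; });
            res.on('end', function() { console.log(method, '->', res.statusCode, body); });
        });
    req.end();
}

// The calls are asynchronous, so when testing run them one at a time.
call('POST');      // insert the 3 sample documents
// call('GET');    // list the documents
// call('PUT');    // update book3
// call('DELETE'); // delete book1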

Here are the details of the Bluemix-Cloudant application

1) Once the Cloudant DB service has been added, the Node.js Web application needs to parse the process.env variable to obtain the URL of the Cloudant DB and the port and host to be used for the Web server.

The Node.js Webserver is started based on the port and host values obtained from process.env

require('http').createServer(function(req, res) {
    // Set up the DB connection
    if (process.env.VCAP_SERVICES) {
        // Running on Bluemix. Parse for the port and host that we've been assigned.
        var env = JSON.parse(process.env.VCAP_SERVICES);
        var host = process.env.VCAP_APP_HOST;
        var port = process.env.VCAP_APP_PORT;
        ....
    }
    ....
    // Perform CRUD operations through REST APIs
    // Insert document
    if (req.method == 'POST') {
        insert_records(req, res);
    }
    // List documents
    else if (req.method == 'GET') {
        list_records(req, res);
    }
    // Update a document
    else if (req.method == 'PUT') {
        update_records(req, res);
    }
    // Delete a document
    else if (req.method == 'DELETE') {
        delete_record(req, res);
    }
}).listen(port, host);

2) Access to the Cloudant DB

Access to the Cloudant DB is obtained as follows

if (process.env.VCAP_SERVICES) {
    // Running on Bluemix. Parse the port and host that we've been assigned.
    var env = JSON.parse(process.env.VCAP_SERVICES);
    var host = process.env.VCAP_APP_HOST;
    var port = process.env.VCAP_APP_PORT;
    console.log('VCAP_SERVICES: %s', process.env.VCAP_SERVICES);
    // Also parse Cloudant settings.
    var cloudant = env['cloudantNoSQLDB'][0]['credentials'];
}
var db = new pouchdb('books'),
    remote = cloudant.url + '/books';
opts = {
    continuous: true
};
// Replicate the DB to remote
console.log(remote);
db.replicate.to(remote, opts);
db.replicate.from(remote, opts);

Access to the Cloudant DB is through the cloudant.url shown above

3) Once access to the DB is set up we can perform CRUD operations. There are many options for working with the backend DB. In this application I have used PouchDB.

4) Inserting a document: To insert documents into the Cloudant DB using PouchDB we need to do the following

var insert_records = function(req, res) {
    // Parse the process.env for the port and host that we've been assigned
    if (process.env.VCAP_SERVICES) {
        // Running on Bluemix. Parse the port and host that we've been assigned.
        var env = JSON.parse(process.env.VCAP_SERVICES);
        var host = process.env.VCAP_APP_HOST;
        var port = process.env.VCAP_APP_PORT;
        console.log('VCAP_SERVICES: %s', process.env.VCAP_SERVICES);
        // Also parse Cloudant settings.
        var cloudant = env['cloudantNoSQLDB'][0]['credentials'];
    }
    var db = new pouchdb('books'),
        remote = cloudant.url + '/books';
    opts = {
        continuous: true
    };
    // Replicate the DB to remote
    console.log(remote);
    db.replicate.to(remote, opts);
    db.replicate.from(remote, opts);
    // Put 3 documents into the DB
    db.put({
        author: 'John Grisham',
        Title : 'The Firm'
    }, 'book1', function (err, response) {
        console.log(err || response);
    });
    ...
    ...
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.write("3 documents inserted");
    res.end();
}; // End insert_records

The nice part about Cloudant DB is that you can access your database through the URL. The steps are shown below. Once your application is running, click on your application. You should see the screen as below.

[Screenshot: Application overview in the Bluemix dashboard]

Click on Cloudant as shown by the arrow.

Next click on the "Launch" icon

[Screenshot: Cloudant service tile with the Launch icon]

This should bring up the Cloudant dashboard. The database will be empty.

[Screenshot: Empty Cloudant dashboard]

If you use a REST API Client to send a POST API call then the Application will insert 3 documents.

[Screenshot: POST REST API call inserting 3 documents]

The documents inserted can be seen by sending the GET REST API call.

[Screenshot: GET REST API call listing the documents]

The nice part of Cloudant DB is that you can use the URL to see your database. If you refresh your screen you should see the “books” database added. Clicking this database you should see the 3 documents that have been added

[Screenshot: The 'books' database with 3 documents in the Cloudant dashboard]

If you click “Edit doc” you should see the details of the document

[Screenshot: Document details in the Edit doc view]

5) Updating a document

The process to update a document in the database is shown below

// Update book3
db.get('book3', function(err, response) {
    console.log(response);
    return db.put({
        _id: 'book3',
        _rev: response._rev,
        author: response.author,
        Title : 'The da Vinci Code'
    });
}, function(err, response) {
    if (err) {
        console.log("error " + err);
    } else {
        console.log("Success " + response);
    }
});

This is performed with a PUT REST API call

[Screenshot: PUT REST API call updating book3]

The updated list is shown below

[Screenshot: Updated document list]

This can be further verified in the Cloudant DB dashboard for book3.

[Screenshot: Cloudant dashboard showing the updated book3]

6) Deleting a document

The code to delete a document in PouchDB is shown below

// Deleting document book1
db.get('book1', function(err, doc) {
    db.remove(doc, function(err, response) {
        console.log(err || response);
    });
});

The REST calls to delete a document and the result  are shown below

[Screenshot: DELETE REST API call]

[Screenshot: Result of the delete]

Checking the Cloudant dashboard we see that only book2 & book3 are present and book1 has been deleted

[Screenshot: Cloudant dashboard showing only book2 and book3]

7) Displaying documents in the database

The code for displaying the list of documents is shown below

var docs = db.allDocs(function(err, response) {
    val = response.total_rows;
    var details = "";
    j = 0;
    for (i = 0; i < val; i++) {
        db.get(response.rows[i].id, function(err, doc) {
            j++;
            details = details + JSON.stringify(doc.Title) + " by  " + JSON.stringify(doc.author) + "\n";
            // Kludge because of Node.js asynchronous handling. To be fixed - T V Ganesh
            if (j == val) {
                res.writeHead(200, {'Content-Type': 'text/plain'});
                res.write(details);
                res.end();
                console.log(details);
            }
        }); // End db.get
    } // End for
}); // End db.allDocs

If you happened to notice, I had to use a kludge to work around Node.js' idiosyncrasy of handling asynchronous calls. I was fooled by the remarkable similarity of Node.js, and hence JavaScript, to the C language into thinking that functions within functions would execute sequentially. However I had to undergo much grief trying to get Node.js to work sequentially. I wanted to avoid the 'async' module but was unsuccessful with trying to code callbacks. So the kludge! I will work this out eventually but this workaround will have to do for now!
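For what it is worth, here is one way the listing could be sketched without the counter, by letting PouchDB's allDocs return each document inline via the include_docs option. This is an alternative sketch, not the code used in the app above:

// Alternative sketch (not the code used above): allDocs with include_docs returns the
// documents themselves, so there is no per-document db.get() and no counter kludge.
db.allDocs({ include_docs: true }, function(err, response) {
    if (err) {
        res.send("There was a problem listing the documents.");
        return;
    }
    var details = "";
    response.rows.forEach(function(row) {
        details += JSON.stringify(row.doc.Title) + " by " + JSON.stringify(row.doc.author) + "\n";
    });
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.write(details);
    res.end();
});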

As always you can use "Files and Logs" in the Bluemix dashboard to get any output that is written to stdout.

Note: As always, I can't tell you how useful the command
cf logs <application name> --recent
is for debugging.

Hope you enjoyed this Cloud Medley of Bluemix, Cloudant and Node.js!

Disclaimer: This article represents the author’s viewpoint only and doesn’t necessarily represent IBM’s positions, strategies or opinions

You may also like
1. Brewing a potion with Bluemix, PostgreSQL & Node.js in the cloud
2.  A Bluemix recipe with MongoDB and Node.js
3.  Spicing up IBM Bluemix with MongoDB and NodeExpress
4.  Rock N’ Roll with Bluemix, Cloudant & NodeExpress



Natural selection of database technology through the years

Charles Darwin in his landmark book "The Origin of Species" discusses how flora and fauna evolved through the centuries. The different features of each individual species would undergo a process of natural selection by which modification of attributes would naturally occur that would enable the species to adapt and propagate through time. Those modifications that failed to adapt would naturally become extinct.

In this post I discuss how database (DB) technology has evolved over the years. As new requirements arose, database technology has had to adapt and newer paradigms have evolved. However, unlike species which became extinct, the older versions still exist as they continue to address the earlier problems that remain today.

Here is a short & brief history of evolution of databases

Relational databases: Relational databases had their genesis when E.F Codd of IBM came up with a relational model of organizing data. In this model all data is organized as tables with several rows. In a relational model each row has several columns and one of the columns contains a unique value for each row, called the primary key. Relational databases have ruled the enterprise domain for more than 3 decades. An enterprise's data is organized as a set of related tables. Users can query the database using Structured Query Language or SQL.

I remember in the late 1980’s when I started to work in the industry, programming jobs were much sought after by all of us engineering graduates. In those days database jobs were ‘uncool’ and system programming jobs dealing with writing assemblers, compilers were the really cool jobs. I was also susceptible to this prevailing opinion and stayed away from databases. As fate would have it I eventually moved into telecom and telecom protocol work in which I worked for more than 2 decades and have largely maintained my distance from DB.

However, in recent times I did want to brush up whatever little I knew of DBs. Recently I was listening to the Coursera course "Introduction to Data Science" by Bill Howe. In one of the lectures the professor uttered something that really caught my fancy. He mentions that SQL is probably the closest to natural language. How true! Once the DB schema and tables have been set up, querying the DB for all sorts of data can be done in SQL, which is close to natural language. For example:

SELECT a, b, c FROM S, T WHERE condition X1 AND/OR condition X2

The power of DBs comes from the fact that all the data is organized as tables, enabling one to retrieve any sort of data from them. Trying to accomplish this with any other high-level programming language would take several hundreds of lines of code, and we would have to write functions for each individual query.

NoSQL databases: However, the utility of relational databases decreases as we scale to hundreds of GBs of data. In this age of the internet and the worldwide web, data is easily of the order of several terabytes to a few petabytes. For example, weather modelling and social networks like FB, Twitter or LinkedIn all need to operate on millions of status updates or tweets per day. Traditional relational databases cannot handle such large sets of data. This is where the concept of the NoSQL DB came into existence. NoSQL databases typically store data as key, value pairs. The singular advantage of NoSQL is that the database can scale horizontally, or in other words the performance does not degrade with large increases in data size. In NoSQL databases data is hashed and uniformly distributed across commodity servers through a technique known as 'consistent hashing'. Also, data in NoSQL databases is replicated across servers. This architecture of NoSQL databases is based on common, commodity servers which are expected to crash; however this does not affect the ability of the NoSQL DB to function correctly. The strength of NoSQL databases comes from the fact that servers can join or leave the NoSQL DB without affecting the functioning of the DB. Some of the more popular examples of NoSQL DBs are CouchDB, MongoDB, Riak, Voldemort, Dynamo etc. Do take a look at my post "When NoSQL makes better sense than MySQL".

NewSQL: This variation of DB came into existence as there was a need for extremely fast performance for computing tasks like analytics. These DBs exist completely in memory and so access is blazingly fast. The most famous DB of this paradigm is SAP's HANA.

Graph Databases: Graph databases are the recent entrants into database technology. This strain of databases came into existence to handle associative data more efficiently. In a graph database data is represented as a graph. Nodes in the graph can be entities and edges can be relationships. A search on a graph database results in a traversal from a specified start node to a specified terminating node. 'Friends' in Facebook, 'followers/following' in Twitter and 'connections' in LinkedIn all use a graph database to map associations and enable easy search. This is what allows these networks to make recommendations like 'You may know'. Examples of graph databases are Google's graph DB and Neo4j.
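To make the idea concrete, here is a minimal sketch – an illustration of the concept only, not any graph database's API – of a graph held as an adjacency list, with the kind of breadth-first traversal a 'You may know' style search performs:

// Minimal illustration of the graph idea: nodes, edges and a breadth-first traversal.
// A sketch of the concept only, not a graph database API. The names are made up.
var graph = {
    "Alice": ["Bob", "Carol"],
    "Bob":   ["Alice", "Dave"],
    "Carol": ["Alice"],
    "Dave":  ["Bob", "Eve"],
    "Eve":   ["Dave"]
};

// Return all nodes reachable from 'start' within 'depth' hops (e.g. friends of friends).
function reachable(start, depth) {
    var visited = {};
    var queue = [[start, 0]];
    var result = [];
    while (queue.length > 0) {
        var entry = queue.shift();
        var node = entry[0], dist = entry[1];
        if (visited[node] || dist > depth) { continue; }
        visited[node] = true;
        if (node !== start) { result.push(node); }
        (graph[node] || []).forEach(function(neighbour) {
            queue.push([neighbour, dist + 1]);
        });
    }
    return result;
}

console.log(reachable("Alice", 2)); // [ 'Bob', 'Carol', 'Dave' ]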

As we move ahead, database technology will continue to evolve into newer architectures to handle new requirements as they arise.


Technologies to watch: 2012 and beyond

Published in Telecom Asia – Technologies to watch:2012 and beyond

Published in Telecoms Europe – Hot technologies for 2012 and beyond

A keen observer of the technological firmament today will observe a grand spectacle of diverse technological events. Some technological trends will blaze a trail and become trend setters while others will vanish without a trace. The factors that make certain technologies endure in comparison to others could be many, ranging from pure necessity to a coolness factor, from innovativeness to a cost factor. This article looks at some of the technologies that are certain to be trail blazers in the years to come.

Software Defined Networks (SDNs): Software Defined Networks (SDNs) are based on the path-breaking paradigm of separating the control of a network flow from the actual flow of data. SDN is the result of pioneering efforts by Stanford University and the University of California, Berkeley, is based on the OpenFlow Protocol, and represents a paradigm shift in the way networking elements operate. SDN decouples the routing and switching of the data flows and moves the control of the flow to a separate network element, namely the Flow Controller. The motivation for this is that the flow of data packets through the network can be controlled in a programmatic manner. The OpenFlow Protocol has 3 components: the Flow Controller that controls the flows, the OpenFlow switch with its Flow Table, and a secure connection between the Flow Controller and the OpenFlow switch. SDNs also include the ability to virtualize network resources. Virtualized network resources are known as a "network slice". A slice can span several network elements including the network backbone, routers and hosts. The ability to control multiple traffic flows programmatically provides enormous flexibility and power in the hands of users. SDNs are bound to be the network elements of the future.

Smart Grids: The energy industry is delicately poised for a complete transformation with the evolution of the smart grid concept. There is now an imminent need for increased efficiency in power generation, transmission and distribution, coupled with a reduction of energy losses. In this context many leading players in the energy industry are coming up with a connected end-to-end digital grid to smartly manage energy transmission and distribution. The digital grid will have smart meters, sensors and other devices distributed throughout the grid capable of sensing, collecting, analyzing and distributing the data to devices that can take action on them. The huge volume of collected data will be sent to intelligent devices which will use the wireless 3G networks to transmit the data. Appropriate action like alternate routing and optimal energy distribution would then happen. Smart Grids are a certainty given that this technology addresses the dire need for efficient energy management. Smart Grids, besides managing energy efficiently, also save costs by preventing inefficiency and energy losses.

The NoSQL Paradigm: In large web applications where performance and scalability are key concerns, a non-relational database like NoSQL is a better choice than the more traditional relational databases. There are several examples of such databases – the more reputed are Google's BigTable, HBase, Amazon's Dynamo, CouchDB and MongoDB. These databases partition the data horizontally and distribute it among many regular commodity servers. Accesses to the data are based on get(key) or set(key, value) type APIs, using a consistent hashing scheme, for example the Distributed Hash Table (DHT) method. The ability to distribute data and the queries to one of several servers provides the key benefit of scalability. Clearly, having a single database handle an enormous number of transactions will result in performance degradation as the number of transactions increases. Applications that have to frequently access and manage petabytes of data will clearly have to move to the NoSQL paradigm of databases.

Near Field Communications (NFC): Near Field Communications (NFC) is a technology whose time has come. Mobile phones enabled with NFC technology can be used for a variety of purposes. One such purpose is integrating credit card functionality into mobile phones using NFC. Already the major players in mobile are integrating NFC into the newer versions of their mobile phones, including Apple's iPhone, Google's Android, and Nokia. We will never again have to carry a wallet with a stack of credit cards; our mobile phone will double up as a Visa, MasterCard, etc. NFC also allows retail stores to send promotional coupons to subscribers who are in the vicinity of the shopping mall. Posters or trailers of movies running in a theatre can be sent as multi-media clips when travelling near a movie hall, and contact lists can be exchanged with friends when they are in close proximity.

The Other Suspects: Besides the above we have other usual suspects

Long Term Evolution (LTE): LTE is the latest wireless technology and enables wireless access speeds of up to 56 Mbps. With the burgeoning interest in tablets and smartphones with their countless apps, LTE will be used heavily as we move along. For a vision of where telecom is headed, do read my post 'The Future of Telecom'.

Cloud Computing: Cloud Computing is the other technology that is bound to gain momentum in the years ahead. Besides obviating the need for upfront capital expenditure the cloud enables quick and easy deployment of applications. Moreover the elasticity of the cloud will make it irresistible to large enterprises and corporations.

The above is a list of technologies to watch as they create new paths and blaze new trails. All these technologies are bound to transform the world as we know it and make our lives easier, better and more comfortable. These are the technologies that we need to focus on as we move bravely into our future. Do read my post for the year 2011, "Technology Trends – 2011 and beyond".


When NoSQL makes better sense than MySQL

In large web applications where performance and scalability are key concerns, a non-relational database like NoSQL is a better choice than the more traditional databases like MySQL, ORACLE, PostgreSQL etc. While the traditional databases are designed to preserve the ACID (atomic, consistent, isolated and durable) properties of data, these databases are capable of handling only small and frequent reads/writes.

However when there is a need to scale the application to be capable of handling millions of transactions the NoSQL model works better.  There are several examples of such databases –  the more reputed are Google’s BigTable, HBase, Amazon’s Dynamo, CouchDB  & MongoDB. These databases are based on a large number of regular commodity servers.  Accesses to the data are based on get(key) or set(key,value) type of APIs.

The database is itself distributed across several commodity servers. Accesses to the data are based on a consistent hashing scheme, for example the Distributed Hash Table (DHT) method. In this method the key is hashed efficiently to one of the servers, which can be visualized as lying on the circumference of a circle. The Chord System is one such example of the DHT algorithm. Once the destination server is identified, the server does a local search in its data for the key value. Hence the key benefit of the DHT is that it is able to spread the data across multiple servers rather than having a monolithic database with a hot standby present.
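A minimal sketch of the consistent-hashing idea, assuming a toy ring and MD5-based placement – an illustration only, not Chord or any production DHT:

// Minimal consistent-hashing sketch: servers sit on a hash ring and a key is routed to
// the first server clockwise from the key's position. Illustration of the idea only.
var crypto = require('crypto');

var RING_SIZE = 360;  // positions on the "circle"

function hashToRing(value) {
    var digest = crypto.createHash('md5').update(String(value)).digest('hex');
    return parseInt(digest.slice(0, 8), 16) % RING_SIZE;
}

function serverForKey(key, servers) {
    var keyPos = hashToRing(key);
    var placed = servers.map(function(s) { return { server: s, pos: hashToRing(s) }; })
                        .sort(function(a, b) { return a.pos - b.pos; });
    // First server at or after the key's position; wrap around to the start if none.
    for (var i = 0; i < placed.length; i++) {
        if (placed[i].pos >= keyPos) { return placed[i].server; }
    }
    return placed[0].server;
}

var servers = ['server-A', 'server-B', 'server-C'];
console.log(serverForKey('user:1234', servers));
// Adding or removing a server only remaps the keys between it and its ring neighbour.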

The ability to distribute data and the queries to one of several servers provides the key benefit of scalability. Clearly having a single database handling an enormous amount of transactions will result in performance degradation as the number of transaction increases.

However the design of distributing data across several commodity servers has its own challenges, besides the need for an appropriate function to distribute the queries. For example, the NoSQL database has to be able to handle the requirement of new servers joining the system. Similarly, since the NoSQL database is based on general-purpose commodity servers, the DHT algorithm must be able to handle server crashes and failures. In a distributed system this is usually done as follows: the servers in the system periodically convey messages to each other in order to update and maintain their list of the active servers in the database system. This is performed through a method known as the "gossip protocol".

While databases like HBase, Dynamo and other NoSQL stores do not have ACID properties, they generally follow the CAP postulate. The CAP (Consistency, Availability and Partition Tolerance) theorem states that it is difficult to achieve all 3 of these properties simultaneously. NoSQL types of databases, in order to provide for availability, typically also replicate data across servers to be able to handle server crashes. Since data is replicated across servers there is the issue of maintaining consistency across the servers. Amazon's Dynamo system is based on a concept called "eventual consistency", where the data becomes consistent after a few seconds. What this signifies is that there is a small interval in which it is not consistent.
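A toy illustration of what 'eventual consistency' means in practice – two in-memory replicas where replication lags the write; this is a sketch of the concept only, not Dynamo's actual protocol:

// Toy illustration of eventual consistency: a write lands on one replica and propagates
// to the other after a delay, so reads from the lagging replica are briefly stale.
var replicaA = {};
var replicaB = {};

function write(key, value) {
    replicaA[key] = value;              // write accepted by one replica
    setTimeout(function() {             // replication happens asynchronously
        replicaB[key] = value;
    }, 2000);
}

function readFromB(key) {
    return replicaB[key];
}

write('user:42', 'Ganesh');
console.log('Immediately after write:', readFromB('user:42'));  // undefined (stale)
setTimeout(function() {
    console.log('A little later:', readFromB('user:42'));       // 'Ganesh' (consistent)
}, 3000);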

NoSQL databases, being non-relational, do not provide the entire spectrum of SQL queries. Since NoSQL is not based on the relational model, queries that are based on JOINs must necessarily be iterated over in these applications. Hence the design of any application that needs to leverage the benefits of such non-relational databases must clearly separate the data management layer from the data storage layer. By separating the data management layer from how the data is stored we can easily accrue the benefits of databases like NoSQL.

While NoSQL kinds of databases clearly have an excellent advantage over regular relational databases where high performance and scalability are key requirements, the applications must be appropriately tailored to take full advantage of the non-relational and distributed aspects of the database. You may also find the post "To Hadoop, or not to Hadoop" interesting.
