Natural selection of database technology through the years

Charles Darwin in his landmark book “The Origin of Species’ discusses how flora and fauna evolved through the centuries. The different features of each individual species would undergo a process of natural selection by which modification of attributes would naturally occur that would enable the species to adapt and propagate through time. Those modifications that failed to adapt would naturally become extinct.

In this post I discuss how database (DB) technology has evolved over the years. As new requirements arose database technology has had to adapt and newer paradigms have evolved. However, unlike species which became extinct the older versions still exist as they continue o address the earlier problems that remain today.

Here is a short & brief history of evolution of databases

Relational databases: Relational databases had their genesis when E.F Codd of IBM came up with a relational model of organizing data. In this model all data is organized as tables with several rows. In a relational model each row has several columns and one of the columns contains a unique value for each row called primary key. Relational databases have ruled the enterprise domain for more than 3 decades. An enterprise’s data is organized as a set of related tables. Users can query the database using Structured Query Language or SQL.

I remember in the late 1980’s when I started to work in the industry, programming jobs were much sought after by all of us engineering graduates. In those days database jobs were ‘uncool’ and system programming jobs dealing with writing assemblers, compilers were the really cool jobs. I was also susceptible to this prevailing opinion and stayed away from databases. As fate would have it I eventually moved into telecom and telecom protocol work in which I worked for more than 2 decades and have largely maintained my distance from DB.

However it recent times I did want to look brush up whatever little I knew of DB. Recently I was listening to the Coursera course “Introduction to Data Science‘ by Bill Howe. In one of the lectures the professor uttered something that really caught my fancy. He mentions that SQL is probably the closest to natural language. How true! Once the DB schema and tables have been set up, querying the DB for all sorts of data can be done in SQL which is close to natural language. For e.g.

SELECT a,b,c from TABLE S,T where condition X1 AND/OR condition X2

The power of DBs comes from the fact that all the data is organized as tables and enables one to retrieve any sort of data from it. Trying to accomplish this with any other high level programming language would take several hundreds of lines of code and we would have to write functions for each in individual query.

NoSQL databases: However the utility of relational databases decreases as we scale to hundreds of Gbs of data. In this age of the internet and the worldwide web data is easily of the order of several terabytes to a few petabytes. For e.g. Weather modelling, Social networks like FB,Twitter or LinkedIn all need to operate on millions of status updates or tweets per day. Traditional relational databases cannot handle such large sets of data. This is where the concept of NoSQL DB came into existence. NoSQL databases typically store data as key, value pairs. The singular advantage of NoSQL is that the database can scale horizontally or in other words the performance does not degrade with large increases in data size. In NoSQL databases data is hashed and uniformly distributed across commodity servers through a technique known as ‘consistent hashing’. Also data in NoSQL databases is replicated across servers. This architecture of NoSQL databases is based on common, commodity servers which are expected to crash. However this would not affect the NoSQL DB to function correctly. The strength of NoSQL databases comes from the fact that servers can join or leave the NoSQL DB without affecting the functioning of the DB. Some of the more popular examples of NoSQL DB are CouchDB, MongoDB, Riak, Voldemort, Dynamo etc.Do take a look at my post “When NoSQL makes better sense that MySQL”

NewSQL: This variation of DB came into existence as there was a need for extremely fast performance for computing tasks like analytics etc. These DBs exist completely in memory and so the access is blazingly fast. The most famous of DBs of this paradigm is SAP’s HANA.

Graph Databases: Graph databases are the recent entrants into database technology. This strain of databases came into existence to handle associative data more efficiently. In a graph database data is represented as a graph. Nodes in the graph can be entities and edges can be relationships. A search on a graph database will result in a traversal from a specified start node to a specified terminating node. “Friends’ in Facebook, ‘followers/following’ in Twitter and ‘connections’ in LinkedIn all use Graph Database to map association and enable easy search. Graph Databases is what allows these databases to make recommendations like ‘You may know’. E.g. of Graph Database Google’s Graph DB, Neo4j

As we move ahead database technology will continue to evolve into newer architectures to handle

Find me on Google+