Getting started with memcached-libmemcached
Memcached is a free, open-source, high-performance distributed caching system. It was designed to reduce database load by caching data in memory. Since memcached is a distributed cache, application data is spread across multiple servers, and data is inserted and retrieved using key-value pairs.
Memcached uses a consistent hashing scheme to distribute keys across the servers. The consistent hashing algorithm handles server crashes and new servers joining by redistributing keys only among the affected servers.
This article focuses on getting started with memcached and libmemcached while making the process as painless as possible. Once you have downloaded and installed memcached and libmemcached, you are good to go.
First, start four memcached servers:
$ memcached -p 11221 &
$ memcached -p 11222 &
$ memcached -p 11223 &
$ memcached -p 11224 &
These start on localhost. (For the full list of options, run memcached -h.)
Verify they are running using ps -ef.
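For example:
$ ps -ef | grep memcached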
libmemcached is the C client library used to connect to the memcached servers you started above.
A snippet of the libmemcached client code, client_test1.c, is shown below.
client_test1.c
….
/* connect to the four servers started above */
const char *server_string= "localhost:11221, localhost:11222, localhost:11223, localhost:11224";
memc= memcached_create(NULL);
servers= memcached_servers_parse(server_string);
rc= memcached_server_push(memc, servers);
rc= memcached_flush(memc, 0);
/* set the key, then append twice to its value */
rc= memcached_set(memc, key, strlen(key), in_value, strlen(in_value), (time_t)0, (uint32_t)0);
rc= memcached_append(memc, key, strlen(key), " the", strlen(" the"), (time_t)0, (uint32_t)0);
rc= memcached_append(memc, key, strlen(key), " people here", strlen(" people here"), (time_t)0, (uint32_t)0);
/* fetch the assembled value back from the cache */
out_value= memcached_get(memc, key, strlen(key), &value_length, &flags, &rc);
printf("Out value is: %s\n", out_value);
memcached_server_list_free(servers);
free(out_value);
….
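Assuming libmemcached's headers and library are installed in a standard location, the client can be compiled with something like:
$ gcc -o client_test1 client_test1.c -lmemcached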
When you execute this client you should see:
Out value is: We the people here
You can check which server the key is stored on by running memdump against each server:
$ memdump --servers localhost:11221
fig
$ memdump --servers localhost:11222
$
This shows that the key is stored on the first server, localhost:11221, while the second server holds no keys.
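You can also fetch a key's value directly with memcat, another utility that ships with libmemcached (here using the key reported by memdump above):
$ memcat --servers=localhost:11221 fig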
Now assume that we store a lot more data through a second client, client_test2.c.
client_test2.c
…..
const char *server_string= "localhost:11221, localhost:11222, localhost:11223, localhost:11224";
memc= memcached_create(NULL);
servers= memcached_servers_parse(server_string);
rc= memcached_server_push(memc, servers);
rc= memcached_flush(memc, 0);
/* store 100 key-value pairs: the key i maps to the value 2*i */
for (i=0; i < 100; i++)
{
    sprintf(str, "%d", i);
    sprintf(str1, "%d", 2*i);
    printf("String %s string1 %s\n", str, str1);
    rc= memcached_set(memc, str, strlen(str), str1, strlen(str1), (time_t)0, (uint32_t)0);
    test_true(rc == MEMCACHED_SUCCESS); /* test_true() is from libmemcached's test harness; assert() works too */
}
/* read back keys entered by the user */
for (i=0; i < 10; i++)
{
    printf("Input value:");
    scanf("%s", testvalue);
    printf("Value to search for %s\n", testvalue);
    value= memcached_get(memc, testvalue, strlen(testvalue), &value_length, &flags, &rc);
    test_true(rc == MEMCACHED_SUCCESS);
    printf("Value is %s\n", value);
    free(value);
}
…..
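client_test2.c compiles the same way:
$ gcc -o client_test2 client_test2.c -lmemcached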
After executing this client, dumping the keys from the servers shows:
$ memdump --servers localhost:11221
97
94
92
89
….
….
Similarly
$ memdump --servers localhost:11222
99
98
91
87
86
….
….
Hence the keys are hashed across the servers. With consistent hashing, locating the server for a key takes O(log n) (a binary search over the ring of server points), whereas a naive modulo scheme takes O(1); the payoff is that when a server is added or removed, consistent hashing remaps only a small fraction of the keys instead of nearly all of them.
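To see where the O(log n) comes from, here is a minimal, self-contained sketch of a hash ring lookup. This is an illustration only, not libmemcached's actual implementation: real rings place many virtual points per server, and the hash function below is just a toy FNV-1a.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

struct point { uint32_t hash; const char *server; };

/* Toy 32-bit FNV-1a hash; libmemcached offers several real choices. */
static uint32_t hash32(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s)
        h = (h ^ (uint8_t)*s++) * 16777619u;
    return h;
}

static int cmp(const void *a, const void *b)
{
    uint32_t x = ((const struct point *)a)->hash;
    uint32_t y = ((const struct point *)b)->hash;
    return (x > y) - (x < y);
}

/* Binary search for the first ring point clockwise of the key's hash:
   this is the O(log n) step. When a server leaves, only the keys that
   hashed to its arc move to the next point on the ring. */
static const char *lookup(const struct point *ring, int n, const char *key)
{
    uint32_t h = hash32(key);
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (ring[mid].hash < h) lo = mid + 1;
        else hi = mid;
    }
    return ring[lo % n].server; /* lo == n wraps back to the start */
}

int main(void)
{
    const char *servers[] = { "localhost:11221", "localhost:11222",
                              "localhost:11223", "localhost:11224" };
    struct point ring[4];
    for (int i = 0; i < 4; i++) {
        ring[i].hash = hash32(servers[i]);
        ring[i].server = servers[i];
    }
    qsort(ring, 4, sizeof(ring[0]), cmp);
    printf("key 97 -> %s\n", lookup(ring, 4, "97"));
    return 0;
}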
Happy memcaching …
When NoSQL makes better sense than MySQL
In large web applications where performance and scalability are key concerns, a non-relational (NoSQL) database is often a better choice than traditional relational databases like MySQL, Oracle, or PostgreSQL. The traditional databases are designed to preserve the ACID (atomic, consistent, isolated, durable) properties of data, but they are tuned for workloads of small, frequent reads and writes rather than massive scale.
However, when an application needs to scale to millions of transactions, the NoSQL model works better. There are several examples of such databases; the best known are Google's BigTable, HBase, Amazon's Dynamo, CouchDB, and MongoDB. These databases run on large numbers of regular commodity servers, and data is accessed through get(key) or set(key, value) style APIs.
The database itself is distributed across several commodity servers. Access to the data is based on a consistent hashing scheme, for example the Distributed Hash Table (DHT) method. In this method the key is hashed efficiently to one of the servers, which can be visualized as lying on the circumference of a circle; Chord is one well-known DHT algorithm. Once the destination server is identified, that server does a local search of its own data for the key's value. Hence the key benefit of the DHT is that it spreads the data across multiple servers rather than relying on a monolithic database with a hot standby.
The ability to distribute both the data and the queries across several servers is what provides scalability. Clearly, a single database handling an enormous volume of transactions will degrade in performance as the number of transactions increases.
However, distributing data across several commodity servers has its own challenges beyond choosing an appropriate function for distributing the queries. For example, the NoSQL database has to handle new servers joining the system. Similarly, since the NoSQL database runs on general-purpose commodity servers, the DHT algorithm must be able to handle server crashes and failures. In a distributed system this is usually done as follows: the servers periodically exchange messages in order to update and maintain their list of the active servers in the system. This is performed through a method known as a "gossip protocol", sketched below.
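Here is a minimal, illustrative sketch of the membership bookkeeping such a protocol maintains; it is a toy (no networking, fixed node count), not any particular system's implementation. Each node gossips a heartbeat counter, and peers whose counters stop advancing are eventually marked as failed.

#include <stdio.h>
#include <time.h>

#define NODES 4
#define FAIL_AFTER 10 /* seconds without a heartbeat advance */

struct member {
    const char *addr;
    long heartbeat;   /* counter the peer increments and gossips */
    time_t last_seen; /* when we last saw the counter advance */
    int alive;
};

/* Merge a received gossip message: adopt any newer heartbeats. */
static void merge(struct member *local, const struct member *remote,
                  int n, time_t now)
{
    for (int i = 0; i < n; i++) {
        if (remote[i].heartbeat > local[i].heartbeat) {
            local[i].heartbeat = remote[i].heartbeat;
            local[i].last_seen = now;
            local[i].alive = 1;
        }
    }
}

/* Mark members dead whose heartbeat has not advanced recently. */
static void expire(struct member *local, int n, time_t now)
{
    for (int i = 0; i < n; i++)
        if (now - local[i].last_seen > FAIL_AFTER)
            local[i].alive = 0;
}

int main(void)
{
    struct member view[NODES] = {
        { "localhost:11221", 0, 0, 1 }, { "localhost:11222", 0, 0, 1 },
        { "localhost:11223", 0, 0, 1 }, { "localhost:11224", 0, 0, 1 },
    };
    struct member heard[NODES];
    time_t now = time(NULL);
    for (int i = 0; i < NODES; i++) {
        view[i].last_seen = now;
        heard[i] = view[i];
    }
    heard[0].heartbeat = 5; /* pretend one gossip round arrived */
    merge(view, heard, NODES, now);
    expire(view, NODES, now);
    printf("%s alive=%d heartbeat=%ld\n", view[0].addr, view[0].alive, view[0].heartbeat);
    return 0;
}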
While NoSQL databases like HBase and Dynamo do not have full ACID properties, they are generally analyzed through the CAP theorem. The CAP (Consistency, Availability, Partition tolerance) theorem states that a distributed system cannot guarantee all three of these properties simultaneously. In order to provide availability, NoSQL databases typically replicate data across servers so that they can survive server crashes. Since data is replicated across servers, there is the issue of maintaining consistency across the replicas. Amazon's Dynamo is based on a concept called "eventual consistency", where the replicas converge after a short interval; what this signifies is that there is a small window during which reads may see stale data.
Since NoSQL databases are non-relational, they do not provide the entire spectrum of SQL queries. In particular, queries that would be expressed as JOINs must be implemented by iterating over key lookups in the application. Hence the design of any application that wants to leverage such non-relational databases must clearly separate the data management layer from the data storage layer; by separating the two, we can easily accrue the benefits of NoSQL databases.
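As a minimal illustration of iterating in place of a JOIN, the sketch below uses libmemcached as a stand-in key-value store; the key naming scheme (order:<id> holding a customer id, customer:<id> holding a name) is entirely hypothetical.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libmemcached/memcached.h>

/* Two gets per row instead of one relational JOIN. */
static void print_orders(memcached_st *memc, const char **order_ids, int n)
{
    size_t len;
    uint32_t flags;
    memcached_return rc;
    char key[64];

    for (int i = 0; i < n; i++) {
        /* fetch the order; its value is assumed to be the customer id */
        snprintf(key, sizeof(key), "order:%s", order_ids[i]);
        char *cust_id = memcached_get(memc, key, strlen(key), &len, &flags, &rc);
        if (rc != MEMCACHED_SUCCESS)
            continue;

        /* fetch the matching customer record with a second lookup */
        snprintf(key, sizeof(key), "customer:%s", cust_id);
        char *name = memcached_get(memc, key, strlen(key), &len, &flags, &rc);
        printf("order %s -> customer %s\n", order_ids[i],
               rc == MEMCACHED_SUCCESS ? name : "(missing)");

        free(cust_id);
        if (rc == MEMCACHED_SUCCESS)
            free(name);
    }
}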
While NoSQL databases clearly have an advantage over regular relational databases where high performance and scalability are key requirements, applications must be appropriately tailored to take full advantage of the non-relational and distributed nature of the database. You may also find the post "To Hadoop, or not to Hadoop" interesting.
Scaling out
Web applications have challenges that are unique to the domain of the web. The key difference between internet technologies and other technologies is the need to scale up to handle sudden, large increases in traffic. While telecommunication and data communication systems also need to handle high traffic, those products can be dimensioned against a known upper threshold. Typical web applications need to provide low latencies, handle large throughput, and also be extremely scalable to changing demands.
The ability to scale seems trivial: it appears that one could just add more CPU horsepower and throw in memory and bandwidth in large measure to get the performance we need. Unfortunately, it is not as simple as it appears. For one, adding more CPU horsepower may not necessarily result in better performance; a badly designed application will only improve marginally.
Some applications are "embarrassingly parallel": there is no dependency of one task on another. For example, if the task is to search for a particular string across documents, or to convert files from AVI to MPEG format, it can be done in parallel. In a public cloud this could be achieved by running more instances.
However, most real-world applications are more sequential than parallel, with a lot of data dependency between the modules of the application. When multiple instances have to run in a public cloud, the design considerations can be quite daunting.
For example, suppose we had a web application composed of four dependent parts, 1 through 4.
Now let us further assume that this is a web application and thousands of requests come to it. For simplicity's sake, assume that for each request a counter has to be incremented. How does one keep track of the total number of requests coming to the application? The challenge is managing such a global counter when there are multiple instances. In a monolithic application this does not pose a problem, but with multiple instances handling web requests, each with its own copy of the counter, the design becomes a challenge.
Each instance has its own copy of the counter, which it updates based on the requests routed to it through a load balancer. But how does one compute the total number of requests that come to all the instances?
One possible solution is the memcached approach. Memcached was developed by Danga Interactive as a solution for LiveJournal. It is a distributed caching mechanism that stores data across multiple participating servers, with simple API calls like get(key) and set(key, value). Memcached uses a consistent hashing mechanism, the Distributed Hash Table (DHT) technique described above, which hashes each key to one of several servers. The consistent hashing technique can handle server crashes as well as new servers joining the distributed cache: when a server crashes, its share of the data is redistributed among the remaining servers, and a server joining the cache similarly takes over some of the data. Memcached has been used at Facebook, Zynga, and LiveJournal. A sketch of the shared-counter idea follows.
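Here is a minimal sketch of a shared request counter kept in memcached; the key name total_requests and the two-server list are made up, and error handling is kept to a minimum. Because memcached_increment is atomic on the server, any number of instances can bump the same counter safely.

#include <stdio.h>
#include <string.h>
#include <libmemcached/memcached.h>

int main(void)
{
    memcached_st *memc= memcached_create(NULL);
    memcached_server_st *servers=
        memcached_servers_parse("localhost:11221, localhost:11222");
    memcached_server_push(memc, servers);

    const char *key= "total_requests";
    uint64_t total;
    memcached_return rc;

    /* create the counter if no instance has done so yet
       (returns MEMCACHED_NOTSTORED if it already exists, which is fine) */
    memcached_add(memc, key, strlen(key), "0", 1, (time_t)0, (uint32_t)0);

    /* atomic server-side increment: safe with any number of instances */
    rc= memcached_increment(memc, key, strlen(key), 1, &total);
    if (rc == MEMCACHED_SUCCESS)
        printf("requests so far: %llu\n", (unsigned long long)total);

    memcached_server_list_free(servers);
    memcached_free(memc);
    return 0;
}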