Getting started with memcached-libmemcached

Memcached is the free, high performance, open source distributed caching system. It was designed to alleviate a high number of database queries by caching the data in memory. Since memcached is a distributed caching system the application data is distributed across servers. Data is inserted and retrieved from the distributed cache using a key,value pair.

Memcached uses a consistent hashing scheme to distribute the keys across the servers. The consistent hashing algorithm handles server crashes and servers joining-in by redistributing the keys across the necessary servers.

This article focuses on getting started with memcached-libmecached and making the process as painless as possible. After you have downloaded and installed memcached & libmemcached you are good to go.

First start 4 memcached servers
$ memcached -p 11221 &
$ memcached -p 11222 &
$ memcached -p 11223 &
$ memcached -p 11223 &

They start on the local host. (For full options check memcached -help)
Verify they are running using ps -ef.

libmemcached is the C client which can be used to connect to the memcached servers which you have started above.
A snippet of the libmemcached code client_test1.c is shown
client_test1.c
….
const char *server_string= “localhost:11221, localhost:11222, localhost:11223, localhost:11224″;
memc= memcached_create(NULL);
servers= memcached_servers_parse(server_string);
rc= memcached_server_push(memc, servers);
rc= memcached_flush(memc, 0);
rc= memcached_set(memc, key, strlen(key),in_value, strlen(in_value),(time_t)0, (uint32_t)0);
rc= memcached_append(memc, key, strlen(key),” the”, strlen(” the”),(time_t)0, (uint32_t)0);
rc= memcached_append(memc, key, strlen(key),” people here”, strlen(” people here”), time_t)0, (uint32_t)0);
out_value= memcached_get(memc, key, strlen(key),&value_length, &flags, &rc);
printf(“Out value is: %s\n”,out_value);
memcached_server_list_free(servers);
free(out_value);
….

When you execute this client you should see
$ Out value is: We the people here

You can check which server the key is stored by doing
$ memdump –servers localhost:11221
fig
$memdump –servers localhost:11222
$
This shows that the key data is stored in the 1st servers localhost:11221

Now assume that we store a lot more data through client_test2.c
client test2.c
…..
const char *server_string= “localhost:11221, localhost:11222, localhost:11223, localhost:11224″;
memc= memcached_create(NULL);
servers= memcached_servers_parse(server_string);
rc= memcached_server_push(memc, servers);
rc= memcached_flush(memc, 0);
for (i=0; i < 100; i++)
{
sprintf(str,”%d”,i);
sprintf(str1,”%d”,2*i);
printf(“String %s string1 %s\n”,str,str1);
printf(“reached here\n”);
rc= memcached_set(memc,str, strlen(str), str1, strlen(str1),(time_t)0, (uint32_t)0);
test_true(rc == MEMCACHED_SUCCESS);
}
for(i=0; i < 10; i++)
{
printf(“Input value:”);
scanf(“%s”,testvalue);
printf(“Value to search for %s”,testvalue);
value= (uint32_t *)memcached_get(memc, testvalue, strlen(testvalue), &value_length, &flags, &rc)
test_true(rc == MEMCACHED_SUCCESS);
printf(“Value is %s\n”,value);
}
…..

After executing this when we dump the key values from the servers we will see
$ memdump –servers localhost:11221
97
94
92
89
….
….

Similarly
$ memdump –servers localhost:11222
99
98
91
87
86
….
….

Hence the keys are hashed across servers. The consistent hashing mechanism takes O(log(n)) to get to cache server as against a naive hashing scheme which would take O(1).

Happy memcaching …

Find me on Google+

Scaling out

Web Applications have challenges that are unique to the domain of the web. The key differentiating fact between internet technologies and other technologies is the need to scale up to handle sudden and large increases in traffic. While telecommunication and data communication also have the need to handle high traffic these products can be dimensioned on some upper threshold. Typical web applications have the need to provide low latencies, handle large throughput and also be extremely scalable to changing demands.

The ability to scale seems trivial. It appears that one could just add more CPU horsepower and throw in memory and bandwidth in large measure to get the performance we need. But unfortunately this is not as simple as it appears. For one, adding more CPU horsepower may not necessarily result in better performance. A badly designed application will only improve marginally.

Some applications are “embarrassingly parallel”. There is no dependency of one task on another.  For example if the task is to search for a particular string in documents or one that requires the conversion from  AVI to MPEG format then this can be done in parallel. In a public cloud this could be achieved by running more instances.

However, most real world applications are more sequential than parallel. There is a lot of dependency of data between the modules of the application.  When multiple instances have to run  in a public cloud the design considerations can be quite daunting.

For example if we had an application with parts 1,2,3,4 as follows

Now let us further assume that this is a web application and thousands of requests come to it. For simplicity sake, if we assume that for each request there is counter that has to be incremented.  How does one keep track of the total requests that are coming to application?  The challenge is how to manage such a global counter when there are multiple instances.  In a monolithic application this does not pose a problem. But with multiple instances handling web requests, each having its own copy of this counter the design becomes a challenge

Each instance has its own copy of the counter which it will update based on the requests that come to that instance through a load balancer. However, how does one compute the total number of requests that e come to all the instances

One possible solution is to use the memcached approach.  Memcached was developed as solution by Danga Corporation for Livejournal.  Memcached is a distributed caching mechanism that stores data in multiple participating servers. Memcached has some simple API calls like get(key) and set (key,value) . Memcached uses a consistent hashing mechanism which hashes the key to one of the servers among several servers. This method is known as the Distributed Hashing Table (DHT) by which it is able to distribute the keys to one of the servers. The Consistent Hashing technique is able to handle server crashes and new servers joining in the distributed cache. Since data is distributed among servers participating in the distributed cache when a server crashes all its data is distributed to the remaining servers. Similarly a server joining in the distributed cache also distributes some of the data to it. Memcached has been used in Facebook, Zynga and Livejournal.

Find me on Google+

Cloud Computing – Design Considerations

Cloud Computing is definitely turning out to be the proverbial carrot for enterprises to host their applications on the public cloud. The cloud promises many benefits to users of the cloud. Cloud Computing obviates the need for upfront capital expenses for computing infrastructure, real estate and maintenance personnel. This technology allows for scaling up or scaling down as demand on the application fluctuates.

While the advantages are many, migrating application onto the cloud is no trivial task.  The cloud is essentially composed of commodity servers. The cloud creates multiple instances of the application and runs it on the same or on different servers. The benefit of executing in parallel is that the same task can be completed faster. The cloud offers enterprises the ability to quickly scale to handle increasing demands,

But the process of deploying applications on to the cloud requires that the application be re architected to take advantage of this parallelism that the cloud provides. But the ability to handle parallelization is no simple task. The key attributes that need to be handled by distributed systems is the need for consistency and availability. If there are variables that need to be shared across the parallel instances then the application must make special provisions to handle this and ensure consistency. Similarly the application must be designed to handle failures.

Applications that are intended to be deployed on the cloud must be designed to scale-out rather than having the ability to scale-up. Scaling up refers to the process of adding more horse power by way of faster CPUs, more RAM and faster throughput.  But applications that need to be deployed on the cloud need to have the ability to scale out or scale horizontally where more servers are added without any change in processing horsepower.  The design for horizontal scalability is the key to cloud computing architectures.

Some of the key principles to keep in mind while designing for the cloud is to ensure that the application is composed of loosely coupled processes preferably based on SOA principles.  While a multi-threaded architecture where resource sharing through mutexes works in monolithic applications such a architecture is of no help when there are multiple instances of the same application running on different servers. How does one maintain consistency of the shared resource across instances?  This is a tough problem to solve. Ideally the application should be thread safe and should be based on a shared – nothing kind of architecture. One such technique is to use queues that the cloud provides as a means of sharing across instances. However this may impact the performance of the system.  Other methods include using ‘memcached’ which has been used successfully by Facebook, Twitter, Livejournal, Zynga etc deployed on the cloud. Still another method is to use the Map-Reduce algorithm where the variables across instances are handled by ‘map’ and the ‘reduce’ part handles the consistency across instances.

Another key consideration is the need to support availability requirements. Since the cloud is made up of commodity hardware there is every possibility of servers failing.  The application must be designed with inbuilt resilience to handle such failures. This could by designing active-standby architecture or by providing for checkpointing so that application can restart from some known previous point.

Hence while cloud computing is the way to go in the future there is a need to be able to carefully design the application so that full advantage of the cloud can be taken.