Software Defined Networks (SDNs): A glimpse of tomorrow

Published in Telecom Asia, Jul 28, 2011 – A glimpse into the future of networking

Published in Telecoms Europe, Jul 28 2011 – SDNs are new era for networking

Networks and networking, as we know them, are on the verge of a momentous change, thanks to a path-breaking technological concept known as Software Defined Networks (SDN). SDN is the result of pioneering work at Stanford University and the University of California, Berkeley. It is based on the OpenFlow Protocol and represents a paradigm shift in the way networking elements operate.

The networks and network elements of today have been largely closed and based on proprietary architectures. In today’s networks, the routing decisions and the switching of data packets happen in the same network element, e.g. the router.

Software Defined Networks (SDN) decouple the routing from the switching of data flows and move the control of the flows to a separate network element, namely the Flow Controller. The motivation is that the flow of data packets through the network can then be controlled programmatically. A Flow Controller can typically be implemented on a standard PC. In some ways this is reminiscent of Intelligent Networks and the Intelligent Network protocol, which delinked the service logic from the switching and moved it to a network element known as the Service Control Point.

OpenFlow has three components: the Flow Controller that controls the flows, the OpenFlow switch with its Flow Table, and a secure connection between the Flow Controller and the OpenFlow switch. The OpenFlow Protocol is an open API specification for modifying the flow table that exists in most modern routers and Ethernet switches. The ability to securely control the flow of traffic programmatically opens up amazing possibilities.
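To make the split between the switch and the Flow Controller concrete, here is a minimal sketch in Python. It is not the OpenFlow wire protocol: the class names, the match fields and the action strings such as "output:2" are all assumptions made for illustration, and the secure connection is reduced to a plain method call.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FlowEntry:
    """One flow table rule: header fields to match and the action to apply."""
    match: Dict[str, str]   # e.g. {"ip_dst": "10.0.0.2"} (illustrative fields)
    action: str             # e.g. "output:2" or "drop" (illustrative strings)
    priority: int = 0
    packets: int = 0        # per-entry hit counter


@dataclass
class Switch:
    """A toy switch: it only forwards, using whatever is in its flow table."""
    table: List[FlowEntry] = field(default_factory=list)

    def install(self, entry: FlowEntry) -> None:
        # Stands in for the controller modifying the table over the secure channel.
        self.table.append(entry)
        self.table.sort(key=lambda e: e.priority, reverse=True)

    def lookup(self, packet: Dict[str, str]) -> Optional[FlowEntry]:
        # Return the highest-priority entry whose match fields all agree.
        for entry in self.table:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                entry.packets += 1
                return entry
        return None


class Controller:
    """Holds the forwarding policy and programs switches with it."""

    def __init__(self, policy: Dict[str, str]) -> None:
        self.policy = policy  # destination IP -> action

    def packet_in(self, switch: Switch, packet: Dict[str, str]) -> str:
        # Table miss: decide centrally, then push a rule so that later
        # packets of the same flow are handled by the switch alone.
        action = self.policy.get(packet["ip_dst"], "drop")
        switch.install(FlowEntry({"ip_dst": packet["ip_dst"]}, action, priority=10))
        return action


def forward(switch: Switch, controller: Controller, packet: Dict[str, str]) -> str:
    """Forward one packet, consulting the controller only on a table miss."""
    entry = switch.lookup(packet)
    if entry is None:
        return controller.packet_in(switch, packet)
    return entry.action
```

In this sketch the first packet of a flow goes to the controller and every later packet of that flow is matched in the switch, which is the behaviour the control and data plane separation is meant to enable.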

OpenFlow Specification

Vendors can also implement the OpenFlow Protocol as an added feature of their existing branded routers and Ethernet switches. This will enable these routers and Ethernet switches to support both production traffic and research traffic using the same set of network resources.

The single greatest advantage of separating the control and data planes of network routers and Ethernet switches is the ability to modify and control different traffic flows through a set of network resources. In addition, Software Defined Networks (SDNs) make it possible to virtualize the network resources. A set of virtualized network resources is known as a “network slice”. A slice can span several network elements, including the network backbone, routers and hosts.

Computing resources can be virtualized through the use of a hypervisor, which abstracts the hardware and enables several guest OSes to run in complete isolation. Similarly, when a network element known as a FlowVisor, which has been demonstrated experimentally, is used along with the OpenFlow Controller, it is possible to virtualize the network resources. Each slice then gets its own combination of bandwidth, routers, traffic flows and computing resources. Hence Software Defined Networks (SDNs) are also known as Virtualized Programmable Networks: different traffic flows can co-exist in isolation from one another, and the flows through the resources are controlled by programs in the Flow Controller.
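The idea of slicing can be sketched as a thin layer that decides which slice a packet belongs to and refuses to let one slice's controller touch another slice's traffic. This is only an illustration of the concept, not FlowVisor's actual design or API: the `SlicingLayer` class, the slice names and the use of a VLAN id to separate slices are assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

Packet = Dict[str, int]
# A slice is a name plus a predicate describing which traffic belongs to it.
Slice = Tuple[str, Callable[[Packet], bool]]


class SlicingLayer:
    """Toy stand-in for a layer that sits between switches and controllers."""

    def __init__(self) -> None:
        self.slices: List[Slice] = []

    def add_slice(self, name: str, owns: Callable[[Packet], bool]) -> None:
        self.slices.append((name, owns))

    def owner(self, packet: Packet) -> Optional[str]:
        # First matching slice wins; traffic outside every slice has no owner.
        for name, owns in self.slices:
            if owns(packet):
                return name
        return None

    def may_program(self, slice_name: str, packet: Packet) -> bool:
        # Isolation check: a slice's controller may only act on its own traffic.
        return self.owner(packet) == slice_name


layer = SlicingLayer()
layer.add_slice("research", lambda p: p.get("vlan") == 100)
layer.add_slice("production", lambda p: p.get("vlan") == 1)
```

With these two hypothetical slices, a research controller asking to program production traffic would be rejected by `may_program`, which is what lets experimental and production flows share the same hardware.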

The ability to manage different types of traffic flows across network resources opens up endless possibilities. SDNs have been successfully demonstrated in wireless handoffs between networks and in running multiple different flows through a common set of resources. SDNs in public and private clouds allow appropriate resources to be pooled during different times of the day based on the geographical location of the requests. Telcos could optimize the usage of their backbone network based on peak and lean traffic periods through the Core Network.

The OpenFlow Protocol has already gained widespread support in the industry and has resulted in the formation of the Open Networking Foundation (ONF). The members of the ONF range from behemoths like Google, Facebook, Yahoo and Deutsche Telekom to networking giants like Cisco, Juniper, IBM and Brocade. At the time of writing, the ONF has around 43 member companies.

Software Defined Networks are a tectonic shift in the way networks operate and truly represent the dawn of a new networking era. A related post of interest is “Adding the OpenFlow variable in the IMS equation”.


The Case for a Cloud Based IMS Solution

The IP Multimedia Subsystem (IMS) has been waiting in the wings for some time. There have been several deployments by the major equipment manufacturers, but IMS is simply not taking off. The vision of IMS is truly grand. IMS envisages an all-IP core with several servers known as Call Session Control Functions (CSCFs) participating to set up, maintain and release call sessions.

In the 3GPP Release 5 architecture, IMS consists of the Proxy CSCF (P-CSCF), Serving CSCF (S-CSCF), Interrogating CSCF (I-CSCF), Breakout Gateway Control Function (BGCF), Home Subscriber Server (HSS) and Application Servers (AS) acting in concert to set up, maintain and release media sessions. The main protocols used in IMS are SIP/SDP for managing media sessions, which could be voice, data or video, and Diameter for connecting to the HSS and the Application Servers.

IMS is also access agnostic and is capable of handling landline or wireless calls over multiple devices, from mobile phones and smartphones to PDAs, laptops and tablet PCs. The application possibilities of IMS range from video calling and live multi-player games to video chat and handoffs of calls from mobile phone to laptop. Despite these possibilities, IMS has not made prime time. As a technology it holds a lot of promise but has remained just that: a promising technology.

The technology has not captured people’s imaginations or turned into a money spinner for Operators. One reason may be that Operators are averse to investing enormous amounts in new technology and turning their networks upside down.

This article provides an innovative approach to introducing IMS in the network by taking advantage of the public cloud!

Since IMS is an all-IP network and the protocol between the CSCF servers is SIP/SDP over TCP/IP, IMS is a prime candidate for the public cloud. An IMS architecture deployed on the cloud would have several instances of P-CSCFs, S-CSCFs, Breakout Gateway Control Functions (BGCFs), HSSes and ASes all running on the cloud. An architectural diagram is shown below.

Deploying the CSCFs on the public cloud has multiple benefits. For one, a cloud deployment will eliminate the upfront CAPEX for the Operator. The cost savings can be passed on to consumers, whose video, data or voice calls will be cheaper. Besides, the absence of CAPEX will provide better margins to the Operator. Lower costs to the consumer and better margins for the Operator are truly an unbeatable combination.

Also, the Operator can take advantage of the elasticity of the cloud, starting small and scaling automatically as the user base grows.
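As a rough illustration of starting small and scaling with the user base, the sketch below computes how many CSCF instances a deployment would need for a given number of active sessions. The function name and the per-instance capacity are hypothetical; real sizing would depend on measured capacity and on the scaling mechanism of the chosen cloud provider.

```python
import math


def instances_needed(active_sessions: int,
                     sessions_per_instance: int,
                     min_instances: int = 1) -> int:
    """Return the number of server instances required for the current load.

    sessions_per_instance is an assumed capacity figure, not a measured one.
    """
    if sessions_per_instance <= 0:
        raise ValueError("sessions_per_instance must be positive")
    if active_sessions < 0:
        raise ValueError("active_sessions must not be negative")
    required = math.ceil(active_sessions / sessions_per_instance)
    # Never scale below the floor, so the service stays reachable when idle.
    return max(min_instances, required)
```

Called periodically with the current session count, a function like this gives the target size that an autoscaling policy would converge to.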

Thus a cloud based IMS deployment is a great combination for the subscriber, the Operator and the equipment manufacturers alike. The cloud’s elasticity will automatically provide for growth as IMS’s high-speed video applications catch the public imagination.

If IMS is to become commonplace, Operators should plan on deploying their IMS on the public cloud and reap the manifold benefits.

For a more detailed view, please see my post “Architecting a cloud based IP Multimedia System (IMS)”.

A related post of relevance is “Adding the OpenFlow variable to the IMS equation”.


Cache-22

If you want performance you need to partition data. If you partition data you will not get performance! That sounded clever but is it true? Well, it can be if the architecture of your application is naïve.

The problem I am describing here arises when data needs to be partitioned across multiple geographical regions. Partitioning spreads data among several servers, resulting in fast accesses. But when the data is spread across large geographical distances, this results in significant network latencies. This is something that cannot be avoided.

Memcached is commonly used to store frequently read data in in-memory caches, preventing frequent trips to the database. Memcached accesses data through “gets” and updates data through “sets”. Data is accessed based on a key, which is hashed to one of several participating servers. Thus memcached distributes the data among the participating servers in a server list. Reads and writes are O(1) and extremely fast. This works fine as long as the servers all belong to a single region or data center: the network latencies will be low and the latency of the application will not be severely affected.
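The key-to-server mapping described above can be sketched as follows. This is a simplification under stated assumptions: real memcached clients offer several hashing schemes (including consistent hashing), while this toy uses a plain hash modulo the server count, and each server is simulated by an in-process dictionary. The `ShardedCache` class and the server names are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedCache:
    """Toy client that spreads keys over a server list by hashing the key."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = servers
        # One dict per server stands in for that server's memory.
        self.stores: Dict[str, Dict[str, str]] = {s: {} for s in servers}

    def server_for(self, key: str) -> str:
        # Hash the key and reduce it to an index into the server list.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.servers)
        return self.servers[index]

    def set(self, key: str, value: str) -> None:
        self.stores[self.server_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.stores[self.server_for(key)].get(key)


cache = ShardedCache(["cache-a:11211", "cache-b:11211", "cache-c:11211"])
cache.set("user:42", "alice")
```

Because the same key always hashes to the same server, a "get" touches exactly one server, which is why the lookup cost does not grow with the amount of data.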

Now consider a situation where the memcached servers have to be distributed to multi-region data centers. While this is an excellent scheme for Disaster Recovery (DR) it introduces its own set of attendant problems.

Memcached will hash the entire data set and distribute it over the entire server list. Now “gets” of data from one geographical region to another will have significant latency. Since the laws of physics mandate that nothing can exceed the speed of light, we will be stuck with appreciable latencies for inter-region reads and writes.  So while a multi-region deployment provides for geographical resiliency it does introduce issues of latency and degraded throughput.
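The speed-of-light limit can be turned into a rough lower bound on round-trip time. The sketch below assumes that signals in optical fiber travel at about two thirds of the speed of light in vacuum, and it ignores routing detours, queuing and processing delays, so real latencies will be higher. The function name and the example distance are illustrative.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum, km/s
FIBER_FACTOR = 2 / 3               # rough assumption for propagation in fiber


def min_round_trip_ms(distance_km: float, factor: float = FIBER_FACTOR) -> float:
    """Lower bound on round-trip time in milliseconds over a given distance."""
    if distance_km < 0 or factor <= 0:
        raise ValueError("distance must be >= 0 and factor must be > 0")
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * factor)
    return 2 * one_way_seconds * 1000


# Example: two data centers roughly 10,000 km apart.
print(round(min_round_trip_ms(10_000), 1))
```

Under these assumptions a single inter-region "get" over 10,000 km cannot complete in much less than about 100 ms, compared with well under a millisecond inside one data center.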

So what is the solution? One possible solution is to replicate the data in all the regions. One technique that I can think of is to have the application implement “local reads & global writes”. This technique provides the AP part of the CAP Theorem. The CAP theorem states that a distributed application cannot simultaneously provide Consistency, Availability and Partition tolerance in full. The “local reads & global writes” method will assure availability and partition tolerance while providing eventual consistency.

In this technique, each update is applied to the local servers and also written asynchronously to all the other data centers. The writes are hence global in nature. An update does not wait for all the writes to complete before moving along. Reads, however, are local, served from the nearest copy of the data, which keeps read latency low.
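A minimal sketch of “local reads & global writes” is shown below. To keep it deterministic, the asynchronous replication is modelled as a queue of pending messages that is drained explicitly; in a real deployment this would be background network traffic. The class, the region names and the absence of any conflict resolution for concurrent writes are all simplifying assumptions.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class ReplicatedCache:
    """Toy multi-region cache: write locally now, replicate to others later."""

    def __init__(self, regions: List[str]) -> None:
        self.stores: Dict[str, Dict[str, str]] = {r: {} for r in regions}
        # Pending (target_region, key, value) replication messages.
        self.pending: Deque[Tuple[str, str, str]] = deque()

    def write(self, local_region: str, key: str, value: str) -> None:
        # The local write is synchronous; remote writes are only queued,
        # so the caller does not wait on inter-region latency.
        self.stores[local_region][key] = value
        for region in self.stores:
            if region != local_region:
                self.pending.append((region, key, value))

    def read(self, local_region: str, key: str) -> Optional[str]:
        # Reads never leave the caller's own region.
        return self.stores[local_region].get(key)

    def replicate(self) -> int:
        # Stand-in for the asynchronous delivery of queued writes.
        delivered = 0
        while self.pending:
            region, key, value = self.pending.popleft()
            self.stores[region][key] = value
            delivered += 1
        return delivered


cache = ReplicatedCache(["us-east", "eu-west", "ap-south"])
cache.write("us-east", "session:1", "active")
```

Immediately after the write, a read in "us-east" sees the value while a read in "eu-west" does not; once `replicate()` has run, every region returns it. That window is the eventual consistency the technique accepts in exchange for low read latency.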

Since writes are asynchronous, the data will be “eventually consistent” rather than “strongly consistent”, but this is a tradeoff that can be taken into account. Ideally, a quorum protocol should be implemented along with the “local reads & global writes” technique to ensure that you read your writes.

The application could have a modified quorum protocol such that R + W > N, where R is the number of servers consulted on a read, W is the number of servers that must acknowledge a write, and N is the total number of servers in the memcached server list.
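The quorum condition is simple enough to check directly. The helper below is a sketch with a hypothetical name; it only tests the inequality R + W > N, which guarantees that every read set shares at least one server with every write set.

```python
def quorum_overlaps(r: int, w: int, n: int) -> bool:
    """Return True if every read quorum intersects every write quorum."""
    if n < 1 or not (1 <= r <= n) or not (1 <= w <= n):
        raise ValueError("require n >= 1, 1 <= r <= n and 1 <= w <= n")
    return r + w > n


# With N = 3 replicas:
print(quorum_overlaps(2, 2, 3))  # reads and writes each touch a majority
print(quorum_overlaps(1, 1, 3))  # one replica each: a read may miss the write
print(quorum_overlaps(1, 3, 3))  # purely local reads need writes on all replicas
```

Note the tension this exposes: a strictly local read (R = 1) satisfies the condition only when W = N, so reading your own writes from a single nearby server requires either waiting for every region to acknowledge the write or consulting more than one server on reads.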

A similar technique has been used in Cassandra, CouchDB and others.

With the “local reads & global writes” technique it is possible to keep the latencies within reasonable limits, since data reads will be based on proximity. Replicating the data to all regions will also ensure that eventually all regions have a consistent view of the data.
