# Deep Learning from first principles in Python, R and Octave – Part 2

“What does the world outside your head really ‘look’ like? Not only is there no color, there’s also no sound: the compression and expansion of air is picked up by the ears, and turned into electrical signals. The brain then presents these signals to us as mellifluous tones and swishes and clatters and jangles. Reality is also odorless: there’s no such thing as smell outside our brains. Molecules floating through the air bind to receptors in our nose and are interpreted as different smells by our brain. The real world is not full of rich sensory events; instead, our brains light up the world with their own sensuality.”
The Brain: The Story of You” by David Eagleman

The world is Maya, illusory. The ultimate reality, the Brahman, is all-pervading and all-permeating, which is colourless, odourless, tasteless, nameless and formless
Bhagavad Gita

## 1. Introduction

This post is a follow-up post to my earlier post Deep Learning from first principles in Python, R and Octave-Part 1. In the first part, I implemented Logistic Regression, in vectorized Python,R and Octave, with a wannabe Neural Network (a Neural Network with no hidden layers). In this second part, I implement a regular, but somewhat primitive Neural Network (a Neural Network with just 1 hidden layer). The 2nd part implements classification of manually created datasets, where the different clusters of the 2 classes are not linearly separable.

Neural Network perform really well in learning all sorts of non-linear boundaries between classes. Initially logistic regression is used perform the classification and the decision boundary is plotted. Vanilla logistic regression performs quite poorly. Using SVMs with a radial basis kernel would have performed much better in creating non-linear boundaries. To see R and Python implementations of SVMs take a look at my post Practical Machine Learning with R and Python – Part 4.

Checkout my book ‘Deep Learning from first principles: Second Edition – In vectorized Python, R and Octave’. My book starts with the implementation of a simple 2-layer Neural Network and works its way to a generic L-Layer Deep Learning Network, with all the bells and whistles. The derivations have been discussed in detail. The code has been extensively commented and included in its entirety in the Appendix sections. My book is available on Amazon as paperback ($18.99) and in kindle version($9.99/Rs449).

You may also like my companion book “Practical Machine Learning with R and Python:Second Edition- Machine Learning in stereo” available in Amazon in paperback($10.99) and Kindle($7.99/Rs449) versions. This book is ideal for a quick reference of the various ML functions and associated measurements in both R and Python which are essential to delve deep into Deep Learning.

Take a look at my video presentation which discusses the below derivation step-by- step Elements of Neural Networks and Deep Learning – Part 3

You can clone and fork this R Markdown file along with the vectorized implementations of the 3 layer Neural Network for Python, R and Octave from Github DeepLearning-Part2

### 2. The 3 layer Neural Network

A simple representation of a 3 layer Neural Network (NN) with 1 hidden layer is shown below.

In the above Neural Network, there are 2 input features at the input layer, 3 hidden units at the hidden layer and 1 output layer as it deals with binary classification. The activation unit at the hidden layer can be a tanh, sigmoid, relu etc. At the output layer the activation is a sigmoid to handle binary classification

# Superscript indicates layer 1
$z_{11} = w_{11}^{1}x_{1} + w_{21}^{1}x_{2} + b_{1}$
$z_{12} = w_{12}^{1}x_{1} + w_{22}^{1}x_{2} + b_{1}$
$z_{13} = w_{13}^{1}x_{1} + w_{23}^{1}x_{2} + b_{1}$

Also $a_{11} = tanh(z_{11})$
$a_{12} = tanh(z_{12})$
$a_{13} = tanh(z_{13})$

# Superscript indicates layer 2
$z_{21} = w_{11}^{2}a_{11} + w_{21}^{2}a_{12} + w_{31}^{2}a_{13} + b_{2}$
$a_{21} = sigmoid(z21)$

Hence
$Z1= \begin{pmatrix} z11\\ z12\\ z13 \end{pmatrix} =\begin{pmatrix} w_{11}^{1} & w_{21}^{1} \\ w_{12}^{1} & w_{22}^{1} \\ w_{13}^{1} & w_{23}^{1} \end{pmatrix} * \begin{pmatrix} x1\\ x2 \end{pmatrix} + b_{1}$
And
$A1= \begin{pmatrix} a11\\ a12\\ a13 \end{pmatrix} = \begin{pmatrix} tanh(z11)\\ tanh(z12)\\ tanh(z13) \end{pmatrix}$

Similarly
$Z2= z_{21} = \begin{pmatrix} w_{11}^{2} & w_{21}^{2} & w_{31}^{2} \end{pmatrix} *\begin{pmatrix} z_{11}\\ z_{12}\\ z_{13} \end{pmatrix} +b_{2}$
and $A2 = a_{21} = sigmoid(z_{21})$

These equations can be written as
$Z1 = W1 * X + b1$
$A1 = tanh(Z1)$
$Z2 = W2 * A1 + b2$
$A2 = sigmoid(Z2)$

I) Some important results (a memory refresher!)
$d/dx(e^{x}) = e^{x}$ and $d/dx(e^{-x}) = -e^{-x}$ -(a) and
$sinhx = (e^{x} - e^{-x})/2$ and $coshx = (e^{x} + e^{-x})/2$
Using (a) we can shown that $d/dx(sinhx) = coshx$ and $d/dx(coshx) = sinhx$ (b)
Now $d/dx(f(x)/g(x)) = (g(x)*d/dx(f(x)) - f(x)*d/dx(g(x)))/g(x)^{2}$ -(c)

Since $tanhx =z= sinhx/coshx$ and using (b) we get
$tanhx = (coshx*d/dx(sinhx) - sinhx*d/dx(coshx))/(cosh^{2})$
Using the values of the derivatives of sinhx and coshx from (b) above we get
$d/dx(tanhx) = (coshx^{2} - sinhx{2})/coshx{2} = 1 - tanhx^{2}$
Since $tanhx =z$
$d/dx(tanhx) = 1 - tanhx^{2}= 1 - z^{2}$ -(d)

II) Derivatives
$L=-(Ylog(A2) + (1-Y)log(1-A2))$
$dL/dA2 = -(Y/A2 + (1-Y)/(1-A2))$
Since $A2 = sigmoid(Z2)$ therefore $dA2/dZ2 = A2(1-A2)$ see Part1
$Z2 = W2A1 +b2$
$dZ2/dW2 = A1$
$dZ2/db2 = 1$
$A1 = tanh(Z1)$ and $dA1/dZ1 = 1 - A1^{2}$
$Z1 = W1X + b1$
$dZ1/dW1 = X$
$dZ1/db1 = 1$

III) Back propagation
Using the derivatives from II) we can derive the following results using Chain Rule
$\partial L/\partial Z2 = \partial L/\partial A2 * \partial A2/\partial Z2$
$= -(Y/A2 + (1-Y)/(1-A2)) * A2(1-A2) = A2 - Y$
$\partial L/\partial W2 = \partial L/\partial A2 * \partial A2/\partial Z2 * \partial Z2/\partial W2$
$= (A2-Y) *A1$ -(A)
$\partial L/\partial b2 = \partial L/\partial A2 * \partial A2/\partial Z2 * \partial Z2/\partial b2 = (A2-Y)$ -(B)

$\partial L/\partial Z1 = \partial L/\partial A2 * \partial A2/\partial Z2 * \partial Z2/\partial A1 *\partial A1/\partial Z1 = (A2-Y) * W2 * (1-A1^{2})$
$\partial L/\partial W1 = \partial L/\partial A2 * \partial A2/\partial Z2 * \partial Z2/\partial A1 *\partial A1/\partial Z1 *\partial Z1/\partial W1$
$=(A2-Y) * W2 * (1-A1^{2}) * X$ -(C)
$\partial L/\partial b1 = \partial L/\partial A2 * \partial A2/\partial Z2 * \partial Z2/\partial A1 *dA1/dZ1 *dZ1/db1$
$= (A2-Y) * W2 * (1-A1^{2})$ -(D)

IV) Gradient Descent
The key computations in the backward cycle are
$W1 = W1-learningRate * \partial L/\partial W1$ – From (C)
$b1 = b1-learningRate * \partial L/\partial b1$ – From (D)
$W2 = W2-learningRate * \partial L/\partial W2$ – From (A)
$b2 = b2-learningRate * \partial L/\partial b2$ – From (B)

The weights and biases (W1,b1,W2,b2) are updated for each iteration thus minimizing the loss/cost.

These derivations can be represented pictorially using the computation graph (from the book Deep Learning by Ian Goodfellow, Joshua Bengio and Aaron Courville)

### 3. Manually create a data set that is not lineary separable

Initially I create a dataset with 2 classes which has around 9 clusters that cannot be separated by linear boundaries. Note: This data set is saved as data.csv and is used for the R and Octave Neural networks to see how they perform on the same dataset.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
import sklearn.linear_model

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification, make_blobs
from matplotlib.colors import ListedColormap
import sklearn
import sklearn.datasets

colors=['black','gold']
cmap = matplotlib.colors.ListedColormap(colors)
X, y = make_blobs(n_samples = 400, n_features = 2, centers = 7,
cluster_std = 1.3, random_state = 4)
#Create 2 classes
y=y.reshape(400,1)
y = y % 2
#Plot the figure
plt.figure()
plt.title('Non-linearly separable classes')
plt.scatter(X[:,0], X[:,1], c=y,
marker= 'o', s=50,cmap=cmap)
plt.savefig('fig1.png', bbox_inches='tight')

### 4. Logistic Regression

On the above created dataset, classification with logistic regression is performed, and the decision boundary is plotted. It can be seen that logistic regression performs quite poorly

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
import sklearn.linear_model

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification, make_blobs
from matplotlib.colors import ListedColormap
import sklearn
import sklearn.datasets

#from DLfunctions import plot_decision_boundary
execfile("./DLfunctions.py") # Since import does not work in Rmd!!!

colors=['black','gold']
cmap = matplotlib.colors.ListedColormap(colors)
X, y = make_blobs(n_samples = 400, n_features = 2, centers = 7,
cluster_std = 1.3, random_state = 4)
#Create 2 classes
y=y.reshape(400,1)
y = y % 2

# Train the logistic regression classifier
clf = sklearn.linear_model.LogisticRegressionCV();
clf.fit(X, y);

# Plot the decision boundary for logistic regression
plot_decision_boundary_n(lambda x: clf.predict(x), X.T, y.T,"fig2.png")


### 5. The 3 layer Neural Network in Python (vectorized)

The vectorized implementation is included below. Note that in the case of Python a learning rate of 0.5 and 3 hidden units performs very well.

## Random data set with 9 clusters
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import sklearn.linear_model
import pandas as pd

from sklearn.datasets import make_classification, make_blobs
execfile("./DLfunctions.py") # Since import does not work in Rmd!!!

X1, Y1 = make_blobs(n_samples = 400, n_features = 2, centers = 9,
cluster_std = 1.3, random_state = 4)
#Create 2 classes
Y1=Y1.reshape(400,1)
Y1 = Y1 % 2
X2=X1.T
Y2=Y1.T

#Perform gradient descent
parameters,costs = computeNN(X2, Y2, numHidden = 4, learningRate=0.5, numIterations = 10000)
plot_decision_boundary(lambda x: predict(parameters, x.T), X2, Y2,str(4),str(0.5),"fig3.png")
## Cost after iteration 0: 0.692669
## Cost after iteration 1000: 0.246650
## Cost after iteration 2000: 0.227801
## Cost after iteration 3000: 0.226809
## Cost after iteration 4000: 0.226518
## Cost after iteration 5000: 0.226331
## Cost after iteration 6000: 0.226194
## Cost after iteration 7000: 0.226085
## Cost after iteration 8000: 0.225994
## Cost after iteration 9000: 0.225915

### 6. The 3 layer Neural Network in R (vectorized)

For this the dataset created by Python is saved  to see how R performs on the same dataset. The vectorized implementation of a Neural Network was just a little more interesting as R does not have a similar package like ‘numpy’. While numpy handles broadcasting implicitly, in R I had to use the ‘sweep’ command to broadcast. The implementaion is included below. Note that since the initialization with random weights is slightly different, R performs best with a learning rate of 0.1 and with 6 hidden units

source("DLfunctions2_1.R")
z <- as.matrix(read.csv("data.csv",header=FALSE)) #
x <- z[,1:2]
y <- z[,3]
x1 <- t(x)
y1 <- t(y)
#Perform gradient descent
nn <-computeNN(x1, y1, 6, learningRate=0.1,numIterations=10000) # Good
## [1] 0.7075341
## [1] 0.2606695
## [1] 0.2198039
## [1] 0.2091238
## [1] 0.211146
## [1] 0.2108461
## [1] 0.2105351
## [1] 0.210211
## [1] 0.2099104
## [1] 0.2096437
## [1] 0.209409
plotDecisionBoundary(z,nn,6,0.1)

### 7.  The 3 layer Neural Network in Octave (vectorized)

This uses the same dataset that was generated using Python code.
source("DL-function2.m") data=csvread("data.csv"); X=data(:,1:2); Y=data(:,3); # Make sure that the model parameters are correct. Take the transpose of X & Y
#Perform gradient descent [W1,b1,W2,b2,costs]= computeNN(X', Y',4, learningRate=0.5, numIterations = 10000);

### 8a. Performance  for different learning rates (Python)

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import sklearn.linear_model
import pandas as pd

from sklearn.datasets import make_classification, make_blobs
execfile("./DLfunctions.py") # Since import does not work in Rmd!!!
# Create data
X1, Y1 = make_blobs(n_samples = 400, n_features = 2, centers = 9,
cluster_std = 1.3, random_state = 4)
#Create 2 classes
Y1=Y1.reshape(400,1)
Y1 = Y1 % 2
X2=X1.T
Y2=Y1.T
# Create a list of learning rates
learningRate=[0.5,1.2,3.0]
df=pd.DataFrame()
#Compute costs for each learning rate
for lr in learningRate:
parameters,costs = computeNN(X2, Y2, numHidden = 4, learningRate=lr, numIterations = 10000)
print(costs)
df1=pd.DataFrame(costs)
df=pd.concat([df,df1],axis=1)
#Set the iterations
iterations=[0,1000,2000,3000,4000,5000,6000,7000,8000,9000]
#Create data frame
#Set index
df1=df.set_index([iterations])
df1.columns=[0.5,1.2,3.0]
fig=df1.plot()
fig=plt.title("Cost vs No of Iterations for different learning rates")
plt.savefig('fig4.png', bbox_inches='tight')

### 8b. Performance  for different hidden units (Python)

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import sklearn.linear_model
import pandas as pd

from sklearn.datasets import make_classification, make_blobs
execfile("./DLfunctions.py") # Since import does not work in Rmd!!!
#Create data set
X1, Y1 = make_blobs(n_samples = 400, n_features = 2, centers = 9,
cluster_std = 1.3, random_state = 4)
#Create 2 classes
Y1=Y1.reshape(400,1)
Y1 = Y1 % 2
X2=X1.T
Y2=Y1.T
# Make a list of hidden unis
numHidden=[3,5,7]
df=pd.DataFrame()
#Compute costs for different hidden units
for numHid in numHidden:
parameters,costs = computeNN(X2, Y2, numHidden = numHid, learningRate=1.2, numIterations = 10000)
print(costs)
df1=pd.DataFrame(costs)
df=pd.concat([df,df1],axis=1)
#Set the iterations
iterations=[0,1000,2000,3000,4000,5000,6000,7000,8000,9000]
#Set index
df1=df.set_index([iterations])
df1.columns=[3,5,7]
#Plot
fig=df1.plot()
fig=plt.title("Cost vs No of Iterations for different no of hidden units")
plt.savefig('fig5.png', bbox_inches='tight')

### 9a. Performance  for different learning rates (R)

source("DLfunctions2_1.R")
# Read data
z <- as.matrix(read.csv("data.csv",header=FALSE)) #
x <- z[,1:2]
y <- z[,3]
x1 <- t(x)
y1 <- t(y)
#Loop through learning rates and compute costs
learningRate <-c(0.1,1.2,3.0)
df <- NULL
for(i in seq_along(learningRate)){
nn <-  computeNN(x1, y1, 6, learningRate=learningRate[i],numIterations=10000)
cost <- nn$costs df <- cbind(df,cost) }   #Create dataframe df <- data.frame(df) iterations=seq(0,10000,by=1000) df <- cbind(iterations,df) names(df) <- c("iterations","0.5","1.2","3.0") library(reshape2) df1 <- melt(df,id="iterations") # Melt the data #Plot ggplot(df1) + geom_line(aes(x=iterations,y=value,colour=variable),size=1) + xlab("Iterations") + ylab('Cost') + ggtitle("Cost vs No iterations for different learning rates") ### 9b. Performance for different hidden units (R) source("DLfunctions2_1.R") # Loop through Num hidden units numHidden <-c(4,6,9) df <- NULL for(i in seq_along(numHidden)){ nn <- computeNN(x1, y1, numHidden[i], learningRate=0.1,numIterations=10000) cost <- nn$costs
df <- cbind(df,cost)

}      
df <- data.frame(df)
iterations=seq(0,10000,by=1000)
df <- cbind(iterations,df)
names(df) <- c("iterations","4","6","9")
library(reshape2)
# Melt
df1 <- melt(df,id="iterations")
# Plot
ggplot(df1) + geom_line(aes(x=iterations,y=value,colour=variable),size=1)  +
xlab("Iterations") +
ylab('Cost') + ggtitle("Cost vs No iterations for  different number of hidden units")

## 10a. Performance of the Neural Network for different learning rates (Octave)

source("DL-function2.m") plotLRCostVsIterations() print -djph figa.jpg

## 10b. Performance of the Neural Network for different number of hidden units (Octave)

source("DL-function2.m") plotHiddenCostVsIterations() print -djph figa.jpg

## 11. Turning the heat on the Neural Network

In this 2nd part I create a a central region of positives and and the outside region as negatives. The points are generated using the equation of a circle (x – a)^{2} + (y -b) ^{2} = R^{2} . How does the 3 layer Neural Network perform on this?  Here’s a look! Note: The same dataset is also used for R and Octave Neural Network constructions

## 12. Manually creating a circular central region

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
import sklearn.linear_model

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification, make_blobs
from matplotlib.colors import ListedColormap
import sklearn
import sklearn.datasets

colors=['black','gold']
cmap = matplotlib.colors.ListedColormap(colors)
x1=np.random.uniform(0,10,800).reshape(800,1)
x2=np.random.uniform(0,10,800).reshape(800,1)
X=np.append(x1,x2,axis=1)
X.shape
# Create (x-a)^2 + (y-b)^2 = R^2
# Create a subset of values where squared is <0,4. Perform ravel() to flatten this vector
a=(np.power(X[:,0]-5,2) + np.power(X[:,1]-5,2) <= 6).ravel()
Y=a.reshape(800,1)

cmap = matplotlib.colors.ListedColormap(colors)

plt.figure()
plt.title('Non-linearly separable classes')
plt.scatter(X[:,0], X[:,1], c=Y,
marker= 'o', s=15,cmap=cmap)
plt.savefig('fig6.png', bbox_inches='tight')

### 13a. Decision boundary with hidden units=4 and learning rate = 2.2 (Python)

With the above hyper parameters the decision boundary is triangular

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
import sklearn.linear_model
execfile("./DLfunctions.py")
x1=np.random.uniform(0,10,800).reshape(800,1)
x2=np.random.uniform(0,10,800).reshape(800,1)
X=np.append(x1,x2,axis=1)
X.shape

# Create a subset of values where squared is <0,4. Perform ravel() to flatten this vector
a=(np.power(X[:,0]-5,2) + np.power(X[:,1]-5,2) <= 6).ravel()
Y=a.reshape(800,1)

X2=X.T
Y2=Y.T

parameters,costs = computeNN(X2, Y2, numHidden = 4, learningRate=2.2, numIterations = 10000)
plot_decision_boundary(lambda x: predict(parameters, x.T), X2, Y2,str(4),str(2.2),"fig7.png")

## Cost after iteration 0: 0.692836
## Cost after iteration 1000: 0.331052
## Cost after iteration 2000: 0.326428
## Cost after iteration 3000: 0.474887
## Cost after iteration 4000: 0.247989
## Cost after iteration 5000: 0.218009
## Cost after iteration 6000: 0.201034
## Cost after iteration 7000: 0.197030
## Cost after iteration 8000: 0.193507
## Cost after iteration 9000: 0.191949

### 13b. Decision boundary with hidden units=12 and learning rate = 2.2 (Python)

With the above hyper parameters the decision boundary is triangular

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
import sklearn.linear_model
execfile("./DLfunctions.py")
x1=np.random.uniform(0,10,800).reshape(800,1)
x2=np.random.uniform(0,10,800).reshape(800,1)
X=np.append(x1,x2,axis=1)
X.shape

# Create a subset of values where squared is <0,4. Perform ravel() to flatten this vector
a=(np.power(X[:,0]-5,2) + np.power(X[:,1]-5,2) <= 6).ravel()
Y=a.reshape(800,1)

X2=X.T
Y2=Y.T

parameters,costs = computeNN(X2, Y2, numHidden = 12, learningRate=2.2, numIterations = 10000)
plot_decision_boundary(lambda x: predict(parameters, x.T), X2, Y2,str(12),str(2.2),"fig8.png")

## Cost after iteration 0: 0.693291
## Cost after iteration 1000: 0.383318
## Cost after iteration 2000: 0.298807
## Cost after iteration 3000: 0.251735
## Cost after iteration 4000: 0.177843
## Cost after iteration 5000: 0.130414
## Cost after iteration 6000: 0.152400
## Cost after iteration 7000: 0.065359
## Cost after iteration 8000: 0.050921
## Cost after iteration 9000: 0.039719

### 14a. Decision boundary with hidden units=9 and learning rate = 0.5 (R)

When the number of hidden units is 6 and the learning rate is 0,1, is also a triangular shape in R

source("DLfunctions2_1.R")
z <- as.matrix(read.csv("data1.csv",header=FALSE)) # N
x <- z[,1:2]
y <- z[,3]
x1 <- t(x)
y1 <- t(y)
nn <-computeNN(x1, y1, 9, learningRate=0.5,numIterations=10000) # Triangular
## [1] 0.8398838
## [1] 0.3303621
## [1] 0.3127731
## [1] 0.3012791
## [1] 0.3305543
## [1] 0.3303964
## [1] 0.2334615
## [1] 0.1920771
## [1] 0.2341225
## [1] 0.2188118
## [1] 0.2082687
plotDecisionBoundary(z,nn,6,0.1)

### 14b. Decision boundary with hidden units=8 and learning rate = 0.1 (R)

source("DLfunctions2_1.R")
z <- as.matrix(read.csv("data1.csv",header=FALSE)) # N
x <- z[,1:2]
y <- z[,3]
x1 <- t(x)
y1 <- t(y)
nn <-computeNN(x1, y1, 8, learningRate=0.1,numIterations=10000) # Hemisphere
## [1] 0.7273279
## [1] 0.3169335
## [1] 0.2378464
## [1] 0.1688635
## [1] 0.1368466
## [1] 0.120664
## [1] 0.111211
## [1] 0.1043362
## [1] 0.09800573
## [1] 0.09126161
## [1] 0.0840379
plotDecisionBoundary(z,nn,8,0.1)

### 15a. Decision boundary with hidden units=12 and learning rate = 1.5 (Octave)

source("DL-function2.m") data=csvread("data1.csv"); X=data(:,1:2); Y=data(:,3); # Make sure that the model parameters are correct. Take the transpose of X & Y [W1,b1,W2,b2,costs]= computeNN(X', Y',12, learningRate=1.5, numIterations = 10000); plotDecisionBoundary(data, W1,b1,W2,b2) print -djpg fige.jpg

Conclusion: This post implemented a 3 layer Neural Network to create non-linear boundaries while performing classification. Clearly the Neural Network performs very well when the number of hidden units and learning rate are varied.

To be continued…
Watch this space!!

To see all posts check Index of posts

# Optimal Cloud Computing

Published in CIOL as Cloud Computing: Windows of Performance , Jul 12,2011

Published in Data Quest as Cloud Computing: Cloud all the way,  Nov 16,2011

The murmur of cloud computing today, is bound to build up to crescendo in the years to come, simply because it makes a sound business sense. Cloud computing is a new paradigm in the world of computing. The cloud essentially creates an illusion of infinite computing resources that are available on demand to the user who only pays based on the usage. While on the surface it appears extremely simple and straightforward, making an optimal use of the cloud is no trivial task.

Prior to deploying on the cloud the enterprise has to decide the CPU, memory and bandwidth usage of the application. For e.g. the Amazon EC2 provides several variants of CPUs based on different pricing schemes namely $0.085/hr,$0.34/hr or \$0.68/hr for small, large or extra large CPU instances. There are different pricing schemes for memory and bandwidth usage as well.

While the technological challenge of deploying the cloud is a separate endeavor in itself, the business considerations needed for deciding the cloud computing resources, optimally, is a separate and an equally important endeavor. This article focuses on the business considerations needed for making an optimal choice of resources while deploying on the cloud.

Since the enterprise is free to choose different CPUs which typically consists of CPU processors with different clock speeds or multi core CPUs for extra large instance the choice is really complicated.

The designer needs to consider how his application scales up with respect to increasing, decreasing or burst demands in traffic. To estimate the kind of resources that would be needed would require a good understanding of how the application scales with respect to increasing traffic. Ideally it will be remarkable if the application can scale linearly with increasing traffic. The key parameters that need to be considered for application performance is application latency and throughput versus the instance type.

Also another consideration is to choose is the kind of resources types that need to be added. Ideally it would make more sense to add small CPU instances which can be added incrementally rather than adding extra large CPU instances which only handle part of the traffic. If we choose the large instance which is only partially used but has to be instantiated, nevertheless, to handle the extra traffic then it could result it wasting of precious resources.

A prime consideration is the choice of CPU resource type and the need to understand how the CPU loads up with increasing traffic with respect to latency and throughput. Once the CPU type, small, medium, large or extra large is chosen the designer needs to monitor how the loading of the CPU resource performs with increasing traffic.
Hence regardless of the choice there will be 3 windows of performance to consider

Window of Optimality: In the optimal window the cost of cloud computing resources, for handling the incoming traffic versus revenue for the enterprise is truly profitable. In the optimal window the application will be capable of scaling extremely well to increasing traffic thus resulting in excellent revenue for the enterprise.

b) Window of Diminishing Returns: In this window the addition of extra resources at additional cost will not result in a proportional increase in scalability. In fact the increasing cost of adding additional resource will offset the revenue to the enterprise as the application will not scale appropriately and will result in diminishing returns.

c) Window of Loss: This is the window, in which no enterprise should not find itself in. In this window the cost of adding the extra resources will be larger than the revenue to the enterprise as an inordinate amount of resources will have to be added for small incremental increase in scalability. This will be the result of a poorly designed application. In this situation the enterprise must go back to the drawing room and re-architect the application.

Hence cloud computing, while truly alluring for the enterprise, it is a path that must be tread very carefully by the enterprise.

Find me on Google+