# Computing Win-Probability of T20 matches

I am late to the ‘Win probability’ computation for T20 matches, but managed to jump on to this bus with this post. Win Probability analysis and computation have been around for some time and are used in baseball, NFL, soccer hockey and others. On T20 cricket, the following posts from White Ball Analytics & Sports Data Science were good pointers to the general approach. The data for the Win Probability computation is taken from Cricsheet.

My initial Machine Learning models could not do better than 62% accuracy. I created a data set of ~830 IPL matches which roughly came to about 280,000 rows of ball-by-ball match data but I could not move beyond 62%. Addition of T20 men moved the needle to 64% accuracy. I spent time tuning Deep Learning networks using Tensorflow and Keras. Finally, I added T20 data from 9 T20 leagues – IPL, T20 men, T20 women, BBL, CPL, NTB, PSL, WBB, SSM. I had one large data set of 1.2 million rows of ball by ball data. The data frame looks like

I created a data frame for each match from ball Num 1 to ballNum ~240 for the 1st and 2nd innings of the match. My initial set of features were ballNum, runs, runRate, numWickets. The target variable isWinner= {0,1} depending on whether the team has won or lost the match.

The features

• ballNum – ball number for 1 ~ 240+ in data frame. 1 – 120+ for 1st innings and 120+ – 240+ in 2nd innings including noballs, wides etc.
• runs = cumulative runs scored at the ball count
• runRate = cumulative runs scored/ ballNum (for 1st innings) and runs= required runs/ball Num for 2nd innings
• numWickets = wickets lost

The target variable isWinner can take values {0,1} depending whether the team won or lost

With this initial dataframe, even though I had close to 1.2 million rows of ball by ball data of T20 matches my best performance with vanilla Logistic regression & SVM in Python was about 64% accuracy.

# Read all the data from 9 T20 leagues
# BBL,CPL, IPL, NTB, PSL, SSM, T20 Men, T20 Women, WBB

# Create one large dataframe
df10=pd.concat([df1,df2,df3,df4,df5,df6,df7,df8,df9])
print("Shape of dataframe=",df10.shape)
print("#####################################")
stats=check_values(df10)
print("#####################################")
model_fit(df10)
#norm_model_fit(df,stats)
svm_model_fit(df10)

Shape of dataframe= (1206901, 6)
#####################################
Null values: False
It contains 0 infinite values

Accuracy of Logistic regression classifier on training set: 0.63
Accuracy of Logistic regression classifier on test set: 0.64
Accuracy: 0.64
Precision: 0.62
Recall: 0.65
F1: 0.64

Accuracy of Linear SVC classifier on training set: 0.52
Accuracy of Linear SVC classifier on test set: 0.52

With Tensorflow/Keras the performance was about 67%. I tried several things

• Normalisation
• Tried different learning rates
• Different optimisers – SGD, RMSProp, Adam
• Changed depth and width of Neural Network

However I did not get much improvement. Finally I decided to do some Feature engineering. I added 2 new features

a) Runs Momentum : This feature is based on the fact that more the wickets in hand, the more freely the batsmen can make risky strokes, hence increasing the momentum of the runs, This is calculated as

runsMomentum = (11 – numWickets)/balls remaining

b) Performance Index: This feature is the product of the run rate x wickets in hand. In other words, if the strike rate is good and fewer wickets lost at the point in the match, then the performance index is higher at that point in the match will be higher

The final set of features chosen were as below

I had also included the balls Remaining in the innings. Now with this set of features I decided to execute Tensorflow/Keras and do a GridSearch with different learning rates, optimisers. After a couple of hours of computation I got an accuracy of 0.73. I needed to be able to read the ML model in R which required installation of Tensorflow, reticulate and Keras in RStudio and I had several issues. Since I hit a roadblock I moved to regular R models

I performed WIn Probability computation in the following ways

A) Win Probability with Vanilla Logistic Regression (R)

With vanilla Logistic Regression in R using the ‘glm’ package I got an accuracy of 0.67, sensitivity of 0.68 and specificity of 0.65 as shown below

library(dplyr)
library(caret)
library(e1071)
library(ggplot2)

# Read all the data from 9 T20 leagues
# BBL,CPL, IPL, NTB, PSL, SSM, T20 Men, T20 Women, WBB

# Create one large dataframe
df=rbind(df1,df2,df3,df4,df5,df6,df7,df8,df9)

# Helper function to split into training/test
trainTestSplit <- function(df,trainPercent,seed1){
## Sample size percent
samp_size <- floor(trainPercent/100 * nrow(df))
## set the seed
set.seed(seed1)
idx <- sample(seq_len(nrow(df)), size = samp_size)
idx

}

train_idx <- trainTestSplit(df,trainPercent=80,seed=5)
train <- df[train_idx, ]

test <- df[-train_idx, ]
# Fit a generalized linear logistic model,
fit=glm(isWinner~.,family=binomial,data=train,control = list(maxit = 50))

a=predict(fit,newdata=train,type="response")
# Set response >0.5 as 1 and <=0.5 as 0
b=as.factor(ifelse(a>0.5,1,0))
# Compute the confusion matrix for training data

confusionMatrix(
factor(b, levels = 0:1),
factor(train$isWinner, levels = 0:1) ) Confusion Matrix and Statistics Reference Prediction 0 1 0 339938 160336 1 154236 310217 Accuracy : 0.6739 95% CI : (0.673, 0.6749) No Information Rate : 0.5122 P-Value [Acc > NIR] : < 2.2e-16 Kappa : 0.3473 Mcnemar's Test P-Value : < 2.2e-16 Sensitivity : 0.6879 Specificity : 0.6593 Pos Pred Value : 0.6795 Neg Pred Value : 0.6679 Prevalence : 0.5122 Detection Rate : 0.3524 Detection Prevalence : 0.5186 Balanced Accuracy : 0.6736 'Positive' Class : 0 # This can be saved and loaded as saveRDS(fit, "glm.rds") ml_model <- readRDS("glm.rds")  Using the above ML model on Deccan Chargers vs Chennai Super on 27-04-2009 the Win Probability as the match progresses is as below The Worm wicket graph of this match shows it was a closely fought match B) Win Probability using Random Forests with Tidy Models – R Initially I tried Tidy models with tuning for glmnet. The best I got was 0.67. However, I got an excellent performance using TidyModels with Random Forests. I am using Tidy Models for the first time and I have been blown away with how logically it is constructed, much like dplyr & ggplot2. library(dplyr) library(caret) library(e1071) library(ggplot2) library(tidymodels) # Helper packages library(readr) # for importing data library(vip) library(ranger) # Read all the data from 9 T20 leagues # BBL,CPL, IPL, NTB, PSL, SSM, T20 Men, T20 Women, WBB df1=read.csv("output2/matchesBBL2.csv") df2=read.csv("output2/matchesCPL2.csv") df3=read.csv("output2/matchesIPL2.csv") df4=read.csv("output2/matchesNTB2.csv") df5=read.csv("output2/matchesPSL2.csv") df6=read.csv("output2/matchesSSM2.csv") df7=read.csv("output2/matchesT20M2.csv") df8=read.csv("output2/matchesT20W2.csv") df9=read.csv("output2/matchesWBB2.csv") # Create one large dataframe df=rbind(df1,df2,df3,df4,df5,df6,df7,df8,df9) dim(df) [1] 1205909 8 # Take a peek at the dataset glimpse(df)$ ballNum        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28…
$ballsRemaining <int> 125, 124, 123, 122, 121, 120, 119, 118, 117, 116, 115, 114, 113, 112, 111, 110, 109, 108, 107, 106, 1…$ runs           <int> 1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 13, 14, 16, 18, 18, 18, 24, 24, 24, 26, 26, 32, 32, 33, 34, 34, 3…
$runRate <dbl> 1.0000000, 0.5000000, 0.6666667, 0.7500000, 0.6000000, 0.5000000, 0.5714286, 0.5000000, 0.5555556, 0.…$ numWickets     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3,…
$runsMomentum <dbl> 0.08800000, 0.08870968, 0.08943089, 0.09016393, 0.09090909, 0.09166667, 0.09243697, 0.09322034, 0.094…$ perfIndex      <dbl> 11.000000, 5.500000, 7.333333, 8.250000, 6.600000, 5.500000, 6.285714, 5.500000, 6.111111, 5.000000, …
$isWinner <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,… df %>% count(isWinner) %>% mutate(prop = n/sum(n)) set.seed(123) df$isWinner = as.factor(df$isWinner) # Split the data into training and test set in 80%:20% splits <- initial_split(df,prop = 0.80) df_other <- training(splits) df_test <- testing(splits) # Create a validation set from training set in 80%:20% set.seed(234) val_set <- validation_split(df_other, prop = 0.80) val_set # Setup for Random forest using Ranger for classification # Set up cores for parallel execution cores <- parallel::detectCores() cores #Set up Random Forest engine rf_mod <- rand_forest(mtry = tune(), min_n = tune(), trees = 1000) %>% set_engine("ranger", num.threads = cores) %>% set_mode("classification") rf_mod # The Random Forest engine includes mtry which is number of predictor # variables required at each decision tree with min_n the minimum number # of Random Forest Model Specification (classification) Main Arguments: mtry = tune() trees = 1000 min_n = tune() Engine-Specific Arguments: num.threads = cores Computational engine: ranger # Setup the predictors and target variable # Normalise all predictors. Random Forest don't need normalization but # I have done it anyway rf_recipe <- recipe(isWinner ~ ., data = df_other) %>% step_normalize(all_predictors()) # Create workflow adding the ML model and recipe rf_workflow <- workflow() %>% add_model(rf_mod) %>% add_recipe(rf_recipe) # The tune is done for 5 different values of the tuning parameters. # Metrics include accuracy and roc_auc rf_res <- rf_workflow %>% tune_grid(val_set, grid = 5, control = control_grid(save_pred = TRUE), metrics = metric_set(accuracy,roc_auc))$ Pick the best of ROC/AUC
rf_res %>%
show_best(metric = "roc_auc")

We can see that when mtry (number of predictors) is 5 or 7 the ROC_AUC is 0.834 which is quite good

# A tibble: 5 × 8
mtry min_n .metric .estimator  mean     n std_err .config
<int> <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>
1     5    26 roc_auc binary     0.834     1      NA Preprocessor1_Model5
2     7    36 roc_auc binary     0.834     1      NA Preprocessor1_Model3
3     2    17 roc_auc binary     0.833     1      NA Preprocessor1_Model4
4     1    20 roc_auc binary     0.832     1      NA Preprocessor1_Model2
5     5     6 roc_auc binary     0.825     1      NA Preprocessor1_Model1

# Select the model with highest accuracy
rf_res %>%
show_best(metric = "accuracy")
mtry min_n .metric  .estimator  mean     n std_err .config
<int> <int> <chr>    <chr>      <dbl> <int>   <dbl> <chr>
1     7    36 accuracy binary     0.737     1      NA Preprocessor1_Model3
2     5    26 accuracy binary     0.736     1      NA Preprocessor1_Model5
3     1    20 accuracy binary     0.736     1      NA Preprocessor1_Model2
4     2    17 accuracy binary     0.735     1      NA Preprocessor1_Model4
5     5     6 accuracy binary     0.731     1      NA Preprocessor1_Model1

# The model with mtry (number of predictors) is 7 has the best accuracy.
# Hence the best model has mtry=7 and min_n=36

rf_best <-
rf_res %>%
select_best(metric = "accuracy")

# Display the best model
rf_best
# A tibble: 1 × 3
mtry min_n .config
<int> <int> <chr>
1     7    36 Preprocessor1_Model3

rf_res %>%
collect_predictions()
id         .pred_class  .row  mtry min_n .pred_0  .pred_1 isWinner .config
<chr>      <fct>       <int> <int> <int>   <dbl>    <dbl> <fct>    <chr>
1 validation 1               1     5     6 0.497   0.503    0        Preprocessor1_Model1
2 validation 1               9     5     6 0.00753 0.992    1        Preprocessor1_Model1
3 validation 0              10     5     6 0.627   0.373    0        Preprocessor1_Model1
4 validation 0              16     5     6 0.998   0.002    0        Preprocessor1_Model1
5 validation 1              18     5     6 0.270   0.730    1        Preprocessor1_Model1
6 validation 0              23     5     6 0.899   0.101    0        Preprocessor1_Model1
7 validation 1              26     5     6 0.452   0.548    1        Preprocessor1_Model1
8 validation 0              30     5     6 0.657   0.343    1        Preprocessor1_Model1
9 validation 0              34     5     6 0.576   0.424    0        Preprocessor1_Model1
10 validation 0              35     5     6 1.00    0.000167 0        Preprocessor1_Model1

rf_auc <-
rf_res %>%
collect_predictions(parameters = rf_best) %>%
roc_curve(isWinner, .pred_0) %>%
mutate(model = "Random Forest")

autoplot(rf_auc)



I

The Final Model

# Create the final Random Forest model with mtry=7 and min_n=36
# engine as "ranger" for classification
last_rf_mod <-
rand_forest(mtry = 7, min_n = 36, trees = 1000) %>%
set_engine("ranger", num.threads = cores, importance = "impurity") %>%
set_mode("classification")

# the last workflow is updated with the final model
last_rf_workflow <-
rf_workflow %>%
update_model(last_rf_mod)

set.seed(345)
last_rf_fit <-
last_rf_workflow %>%
last_fit(splits)

# Collect metrics
last_rf_fit %>%
collect_metrics()
.metric  .estimator .estimate .config
<chr>    <chr>          <dbl> <chr>
1 accuracy binary         0.739 Preprocessor1_Model1
2 roc_auc  binary         0.837 Preprocessor1_Model1

The Random Forest model gives an accuracy of 0.739 and ROC_AUC of .837 which I think is quite good. This is roughly what I got with Tensorflow/Keras

# Get the feature importance
last_rf_fit %>%
extract_fit_parsnip() %>%
vip(num_features = 7)



Interestingly the feature that I engineered seems to have the maximum importancce namely Performance Index which is a product of Run rate x Wicket in Hand. I would have thought numWickets would be important but in T20 match probably is is not.

 generate predictions from the test set
test_predictions <- last_rf_fit %>% collect_predictions()
> test_predictions
# A tibble: 241,182 × 7
id               .pred_0 .pred_1  .row .pred_class isWinner .config
<chr>              <dbl>   <dbl> <int> <fct>       <fct>    <chr>
1 train/test split   0.496   0.504     1 1           0        Preprocessor1_Model1
2 train/test split   0.640   0.360    11 0           0        Preprocessor1_Model1
3 train/test split   0.596   0.404    14 0           0        Preprocessor1_Model1
4 train/test split   0.287   0.713    22 1           0        Preprocessor1_Model1
5 train/test split   0.616   0.384    28 0           0        Preprocessor1_Model1
6 train/test split   0.516   0.484    36 0           0        Preprocessor1_Model1
7 train/test split   0.754   0.246    37 0           0        Preprocessor1_Model1
8 train/test split   0.641   0.359    39 0           0        Preprocessor1_Model1
9 train/test split   0.811   0.189    40 0           0        Preprocessor1_Model1
10 train/test split   0.618   0.382    42 0           0        Preprocessor1_Model1

# generate a confusion matrix
test_predictions %>%
conf_mat(truth = isWinner, estimate = .pred_class)

Truth
Prediction     0     1
0 92173 31623
1 31320 86066

# Create the final model on the train/test data
final_model <- fit(last_rf_workflow, df_other)

# Final model
final_model
══ Workflow [trained] ════════════════════════════════════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: rand_forest()

── Preprocessor ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 Recipe Step

• step_normalize()

── Model ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Ranger result

Call:
ranger::ranger(x = maybe_data_frame(x), y = y, mtry = min_cols(~7,      x), num.trees = ~1000, min.node.size = min_rows(~36, x),      num.threads = ~cores, importance = ~"impurity", verbose = FALSE,      seed = sample.int(10^5, 1), probability = TRUE)

Type:                             Probability estimation
Number of trees:                  1000
Sample size:                      964727
Number of independent variables:  7
Mtry:                             7
Target node size:                 36
Variable importance mode:         impurity
Splitrule:                        gini
OOB prediction error (Brier s.):  0.1631303


The Random Forest Model’s performance has been quite impressive and probably requires further exploration.

# Saving and loading the model
save(final_model, file = "fit.rda")

#Predicting the Win Probability of CSK vs DD match on 12 May 2012

Comparing this with the Worm wicket graph of this match we see that DD had no chance at all

C) Win Probability with Tensorflow/Keras with Grid Search – Python

I spent a fair amount of time tuning the hyper parameters of the Keras Deep Learning Network. Finally did go for the Grid Search. Incidentally I did ask ChatGPT to suggest code snippets for GridSearch which it promptly did!!!

import pandas as pd
import numpy as np
from zipfile import ZipFile
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import regularizers
from sklearn.model_selection import GridSearchCV

# Define the model
tf.random.set_seed(4)
model = tf.keras.Sequential([
keras.layers.Dense(32, activation=tf.nn.relu, input_shape=[len(train_dataset1.keys())]),
keras.layers.Dense(16, activation=tf.nn.relu),
keras.layers.Dense(8, activation=tf.nn.relu),
keras.layers.Dense(1,activation=tf.nn.sigmoid)
])

# Since this is binary classification use binary_crossentropy
model.compile(loss='binary_crossentropy',
optimizer=optimizer,
metrics='accuracy')
return(model)

# Create a KerasClassifier object
model = keras.wrappers.scikit_learn.KerasClassifier(build_fn=create_model)

# Define the grid of hyperparameters to search over
batch_size = [1024]
epochs = [40]
learning_rate = [0.01, 0.001, 0.0001]

param_grid = dict(dict(optimizer=optimizer,batch_size=batch_size, epochs=epochs) )
# Create the grid search object
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)

# Fit the grid search object to the training data
grid_search.fit(normalized_train_data, train_labels)

# Print the best hyperparameters
print('Best hyperparameters:', grid_search.best_params_)
# summarize results
print("Best: %f using %s" % (grid_search.best_score_, grid_search.best_params_))
means = grid_search.cv_results_['mean_test_score']
stds = grid_search.cv_results_['std_test_score']
params = grid_search.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
print("%f (%f) with: %r" % (mean, stdev, param))

The best worked out to be the optimiser ‘Nadam’ with a learning rate of 0.001

import matplotlib.pyplot as plt
# Create a model
tf.random.set_seed(4)
model = tf.keras.Sequential([
keras.layers.Dense(32, activation=tf.nn.relu, input_shape=[len(train_dataset1.keys())]),
keras.layers.Dense(16, activation=tf.nn.relu),
keras.layers.Dense(8, activation=tf.nn.relu),
keras.layers.Dense(1,activation=tf.nn.sigmoid)
])

# Use the Nadam optimiser
optimizer=keras.optimizers.Nadam(learning_rate=.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, decay=0.0)

# Since this is binary classification use binary_crossentropy
model.compile(loss='binary_crossentropy',
optimizer=optimizer,
metrics='accuracy')

# Fit
#history=model.fit(
#  train_dataset1, train_labels,batch_size=1024,
#  epochs=40, validation_data=(test_dataset1,test_labels), verbose=1)
history=model.fit(
normalized_train_data, train_labels,batch_size=1024,
epochs=40, validation_data=(normalized_test_data,test_labels), verbose=1)

Epoch 37/40
943/943 [==============================] - 3s 3ms/step - loss: 0.4971 - accuracy: 0.7310 - val_loss: 0.4968 - val_accuracy: 0.7357
Epoch 38/40
943/943 [==============================] - 3s 3ms/step - loss: 0.4970 - accuracy: 0.7310 - val_loss: 0.4974 - val_accuracy: 0.7378
Epoch 39/40
943/943 [==============================] - 4s 4ms/step - loss: 0.4970 - accuracy: 0.7309 - val_loss: 0.4994 - val_accuracy: 0.7296
Epoch 40/40
943/943 [==============================] - 3s 3ms/step - loss: 0.4969 - accuracy: 0.7311 - val_loss: 0.4998 - val_accuracy: 0.7300
plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.title("model loss")
plt.ylabel("loss")
plt.xlabel("epoch")
plt.legend(["train", "test"], loc="upper left")
plt.show()

Conclusion

So, the Keras Deep Learning Network gives about the same performance of Random Forest in Tidy Models. But I went with R Random Forest as it was easier to save and load the model for use with my data. Also, I am not sure whether the performance of the ML model can be improved beyond a point. However, I will continue to explore.

Watch this space!!!

Also see

To see all posts click Index of posts

References

# Deep Learning from first principles in Python, R and Octave – Part 3

“Once upon a time, I, Chuang Tzu, dreamt I was a butterfly, fluttering hither and thither, to all intents and purposes a butterfly. I was conscious only of following my fancies as a butterfly, and was unconscious of my individuality as a man. Suddenly, I awoke, and there I lay, myself again. Now I do not know whether I was then a man dreaming I was a butterfly, or whether I am now a butterfly dreaming that I am a man.”
from The Brain: The Story of you – David Eagleman

“Thought is a great big vector of neural activity”
Prof Geoffrey Hinton

# Introduction

This is the third part in my series on Deep Learning from first principles in Python, R and Octave. In the first part Deep Learning from first principles in Python, R and Octave-Part 1, I implemented logistic regression as a 2 layer neural network. The 2nd part Deep Learning from first principles in Python, R and Octave-Part 2, dealt with the implementation of 3 layer Neural Networks with 1 hidden layer to perform classification tasks, where the 2 classes cannot be separated by a linear boundary. In this third part, I implement a multi-layer, Deep Learning (DL) network of arbitrary depth (any number of hidden layers) and arbitrary height (any number of activation units in each hidden layer). The implementations of these Deep Learning networks, in all the 3 parts, are based on vectorized versions in Python, R and Octave. The implementation in the 3rd part is for a L-layer Deep Netwwork, but without any regularization, early stopping, momentum or learning rate adaptation techniques. However even the barebones multi-layer DL, is a handful and has enough hyperparameters to fine-tune and adjust.

Checkout my book ‘Deep Learning from first principles: Second Edition – In vectorized Python, R and Octave’. My book starts with the implementation of a simple 2-layer Neural Network and works its way to a generic L-Layer Deep Learning Network, with all the bells and whistles. The derivations have been discussed in detail. The code has been extensively commented and included in its entirety in the Appendix sections. My book is available on Amazon as paperback ($18.99) and in kindle version($9.99/Rs449).

The implementation of the vectorized L-layer Deep Learning network in Python, R and Octave were both exhausting, and exacting!! Keeping track of the indices, layer number and matrix dimensions required quite bit of focus. While the implementation was demanding, it was also very exciting to get the code to work. The trick was to be able to shift gears between the slight quirkiness between the languages. Here are some of challenges I faced.

1. Python and Octave allow multiple return values to be unpacked in a single statement. With R, unpacking multiple return values from a list, requires the list returned, to be unpacked separately. I did see that there is a package gsubfn, which does this.  I hope this feature becomes a base R feature.
2. Python and R allow dissimilar elements to be saved and returned from functions using dictionaries or lists respectively. However there is no real equivalent in Octave. The closest I got to this functionality in Octave, was the ‘cell array’. But the cell array can be accessed only by the index, and not with the key as in a Python dictionary or R list. This makes things just a bit more difficult in Octave.
3. Python and Octave include implicit broadcasting. In R, broadcasting is not implicit, but R has a nifty function, the sweep(), with which we can broadcast either by columns or by rows
4. The closest equivalent of Python’s dictionary, or R’s list, in Octave is the cell array. However I had to manage separate cell arrays for weights and biases and during gradient descent and separate gradients dW and dB
5. In Python the rank-1 numpy arrays can be annoying at times. This issue is not present in R and Octave.

Though the number of lines of code for Deep Learning functions in Python, R and Octave are about ~350 apiece, they have been some of the most difficult code I have implemented. The current vectorized implementation supports the relu, sigmoid and tanh activation functions as of now. I will be adding other activation functions like the ‘leaky relu’, ‘softmax’ and others, to the implementation in the weeks to come.

While testing with different hyper-parameters namely i) the number of hidden layers, ii) the number of activation units in each layer, iii) the activation function and iv) the number iterations, I found the L-layer Deep Learning Network to be very sensitive to these hyper-parameters. It is not easy to tune the parameters. Adding more hidden layers, or more units per layer, does not help and mostly results in gradient descent getting stuck in some local minima. It does take a fair amount of trial and error and very close observation on how the DL network performs for logical changes. We then can zero in on the most the optimal solution. Feel free to download/fork my code from Github DeepLearning-Part 3 and play around with the hyper-parameters for your own problems.

#### Derivation of a Multi Layer Deep Learning Network

Note: A detailed discussion of the derivation below is available in my video presentation Neural Network 4
Lets take a simple 3 layer Neural network with 3 hidden layers and an output layer

In the forward propagation cycle the equations are

$Z_{1} = W_{1}A_{0} +b_{1}$  and  $A_{1} = g(Z_{1})$
$Z_{2} = W_{2}A_{1} +b_{2}$  and  $A_{2} = g(Z_{2})$
$Z_{3} = W_{3}A_{2} +b_{3}$  and $A_{3} = g(Z_{3})$

The loss function is given by
$L = -(ylogA3 + (1-y)log(1-A3))$
and $dL/dA3 = -(Y/A_{3} + (1-Y)/(1-A_{3}))$

For a binary classification the output activation function is the sigmoid function given by
$A_{3} = 1/(1+ e^{-Z3})$. It can be shown that
$dA_{3}/dZ_{3} = A_{3}(1-A_3)$ see equation 2 in Part 1

$\partial L/\partial Z_{3} = \partial L/\partial A_{3}* \partial A_{3}/\partial Z_{3} = A3-Y$ see equation (f) in  Part 1
and since
$\partial L/\partial A_{2} = \partial L/\partial Z_{3} * \partial Z_{3}/\partial A_{2} = (A_{3} -Y) * W_{3}$ because $\partial Z_{3}/\partial A_{2} = W_{3}$ -(1a)
and $\partial L/\partial Z_{2} =\partial L/\partial A_{2} * \partial A_{2}/\partial Z_{2} = (A_{3} -Y) * W_{3} *g'(Z_{2})$ -(1b)
$\partial L/\partial W_{2} = \partial L/\partial Z_{2} * A_{1}$ -(1c)
since $\partial Z_{2}/\partial W_{2} = A_{1}$
and
$\partial L/\partial b_{2} = \partial L/\partial Z_{2}$ -(1d)
because
$\partial Z_{2}/\partial b_{2} =1$

Also

$\partial L/\partial A_{1} =\partial L/\partial Z_{2} * \partial Z_{2}/\partial A_{1} = \partial L/\partial Z_{2} * W_{2}$     – (2a)
$\partial L/\partial Z_{1} =\partial L/\partial A_{1} * \partial A_{1}/\partial Z_{1} = \partial L/\partial A_{1} * W_{2} *g'(Z_{1})$          – (2b)
$\partial L/\partial W_{1} = \partial L/\partial Z_{1} * A_{0}$ – (2c)
$\partial L/\partial b_{1} = \partial L/\partial Z_{1}$ – (2d)

Inspecting the above equations (1a – 1d & 2a-2d), our ‘Uber deep, bottomless’ brain  can easily discern the pattern in these equations. The equation for any layer ‘l’ is of the form
$Z_{l} = W_{l}A_{l-1} +b_{l}$     and  $A_{l} = g(Z_{l})$
The equation for the backward propagation have the general form
$\partial L/\partial A_{l} = \partial L/\partial Z_{l+1} * W^{l+1}$
$\partial L/\partial Z_{l}=\partial L/\partial A_{l} *g'(Z_{l})$
$\partial L/\partial W_{l} =\partial L/\partial Z_{l} *A^{l-1}$
$\partial L/\partial b_{l} =\partial L/\partial Z_{l}$

Some other important results The derivatives of the activation functions in the implemented Deep Learning network
g(z) = sigmoid(z) = $1/(1+e^{-z})$ = a g’(z) = a(1-a) – See Part 1
g(z) = tanh(z) = a g’(z) = $1 - a^{2}$
g(z) = relu(z) = z  when z>0 and 0 when z 0 and 0 when z <= 0
While it appears that there is a discontinuity for the derivative at 0 the small value at the discontinuity does not present a problem

The implementation of the multi layer vectorized Deep Learning Network for Python, R and Octave is included below. For all these implementations, initially I create the size and configuration of the the Deep Learning network with the layer dimennsions So for example layersDimension Vector ‘V’ of length L indicating ‘L’ layers where

V (in Python)= $[v_{0}, v_{1}, v_{2}$, … $v_{L-1}]$
V (in R)= $c(v_{1}, v_{2}, v_{3}$ , … $v_{L})$
V (in Octave)= [ $v_{1} v_{2} v_{3}$$v_{L}]$

In all of these implementations the first element is the number of input features to the Deep Learning network and the last element is always a ‘sigmoid’ activation function since all the problems deal with binary classification.

The number of elements between the first and the last element are the number of hidden layers and the magnitude of each $v_{i}$ is the number of activation units in each hidden layer, which is specified while actually executing the Deep Learning network using the function L_Layer_DeepModel(), in all the implementations Python, R and Octave

## 1a. Classification with Multi layer Deep Learning Network – Relu activation(Python)

In the code below a 4 layer Neural Network is trained to generate a non-linear boundary between the classes. In the code below the ‘Relu’ Activation function is used. The number of activation units in each layer is 9. The cost vs iterations is plotted in addition to the decision boundary. Further the accuracy, precision, recall and F1 score are also computed

import os
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
import sklearn.linear_model

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification, make_blobs
from matplotlib.colors import ListedColormap
import sklearn
import sklearn.datasets

#from DLfunctions import plot_decision_boundary
execfile("./DLfunctions34.py") #
os.chdir("C:\\software\\DeepLearning-Posts\\part3")

# Create clusters of 2 classes
X1, Y1 = make_blobs(n_samples = 400, n_features = 2, centers = 9,
cluster_std = 1.3, random_state = 4)
#Create 2 classes
Y1=Y1.reshape(400,1)
Y1 = Y1 % 2
X2=X1.T
Y2=Y1.T
# Set the dimensions of DL Network
#  Below we have
#  2 - 2 input features
#  9,9 - 2 hidden layers with 9 activation units per layer and
#  1 - 1 sigmoid activation unit in the output layer as this is a binary classification
# The activation in the hidden layer is the 'relu' specified in L_Layer_DeepModel

layersDimensions = [2, 9, 9,1] #  4-layer model
parameters = L_Layer_DeepModel(X2, Y2, layersDimensions,hiddenActivationFunc='relu', learning_rate = 0.3,num_iterations = 2500, fig="fig1.png")
#Plot the decision boundary
plot_decision_boundary(lambda x: predict(parameters, x.T), X2,Y2,str(0.3),"fig2.png")

# Compute the confusion matrix
yhat = predict(parameters,X2)
from sklearn.metrics import confusion_matrix
a=confusion_matrix(Y2.T,yhat.T)
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
print('Accuracy: {:.2f}'.format(accuracy_score(Y2.T, yhat.T)))
print('Precision: {:.2f}'.format(precision_score(Y2.T, yhat.T)))
print('Recall: {:.2f}'.format(recall_score(Y2.T, yhat.T)))
print('F1: {:.2f}'.format(f1_score(Y2.T, yhat.T)))
## Accuracy: 0.90
## Precision: 0.91
## Recall: 0.87
## F1: 0.89

For more details on metrics like Accuracy, Recall, Precision etc. used in classification take a look at my post Practical Machine Learning with R and Python – Part 2. More details about these and other metrics besides implementation of the most common machine learning algorithms are available in my book My book ‘Practical Machine Learning with R and Python’ on Amazon

## 1b. Classification with Multi layer Deep Learning Network – Relu activation(R)

In the code below, binary classification is performed on the same data set as above using the Relu activation function. The DL network is same as above

library(ggplot2)
# Read the data
x <- z[,1:2]
y <- z[,3]
X1 <- t(x)
Y1 <- t(y)

# Set the dimensions of the Deep Learning network
# No of input features =2, 2 hidden layers with 9 activation units and 1 output layer
layersDimensions = c(2, 9, 9,1)
# Execute the Deep Learning Neural Network
retvals = L_Layer_DeepModel(X1, Y1, layersDimensions,
hiddenActivationFunc='relu',
learningRate = 0.3,
numIterations = 5000,
print_cost = True)
library(ggplot2)
source("DLfunctions33.R")
# Get the computed costs
costs <- retvals[['costs']]
# Create a sequence of iterations
numIterations=5000
iterations <- seq(0,numIterations,by=1000)
df <-data.frame(iterations,costs)
# Plot the Costs vs number of iterations
ggplot(df,aes(x=iterations,y=costs)) + geom_point() +geom_line(color="blue") +
xlab('No of iterations') + ylab('Cost') + ggtitle("Cost vs No of iterations")

# Plot the decision boundary
plotDecisionBoundary(z,retvals,hiddenActivationFunc="relu",0.3)

library(caret)
# Predict the output for the data values
yhat <-predict(retvals$parameters,X1,hiddenActivationFunc="relu") yhat[yhat==FALSE]=0 yhat[yhat==TRUE]=1 # Compute the confusion matrix confusionMatrix(yhat,Y1) ## Confusion Matrix and Statistics ## ## Reference ## Prediction 0 1 ## 0 201 10 ## 1 21 168 ## ## Accuracy : 0.9225 ## 95% CI : (0.8918, 0.9467) ## No Information Rate : 0.555 ## P-Value [Acc > NIR] : < 2e-16 ## ## Kappa : 0.8441 ## Mcnemar's Test P-Value : 0.07249 ## ## Sensitivity : 0.9054 ## Specificity : 0.9438 ## Pos Pred Value : 0.9526 ## Neg Pred Value : 0.8889 ## Prevalence : 0.5550 ## Detection Rate : 0.5025 ## Detection Prevalence : 0.5275 ## Balanced Accuracy : 0.9246 ## ## 'Positive' Class : 0 ##  ## 1c. Classification with Multi layer Deep Learning Network – Relu activation(Octave) Included below is the code for performing classification. Incidentally Octave does not seem to have implemented the confusion matrix, but confusionmat is available in Matlab. # Read the data data=csvread("data.csv"); X=data(:,1:2); Y=data(:,3); # Set layer dimensions layersDimensions = [2 9 7 1] #tanh=-0.5(ok), #relu=0.1 best! # Execute Deep Network [weights biases costs]=L_Layer_DeepModel(X', Y', layersDimensions, hiddenActivationFunc='relu', learningRate = 0.1, numIterations = 10000); plotCostVsIterations(10000,costs); plotDecisionBoundary(data,weights, biases,hiddenActivationFunc="tanh")  ## 2a. Classification with Multi layer Deep Learning Network – Tanh activation(Python) Below the Tanh activation function is used to perform the same classification. I found the Tanh activation required a simpler Neural Network of 3 layers. # Tanh activation import os import numpy as np import matplotlib.pyplot as plt import matplotlib.colors import sklearn.linear_model from sklearn.model_selection import train_test_split from sklearn.datasets import make_classification, make_blobs from matplotlib.colors import ListedColormap import sklearn import sklearn.datasets #from DLfunctions import plot_decision_boundary os.chdir("C:\\software\\DeepLearning-Posts\\part3") execfile("./DLfunctions34.py") # Create the dataset X1, Y1 = make_blobs(n_samples = 400, n_features = 2, centers = 9, cluster_std = 1.3, random_state = 4) #Create 2 classes Y1=Y1.reshape(400,1) Y1 = Y1 % 2 X2=X1.T Y2=Y1.T # Set the dimensions of the Neural Network layersDimensions = [2, 4, 1] # 3-layer model # Compute the DL network parameters = L_Layer_DeepModel(X2, Y2, layersDimensions, hiddenActivationFunc='tanh', learning_rate = .5,num_iterations = 2500,fig="fig3.png") #Plot the decision boundary plot_decision_boundary(lambda x: predict(parameters, x.T), X2,Y2,str(0.5),"fig4.png")  ## 2b. Classification with Multi layer Deep Learning Network – Tanh activation(R) R performs better with a Tanh activation than the Relu as can be seen below  #Set the dimensions of the Neural Network layersDimensions = c(2, 9, 9,1) library(ggplot2) # Read the data z <- as.matrix(read.csv("data.csv",header=FALSE)) x <- z[,1:2] y <- z[,3] X1 <- t(x) Y1 <- t(y) # Execute the Deep Model retvals = L_Layer_DeepModel(X1, Y1, layersDimensions, hiddenActivationFunc='tanh', learningRate = 0.3, numIterations = 5000, print_cost = True) # Get the costs costs <- retvals[['costs']] iterations <- seq(0,numIterations,by=1000) df <-data.frame(iterations,costs) # Plot Cost vs number of iterations ggplot(df,aes(x=iterations,y=costs)) + geom_point() +geom_line(color="blue") + xlab('No of iterations') + ylab('Cost') + ggtitle("Cost vs No of iterations") #Plot the decision boundary plotDecisionBoundary(z,retvals,hiddenActivationFunc="tanh",0.3) ## 2c. Classification with Multi layer Deep Learning Network – Tanh activation(Octave) The code below uses the Tanh activation in the hidden layers for Octave # Read the data data=csvread("data.csv"); X=data(:,1:2); Y=data(:,3); # Set layer dimensions layersDimensions = [2 9 7 1] #tanh=-0.5(ok), #relu=0.1 best! # Execute Deep Network [weights biases costs]=L_Layer_DeepModel(X', Y', layersDimensions, hiddenActivationFunc='tanh', learningRate = 0.1, numIterations = 10000); plotCostVsIterations(10000,costs); plotDecisionBoundary(data,weights, biases,hiddenActivationFunc="tanh")  ## 3. Bernoulli’s Lemniscate To make things more interesting, I create a 2D figure of the Bernoulli’s lemniscate to perform non-linear classification. The Lemniscate is given by the equation $(x^{2} + y^{2})^{2}$ = $2a^{2}*(x^{2}-y^{2})$ ## 3a. Classifying a lemniscate with Deep Learning Network – Relu activation(Python) import os import numpy as np import matplotlib.pyplot as plt os.chdir("C:\\software\\DeepLearning-Posts\\part3") execfile("./DLfunctions33.py") x1=np.random.uniform(0,10,2000).reshape(2000,1) x2=np.random.uniform(0,10,2000).reshape(2000,1) X=np.append(x1,x2,axis=1) X.shape # Create a subset of values where squared is <0,4. Perform ravel() to flatten this vector # Create the equation # (x^{2} + y^{2})^2 - 2a^2*(x^{2}-y^{2}) <= 0 a=np.power(np.power(X[:,0]-5,2) + np.power(X[:,1]-5,2),2) b=np.power(X[:,0]-5,2) - np.power(X[:,1]-5,2) c= a - (b*np.power(4,2)) <=0 Y=c.reshape(2000,1) # Create a scatter plot of the lemniscate plt.scatter(X[:,0], X[:,1], c=Y, marker= 'o', s=15,cmap="viridis") Z=np.append(X,Y,axis=1) plt.savefig("fig50.png",bbox_inches='tight') plt.clf() # Set the data for classification X2=X.T Y2=Y.T # These settings work the best # Set the Deep Learning layer dimensions for a Relu activation layersDimensions = [2,7,4,1] #Execute the DL network parameters = L_Layer_DeepModel(X2, Y2, layersDimensions, hiddenActivationFunc='relu', learning_rate = 0.5,num_iterations = 10000, fig="fig5.png") #Plot the decision boundary plot_decision_boundary(lambda x: predict(parameters, x.T), X2, Y2,str(2.2),"fig6.png") # Compute the Confusion matrix yhat = predict(parameters,X2) from sklearn.metrics import confusion_matrix a=confusion_matrix(Y2.T,yhat.T) from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score print('Accuracy: {:.2f}'.format(accuracy_score(Y2.T, yhat.T))) print('Precision: {:.2f}'.format(precision_score(Y2.T, yhat.T))) print('Recall: {:.2f}'.format(recall_score(Y2.T, yhat.T))) print('F1: {:.2f}'.format(f1_score(Y2.T, yhat.T))) ## Accuracy: 0.93 ## Precision: 0.77 ## Recall: 0.76 ## F1: 0.76 We could get better performance by tuning further. Do play around if you fork the code. Note:: The lemniscate data is saved as a CSV and then read in R and also in Octave. I do this instead of recreating the lemniscate shape ## 3b. Classifying a lemniscate with Deep Learning Network – Relu activation(R code) The R decision boundary for the Bernoulli’s lemniscate is shown below Z <- as.matrix(read.csv("lemniscate.csv",header=FALSE)) Z1=data.frame(Z) # Create a scatter plot of the lemniscate ggplot(Z1,aes(x=V1,y=V2,col=V3)) +geom_point() #Set the data for the DL network X=Z[,1:2] Y=Z[,3] X1=t(X) Y1=t(Y) # Set the layer dimensions for the tanh activation function layersDimensions = c(2,5,4,1) # Execute the Deep Learning network with Tanh activation retvals = L_Layer_DeepModel(X1, Y1, layersDimensions, hiddenActivationFunc='tanh', learningRate = 0.3, numIterations = 20000, print_cost = True) # Plot cost vs iteration costs <- retvals[['costs']] numIterations = 20000 iterations <- seq(0,numIterations,by=1000) df <-data.frame(iterations,costs) ggplot(df,aes(x=iterations,y=costs)) + geom_point() +geom_line(color="blue") + xlab('No of iterations') + ylab('Cost') + ggtitle("Cost vs No of iterations") #Plot the decision boundary plotDecisionBoundary(Z,retvals,hiddenActivationFunc="tanh",0.3) ## 3c. Classifying a lemniscate with Deep Learning Network – Relu activation(Octave code) Octave is used to generate the non-linear lemniscate boundary.  # Read the data data=csvread("lemniscate.csv"); X=data(:,1:2); Y=data(:,3); # Set the dimensions of the layers layersDimensions = [2 9 7 1] # Compute the DL network [weights biases costs]=L_Layer_DeepModel(X', Y', layersDimensions, hiddenActivationFunc='relu', learningRate = 0.20, numIterations = 10000); plotCostVsIterations(10000,costs); plotDecisionBoundary(data,weights, biases,hiddenActivationFunc="relu")  ## 4a. Binary Classification using MNIST – Python code Finally I perform a simple classification using the MNIST handwritten digits, which according to Prof Geoffrey Hinton is “the Drosophila of Deep Learning”. The Python code for reading the MNIST data is taken from Alex Kesling’s github link MNIST. In the Python code below, I perform a simple binary classification between the handwritten digit ‘5’ and ‘not 5’ which is all other digits. I will perform the proper classification of all digits using the Softmax classifier some time later. import os import numpy as np import matplotlib.pyplot as plt os.chdir("C:\\software\\DeepLearning-Posts\\part3") execfile("./DLfunctions34.py") execfile("./load_mnist.py") training=list(read(dataset='training',path="./mnist")) test=list(read(dataset='testing',path="./mnist")) lbls=[] pxls=[] print(len(training)) # Select the first 10000 training data and the labels for i in range(10000): l,p=training[i] lbls.append(l) pxls.append(p) labels= np.array(lbls) pixels=np.array(pxls) # Sey y=1 when labels == 5 and 0 otherwise y=(labels==5).reshape(-1,1) X=pixels.reshape(pixels.shape[0],-1) # Create the necessary feature and target variable X1=X.T Y1=y.T # Create the layer dimensions. The number of features are 28 x 28 = 784 since the 28 x 28 # pixels is flattened to single vector of length 784. layersDimensions=[784, 15,9,7,1] # Works very well parameters = L_Layer_DeepModel(X1, Y1, layersDimensions, hiddenActivationFunc='relu', learning_rate = 0.1,num_iterations = 1000, fig="fig7.png") # Test data lbls1=[] pxls1=[] for i in range(800): l,p=test[i] lbls1.append(l) pxls1.append(p) testLabels=np.array(lbls1) testData=np.array(pxls1) ytest=(testLabels==5).reshape(-1,1) Xtest=testData.reshape(testData.shape[0],-1) Xtest1=Xtest.T Ytest1=ytest.T yhat = predict(parameters,Xtest1) from sklearn.metrics import confusion_matrix a=confusion_matrix(Ytest1.T,yhat.T) from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score print('Accuracy: {:.2f}'.format(accuracy_score(Ytest1.T, yhat.T))) print('Precision: {:.2f}'.format(precision_score(Ytest1.T, yhat.T))) print('Recall: {:.2f}'.format(recall_score(Ytest1.T, yhat.T))) print('F1: {:.2f}'.format(f1_score(Ytest1.T, yhat.T))) probs=predict_proba(parameters,Xtest1) from sklearn.metrics import precision_recall_curve precision, recall, thresholds = precision_recall_curve(Ytest1.T, probs.T) closest_zero = np.argmin(np.abs(thresholds)) closest_zero_p = precision[closest_zero] closest_zero_r = recall[closest_zero] plt.xlim([0.0, 1.01]) plt.ylim([0.0, 1.01]) plt.plot(precision, recall, label='Precision-Recall Curve') plt.plot(closest_zero_p, closest_zero_r, 'o', markersize = 12, fillstyle = 'none', c='r', mew=3) plt.xlabel('Precision', fontsize=16) plt.ylabel('Recall', fontsize=16) plt.savefig("fig8.png",bbox_inches='tight')   ## Accuracy: 0.99 ## Precision: 0.96 ## Recall: 0.89 ## F1: 0.92 In addition to plotting the Cost vs Iterations, I also plot the Precision-Recall curve to show how the Precision and Recall, which are complementary to each other vary with respect to the other. To know more about Precision-Recall, please check my post Practical Machine Learning with R and Python – Part 4. Check out my compact and minimal book “Practical Machine Learning with R and Python:Second edition- Machine Learning in stereo” available in Amazon in paperback($10.99) and kindle($7.99) versions. My book includes implementations of key ML algorithms and associated measures and metrics. The book is ideal for anybody who is familiar with the concepts and would like a quick reference to the different ML algorithms that can be applied to problems and how to select the best model. Pick your copy today!! A physical copy of the book is much better than scrolling down a webpage. Personally, I tend to use my own book quite frequently to refer to R, Python constructs, subsetting, machine Learning function calls and the necessary parameters etc. It is useless to commit any of this to memory, and a physical copy of a book is much easier to thumb through for the relevant code snippet. Pick up your copy today! ## 4b. Binary Classification using MNIST – R code In the R code below the same binary classification of the digit ‘5’ and the ‘not 5’ is performed. The code to read and display the MNIST data is taken from Brendan O’ Connor’s github link at MNIST source("mnist.R") load_mnist() #show_digit(train$x[2,]
layersDimensions=c(784, 7,7,3,1) # Works at 1500
x <- t(train$x) # Choose only 5000 training data x2 <- x[,1:5000] y <-train$y
# Set labels for all digits that are 'not 5' to 0
y[y!=5] <- 0
# Set labels of digit 5 as 1
y[y==5] <- 1
# Set the data
y1 <- as.matrix(y)
y2 <- t(y1)
# Choose the 1st 5000 data
y3 <- y2[,1:5000]

#Execute the Deep Learning Model
retvals = L_Layer_DeepModel(x2, y3, layersDimensions,
hiddenActivationFunc='tanh',
learningRate = 0.3,
numIterations = 3000, print_cost = True)
# Plot cost vs iteration
costs <- retvals[['costs']]
numIterations = 3000
iterations <- seq(0,numIterations,by=1000)
df <-data.frame(iterations,costs)
ggplot(df,aes(x=iterations,y=costs)) + geom_point() +geom_line(color="blue") +
xlab('No of iterations') + ylab('Cost') + ggtitle("Cost vs No of iterations")

# Compute probability scores
scores <- computeScores(retvals$parameters, x2,hiddenActivationFunc='relu') a=y3==1 b=y3==0 # Compute probabilities of class 0 and class 1 class1=scores[a] class0=scores[b] # Plot ROC curve pr <-pr.curve(scores.class0=class1, scores.class1=class0, curve=T) plot(pr) The AUC curve hugs the top left corner and hence the performance of the classifier is quite good. ## 4c. Binary Classification using MNIST – Octave code This code to load MNIST data was taken from Daniel E blog. Precision recall curves are available in Matlab but are yet to be implemented in Octave’s statistics package.  load('./mnist/mnist.txt.gz'); % load the dataset # Subset the 'not 5' digits a=(trainY != 5); # Subset '5' b=(trainY == 5); #make a copy of trainY #Set 'not 5' as 0 and '5' as 1 y=trainY; y(a)=0; y(b)=1; X=trainX(1:5000,:); Y=y(1:5000); # Set the dimensions of layer layersDimensions=[784, 7,7,3,1]; # Compute the DL network [weights biases costs]=L_Layer_DeepModel(X', Y', layersDimensions, hiddenActivationFunc='relu', learningRate = 0.1, numIterations = 5000);  # Conclusion It was quite a challenge coding a Deep Learning Network in Python, R and Octave. The Deep Learning network implementation, in this post,is the base Deep Learning network, without any of the regularization methods included. Here are some key learning that I got while playing with different multi-layer networks on different problems a. Deep Learning Networks come with many levers, the hyper-parameters, – learning rate – activation unit – number of hidden layers – number of units per hidden layer – number of iterations while performing gradient descent b. Deep Networks are very sensitive. A change in any of the hyper-parameter makes it perform very differently c. Initially I thought adding more hidden layers, or more units per hidden layer will make the DL network better at learning. On the contrary, there is a performance degradation after the optimal DL configuration d. At a sub-optimal number of hidden layers or number of hidden units, gradient descent seems to get stuck at a local minima e. There were occasions when the cost came down, only to increase slowly as the number of iterations were increased. Probably early stopping would have helped. f. I also did come across situations of ‘exploding/vanishing gradient’, cost went to Inf/-Inf. Here I would think inclusion of ‘momentum method’ would have helped I intend to add the additional hyper-parameters of L1, L2 regularization, momentum method, early stopping etc. into the code in my future posts. Feel free to fork/clone the code from Github Deep Learning – Part 3, and take the DL network apart and play around with it. I will be continuing this series with more hyper-parameters to handle vanishing and exploding gradients, early stopping and regularization in the weeks to come. I also intend to add some more activation functions to this basic Multi-Layer Network. Hang around, there are more exciting things to come. Watch this space! To see all posts see Index of posts # R vs Python: Different similarities and similar differences A debate about which language is better suited for Datascience, R or Python, can set off diehard fans of these languages into a tizzy. This post tries to look at some of the different similarities and similar differences between these languages. To a large extent the ease or difficulty in learning R or Python is subjective. I have heard that R has a steeper learning curve than Python and also vice versa. This probably depends on the degree of familiarity with the languuge To a large extent both R an Python do the same thing in just slightly different ways and syntaxes. The ease or the difficulty in the R/Python construct’s largely is in the ‘eyes of the beholder’ nay, programmer’ we could say. I include my own experience with the languages below. Check out my compact and minimal book “Practical Machine Learning with R and Python:Third edition- Machine Learning in stereo” available in Amazon in paperback($12.99) and kindle($8.99) versions. My book includes implementations of key ML algorithms and associated measures and metrics. The book is ideal for anybody who is familiar with the concepts and would like a quick reference to the different ML algorithms that can be applied to problems and how to select the best model. Pick your copy today!! ### 1. R data types R has the following data types 1. Character 2. Integer 3. Numeric 4. Logical 5. Complex 6. Raw Python has several data types 1. Int 2. float 3. Long 4. Complex and so on ### 2. R Vector vs Python List A common data type in R is the vector. Python has a similar data type, the list # R vectors a<-c(4,5,1,3,4,5) print(a[3]) ## [1] 1 print(a[3:4]) # R does not always need the explicit print.  ## [1] 1 3 #R type of variable print(class(a)) ## [1] "numeric" # Length of a print(length(a)) ## [1] 6 # Python lists a=[4,5,1,3,4,5] # print(a[2]) # Some python IDEs require the explicit print print(a[2:5]) print(type(a)) # Length of a print(len(a)) ## 1 ## [1, 3, 4] ## ## 6 ### 2a. Other data types – Python Python also has certain other data types like the tuple, dictionary etc as shown below. R does not have as many of the data types, nevertheless we can do everything that Python does in R # Python tuple b = (4,5,7,8) print(b) #Python dictionary c={'name':'Ganesh','age':54,'Work':'Professional'} print(c) #Print type of variable c  ## (4, 5, 7, 8) ## {'name': 'Ganesh', 'age': 54, 'Work': 'Professional'} ### 2.Type of Variable To know the type of the variable in R we use ‘class’, In Python the corresponding command is ‘type’ #R - Type of variable a<-c(4,5,1,3,4,5) print(class(a)) ## [1] "numeric" #Python - Print type of tuple a a=[4,5,1,3,4,5] print(type(a)) b=(4,3,"the",2) print(type(b)) ## ##  ### 3. Length To know length in R, use length() #R - Length of vector # Length of a a<-c(4,5,1,3,4,5) print(length(a)) ## [1] 6 To know the length of a list,tuple or dict we can use len() # Python - Length of list , tuple etc # Length of a a=[4,5,1,3,4,5] print(len(a)) # Length of b b = (4,5,7,8) print(len(b))  ## 6 ## 4 ### 4. Accessing help To access help in R we use the ‘?’ or the ‘help’ function #R - Help - To be done in R console or RStudio #?sapply #help(sapply) Help in python on any topic involves #Python help - This can be done on a (I)Python console #help(len) #?len ### 5. Subsetting The key difference between R and Python with regards to subsetting is that in R the index starts at 1. In Python it starts at 0, much like C,C++ or Java To subset a vector in R we use #R - Subset a<-c(4,5,1,3,4,8,12,18,1) print(a[3]) ## [1] 1 # To print a range or a slice. Print from the 3rd to the 5th element print(a[3:6]) ## [1] 1 3 4 8 Python also uses indices. The difference in Python is that the index starts from 0/ #Python - Subset a=[4,5,1,3,4,8,12,18,1] # Print the 4th element (starts from 0) print(a[3]) # Print a slice from 4 to 6th element print(a[3:6]) ## 3 ## [3, 4, 8] ### 6. Operations on vectors in R and operation on lists in Python In R we can do many operations on vectors for e.g. element by element addition, subtraction, exponentation,product etc. as show #R - Operations on vectors a<- c(5,2,3,1,7) b<- c(1,5,4,6,8) #Element wise Addition print(a+b) ## [1] 6 7 7 7 15 #Element wise subtraction print(a-b) ## [1] 4 -3 -1 -5 -1 #Element wise product print(a*b) ## [1] 5 10 12 6 56 # Exponentiating the elements of a vector print(a^2) ## [1] 25 4 9 1 49 In Python to do this on lists we need to use the ‘map’ and the ‘lambda’ function as follows # Python - Operations on list a =[5,2,3,1,7] b =[1,5,4,6,8] #Element wise addition with map & lambda print(list(map(lambda x,y: x+y,a,b))) #Element wise subtraction print(list(map(lambda x,y: x-y,a,b))) #Element wise product print(list(map(lambda x,y: x*y,a,b))) # Exponentiating the elements of a list print(list(map(lambda x: x**2,a)))  ## [6, 7, 7, 7, 15] ## [4, -3, -1, -5, -1] ## [5, 10, 12, 6, 56] ## [25, 4, 9, 1, 49] However if we create ndarrays from lists then we can do the element wise addition,subtraction,product, etc. like R. Numpy is really a powerful module with many, many functions for matrix manipulations import numpy as np a =[5,2,3,1,7] b =[1,5,4,6,8] a=np.array(a) b=np.array(b) #Element wise addition print(a+b) #Element wise subtraction print(a-b) #Element wise product print(a*b) # Exponentiating the elements of a list print(a**2)  ## [ 6 7 7 7 15] ## [ 4 -3 -1 -5 -1] ## [ 5 10 12 6 56] ## [25 4 9 1 49] ### 7. Getting the index of element To determine the index of an element which satisifies a specific logical condition in R use ‘which’. In the code below the index of element which is equal to 1 is 4 # R - Which a<- c(5,2,3,1,7) print(which(a == 1)) ## [1] 4 In Python array we can use np.where to get the same effect. The index will be 3 as the index starts from 0 # Python - np.where import numpy as np a =[5,2,3,1,7] a=np.array(a) print(np.where(a==1)) ## (array([3], dtype=int64),) ### 8. Data frames R, by default comes with a set of in-built datasets. There are some datasets which come with the SkiKit- Learn package # R # To check built datasets use #data() - In R console or in R Studio #iris - Don't print to console We can use the in-built data sets that come with Scikit package #Python import sklearn as sklearn import pandas as pd from sklearn import datasets # This creates a Sklearn bunch data = datasets.load_iris() # Convert to Pandas dataframe iris = pd.DataFrame(data.data, columns=data.feature_names) ### 9. Working with dataframes With R you can work with dataframes directly. For more complex dataframe operations in R there are convenient packages like dplyr, reshape2 etc. For Python we need to use the Pandas package. Pandas is quite comprehensive in the list of things we can do with data frames The most common operations on a dataframe are • Check the size of the dataframe • Take a look at the top 5 or bottom 5 rows of dataframe • Check the content of the dataframe #### a.Size In R use dim() #R - Size dim(iris) ## [1] 150 5 For Python use .shape #Python - size import sklearn as sklearn import pandas as pd from sklearn import datasets data = datasets.load_iris() # Convert to Pandas dataframe iris = pd.DataFrame(data.data, columns=data.feature_names) iris.shape #### b. Top & bottom 5 rows of dataframe To know the top and bottom rows of a data frame we use head() & tail as shown below for R and Python #R head(iris,5) ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 3.0 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa ## 4 4.6 3.1 1.5 0.2 setosa ## 5 5.0 3.6 1.4 0.2 setosa tail(iris,5) ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 146 6.7 3.0 5.2 2.3 virginica ## 147 6.3 2.5 5.0 1.9 virginica ## 148 6.5 3.0 5.2 2.0 virginica ## 149 6.2 3.4 5.4 2.3 virginica ## 150 5.9 3.0 5.1 1.8 virginica #Python import sklearn as sklearn import pandas as pd from sklearn import datasets data = datasets.load_iris() # Convert to Pandas dataframe iris = pd.DataFrame(data.data, columns=data.feature_names) print(iris.head(5)) print(iris.tail(5)) ## sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) ## 0 5.1 3.5 1.4 0.2 ## 1 4.9 3.0 1.4 0.2 ## 2 4.7 3.2 1.3 0.2 ## 3 4.6 3.1 1.5 0.2 ## 4 5.0 3.6 1.4 0.2 ## sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) ## 145 6.7 3.0 5.2 2.3 ## 146 6.3 2.5 5.0 1.9 ## 147 6.5 3.0 5.2 2.0 ## 148 6.2 3.4 5.4 2.3 ## 149 5.9 3.0 5.1 1.8 #### c. Check the content of the dataframe #R summary(iris) ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100 ## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300 ## Median :5.800 Median :3.000 Median :4.350 Median :1.300 ## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199 ## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800 ## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500 ## Species ## setosa :50 ## versicolor:50 ## virginica :50 ## ## ##  str(iris) ## 'data.frame': 150 obs. of 5 variables: ##$ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ... ##$ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ... ##$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#Python
import sklearn as sklearn
import pandas as pd
from sklearn import datasets
# Convert to Pandas dataframe
iris = pd.DataFrame(data.data, columns=data.feature_names)
print(iris.info())
##
## RangeIndex: 150 entries, 0 to 149
## Data columns (total 4 columns):
## sepal length (cm)    150 non-null float64
## sepal width (cm)     150 non-null float64
## petal length (cm)    150 non-null float64
## petal width (cm)     150 non-null float64
## dtypes: float64(4)
## memory usage: 4.8 KB
## None

#### d. Check column names

#R
names(iris)
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"
## [5] "Species"
colnames(iris)
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"
## [5] "Species"
#Python
import sklearn as sklearn
import pandas as pd
from sklearn import datasets
# Convert to Pandas dataframe
iris = pd.DataFrame(data.data, columns=data.feature_names)
#Get column names
print(iris.columns)
## Index(['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)',
##        'petal width (cm)'],
##       dtype='object')

#### e. Rename columns

In R we can assign a vector to column names

#R
colnames(iris) <- c("lengthOfSepal","widthOfSepal","lengthOfPetal","widthOfPetal","Species")
colnames(iris)
## [1] "lengthOfSepal" "widthOfSepal"  "lengthOfPetal" "widthOfPetal"
## [5] "Species"

In Python we can assign a list to s.columns

#Python
import sklearn as sklearn
import pandas as pd
from sklearn import datasets
# Convert to Pandas dataframe
iris = pd.DataFrame(data.data, columns=data.feature_names)
iris.columns = ["lengthOfSepal","widthOfSepal","lengthOfPetal","widthOfPetal"]
print(iris.columns)
## Index(['lengthOfSepal', 'widthOfSepal', 'lengthOfPetal', 'widthOfPetal'], dtype='object')

### f.Details of dataframe

#Python
import sklearn as sklearn
import pandas as pd
from sklearn import datasets
# Convert to Pandas dataframe
iris = pd.DataFrame(data.data, columns=data.feature_names)
print(iris.info())
##
## RangeIndex: 150 entries, 0 to 149
## Data columns (total 4 columns):
## sepal length (cm)    150 non-null float64
## sepal width (cm)     150 non-null float64
## petal length (cm)    150 non-null float64
## petal width (cm)     150 non-null float64
## dtypes: float64(4)
## memory usage: 4.8 KB
## None

#### g. Subsetting dataframes

# R
#To subset a dataframe 'df' in R we use df[row,column] or df[row vector,column vector]
#df[row,column]
iris[3,4]
## [1] 0.2
#df[row vector, column vector]
iris[2:5,1:3]
##   lengthOfSepal widthOfSepal lengthOfPetal
## 2           4.9          3.0           1.4
## 3           4.7          3.2           1.3
## 4           4.6          3.1           1.5
## 5           5.0          3.6           1.4
#If we omit the row vector, then it implies all rows or if we omit the column vector
# then implies all columns for that row
iris[2:5,]
##   lengthOfSepal widthOfSepal lengthOfPetal widthOfPetal Species
## 2           4.9          3.0           1.4          0.2  setosa
## 3           4.7          3.2           1.3          0.2  setosa
## 4           4.6          3.1           1.5          0.2  setosa
## 5           5.0          3.6           1.4          0.2  setosa
# In R we can all specific columns by column names
iris$Sepal.Length[2:5] ## NULL #Python # To select an entire row we use .iloc. The index can be used with the ':'. If # .iloc[start row: end row]. If start row is omitted then it implies the beginning of # data frame, if end row is omitted then it implies all rows till end #Python import sklearn as sklearn import pandas as pd from sklearn import datasets data = datasets.load_iris() # Convert to Pandas dataframe iris = pd.DataFrame(data.data, columns=data.feature_names) print(iris.iloc[3]) print(iris[:5]) # In python we can select columns by column name as follows print(iris['sepal length (cm)'][2:6]) #If you want to select more than 2 columns then you must use the double '[[]]' since the # index is a list itself print(iris[['sepal length (cm)','sepal width (cm)']][4:7]) ## sepal length (cm) 4.6 ## sepal width (cm) 3.1 ## petal length (cm) 1.5 ## petal width (cm) 0.2 ## Name: 3, dtype: float64 ## sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) ## 0 5.1 3.5 1.4 0.2 ## 1 4.9 3.0 1.4 0.2 ## 2 4.7 3.2 1.3 0.2 ## 3 4.6 3.1 1.5 0.2 ## 4 5.0 3.6 1.4 0.2 ## 2 4.7 ## 3 4.6 ## 4 5.0 ## 5 5.4 ## Name: sepal length (cm), dtype: float64 ## sepal length (cm) sepal width (cm) ## 4 5.0 3.6 ## 5 5.4 3.9 ## 6 4.6 3.4 #### h. Computing Mean, Standard deviation #R #Mean mean(iris$lengthOfSepal)
## [1] 5.843333
#Standard deviation
sd(iris$widthOfSepal) ## [1] 0.4358663 #Python #Mean import sklearn as sklearn import pandas as pd from sklearn import datasets data = datasets.load_iris() # Convert to Pandas dataframe iris = pd.DataFrame(data.data, columns=data.feature_names) # Convert to Pandas dataframe print(iris['sepal length (cm)'].mean()) #Standard deviation print(iris['sepal width (cm)'].std()) ## 5.843333333333335 ## 0.4335943113621737 #### i. Boxplot Boxplot can be produced in R using baseplot #R boxplot(iris$lengthOfSepal)

Matplotlib is a popular package in Python for plots

#Python
import sklearn as sklearn
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
# Convert to Pandas dataframe
iris = pd.DataFrame(data.data, columns=data.feature_names)
img=plt.boxplot(iris['sepal length (cm)'])
plt.show(img)

#### j.Scatter plot

#R
plot(iris$widthOfSepal,iris$lengthOfSepal)

#Python
import matplotlib.pyplot as plt
import sklearn as sklearn
import pandas as pd
from sklearn import datasets
# Convert to Pandas dataframe
iris = pd.DataFrame(data.data, columns=data.feature_names)
img=plt.scatter(iris['sepal width (cm)'],iris['sepal length (cm)'])
#plt.show(img)

#### k. Read from csv file

#R
tendulkar= read.csv("tendulkar.csv",stringsAsFactors = FALSE,na.strings=c(NA,"-"))
#Dimensions of dataframe
dim(tendulkar)
## [1] 347  13
names(tendulkar)
##  [1] "X"          "Runs"       "Mins"       "BF"         "X4s"
##  [6] "X6s"        "SR"         "Pos"        "Dismissal"  "Inns"
## [11] "Opposition" "Ground"     "Start.Date"

Use pandas.read_csv() for Python

#Python
import pandas as pd
print(tendulkar.shape)
print(tendulkar.columns)
## (347, 13)
## Index(['Unnamed: 0', 'Runs', 'Mins', 'BF', '4s', '6s', 'SR', 'Pos',
##        'Dismissal', 'Inns', 'Opposition', 'Ground', 'Start Date'],
##       dtype='object')

#### l. Clean the dataframe in R and Python.

The following steps are done for R and Python
1.Remove rows with ‘DNB’
2.Remove rows with ‘TDNB’
3.Remove rows with absent
4.Remove the “*” indicating not out
5.Remove incomplete rows with NA for R or NaN in Python
6.Do a scatter plot

#R
# Remove rows with 'DNB'
a <- tendulkar$Runs != "DNB" tendulkar <- tendulkar[a,] dim(tendulkar) ## [1] 330 13 # Remove rows with 'TDNB' b <- tendulkar$Runs != "TDNB"
tendulkar <- tendulkar[b,]

# Remove rows with absent
c <- tendulkar$Runs != "absent" tendulkar <- tendulkar[c,] dim(tendulkar) ## [1] 329 13 # Remove the "* indicating not out tendulkar$Runs <- as.numeric(gsub("\\*","",tendulkar$Runs)) dim(tendulkar) ## [1] 329 13 # Select only complete rows - complete.cases() c <- complete.cases(tendulkar) #Subset the rows which are complete tendulkar <- tendulkar[c,] dim(tendulkar) ## [1] 327 13 # Do some base plotting - Scatter plot plot(tendulkar$BF,tendulkar\$Runs)

#Python
import pandas as pd
import matplotlib.pyplot as plt
print(tendulkar.shape)
# Remove rows with 'DNB'
a=tendulkar.Runs !="DNB"
tendulkar=tendulkar[a]
print(tendulkar.shape)
# Remove rows with 'TDNB'
b=tendulkar.Runs !="TDNB"
tendulkar=tendulkar[b]
print(tendulkar.shape)
# Remove rows with absent
c= tendulkar.Runs != "absent"
tendulkar=tendulkar[c]
print(tendulkar.shape)
# Remove the "* indicating not out
tendulkar.Runs= tendulkar.Runs.str.replace(r"[*]","")
#Select only complete rows - dropna()
tendulkar=tendulkar.dropna()
print(tendulkar.shape)
tendulkar.Runs = tendulkar.Runs.astype(int)
tendulkar.BF = tendulkar.BF.astype(int)
#Scatter plot
plt.scatter(tendulkar.BF,tendulkar.Runs)
## (347, 13)
## (330, 13)
## (329, 13)
## (329, 13)
## (327, 13)

#### m.Chaining operations on dataframes

To chain a set of operations we need to use an R package like dplyr. Pandas does this The following operations are done on tendulkar data frame by dplyr for R and Pandas for Python below

1. Group by ground
2. Compute average runs in each ground
3. Arrange in descending order
#R
library(dplyr)
tendulkar1 <- tendulkar %>% group_by(Ground) %>% summarise(meanRuns= mean(Runs)) %>%
arrange(desc(meanRuns))
head(tendulkar1,10)
## # A tibble: 10 × 2
##           Ground  meanRuns
##
## 1         Multan 194.00000
## 2          Leeds 193.00000
## 3  Colombo (RPS) 143.00000
## 4        Lucknow 142.00000
## 5          Dhaka 132.75000
## 6     Manchester  93.50000
## 7         Sydney  87.22222
## 8   Bloemfontein  85.00000
## 9     Georgetown  81.00000
## 10 Colombo (SSC)  77.55556
#Python
import pandas as pd
print(tendulkar.shape)
# Remove rows with 'DNB'
a=tendulkar.Runs !="DNB"
tendulkar=tendulkar[a]
# Remove rows with 'TDNB'
b=tendulkar.Runs !="TDNB"
tendulkar=tendulkar[b]
# Remove rows with absent
c= tendulkar.Runs != "absent"
tendulkar=tendulkar[c]
# Remove the "* indicating not out
tendulkar.Runs= tendulkar.Runs.str.replace(r"[*]","")

#Select only complete rows - dropna()
tendulkar=tendulkar.dropna()
tendulkar.Runs = tendulkar.Runs.astype(int)
tendulkar.BF = tendulkar.BF.astype(int)
tendulkar1= tendulkar.groupby('Ground').mean()['Runs'].sort_values(ascending=False)
print(tendulkar1.head(10))
## (347, 13)
## Ground
## Multan           194.000000
## Leeds            193.000000
## Colombo (RPS)    143.000000
## Lucknow          142.000000
## Dhaka            132.750000
## Manchester        93.500000
## Sydney            87.222222
## Bloemfontein      85.000000
## Georgetown        81.000000
## Colombo (SSC)     77.555556
## Name: Runs, dtype: float64

### 9. Functions

product <- function(a,b){
c<- a*b
c
}
product(5,7)
## [1] 35
def product(a,b):
c = a*b
return c

print(product(5,7))

## 35



## Conclusion

Personally, I took to R, much like a ‘duck takes to water’. I found the R syntax very simple and mostly intuitive. R packages like dplyr, ggplot2, reshape2, make the language quite irrestible. R is weakly typed and has only numeric and character types as opposed to the full fledged data types in Python.

Python, has too many bells and whistles, which can be a little bewildering to the novice. It is possible that they may be useful as one becomes more experienced with the language. Also I found that installing Python packages sometimes gives errors with Python versions 2.7 or 3.6. This will leave you scrambling to google to find how to fix these problems. These can be quite frustrating. R on the other hand makes installing R packages a breeze.

Anyway, this is my current opinion, and like all opinions, may change in the course of time. Let’s see!

I may write a follow up post with more advanced features of R and Python. So do keep checking! Long live R! Viva la Python!

Note: This post was created using RStudio’s RMarkdown which allows you to embed R and Python code snippets. It works perfectly, except that matplotlib’s pyplot does not display.