# My book ‘Deep Learning from first principles: Second Edition’ now on Amazon

The second edition of my book ‘Deep Learning from first principles: Second Edition – In vectorized Python, R and Octave’ is now available on Amazon, in both paperback ($18.99) and kindle ($9.99/Rs449/-) versions. Since this book is almost 70% code, all functions and code snippets have been formatted in the fixed-width font ‘Lucida Console’. In addition, line numbers have been added to all code snippets. This makes the code more organized and much more readable. I have also fixed typos in the book.

The book includes the following chapters

Table of Contents
Preface
Introduction
1. Logistic Regression as a Neural Network
2. Implementing a simple Neural Network
3. Building a L-Layer Deep Learning Network
4. Deep Learning network with the Softmax
5. MNIST classification with Softmax
6. Initialization, regularization in Deep Learning
7. Gradient Descent Optimization techniques
8. Gradient Check in Deep Learning
Appendix A
Appendix 1 – Logistic Regression as a Neural Network
Appendix 2 – Implementing a simple Neural Network
Appendix 3 – Building a L-Layer Deep Learning Network
Appendix 4 – Deep Learning network with the Softmax
Appendix 5 – MNIST classification with Softmax
Appendix 6 – Initialization, regularization in Deep Learning
Appendix 7 – Gradient Descent Optimization techniques
Appendix 8 – Gradient Check
References

To see posts click Index of Posts

# My book ‘Practical Machine Learning in R and Python: Second edition’ on Amazon

Note: The 3rd edition of this book is now available: My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon

The second edition of my book ‘Practical Machine Learning with R and Python – Machine Learning in stereo’ is now available in both paperback ($12.99) and kindle ($9.99/Rs449) versions. This second edition includes more content, extensive comments and formatting for better readability.

In this book I implement some of the most common, but important Machine Learning algorithms in R and equivalent Python code.
1. Practical Machine Learning with R and Python: Third Edition – Machine Learning in Stereo (Paperback – $12.99)
2. Practical Machine Learning with R and Python: Third Edition – Machine Learning in Stereo (Kindle – $9.99/Rs449)

This book is ideal for both beginners and experts in R and/or Python. Those starting their journey into data science and ML will find the first 3 chapters useful, as they touch upon the most important programming constructs in R and Python and also deal with equivalent statements in R and Python. Those who are expert in either of the languages, R or Python, will find the equivalent code ideal for brushing up on the other language. And finally, those who are proficient in both languages can use the R and Python implementations to internalize the ML algorithms better.

Here is a look at the topics covered

Preface
Introduction
1. Essential R
2. Essential Python for Datascience
3. R vs Python
4. Regression of a continuous variable
5. Classification and Cross Validation
6. Regression techniques and regularization
7. SVMs, Decision Trees and Validation curves
8. Splines, GAMs, Random Forests and Boosting
9. PCA, K-Means and Hierarchical Clustering
References

Hope you have a great time learning as I did while implementing these algorithms!

# The 3rd paperback & kindle editions of my books on Cricket, now on Amazon

The 3rd paperback & kindle editions of both my books on cricket are now available on Amazon

a) Cricket analytics with cricketr, Third Edition. The paperback edition is $12.99 and the kindle edition is $4.99/Rs320. This book is based on my R package ‘cricketr’, available on CRAN, and uses data from ESPN Cricinfo Statsguru.

b) Beaten by sheer pace! Cricket analytics with yorkr, 3rd edition. The paperback is $12.99 and the kindle version is $6.99/Rs448. This is based on my R package ‘yorkr’ on CRAN and uses data from Cricsheet.

Note: In the 3rd edition of the paperback book, the charts are in black and white. If you would like the charts to be in color, please check out the 2nd edition of these books: More book, more cricket! 2nd edition of my books now on Amazon

To see all posts see Index of posts

# Practical Machine Learning with R and Python – Part 4

This is the 4th installment of my ‘Practical Machine Learning with R and Python’ series. In this part I discuss classification with Support Vector Machines (SVMs), using both a linear and a radial basis kernel, and Decision Trees. Further, a closer look is taken at some of the metrics associated with binary classification, namely accuracy versus precision and recall. I also touch upon Validation curves, Precision-Recall and ROC curves, and AUC, with equivalent code in R and Python.

This post is a continuation of my 3 earlier posts on Practical Machine Learning in R and Python
1. Practical Machine Learning with R and Python – Part 1
2. Practical Machine Learning with R and Python – Part 2
3. Practical Machine Learning with R and Python – Part 3

The RMarkdown file with the code and the associated data files can be downloaded from Github at MachineLearning-RandPython-Part4

1. Machine Learning in plain English-Part 1
2. Machine Learning in plain English-Part 2
3. Machine Learning in plain English-Part 3

Check out my compact and minimal book “Practical Machine Learning with R and Python: Third edition – Machine Learning in stereo”, available on Amazon in paperback ($12.99) and kindle ($8.99) versions. My book includes implementations of key ML algorithms and associated measures and metrics. The book is ideal for anybody who is familiar with the concepts and would like a quick reference to the different ML algorithms that can be applied to problems, and how to select the best model. Pick up your copy today!!

Support Vector Machines (SVMs) are another useful Machine Learning model that can be used for both regression and classification problems. SVMs used in classification compute the hyperplane that separates the 2 classes with the maximum margin. To do this, the features may be transformed into a larger, multi-dimensional feature space. SVMs can be used with different kernels, namely linear, polynomial or radial basis, to determine the best-fitting model for a given classification problem.
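As a rough sketch of how the kernel choice is expressed in code (illustrative Python only, using scikit-learn's SVC on the breast cancer dataset used later in this post; the kernels and default parameters here are not tuned):

```python
# Illustrative sketch: switching SVC kernels on the breast cancer data.
# The data is MinMax-scaled first, since SVMs are sensitive to feature scale.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = MinMaxScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Try each kernel and report test accuracy
for kernel in ['linear', 'poly', 'rbf']:
    clf = SVC(kernel=kernel).fit(X_train_s, y_train)
    print('{:6s} kernel: test accuracy {:.2f}'.format(kernel,
          clf.score(X_test_s, y_test)))
```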

In the 2nd part of this series, Practical Machine Learning with R and Python – Part 2, I had mentioned the various metrics that are used in classification ML problems, namely Accuracy, Precision, Recall and F1 score. Accuracy gives the fraction of data that was correctly classified as belonging to the +ve or -ve class. However, accuracy by itself is not a good enough measure, because it does not take into account the fraction of the data that was incorrectly classified. This issue becomes even more critical in certain domains. For example, a surgeon who would like to detect cancer would err on the side of caution, and classify even a possibly non-cancerous patient as possibly having cancer, rather than misclassify a malignancy as benign. Here we would like to increase recall, or sensitivity, which is given by Recall = TP/(TP+FN). In other words, we try to reduce misclassification by either increasing the true positives (TP) or reducing the false negatives (FN).

On the other hand, search algorithms would like to increase precision, which tries to reduce the number of irrelevant results in the search result: Precision = TP/(TP+FP). In other words, we do not want ‘false positives’, or irrelevant results, to appear in the search results, so there is a need to reduce the false positives.
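To make these definitions concrete, here is a minimal sketch computing the metrics from illustrative confusion-matrix counts (the TP/FP/FN/TN values below are just example numbers, treating class 1 as the positive class):

```python
# Metrics computed from illustrative confusion-matrix counts
TP, FP, FN, TN = 82, 3, 3, 54

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)   # fraction of predicted positives that are correct
recall    = TP / (TP + FN)   # a.k.a. sensitivity; fraction of actual positives found
f1        = 2 * precision * recall / (precision + recall)

print('Accuracy : {:.4f}'.format(accuracy))    # 0.9577
print('Precision: {:.4f}'.format(precision))   # 0.9647
print('Recall   : {:.4f}'.format(recall))      # 0.9647
print('F1 score : {:.4f}'.format(f1))          # 0.9647
```

Note that precision and recall move the FP and FN counts, respectively, into the denominator, which is exactly why improving one typically costs the other.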

When we try to increase ‘precision’, we do so at the cost of ‘recall’, and vice versa. I found this diagram and explanation in Wikipedia very useful (Source: Wikipedia):

“Consider a brain surgeon tasked with removing a cancerous tumor from a patient’s brain. The surgeon needs to remove all of the tumor cells since any remaining cancer cells will regenerate the tumor. Conversely, the surgeon must not remove healthy brain cells since that would leave the patient with impaired brain function. The surgeon may be more liberal in the area of the brain she removes to ensure she has extracted all the cancer cells. This decision increases recall but reduces precision. On the other hand, the surgeon may be more conservative in the brain she removes to ensure she extracts only cancer cells. This decision increases precision but reduces recall. That is to say, greater recall increases the chances of removing healthy cells (negative outcome) and increases the chances of removing all cancer cells (positive outcome). Greater precision decreases the chances of removing healthy cells (positive outcome) but also decreases the chances of removing all cancer cells (negative outcome).”

## 1.1a. Linear SVM – R code

In the R code below, I use an SVM with a linear kernel.

source('RFunctions-1.R')
library(dplyr)
library(e1071)
library(caret)
library(reshape2)
library(ggplot2)
# Read data. This is the breast cancer data from sklearn
cancer <- read.csv("cancer.csv")
cancer$target <- as.factor(cancer$target)

# Split into training and test sets
train_idx <- trainTestSplit(cancer,trainPercent=75,seed=5)
train <- cancer[train_idx, ]
test <- cancer[-train_idx, ]

# Fit a linear basis kernel. Do not scale the data
svmfit=svm(target~., data=train, kernel="linear",scale=FALSE)
ypred=predict(svmfit,test)
#Print a confusion matrix
confusionMatrix(ypred,test$target)
## Confusion Matrix and Statistics
##
##           Reference
## Prediction  0  1
##          0 54  3
##          1  3 82
##
##                Accuracy : 0.9577
##                  95% CI : (0.9103, 0.9843)
##     No Information Rate : 0.5986
##     P-Value [Acc > NIR] : <2e-16
##
##                   Kappa : 0.9121
##  Mcnemar's Test P-Value : 1
##
##             Sensitivity : 0.9474
##             Specificity : 0.9647
##          Pos Pred Value : 0.9474
##          Neg Pred Value : 0.9647
##              Prevalence : 0.4014
##          Detection Rate : 0.3803
##    Detection Prevalence : 0.4014
##       Balanced Accuracy : 0.9560
##
##        'Positive' Class : 0

## 1.1b Linear SVM – Python code

The code below creates an SVM with a linear basis kernel in Python and also dumps the corresponding classification metrics.

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification, make_blobs
from sklearn.metrics import confusion_matrix
from matplotlib.colors import ListedColormap
from sklearn.datasets import load_breast_cancer
# Load the cancer data
(X_cancer, y_cancer) = load_breast_cancer(return_X_y = True)
X_train, X_test, y_train, y_test = train_test_split(X_cancer, y_cancer,
                                                    random_state = 0)
clf = LinearSVC().fit(X_train, y_train)
print('Breast cancer dataset')
print('Accuracy of Linear SVC classifier on training set: {:.2f}'
      .format(clf.score(X_train, y_train)))
print('Accuracy of Linear SVC classifier on test set: {:.2f}'
      .format(clf.score(X_test, y_test)))
## Breast cancer dataset
## Accuracy of Linear SVC classifier on training set: 0.92
## Accuracy of Linear SVC classifier on test set: 0.94

## 1.2 Dummy classifier

Often when we perform classification tasks using any ML model, namely logistic regression, SVM, neural networks etc., it is very useful to determine how well the ML model performs against a dummy classifier. A dummy classifier uses some simple computation, like the frequency of the majority class, instead of fitting an ML model. It is essential that our ML model does much better than the dummy classifier. This problem is even more important with imbalanced classes, where we may have only about 10% of +ve samples. If an ML model we create has an accuracy of about 0.90, then it is evident that our classifier is not doing any better than a dummy classifier, which can just take a majority count of this imbalanced class and also come up with 0.90. We need to be able to do better than that. In the examples below (1.3a & 1.3b), it can be seen that SVMs with a ‘radial basis’ kernel on unnormalized data, in both R and Python, do not perform any better than the dummy classifier.

## 1.2a Dummy classifier – R code

R does not seem to have an explicit dummy classifier, so I created a simple one that predicts the majority class. Sklearn in Python also includes other strategies like uniform, stratified etc., but these should be possible to create in R as well.

# Create a simple dummy classifier that computes the ratio of the majority class to the total
DummyClassifierAccuracy <- function(train,test,type="majority"){
    if(type=="majority"){
        count <- sum(train$target==1)/dim(train)[1]
    }
    count
}

cancer$target <- as.factor(cancer$target)

# Create training and test sets
train_idx <- trainTestSplit(cancer,trainPercent=75,seed=5)
train <- cancer[train_idx, ]
test <- cancer[-train_idx, ]

#Dummy classifier majority class
acc=DummyClassifierAccuracy(train,test)
sprintf("Accuracy is %f",acc)
## [1] "Accuracy is 0.638498"

## 1.2b Dummy classifier – Python code

This dummy classifier uses the majority class.

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.metrics import confusion_matrix
(X_cancer, y_cancer) = load_breast_cancer(return_X_y = True)
X_train, X_test, y_train, y_test = train_test_split(X_cancer, y_cancer,
random_state = 0)

# Negative class (0) is most frequent
dummy_majority = DummyClassifier(strategy = 'most_frequent').fit(X_train, y_train)
y_dummy_predictions = dummy_majority.predict(X_test)

print('Dummy classifier accuracy on test set: {:.2f}'
.format(dummy_majority.score(X_test, y_test)))

## Dummy classifier accuracy on test set: 0.63
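As noted above, sklearn's DummyClassifier supports other baseline strategies besides ‘most_frequent’. A quick illustrative sketch comparing a few of them (the ‘stratified’ and ‘uniform’ strategies are randomized, so their scores vary from run to run):

```python
# Sketch: comparing DummyClassifier baseline strategies on the cancer data
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for strategy in ['most_frequent', 'stratified', 'uniform']:
    dummy = DummyClassifier(strategy=strategy).fit(X_train, y_train)
    results[strategy] = dummy.score(X_test, y_test)
    print('{:13s}: {:.2f}'.format(strategy, results[strategy]))
```

Any real classifier should clear whichever of these baselines is strongest for the class distribution at hand.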

## 1.3a – Radial SVM (un-normalized) – R code

SVMs perform better when the data is normalized or scaled. The 2 examples below show that an SVM with a radial basis kernel on unnormalized data does not perform any better than the dummy classifier.

library(dplyr)
library(e1071)
library(caret)
library(reshape2)
library(ggplot2)

train_idx <- trainTestSplit(cancer,trainPercent=75,seed=5)
train <- cancer[train_idx, ]
test <- cancer[-train_idx, ]
# Fit a radial basis kernel on the unnormalized data
svmfit=svm(target~., data=train, kernel="radial",scale=FALSE)
ypred=predict(svmfit,test)
confusionMatrix(ypred,test$target)
## Confusion Matrix and Statistics
##
##           Reference
## Prediction  0  1
##          0  0  0
##          1 57 85
##
##                Accuracy : 0.5986
##                  95% CI : (0.5131, 0.6799)
##     No Information Rate : 0.5986
##     P-Value [Acc > NIR] : 0.5363
##
##                   Kappa : 0
##  Mcnemar's Test P-Value : 1.195e-13
##
##             Sensitivity : 0.0000
##             Specificity : 1.0000
##          Pos Pred Value : NaN
##          Neg Pred Value : 0.5986
##              Prevalence : 0.4014
##          Detection Rate : 0.0000
##    Detection Prevalence : 0.0000
##       Balanced Accuracy : 0.5000
##
##        'Positive' Class : 0

## 1.3b – Radial SVM (un-normalized) – Python code

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
# Load the cancer data
(X_cancer, y_cancer) = load_breast_cancer(return_X_y = True)
X_train, X_test, y_train, y_test = train_test_split(X_cancer, y_cancer,
                                                    random_state = 0)
clf = SVC(C=10).fit(X_train, y_train)
print('Breast cancer dataset (unnormalized features)')
print('Accuracy of RBF-kernel SVC on training set: {:.2f}'
      .format(clf.score(X_train, y_train)))
print('Accuracy of RBF-kernel SVC on test set: {:.2f}'
      .format(clf.score(X_test, y_test)))
## Breast cancer dataset (unnormalized features)
## Accuracy of RBF-kernel SVC on training set: 1.00
## Accuracy of RBF-kernel SVC on test set: 0.63

## 1.5a – Radial SVM (normalized) – R code

The data is scaled (normalized) before using the SVM model.
The SVM model has 2 parameters: a) C – a large C means less regularization, a small C means more regularization; b) gamma – a small gamma gives a larger, smoother decision boundary with more misclassification, while a larger gamma gives a tighter decision boundary.

The R code below computes the accuracy as the regularization parameter C is changed.

trainingAccuracy <- NULL
testAccuracy <- NULL
C1 <- c(.01,.1, 1, 10, 20)
for(i in C1){
    svmfit=svm(target~., data=train, kernel="radial",cost=i,scale=TRUE)
    ypredTrain <-predict(svmfit,train)
    ypredTest=predict(svmfit,test)
    a <-confusionMatrix(ypredTrain,train$target)
    b <-confusionMatrix(ypredTest,test$target)
    trainingAccuracy <-c(trainingAccuracy,a$overall[1])
    testAccuracy <-c(testAccuracy,b$overall[1])
}
print(trainingAccuracy)
##  Accuracy  Accuracy  Accuracy  Accuracy  Accuracy
## 0.6384977 0.9671362 0.9906103 0.9976526 1.0000000
print(testAccuracy)
##  Accuracy  Accuracy  Accuracy  Accuracy  Accuracy
## 0.5985915 0.9507042 0.9647887 0.9507042 0.9507042
a <-rbind(C1,as.numeric(trainingAccuracy),as.numeric(testAccuracy))
b <- data.frame(t(a))
names(b) <- c("C1","trainingAccuracy","testAccuracy")
df <- melt(b,id="C1")
ggplot(df) + geom_line(aes(x=C1, y=value, colour=variable),size=2) +
    xlab("C (SVC regularization) value") + ylab("Accuracy") +
    ggtitle("Training and test accuracy vs C (regularization)")

## 1.5b – Radial SVM (normalized) – Python

The radial basis kernel is used on normalized data for a range of C values and the result is plotted.

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
# Load the cancer data
(X_cancer, y_cancer) = load_breast_cancer(return_X_y = True)
X_train, X_test, y_train, y_test = train_test_split(X_cancer, y_cancer,
                                                    random_state = 0)
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
print('Breast cancer dataset (normalized with MinMax scaling)')
trainingAccuracy=[]
testAccuracy=[]
for C1 in [.01,.1, 1, 10, 20]:
    clf = SVC(C=C1).fit(X_train_scaled, y_train)
    acctrain=clf.score(X_train_scaled, y_train)
    accTest=clf.score(X_test_scaled, y_test)
    trainingAccuracy.append(acctrain)
    testAccuracy.append(accTest)
# Create a dataframe
C1=[.01,.1, 1, 10, 20]
trainingAccuracy=pd.DataFrame(trainingAccuracy,index=C1)
testAccuracy=pd.DataFrame(testAccuracy,index=C1)
# Plot training and test accuracy as a function of C
df=pd.concat([trainingAccuracy,testAccuracy],axis=1)
df.columns=['trainingAccuracy','testAccuracy']
fig1=df.plot()
fig1=plt.title('Training and test accuracy vs C (SVC)')
fig1.figure.savefig('fig1.png', bbox_inches='tight')
## Breast cancer dataset (normalized with MinMax scaling)

Output image: fig1.png

## 1.6a Validation curve – R code

Sklearn includes a function for creating validation curves, varying a parameter (gamma or C) and computing and plotting the accuracy as it is changed. I did not find this in R, but I think it is a useful function, so I have created the R equivalent.

# The R equivalent of np.logspace
seqLogSpace <- function(start,stop,len){
    a=seq(log10(10^start),log10(10^stop),length=len)
    10^a
}
# Read the data. This is taken from the sklearn cancer data
cancer <- read.csv("cancer.csv")
cancer$target <- as.factor(cancer$target)
set.seed(6)
# Create the range of gamma in log space
param_range = seqLogSpace(-3,2,20)
# Initialize the overall training and test accuracy to NULL
overallTrainAccuracy <- NULL
overallTestAccuracy <- NULL
# Loop over the parameter range of gamma
for(i in param_range){
    # Set no of folds
    noFolds=5
    # Create the rows which fall into different folds from 1..noFolds
    folds = sample(1:noFolds, nrow(cancer), replace=TRUE)
    # Initialize the training and test accuracy of folds to 0
    trainingAccuracy <- 0
    testAccuracy <- 0
    # Loop through the folds
    for(j in 1:noFolds){
        # The training set is all rows for which the fold index != j (k-1 folds -> training)
        train <- cancer[folds!=j,]
        # The rows which have j as the index become the test set
        test <- cancer[folds==j,]
        # Create a SVM model for this gamma
        svmfit=svm(target~., data=train, kernel="radial",gamma=i,scale=TRUE)
        # Compute the training and test predictions
        ypredTrain <-predict(svmfit,train)
        ypredTest=predict(svmfit,test)
        # Create confusion matrices
        a <-confusionMatrix(ypredTrain,train$target)
        b <-confusionMatrix(ypredTest,test$target)
        # Add up the fold accuracy for training and test separately
        trainingAccuracy <-trainingAccuracy + a$overall[1]
        testAccuracy <-testAccuracy+b$overall[1]
    }
    # Compute the average accuracy over the K folds for this gamma
    overallTrainAccuracy=c(overallTrainAccuracy,trainingAccuracy/noFolds)
    overallTestAccuracy=c(overallTestAccuracy,testAccuracy/noFolds)
}
# Create a dataframe
a <- rbind(param_range,as.numeric(overallTrainAccuracy),
           as.numeric(overallTestAccuracy))
b <- data.frame(t(a))
names(b) <- c("gamma","trainingAccuracy","testAccuracy")
df <- melt(b,id="gamma")
# Plot with a log x axis
ggplot(df) + geom_line(aes(x=gamma, y=value, colour=variable),size=2) +
    xlab("gamma") + ylab("Accuracy") +
    ggtitle("Training and test accuracy vs gamma") +
    scale_x_log10()

## 1.6b Validation curve – Python

Compute and plot the validation curve as gamma is varied.

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.model_selection import validation_curve
# Load the cancer data
(X_cancer, y_cancer) = load_breast_cancer(return_X_y = True)
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_cancer)
# Create gamma values from 10^-3 to 10^2 with 20 equally spaced intervals
param_range = np.logspace(-3, 2, 20)
# Compute the validation curve
train_scores, test_scores = validation_curve(SVC(), X_scaled, y_cancer,
                                             param_name='gamma',
                                             param_range=param_range, cv=10)
# Plot the figure
fig2=plt.figure()
# Compute the mean and standard deviation
train_scores_mean = np.mean(train_scores, axis=1)
train_scores_std = np.std(train_scores, axis=1)
test_scores_mean = np.mean(test_scores, axis=1)
test_scores_std = np.std(test_scores, axis=1)
fig2=plt.title('Validation Curve with SVM')
fig2=plt.xlabel('$\gamma$(gamma)')
fig2=plt.ylabel('Score')
fig2=plt.ylim(0.0, 1.1)
lw = 2
fig2=plt.semilogx(param_range, train_scores_mean,
                  label='Training score', color='darkorange', lw=lw)
fig2=plt.fill_between(param_range, train_scores_mean - train_scores_std,
                      train_scores_mean + train_scores_std, alpha=0.2,
                      color='darkorange', lw=lw)
fig2=plt.semilogx(param_range, test_scores_mean,
                  label='Cross-validation score', color='navy', lw=lw)
fig2=plt.fill_between(param_range, test_scores_mean - test_scores_std,
                      test_scores_mean + test_scores_std, alpha=0.2,
                      color='navy', lw=lw)
fig2.figure.savefig('fig2.png', bbox_inches='tight')

Output image: fig2.png

## 1.7a Validation Curve (Preventing data leakage) – Python code

In the course Applied Machine Learning in Python, the Professor states that when we apply the same data transformation to the entire dataset, it causes data leakage:

“The proper way to do cross-validation when you need to scale the data is not to scale the entire dataset with a single transform, since this will indirectly leak information into the training data about the whole dataset, including the test data (see the lecture on data leakage later in the course). Instead, scaling/normalizing must be computed and applied for each cross-validation fold separately.”

So I apply separate scaling to the training and test folds and plot the result. In the lecture the Professor states that this can also be done using pipelines.
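The pipeline approach the Professor refers to can be sketched as follows (a minimal illustration, assuming the same breast cancer data and MinMax scaling as above; the C value is arbitrary). cross_val_score re-fits the whole pipeline, scaler included, on each training fold, so no information from the held-out fold leaks into the scaling:

```python
# Sketch: per-fold scaling via a Pipeline, avoiding data leakage
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# The scaler is fitted inside each CV fold, on that fold's training data only
pipe = Pipeline([('scale', MinMaxScaler()), ('svc', SVC(C=10))])
scores = cross_val_score(pipe, X, y, cv=5)
print('Mean CV accuracy: {:.2f}'.format(scores.mean()))
```

This is equivalent in spirit to the manual KFold loop below, just more compact and harder to get wrong.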
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.cross_validation import KFold
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
# Read the data
(X_cancer, y_cancer) = load_breast_cancer(return_X_y = True)
# Set the parameter range
param_range = np.logspace(-3, 2, 20)
# Set the number of folds
folds=5
# Initialize
overallTrainAccuracy=[]
overallTestAccuracy=[]
# Loop over the parameter range
for c in param_range:
    trainingAccuracy=0
    testAccuracy=0
    kf = KFold(len(X_cancer),n_folds=folds)
    # Partition into training and test folds
    for train_index, test_index in kf:
        # Partition the data according to the fold indices generated
        X_train, X_test = X_cancer[train_index], X_cancer[test_index]
        y_train, y_test = y_cancer[train_index], y_cancer[test_index]
        # Scale X_train and X_test separately
        scaler = MinMaxScaler()
        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)
        # Fit a SVC model for each C
        clf = SVC(C=c).fit(X_train_scaled, y_train)
        # Compute the training and test scores
        acctrain=clf.score(X_train_scaled, y_train)
        accTest=clf.score(X_test_scaled, y_test)
        trainingAccuracy += np.sum(acctrain)
        testAccuracy += np.sum(accTest)
    # Compute the mean training and test accuracy over the folds
    overallTrainAccuracy.append(trainingAccuracy/folds)
    overallTestAccuracy.append(testAccuracy/folds)

overallTrainAccuracy=pd.DataFrame(overallTrainAccuracy,index=param_range)
overallTestAccuracy=pd.DataFrame(overallTestAccuracy,index=param_range)
# Plot training and test accuracy as a function of C
df=pd.concat([overallTrainAccuracy,overallTestAccuracy],axis=1)
df.columns=['trainingAccuracy','testAccuracy']
fig3=plt.title('Validation Curve with SVM')
fig3=plt.xlabel('C (SVC regularization)')
fig3=plt.ylabel('Score')
fig3=plt.ylim(0.5, 1.1)
lw = 2
fig3=plt.semilogx(param_range, overallTrainAccuracy,
                  label='Training score', color='darkorange', lw=lw)
fig3=plt.semilogx(param_range, overallTestAccuracy,
                  label='Cross-validation score', color='navy', lw=lw)
fig3=plt.legend(loc='best')
fig3.figure.savefig('fig3.png', bbox_inches='tight')

Output image: fig3.png

## 1.8a Decision trees – R code

Decision trees in R can be plotted using the rpart package.

library(rpart)
library(rpart.plot)
rpart = NULL
# Create a decision tree
m <-rpart(Species~.,data=iris)
# Plot
rpart.plot(m,extra=2,main="Decision Tree - IRIS")

## 1.8b Decision trees – Python code

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from sklearn.model_selection import train_test_split
import graphviz
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                    random_state = 3)
clf = DecisionTreeClassifier().fit(X_train, y_train)
print('Accuracy of Decision Tree classifier on training set: {:.2f}'
      .format(clf.score(X_train, y_train)))
print('Accuracy of Decision Tree classifier on test set: {:.2f}'
      .format(clf.score(X_test, y_test)))
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
graph
## Accuracy of Decision Tree classifier on training set: 1.00
## Accuracy of Decision Tree classifier on test set: 0.97

## 1.9a Feature importance – R code

I found the following code, which had a snippet for feature importance. Sklearn has a nice method for this. For some reason the results in R and Python are different. Any thoughts?
set.seed(3)
# Load the libraries
library(mlbench)
library(caret)
# Load the dataset
cancer <- read.csv("cancer.csv")
cancer$target <- as.factor(cancer$target)
# Split into features and target
data <- cancer[,1:31]
target <- cancer[,32]
# Train the model
model <- train(data, target, method="rf", preProcess="scale",
               trControl=trainControl(method = "cv"))
# Compute variable importance
importance <- varImp(model)
# Summarize importance
print(importance)
# Plot importance
plot(importance)

## 1.9b Feature importance – Python code

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
# Read the data
cancer= load_breast_cancer()
(X_cancer, y_cancer) = load_breast_cancer(return_X_y = True)
X_train, X_test, y_train, y_test = train_test_split(X_cancer, y_cancer,
                                                    random_state = 0)
# Use the DecisionTreeClassifier
clf = DecisionTreeClassifier(max_depth = 4, min_samples_leaf = 8,
                             random_state = 0).fit(X_train, y_train)
c_features=len(cancer.feature_names)
print('Breast cancer dataset: decision tree')
print('Accuracy of DT classifier on training set: {:.2f}'
      .format(clf.score(X_train, y_train)))
print('Accuracy of DT classifier on test set: {:.2f}'
      .format(clf.score(X_test, y_test)))
# Plot the feature importances
fig4=plt.figure(figsize=(10,6),dpi=80)
fig4=plt.barh(range(c_features), clf.feature_importances_)
fig4=plt.xlabel("Feature importance")
fig4=plt.ylabel("Feature name")
fig4=plt.yticks(np.arange(c_features), cancer.feature_names)
fig4=plt.tight_layout()
plt.savefig('fig4.png', bbox_inches='tight')
## Breast cancer dataset: decision tree
## Accuracy of DT classifier on training set: 0.96
## Accuracy of DT classifier on test set: 0.94

Output image: fig4.png

## 1.10a Precision-Recall, ROC curves & AUC – R code

I tried several R packages for plotting the Precision-Recall and ROC curves; PRROC seems to work well. The Precision-Recall curves show the tradeoff between precision and recall: the higher the precision, the lower the recall, and vice versa. ROC curves that hug the top left corner indicate a high sensitivity and specificity, and an excellent accuracy.

source("RFunctions-1.R")
library(dplyr)
library(caret)
library(e1071)
library(PRROC)
# Read the data (this data is from sklearn!)
d <- read.csv("digits.csv")
digits <- d[2:66]
digits$X64 <- as.factor(digits$X64)
# Split into training and test sets
train_idx <- trainTestSplit(digits,trainPercent=75,seed=5)
train <- digits[train_idx, ]
test <- digits[-train_idx, ]
# Fit a SVM model with linear basis kernel with probabilities
svmfit=svm(X64~., data=train, kernel="linear",scale=FALSE,probability=TRUE)
ypred=predict(svmfit,test,probability=TRUE)
head(attr(ypred,"probabilities"))
##               0            1
## 6  7.395947e-01 2.604053e-01
## 8  9.999998e-01 1.842555e-07
## 12 1.655178e-05 9.999834e-01
## 13 9.649997e-01 3.500032e-02
## 15 9.994849e-01 5.150612e-04
## 16 9.999987e-01 1.280700e-06
# Store the probability of 0s and 1s
m0<-attr(ypred,"probabilities")[,1]
m1<-attr(ypred,"probabilities")[,2]
# Create a dataframe of scores
scores <- data.frame(m1,test$X64)

# scores.class0 takes the scores of the +ve class (in this case, digit 1) and scores.class1 those of the -ve class (digit 0)
#Compute Precision Recall
pr <- pr.curve(scores.class0=scores[scores$test.X64=="1",]$m1,
scores.class1=scores[scores$test.X64=="0",]$m1,
curve=T)

# Plot precision-recall curve
plot(pr)

#Plot the ROC curve
roc<-roc.curve(m0, m1,curve=TRUE)
plot(roc)

## 1.10b Precision-Recall, ROC curves & AUC- Python code

For Python, Logistic Regression is used to plot the Precision-Recall and ROC curves and compute the AUC.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import roc_curve, auc
from sklearn.datasets import load_digits
# Load the digits dataset
dataset = load_digits()
X, y = dataset.data, dataset.target
#Create 2 classes -i) Digit 1 (from digit 1) ii) Digit 0 (from all other digits)
# Make a copy of the target
z= y.copy()
# Replace all non 1's as 0
z[z != 1] = 0

X_train, X_test, y_train, y_test = train_test_split(X, z, random_state=0)
# Fit a LR model
lr = LogisticRegression().fit(X_train, y_train)

#Compute the decision scores
y_scores_lr = lr.fit(X_train, y_train).decision_function(X_test)
y_score_list = list(zip(y_test[0:20], y_scores_lr[0:20]))

#Show the decision_function scores for first 20 instances
y_score_list

precision, recall, thresholds = precision_recall_curve(y_test, y_scores_lr)
closest_zero = np.argmin(np.abs(thresholds))
closest_zero_p = precision[closest_zero]
closest_zero_r = recall[closest_zero]
#Plot
plt.figure()
plt.xlim([0.0, 1.01])
plt.ylim([0.0, 1.01])
plt.plot(precision, recall, label='Precision-Recall Curve')
plt.plot(closest_zero_p, closest_zero_r, 'o', markersize = 12, fillstyle = 'none', c='r', mew=3)
plt.xlabel('Precision', fontsize=16)
plt.ylabel('Recall', fontsize=16)
plt.gca().set_aspect('equal')
plt.savefig('fig5.png', bbox_inches='tight')

#Compute and plot the ROC
y_score_lr = lr.decision_function(X_test)
fpr_lr, tpr_lr, _ = roc_curve(y_test, y_score_lr)
roc_auc_lr = auc(fpr_lr, tpr_lr)

plt.figure()
plt.xlim([-0.01, 1.00])
plt.ylim([-0.01, 1.01])
plt.plot(fpr_lr, tpr_lr, lw=3, label='LogRegr ROC curve (area = {:0.2f})'.format(roc_auc_lr))
plt.xlabel('False Positive Rate', fontsize=16)
plt.ylabel('True Positive Rate', fontsize=16)
plt.title('ROC curve (1-of-10 digits classifier)', fontsize=16)
plt.legend(loc='lower right', fontsize=13)
plt.plot([0, 1], [0, 1], color='navy', lw=3, linestyle='--')
plt.savefig('fig6.png', bbox_inches='tight')
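The AUC computed above from (fpr, tpr) has a useful probabilistic reading: it equals the probability that a randomly chosen positive instance is scored above a randomly chosen negative one. A small numpy cross-check of that equivalence on synthetic scores (not the digits data):

```python
import numpy as np

def auc_rank(y_true, y_score):
    """AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen positive is scored above a randomly chosen negative (ties count
    half). This equals the area under the ROC curve."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])
print(auc_rank(y_true, y_score))  # → 0.75
```

With these toy scores one positive outranks both negatives and the other outranks one of two, giving 3 of 4 positive/negative pairs correctly ordered, i.e. AUC = 0.75.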



## 1.10c Precision-Recall, ROC curves & AUC – Python code (using probabilities)

In the code below, classification probabilities are used to compute and plot the precision-recall curve, the ROC curve and the AUC.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

# Load the digits dataset from sklearn (assumed source of `dataset`)
dataset = load_digits()
X, y = dataset.data, dataset.target
# Make a copy of the target
z= y.copy()
# Replace all non 1's as 0
z[z != 1] = 0

X_train, X_test, y_train, y_test = train_test_split(X, z, random_state=0)
svm = LinearSVC()
# Need to use CalibratedClassifierCV to predict probabilities for LinearSVC
clf = CalibratedClassifierCV(svm)
clf.fit(X_train, y_train)
y_proba_lr = clf.predict_proba(X_test)
from sklearn.metrics import precision_recall_curve

precision, recall, thresholds = precision_recall_curve(y_test, y_proba_lr[:,1])
closest_zero = np.argmin(np.abs(thresholds))
closest_zero_p = precision[closest_zero]
closest_zero_r = recall[closest_zero]
#plt.figure(figsize=(15,15),dpi=80)
plt.figure()
plt.xlim([0.0, 1.01])
plt.ylim([0.0, 1.01])
plt.plot(precision, recall, label='Precision-Recall Curve')
plt.plot(closest_zero_p, closest_zero_r, 'o', markersize = 12, fillstyle = 'none', c='r', mew=3)
plt.xlabel('Precision', fontsize=16)
plt.ylabel('Recall', fontsize=16)
plt.gca().set_aspect('equal')
plt.savefig('fig7.png', bbox_inches='tight')
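CalibratedClassifierCV wraps LinearSVC and, by default, calibrates its decision scores into probabilities with Platt scaling: a sigmoid fitted to held-out scores. A minimal sketch of that sigmoid step, with hypothetical coefficients `a` and `b` standing in for the values the calibrator would learn:

```python
import math

def platt(score, a=-1.5, b=0.0):
    """Map a raw decision score to class probabilities with a fitted sigmoid,
    P(1|s) = 1 / (1 + exp(a*s + b)). Here a and b are placeholders;
    CalibratedClassifierCV fits them on held-out folds."""
    p1 = 1.0 / (1.0 + math.exp(a * score + b))
    return (1.0 - p1, p1)  # (P(class 0), P(class 1))

for s in (-2.0, 0.0, 3.0):
    p0, p1 = platt(s)
    print(round(p0, 3), round(p1, 3))
```

The two probabilities always sum to 1, and (with a negative `a`) a higher decision score maps monotonically to a higher probability of class 1, which is exactly what `predict_proba` returns above.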

# More book, more cricket! 2nd edition of my books now on Amazon

a) Cricket analytics with cricketr
b) Beaten by sheer pace – Cricket analytics with yorkr
are now available on Amazon, in both paperback and Kindle versions.

The Kindle versions are just $4.99 for both books. Pick up your copies today!!!
A) Cricket analytics with cricketr: Second Edition
B) Beaten by sheer pace: Cricket analytics with yorkr (2nd edition)

# My 3 video presentations on “Essential R”

In this post I include my 3 video presentations on the topic “Essential R”. In these 3 presentations I cover the entire landscape of R:
• R Language – The essentials
• Key R Packages (dplyr, lubridate, ggplot2, etc.)
• How to create R Markdown and share reports
• A look at Shiny apps
• How to create a simple R package

You can download the relevant slide deck and practice code at Essential R.

Essential R – Part 1
This video covers the basic R data types – character, numeric, vectors, matrices, lists and data frames – and touches on how to subset these data types.

Essential R – Part 2
This video continues with how to subset dataframes (the most important data type) and some important packages. It also presents one of the most important jobs of a Data Scientist – cleaning and shaping data – using an example of an unclean data frame, and touches on key dplyr operations like select, filter, arrange, summarise and mutate. Other packages like lubridate and quantmod are also included, and the presentation shows how to use base plot and ggplot2.

Essential R – Part 3
This final session covers R Markdown and some of the key markdown elements, gives a brief overview of a simple Shiny app, and shows the key steps to create an R package.

These 3 R sessions cover most of the basic R topics that we tend to use in day-to-day R work. With this you should be able to hit the ground running! Hope you enjoy these video presentations and have an even greater time with R!
Check out my 2 books on cricket, a) Cricket analytics with cricketr, b) Beaten by sheer pace – Cricket analytics with yorkr, now available in both paperback & Kindle versions on Amazon!!! Pick up your copies today! To see all my posts click – Index of posts

# Analysis of IPL T20 matches with yorkr templates

## Introduction

In this post I create RMarkdown templates for end-to-end analysis of the IPL T20 matches available on Cricsheet, based on my R package yorkr. With these templates you can convert all IPL data, which is in yaml format, to R dataframes. Further, I create the data and the necessary templates for analyzing IPL matches, teams and players. All of these can be accessed at yorkrIPLTemplate.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen and bowlers, besides evaluating team & match performances in Tests, ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr and Beaten by sheer pace-Cricket analytics with yorkr. A must read for any cricket lover! Check it out!!

The templates are:
1. Template for conversion and setup – IPLT20Template.Rmd
2. Any IPL match – IPLMatchtemplate.Rmd
3. IPL matches between 2 teams – IPLMatches2TeamTemplate.Rmd
4. An IPL team’s performance against all other IPL teams – IPLAllMatchesAllOppnTemplate.Rmd
5. Analysis of IPL batsmen and bowlers of all IPL teams – IPLBatsmanBowlerTemplate.Rmd

Besides the templates, the repository also includes the converted data for all IPL matches I downloaded from Cricsheet in Dec 2016, so this data is complete till the 2016 IPL season. You can recreate the files as more matches are added to the Cricsheet site in IPL 2017 and future seasons. This post contains all the steps needed for detailed analysis of IPL matches, teams and players. It will also be my reference if I decide to analyze the IPL in future!

See my earlier posts where I analyze IPL T20:
1. yorkr crashes the IPL party! – Part 1
2. yorkr crashes the IPL party! – Part 2
3. yorkr crashes the IPL party! – Part 3
4. yorkr crashes the IPL party! – Part 4

There will be 5 folders at the root:
1. IPLdata – Match files as yaml from Cricsheet
2. IPLMatches – Yaml match files converted to dataframes
3. IPLMatchesBetween2Teams – All matches between any 2 IPL teams
4. allMatchesAllOpposition – An IPL team’s performance against all other teams
5. BattingBowlingDetails – Batting and bowling details of all IPL teams

library(yorkr)
library(dplyr)

The first few steps take care of the data setup. This needs to be done before any analysis of IPL batsmen, bowlers, any IPL match, matches between any 2 IPL teams, or a team’s performance against all other teams.

# The source YAML files will be in the IPLdata folder
# 1. Create directory of IPLMatches

Some files may give conversion errors.
You could try to debug the problem or just remove the file from the IPLdata folder. At most 2-4 files will have conversion problems, and I usually remove them from the files to be converted. Also take a look at my GooglyPlus shiny app, which was created after performing the same conversion on the Dec 2016 data.

convertAllYaml2RDataframesT20("data","IPLMatches")

### 2. Save all matches between all combinations of IPL teams

This function will create the set of all matches between each IPL team and every other IPL team. It uses the data that was created in IPLMatches with the convertAllYaml2RDataframesT20() function.

setwd("./IPLMatchesBetween2Teams")
saveAllMatchesBetween2IPLTeams("../IPLMatches")

### 3. Save all matches against all opposition

This will create a consolidated dataframe of all matches played by every IPL team against all other teams. This also uses the data that was created in IPLMatches with the convertAllYaml2RDataframesT20() function.

setwd("../allMatchesAllOpposition")
saveAllMatchesAllOppositionIPLT20("../IPLMatches")

### 4. Create batting and bowling details for each IPL team

These are the current IPL playing teams. You can add to this vector as newer teams start playing in the IPL. You can find all the IPL team names by looking at the directory created above, namely allMatchesAllOpposition. This also uses the data that was created in IPLMatches with the convertAllYaml2RDataframesT20() function.
setwd("../BattingBowlingDetails")
ipl_teams <- list("Chennai Super Kings","Deccan Chargers","Delhi Daredevils","Kings XI Punjab",
                  "Kochi Tuskers Kerala","Kolkata Knight Riders","Mumbai Indians","Pune Warriors",
                  "Rajasthan Royals","Royal Challengers Bangalore","Sunrisers Hyderabad","Gujarat Lions",
                  "Rising Pune Supergiants")
for(i in seq_along(ipl_teams)){
    print(ipl_teams[i])
    val <- paste(ipl_teams[i],"-details",sep="")
    val <- getTeamBattingDetails(ipl_teams[i],dir="../IPLMatches", save=TRUE)
}
for(i in seq_along(ipl_teams)){
    print(ipl_teams[i])
    val <- paste(ipl_teams[i],"-details",sep="")
    val <- getTeamBowlingDetails(ipl_teams[i],dir="../IPLMatches", save=TRUE)
}

### 5. Get the list of batsmen for a particular IPL team

The following code is needed for analyzing individual IPL batsmen. In the IPL a player could have played for multiple IPL teams.

getBatsmen <- function(df){
    bmen <- df %>% distinct(batsman)
    bmen <- as.character(bmen$batsman)
batsmen <- sort(bmen)
}
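The helper above just takes the distinct batsman names and sorts them. The same distinct-and-sort idea in Python (an illustrative helper, assuming ball-by-ball rows as a list of dicts):

```python
def get_players(rows, key="batsman"):
    """Distinct, sorted player names from ball-by-ball rows (list of dicts)."""
    return sorted({row[key] for row in rows})

# Duplicate deliveries by the same batsman collapse to one entry
rows = [{"batsman": "MS Dhoni"}, {"batsman": "SK Raina"}, {"batsman": "MS Dhoni"}]
print(get_players(rows))  # → ['MS Dhoni', 'SK Raina']
```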
# Load each team's saved batting details (created by getTeamBattingDetails above)
load("Chennai Super Kings-BattingDetails.RData")
csk_details <- battingDetails
load("Deccan Chargers-BattingDetails.RData")
dc_details <- battingDetails
load("Delhi Daredevils-BattingDetails.RData")
dd_details <- battingDetails
load("Kings XI Punjab-BattingDetails.RData")
kxip_details <- battingDetails
load("Kochi Tuskers Kerala-BattingDetails.RData")
ktk_details <- battingDetails
load("Kolkata Knight Riders-BattingDetails.RData")
kkr_details <- battingDetails
load("Mumbai Indians-BattingDetails.RData")
mi_details <- battingDetails
load("Pune Warriors-BattingDetails.RData")
pw_details <- battingDetails
load("Rajasthan Royals-BattingDetails.RData")
rr_details <- battingDetails
load("Royal Challengers Bangalore-BattingDetails.RData")
rcb_details <- battingDetails
load("Sunrisers Hyderabad-BattingDetails.RData")
sh_details <- battingDetails
load("Gujarat Lions-BattingDetails.RData")
gl_details <- battingDetails
load("Rising Pune Supergiants-BattingDetails.RData")
rps_details <- battingDetails

#Get the batsmen for each IPL team
csk_batsmen <- getBatsmen(csk_details)
dc_batsmen <- getBatsmen(dc_details)
dd_batsmen <- getBatsmen(dd_details)
kxip_batsmen <- getBatsmen(kxip_details)
ktk_batsmen <- getBatsmen(ktk_details)
kkr_batsmen <- getBatsmen(kkr_details)
mi_batsmen <- getBatsmen(mi_details)
pw_batsmen <- getBatsmen(pw_details)
rr_batsmen <- getBatsmen(rr_details)
rcb_batsmen <- getBatsmen(rcb_details)
sh_batsmen <- getBatsmen(sh_details)
gl_batsmen <- getBatsmen(gl_details)
rps_batsmen <- getBatsmen(rps_details)

# Save the dataframes
save(csk_batsmen,file="csk.RData")
save(dc_batsmen, file="dc.RData")
save(dd_batsmen, file="dd.RData")
save(kxip_batsmen, file="kxip.RData")
save(ktk_batsmen, file="ktk.RData")
save(kkr_batsmen, file="kkr.RData")
save(mi_batsmen , file="mi.RData")
save(pw_batsmen, file="pw.RData")
save(rr_batsmen, file="rr.RData")
save(rcb_batsmen, file="rcb.RData")
save(sh_batsmen, file="sh.RData")
save(gl_batsmen, file="gl.RData")
save(rps_batsmen, file="rps.RData")

### 6. Get the list of bowlers for a particular IPL team

The method below gets the list of bowler names for any IPL team. The following code is needed for analyzing individual IPL bowlers. In the IPL a player could have played for multiple IPL teams.

getBowlers <- function(df){
bwlr <- df %>% distinct(bowler)
bwlr <- as.character(bwlr$bowler)
bowler <- sort(bwlr)
}

load("Chennai Super Kings-BowlingDetails.RData")
csk_details <- bowlingDetails
load("Deccan Chargers-BowlingDetails.RData")
dc_details <- bowlingDetails
load("Delhi Daredevils-BowlingDetails.RData")
dd_details <- bowlingDetails
load("Kings XI Punjab-BowlingDetails.RData")
kxip_details <- bowlingDetails
load("Kochi Tuskers Kerala-BowlingDetails.RData")
ktk_details <- bowlingDetails
load("Kolkata Knight Riders-BowlingDetails.RData")
kkr_details <- bowlingDetails
load("Mumbai Indians-BowlingDetails.RData")
mi_details <- bowlingDetails
load("Pune Warriors-BowlingDetails.RData")
pw_details <- bowlingDetails
load("Rajasthan Royals-BowlingDetails.RData")
rr_details <- bowlingDetails
load("Royal Challengers Bangalore-BowlingDetails.RData")
rcb_details <- bowlingDetails
load("Sunrisers Hyderabad-BowlingDetails.RData")
sh_details <- bowlingDetails
load("Gujarat Lions-BowlingDetails.RData")
gl_details <- bowlingDetails
load("Rising Pune Supergiants-BowlingDetails.RData")
rps_details <- bowlingDetails

# Get the bowlers for each team
csk_bowlers <- getBowlers(csk_details)
dc_bowlers <- getBowlers(dc_details)
dd_bowlers <- getBowlers(dd_details)
kxip_bowlers <- getBowlers(kxip_details)
ktk_bowlers <- getBowlers(ktk_details)
kkr_bowlers <- getBowlers(kkr_details)
mi_bowlers <- getBowlers(mi_details)
pw_bowlers <- getBowlers(pw_details)
rr_bowlers <- getBowlers(rr_details)
rcb_bowlers <- getBowlers(rcb_details)
sh_bowlers <- getBowlers(sh_details)
gl_bowlers <- getBowlers(gl_details)
rps_bowlers <- getBowlers(rps_details)

# Save the dataframes
save(csk_bowlers, file="csk1.RData")
save(dc_bowlers, file="dc1.RData")
save(dd_bowlers, file="dd1.RData")
save(kxip_bowlers, file="kxip1.RData")
save(ktk_bowlers, file="ktk1.RData")
save(kkr_bowlers, file="kkr1.RData")
save(mi_bowlers, file="mi1.RData")
save(pw_bowlers, file="pw1.RData")
save(rr_bowlers, file="rr1.RData")
save(rcb_bowlers, file="rcb1.RData")
save(sh_bowlers, file="sh1.RData")
save(gl_bowlers, file="gl1.RData")
save(rps_bowlers, file="rps1.RData")

### Now we are all set

### A) IPL T20 Match Analysis

### 1. IPL Match Analysis

Load any match data from the ./IPLMatches folder, for e.g. Chennai Super Kings-Deccan Chargers-2008-05-06.RData

setwd("./IPLMatches")
load("Chennai Super Kings-Deccan Chargers-2008-05-06.RData")
csk_dc <- overs

The steps are

load("IPLTeam1-IPLTeam2-Date.Rdata")
IPLTeam1_IPLTeam2 <- overs

All analysis for this match can be done now.

### 2. Scorecard

teamBattingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam1")
teamBattingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam2")

### 3. Batting Partnerships

teamBatsmenPartnershipMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
teamBatsmenPartnershipMatch(IPLTeam1_IPLTeam2,"IPLTeam2","IPLTeam1")

### 4. Batsmen vs Bowler Plot

teamBatsmenVsBowlersMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=TRUE)
teamBatsmenVsBowlersMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)

### 5. Team bowling scorecard

teamBowlingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam1")
teamBowlingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam2")

### 6. Team bowling wicket kind match

teamBowlingWicketKindMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <- teamBowlingWicketKindMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

### 7. Team Bowling Wicket Runs Match

teamBowlingWicketRunsMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <- teamBowlingWicketRunsMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

### 8. Team Bowling Wicket Match

m <- teamBowlingWicketMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m
teamBowlingWicketMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")

### 9. Team Bowler vs Batsmen

teamBowlersVsBatsmenMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <- teamBowlersVsBatsmenMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

### 10. Match Worm chart

matchWormGraph(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")

### B) IPL Matches between 2 IPL teams

### 1. IPL Match Analysis

Load any match data from the ./IPLMatches folder, for e.g. Chennai Super Kings-Deccan Chargers-2008-05-06.RData

setwd("./IPLMatches")
load("Chennai Super Kings-Deccan Chargers-2008-05-06.RData")
csk_dc <- overs

The steps are

load("IPLTeam1-IPLTeam2-Date.Rdata")
IPLTeam1_IPLTeam2 <- overs

All analysis for this match can be done now.

### 2. Scorecard

teamBattingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam1")
teamBattingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam2")

### 3. Batting Partnerships

teamBatsmenPartnershipMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
teamBatsmenPartnershipMatch(IPLTeam1_IPLTeam2,"IPLTeam2","IPLTeam1")

### 4. Batsmen vs Bowler Plot

teamBatsmenVsBowlersMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=TRUE)
teamBatsmenVsBowlersMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)

### 5. Team bowling scorecard

teamBowlingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam1")
teamBowlingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam2")

### 6. Team bowling wicket kind match

teamBowlingWicketKindMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <- teamBowlingWicketKindMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

### 7. Team Bowling Wicket Runs Match

teamBowlingWicketRunsMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <- teamBowlingWicketRunsMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

### 8. Team Bowling Wicket Match

m <- teamBowlingWicketMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m
teamBowlingWicketMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")

### 9. Team Bowler vs Batsmen

teamBowlersVsBatsmenMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <- teamBowlersVsBatsmenMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

### 10. Match Worm chart

matchWormGraph(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")

### C) IPL Matches for a team against all other teams

### 1. IPL Matches for a team against all other teams

Load the data for an IPL team against all other teams from ./allMatchesAllOpposition, for e.g. all matches of Kolkata Knight Riders

load("allMatchesAllOpposition-Kolkata Knight Riders.RData")
kkr_matches <- matches

IPLTeam="IPLTeam1"
allMatches <- paste("allMatchesAllOpposition-",IPLTeam,".RData",sep="")
load(allMatches)
IPLTeam1AllMatches <- matches

### 2. Team's batting scorecard all matches

m <- teamBattingScorecardAllOppnAllMatches(IPLTeam1AllMatches,theTeam="IPLTeam1")
m

### 3. Batting scorecard of opposing team

m <- teamBattingScorecardAllOppnAllMatches(matches=IPLTeam1AllMatches,theTeam="IPLTeam2")

### 4. Team batting partnerships

m <- teamBatsmenPartnershipAllOppnAllMatches(IPLTeam1AllMatches,theTeam="IPLTeam1")
m
m <- teamBatsmenPartnershipAllOppnAllMatches(IPLTeam1AllMatches,theTeam='IPLTeam1',report="detailed")
head(m,30)
m <- teamBatsmenPartnershipAllOppnAllMatches(IPLTeam1AllMatches,theTeam='IPLTeam1',report="summary")
m

### 5. Team batting partnerships plot

teamBatsmenPartnershipAllOppnAllMatchesPlot(IPLTeam1AllMatches,"IPLTeam1",main="IPLTeam1")
teamBatsmenPartnershipAllOppnAllMatchesPlot(IPLTeam1AllMatches,"IPLTeam1",main="IPLTeam2")

### 6. Team batsmen vs bowlers report

m <- teamBatsmenVsBowlersAllOppnAllMatchesRept(IPLTeam1AllMatches,"IPLTeam1",rank=0)
m
m <- teamBatsmenVsBowlersAllOppnAllMatchesRept(IPLTeam1AllMatches,"IPLTeam1",rank=1,dispRows=30)
m
m <- teamBatsmenVsBowlersAllOppnAllMatchesRept(matches=IPLTeam1AllMatches,theTeam="IPLTeam2",rank=1,dispRows=25)
m

### 7. Team batsmen vs bowler plot

d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(IPLTeam1AllMatches,"IPLTeam1",rank=1,dispRows=50)
d
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(IPLTeam1AllMatches,"IPLTeam1",rank=2,dispRows=50)
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

### 8. Team bowling scorecard

teamBowlingScorecardAllOppnAllMatchesMain(matches=IPLTeam1AllMatches,theTeam="IPLTeam1")
teamBowlingScorecardAllOppnAllMatches(IPLTeam1AllMatches,'IPLTeam2')

### 9. Team bowler vs batsmen

teamBowlersVsBatsmenAllOppnAllMatchesMain(IPLTeam1AllMatches,theTeam="IPLTeam1",rank=0)
teamBowlersVsBatsmenAllOppnAllMatchesMain(IPLTeam1AllMatches,theTeam="IPLTeam1",rank=2)
teamBowlersVsBatsmenAllOppnAllMatchesRept(matches=IPLTeam1AllMatches,theTeam="IPLTeam1",rank=0)

### 10. Team bowler vs batsmen plot

df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(IPLTeam1AllMatches,theTeam="IPLTeam1",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"IPLTeam1","IPLTeam1")

### 11. Team bowler wicket kind

teamBowlingWicketKindAllOppnAllMatches(IPLTeam1AllMatches,t1="IPLTeam1",t2="All")
teamBowlingWicketKindAllOppnAllMatches(IPLTeam1AllMatches,t1="IPLTeam1",t2="IPLTeam2")

### 12. Team bowler wicket runs

teamBowlingWicketRunsAllOppnAllMatches(IPLTeam1AllMatches,t1="IPLTeam1",t2="All",plot=TRUE)
teamBowlingWicketRunsAllOppnAllMatches(IPLTeam1AllMatches,t1="IPLTeam1",t2="IPLTeam2",plot=TRUE)

### 1. IPL Batsman setup functions

Get the batsman's details for a batsman

setwd("../BattingBowlingDetails")
# IPL Team names
IPLTeamNames <- list("Chennai Super Kings","Deccan Chargers","Delhi Daredevils","Kings XI Punjab",
                     "Kochi Tuskers Kerala","Kolkata Knight Riders","Mumbai Indians","Pune Warriors",
                     "Rajasthan Royals","Royal Challengers Bangalore","Sunrisers Hyderabad","Gujarat Lions",
                     "Rising Pune Supergiants")

# Check and get the team indices of IPL teams in which the batsman has played
getTeamIndex <- function(batsman){
    setwd("./BattingBowlingDetails")
    load("csk.RData")
    load("dc.RData")
    load("dd.RData")
    load("kxip.RData")
    load("ktk.RData")
    load("kkr.RData")
    load("mi.RData")
    load("pw.RData")
    load("rr.RData")
    load("rcb.RData")
    load("sh.RData")
    load("gl.RData")
    load("rps.RData")
    setwd("..")
    getwd()
    print(ls())
    teams_batsmen = list(csk_batsmen,dc_batsmen,dd_batsmen,kxip_batsmen,ktk_batsmen,kkr_batsmen,mi_batsmen,
                         pw_batsmen,rr_batsmen,rcb_batsmen,sh_batsmen,gl_batsmen,rps_batsmen)
    b <- NULL
    for (i in 1:length(teams_batsmen)){
        a <- which(teams_batsmen[[i]] == batsman)
        if(length(a) != 0)
            b <- c(b,i)
    }
    b
}

# Get the list of the IPL team names from the indices passed
getTeams <- function(x){
    l <- NULL
    # Get the teams passed in as indexes
    for (i in seq_along(x)){
        l <- c(l, IPLTeamNames[[x[i]]])
    }
    l
}

# Create a consolidated data frame with all teams the IPL batsman has played for
getIPLBatsmanDF <- function(teamNames){
    batsmanDF <- NULL
    # Create a consolidated data frame of the batsman for all IPL teams played for
    for (i in seq_along(teamNames)){
        df <- getBatsmanDetails(team=teamNames[i],name=IPLBatsman,dir="./BattingBowlingDetails")
        batsmanDF <- rbind(batsmanDF,df)
    }
    batsmanDF
}

### 2. Create a consolidated IPL batsman data frame

# Since an IPL batsman could have played in multiple teams we need to determine these teams and
# create a consolidated data frame for the analysis
# For example to check MS Dhoni we need to do the following
IPLBatsman = "MS Dhoni"
# Check and get the team indices of IPL teams in which the batsman has played
i <- getTeamIndex(IPLBatsman)
# Get the team names in which the IPL batsman has played
teamNames <- getTeams(i)
# Check if file exists in the directory. This check is necessary when moving between matchType
# Create a consolidated IPL batsman dataframe for analysis
batsmanDF <- getIPLBatsmanDF(teamNames)

### 3. Runs vs deliveries

# For e.g. batsmanName="MS Dhoni"
# batsmanRunsVsDeliveries(batsmanDF, "MS Dhoni")
batsmanRunsVsDeliveries(batsmanDF,"batsmanName")

### 4. Batsman 4s & 6s

batsman46 <- select(batsmanDF,batsman,ballsPlayed,fours,sixes,runs)
p1 <- batsmanFoursSixes(batsman46,"batsmanName")

### 5. Batsman dismissals

batsmanDismissals(batsmanDF,"batsmanName")

### 6. Runs vs Strike rate

batsmanRunsVsStrikeRate(batsmanDF,"batsmanName")

### 7. Batsman Moving Average

batsmanMovingAverage(batsmanDF,"batsmanName")

### 8. Batsman cumulative average

batsmanCumulativeAverageRuns(batsmanDF,"batsmanName")

### 9. Batsman cumulative strike rate

batsmanCumulativeStrikeRate(batsmanDF,"batsmanName")

### 10. Batsman runs against oppositions

batsmanRunsAgainstOpposition(batsmanDF,"batsmanName")

### 11. Batsman runs vs venue

batsmanRunsVenue(batsmanDF,"batsmanName")

### 12. Batsman runs predict

batsmanRunsPredict(batsmanDF,"batsmanName")

### 13. Bowler setup functions

setwd("../BattingBowlingDetails")
# IPL Team names
IPLTeamNames <- list("Chennai Super Kings","Deccan Chargers","Delhi Daredevils","Kings XI Punjab",
                     "Kochi Tuskers Kerala","Kolkata Knight Riders","Mumbai Indians","Pune Warriors",
                     "Rajasthan Royals","Royal Challengers Bangalore","Sunrisers Hyderabad","Gujarat Lions",
                     "Rising Pune Supergiants")

# Get the team indices of IPL teams for which the bowler has played
getTeamIndex_bowler <- function(bowler){
    # Load IPL Bowlers
    setwd("./data")
    load("csk1.RData")
    load("dc1.RData")
    load("dd1.RData")
    load("kxip1.RData")
    load("ktk1.RData")
    load("kkr1.RData")
    load("mi1.RData")
    load("pw1.RData")
    load("rr1.RData")
    load("rcb1.RData")
    load("sh1.RData")
    load("gl1.RData")
    load("rps1.RData")
    setwd("..")
    teams_bowlers = list(csk_bowlers,dc_bowlers,dd_bowlers,kxip_bowlers,ktk_bowlers,kkr_bowlers,mi_bowlers,
                         pw_bowlers,rr_bowlers,rcb_bowlers,sh_bowlers,gl_bowlers,rps_bowlers)
    b <- NULL
    for (i in 1:length(teams_bowlers)){
        a <- which(teams_bowlers[[i]] == bowler)
        if(length(a) != 0){
            b <- c(b,i)
        }
    }
    b
}

# Get the list of the IPL team names from the indices passed
getTeams <- function(x){
    l <- NULL
    # Get the teams passed in as indexes
    for (i in seq_along(x)){
        l <- c(l, IPLTeamNames[[x[i]]])
    }
    l
}

# Get the team names
teamNames <- getTeams(i)

getIPLBowlerDF <- function(teamNames){
    bowlerDF <- NULL
    # Create a consolidated data frame of the bowler for all IPL teams played for
    for (i in seq_along(teamNames)){
        df <- getBowlerWicketDetails(team=teamNames[i],name=IPLBowler,dir="./BattingBowlingDetails")
        bowlerDF <- rbind(bowlerDF,df)
    }
    bowlerDF
}

### 14. Get the consolidated data frame for an IPL bowler

# Since an IPL bowler could have played in multiple teams we need to determine these teams and
# create a consolidated data frame for the analysis
# For example to check R Ashwin we need to do the following
IPLBowler = "R Ashwin"
# Check and get the team indices of IPL teams in which the bowler has played
i <- getTeamIndex_bowler(IPLBowler)
# Get the team names in which the IPL bowler has played
teamNames <- getTeams(i)
# Check if file exists in the directory. This check is necessary when moving between matchType
# Create a consolidated IPL bowler dataframe for analysis
bowlerDF <- getIPLBowlerDF(teamNames)

### 15. Bowler Mean Economy rate

# For e.g. to get the details of R Ashwin do
# bowlerMeanEconomyRate(bowlerDF,"R Ashwin")
bowlerMeanEconomyRate(bowlerDF,"bowlerName")

### 16. Bowler mean runs conceded

bowlerMeanRunsConceded(bowlerDF,"bowlerName")

### 17. Bowler Moving Average

bowlerMovingAverage(bowlerDF,"bowlerName")

### 18. Bowler cumulative average wickets

bowlerCumulativeAvgWickets(bowlerDF,"bowlerName")

### 19. Bowler cumulative Economy Rate (ER)

bowlerCumulativeAvgEconRate(bowlerDF,"bowlerName")

### 20. Bowler wicket plot

bowlerWicketPlot(bowlerDF,"bowlerName")

### 21. Bowler wickets against opposition

bowlerWicketsAgainstOpposition(bowlerDF,"bowlerName")

### 22. Bowler wickets at cricket grounds

bowlerWicketsVenue(bowlerDF,"bowlerName")

### 23. Predict number of deliveries to wickets

setwd("./IPLMatches")
bowlerDF1 <- getDeliveryWickets(team="IPLTeam1",dir=".",name="bowlerName",save=FALSE)
bowlerWktsPredict(bowlerDF1,"bowlerName")

# cricketr and yorkr books – Paperback now on Amazon

My books
– Cricket Analytics with cricketr
– Beaten by sheer pace!: Cricket analytics with yorkr
are now available on Amazon in both Paperback and Kindle versions.

The cricketr and yorkr packages are written in R, and both are available on CRAN. The books contain details on how to use these R packages to analyze the performance of cricketers. cricketr is based on data from ESPN Cricinfo Statsguru, and can analyze Test, ODI and T20 batsmen & bowlers. yorkr is based on data from Cricsheet, and can analyze ODI, T20 and IPL matches; it covers batsmen, bowlers, matches and teams.

Cricket Analytics with cricketr
You can access the paperback at Cricket analytics with cricketr

Beaten by sheer pace! Cricket Analytics with yorkr
You can buy the paperback from Amazon at Beaten by sheer pace: Cricket analytics with yorkr

Order your copy today! Hope you have a great time reading!

# Inswinger: yorkr swings into International T20s

In this post I introduce ‘Inswinger’, an interactive Shiny app to analyze International T20 players, matches and teams. This app was a natural consequence of my earlier Shiny app ‘GooglyPlus’. Most of the structure for this app remained the same; I only had to work with a different dataset, so to speak.

The Inswinger Shiny app is based on my R package ‘yorkr’, which is now available on CRAN. The R package, and hence this Shiny app, is based on data from Cricsheet. Inswinger uses the latest data dump from Cricsheet (Dec 2016) and includes all International T20s till then. There are a lot of new international teams like Oman, Hong Kong, UAE, etc. In total there are 22 different International T20 teams in my Inswinger app.
The countries are: a) Afghanistan, b) Australia, c) Bangladesh, d) Bermuda, e) Canada, f) England, g) Hong Kong, h) India, i) Ireland, j) Kenya, k) Nepal, l) Netherlands, m) New Zealand, n) Oman, o) Pakistan, p) Papua New Guinea, q) Scotland, r) South Africa, s) Sri Lanka, t) United Arab Emirates, u) West Indies, v) Zimbabwe.

My R package ‘yorkr’, on which both these Shiny apps are based, can output either a dataframe or a plot, depending on the parameter plot=TRUE or FALSE. Hence, in the Inswinger Shiny app, results can be displayed either as a table or a plot, depending on the choice of function.
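This plot=TRUE/FALSE convention is easy to mirror in any language: one function that either renders its result or returns it as data. A pure-Python sketch of the pattern (the `team_runs` function and its text "chart" are hypothetical; yorkr's own functions follow the same convention in R):

```python
def team_runs(rows, plot=True):
    """Aggregate runs per team; render a crude text chart when plot=True,
    otherwise return the aggregated data, mirroring yorkr's plot parameter."""
    totals = {}
    for row in rows:
        totals[row["team"]] = totals.get(row["team"], 0) + row["runs"]
    if plot:
        # stand-in for a real chart: one '#' per 10 runs
        for team, runs in sorted(totals.items()):
            print(f"{team:25s} {'#' * (runs // 10)}")
        return None
    return totals

rows = [{"team": "India", "runs": 160}, {"team": "Australia", "runs": 155},
        {"team": "India", "runs": 170}]
print(team_runs(rows, plot=False))  # → {'India': 330, 'Australia': 155}
```

Keeping one code path for aggregation and toggling only the presentation is what lets the Shiny app offer both table and plot views of every analysis.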

Inswinger can do detailed analyses of a) Individual T20 batsman b) Individual T20 bowler c) Any T20 match d) Head to head confrontation between 2 T20 teams e) All matches of a T20 team against all other teams.

The Shiny app can be accessed at Inswinger

The code for Inswinger is available at Github. Feel free to clone/download/fork  the code from Inswinger

Based on these 5 analysis domains, the app has 5 tabs.
A) T20 Batsman: This tab can be used to perform analysis of any T20 batsman. If a batsman has played for more than one team, the overall performance is considered. There are 10 functions for the T20 batsman, shown below
– Batsman Runs vs. Deliveries
– Batsman’s Fours & Sixes
– Dismissals of batsman
– Batsman’s Runs vs Strike Rate
– Batsman’s Moving Average
– Batsman’s Cumulative Average Run
– Batsman’s Cumulative Strike Rate
– Batsman’s Runs against Opposition
– Batsman’s Runs at Venue
– Predict Runs of batsman

B) T20 Bowler: This tab can be used to analyze individual T20 bowlers. The functions handle T20 bowlers who have played in more than 1 T20 team.
– Mean Economy Rate of bowler
– Mean runs conceded by bowler
– Bowler’s Moving Average
– Bowler’s Cumulative Avg. Wickets
– Bowler’s Cumulative Avg. Economy Rate
– Bowler’s Wicket Plot
– Bowler’s Wickets against opposition
– Bowler’s Wickets at Venues
– Bowler’s wickets prediction

C) T20 match: This tab can be used for analyzing individual T20 matches. The available functions are
– Match Batting Scorecard – Table
– Batting Partnerships – Plot, Table
– Batsmen vs Bowlers – Plot, Table
– Match Bowling Scorecard   – Table
– Bowling Wicket Kind – Plot, Table
– Bowling Wicket Runs – Plot, Table
– Bowling Wicket Match – Plot, Table
– Bowler vs Batsmen – Plot, Table
– Match Worm Graph – Plot

D) Head to head: This tab can be used for analyzing head-to-head confrontations, between any 2 T20 teams for e.g. all matches between India vs Australia or West Indies vs Sri Lanka . The available functions are
-Team Batsmen Batting Partnerships All Matches – Plot, Table {Summary and Detailed}
-Team Batting Scorecard All Matches – Table
-Team Batsmen vs Bowlers all Matches – Plot, Table
-Team Wickets Opposition All Matches – Plot, Table
-Team Bowling Scorecard All Matches – Table
-Team Bowler vs Batsmen All Matches – Plot, Table
-Team Bowlers Wicket Kind All Matches – Plot, Table
-Team Bowler Wicket Runs All Matches – Plot, Table
-Win Loss All Matches – Plot

E) T20 team’s overall performance: This tab can be used to analyze the overall performance of any T20 team. For this analysis, all matches played by the team are considered. The available functions are
-Team Batsmen Partnerships Overall – Plot, Table {Summary and Detailed}
-Team Batting Scorecard Overall – Table
-Team Batsmen vs Bowlers Overall – Plot, Table
-Team Bowler vs Batsmen Overall – Plot, Table
-Team Bowling Scorecard Overall – Table
-Team Bowler Wicket Kind Overall – Plot, Table
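
The 5-tab layout above maps naturally onto Shiny's `tabsetPanel`. The sketch below is a minimal, hypothetical skeleton of such a UI — the tab names follow the description above, but the widget names (`batsmanFunc`, `batsmanPlot`, etc.) are illustrative and the actual Inswinger source on Github is more elaborate:

```r
library(shiny)

# Hypothetical sketch of the 5-tab Inswinger layout.
# Each tabPanel would hold a function selector plus a plot/table output.
ui <- fluidPage(
  titlePanel("Inswinger"),
  tabsetPanel(
    tabPanel("T20 Batsman",
             selectInput("batsmanFunc", "Analysis",
                         c("Runs vs Deliveries", "Fours & Sixes",
                           "Dismissals", "Runs vs Strike Rate")),
             plotOutput("batsmanPlot")),
    tabPanel("T20 Bowler", plotOutput("bowlerPlot")),
    tabPanel("T20 match", plotOutput("matchPlot")),
    tabPanel("Head to head", plotOutput("h2hPlot")),
    tabPanel("Overall Performance", plotOutput("overallPlot"))
  )
)

server <- function(input, output) {
  # Each output would dispatch to the matching yorkr function,
  # keyed off the selected analysis and player/team inputs
}

# shinyApp(ui, server)  # uncomment to launch the app
```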

Below I include a random set of charts generated in each of the 5 tabs.
A. T20 Batsman
a) Shakib Al Hasan (Bangladesh) – Runs vs Deliveries

b) Virat Kohli (India) – Cumulative Average

c) AB de Villiers (South Africa) – Runs at venues

d) Glenn Maxwell (Australia) – Predict runs vs deliveries faced

B. T20 Bowler
a) TG Southee (New Zealand) – Mean Economy Rate vs overs

b) DJ Bravo – Moving Average of wickets

c) AC Evans (Scotland) – Bowler Wickets Against Opposition

C. T20 Match
a) Match Score (Afghanistan vs Canada, 2012-03-18)

b) Match batting partnerships (Plot) – Hong Kong vs Oman (2015-11-21), Hong Kong
Hong Kong Partnerships

c) Match batting partnerships (Table) – Ireland vs Scotland (2012-03-18, Ireland)
Batting partnerships can also be displayed as a table

d) Batsmen vs Bowlers (Plot) – India vs England (2012-12-22)

e) Match Worm Chart – Sri Lanka vs Pakistan (2015-08-01)

D. Head to head
a) Team Batsmen Partnership (Plot) – India vs Australia (all matches)
Virat Kohli has the highest total runs in partnerships against Australia

b) Team Batsmen Partnership (Summary – Table) – Kenya vs Bangladesh

c) Team Bowling Scorecard (Table only) – India vs South Africa (all matches)

d) Wins-Losses – New Zealand vs West Indies (all matches)

E. Overall performances
a) Batting Scorecard All Matches (Table only) – England’s overall batting performance
Eoin Morgan, Kevin Pietersen & SJ Taylor have the best performances

b) Batsman vs Bowlers all Matches (Plot)
India’s best performing batsman (Rank=1) is Virat Kohli

c) Batsman vs Bowlers all Matches (Table)
The plot above for Virat Kohli can also be displayed as a table. Kohli has scored the most runs against DJ Bravo, SR Watson & Shahid Afridi

The Inswinger Shiny app can be accessed at Inswinger. Give it a swing!

The code for Inswinger is available on Github. Feel free to clone/download/fork the code from Inswinger

Also see my other Shiny apps
1. GooglyPlus
2. What would Shakespeare say?
3. Sixer
4. Revisiting crimes against women in India

To see all posts take a look at Index of Posts

# GooglyPlus: yorkr analyzes IPL players, teams, matches with plots and tables

In this post I introduce my new Shiny app, “GooglyPlus”, a more evolved version of my earlier Shiny app “Googly”. My R package ‘yorkr’, on which both these Shiny apps are based, can output either a dataframe or a plot, depending on the parameter plot=TRUE or FALSE. My initial version of the app included only plots and did not exercise the yorkr package fully. Moreover, there may well be cricket aficionados who prefer numbers to charts. Hence I have created this enhanced version of the Googly app and renamed it GooglyPlus. GooglyPlus is based on the yorkr package, which uses data from Cricsheet. The app covers all IPL matches from 2008 up to 2016. Feel free to clone/fork or download the code from Github at GooglyPlus.
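
The plot=TRUE/FALSE switch that GooglyPlus exploits can be seen directly in a yorkr call. Below is a minimal sketch, assuming the match data has already been converted from Cricsheet yaml and saved with yorkr's conversion functions; the team names, date, directory and exact argument order are illustrative and may differ across yorkr versions:

```r
library(yorkr)

# Retrieve a previously saved match (assumes the RData file exists in ./data,
# created earlier with yorkr's convertYaml2RDataframe() from Cricsheet yaml)
match <- getMatchDetails("Chennai Super Kings", "Delhi Daredevils",
                         "2011-05-11", dir = "./data")

# plot = TRUE draws the batting-partnership chart for the match
teamBatsmenPartnershipMatch(match, "Chennai Super Kings",
                            "Delhi Daredevils", plot = TRUE)

# plot = FALSE returns the same information as a dataframe,
# which GooglyPlus renders as a table instead of a chart
df <- teamBatsmenPartnershipMatch(match, "Chennai Super Kings",
                                  "Delhi Daredevils", plot = FALSE)
head(df)
```

This single-parameter design is what allowed the plot-only Googly app to grow table views without touching the underlying package.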

If you are passionate about cricket and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analyses of the performances of both batsmen and bowlers, besides evaluating team and match performances in Tests, ODIs, T20s and the IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr and Beaten by sheer pace-Cricket analytics with yorkr. A must read for any cricket lover! Check it out!!

Click  GooglyPlus to access the Shiny app!

The changes in GooglyPlus over the earlier Googly app are only in the following 3 tab panels

• IPL match
• Head to head
• Overall Performance

The IPL batsman and IPL bowler tabs are unchanged; the charts in these tabs are as they were before.

The changes are only in the tabs i) IPL match, ii) Head to head and iii) Overall Performance. New functionality has been added, and existing functions now have the dual option of displaying either a plot or a table.

The changes are

A) IPL Match
The following additions/enhancements have been done

-Match Batting Scorecard – Table
-Batting Partnerships – Plot, Table (New)
-Batsmen vs Bowlers – Plot, Table(New)
-Match Bowling Scorecard – Table (New)
-Bowling Wicket Kind – Plot, Table (New)
-Bowling Wicket Runs – Plot, Table (New)
-Bowling Wicket Match – Plot, Table (New)
-Bowler vs Batsmen – Plot, Table (New)
-Match Worm Graph – Plot

B) Head to head
The following functions have been added/enhanced in this tab

-Team Batsmen Batting Partnerships All Matches – Plot, Table {Summary (New) and Detailed (New)}
-Team Batting Scorecard All Matches – Table (New)
-Team Batsmen vs Bowlers all Matches – Plot, Table (New)
-Team Wickets Opposition All Matches – Plot, Table (New)
-Team Bowling Scorecard All Matches – Table (New)
-Team Bowler vs Batsmen All Matches – Plot, Table (New)
-Team Bowlers Wicket Kind All Matches – Plot, Table (New)
-Team Bowler Wicket Runs All Matches – Plot, Table (New)
-Win Loss All Matches – Plot

C) Overall Performance
The following additions/enhancements have been done in this tab

-Team Batsmen Partnerships Overall – Plot, Table {Summary (New) and Detailed (New)}
-Team Batting Scorecard Overall – Table (New)
-Team Batsmen vs Bowlers Overall – Plot, Table (New)
-Team Bowler vs Batsmen Overall – Plot, Table (New)
-Team Bowling Scorecard Overall – Table (New)
-Team Bowler Wicket Kind Overall – Plot, Table (New)

Included below are some random charts and tables. Feel free to explore the Shiny app further

1) IPL Match
a) Match Batting Scorecard (Table only)
This is the batting scorecard for the Chennai Super Kings vs Deccan Chargers match of 2011-05-11

b)  Match batting partnerships (Plot)
Delhi Daredevils vs Kings XI Punjab – 2011-04-23

c) Match batting partnerships (Table)
The same batting partnerships, Delhi Daredevils vs Kings XI Punjab – 2011-04-23, displayed as a table

d) Batsmen vs Bowlers (Plot)
Kolkata Knight Riders vs Mumbai Indians 2010-04-19

e)  Match Bowling Scorecard (Table only)

2) Head to head
a) Team Batsmen Partnership (Plot)
Deccan Chargers vs Kolkata Knight Riders all matches

b)  Team Batsmen Partnership (Summary – Table)
In the following tables it can be seen that MS Dhoni has performed better than SK Raina in CSK vs DD matches, whereas SK Raina performs better than Dhoni in CSK vs KKR matches

i) Chennai Super Kings vs Delhi Daredevils (Summary – Table)

ii) Chennai Super Kings vs Kolkata Knight Riders (Summary – Table)

iii) Rising Pune Supergiants vs Gujarat Lions (Detailed – Table)
This table provides the detailed partnerships for RPS vs GL in all matches

c) Team Bowling Scorecard (Table only)
This table gives the bowling scorecard of Pune Warriors vs Deccan Chargers in all matches

3) Overall performances
a) Batting Scorecard All Matches  (Table only)

This is the batting scorecard of Royal Challengers Bangalore. The top 3 batsmen are V Kohli, C Gayle and AB de Villiers, in that order

b) Batsman vs Bowlers all Matches (Plot)
This gives the performance of the Mumbai Indians batsman with Rank=1, Rohit Sharma, against the bowlers of all other teams

c)  Batsman vs Bowlers all Matches (Table)
The above plot as a table. It can be seen that Rohit Sharma has scored the most runs against M Morkel, followed by Shakib Al Hasan and then UT Yadav.

d) Bowling scorecard (Table only)
The table below gives the bowling scorecard of CSK. R Ashwin leads with a tally of 98 wickets, followed by DJ Bravo with 88 wickets and JA Morkel with 83 wickets, in all matches against all teams

This is just a random selection of functions. Do play around with the app and check out how the different IPL batsmen, bowlers and teams stack up against each other. Do read my earlier post Googly: An interactive app for analyzing IPL players, matches and teams using R package yorkr for more details about the app and the other functions available.

Click GooglyPlus to access the Shiny app!