cricketr sizes up legendary All-rounders of yesteryear

Introduction

This is a post I have been wanting to write for several months, but had to put it off for one reason or another. In this post I use my R package cricketr to analyze the performance of All-rounder greats namely Kapil Dev, Ian Botham, Imran Khan and Richard Hadlee. All these players had talent that was natural and raw. They were good strikers of the ball and extremely lethal with their bowling. The ODI data for these players have been taken from ESPN Cricinfo.

Please be mindful of the ESPN Cricinfo Terms of Use

If you are passionate about cricket, and love analyzing cricket performances, then check out my racy book on cricket ‘Cricket analytics with cricketr and cricpy – Analytics harmony with R & Python’! This book discusses and shows how to use my R package ‘cricketr’ and my Python package ‘cricpy’ to analyze batsmen and bowlers in all formats of the game (Test, ODI and T20). The paperback is available on Amazon at $21.99 and  the kindle version at $9.99/Rs 449/-. A must read for any cricket lover! Check it out!!

Untitled

You can also read this post at Rpubs as cricketr-AR. Dowload this report as a PDF file from cricketr-AR

Important note 1: The latest release of ‘cricketr’ now includes the ability to analyze performances of teams now!!  See Cricketr adds team analytics to its repertoire!!!

Important note 2 : Cricketr can now do a more fine-grained analysis of players, see Cricketr learns new tricks : Performs fine-grained analysis of players

Important note 3: Do check out the python avatar of cricketr, ‘cricpy’ in my post ‘Introducing cricpy:A python package to analyze performances of cricketers

Note: If you would like to do a similar analysis for a different set of batsman and bowlers, you can clone/download my skeleton cricketr template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. Just a familiarity with R and R Markdown only is needed.

Important note: Do check out my other posts using cricketr at cricketr-posts

All Rounders

  1. Kapil Dev (Ind)
  2. Ian Botham (Eng)
  3. Imran Khan (Pak)
  4. Richard Hadlee (NZ)

I have sprinkled the plots with a few of my comments. Feel free to draw your conclusions! The analysis is included below

if (!require("cricketr")){ 
    install.packages("cricketr",) 
} 

library(cricketr)

The data for any particular ODI player can be obtained with the getPlayerDataOD() function. To do you will need to go to ESPN CricInfo Playerand type in the name of the player for e.g Kapil Dev, etc. This will bring up a page which have the profile number for the player e.g. for Kapil Dev this would be http://www.espncricinfo.com/india/content/player/30028.html. Hence, Kapils’s profile is 30028. This can be used to get the data for Kapil Dev’s data as shown below. I have already executed the below 4 commands and I will use the files to run further commands

#kapil1 
#botham11 
#imran1 
#hadlee1 

Analyses of batting performances of the All Rounders

The following plots gives the analysis of the 4 ODI batsmen

  1. Kapil Dev (Ind) – Innings – 225, Runs = 3783, Average=23.79, Strike Rate= 95.07
  2. Ian Botham (Eng) – Innings – 116, Runs= 2113, Average=23.21, Strike Rate= 79.10
  3. Imran Khan (Pak) – Innings – 175, Runs= 3709, Average=33.41, Strike Rate= 72.65
  4. Richard Hadlee (NZ) – Innings – 115, Runs= 1751, Average=21.61, Strike Rate= 75.50

Plot of 4s, 6s and the scoring rate in ODIs

The 3 charts below give the number of

  1. 4s vs Runs scored
  2. 6s vs Runs scored
  3. Balls faced vs Runs scored

A regression line is fitted in each of these plots for each of the ODI batsmen

A. Kapil Dev
It can be seen that Kapil scores four 4’s when he scores 50. Also after facing 50 deliveries he scores around 43

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./kapil1.csv","Kapil")
batsman6s("./kapil1.csv","Kapil")
batsmanScoringRateODTT("./kapil1.csv","Kapil")

kapil-4s6ssr-1

dev.off()
## null device 
##           1

B. Ian Botham
Botham scores around 39 runs after 50 deliveries

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./botham1.csv","Botham")
batsman6s("./botham1.csv","Botham")
batsmanScoringRateODTT("./botham1.csv","Botham")

botham-4s6sr-1

dev.off()
## null device 
##           1

C. Imran Khan
Imran scores around 36 runs for 50 deliveries

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./imran1.csv","Imran")
batsman6s("./imran1.csv","Imran")
batsmanScoringRateODTT("./imran1.csv","Imran")

imran-4s6ssr-1

dev.off()
## null device 
##           1

D. Richard Hadlee
Hadlee also scores around 30 runs facing 50 deliveries

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./hadlee1.csv","Hadlee")
batsman6s("./hadlee1.csv","Hadlee")
batsmanScoringRateODTT("./hadlee1.csv","Hadlee")

hadlee-4s6sout-1

dev.off()
## null device 
##           1

Cumulative Average runs of batsman in career

Kapils cumulative avrerage runs drops towards the last 15 innings wheres Botham had a good run towards the end of his career. Imran performance as a batsman really peaks towards the end with a cumulative average of almost 25 runs. Hadlee has a stead performance

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("./kapil1.csv","Kapil")

kbih-car-1

batsmanCumulativeAverageRuns("./botham1.csv","Botham")

kbih-car-2

batsmanCumulativeAverageRuns("./imran1.csv","Imran")

kbih-car-3

batsmanCumulativeAverageRuns("./hadlee1.csv","Hadlee")

kbih-car-4

dev.off()
## null device 
##           1

Cumulative Average strike rate of batsman in career

Kapil’s strike rate is superlative touching the 90’s steadily. Botham’s strike drops dramatically towards the latter part of his career. Imran average at a steady 75 and Hadlee averages around 85.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("./kapil1.csv","Kapil")

kbih-casr-1

batsmanCumulativeStrikeRate("./botham1.csv","Botham")

kbih-casr-2

batsmanCumulativeStrikeRate("./imran1.csv","Imran")

kbih-casr-3

batsmanCumulativeStrikeRate("./hadlee1.csv","Hadlee")

kbih-casr-4

dev.off()
## null device 
##           1

Relative Mean Strike Rate

Kapil tops the strike rate among all the all-rounders. This is really a revelation to me. This can also be seen in the original data in Kapil’s strike rate is at a whopping 95.07 in comparison to Botham, Inran and Hadlee who are at 79.1,72.65 and 75.50 respectively

par(mar=c(4,4,2,2))
frames <- list("./kapil1.csv","./botham1.csv","imran1.csv","hadlee1.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBatsmanSRODTT(frames,names)

plot-1-1

Relative Runs Frequency Percentage

This plot shows that Imran has a much better average runs scored over the other all rounders followed by Kapil

frames <- list("./kapil1.csv","./botham1.csv","imran1.csv","hadlee1.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeRunsFreqPerfODTT(frames,names)

plot-2-1

Relative cumulative average runs in career

It can be seen clearly that Imran Khan leads the pack in cumulative average runs followed by Kapil Dev and then Botham

frames <- list("./kapil1.csv","./botham1.csv","imran1.csv","hadlee1.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBatsmanCumulativeAvgRuns(frames,names)

kbih-relcar-1

Relative cumulative average strike rate in career

In the cumulative strike rate Hadlee and Kapil run a close race.

frames <- list("./kapil1.csv","./botham1.csv","imran1.csv","hadlee1.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBatsmanCumulativeStrikeRate(frames,names)

kbih-relcsr-1

Percent 4’s,6’s in total runs scored

The plot below shows the contrib

frames <- list("./kapil1.csv","./botham1.csv","imran1.csv","hadlee1.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
runs4s6s <-batsman4s6s(frames,names)

plot-46s-1

print(runs4s6s)
##                Kapil Botham Imran Hadlee
## Runs(1s,2s,3s) 72.08  66.53 77.53  73.27
## 4s             21.98  25.78 17.61  21.08
## 6s              5.94   7.68  4.86   5.65

Runs forecast

The forecast for the batsman is shown below.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./kapil1.csv","Kapil")
batsmanPerfForecast("./botham1.csv","Botham")
batsmanPerfForecast("./imran1.csv","Imran")
batsmanPerfForecast("./hadlee1.csv","Hadlee")

plot-fcst-1

dev.off()
## null device 
##           1

3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./kapil1.csv","Kapil")
battingPerf3d("./botham1.csv","Botham")

plot-3-1

dev.off()
## null device 
##           1
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./imran1.csv","Imran")
battingPerf3d("./hadlee1.csv","Hadlee")

plot-4-1

dev.off()
## null device 
##           1

Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

BF <- seq( 10, 200,length=10)
Mins <- seq(30,220,length=10)
newDF <- data.frame(BF,Mins)

kapil <- batsmanRunsPredict("./kapil1.csv","Kapil",newdataframe=newDF)
botham <- batsmanRunsPredict("./botham1.csv","Botham",newdataframe=newDF)
imran <- batsmanRunsPredict("./imran1.csv","Imran",newdataframe=newDF)
hadlee <- batsmanRunsPredict("./hadlee1.csv","Hadlee",newdataframe=newDF)

The fitted model is then used to predict the runs that the batsmen will score for a hypotheticial Balls faced and Minutes at crease. It can be seen that Kapil is the best bet for a balls faced and minutes at crease followed by Botham.

batsmen <-cbind(round(kapil$Runs),round(botham$Runs),round(imran$Runs),round(hadlee$Runs))
colnames(batsmen) <- c("Kapil","Botham","Imran","Hadlee")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Kapil Botham Imran Hadlee
## 1          10           30    16      6    10     15
## 2          31           51    33     22    22     28
## 3          52           72    49     38    33     42
## 4          73           93    65     54    45     56
## 5          94          114    81     70    56     70
## 6         116          136    97     86    67     84
## 7         137          157   113    102    79     97
## 8         158          178   130    117    90    111
## 9         179          199   146    133   102    125
## 10        200          220   162    149   113    139

Highest runs likelihood

The plots below the runs likelihood of batsman. This uses K-Means . A. Kapil Dev

batsmanRunsLikelihood("./kapil1.csv","Kapil")

kapil11-1

## Summary of  Kapil 's runs scoring likelihood
## **************************************************
## 
## There is a 34.57 % likelihood that Kapil  will make  22 Runs in  24 balls over 34  Minutes 
## There is a 17.28 % likelihood that Kapil  will make  46 Runs in  46 balls over  65  Minutes 
## There is a 48.15 % likelihood that Kapil  will make  5 Runs in  7 balls over 9  Minutes

B. Ian Botham

batsmanRunsLikelihood("./botham1.csv","Botham")

devilliers-1

## Summary of  Botham 's runs scoring likelihood
## **************************************************
## 
## There is a 47.95 % likelihood that Botham  will make  9 Runs in  12 balls over 15  Minutes 
## There is a 39.73 % likelihood that Botham  will make  23 Runs in  32 balls over  44  Minutes 
## There is a 12.33 % likelihood that Botham  will make  59 Runs in  74 balls over 101  Minutes

C. Imran Khan

batsmanRunsLikelihood("./imran1.csv","Imran")

gaylecache-true-1

## Summary of  Imran 's runs scoring likelihood
## **************************************************
## 
## There is a 23.33 % likelihood that Imran  will make  36 Runs in  54 balls over 74  Minutes 
## There is a 60 % likelihood that Imran  will make  14 Runs in  18 balls over  23  Minutes 
## There is a 16.67 % likelihood that Imran  will make  53 Runs in  90 balls over 115  Minutes

D. Richard Hadlee

batsmanRunsLikelihood("./hadlee1.csv","Hadlee")

maxwell-1

## Summary of  Hadlee 's runs scoring likelihood
## **************************************************
## 
## There is a 6.1 % likelihood that Hadlee  will make  64 Runs in  79 balls over 90  Minutes 
## There is a 42.68 % likelihood that Hadlee  will make  25 Runs in  33 balls over  44  Minutes 
## There is a 51.22 % likelihood that Hadlee  will make  9 Runs in  11 balls over 15  Minutes

Average runs at ground and against opposition

A. Kapil Dev

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./kapil1.csv","Kapil")
batsmanAvgRunsOpposition("./kapil1.csv","Kapil")

avgrg-1-1

dev.off()
## null device 
##           1

B. Ian Botham

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./botham1.csv","Botham")
batsmanAvgRunsOpposition("./botham1.csv","Botham")

avgrg-2-1

dev.off()
## null device 
##           1

C. Imran Khan

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./imran1.csv","Imran")
batsmanAvgRunsOpposition("./imran1.csv","Imran")

avgrg-3-1

dev.off()
## null device 
##           1

D. Richard Hadlee

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./hadlee1.csv","Hadlee")
batsmanAvgRunsOpposition("./hadlee1.csv","Hadlee")

avgrg-4-1

dev.off()
## null device 
##           1

Moving Average of runs over career

The moving average for the 4 batsmen indicate the following

Kapil’s performance drops significantly while there is a slump in Botham’s performance. On the other hand Imran and Hadlee’s performance were on the upswing.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./kapil1.csv","Kapil")
batsmanMovingAverage("./botham1.csv","Botham")
batsmanMovingAverage("./imran1.csv","Imran")
batsmanMovingAverage("./hadlee1.csv","Hadlee")

sdgm-ma-1

dev.off()
## null device 
##           1

Check batsmen in-form, out-of-form

[1] “**************************** Form status of Kapil ****************************\n\n
Population size: 72
Mean of population: 19.38 \n
Sample size: 9 Mean of sample: 6.78 SD of sample: 6.14 \n\n
Null hypothesis H0 : Kapil ‘s sample average is within 95% confidence interval of population average\n
Alternative hypothesis Ha : Kapil ‘s sample average is below the 95% confidence interval of population average\n\n
Kapil ‘s Form Status: Out-of-Form because the p value: 8.4e-05 is less than alpha= 0.05

“**************************** Form status of Botham ****************************\n\n
Population size: 65
Mean of population: 21.29 \n
Sample size: 8 Mean of sample: 15.38 SD of sample: 13.19 \n\n
Null hypothesis H0 : Botham ‘s sample average is within 95% confidence interval of population average\n
Alternative hypothesis Ha : Botham ‘s sample average is below the 95% confidence interval of population average\n\n
Botham ‘s Form Status: In-Form because the p value: 0.120342 is greater than alpha= 0.05 \n

“**************************** Form status of Imran ****************************\n\n
Population size: 54
Mean of population: 24.94 \n
Sample size: 6 Mean of sample: 30.83 SD of sample: 25.4 \n\n
Null hypothesis H0 : Imran ‘s sample average is within 95% confidence interval of population average\n
Alternative hypothesis Ha : Imran ‘s sample average is below the 95% confidence interval of population average\n\n
Imran ‘s Form Status: In-Form because the p value: 0.704683 is greater than alpha= 0.05 \n

“**************************** Form status of Hadlee ****************************\n\n
Population size: 73
Mean of population: 18 \n
Sample size: 9 Mean of sample: 27 SD of sample: 24.27 \n\n
Null hypothesis H0 : Hadlee ‘s sample average is within 95% confidence interval of population average\n
Alternative hypothesis Ha : Hadlee ‘s sample average is below the 95% confidence interval of population average\n\n
Hadlee ‘s Form Status: In-Form because the p value: 0.85262 is greater than alpha= 0.05 \n *******************************************************************************************\n\n”

Analyses of bowling performances of the All Rounders

The following plots gives the analysis of the 4 ODI batsmen

  1. Kapil Dev (Ind) – Innings – 225, Wickets = 253, Average=27.45, Economy Rate= 3.71
  2. Ian Botham (Eng) – Innings – 116, Wickets = 145, Average=28.54, Economy Rate= 3.96
  3. Imran Khan (Pak) – Innings – 175, Wickets = 182, Average=26.61, Economy Rate= 3.89
  4. Richard Hadlee (NZ) – Innings – 115, Wickets = 158, Average=21.56, Economy Rate= 3.30

Botham has the highest number of innings and wickets followed closely by Mitchell. Imran and Hadlee have relatively fewer innings.

To get the bowler’s data use

#kapil2 
#botham2 
#imran2 
#hadlee2 

“`

Wicket Frequency percentage

This plot gives the percentage of wickets for each wickets (1,2,3…etc).

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./kapil2.csv","Kapil")
bowlerWktsFreqPercent("./botham2.csv","Botham")
bowlerWktsFreqPercent("./imran2.csv","Imran")
bowlerWktsFreqPercent("./hadlee2.csv","Hadlee")

relbowlfp-1

dev.off()
## null device 
##           1

Wickets Runs plot

The plot below gives a boxplot of the runs ranges for each of the wickets taken by the bowlers.

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))

bowlerWktsRunsPlot("./kapil2.csv","Kapil")
bowlerWktsRunsPlot("./botham2.csv","Botham")
bowlerWktsRunsPlot("./imran2.csv","Imran")
bowlerWktsRunsPlot("./hadlee2.csv","Hadlee")

wktsrun-1

dev.off()
## null device 
##           1

Cumulative average wicket plot

Botham has the best cumulative average wicket touching almost 1.6 wickets followed by Hadlee

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerCumulativeAvgWickets("./kapil2.csv","Kapil")

kwm-bowlcaw-1

bowlerCumulativeAvgWickets("./botham2.csv","Botham")

kwm-bowlcaw-2

bowlerCumulativeAvgWickets("./imran2.csv","Imran")

kwm-bowlcaw-3

bowlerCumulativeAvgWickets("./hadlee2.csv","Hadlee")

kwm-bowlcaw-4

dev.off()
## null device 
##           1
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerCumulativeAvgEconRate("./kapil2.csv","Kapil")

kwm-bowlcer-1

bowlerCumulativeAvgEconRate("./botham2.csv","Botham")

kwm-bowlcer-2

bowlerCumulativeAvgEconRate("./imran2.csv","Imran")

kwm-bowlcer-3

bowlerCumulativeAvgEconRate("./hadlee2.csv","Hadlee")

kwm-bowlcer-4

dev.off()
## null device 
##           1

Average wickets in different grounds and opposition

A. Kapil Dev

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./kapil2.csv","Kapil")
bowlerAvgWktsOpposition("./kapil2.csv","Kapil")

gr-1-1

dev.off()
## null device 
##           1

B. Ian Botham

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./botham2.csv","Botham")
bowlerAvgWktsOpposition("./botham2.csv","Botham")

gr-2-1

dev.off()
## null device 
##           1

C. Imran Khan

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./imran2.csv","Imran")
bowlerAvgWktsOpposition("./imran2.csv","Imran")

gr-3-1

dev.off()
## null device 
##           1

D. Richard Hadlee

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./hadlee2.csv","Hadlee")
bowlerAvgWktsOpposition("./hadlee2.csv","Hadlee")

gr-4-1

dev.off()
## null device 
##           1

Relative bowling performance

It can be seen that Botham is the most effective wicket taker of the lot

frames <- list("./kapil2.csv","./botham2.csv","imran2.csv","hadlee2.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBowlingPerf(frames,names)

relbowlperf-1

Relative Economy Rate against wickets taken

Hadlee has the best overall economy rate followed by Kapil Dev

frames <- list("./kapil2.csv","./botham2.csv","imran2.csv","hadlee2.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBowlingERODTT(frames,names)

relbowler-1

Relative cumulative average wickets of bowlers in career

This plot confirms the wicket taking ability of Botham followed by Hadlee

frames <- list("./kapil2.csv","./botham2.csv","imran2.csv","hadlee2.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBowlerCumulativeAvgWickets(frames,names)

rbcaw-1

Relative cumulative average economy rate of bowlers

frames <- list("./kapil2.csv","./botham2.csv","imran2.csv","hadlee2.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBowlerCumulativeAvgEconRate(frames,names)

rbcer-1

Moving average of wickets over career

This plot shows that Hadlee has the best economy rate followed by Kapil

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./kapil2.csv","Kapil")
bowlerMovingAverage("./botham2.csv","Botham")
bowlerMovingAverage("./imran2.csv","Imran")
bowlerMovingAverage("./hadlee2.csv","Hadlee")

jmss-bowlma-1

dev.off()
## null device 
##           1

Wickets forecast

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerPerfForecast("./kapil2.csv","Kapil")
bowlerPerfForecast("./botham2.csv","Botham")
bowlerPerfForecast("./imran2.csv","Imran")
bowlerPerfForecast("./hadlee2.csv","Hadlee")

jjmss-pfcst-1

dev.off()
## null device 
##           1

Check bowler in-form, out-of-form

“**************************** Form status of Kapil ****************************\n\n
Population size: 198
Mean of population: 1.2 \n Sample size: 23 Mean of sample: 0.65 SD of sample: 0.83 \n\n
Null hypothesis H0 : Kapil ‘s sample average is within 95% confidence interval \n of population average\n
Alternative hypothesis Ha : Kapil ‘s sample average is below the 95% confidence\n interval of population average\n\n
Kapil ‘s Form Status: Out-of-Form because the p value: 0.002097 is less than alpha= 0.05 \n

“**************************** Form status of Botham ****************************\n\n
Population size: 166
Mean of population: 1.58 \n Sample size: 19 Mean of sample: 1.47 SD of sample: 1.12 \n\n
Null hypothesis H0 : Botham ‘s sample average is within 95% confidence interval \n of population average\n
Alternative hypothesis Ha : Botham ‘s sample average is below the 95% confidence\n interval of population average\n\n
Botham ‘s Form Status: In-Form because the p value: 0.336694 is greater than alpha= 0.05 \n

“**************************** Form status of Imran ****************************\n\n
Population size: 137
Mean of population: 1.23 \n Sample size: 16 Mean of sample: 0.81 SD of sample: 0.91 \n\n
Null hypothesis H0 : Imran ‘s sample average is within 95% confidence interval \n of population average\n
Alternative hypothesis Ha : Imran ‘s sample average is below the 95% confidence\n interval of population average\n\n
Imran ‘s Form Status: Out-of-Form because the p value: 0.041727 is less than alpha= 0.05 \n

“**************************** Form status of Hadlee ****************************\n\n
Population size: 100
Mean of population: 1.38 \n Sample size: 12 Mean of sample: 1.67 SD of sample: 1.37 \n\n
Null hypothesis H0 : Hadlee ‘s sample average is within 95% confidence interval \n of population average\n
Alternative hypothesis Ha : Hadlee ‘s sample average is below the 95% confidence\n interval of population average\n\n
Hadlee ‘s Form Status: In-Form because the p value: 0.761265 is greater than alpha= 0.05 \n *******************************************************************************************\n\n”

Key findings

Here are some key conclusions ODI batsmen

  1. Kapil Dev’s strike rate stands high above the other 3
  2. Imran Khan has the best cumulative average runs followed by Kapil
  3. Botham is the most effective wicket taker followed by Hadlee
  4. Hadlee is the most economical bowler and is followed by Kapil Dev
  5. For a hypothetical Balls Faced and Minutes at creases Kapil will score the most runs followed by Botham
  6. The moving average of indicates that the best is yet to come for Imran and Hadlee. Kapil and Botham were on the decline

Also see my other posts in R

  1. A primer on Qubits, Quantum gates abd Quantum operations
  2. Deblurring with OpenCV:Weiner filter reloaded
  3. Designing a Social Web Portal
  4. A crime map of India in R – Crimes against women
  5. Bend it like Bluemix, MongoDB with autoscaling – Part 2
  6. Mirror, mirror . the best batsman of them all?

For a full list of posts see Index of posts

Introducing cricket package yorkr: Part 1- Beaten by sheer pace!

“We need to regard statistical intuition with proper suspicion and replace impression formation by computation wherever possible”

“We are pattern seekers, believers in a coherent world”

“The hot hand is entirely in the eyes of the beholders, who are consistently” “too quick to perceive order and causality in randomeness. The hot hand is a” “massive and widespread cognitive illusion”

                   "Thinking, Fast and Slow - Daniel Kahneman"

Introduction

Yorker (noun) :A yorker is a bowling delivery in cricket, that pitches at or around the batsman’s toes. Also known as ‘toe crusher’

My package ‘yorkr’ is now available on CRAN. This package is based on data from Cricsheet. Cricsheet has the data of ODIs, Test, Twenty20 and IPL matches as yaml files. The yorkr package provides functions to convert the yaml files to more easily R consumable entities, namely dataframes. In fact all ODI matches have already been converted and are available for use at yorkrData. However as future matches are added to Cricsheet, you will have to convert the match files yourself. More details below.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen, bowlers besides evaluating team & match performances in Tests , ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr  and Beaten by sheer pace-Cricket analytics with yorkr  A must read for any cricket lover! Check it out!!

1

 

This post can be viewed at RPubs at yorkr-Part1 or can also be downloaded as a PDF document yorkr-1.pdf

Checkout my interactive Shiny apps GooglyPlus (plots & tables) and Googly (only plots) which can be used to analyze IPL players, teams and matches.

Important note: Do check out the python avatar of cricketr, ‘cricpy’ in my post ‘Introducing cricpy:A python package to analyze performances of cricketers

Important note 1: Do check out all the posts on the python avatar of yorkr, namely ‘yorkpy’ in my post ‘Pitching yorkpy … short of good length to IPL – Part 1

1. First things first

  1. yorkr currently has a total 70 functions as of now. I have intentionally avoided abbreviating function names by dropping vowels, as is the usual practice in coding, because the resulting abbreviated names created would be very difficult to remember, and use. So instead of naming a function as tmBmenPrtshpOppnAllMtches(), I have used the longer form for e.g. teamBatsmenPartnershipOppnAllmatches(), which is much clearer. The longer form will be more intuitive. Moreover RStudio prompts the the different functions which have the same prefix and one does not need to type in the entire function name.
  2. The package yorkr has 4 classes of functions
  • Class 1- Team performances in a match
  • Class 2- Team performances in all matches against a single oppostion (e.g. all matches of India vs Australia or all matches of England vs Pakistan etc.)
  • Class 3- Team performance in all matches against all Opposition (India vs All,Pakistan vs All etc.)
  • Class 4- Individual performances of batsmen and bowlers

In this post I will be looking into Class 1 functions, namely the performances of opposing teams in a single match

The list of functions are

  1. teamBattingScorecardMatch()
  2. teamBatsmenPartnershipMatch()
  3. teamBatsmenVsBowlersMatch()
  4. teamBowlingScorecardMatch()
  5. teamBowlingWicketKindMatch()
  6. teamBowlingWicketRunsMatch()
  7. teamBowlingWicketRunsMatch()
  8. teamBowlingWicketMatch()
  9. teamBowlersVsBatsmenMatch()
  10. matchWormGraph()

2. Install the package from CRAN

library(yorkr)
rm(list=ls())

3. Convert and save yaml file to dataframe

This function will convert a yaml file in the format as specified in Cricsheet to dataframe. This will be saved as as RData file in the target directory. The name of the file wil have the following format team1-team2-date.RData. This is seen below.

convertYaml2RDataframe("225171.yaml","./source","./data")
## [1] "./source/225171.yaml"
## [1] "first loop"
## [1] "second loop"
setwd("./data")
dir()
## [1] "Australia-India-2012-02-12.RData"      
## [2] "Bangladesh-Zimbabwe-2009-10-27.RData"  
## [3] "convertedFiles.txt"                    
## [4] "England-New Zealand-2007-01-30.RData"  
## [5] "Ireland-England-2006-06-13.RData"      
## [6] "Pakistan-South Africa-2013-11-08.RData"
## [7] "Sri Lanka-West Indies-2011-02-06.RData"
setwd("..")

4. Convert and save all yaml files to dataframes

This function will convert all yaml files from a source directory to dataframes and save it in the target directory with the names as mentioned above.

convertAllYaml2RDataframes("./source","./data")
## [1] 1
## i= 1   file= ./source/225171.yaml 
## [1] "first loop"
## [1] "second loop"
## [1] 633  25

5. yorkrData – A Github repositiory

Cricsheet has ODI matches from 2006. There are a total of 1167 ODI matches(files) out of which 34 yaml files had format problems and were skipped. Incidentally I have already converted the 1133 yaml files in the ODI directory of Cricsheet to dataframes and saved then as RData. The rest of the yaml files ave already been converted to RData and are available for use. All the converted RData files can be accessed from my Github link yorkrData under the folder ODI-matches. You will need to use the functions to convert new match files, as they are added to Cricsheet. There is aslo a file named ‘convertedFiles’ which will have the name of the original file and the converted file as below

convertedFiles

  • 225171.yaml:Ireland-England-2006-06-13.RData
  • 225245.yaml:England-Pakistan-2006-08-30.RData
  • 225246.yaml:England-Pakistan-2006-09-02.RData …

You can download the the zip of the files and use it directly in the functions as follows

Note 1: The package in its current form handles ODIs,T20s and IPL T20 matches

Note 2: The link to the converted data frames have been provided above. The dataframes are around 600 rows x 25 columns. In this post I have created 10 functions that analyze team performances in a match. However you are free to slice and dice the dataframe in any way you like. If you do come up with interesting analyses, please do attribute the source of the data to Cricsheet, and my package yorkr and my blog. I would appreciate it if you could send me a note. .

6. Load the match data as dataframes

As mentioned above in this post I will using the functions from Class 1. For this post I will be using the match data from 5 random matches between 10 different opposing teams/countries. For this I will directly use the converted RData files rather than getting the data through the getMatchDetails()

With the RData we can load the data in 2 ways

A. With getMatchDetails()

  1. With getMatchDetails() using the 2 teams and the date on which the match occured
aus_ind <- getMatchDetails("Australia","India","2012-02-12",dir="./data")

or

B.Directly load RData into your code.

The match details will be loaded into a dataframe called ’overs’ which you can assign to a suitable name as below

The randomly selected matches are

  • Australia vs India – 2012-02-12, Adelaide
  • England vs New Zealand – 2007-01-30, Perth
  • Pakistan vs South Africa – 2013-07-08, UAE
  • Sri Lanka vs West Indioes -2011-02-06, Colombo(SSC)
  • Bangladesh vs Zimbabwe -2009-10-27, Dhaka

Directly load RData from file

load("./data/Australia-India-2012-02-12.RData")
aus_ind <- overs
load("./data/England-New Zealand-2007-01-30.RData")
eng_nz <- overs
load("./data/Pakistan-South Africa-2013-11-08.RData")
pak_sa <- overs
load("./data/Sri Lanka-West Indies-2011-02-06.RData")
sl_wi<- overs
load("./data/Bangladesh-Zimbabwe-2009-10-27.RData")
ban_zim <- overs

7. Team batting scorecard

Compute and display the batting scorecard of the teams in the match. The top batsmen in are G Gambhir(Ind), PJ Forrest(Aus), Q De Kock(SA) and KC Sangakkara(SL)

teamBattingScorecardMatch(aus_ind,'India')
## Total= 258
## Source: local data frame [8 x 5]
## 
##     batsman ballsPlayed fours sixes  runs
##      (fctr)       (int) (dbl) (dbl) (dbl)
## 1 G Gambhir         110     7     0    92
## 2  V Sehwag          20     3     0    20
## 3   V Kohli          28     1     0    18
## 4 RG Sharma          41     1     1    33
## 5  SK Raina          30     3     1    38
## 6  MS Dhoni          57     0     1    44
## 7 RA Jadeja           8     0     0    12
## 8  R Ashwin           2     0     0     1
teamBattingScorecardMatch(aus_ind,'Australia')
## Total= 260
## Source: local data frame [9 x 5]
## 
##        batsman ballsPlayed fours sixes  runs
##         (fctr)       (int) (dbl) (dbl) (dbl)
## 1    DA Warner          23     2     0    18
## 2   RT Ponting          13     1     0     6
## 3    MJ Clarke          43     5     0    38
## 4   PJ Forrest          83     5     2    66
## 5    DJ Hussey          76     5     0    72
## 6 DT Christian          36     2     0    39
## 7      MS Wade          17     1     0    16
## 8    RJ Harris           2     0     0     2
## 9     CJ McKay           3     0     0     3
teamBattingScorecardMatch(pak_sa,'South Africa')
## Total= 256
## Source: local data frame [7 x 5]
## 
##          batsman ballsPlayed fours sixes  runs
##           (fctr)       (int) (dbl) (dbl) (dbl)
## 1      Q de Kock         132     9     1   112
## 2        HM Amla          50     6     0    46
## 3   F du Plessis          21     1     0    10
## 4 AB de Villiers          40     2     0    30
## 5      DA Miller           9     0     0     5
## 6      JP Duminy          20     1     1    25
## 7      R McLaren          21     3     1    28
teamBattingScorecardMatch(sl_wi,'Sri Lanka')
## Total= 261
## Source: local data frame [10 x 5]
## 
##             batsman ballsPlayed fours sixes  runs
##              (fctr)       (int) (dbl) (dbl) (dbl)
## 1       WU Tharanga          50     5     0    39
## 2        TM Dilshan          27     2     1    30
## 3     KC Sangakkara         103     4     1    75
## 4  DPMD Jayawardene          52     2     0    44
## 5     CK Kapugedera          17     0     0    17
## 6    TT Samaraweera           7     0     0     4
## 7       NLTC Perera           8     0     0     6
## 8        AD Mathews          22     1     1    36
## 9      HMRKB Herath           4     0     0     2
## 10       BAW Mendis           6     1     0     8

8. Plot the team batting partnerships

The functions below plot the team batting partnetship in the match Note: Many of the plots include an additional parameters plot which is either TRUE or FALSE. The default value is plot=TRUE. When plot=TRUE the plot will be displayed. When plot=FALSE the data frame will be returned to the user. The user can use this to create an interactive chary using one of th epackages like rcharts, ggvis,googleVis or plotly.

teamBatsmenPartnershipMatch(pak_sa,"Pakistan","South Africa")

batsmenPartnership-1

teamBatsmenPartnershipMatch(eng_nz,"New Zealand","England",plot=TRUE)

batsmenPartnership-2

teamBatsmenPartnershipMatch(ban_zim,"Bangladesh","Zimbabwe",plot=FALSE)
##              batsman        nonStriker runs
## 1        Tamim Iqbal   Junaid Siddique    0
## 2        Tamim Iqbal Mohammad Ashraful    5
## 3    Junaid Siddique       Tamim Iqbal    0
## 4  Mohammad Ashraful       Tamim Iqbal    0
## 5  Mohammad Ashraful     Raqibul Hasan   20
## 6      Raqibul Hasan Mohammad Ashraful   13
## 7      Raqibul Hasan   Shakib Al Hasan    3
## 8    Shakib Al Hasan     Raqibul Hasan   12
## 9    Shakib Al Hasan   Mushfiqur Rahim    1
## 10   Mushfiqur Rahim   Shakib Al Hasan    1
## 11   Mushfiqur Rahim       Naeem Islam   30
## 12   Mushfiqur Rahim      Abdur Razzak    6
## 13   Mushfiqur Rahim      Dolar Mahmud   11
## 14   Mushfiqur Rahim     Rubel Hossain    8
## 15       Mahmudullah   Mushfiqur Rahim    4
## 16       Naeem Islam   Mushfiqur Rahim   21
## 17      Abdur Razzak   Mushfiqur Rahim    3
## 18      Dolar Mahmud   Mushfiqur Rahim   41
teamBatsmenPartnershipMatch(aus_ind,"India","Australia", plot=TRUE)

batsmenPartnership-3

9. Batsmen vs Bowler

The function below computes and plots the performances of the batsmen vs the bowlers. As before the plot parameter can be set to TRUE or FALSE. By default it is plot=TRUE

teamBatsmenVsBowlersMatch(pak_sa,'Pakistan',"South Africa", plot=TRUE)

batsmenVsBowler-1

teamBatsmenVsBowlersMatch(aus_ind,'Australia',"India",plot=TRUE)

batsmenVsBowler-2

teamBatsmenVsBowlersMatch(ban_zim,'Zimbabwe',"Bangladesh", plot=TRUE)

batsmenVsBowler-3

m <- teamBatsmenVsBowlersMatch(sl_wi,'West Indies',"Sri Lanka", plot=FALSE)
m
## Source: local data frame [35 x 3]
## Groups: batsman [?]
## 
##      batsman        bowler runsConceded
##       (fctr)        (fctr)        (dbl)
## 1   CH Gayle  CRD Fernando            0
## 2   DM Bravo  CRD Fernando           15
## 3   DM Bravo   NLTC Perera           21
## 4   DM Bravo    AD Mathews           10
## 5   DM Bravo    BAW Mendis           11
## 6   DM Bravo CK Kapugedera            1
## 7   DM Bravo    TM Dilshan            5
## 8   DM Bravo  HMRKB Herath           16
## 9  AB Barath   NLTC Perera            0
## 10 RR Sarwan  CRD Fernando            6
## ..       ...           ...          ...

10. Bowling Scorecard

This function provides the bowling performance, the number of overs bowled, maidens, runs conceded and wickets taken for each match

teamBowlingScorecardMatch(eng_nz,'England')
## Source: local data frame [6 x 5]
## 
##           bowler overs maidens  runs wickets
##           (fctr) (int)   (int) (dbl)   (dbl)
## 1    LE Plunkett     9       0    54       3
## 2    CT Tremlett    10       0    72       1
## 3     A Flintoff    10       0    66       0
## 4     MS Panesar    10       2    35       2
## 5  JWM Dalrymple     5       0    43       0
## 6 PD Collingwood     6       0    36       1
teamBowlingScorecardMatch(eng_nz,'New Zealand')
## Source: local data frame [6 x 5]
## 
##         bowler overs maidens  runs wickets
##         (fctr) (int)   (int) (dbl)   (dbl)
## 1 JEC Franklin     8       1    45       1
## 2      SE Bond    10       0    58       1
## 3     JDP Oram     5       0    23       0
## 4     JS Patel    10       0    53       1
## 5   DL Vettori    10       0    40       3
## 6  CD McMillan     7       1    38       2
teamBowlingScorecardMatch(aus_ind,'Australia')
## Source: local data frame [6 x 5]
## 
##         bowler overs maidens  runs wickets
##         (fctr) (int)   (int) (dbl)   (dbl)
## 1    RJ Harris    10       0    57       1
## 2     MA Starc     8       0    49       0
## 3     CJ McKay    10       1    53       3
## 4 DT Christian    10       0    45       0
## 5    DJ Hussey     3       0    13       0
## 6   XJ Doherty     9       0    51       2

11. Wicket Kind

The plots below provide the bowling kind of wicket taken by the bowler (caught, bowled, lbw etc.)

teamBowlingWicketKindMatch(aus_ind,"India","Australia")

bowlingWicketKind-1

teamBowlingWicketKindMatch(aus_ind,"Australia","India")

bowlingWicketKind-2

teamBowlingWicketKindMatch(pak_sa,"South Africa","Pakistan")

bowlingWicketKind-3

m <-teamBowlingWicketKindMatch(sl_wi,"Sri Lanka",plot=FALSE)
m
##           bowler wicketKind wicketPlayerOut runs
## 1   CRD Fernando     bowled        CH Gayle   45
## 2    NLTC Perera     caught       AB Barath   36
## 3   HMRKB Herath        lbw       RR Sarwan   54
## 4     BAW Mendis     caught   S Chanderpaul   46
## 5    NLTC Perera        lbw        DM Bravo   36
## 6    NLTC Perera     caught       DJG Sammy   36
## 7   CRD Fernando     caught        DJ Bravo   45
## 8     BAW Mendis     caught       NO Miller   46
## 9     BAW Mendis     caught        CS Baugh   46
## 10    BAW Mendis     caught         SJ Benn   46
## 11    AD Mathews   noWicket        noWicket   33
## 12 CK Kapugedera   noWicket        noWicket    7
## 13    TM Dilshan   noWicket        noWicket   25

12. Wicket vs Runs conceded

The plots below provide the wickets taken and the runs conceded by the bowler in the match

teamBowlingWicketRunsMatch(pak_sa,"Pakistan","South Africa")

wicketRuns-1

teamBowlingWicketRunsMatch(aus_ind,"Australia","India")

wicketRuns-2

m <-teamBowlingWicketRunsMatch(sl_wi,"West Indies","Sri Lanka", plot=FALSE)
m
## Source: local data frame [6 x 5]
## 
##      bowler overs maidens  runs wickets
##      (fctr) (int)   (int) (dbl)   (chr)
## 1 R Rampaul     5       0    44       1
## 2 DJG Sammy    10       1    61       1
## 3  DJ Bravo    10       0    58       3
## 4  CH Gayle    10       0    34       0
## 5   SJ Benn    10       1    38       4
## 6 NO Miller     5       0    35       0

13. Wickets taken by bowler

The plots provide the wickets taken by the bowler

m <-teamBowlingWicketMatch(eng_nz,'England',"New Zealand", plot=FALSE)
m
##           bowler wicketKind wicketPlayerOut runs
## 1    LE Plunkett        lbw      SP Fleming   54
## 2    LE Plunkett     caught       PG Fulton   54
## 3 PD Collingwood     caught     LRPL Taylor   36
## 4     MS Panesar    stumped     CD McMillan   35
## 5    LE Plunkett     caught       L Vincent   54
## 6     MS Panesar     caught     BB McCullum   35
## 7    CT Tremlett     caught    JEC Franklin   72
## 8     A Flintoff   noWicket        noWicket   66
## 9  JWM Dalrymple   noWicket        noWicket   43
teamBowlingWicketMatch(sl_wi,"Sri Lanka","West Indies")

bowlingWickets-1

teamBowlingWicketMatch(eng_nz,"New Zealand","England")

bowlingWickets-2

14. Bowler Vs Batsmen

The functions compute and display how the different bowlers of the country performed against the batting opposition.

teamBowlersVsBatsmenMatch(ban_zim,"Bangladesh","Zimbabwe")

bowlerVsBatsmen-1

teamBowlersVsBatsmenMatch(aus_ind,"India","Australia")

bowlerVsBatsmen-2

teamBowlersVsBatsmenMatch(eng_nz,"England","New Zealand")

bowlerVsBatsmen-3

m <- teamBowlersVsBatsmenMatch(pak_sa,"Pakistan",plot=FALSE)
m
## Source: local data frame [30 x 3]
## Groups: bowler [?]
## 
##            bowler        batsman runsConceded
##            (fctr)         (fctr)        (dbl)
## 1  Mohammad Irfan      Q de Kock           25
## 2  Mohammad Irfan        HM Amla           17
## 3  Mohammad Irfan   F du Plessis            0
## 4  Mohammad Irfan AB de Villiers            9
## 5   Sohail Tanvir      Q de Kock           11
## 6   Sohail Tanvir        HM Amla            6
## 7   Sohail Tanvir      JP Duminy            9
## 8   Sohail Tanvir      R McLaren           12
## 9     Junaid Khan      Q de Kock           24
## 10    Junaid Khan        HM Amla            6
## ..            ...            ...          ...

15. Match worm graph

The plots below provide the match worm graph for the matches

matchWormGraph(aus_ind,'Australia',"India")

matchWorm-1

matchWormGraph(sl_wi,'Sri Lanka',"West Indies")

matchWorm-2

Conclusion

This post included all functions between 2 opposing countries from the package yorkr.As mentioned above the yaml match files have been already converted to dataframes and are available for download from Github. Go ahead and give it a try

To be continued. Watch this space!

Important note: Do check out my other posts using yorkr at yorkr-posts

 

You may also like

Cricket analytics with cricketr in paperback and Kindle versions

Untitled

My book “Cricket analytics with cricketr” is now available in paperback and Kindle versions. The paperback is available from Amazon (US, UK and Europe) for $ 48.99. The Kindle version can be downloaded from the Kindle store for $2.50 (Rs 169/-). Do pick your copy. It should be a good read for a Sunday afternoon.

This book of mine contains my posts based on my R package ‘cricketr’ now in CRAN. The package cricketr can analyze both batsmen and bowlers for all formats of the game Test, ODI and Twenty20. The package uses the data from ESPN Cricinfo. The analyses include runs frequency charts, performances of batsmen and bowlers in different grounds and against different teams, moving  average of  runs/wickets over the career, mean strike rate, mean economy rate and so on.

The book includes the following chapters based on my R package cricketr  There are 2 additional articles where I use Machine Learning with the package Octave.

CONTENTS
Cricket Analytics with cricketr 11
1.1. Introducing cricketr! : An R package to analyze performances of cricketers 11
1.2. Taking cricketr for a spin – Part 1 49
1.2. cricketr digs the Ashes! 70
1.3. cricketr plays the ODIs! 99
1.4. cricketr adapts to the Twenty20 International! 141
1.5. Sixer – R package cricketr’s new Shiny avatar 170
2. Other cricket posts in R 180
2.1. Analyzing cricket’s batting legends – Through the mirage with R 180
2.2. Mirror, mirror … the best batsman of them all? 206
3. Appendix 220
Cricket analysis with Machine Learning using Octave 220
3.1. Informed choices through Machine Learning – Analyzing Kohli, Tendulkar and Dravid 221
3.2. Informed choices through Machine Learning-2 Pitting together Kumble, Kapil, Chandra 234
Further reading 248
Important Links 249

I do hope you have a great time reading it. Do pick up your copy. Feel free to get in touch with me with your comments and suggestions.  I have more interesting things lined up for the future.

Watch this space!

You may also like
1. Literacy in India : A deepR dive.
2. Natural Language Processing: What would Shakespeare say?
3. Revisiting crimes against women in India
4. Experiments with deblurring using OpenCV
5. TWS-4: Gossip protocol: Epidemics and rumors to the rescue
6. Bend it like Bluemix, MongoDB with autoscaling – Part 1
7. “Is it animal? Is it an insect?” in Android