# The Clash of the Titans in Test and ODI cricket

Who looks outside, dreams; who looks inside, awakes.
Show me a sane man and I will cure him for you.

            Carl Jung 

We’re made of star stuff. We are a way for the cosmos to know itself.
If you want to make an apple pie from scratch, you must first create the universe.

            Carl Sagan

## Introduction

The biggest nag in the collective psyche of the cricketing fraternity these days is whether Virat Kohli has surpassed Sachin Tendulkar. This question has been troubling cricket lovers the world over, and particularly in India, for quite a while. It has only grown stronger with Kohli’s 41st ODI century and with Michael Vaughan bestowing the GOAT title on Virat Kohli for ODI cricket. Hence, I decided to do my bit in addressing this by analyzing Kohli’s and Tendulkar’s performances in ODI cricket. I also wanted to address who is the best among the cricketing idols of India in Test cricket, namely Sunil Gavaskar, Sachin Tendulkar and Virat Kohli. Hence this post has 2 parts:

1. Analysis of Tendulkar, Gavaskar and Kohli in Test cricket
2. Analysis of Tendulkar and Kohli in ODIs

In this post, I analyze the performances of these titans in Test and ODI cricket using my R package cricketr. Some may feel that comparisons are not possible as these batsmen are from different eras. To some extent this is true. I would give some leeway to Gavaskar as he had to bat in a pre-helmet era. But with Tendulkar and Kohli a fair and objective comparison is possible. There were pre-eminent bowlers in the times of Tendulkar as there are now.

From the analysis below, it can be seen that Tendulkar is ahead of everybody else in Test cricket. However, it must be noted that Tendulkar’s performance deteriorated towards the end of his career. Such was not the case with Gavaskar. Kohli has some catching up to do and he still has a lot of Test cricket in him.

In ODIs, Kohli can be seen to be pulling ahead of Tendulkar in several aspects.

My R package cricketr can be installed directly from CRAN and you can use it to analyze cricketers.

This package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package supports all formats of the game including Test, ODI and Twenty20 versions.

You should also be able to install the package from GitHub and use the many functions available in the package. Please be mindful of the ESPN Cricinfo Terms of Use.

Take a look at my short video tutorial on my R package cricketr on Youtube – R package cricketr – A short tutorial

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

Note 1: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricketr template from GitHub (which is the R Markdown file I have used for the analysis below).

Note 2: I sprinkle the charts with my observations. Feel free to look at them more closely and come to your conclusions.

Important note: Do check out the Python avatar of cricketr, ‘cricpy’, in my post Introducing cricpy: A python package to analyze performances of cricketers

### 1 Load the cricketr package

if (!require("cricketr")){
# Install into the default library so that library(cricketr) can find the package
install.packages("cricketr")
}
library(cricketr)

## A Test cricket – Analysis of Gavaskar, Tendulkar and Kohli

### 2. Get player data

tendulkar <- getPlayerData(35320,dir=".",file="tendulkar.csv",type="batting")
kohli <- getPlayerData(253802,dir=".",file="kohli.csv",type="batting")
gavaskar <- getPlayerData(28794,dir=".",file="gavaskar.csv",type="batting")

### 3a. Basic analyses for Tendulkar

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./tendulkar.csv","Tendulkar")
batsmanMeanStrikeRate("./tendulkar.csv","Tendulkar")
batsmanRunsRanges("./tendulkar.csv","Tendulkar")
dev.off()

### 3b Basic analyses for Kohli

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./kohli.csv","Kohli")
batsmanMeanStrikeRate("./kohli.csv","Kohli")
batsmanRunsRanges("./kohli.csv","Kohli")
dev.off()

### 3c Basic analyses for Gavaskar

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsRanges("./gavaskar.csv","Gavaskar")
dev.off()

### 4a.More analyses for Tendulkar

It can be seen that Tendulkar and Gavaskar have been bowled more often than Kohli. Also, Kohli does not have as many sixes in Test cricket as Tendulkar and Gavaskar.

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./tendulkar.csv","Tendulkar")
batsman6s("./tendulkar.csv","Tendulkar")
batsmanDismissals("./tendulkar.csv","Tendulkar")
dev.off()

### 4b. More analyses for Kohli

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./kohli.csv","Kohli")
batsman6s("./kohli.csv","Kohli")
batsmanDismissals("./kohli.csv","Kohli")
dev.off()

### 4c More analyses for Gavaskar

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanDismissals("./gavaskar.csv","Gavaskar")
dev.off()

### 5 Performance of batsmen on different grounds

par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./tendulkar.csv","Tendulkar")
batsmanAvgRunsGround("./kohli.csv","Kohli")
batsmanAvgRunsGround("./gavaskar.csv","Gavaskar")



#dev.off()

### 6. Performance of batsmen against different Opposition

1. Tendulkar averages 50 against the following countries – Australia, Bangladesh, England, Sri Lanka, West Indies and Zimbabwe
2. Kohli averages almost 50 against all the nations he has played – Australia, Bangladesh, England, New Zealand, Sri Lanka and West Indies
3. Gavaskar averages 50 against Australia, Pakistan, West Indies and Sri Lanka

par(mar=c(4,4,2,2))
batsmanAvgRunsOpposition("./tendulkar.csv","Tendulkar")
batsmanAvgRunsOpposition("./kohli.csv","Kohli")
batsmanAvgRunsOpposition("./gavaskar.csv","Gavaskar")
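
To make the idea behind these per-opposition charts concrete, here is a minimal sketch in Python (the language of cricpy) of grouping innings by opposition and taking the mean runs per innings. The function name `mean_runs_by_opposition` and the scores are hypothetical and purely illustrative. This is not how cricketr itself is implemented, and it computes a plain mean per innings rather than the official batting average, which divides runs by dismissals.

```python
from collections import defaultdict

def mean_runs_by_opposition(innings):
    """Return {opposition: mean runs per innings}, rounded to 2 decimals."""
    totals = defaultdict(lambda: [0, 0])  # opposition -> [total runs, innings count]
    for opposition, runs in innings:
        totals[opposition][0] += runs
        totals[opposition][1] += 1
    return {opp: round(total / count, 2) for opp, (total, count) in totals.items()}

# Hypothetical innings (opposition, runs), not real scores
sample = [("Australia", 40), ("Australia", 80),
          ("England", 10), ("England", 55), ("England", 25)]
averages = mean_runs_by_opposition(sample)
print(averages)
```

The same grouping, keyed on ground instead of opposition, would give the per-venue view of the previous section.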

### 7. Get player data special

This is required for the next 2 function calls

tendulkarsp <- getPlayerDataSp(35320,tdir=".",tfile="tendulkarsp.csv",ttype="batting")
kohlisp <- getPlayerDataSp(253802,tdir=".",tfile="kohlisp.csv",ttype="batting")

#dev.off()

### 8 Get contribution of batsmen in matches won and lost

Kohli has contributed equally in won and lost matches. Tendulkar’s runs seem not to have helped as much in winning, as only 50% of the matches he has played have been won.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))

batsmanContributionWonLost("tendulkarsp.csv","Tendulkar")
batsmanContributionWonLost("./kohlisp.csv","Kohli")



### 9 Performance of batsmen at home and overseas

The boxplots show that Kohli performs better overseas than at home. The 3rd quartile is higher, though the median seems to be lower overseas. For Tendulkar the performance is similar both ways. Gavaskar’s median runs scored overseas is higher.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))

batsmanPerfHomeAway("tendulkarsp.csv","Tendulkar")
batsmanPerfHomeAway("./kohlisp.csv","Kohli")



### 10. Moving average of runs

Gavaskar’s moving average was very good at the time of his retirement. Kohli seems to be going very strong. Tendulkar’s performance shows signs of deterioration around the time of his retirement.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))

batsmanMovingAverage("./tendulkar.csv","Tendulkar")
batsmanMovingAverage("./kohli.csv","Kohli")
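
For readers who want to see what a moving average does to a run of scores, the sketch below computes a simple trailing-window average in Python. This is an assumption-laden illustration: the window size, the function name `moving_average` and the scores are hypothetical, and the smoothing cricketr applies in its plots may well be different.

```python
def moving_average(runs, window=5):
    """Trailing moving average; early entries average over the innings seen so far."""
    averages = []
    for i in range(len(runs)):
        start = max(0, i - window + 1)   # first innings inside the window
        chunk = runs[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages

# Hypothetical scores in chronological order, not real data
scores = [10, 50, 0, 100, 40, 20]
smoothed = moving_average(scores, window=3)
print([round(value, 2) for value in smoothed])
```

A rising tail in such a series is what "going strong" means in the observations above, and a falling tail is the deterioration seen near retirement.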

#dev.off()

### 11 Boxplot and histogram of runs

Kohli has a marginally higher average (50.69) than Tendulkar (48.65), while Gavaskar averages 46. The median runs are the same for Tendulkar and Kohli at 32.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfBoxHist("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfBoxHist("./kohli.csv","Kohli")
batsmanPerfBoxHist("./gavaskar.csv","Gavaskar")

### 12 Cumulative average Runs for batsmen

Looking at the cumulative average runs, we can see a gradual drop in the cumulative average for Tendulkar, while Kohli’s and Gavaskar’s performances seem to be getting better.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("./tendulkar.csv","Tendulkar")
batsmanCumulativeAverageRuns("./kohli.csv","Kohli")
batsmanCumulativeAverageRuns("./gavaskar.csv","Gavaskar")
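
The cumulative average at innings n is simply the total runs up to and including that innings divided by n. A minimal Python sketch of that arithmetic follows. The function name `cumulative_average` and the scores are hypothetical, and this is only an illustration of the quantity being plotted, not cricketr's code.

```python
from itertools import accumulate

def cumulative_average(runs):
    """Average runs per innings after each successive innings."""
    return [total / (i + 1) for i, total in enumerate(accumulate(runs))]

# Hypothetical scores in chronological order, not real data
scores = [10, 50, 0, 100]
cum_avg = cumulative_average(scores)
print(cum_avg)
```

Because every new innings is diluted by all the earlier ones, this curve moves slowly late in a long career, which is why a visible drop for a batsman with hundreds of innings is notable.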

### 13. Cumulative average strike rate of batsmen

Tendulkar’s strike rate is better than Kohli’s and Gavaskar’s.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("./tendulkar.csv","Tendulkar")
batsmanCumulativeStrikeRate("./kohli.csv","Kohli")
batsmanCumulativeStrikeRate("./gavaskar.csv","Gavaskar")

### 14 Performance forecast of batsmen

The forecasted performance for Kohli and Gavaskar is higher than that of Tendulkar

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfForecast("./kohli.csv","Kohli")

#dev.off()

### 15. Relative strike rate of batsmen

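par(mar=c(4,4,2,2))
# Files and names of the batsmen to be compared in the relative plots
frames <- list("./tendulkar.csv","./kohli.csv","./gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeBatsmanSR(frames,names)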
#dev.off()



### 16. Relative Runs frequency of batsmen

par(mar=c(4,4,2,2))
relativeRunsFreqPerf(frames,names)
#dev.off()


### 17. Relative cumulative average runs of batsmen

Tendulkar leads the way here, but it can be seen that Kohli is catching up.

par(mar=c(4,4,2,2))
relativeBatsmanCumulativeAvgRuns(frames,names)
#dev.off()


### 18. Relative cumulative average strike rate

Tendulkar has better strike rate than the other two.

par(mar=c(4,4,2,2))
relativeBatsmanCumulativeStrikeRate(frames,names)
#dev.off()


### 19. Check batsman in form

As with the moving average, performance forecast and cumulative average runs, Kohli and Gavaskar are in form while Tendulkar was out of form towards the end.

checkBatsmanInForm("./tendulkar.csv","Sachin Tendulkar")
## [1] "**************************** Form status of Sachin Tendulkar ****************************
\n\n Population size: 294  Mean of population: 50.48 \n Sample size: 33  Mean of sample: 32.42 SD of
sample: 29.8 \n\n Null hypothesis H0 : Sachin Tendulkar 's sample average is within 95% confidence interval
of population average\n Alternative hypothesis Ha : Sachin Tendulkar 's sample average is below
the 95% confidence interval of population average\n\n
Sachin Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713  is less than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("./kohli.csv","Kohli")
## [1] "**************************** Form status of Kohli ****************************\n\n Population size: 117
Mean of population: 50.35 \n Sample size: 13  Mean of sample: 53.77 SD of sample: 46.15 \n\n Null
hypothesis H0 : Kohli 's sample average is within 95% confidence interval of population average\n
Alternative hypothesis Ha : Kohli 's sample average is below the 95% confidence interval of population
average\n\n Kohli 's Form Status: In-Form because the p value: 0.603244  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("./gavaskar.csv","Gavaskar")
## [1] "**************************** Form status of Gavaskar ****************************\n\n
Population size: 125  Mean of population: 44.67 \n Sample size: 14  Mean of sample: 57.86 SD of sample:
58.55 \n\n Null hypothesis H0 : Gavaskar 's sample average is within 95% confidence interval of population
average\n Alternative hypothesis Ha : Gavaskar 's sample average is below the 95% confidence interval of
population average\n\n Gavaskar 's Form Status: In-Form because the p value: 0.793276  is greater
than alpha=  0.05 \n *******************************************************************************************\n\n"
#dev.off()
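
The output above describes a one-sided test of whether the mean of a recent sample of innings is below the career (population) mean. The Python sketch below reproduces that kind of check from the summary numbers printed above. It is an approximation under stated assumptions: it uses a normal distribution so that only the standard library is needed, the function name `form_p_value` is hypothetical, and the p-values will not match cricketr's output exactly.

```python
from math import sqrt
from statistics import NormalDist

def form_p_value(population_mean, sample_mean, sample_sd, sample_size):
    """One-sided p-value that the sample mean is below the population mean.

    Uses a normal approximation to the test statistic (an assumption of this sketch).
    """
    standard_error = sample_sd / sqrt(sample_size)
    z = (sample_mean - population_mean) / standard_error
    return NormalDist().cdf(z)

ALPHA = 0.05
# Summary figures taken from the printed output above
p_tendulkar = form_p_value(50.48, 32.42, 29.8, 33)
p_kohli = form_p_value(50.35, 53.77, 46.15, 13)
for name, p in [("Tendulkar", p_tendulkar), ("Kohli", p_kohli)]:
    status = "Out-of-Form" if p < ALPHA else "In-Form"
    print(f"{name}: p = {p:.4f} -> {status}")
```

With these inputs the sketch reaches the same verdicts as the output above: a very small p-value for Tendulkar and a large one for Kohli.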

### 20. Performance 3D

A 3D regression plane is fitted between the Balls faced, Minutes at crease and Runs scored.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
battingPerf3d("./tendulkar.csv","Sachin Tendulkar")
battingPerf3d("./kohli.csv","Kohli")
#dev.off()
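
Fitting a regression plane means finding coefficients b0, b1 and b2 such that Runs ≈ b0 + b1 × Balls faced + b2 × Minutes at crease in the least-squares sense. The sketch below shows one way to do this in plain Python through the normal equations. The helper names (`det3`, `fit_plane`, `predict`) and the data are hypothetical, and this is an illustration of the technique rather than the code behind battingPerf3d.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_plane(balls, minutes, runs):
    """Least-squares fit of runs = b0 + b1*balls + b2*minutes via the normal equations."""
    rows = [(1.0, b, m) for b, m in zip(balls, minutes)]
    # Build X^T X (3x3) and X^T y (length 3)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, runs)) for i in range(3)]
    d = det3(xtx)
    if abs(d) < 1e-12:
        raise ValueError("balls and minutes are collinear; the plane is not unique")
    coeffs = []
    for k in range(3):
        # Cramer's rule: replace column k of X^T X with X^T y
        mk = [[xty[i] if j == k else xtx[i][j] for j in range(3)] for i in range(3)]
        coeffs.append(det3(mk) / d)
    return coeffs

def predict(coeffs, balls, minutes):
    """Evaluate the fitted plane at one (balls, minutes) point."""
    b0, b1, b2 = coeffs
    return b0 + b1 * balls + b2 * minutes

# Hypothetical innings generated from runs = 2 + 0.5*balls + 0.1*minutes
balls = [10, 40, 60, 90, 120]
minutes = [30, 50, 100, 110, 200]
runs = [2 + 0.5 * b + 0.1 * m for b, m in zip(balls, minutes)]
coeffs = fit_plane(balls, minutes, runs)
print([round(c, 3) for c in coeffs])
print(round(predict(coeffs, 100, 150), 1))
```

Once the plane is fitted, predicting runs for any combination of balls faced and minutes at crease is a single evaluation of the plane.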

### 20. Runs likelihood

This function uses K-Means clustering to determine the runs the batsmen are likely to score.

par(mar=c(4,4,2,2))
batsmanRunsLikelihood("./tendulkar.csv","Tendulkar")
## Summary of  Tendulkar 's runs scoring likelihood
## **************************************************
##
## There is a 16.51 % likelihood that Tendulkar  will make  139 Runs in  251 balls over 353  Minutes
## There is a 25.08 % likelihood that Tendulkar  will make  66 Runs in  122 balls over  167  Minutes
## There is a 58.41 % likelihood that Tendulkar  will make  16 Runs in  31 balls over 44  Minutes
batsmanRunsLikelihood("./kohli.csv","Kohli")
## Summary of  Kohli 's runs scoring likelihood
## **************************************************
##
## There is a 20 % likelihood that Kohli  will make  143 Runs in  232 balls over 330  Minutes
## There is a 33.85 % likelihood that Kohli  will make  51 Runs in  92 balls over  127  Minutes
## There is a 46.15 % likelihood that Kohli  will make  11 Runs in  24 balls over 31  Minutes
batsmanRunsLikelihood("./gavaskar.csv","Gavaskar")
## Summary of  Gavaskar 's runs scoring likelihood
## **************************************************
##
## There is a 33.81 % likelihood that Gavaskar  will make  69 Runs in  159 balls over 214  Minutes
## There is a 8.63 % likelihood that Gavaskar  will make  172 Runs in  364 balls over  506  Minutes
## There is a 57.55 % likelihood that Gavaskar  will make  13 Runs in  35 balls over 48  Minutes
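
The summaries above can be read as three cluster centres together with the share of innings that fall in each cluster. The Python sketch below shows the idea with a one-dimensional K-Means on runs only. It is a simplified illustration under assumptions: the output above also reports balls and minutes, the deterministic starting centres are my own choice for reproducibility, and the names `kmeans_1d` and `likelihoods` are hypothetical.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's algorithm on scalars (k >= 2); returns (centres, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres across the sorted data
    centres = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assign every value to its nearest centre
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Move each centre to the mean of its members (keep it if the cluster is empty)
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return centres, labels

def likelihoods(values, k=3):
    """Return [(cluster centre in runs, percentage of innings in that cluster)]."""
    centres, labels = kmeans_1d(values, k)
    return [(round(centres[c]), round(100 * labels.count(c) / len(values), 2))
            for c in range(k)]

# Hypothetical scores, not real data
scores = [0, 4, 8, 12, 15, 20, 45, 50, 60, 65, 110, 130, 150]
summary = likelihoods(scores)
for centre, percent in summary:
    print(f"There is a {percent} % likelihood of a score around {centre} runs")
```

As in the real summaries, the low-score cluster typically holds the largest share of innings, since most innings end early.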

### 21. Predict runs for a random combination of Balls faced and Minutes at crease

BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
tendulkar <- batsmanRunsPredict("./tendulkar.csv","Tendulkar",newdataframe=newDF)
kohli <- batsmanRunsPredict("./kohli.csv","Kohli",newdataframe=newDF)
# Gavaskar's predictions are needed for the combined table below
gavaskar <- batsmanRunsPredict("./gavaskar.csv","Gavaskar",newdataframe=newDF)
batsmen <-cbind(round(tendulkar$Runs),round(kohli$Runs),round(gavaskar$Runs))
colnames(batsmen) <- c("Tendulkar","Kohli","Gavaskar")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Tendulkar Kohli Gavaskar
## 1          10           30         7     6        4
## 2          38           71        23    24       17
## 3          66          111        39    42       30
## 4          94          152        54    60       43
## 5         121          193        70    78       56
## 6         149          234        86    96       69
## 7         177          274       102   114       82
## 8         205          315       118   132       95
## 9         233          356       134   150      108
## 10        261          396       150   168      121
## 11        289          437       165   186      134
## 12        316          478       181   204      147
## 13        344          519       197   222      160
## 14        372          559       213   240      173
## 15        400          600       229   258      186
#dev.off()

## Key findings

1. Kohli has a marginally higher average than Tendulkar
2. Tendulkar has the best strike rate of all 3
3. The cumulative average runs and the performance forecast for Kohli and Gavaskar show an improving trend, while Tendulkar’s numbers deteriorate towards the end of his career
4. Kohli is fast catching up with Tendulkar on cumulative average runs vs innings in career

## B ODI Cricket – Analysis of Tendulkar and Kohli

The functions below get the ODI data for Tendulkar and Kohli as CSV files so that the analyses can be done.

### 22 Get player data for ODIs

tendulkarOD <- getPlayerDataOD(35320,dir=".",file="tendulkarOD.csv",type="batting")
kohliOD <- getPlayerDataOD(253802,dir=".",file="kohliOD.csv",type="batting")

#dev.off()

### 23a Basic performance of Tendulkar in ODI

par(mfrow=c(3,2))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./tendulkarOD.csv","Tendulkar")
batsmanRunsRanges("./tendulkarOD.csv","Tendulkar")
batsman4s("./tendulkarOD.csv","Tendulkar")
batsman6s("./tendulkarOD.csv","Tendulkar")
batsmanScoringRateODTT("./tendulkarOD.csv","Tendulkar")
#dev.off()

### 23b. Basic performance of Kohli in ODI

par(mfrow=c(3,2))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./kohliOD.csv","Kohli")
batsmanRunsRanges("./kohliOD.csv","Kohli")
batsman4s("./kohliOD.csv","Kohli")
batsman6s("./kohliOD.csv","Kohli")
batsmanScoringRateODTT("./kohliOD.csv","Kohli")
#dev.off()

### 24. Performance forecast in ODIs

Kohli’s forecasted runs are much higher than Tendulkar’s in ODIs.

par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkarOD.csv","Tendulkar")
batsmanPerfForecast("./kohliOD.csv","Kohli")

### 25. Batting performance

A 3D regression plane is fitted between Balls faced, Minutes at crease and Runs scored.

par(mar=c(4,4,2,2))
battingPerf3d("./tendulkarOD.csv","Tendulkar")
battingPerf3d("./kohliOD.csv","Kohli")

### 26. Predicting runs scored for the ODI batsmen

Kohli will score more runs than Tendulkar for the same minutes at crease and balls faced.

BF <- seq( 10, 200,length=10)
Mins <- seq(30,220,length=10)
newDF <- data.frame(BF,Mins)
tendulkarDF <- batsmanRunsPredict("./tendulkarOD.csv","Tendulkar",newdataframe=newDF)
kohliDF <- batsmanRunsPredict("./kohliOD.csv","Kohli",newdataframe=newDF)
batsmen <-cbind(round(tendulkarDF$Runs),round(kohliDF$Runs))
colnames(batsmen) <- c("Tendulkar","Kohli")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Tendulkar Kohli
## 1          10           30         7     8
## 2          31           51        26    28
## 3          52           72        45    48
## 4          73           93        64    68
## 5          94          114        83    88
## 6         116          136       102   108
## 7         137          157       121   128
## 8         158          178       140   149
## 9         179          199       159   169
## 10        200          220       178   189

### 27. Runs likelihood for the ODI batsmen

Tendulkar has clusters around 13, 53 and 111 runs while Kohli has clusters around 13, 63 and 116.
So it is more likely that Kohli will tend to score higher.

par(mar=c(4,4,2,2))
batsmanRunsLikelihood("./tendulkarOD.csv","Tendulkar")
## Summary of Tendulkar 's runs scoring likelihood
## **************************************************
##
## There is a 18.09 % likelihood that Tendulkar will make 111 Runs in 118 balls over 172 Minutes
## There is a 28.39 % likelihood that Tendulkar will make 53 Runs in 63 balls over 95 Minutes
## There is a 53.52 % likelihood that Tendulkar will make 13 Runs in 18 balls over 27 Minutes
batsmanRunsLikelihood("./kohliOD.csv","Kohli")
## Summary of Kohli 's runs scoring likelihood
## **************************************************
##
## There is a 31.41 % likelihood that Kohli will make 63 Runs in 69 balls over 97 Minutes
## There is a 49.74 % likelihood that Kohli will make 13 Runs in 18 balls over 24 Minutes
## There is a 18.85 % likelihood that Kohli will make 116 Runs in 113 balls over 163 Minutes

### 28. Runs in different venues for the ODI batsmen

par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./tendulkarOD.csv","Tendulkar")
batsmanAvgRunsGround("./kohliOD.csv","Kohli")

### 28. Runs against different opposition for the ODI batsmen

Tendulkar has a 50+ average against Bermuda, Kenya and Namibia, while Kohli has a 50+ average against New Zealand, West Indies, South Africa, Zimbabwe and Bangladesh.

par(mar=c(4,4,2,2))
batsmanAvgRunsOpposition("./tendulkarOD.csv","Tendulkar")
batsmanAvgRunsOpposition("./kohliOD.csv","Kohli")

### 29. Moving average of runs for the ODI batsmen

Tendulkar’s moving average shows an improvement (50+) towards the end of his career, but Kohli shows a marked increase to 60+ currently.

par(mar=c(4,4,2,2))
batsmanMovingAverage("./tendulkarOD.csv","Tendulkar")
batsmanMovingAverage("./kohliOD.csv","Kohli")

### 30. Cumulative average runs of ODI batsmen

Tendulkar plateaus at 40+ while Kohli’s cumulative average runs goes up and up!!!

par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("./tendulkarOD.csv","Tendulkar")
batsmanCumulativeAverageRuns("./kohliOD.csv","Kohli")

### 31 Cumulative strike rate of ODI batsmen

par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("./tendulkarOD.csv","Tendulkar")
batsmanCumulativeStrikeRate("./kohliOD.csv","Kohli")

### 32. Relative batsmen strike rate

par(mar=c(4,4,2,2))
frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeBatsmanSRODTT(frames,names)
#dev.off()

### 33. Relative Run Frequency percentages

par(mar=c(4,4,2,2))
frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeRunsFreqPerfODTT(frames,names)
#dev.off()

### 34. Relative cumulative average runs of ODI batsmen

Kohli breaks away from Tendulkar in cumulative average runs after 100 innings.

par(mar=c(4,4,2,2))
frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeBatsmanCumulativeAvgRuns(frames,names)
#dev.off()

### 35. Relative cumulative strike rate of ODI batsmen

This seems to be a tussle, with Kohli having an edge till about 40 innings and then Tendulkar leading from 40+ to 180 innings. Kohli just seems to be edging forward.

par(mar=c(4,4,2,2))
frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeBatsmanCumulativeStrikeRate(frames,names)
#dev.off()

### 36. Batsmen 4s and 6s

par(mar=c(4,4,2,2))
frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
batsman4s6s(frames,names)
##                Tendulkar Kohli
## Runs(1s,2s,3s)     66.29 69.67
## 4s                 29.65 25.90
## 6s                  4.06  4.43
#dev.off()

### 37. Check ODI batsmen form

par(mar=c(4,4,2,2))
checkBatsmanInForm("./tendulkar.csv","Tendulkar")
## [1] "**************************** Form status of Tendulkar ****************************\n\n Population size: 294 Mean of population: 50.48 \n Sample size: 33 Mean of sample: 32.42 SD of sample: 29.8 \n\n Null hypothesis H0 : Tendulkar 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Tendulkar 's sample average is below the 95% confidence interval of population average\n\n Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713 is less than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("./kohli.csv","Kohli")
## [1] "**************************** Form status of Kohli ****************************\n\n Population size: 117 Mean of population: 50.35 \n Sample size: 13 Mean of sample: 53.77 SD of sample: 46.15 \n\n Null hypothesis H0 : Kohli 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Kohli 's sample average is below the 95% confidence interval of population average\n\n Kohli 's Form Status: In-Form because the p value: 0.603244 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
#dev.off()

## Key Findings

1. Kohli has a better performance against oppositions like West Indies, South Africa and New Zealand
2. Kohli breaks away from Tendulkar in cumulative average runs
3. Tendulkar has been leading on strike rate, but Kohli in recent times seems to be breaking loose

Check out some other players with my R package cricketr

Important note: Do check out my other posts using cricketr at cricketr-posts

To see all posts click Index of posts

# Pitching yorkpy … in the block hole – Part 4

A good programmer is someone who always looks both ways before crossing a one-way street.

            Doug Linder

There are two ways to write error-free programs; only the third one works.

            Alan J. Perlis

In order to understand recursion, one must first understand recursion.

            Anonymous

This is the fourth and final part of my Python package yorkpy. In this part yorkpy, the Python avatar of my R package yorkr (see Introducing cricket package yorkr: Part 1- Beaten by sheer pace!), develops wings and is prepared for take-off. The yorkpy package uses data from Cricsheet.

You can clone/download the code at Github yorkpy

This post has been published to RPubs at yorkpy-Part4

You can download this post as PDF at IPLT20-yorkpy-part4

You can download all the data used in this post and the previous post at yorkpyData

This post is a continuation of the earlier posts on yorkpy

1. Pitching yorkpy . short of good length to IPL – Part 1 In this part I included functions that convert the yaml data of IPL matches into Pandas dataframes, which are then saved as CSV. This part can perform analysis of individual IPL matches. Note: The converted data is available at yorkpyData
2. Pitching yorkpy.on the middle and outside off-stump to IPL – Part 2 This part included functions to create a large data frame for head-to-head confrontations between any 2 IPL teams, say CSK-MI, DD-KKR etc., which can be saved as CSV. Analysis is then performed on these team-2-team confrontations. Note: The converted data is available at yorkpyData
3. Pitching yorkpy.swinging away from the leg stump to IPL – Part 3 The 3rd part includes the performance of any IPL team against all other IPL teams. The data can also be saved as CSV. Note: The converted data is available at yorkpyData

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton yorkpy-template from Github (which is the R Markdown file I have used for the analysis below).

This 4th and final part includes analysis of batting and bowling performances of any IPL player.
The batting and bowling details for all teams have already been converted and are available at IPLT20-Batting-BowlingDetails

This part includes the following new functions

#### Batsman functions

1. batsmanRunsVsDeliveries
2. batsmanFoursSixes
3. batsmanDismissals
4. batsmanRunsVsStrikeRate
5. batsmanMovingAverage
6. batsmanCumulativeAverageRuns
7. batsmanCumulativeStrikeRate
8. batsmanRunsAgainstOpposition
9. batsmanRunsVenue

#### Bowler functions

1. bowlerMeanEconomyRate
2. bowlerMeanRunsConceded
3. bowlerMovingAverage
4. bowlerCumulativeAvgWickets
5. bowlerCumulativeAvgEconRate
6. bowlerWicketPlot
7. bowlerWicketsAgainstOpposition
8. bowlerWicketsVenue

## A. Batsman functions

### 1. Get IPL Team Batting details

The function below gets the overall IPL team batting details based on the CSV files that were saved for IPL T20 matches. This is currently also available in Github at yorkpyData. The batting details of the IPL team in each match are computed, and a huge data frame is created by combining the batting details from each match. This can be saved as a CSV file, for e.g. Delhi Daredevils-BattingDetails.csv.

dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
#csk_details = yka.getTeamBattingDetails("Chennai Super Kings",dir=dir1, save=True)
#dd_details = yka.getTeamBattingDetails("Delhi Daredevils",dir=dir1,save=True)
#kkr_details = yka.getTeamBattingDetails("Kolkata Knight Riders",dir=dir1,save=True)

### 2. Get IPL batsman details

This function is used to get the individual IPL T20 batting record for the specified batsman of the team, as in the functions below. For the batsman functions below I have chosen Rishabh Pant, Kane Williamson and Ambati Rayudu as they top the batting lists. You can choose any IPL batsman for the analysis.

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
rpant=yka.getBatsmanDetails(team,name,dir=dir1)

### 3 Batsman Runs vs Deliveries (in IPL matches)

This function plots the runs vs deliveries faced for a batsman.

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsDeliveries(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsDeliveries(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsDeliveries(df,name)

### 4. Batsman fours and sixes (in IPL matches)

This plots the fours, sixes and the total runs for a batsman.

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanFoursSixes(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanFoursSixes(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanFoursSixes(df,name)

### 5. Batsman dismissals (in IPL matches)

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanDismissals(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanDismissals(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanDismissals(df,name)

### 6. Batsman Runs vs Strike Rate (in IPL matches)

The plots below give the Runs vs Strike rate for the batsmen.

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsStrikeRate(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsStrikeRate(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsStrikeRate(df,name)

### 7. Batsman Moving average of runs (in IPL matches)

The functions below compute and plot the moving average of runs for the batsmen.

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanMovingAverage(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanMovingAverage(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanMovingAverage(df,name)

### 8. Batsman Cumulative average of runs (in IPL matches)

The functions below plot the cumulative average of the batsmen.

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeAverageRuns(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeAverageRuns(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeAverageRuns(df,name)

### 9. Batsman Cumulative Strike Rate (in IPL matches)

The functions below plot the cumulative strike rate of the batsmen.

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)

### 10. Batsman performance against opposition (in IPL matches)

The plots below show how the batsmen performed against other IPL teams.

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)

### 11. Batsman performance at different venues (in IPL matches)

The plots below show how the batsmen performed at different venues.

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVenue(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVenue(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVenue(df,name)

## B. Bowler functions

### 12. Get bowling details in IPL matches

The function below gets the overall team IPL T20 bowling details based on the CSV files saved for IPL T20 matches. This is currently also available in Github at yorkpyData. The IPL T20 bowling details of the IPL team in each match are computed, and a huge data frame is created by stacking the individual dataframes. This can be saved as a CSV file, for e.g. Chennai Super Kings-BowlingDetails.csv

dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
#kkr_bowling = yka.getTeamBowlingDetails("Kolkata Knight Riders",dir=dir1,save=True)
#csk_bowling = yka.getTeamBowlingDetails("Chennai Super Kings",dir=dir1,save=True)
#kxip_bowling = yka.getTeamBowlingDetails("Kings XI Punjab",dir=dir1,save=True)

### 13. Get bowling details of the individual IPL bowlers

This function is used to get the individual bowling record for a specified bowler of the team, as in the functions below. The plots below deal with the bowlers’ performances. For this analysis I have chosen Amit Mishra, Piyush Chawla and Bhuvneshwar Kumar. You can choose any other IPL bowler.

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
#df=yka.getBowlerWicketDetails(team,name,dir=dir1)

### 14. Bowler Economy Rate (in IPL matches)

The plots below show the economy rate of the selected bowlers.

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanEconomyRate(df,name)

# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanEconomyRate(df,name)

#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanEconomyRate(df,name)

### 15. Bowler Mean Runs conceded (in IPL matches)

The plots below show the mean runs conceded by the selected bowlers.

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanRunsConceded(df,name)

# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanRunsConceded(df,name)

#3. 
Bhuvneshwar Kumar name="B Kumar" team='Sunrisers Hyderabad' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerMeanRunsConceded(df,name) ### 16. Moving average of wickets for bowler (in IPL matches) The moving average of the bowlers are plotted below import pandas as pd import os import yorkpy.analytics as yka dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3" # 1. Amit Mishra name="A Mishra" team='Delhi Daredevils' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerMovingAverage(df,name) # 2. Piyush Chawla dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3" name="PP Chawla" team='Kolkata Knight Riders' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerMovingAverage(df,name) #3. Bhuvneshwar Kumar name="B Kumar" team='Sunrisers Hyderabad' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerMovingAverage(df,name) ### 17. Cumulative average wickets for bowler (in IPL matches) The cumulative average wickets for each bowler is computed and plotted import pandas as pd import os import yorkpy.analytics as yka dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3" # 1. Amit Mishra name="A Mishra" team='Delhi Daredevils' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerCumulativeAvgWickets(df,name) # 2. Piyush Chawla dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3" name="PP Chawla" team='Kolkata Knight Riders' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerCumulativeAvgWickets(df,name) #3. Bhuvneshwar Kumar name="B Kumar" team='Sunrisers Hyderabad' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerCumulativeAvgWickets(df,name) ### 18. Cumulative average economy rate for bowler (in IPL matches) The plots below give the cumulative average economy rate for each bowler import pandas as pd import os import yorkpy.analytics as yka dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3" # 1. 
Amit Mishra name="A Mishra" team='Delhi Daredevils' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerCumulativeAvgEconRate(df,name) # 2. Piyush Chawla dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3" name="PP Chawla" team='Kolkata Knight Riders' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerCumulativeAvgEconRate(df,name) #3. Bhuvneshwar Kumar name="B Kumar" team='Sunrisers Hyderabad' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerCumulativeAvgEconRate(df,name) ### 19. Bowler wicket plot (in IPL matches) The plots below give the over vs wickets for bowlers import pandas as pd import os import yorkpy.analytics as yka dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3" # 1. Amit Mishra name="A Mishra" team='Delhi Daredevils' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerWicketPlot(df,name) # 2. Piyush Chawla dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3" name="PP Chawla" team='Kolkata Knight Riders' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerWicketPlot(df,name) #3. Bhuvneshwar Kumar name="B Kumar" team='Sunrisers Hyderabad' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerWicketPlot(df,name) ### 20. Bowler wicket against opposition (in IPL matches) The performance of the bowlers against different IPL teams is shown below import pandas as pd import os import yorkpy.analytics as yka dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3" # 1. Amit Mishra name="A Mishra" team='Delhi Daredevils' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerWicketsAgainstOpposition(df,name) # 2. Piyush Chawla dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3" name="PP Chawla" team='Kolkata Knight Riders' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerWicketsAgainstOpposition(df,name) #3. 
Bhuvneshwar Kumar name="B Kumar" team='Sunrisers Hyderabad' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerWicketsAgainstOpposition(df,name) ### 21. Bowler wicket in different venues (in IPL matches) The plots below show how the bowlers perform at different venues import pandas as pd import os import yorkpy.analytics as yka dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3" # 1. Amit Mishra name="A Mishra" team='Delhi Daredevils' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerWicketsVenue(df,name) # 2. Piyush Chawla dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3" name="PP Chawla" team='Kolkata Knight Riders' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerWicketsVenue(df,name) #3. Bhuvneshwar Kumar name="B Kumar" team='Sunrisers Hyderabad' df=yka.getBowlerWicketDetails(team,name,dir=dir1) yka.bowlerWicketsVenue(df,name) Note:You can clone/download the code at Github yorkpy Important note: Do check out my other posts using yorkpy at yorkpy-posts Conclusion: This concludes the python package yorkpy. Go ahead and give yorkpy a spin! To see all posts click Index of posts # Pitching yorkpy…swinging away from the leg stump to IPL – Part 3 Clocks offer at best a convenient fiction They imply that time ticks steadily, predictably forward, when our experience shows that it often does the opposite: it stretches and compresses, skips a beat and doubles back.  David Eagleman  Memory is the space in which a thing happens for a second time  Paul Auster  ## Introduction In this 3rd post, yorkpy, the python avatar of my R package yorkr develops more muscle. The first two posts of yorkpy were 1. Pitching yorkpy . short of good length to IPL – Part 1 This post dealt with function which perform analytics on an IPL match between any 2 IPL teams 2. Pitching yorkpy…on the middle and outside off-stump to IPL – Part 2 The second post dealt with analytics on all matches between any 2 IPL teams. 
This third post deals with analyses and analytics of an IPL team in all matches against all other IPL teams. The data for yorkpy comes from Cricsheet. The data in Cricsheet are in the form of yaml files. These files have already been converted to dataframes and stored as CSV files, as seen in the earlier posts.

The signatures of yorkpy and yorkr are identical and will work in almost the same way. However there may be some unique functions in yorkr & yorkpy, based on what my thought process was on that day!

- You can clone/download the code at Github yorkpy
- This post has been published to RPubs at yorkpy-Part3
- Download this post as PDF at IPLT20-yorkpy-part3
- You can download all the data used in this post and the previous post at yorkpyData

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton yorkpy-template from Github (which is the R Markdown file I have used for the analysis below).

The IPL T20 functions in yorkpy are shown below

## 2. Get data for all T20 matches between an IPL team and all other IPL teams

We can get all IPL T20 matches between an IPL team and all other teams using the function below. The dir parameter should point to the folder which has the IPL T20 csv files of the individual matches (see Pitching yorkpy…short of good length to IPL – Part 1). This function creates a data frame of all the IPL T20 matches between the IPL team and all other teams and also saves the dataframe as a CSV file if save=True. If save=False the dataframe is just returned and not saved.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    #dir1= "C:\\software\\cricket-package\\yorkpyPkg\\yorkpyData\\IPLConverted"
    #getAllMatchesAllOpposition("Kolkata Knight Riders",dir=dir1,save=True)

## 3. Save data for all matches between an IPL team and all oppositions

This can be done locally using the function below.
You could use this function to combine all IPL matches of an IPL team against all other IPL teams.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    #dir1= "C:\\software\\cricket-package\\yorkpyPkg\\yorkpyData\\IPLConverted"
    #saveAllMatchesAllOppositionIPLT20(dir1)

Note: In the functions below, I have randomly chosen an IPL team for the analyses. You are free to choose any IPL team for your analysis.

### 4. Team Batsmen partnership in Twenty20 (all matches against all IPL teams – summary)

The function below computes the highest partnerships for an IPL team against all other IPL teams, e.g. the batsmen with the highest partnerships from Chennai Super Kings in all matches against all other IPL teams. Any other IPL team could have also been chosen.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Chennai Super Kings-allMatchesAllOpposition.csv")
    csk_matches = pd.read_csv(path)
    m=yka.teamBatsmenPartnershiAllOppnAllMatches(csk_matches,'Chennai Super Kings',report="summary")
    print(m)

    ## batsman totalPartnershipRuns
    ## 42 SK Raina 3699
    ## 28 MS Dhoni 2986
    ## 25 MEK Hussey 1768
    ## 24 M Vijay 1600
    ## 36 S Badrinath 1441

## 5. Team Batsmen partnership in Twenty20 (all matches against all IPL teams – detailed)

The function below gives the detailed breakup of partnerships for Mumbai Indians against all other IPL teams.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Mumbai Indians-allMatchesAllOpposition.csv")
    mi_matches = pd.read_csv(path)
    theTeam='Mumbai Indians'
    m=yka.teamBatsmenPartnershiAllOppnAllMatches(mi_matches,theTeam,report="detailed", top=3)
    print(m)

    ## batsman totalPartnershipRuns non_striker partnershipRuns
    ## 0 RG Sharma 3037.0 A Symonds 142.0
    ## 1 RG Sharma 3037.0 AC Blizzard 5.0
    ## 2 RG Sharma 3037.0 AJ Finch 2.0
    ## 3 RG Sharma 3037.0 AP Tare 32.0
    ## 4 RG Sharma 3037.0 AT Rayudu 566.0
    ## 5 RG Sharma 3037.0 BR Dunk 1.0
    ## 6 RG Sharma 3037.0 CJ Anderson 183.0
    ## 7 RG Sharma 3037.0 CM Gautam 22.0
    ## 8 RG Sharma 3037.0 DR Smith 50.0
    ## 9 RG Sharma 3037.0 GJ Maxwell 6.0
    ## 10 RG Sharma 3037.0 HH Gibbs 109.0
    ## 11 RG Sharma 3037.0 HH Pandya 105.0
    ## 12 RG Sharma 3037.0 Harbhajan Singh 86.0
    ## 13 RG Sharma 3037.0 JC Buttler 105.0
    ## 14 RG Sharma 3037.0 JEC Franklin 50.0
    ## 15 RG Sharma 3037.0 KA Pollard 633.0
    ## 16 RG Sharma 3037.0 KD Karthik 170.0
    ## 17 RG Sharma 3037.0 KH Pandya 34.0
    ## 18 RG Sharma 3037.0 KV Sharma 33.0
    ## 19 RG Sharma 3037.0 LMP Simmons 172.0
    ## 20 RG Sharma 3037.0 MEK Hussey 21.0
    ## 21 RG Sharma 3037.0 MJ Guptill 61.0
    ## 22 RG Sharma 3037.0 MJ McClenaghan 2.0
    ## 23 RG Sharma 3037.0 N Rana 25.0
    ## 24 RG Sharma 3037.0 PA Patel 103.0
    ## 25 RG Sharma 3037.0 RE Levi 25.0
    ## 26 RG Sharma 3037.0 SL Malinga 0.0
    ## 27 RG Sharma 3037.0 SR Tendulkar 208.0
    ## 28 RG Sharma 3037.0 SS Tiwary 27.0
    ## 29 RG Sharma 3037.0 TL Suman 7.0
    ## .. ... ... ... ...
    ## 70 KA Pollard 2344.0 CJ Anderson 82.0
    ## 71 KA Pollard 2344.0 CM Gautam 16.0
    ## 72 KA Pollard 2344.0 DR Smith 10.0
    ## 73 KA Pollard 2344.0 DS Kulkarni 15.0
    ## 74 KA Pollard 2344.0 HH Pandya 158.0
    ## 75 KA Pollard 2344.0 Harbhajan Singh 158.0
    ## 76 KA Pollard 2344.0 J Suchith 26.0
    ## 77 KA Pollard 2344.0 JC Buttler 37.0
    ## 78 KA Pollard 2344.0 JEC Franklin 38.0
    ## 79 KA Pollard 2344.0 JP Duminy 63.0
    ## 80 KA Pollard 2344.0 KD Karthik 40.0
    ## 81 KA Pollard 2344.0 KH Pandya 111.0
    ## 82 KA Pollard 2344.0 KV Sharma 13.0
    ## 83 KA Pollard 2344.0 LMP Simmons 77.0
    ## 84 KA Pollard 2344.0 MEK Hussey 10.0
    ## 85 KA Pollard 2344.0 MG Johnson 1.0
    ## 86 KA Pollard 2344.0 N Rana 60.0
    ## 87 KA Pollard 2344.0 PA Patel 18.0
    ## 88 KA Pollard 2344.0 PP Ojha 12.0
    ## 89 KA Pollard 2344.0 R Dhawan 25.0
    ## 90 KA Pollard 2344.0 R McLaren 20.0
    ## 91 KA Pollard 2344.0 R Sathish 27.0
    ## 92 KA Pollard 2344.0 RG Sharma 587.0
    ## 93 KA Pollard 2344.0 RJ Peterson 0.0
    ## 94 KA Pollard 2344.0 S Dhawan 20.0
    ## 95 KA Pollard 2344.0 SL Malinga 14.0
    ## 96 KA Pollard 2344.0 SR Tendulkar 69.0
    ## 97 KA Pollard 2344.0 SS Tiwary 42.0
    ## 98 KA Pollard 2344.0 TL Suman 2.0
    ## 99 KA Pollard 2344.0 Z Khan 1.0
    ##
    ## [100 rows x 4 columns]

## 6. Team Batsmen partnership in Twenty20 – Chart (all matches against all IPL teams)

The function below plots the partnerships of an IPL team against all other IPL teams.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Delhi Daredevils-allMatchesAllOpposition.csv")
    dd_matches = pd.read_csv(path)
    yka.teamBatsmenPartnershipAllOppnAllMatchesChart(dd_matches,'Delhi Daredevils', plot=True, top=4, partnershipRuns=100)

## 7. Team Batsmen partnership in Twenty20 – Dataframe (all matches against all IPL teams)

This function does not plot the data but returns the dataframe to the user to plot or manipulate.

Note: Many of the plots include additional parameters, e.g.
plot, which is either True or False. The default value is plot=True. When plot=True the plot will be displayed. When plot=False the data frame will be returned to the user. The user can use this to create interactive charts. The parameter top= specifies the number of top batsmen that need to be included in the chart, and partnershipRuns gives the minimum cutoff runs in partnerships to be considered.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Kochi Tuskers Kerala-allMatchesAllOpposition.csv")
    ktk_matches = pd.read_csv(path)
    m=yka.teamBatsmenPartnershipAllOppnAllMatchesChart(ktk_matches,'Kochi Tuskers Kerala', plot=False, top=3, partnershipRuns=100)
    print(m)

    ## batsman non_striker partnershipRuns
    ## 0 BB McCullum BJ Hodge 17.0
    ## 1 BB McCullum DPMD Jayawardene 160.0
    ## 2 BB McCullum M Klinger 67.0
    ## 3 BB McCullum PA Patel 40.0
    ## 4 BB McCullum RA Jadeja 19.0
    ## 5 BB McCullum VVS Laxman 41.0
    ## 6 BB McCullum Y Gnaneswara Rao 13.0
    ## 7 DPMD Jayawardene BB McCullum 152.0
    ## 8 DPMD Jayawardene BJ Hodge 41.0
    ## 9 DPMD Jayawardene KM Jadhav 4.0
    ## 10 DPMD Jayawardene M Klinger 28.0
    ## 11 DPMD Jayawardene OA Shah 9.0
    ## 12 DPMD Jayawardene PA Patel 25.0
    ## 13 DPMD Jayawardene RA Jadeja 18.0
    ## 14 DPMD Jayawardene RV Gomez 10.0
    ## 15 DPMD Jayawardene VVS Laxman 12.0
    ## 16 BJ Hodge BB McCullum 18.0
    ## 17 BJ Hodge DPMD Jayawardene 47.0
    ## 18 BJ Hodge KM Jadhav 2.0
    ## 19 BJ Hodge OA Shah 19.0
    ## 20 BJ Hodge PA Patel 79.0
    ## 21 BJ Hodge RA Jadeja 99.0
    ## 22 BJ Hodge RV Gomez 21.0

## 8. Team batsmen versus bowler in Twenty20 – Chart (all matches against all IPL teams)

The plots below provide information on how each of the top batsmen of the IPL team fared against the opposition bowlers of all other IPL teams.
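One way to picture what a batsman-versus-bowler summary contains is the short, plain-Python sketch below. It is not yorkpy's code: the helper name `runs_by_batsman_and_bowler`, the `(batsman, bowler, runs)` tuple layout and the reading of `top` as "keep the batsmen with the most total runs" are assumptions made purely for illustration.

```python
from collections import Counter

def runs_by_batsman_and_bowler(deliveries, top=3):
    """Sum runs per (batsman, bowler) pair for the `top` highest-scoring batsmen.

    Hypothetical helper for illustration only; not part of yorkpy.
    `deliveries` is an iterable of (batsman, bowler, runs) tuples.
    """
    pair_runs = Counter()      # runs scored by a batsman off a given bowler
    batsman_runs = Counter()   # total runs per batsman, used to pick the top N
    for batsman, bowler, runs in deliveries:
        pair_runs[(batsman, bowler)] += runs
        batsman_runs[batsman] += runs

    top_batsmen = [name for name, _ in batsman_runs.most_common(top)]
    rows = [
        (batsman, bowler, runs)
        for (batsman, bowler), runs in pair_runs.items()
        if batsman in top_batsmen
    ]
    # Order by batsman rank first, then by bowler name
    rows.sort(key=lambda row: (top_batsmen.index(row[0]), row[1]))
    return rows

# Made-up deliveries, not taken from the IPL data
sample = [
    ("X", "P", 4), ("X", "Q", 1), ("Y", "P", 6),
    ("X", "P", 2), ("Z", "Q", 1), ("Y", "Q", 0),
]
for row in runs_by_batsman_and_bowler(sample, top=2):
    print(row)
```

A long-format result like this (one row per batsman and bowler) is convenient because it can be filtered or pivoted before plotting.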
    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Royal Challengers Bangalore-allMatchesAllOpposition.csv")
    rcb_matches = pd.read_csv(path)
    yka.teamBatsmenVsBowlersAllOppnAllMatches(rcb_matches,"Royal Challengers Bangalore",plot=True,top=3,runsScored=60)

## 9. Team batsmen versus bowler in Twenty20 – Dataframe (all matches against all IPL teams)

This function provides the batting performance of an IPL team against all other IPL teams.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Kings XI Punjab-allMatchesAllOpposition.csv")
    kxip_matches = pd.read_csv(path)
    m=yka.teamBatsmenVsBowlersAllOppnAllMatches(kxip_matches,'Kings XI Punjab',plot=False,top=2,runsScored=50)
    print(m)

    ## batsman bowler runsScored
    ## 0 SE Marsh A Chandila 20.0
    ## 1 SE Marsh A Choudhary 1.0
    ## 2 SE Marsh A Kumble 37.0
    ## 3 SE Marsh A Mishra 0.0
    ## 4 SE Marsh A Mithun 9.0
    ## 5 SE Marsh A Nehra 33.0
    ## 6 SE Marsh A Singh 2.0
    ## 7 SE Marsh A Symonds 5.0
    ## 8 SE Marsh AA Chavan 19.0
    ## 9 SE Marsh AA Jhunjhunwala 15.0
    ## 10 SE Marsh AB Agarkar 27.0
    ## 11 SE Marsh AB Dinda 31.0
    ## 12 SE Marsh AB McDonald 9.0
    ## 13 SE Marsh AC Thomas 1.0
    ## 14 SE Marsh AD Mathews 7.0
    ## 15 SE Marsh AD Russell 8.0
    ## 16 SE Marsh AJ Tye 0.0
    ## 17 SE Marsh AL Menaria 6.0
    ## 18 SE Marsh AM Salvi 8.0
    ## 19 SE Marsh AN Ahmed 16.0
    ## 20 SE Marsh AS Raut 7.0
    ## 21 SE Marsh Ankit Sharma 2.0
    ## 22 SE Marsh Ankit Soni 11.0
    ## 23 SE Marsh B Kumar 10.0
    ## 24 SE Marsh B Lee 1.0
    ## 25 SE Marsh BAW Mendis 11.0
    ## 26 SE Marsh BB Sran 3.0
    ## 27 SE Marsh BJ Hodge 18.0
    ## 28 SE Marsh Basil Thampi 17.0
    ## 29 SE Marsh C de Grandhomme 8.0
    ## .. ... ... ...
    ## 235 DA Miller R Sharma 7.0
    ## 236 DA Miller R Tewatia 3.0
    ## 237 DA Miller R Vinay Kumar 30.0
    ## 238 DA Miller RA Jadeja 84.0
    ## 239 DA Miller RD Chahar 3.0
    ## 240 DA Miller RE van der Merwe 5.0
    ## 241 DA Miller RN ten Doeschate 1.0
    ## 242 DA Miller RP Singh 35.0
    ## 243 DA Miller Rashid Khan 0.0
    ## 244 DA Miller S Aravind 7.0
    ## 245 DA Miller S Kaul 23.0
    ## 246 DA Miller S Kaushik 8.0
    ## 247 DA Miller S Ladda 6.0
    ## 248 DA Miller S Nadeem 11.0
    ## 249 DA Miller SK Raina 2.0
    ## 250 DA Miller SL Malinga 9.0
    ## 251 DA Miller SMSM Senanayake 6.0
    ## 252 DA Miller SP Narine 10.0
    ## 253 DA Miller SR Watson 16.0
    ## 254 DA Miller STR Binny 14.0
    ## 255 DA Miller Shakib Al Hasan 3.0
    ## 256 DA Miller TA Boult 20.0
    ## 257 DA Miller TG Southee 11.0
    ## 258 DA Miller UT Yadav 51.0
    ## 259 DA Miller VR Aaron 19.0
    ## 260 DA Miller VS Malik 3.0
    ## 261 DA Miller YK Pathan 0.0
    ## 262 DA Miller YS Chahal 35.0
    ## 263 DA Miller Yuvraj Singh 11.0
    ## 264 DA Miller Z Khan 2.0
    ##
    ## [265 rows x 3 columns]

## 10. Team batting scorecard (all matches against all IPL teams)

This function provides the overall scorecard for an IPL team in all matches against all other IPL teams.
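The columns of such a scorecard can be reproduced with a few lines of plain Python. The sketch below is only an illustration and not yorkpy's implementation: the function name `batting_scorecard` and the `(batsman, runs)` tuple layout, one tuple per ball faced, are assumptions made for this example. It totals runs, balls, fours and sixes per batsman and derives the strike rate as runs per 100 balls, which agrees with the SR column of the scorecards in this post (3035 runs off 2533 balls gives about 119.82).

```python
from collections import defaultdict

def batting_scorecard(deliveries):
    """Aggregate (batsman, runs) tuples, one per ball faced, into a scorecard.

    Hypothetical helper for illustration only; not part of yorkpy.
    """
    totals = defaultdict(lambda: {"runs": 0, "balls": 0, "4s": 0, "6s": 0})
    for batsman, runs in deliveries:
        row = totals[batsman]
        row["runs"] += runs
        row["balls"] += 1
        if runs == 4:
            row["4s"] += 1
        elif runs == 6:
            row["6s"] += 1

    scorecard = []
    for batsman, row in totals.items():
        # Strike rate = runs scored per 100 balls faced
        strike_rate = 100.0 * row["runs"] / row["balls"]
        scorecard.append({"batsman": batsman, **row, "SR": strike_rate})

    # Highest run scorers first
    scorecard.sort(key=lambda r: r["runs"], reverse=True)
    return scorecard

# Made-up deliveries, not taken from the IPL data
sample = [("A", 4), ("A", 1), ("B", 6), ("A", 0), ("B", 0), ("A", 4)]
for row in batting_scorecard(sample):
    print(row)
```

The sketch counts every tuple as a ball faced; a real ball-by-ball feed would also need a rule for wides and other extras, which is left out here.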
The batting scorecard shows the top batsmen for Kolkata Knight Riders below.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Kolkata Knight Riders-allMatchesAllOpposition.csv")
    kkr_matches = pd.read_csv(path)
    scorecard=yka.teamBattingScorecardAllOppnAllMatches(kkr_matches,'Kolkata Knight Riders')
    print(scorecard)

    ## batsman runs balls 4s 6s SR
    ## 19 G Gambhir 3035.0 2533 352 46 119.818397
    ## 17 YK Pathan 1893.0 1421 150 86 133.216045
    ## 22 RV Uthappa 1806.0 1311 200 54 137.757437
    ## 16 JH Kallis 1295.0 1237 128 23 104.688763
    ## 23 MK Pandey 1270.0 1048 103 38 121.183206
    ## 0 SC Ganguly 1031.0 977 105 36 105.527124
    ## 12 MK Tiwary 1002.0 921 86 23 108.794788
    ## 1 BB McCullum 882.0 754 92 32 116.976127
    ## 25 SA Yadav 608.0 474 54 21 128.270042
    ## 15 MS Bisla 543.0 518 60 16 104.826255
    ## 26 AD Russell 516.0 308 45 34 167.532468
    ## 4 DJ Hussey 511.0 417 31 28 122.541966
    ## 24 Shakib Al Hasan 498.0 399 44 15 124.812030
    ## 10 BJ Hodge 476.0 430 47 10 110.697674
    ## 11 CH Gayle 463.0 350 45 26 132.285714
    ## 18 EJG Morgan 444.0 373 45 16 119.034853
    ## 54 CA Lynn 378.0 250 30 23 151.200000
    ## 6 LR Shukla 374.0 320 31 15 116.875000
    ## 29 RN ten Doeschate 326.0 238 26 15 136.974790
    ## 21 DB Das 304.0 267 23 16 113.857678
    ## 3 WP Saha 298.0 213 24 12 139.906103
    ## 28 SP Narine 271.0 193 36 12 140.414508
    ## 13 AD Mathews 249.0 211 20 8 118.009479
    ## 33 Salman Butt 193.0 172 30 2 112.209302
    ## 41 MN van Wyk 167.0 135 19 1 123.703704
    ## 7 AB Agarkar 160.0 137 12 5 116.788321
    ## 20 R Bhatia 159.0 134 15 3 118.656716
    ## 51 C de Grandhomme 126.0 92 10 6 136.956522
    ## 39 CA Pujara 122.0 119 14 3 102.521008
    ## 40 OA Shah 115.0 96 7 5 119.791667
    ## .. ... ... ... ... .. ...
    ## 50 JO Holder 22.0 20 2 1 110.000000
    ## 65 Kuldeep Yadav 20.0 22 2 0 90.909091
    ## 71 BJ Haddin 18.0 11 2 1 163.636364
    ## 70 NM Coulter-Nile 14.0 13 0 2 107.692308
    ## 47 L Balaji 13.0 12 1 0 108.333333
    ## 55 SMSM Senanayake 10.0 17 0 0 58.823529
    ## 53 M Morkel 9.0 8 0 0 112.500000
    ## 62 AN Ghosh 7.0 8 1 0 87.500000
    ## 32 GB Hogg 7.0 6 0 0 116.666667
    ## 56 MV Boucher 6.0 6 0 0 100.000000
    ## 77 Azhar Mahmood 6.0 8 1 0 75.000000
    ## 78 DM Bravo 6.0 5 1 0 120.000000
    ## 68 SS Shaikh 6.0 7 1 0 85.714286
    ## 66 TA Boult 5.0 8 0 0 62.500000
    ## 76 Mohammed Shami 5.0 10 0 0 50.000000
    ## 80 P Dogra 5.0 8 0 0 62.500000
    ## 69 R Vinay Kumar 4.0 7 0 0 57.142857
    ## 75 AS Rajpoot 4.0 7 1 0 57.142857
    ## 43 Mandeep Singh 4.0 11 1 0 36.363636
    ## 37 AB Dinda 4.0 8 0 0 50.000000
    ## 79 PJ Sangwan 4.0 2 1 0 200.000000
    ## 73 R McLaren 3.0 6 0 0 50.000000
    ## 67 SB Bangar 2.0 9 0 0 22.222222
    ## 57 RS Gavaskar 2.0 8 0 0 25.000000
    ## 72 Shoaib Akhtar 2.0 8 0 0 25.000000
    ## 38 Mashrafe Mortaza 2.0 2 0 0 100.000000
    ## 63 BAW Mendis 1.0 2 0 0 50.000000
    ## 58 SE Bond 1.0 2 0 0 50.000000
    ## 44 CK Langeveldt 0.0 1 0 0 0.000000
    ## 30 PJ Cummins 0.0 2 0 0 0.000000
    ##
    ## [81 rows x 6 columns]

## 10a. Team batting scorecard (all matches against all IPL teams)

The output below shows the batting scorecard of Chennai Super Kings against all other IPL teams.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Chennai Super Kings-allMatchesAllOpposition.csv")
    csk_matches = pd.read_csv(path)
    scorecard=yka.teamBattingScorecardAllOppnAllMatches(csk_matches,'Chennai Super Kings')
    print(scorecard)

    ## batsman runs balls 4s 6s SR
    ## 3 SK Raina 3699 2735 322 150 135.246801
    ## 5 MS Dhoni 2986 2199 218 126 135.788995
    ## 17 MEK Hussey 1768 1461 181 45 121.013005
    ## 11 M Vijay 1600 1289 141 66 124.127230
    ## 4 S Badrinath 1441 1245 154 28 115.742972
    ## 9 ML Hayden 1107 838 121 44 132.100239
    ## 18 F du Plessis 1081 867 92 29 124.682814
    ## 25 DR Smith 965 766 102 50 125.979112
    ## 26 BB McCullum 841 634 83 42 132.649842
    ## 6 JA Morkel 827 591 51 48 139.932318
    ## 20 DJ Bravo 706 543 54 30 130.018416
    ## 19 RA Jadeja 670 533 46 23 125.703565
    ## 0 PA Patel 516 529 67 7 97.542533
    ## 2 SP Fleming 196 171 27 3 114.619883
    ## 13 R Ashwin 190 208 19 1 91.346154
    ## 21 S Vidyut 145 115 21 3 126.086957
    ## 31 WP Saha 144 138 8 8 104.347826
    ## 1 S Anirudha 133 116 9 7 114.655172
    ## 33 DJ Hussey 116 96 8 6 120.833333
    ## 38 P Negi 116 77 10 5 150.649351
    ## 10 JDP Oram 106 107 6 5 99.065421
    ## 29 GJ Bailey 63 67 9 0 94.029851
    ## 22 A Flintoff 62 57 5 2 108.771930
    ## 8 MS Gony 50 39 2 5 128.205128
    ## 7 Joginder Sharma 36 30 1 2 120.000000
    ## 27 M Manhas 35 26 3 1 134.615385
    ## 28 MM Sharma 29 26 1 2 111.538462
    ## 23 SB Jakati 27 28 3 0 96.428571
    ## 12 JM Kemp 26 25 1 1 104.000000
    ## 14 L Balaji 22 35 1 1 62.857143
    ## 24 DE Bollinger 21 23 1 1 91.304348
    ## 41 CK Kapugedera 16 24 0 0 66.666667
    ## 37 CH Morris 14 17 0 0 82.352941
    ## 30 T Thushara 12 19 0 0 63.157895
    ## 42 M Ntini 11 19 2 0 57.894737
    ## 15 M Muralitharan 9 13 1 0 69.230769
    ## 32 KMDN Kulasekara 5 3 1 0 166.666667
    ## 34 SB Styris 5 2 1 0 250.000000
    ## 35 B Laughlin 4 9 0 0 44.444444
    ## 16 S Tyagi 3 4 0 0 75.000000
    ## 45 KB Arun Karthik 3 5 0 0 60.000000
    ## 36 AS Rajpoot 2 6 0 0 33.333333
    ## 43 RG More 2 2 0 0 100.000000
    ## 44 S Randiv 2 4 0 0 50.000000
    ## 39 A Nehra 1 7 0 0 14.285714
    ## 40 A Mukund 0 1 0 0 0.000000

## 11. Team Bowling scorecard (all matches against all IPL teams)

The output below gives the bowling performance of an IPL team against all other IPL teams.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Sunrisers Hyderabad-allMatchesAllOpposition.csv")
    srh_matches = pd.read_csv(path)
    scorecard=yka.teamBowlingScorecardAllOppnAllMatches(srh_matches,'Sunrisers Hyderabad')

    ## C:\Users\Ganesh\ANACON~1\lib\site-packages\yorkpy\analytics.py:564: SettingWithCopyWarning:
    ## A value is trying to be set on a copy of a slice from a DataFrame.
    ## Try using .loc[row_indexer,col_indexer] = value instead
    ##
    ## See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
    ## df1['over']=df1.delivery.astype(int)
    ## C:\Users\Ganesh\ANACON~1\lib\site-packages\yorkpy\analytics.py:567: SettingWithCopyWarning:
    ## A value is trying to be set on a copy of a slice from a DataFrame.
    ## Try using .loc[row_indexer,col_indexer] = value instead
    ##
    ## See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
    ## df1['runsConceded']=df1['runs'] + df1['wides'] + df1['noballs']

    print(scorecard)

    ## bowler overs runs maidens wicket econrate
    ## 60 JP Faulkner 28 192 0 15 6.857143
    ## 83 MM Sharma 37 334 0 13 9.027027
    ## 119 SL Malinga 31 215 0 13 6.935484
    ## 123 SR Watson 30 281 0 13 9.366667
    ## 90 NM Coulter-Nile 24 166 0 12 6.916667
    ## 31 DJ Bravo 26 184 0 12 7.076923
    ## 135 UT Yadav 37 297 0 12 8.027027
    ## 125 Sandeep Sharma 32 280 0 11 8.750000
    ## 75 M Morkel 25 195 0 9 7.800000
    ## 81 MJ McClenaghan 24 175 0 9 7.291667
    ## 5 AB Dinda 23 165 0 9 7.173913
    ## 55 JD Unadkat 20 167 0 8 8.350000
    ## 36 DS Kulkarni 28 200 0 8 7.142857
    ## 25 CH Morris 24 190 0 7 7.916667
    ## 101 R Bhatia 18 128 0 7 7.111111
    ## 70 Kuldeep Yadav 16 129 0 7 8.062500
    ## 11 AR Patel 27 208 0 7 7.703704
    ## 122 SP Narine 43 282 0 7 6.558140
    ## 141 YS Chahal 26 224 0 6 8.615385
    ## 44 Harbhajan Singh 39 264 0 6 6.769231
    ## 96 PP Chawla 21 140 0 6 6.666667
    ## 4 A Zampa 4 19 0 6 4.750000
    ## 126 Shakib Al Hasan 14 99 1 6 7.071429
    ## 80 MG Johnson 20 155 0 6 7.750000
    ## 59 JP Duminy 10 80 0 5 8.000000
    ## 58 JO Holder 15 113 0 5 7.533333
    ## 92 P Kumar 23 173 0 5 7.521739
    ## 100 R Ashwin 28 142 0 5 5.071429
    ## 2 A Mishra 18 144 0 4 8.000000
    ## 106 R Vinay Kumar 19 154 0 4 8.105263
    ## .. ... ... ... ... ... ...
    ## 6 AD Mascarenhas 4 25 0 0 6.250000
    ## 13 Ankit Soni 2 31 0 0 15.500000
    ## 132 TM Head 1 11 0 0 11.000000
    ## 10 AN Ahmed 6 63 0 0 10.500000
    ## 131 TM Dilshan 1 10 0 0 10.000000
    ## 134 Tejas Baroka 3 33 0 0 11.000000
    ## 73 M Ashwin 1 6 0 0 6.000000
    ## 109 RG Sharma 1 5 0 0 5.000000
    ## 22 Basil Thampi 2 21 0 0 10.500000
    ## 23 C Munro 1 8 0 0 8.000000
    ## 68 KV Sharma 2 19 0 0 9.500000
    ## 77 M Vijay 4 24 0 0 6.000000
    ## 66 KJ Abbott 3 34 0 0 11.333333
    ## 65 KH Pandya 2 17 0 0 8.500000
    ## 82 MM Patel 3 22 0 0 7.333333
    ## 62 K Rabada 4 59 0 0 14.750000
    ## 85 MP Stoinis 3 28 0 0 9.333333
    ## 54 JA Morkel 3 35 0 0 11.666667
    ## 46 I Sharma 8 64 0 0 8.000000
    ## 94 PJ Cummins 4 37 0 0 9.250000
    ## 95 PJ Sangwan 8 82 0 0 10.250000
    ## 103 R Sathish 1 9 0 0 9.000000
    ## 38 DW Steyn 2 17 0 0 8.500000
    ## 108 RG More 2 28 0 0 14.000000
    ## 34 DJG Sammy 2 18 0 0 9.000000
    ## 33 DJ Muthuswami 2 20 0 0 10.000000
    ## 32 DJ Hooda 5 45 0 0 9.000000
    ## 24 CH Gayle 3 24 0 0 8.000000
    ## 116 SA Abbott 2 21 0 0 10.500000
    ## 72 LR Shukla 2 28 0 0 14.000000
    ##
    ## [144 rows x 6 columns]

## 11a. Team Bowling scorecard (all matches against all IPL teams)

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Rajasthan Royals-allMatchesAllOpposition.csv")
    rr_matches = pd.read_csv(path)
    scorecard=yka.teamBowlingScorecardAllOppnAllMatches(rr_matches,'Rajasthan Royals')
    print(scorecard)

    ## bowler overs runs maidens wicket econrate
    ## 2 A Mishra 63 426 0 29 6.761905
    ## 66 JA Morkel 38 301 0 16 7.921053
    ## 129 R Vinay Kumar 48 406 0 15 8.458333
    ## 135 RP Singh 41 255 0 14 6.219512
    ## 95 MF Maharoof 23 139 0 14 6.043478
    ## 118 PP Chawla 45 353 0 14 7.844444
    ## 130 RA Jadeja 32 227 0 14 7.093750
    ## 50 DW Steyn 43 232 0 13 5.395349
    ## 56 Harbhajan Singh 45 341 0 12 7.577778
    ## 1 A Kumble 21 108 1 12 5.142857
    ## 159 SL Malinga 49 363 0 12 7.408163
    ## 60 IK Pathan 37 279 0 11 7.540541
    ## 82 KA Pollard 21 201 0 11 9.571429
    ## 119 PP Ojha 46 426 0 11 9.260870
    ## 121 R Ashwin 29 222 0 11 7.655172
    ## 22 B Kumar 31 233 0 11 7.516129
    ## 3 A Nehra 32 214 0 11 6.687500
    ## 41 DJ Bravo 30 292 0 10 9.733333
    ## 110 P Kumar 48 329 1 10 6.854167
    ## 58 I Sharma 37 284 0 9 7.675676
    ## 168 Shakib Al Hasan 25 153 0 9 6.120000
    ## 87 L Balaji 33 277 0 9 8.393939
    ## 122 R Bhatia 19 121 0 8 6.368421
    ## 48 DS Kulkarni 21 148 0 8 7.047619
    ## 101 MM Sharma 20 142 0 8 7.100000
    ## 174 UT Yadav 25 203 0 8 8.120000
    ## 15 AR Patel 16 110 0 7 6.875000
    ## 133 RJ Harris 16 132 0 7 8.250000
    ## 72 JH Kallis 37 254 0 7 6.864865
    ## 192 Z Khan 33 213 0 7 6.454545
    ## .. ... ... ... ... ... ...
    ## 170 Shoaib Ahmed 2 19 0 0 9.500000
    ## 54 GS Sandhu 4 49 0 0 12.250000
    ## 139 RV Gomez 1 9 0 0 9.000000
    ## 163 SPD Smith 0 5 0 0 inf
    ## 115 PC Valthaty 3 35 0 0 11.666667
    ## 34 CJ Anderson 4 26 0 0 6.500000
    ## 81 K Upadhyay 3 29 0 0 9.666667
    ## 79 K Goel 1 11 0 0 11.000000
    ## 28 BJ Rohrer 1 12 0 0 12.000000
    ## 78 Joginder Sharma 2 23 0 0 11.500000
    ## 99 MK Tiwary 2 28 0 0 14.000000
    ## 26 BE Hendricks 4 57 0 0 14.250000
    ## 102 MR Marsh 1 10 0 0 10.000000
    ## 106 NL McCullum 3 22 0 0 7.333333
    ## 113 P Prasanth 1 18 0 0 18.000000
    ## 114 P Suyal 4 45 0 0 11.250000
    ## 46 DP Vijaykumar 1 10 0 0 10.000000
    ## 154 SB Styris 2 14 0 0 7.000000
    ## 71 JEC Franklin 3 32 0 0 10.666667
    ## 70 JE Taylor 3 22 0 0 7.333333
    ## 18 Ankit Sharma 4 33 0 0 8.250000
    ## 134 RN ten Doeschate 2 14 0 0 7.000000
    ## 16 Abdur Razzak 2 29 0 0 14.500000
    ## 65 J Theron 6 48 0 0 8.000000
    ## 146 S Narwal 2 17 0 0 8.500000
    ## 63 J Botha 1 19 0 0 19.000000
    ## 149 S Tyagi 8 65 0 0 8.125000
    ## 151 SB Bangar 2 20 0 0 10.000000
    ## 13 AM Nayar 2 7 0 0 3.500000
    ## 0 A Ashish Reddy 3 22 0 0 7.333333
    ##
    ## [193 rows x 6 columns]

### 12. Team Bowling wicket kind – Chart (all matches against all IPL teams)

The function below computes and displays the kind of wickets taken (bowled, caught, lbw etc.) by an IPL team in all matches against all other IPL teams.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Gujarat Lions-allMatchesAllOpposition.csv")
    gl_matches = pd.read_csv(path)
    yka.teamBowlingWicketKindAllOppnAllMatches(gl_matches,'Gujarat Lions',plot=True,top=5,wickets=2)

### 13. Team Bowling wicket kind – Dataframe (all matches against all IPL teams)

This gives the type of wickets taken for an IPL team against all other IPL teams.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Rising Pune Supergiants-allMatchesAllOpposition.csv")
    rps_matches = pd.read_csv(path)
    m=yka.teamBowlingWicketKindAllOppnAllMatches(rps_matches,'Rising Pune Supergiants',plot=False,top=4,wickets=10)
    print(m)

    ## bowler kind wickets
    ## 0 A Nehra caught 4
    ## 1 A Nehra run out 2
    ## 2 MM Sharma caught 3
    ## 3 MM Sharma caught and bowled 1
    ## 4 MM Sharma run out 1
    ## 5 SR Watson bowled 1
    ## 6 SR Watson caught 4
    ## 7 KW Richardson caught 3
    ## 8 KW Richardson retired hurt 1

## 14. Team Bowler vs Batsman – Plot (all matches against all IPL teams)

The function below gives the performance of bowlers against batsmen, in all matches against all other IPL teams.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Rising Pune Supergiants-allMatchesAllOpposition.csv")
    rps_matches = pd.read_csv(path)
    yka.teamBowlersVsBatsmenAllOppnAllMatches(rps_matches,'Rising Pune Supergiants',plot=True,top=5,runsConceded=10)

## 15 Team Bowler vs Batsman – Dataframe (all matches against all IPL teams)

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Deccan Chargers-allMatchesAllOpposition.csv")
    dc_matches = pd.read_csv(path)
    m=yka.teamBowlersVsBatsmenAllOppnAllMatches(dc_matches,'Deccan Chargers',plot=False,top=2,runsConceded=30)
    print(m)

    ##     bowler     batsman          runsConceded
    ##  0  P Kumar    A Ashish Reddy    6.0
    ##  1  P Kumar    A Symonds        15.0
    ##  2  P Kumar    AA Bilakhia      12.0
    ##  3  P Kumar    AA Jhunjhunwala   1.0
    ##  4  P Kumar    AC Gilchrist     20.0
    ##  5  P Kumar    Anirudh Singh    11.0
    ##  6  P Kumar    B Chipli          1.0
    ##  7  P Kumar    CL White         11.0
    ##  8  P Kumar    DB Ravi Teja     15.0
    ##  9  P Kumar    DJ Harris         2.0
    ## 10  P Kumar    DR Smith          5.0
    ## 11  P Kumar    FH Edwards        3.0
    ## 12  P Kumar    HH Gibbs         46.0
    ## 13  P Kumar    J Theron          0.0
    ## 14  P Kumar    JP Duminy         4.0
    ## 15  P Kumar    KC Sangakkara    15.0
    ## 16  P Kumar    MD Mishra         4.0
    ## 17  P Kumar    PA Patel          9.0
    ## 18  P Kumar    RG Sharma        36.0
    ## 19  P Kumar    RJ Harris         3.0
    ## 20  P Kumar    S Dhawan         37.0
    ## 21  P Kumar    S Sohal           6.0
    ## 22  P Kumar    SB Styris         6.0
    ## 23  P Kumar    Shahid Afridi     0.0
    ## 24  P Kumar    TL Suman         22.0
    ## 25  P Kumar    VVS Laxman        5.0
    ## 26  P Kumar    Y Venugopal Rao   1.0
    ## 27  PP Chawla  A Ashish Reddy    2.0
    ## 28  PP Chawla  A Symonds        35.0
    ## 29  PP Chawla  AA Jhunjhunwala   6.0
    ## 30  PP Chawla  AC Gilchrist      4.0
    ## 31  PP Chawla  B Chipli          8.0
    ## 32  PP Chawla  CL White         16.0
    ## 33  PP Chawla  DB Ravi Teja     30.0
    ## 34  PP Chawla  DJ Harris         9.0
    ## 35  PP Chawla  DNT Zoysa         1.0
    ## 36  PP Chawla  HH Gibbs         30.0
    ## 37  PP Chawla  JP Duminy        10.0
    ## 38  PP Chawla  KC Sangakkara     1.0
    ## 39  PP Chawla  MR Marsh          1.0
    ## 40  PP Chawla  PA Patel          4.0
    ## 41  PP Chawla  PA Reddy          8.0
    ## 42  PP Chawla  RG Sharma        50.0
    ## 43  PP Chawla  S Dhawan         33.0
    ## 44  PP Chawla  SB Bangar         1.0
    ## 45  PP Chawla  TL Suman         17.0
    ## 46  PP Chawla  VVS Laxman        7.0
    ## 47  PP Chawla  Y Venugopal Rao   3.0

## 16 Team Wins and Losses – Summary (all matches against all IPL teams)

The function below computes and plots the number of wins and losses between an IPL team and all other IPL teams in all matches. The summary gives just the wins, losses and ties.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Chennai Super Kings-allMatchesAllOpposition.csv")
    csk_matches = pd.read_csv(path)
    team1='Chennai Super Kings'
    yka.plotWinLossByTeamAllOpposition(csk_matches,team1,plot="summary")

## 16a Team Wins and Losses – Detailed (all matches against all IPL teams)

The function below computes and plots the number of wins and losses between an IPL team and all other IPL teams in all matches. This gives a breakup of which teams won against this team.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Chennai Super Kings-allMatchesAllOpposition.csv")
    csk_matches = pd.read_csv(path)
    team1='Chennai Super Kings'
    yka.plotWinLossByTeamAllOpposition(csk_matches,team1,plot="detailed")

## 16b Team Wins and Losses – Summary (all matches against all IPL teams)

This plot gives the wins vs losses of Mumbai Indians (MI) against all other IPL teams.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Mumbai Indians-allMatchesAllOpposition.csv")
    mi_matches = pd.read_csv(path)
    team1='Mumbai Indians'
    yka.plotWinLossByTeamAllOpposition(mi_matches,team1,plot="summary")

## 16c Team Wins and Losses – Detailed (all matches against all IPL teams)

The function below computes and plots the number of wins and losses between an IPL team and all other IPL teams in all matches. This gives the breakup of MI wins, losses and ties.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Mumbai Indians-allMatchesAllOpposition.csv")
    mi_matches = pd.read_csv(path)
    team1='Mumbai Indians'
    yka.plotWinLossByTeamAllOpposition(mi_matches,team1,plot="detailed")

## 17 Team Wins by win type (all matches against all IPL teams)

This function shows how the wins happened, whether by runs or by wickets, in all matches played against all other IPL teams.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Royal Challengers Bangalore-allMatchesAllOpposition.csv")
    rcb_matches = pd.read_csv(path)
    yka.plotWinsByRunOrWicketsAllOpposition(rcb_matches,'Royal Challengers Bangalore')

## 18 Team Wins by toss decision (summary) (all matches against all IPL teams)

This shows how Royal Challengers Bangalore fared when it chose to field on winning the toss.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Royal Challengers Bangalore-allMatchesAllOpposition.csv")
    rcb_matches = pd.read_csv(path)
    yka.plotWinsbyTossDecisionAllOpposition(rcb_matches,'Royal Challengers Bangalore',tossDecision='field',plot='summary')

## 18a. Team Wins by toss decision (detailed) (all matches against all IPL teams)

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Kings XI Punjab-allMatchesAllOpposition.csv")
    kxip_matches = pd.read_csv(path)
    yka.plotWinsbyTossDecisionAllOpposition(kxip_matches,'Kings XI Punjab',tossDecision='field',plot='detailed')

## 19 Team Wins by toss decision (summary) (all matches against all IPL teams)

This plot shows how Mumbai Indians fared when it chose to bat on winning the toss against all other IPL teams.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    # Load the Mumbai Indians file so that the data matches the team being plotted
    path=os.path.join(dir1,"Mumbai Indians-allMatchesAllOpposition.csv")
    mi_matches = pd.read_csv(path)
    yka.plotWinsbyTossDecisionAllOpposition(mi_matches,'Mumbai Indians',tossDecision='bat',plot='summary')

## 20 Team Wins by toss decision (detailed) (all matches against all IPL teams)

This plot shows how Kings XI Punjab fared when it chose to bat on winning the toss.

    import pandas as pd
    import os
    import yorkpy.analytics as yka
    dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
    path=os.path.join(dir1,"Kings XI Punjab-allMatchesAllOpposition.csv")
    kxip_matches = pd.read_csv(path)
    yka.plotWinsbyTossDecisionAllOpposition(kxip_matches,'Kings XI Punjab',tossDecision='bat',plot='detailed')

Feel free to clone/download the code from Github yorkpy

## Conclusion

This post included analysis of an IPL team against all other IPL teams.
You can download the data for this and the earlier posts from [yorkpyData](https://github.com/tvganesh/yorkpyData)

The code can be cloned/downloaded from Github

Important note: Do check out my other posts using yorkpy at yorkpy-posts

To be continued. Watch this space!

To see all posts click Index of posts

# Cricpy takes guard for the Twenty20s

“There are two ways to write error-free programs; only the third one works.”

            Alan J. Perlis

“Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to produce bigger and better idiots. So far, the universe is winning.”

            Rick Cook

“My software never has bugs. It just develops random features.”

            Anon

“If you make an ass out of yourself, there will always be someone to ride you.”

            Bruce Lee

# Introduction

This is the 3rd and final post on cricpy, and is a continuation of my 2 earlier posts. Cricpy is the Python avatar of my R package ‘cricketr’. To know more about my R package cricketr see Re-introducing cricketr! : An R package to analyze performances of cricketers

With this post cricpy, like cricketr, now becomes omnipotent, and is capable of handling Test, ODI and T20 matches.

Cricpy uses the statistics info available in ESPN Cricinfo Statsguru.

You should be able to install the package using pip install cricpy and use the many functions available in the package. Please be mindful of the ESPN Cricinfo Terms of Use

This post is also hosted on Rpubs at Cricpy takes guard for the Twenty 20s. You can also download the pdf version of this post at cricpy-TT.pdf

You can fork/clone the package at Github cricpy

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricpy-template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in.
The functions can be executed in RStudio or in an IPython notebook.

# The cricpy package

The data for a particular player in Twenty20s can be obtained with the getPlayerDataTT() function. To do this you will need to go to T20 Batting and T20 Bowling and click the player you are interested in. This will bring up a page which has the profile number for the player, e.g. for Virat Kohli this would be http://www.espncricinfo.com/india/content/player/253802.html. Hence, Kohli's profile number is 253802. This can be used to get the data for Virat Kohli as shown below.

The cricpy package is a clone of my R package cricketr. The signatures of all the Python functions are identical to those of its clone ‘cricketr’, with only the necessary variations between Python and R. It may be useful to look at my post R vs Python: Different similarities and similar differences. In fact, if you are familiar with one of the languages you can look up the package in the other and you will notice the parallel constructs.

You can fork/clone the package at Github cricpy

Note: The charts are self-explanatory and I have not added much of my own interpretation to them. Do look at the plots closely and check out the performances for yourself.

## 1 Importing cricpy – Python

    # Install the package
    # Do a pip install cricpy
    # Import cricpy
    import cricpy.analytics as ca

## 2. Invoking functions with Python package cricpy

    import cricpy.analytics as ca
    ca.batsman4s("./kohli.csv","Virat Kohli")

## 3. Getting help from cricpy – Python

    import cricpy.analytics as ca
    help(ca.getPlayerDataTT)

    ## Help on function getPlayerDataTT in module cricpy.analytics:
    ##
    ## getPlayerDataTT(profile, opposition='', host='', dir='./data', file='player001.csv', type='batting', homeOrAway=[1, 2, 3], result=[1, 2, 3, 5], create=True)
    ## Get the Twenty20 International player data from ESPN Cricinfo based on specific inputs and store in a file in a given directory~
    ##
    ## Description
    ##
    ## Get the Twenty20 player data given the profile of the batsman/bowler. The allowed inputs are home,away, neutralboth and won,lost,tied or no result of matches. The data is stored in a <player>.csv file in a directory specified. This function also returns a data frame of the player
    ##
    ## Usage
    ##
    ## getPlayerDataTT(profile, opposition="",host="",dir = "./data", file = "player001.csv",
    ## type = "batting", homeOrAway = c(1, 2, 3), result = c(1, 2, 3,5))
    ## Arguments
    ##
    ## profile
    ## This is the profile number of the player to get data. This can be obtained from http://www.espncricinfo.com/ci/content/player/index.html. Type the name of the player and click search. This will display the details of the player. Make a note of the profile ID. For e.g For Virat Kohli this turns out to be 253802 http://www.espncricinfo.com/india/content/player/35263.html. Hence the profile for Sehwag is 35263
    ## opposition
    ## The numerical value of the opposition country e.g.Australia,India, England etc. The values are Afghanistan:40,Australia:2,Bangladesh:25,England:1,Hong Kong:19,India:6,Ireland:29, New Zealand:5,Pakistan:7,Scotland:30,South Africa:3,Sri Lanka:8,United Arab Emirates:27, West Indies:4, Zimbabwe:9; Note: If no value is entered for opposition then all teams are considered
    ## host
    ## The numerical value of the host country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,New Zealand:5, South Africa:3,Sri Lanka:8,United States of America:11,West Indies:4, Zimbabwe:9 Note: If no value is entered for host then all host countries are considered
    ## dir
    ## Name of the directory to store the player data into. If not specified the data is stored in a default directory "./data". Default="./data"
    ## file
    ## Name of the file to store the data into for e.g. kohli.csv. This can be used for subsequent functions. Default="player001.csv"
    ## type
    ## type of data required. This can be "batting" or "bowling"
    ## homeOrAway
    ## This is vector with either or all 1,2, 3. 1 is for home 2 is for away, 3 is for neutral venue
    ## result
    ## This is a vector that can take values 1,2,3,5. 1 - won match 2- lost match 3-tied 5- no result
    ## Details
    ##
    ## More details can be found in my short video tutorial in Youtube https://www.youtube.com/watch?v=q9uMPFVsXsI
    ##
    ## Value
    ##
    ## Returns the player's dataframe
    ##
    ## Note
    ##
    ## Maintainer: Tinniam V Ganesh <tvganesh.85@gmail.com>
    ##
    ## Author(s)
    ##
    ## Tinniam V Ganesh
    ##
    ## References
    ##
    ## http://www.espncricinfo.com/ci/content/stats/index.html
    ## https://gigadom.wordpress.com/
    ##
    ## See Also
    ##
    ## bowlerWktRateTT getPlayerData
    ##
    ## Examples
    ##
    ## ## Not run:
    ## # Only away. Get data only for won and lost innings
    ## kohli =getPlayerDataTT(253802,dir="../cricketr/data", file="kohli1.csv",
    ## type="batting")
    ##
    ## # Get bowling data and store in file for future
    ## ashwin = getPlayerDataTT(26421,dir="../cricketr/data",file="ashwin1.csv",
    ## type="bowling")
    ##
    ## kohli =getPlayerDataTT(253802,opposition = 2,host=2,dir="../cricketr/data",
    ## file="kohli1.csv",type="batting")

The details below will introduce the different functions that are available in cricpy.

## 4. Get the Twenty20 player data for a player using the function getPlayerDataTT()

Important Note: This needs to be done only once for a player. This function stores the player’s data in the specified CSV file (e.g. kohli.csv as above), which can then be reused for all other functions. Once we have the data for the players many analyses can be done.
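
The fetch-once-then-reuse advice above can be sketched as a small guard around the download call. This is only an illustration, not part of cricpy: the helper name `ensure_player_csv` and its `fetch` argument are hypothetical, and the sketch assumes that the fetch call writes the CSV file itself (as getPlayerDataTT does with its `dir` and `file` arguments).

```python
import os


def ensure_player_csv(csv_path, fetch):
    """Run fetch() only when csv_path is missing; return True if a fetch was made."""
    if os.path.exists(csv_path):
        return False  # a stored copy exists, so no new request is needed
    fetch()  # assumed to write csv_path as a side effect
    return True


# Hypothetical usage with cricpy (left commented so the sketch runs offline):
# import cricpy.analytics as ca
# ensure_player_csv(
#     "./kohli.csv",
#     lambda: ca.getPlayerDataTT(253802, dir=".", file="kohli.csv", type="batting"),
# )
```

A guard like this also keeps repeated runs of a notebook from hitting ESPN Cricinfo more often than necessary.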
This post will use the stored CSV file obtained with a prior call to getPlayerDataTT for all subsequent analyses.

    import cricpy.analytics as ca
    #kohli=ca.getPlayerDataTT(253802,dir=".",file="kohli.csv",type="batting")
    #guptill=ca.getPlayerDataTT(226492,dir=".",file="guptill.csv",type="batting")
    #shahzad=ca.getPlayerDataTT(419873,dir=".",file="shahzad.csv",type="batting")
    #mccullum=ca.getPlayerDataTT(37737,dir=".",file="mccullum.csv",type="batting")

Included below are some of the functions that can be used for Twenty20 batsmen and bowlers. For this I have chosen Virat Kohli, ‘the run machine’, who is on track to break many of the Test, ODI and Twenty20 records.

## 5 Virat Kohli’s performance – Basic Analyses

The 3 plots below provide the following for Virat Kohli in T20s

1. Frequency percentage of runs in each run range over the whole career
2. Mean Strike Rate for runs scored in the given range
3. A histogram of runs frequency percentages in runs ranges

    import cricpy.analytics as ca
    import matplotlib.pyplot as plt
    ca.batsmanRunsFreqPerf("./kohli.csv","Virat Kohli")
    ca.batsmanMeanStrikeRate("./kohli.csv","Virat Kohli")
    ca.batsmanRunsRanges("./kohli.csv","Virat Kohli")

## 6. More analyses

    import cricpy.analytics as ca
    ca.batsman4s("./kohli.csv","Virat Kohli")
    ca.batsman6s("./kohli.csv","Virat Kohli")
    ca.batsmanDismissals("./kohli.csv","Virat Kohli")
    ca.batsmanScoringRateODTT("./kohli.csv","Virat Kohli")

## 7. 3D scatter plot and prediction plane

The plots below show the 3D scatter plot of Kohli’s Runs versus Balls Faced and Minutes at crease. A linear regression plane is then fitted between Runs and Balls Faced + Minutes at crease.

    import cricpy.analytics as ca
    ca.battingPerf3d("./kohli.csv","Virat Kohli")

## 8. Average runs at different venues

The plot below gives the average runs scored by Kohli at different grounds. The plot also shows the number of innings at each ground as a label on the x-axis.

    import cricpy.analytics as ca
    ca.batsmanAvgRunsGround("./kohli.csv","Virat Kohli")

## 9. Average runs against different opposing teams

This plot shows the average runs scored by Kohli against different countries.

    import cricpy.analytics as ca
    ca.batsmanAvgRunsOpposition("./kohli.csv","Virat Kohli")

## 10. Highest Runs Likelihood

The plot below shows the Runs Likelihood for a batsman. For this, the performance of Kohli is plotted as a 3D scatter plot of Runs versus Balls Faced + Minutes at crease. The centroids of 3 clusters are then computed with K-Means and plotted; these centroids indicate Kohli’s highest tendencies.

    import cricpy.analytics as ca
    ca.batsmanRunsLikelihood("./kohli.csv","Virat Kohli")

## 11. A look at the Top 4 batsmen – Kohli, Guptill, Shahzad and McCullum

The following batsmen have been very prolific in Twenty20 cricket and will be used for the analyses

1. Virat Kohli: Runs – 2167, Average – 49.25, Strike rate – 136.11
2. MJ Guptill: Runs – 2271, Average – 34.4, Strike rate – 132.88
3. Mohammed Shahzad: Runs – 1936, Average – 31.22, Strike rate – 134.81
4. BB McCullum: Runs – 2140, Average – 35.66, Strike rate – 136.21

The following plots take a closer look at their performances. The box plots show the median and the 1st and 3rd quartiles of the runs.

## 12. Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency.

    import cricpy.analytics as ca
    ca.batsmanPerfBoxHist("./kohli.csv","Virat Kohli")
    ca.batsmanPerfBoxHist("./guptill.csv","M J Guptill")
    ca.batsmanPerfBoxHist("./shahzad.csv","M Shahzad")
    ca.batsmanPerfBoxHist("./mccullum.csv","BB McCullum")

## 13 Moving Average of runs in career

Take a look at the Moving Average across the career of the Top 4 Twenty20 batsmen.

    import cricpy.analytics as ca
    ca.batsmanMovingAverage("./kohli.csv","Virat Kohli")
    ca.batsmanMovingAverage("./guptill.csv","M J Guptill")
    #ca.batsmanMovingAverage("./shahzad.csv","M Shahzad") # Gives error. Check!
    ca.batsmanMovingAverage("./mccullum.csv","BB McCullum")

## 14 Cumulative Average runs of batsman in career

This function provides the cumulative average runs of the batsman over the career. Kohli’s average peaks at around 45 runs after about 43 innings, though there is a dip afterwards.

    import cricpy.analytics as ca
    ca.batsmanCumulativeAverageRuns("./kohli.csv","Virat Kohli")
    ca.batsmanCumulativeAverageRuns("./guptill.csv","M J Guptill")
    ca.batsmanCumulativeAverageRuns("./shahzad.csv","M Shahzad")
    ca.batsmanCumulativeAverageRuns("./mccullum.csv","BB McCullum")

## 15 Cumulative Average strike rate of batsman in career

Kohli, Guptill and McCullum average a strike rate of 125+.

    import cricpy.analytics as ca
    ca.batsmanCumulativeStrikeRate("./kohli.csv","Virat Kohli")
    ca.batsmanCumulativeStrikeRate("./guptill.csv","M J Guptill")
    ca.batsmanCumulativeStrikeRate("./shahzad.csv","M Shahzad")
    ca.batsmanCumulativeStrikeRate("./mccullum.csv","BB McCullum")

## 16 Relative Batsman Cumulative Average Runs

The plot below compares the relative cumulative average runs of the batsmen. Kohli is way above the other 3 batsmen. Behind Kohli is McCullum and then Guptill.

    import cricpy.analytics as ca
    frames = ["./kohli.csv","./guptill.csv","./shahzad.csv","./mccullum.csv"]
    names = ["Kohli","Guptill","Shahzad","McCullum"]
    ca.relativeBatsmanCumulativeAvgRuns(frames,names)

## 17. Relative Batsman Strike Rate

The plot below compares the relative cumulative strike rates of the batsmen. It shows that Kohli tops the overall strike rate, followed by McCullum and then Guptill.

    import cricpy.analytics as ca
    frames = ["./kohli.csv","./guptill.csv","./shahzad.csv","./mccullum.csv"]
    names = ["Kohli","Guptill","Shahzad","McCullum"]
    ca.relativeBatsmanCumulativeStrikeRate(frames,names)

## 18. 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls Faced and Minutes at Crease.
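
As a rough illustration of what fitting a regression plane to Runs against Balls Faced and Minutes at crease involves, here is a minimal sketch using NumPy least squares. It is not cricpy's internal implementation: the function names `fit_plane` and `predict_runs` are hypothetical, and the data is synthetic, generated from a known plane so that the fit can be checked.

```python
import numpy as np


def fit_plane(bf, mins, runs):
    """Least-squares fit of runs = a*bf + b*mins + c; returns (a, b, c)."""
    bf = np.asarray(bf, dtype=float)
    mins = np.asarray(mins, dtype=float)
    # Design matrix: one column per predictor plus a column of ones for the intercept
    X = np.column_stack([bf, mins, np.ones(len(bf))])
    coef, *_ = np.linalg.lstsq(X, np.asarray(runs, dtype=float), rcond=None)
    return tuple(coef)


def predict_runs(coef, bf, mins):
    """Evaluate the fitted plane at new (bf, mins) points."""
    a, b, c = coef
    return a * np.asarray(bf, dtype=float) + b * np.asarray(mins, dtype=float) + c


# Synthetic innings (not real player data), lying exactly on a known plane
bf = np.array([10, 20, 35, 50, 60])
mins = np.array([15, 25, 50, 60, 90])
runs = 1.2 * bf + 0.1 * mins + 2.0

coef = fit_plane(bf, mins, runs)
print(coef)
print(predict_runs(coef, [40], [55]))
```

With real innings data the points scatter around the plane, and the fitted coefficients give the best linear approximation in the least-squares sense.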
A 3D prediction plane is fitted.

    import cricpy.analytics as ca
    ca.battingPerf3d("./kohli.csv","Virat Kohli")
    ca.battingPerf3d("./guptill.csv","M J Guptill")
    ca.battingPerf3d("./shahzad.csv","M Shahzad")
    ca.battingPerf3d("./mccullum.csv","BB McCullum")

## 19. Percentage of 4s and 6s

Guptill and McCullum have a large percentage of sixes in comparison to the 4s. Kohli has a relatively lower number of 6s.

    import cricpy.analytics as ca
    frames = ["./kohli.csv","./guptill.csv","./shahzad.csv","./mccullum.csv"]
    names = ["Kohli","Guptill","Shahzad","McCullum"]
    ca.batsman4s6s(frames,names)

## 20. Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls Faced + Minutes at crease.

    import cricpy.analytics as ca
    import numpy as np
    import pandas as pd
    BF = np.linspace( 10, 400,15)
    Mins = np.linspace( 30,600,15)
    newDF= pd.DataFrame({'BF':BF,'Mins':Mins})
    kohli= ca.batsmanRunsPredict("./kohli.csv",newDF,"Kohli")
    print(kohli)

    ##             BF        Mins        Runs
    ##  0   10.000000   30.000000   14.753153
    ##  1   37.857143   70.714286   55.963333
    ##  2   65.714286  111.428571   97.173513
    ##  3   93.571429  152.142857  138.383693
    ##  4  121.428571  192.857143  179.593873
    ##  5  149.285714  233.571429  220.804053
    ##  6  177.142857  274.285714  262.014233
    ##  7  205.000000  315.000000  303.224414
    ##  8  232.857143  355.714286  344.434594
    ##  9  260.714286  396.428571  385.644774
    ## 10  288.571429  437.142857  426.854954
    ## 11  316.428571  477.857143  468.065134
    ## 12  344.285714  518.571429  509.275314
    ## 13  372.142857  559.285714  550.485494
    ## 14  400.000000  600.000000  591.695674

## 21 Analysis of Top Bowlers

The following 4 bowlers have had an excellent career and will be used for the analysis

1. Shakib Al Hasan: Wickets – 80, Average – 21.07, Economy Rate – 6.74
2. Mohammad Nabi: Wickets – 67, Average – 24.25, Economy Rate – 7.13
3. Rashid Khan: Wickets – 64, Average – 12.40, Economy Rate – 6.01
4. Imran Tahir: Wickets – 62, Average – 14.95, Economy Rate – 6.77

## 22. Get the bowler’s data

The bowling data for each bowler is obtained once with getPlayerDataTT and stored in a CSV file for the analyses that follow.

    import cricpy.analytics as ca
    #shakib=ca.getPlayerDataTT(56143,dir=".",file="shakib.csv",type="bowling")
    #nabi=ca.getPlayerDataTT(25913,dir=".",file="nabi.csv",type="bowling")
    #rashid=ca.getPlayerDataTT(793463,dir=".",file="rashid.csv",type="bowling")
    #tahir=ca.getPlayerDataTT(40618,dir=".",file="tahir.csv",type="bowling")

## 23. Wicket Frequency Plot

The plots below compute the percentage frequency of the number of wickets taken (e.g. 1 wicket x%, 2 wickets y% etc.) for each of the bowlers and plot them as a continuous line.

    import cricpy.analytics as ca
    ca.bowlerWktsFreqPercent("./shakib.csv","Shakib Al Hasan")
    ca.bowlerWktsFreqPercent("./nabi.csv","Mohammad Nabi")
    ca.bowlerWktsFreqPercent("./rashid.csv","Rashid Khan")
    ca.bowlerWktsFreqPercent("./tahir.csv","Imran Tahir")

## 24. Wickets Runs plot

The plots below are box plots showing the 1st and 3rd quartiles of runs conceded versus the number of wickets taken.

    import cricpy.analytics as ca
    ca.bowlerWktsRunsPlot("./shakib.csv","Shakib Al Hasan")
    ca.bowlerWktsRunsPlot("./nabi.csv","Mohammad Nabi")
    ca.bowlerWktsRunsPlot("./rashid.csv","Rashid Khan")
    ca.bowlerWktsRunsPlot("./tahir.csv","Imran Tahir")

## 25 Average wickets at different venues

The plots give the average wickets taken by each bowler at different venues.

    import cricpy.analytics as ca
    ca.bowlerAvgWktsGround("./shakib.csv","Shakib Al Hasan")
    ca.bowlerAvgWktsGround("./nabi.csv","Mohammad Nabi")
    ca.bowlerAvgWktsGround("./rashid.csv","Rashid Khan")
    ca.bowlerAvgWktsGround("./tahir.csv","Imran Tahir")

## 26 Average wickets against different opposition

The plots give the average wickets taken by each bowler against different countries.
The x-axis also includes the number of innings against each team.

    import cricpy.analytics as ca
    ca.bowlerAvgWktsOpposition("./shakib.csv","Shakib Al Hasan")
    ca.bowlerAvgWktsOpposition("./nabi.csv","Mohammad Nabi")
    ca.bowlerAvgWktsOpposition("./rashid.csv","Rashid Khan")
    ca.bowlerAvgWktsOpposition("./tahir.csv","Imran Tahir")

## 27 Wickets taken moving average

The plots below show the moving average of wickets taken by each bowler over his career.

    import cricpy.analytics as ca
    ca.bowlerMovingAverage("./shakib.csv","Shakib Al Hasan")
    ca.bowlerMovingAverage("./nabi.csv","Mohammad Nabi")
    ca.bowlerMovingAverage("./rashid.csv","Rashid Khan")
    ca.bowlerMovingAverage("./tahir.csv","Imran Tahir")

## 28 Cumulative average wickets taken

The plots below give the cumulative average wickets taken by the bowlers. Rashid Khan has been the most effective with almost 2.28 wickets per match.

    import cricpy.analytics as ca
    ca.bowlerCumulativeAvgWickets("./shakib.csv","Shakib Al Hasan")
    ca.bowlerCumulativeAvgWickets("./nabi.csv","Mohammad Nabi")
    ca.bowlerCumulativeAvgWickets("./rashid.csv","Rashid Khan")
    ca.bowlerCumulativeAvgWickets("./tahir.csv","Imran Tahir")

## 29 Cumulative average economy rate

The plots below give the cumulative average economy rate of the bowlers. Rashid Khan has the best economy rate, followed by Mohammad Nabi.

    import cricpy.analytics as ca
    ca.bowlerCumulativeAvgEconRate("./shakib.csv","Shakib Al Hasan")
    ca.bowlerCumulativeAvgEconRate("./nabi.csv","Mohammad Nabi")
    ca.bowlerCumulativeAvgEconRate("./rashid.csv","Rashid Khan")
    ca.bowlerCumulativeAvgEconRate("./tahir.csv","Imran Tahir")

## 30 Relative cumulative average economy rate of bowlers

The relative cumulative economy rate is given below.
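
For readers wondering what "cumulative average" means in these plots, here is a minimal sketch. It assumes the cumulative average is simply the running mean of a per-innings figure (here the economy rate); cricpy's own computation may differ in detail, and the function name `cumulative_average` and the sample values are made up for illustration.

```python
def cumulative_average(values):
    """Return the running mean of values: element i is the mean of values[0..i]."""
    averages = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        averages.append(total / count)
    return averages


# Made-up economy rates for three successive innings
econ_rates = [6.0, 8.0, 4.0]
print(cumulative_average(econ_rates))  # [6.0, 7.0, 6.0]
```

Plotting such a running mean for several bowlers on the same axes gives the "relative" comparison.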
It can be seen that Rashid Khan has the best economy rate, followed by Mohammad Nabi and then Imran Tahir.

    import cricpy.analytics as ca
    frames = ["./shakib.csv","./nabi.csv","./rashid.csv","./tahir.csv"]
    names = ["Shakib Al Hasan","Mohammad Nabi","Rashid Khan", "Imran Tahir"]
    ca.relativeBowlerCumulativeAvgEconRate(frames,names)

## 31 Relative Economy Rate against wickets taken

Rashid Khan has the best figures for hauls of between 2 and 3.5 wickets. Mohammad Nabi pips Rashid Khan when he takes a haul of 4 wickets.

    import cricpy.analytics as ca
    frames = ["./shakib.csv","./nabi.csv","./rashid.csv","./tahir.csv"]
    names = ["Shakib Al Hasan","Mohammad Nabi","Rashid Khan", "Imran Tahir"]
    ca.relativeBowlingER(frames,names)

## 32 Relative cumulative average wickets of bowlers in career

Rashid has the best performance in cumulative average wickets. He is followed by Imran Tahir in the wicket haul, and then by Shakib Al Hasan.

    import cricpy.analytics as ca
    frames = ["./shakib.csv","./nabi.csv","./rashid.csv","./tahir.csv"]
    names = ["Shakib Al Hasan","Mohammad Nabi","Rashid Khan", "Imran Tahir"]
    ca.relativeBowlerCumulativeAvgWickets(frames,names)

# 33. Key Findings

The plots above capture some of the capabilities and features of my cricpy package. Feel free to install the package and try it out. Please do keep in mind ESPN Cricinfo’s Terms of Use.

Here are the main findings from the analysis above

## Analysis of Top 4 batsmen

The analysis of the Top 4 Twenty20 batsmen Kohli, Guptill, Shahzad and McCullum

1. Kohli has the best overall cumulative average runs and towers over everybody else
2. Kohli, Guptill and McCullum have a very good strike rate of around 125+
3. Guptill and McCullum have a larger percentage of sixes as compared to Kohli

## Analysis of Top 4 bowlers

The analysis of the Top 4 Twenty20 bowlers Shakib Al Hasan, Mohammad Nabi, Rashid Khan and Imran Tahir

1. Rashid Khan has the best cumulative average wickets, followed by Imran Tahir and then Shakib Al Hasan
2. Rashid Khan is the most economical bowler, followed by Mohammad Nabi

You can fork/clone the package at Github cricpy

## Conclusion

Cricpy now has almost all the functions and functionalities of my R package cricketr. There are still a few more features that need to be added to cricpy. I intend to do this as and when I find time.

Go ahead, take cricpy for a spin! Hope you enjoy the ride!

Watch this space!!!

Important note: Do check out my other posts using cricpy at cricpy-posts

To see all posts click Index of Posts

# Cricpy takes a swing at the ODIs

“No computer has ever been designed that is ever aware of what it’s doing; but most of the time, we aren’t either.”

            Marvin Minsky

“The competent programmer is fully aware of the limited size of his own skull. He therefore approaches his task with full humility, and avoids clever tricks like the plague”

            Edsger Dijkstra

# Introduction

In this post, cricpy, the Python avatar of my R package cricketr, learns some new tricks to be able to handle ODI matches. To know more about my R package cricketr see Re-introducing cricketr! : An R package to analyze performances of cricketers

Cricpy uses the statistics info available in ESPN Cricinfo Statsguru. The original version of this package supported only Test cricket.

You should be able to install the package using pip install cricpy and use the many functions available in the package. Please be mindful of the ESPN Cricinfo Terms of Use

To know how to use cricpy see Introducing cricpy:A python package to analyze performances of cricketers. To the original version of cricpy, I have added 3 new functions for ODI. The earlier functions work for Test and ODI.

This post is also hosted on Rpubs at Cricpy takes a swing at the ODIs.
You can also download the pdf version of this post at cricpy-odi.pdf

You can fork/clone the package at Github cricpy

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricpy-template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. The functions can be executed in RStudio or in an IPython notebook.

# The cricpy package

The data for a particular player in ODI can be obtained with the getPlayerDataOD() function. To do this you will need to go to ESPN CricInfo Player and type in the name of the player for e.g. Virat Kohli, Virender Sehwag, Chris Gayle etc. This will bring up a page which has the profile number for the player e.g. for Virat Kohli this would be http://www.espncricinfo.com/india/content/player/253802.html. Hence, Kohli’s profile is 253802. This can be used to get the data for Virat Kohli as shown below

The cricpy package is a clone of my R package cricketr. The signatures of all the Python functions are identical to those of their counterparts in ‘cricketr’, with only the necessary variations between Python and R. It may be useful to look at my post R vs Python: Different similarities and similar differences. In fact if you are familiar with one of the languages you can look up the package in the other and you will notice the parallel constructs.

Note: The charts are self-explanatory and I have not added much of my own interpretation to them. Do look at the plots closely and check out the performances for yourself.

## 1 Importing cricpy – Python

# Install the package
# Do a pip install cricpy
# Import cricpy
import cricpy.analytics as ca

## 2. Invoking functions with Python package cricpy

import cricpy.analytics as ca
ca.batsman4s("./kohli.csv","Virat Kohli")

## 3. Getting help from cricpy – Python

import cricpy.analytics as ca
help(ca.getPlayerDataOD)
## Help on function getPlayerDataOD in module cricpy.analytics:
##
## getPlayerDataOD(profile, opposition='', host='', dir='./data', file='player001.csv', type='batting', homeOrAway=[1, 2, 3], result=[1, 2, 3, 5], create=True)
##     Get the One day player data from ESPN Cricinfo based on specific inputs and store in a file in a given directory
##
##     Description
##
##     Get the player data given the profile of the batsman. The allowed inputs are home,away or both and won,lost or draw of matches. The data is stored in a <player>.csv file in a directory specified. This function also returns a data frame of the player
##
##     Usage
##
##     getPlayerDataOD(profile, opposition="",host="",dir = "../", file = "player001.csv",
##     type = "batting", homeOrAway = c(1, 2, 3), result = c(1, 2, 3,5))
##     Arguments
##
##     profile
##     This is the profile number of the player to get data. This can be obtained from http://www.espncricinfo.com/ci/content/player/index.html. Type the name of the player and click search. This will display the details of the player. Make a note of the profile ID. For e.g For Virender Sehwag this turns out to be http://www.espncricinfo.com/india/content/player/35263.html. Hence the profile for Sehwag is 35263
##     opposition
##     The numerical value of the opposition country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,Bermuda:12, England:1,Hong Kong:19,India:6,Ireland:29, Netherlands:15,New Zealand:5,Pakistan:7,Scotland:30,South Africa:3,Sri Lanka:8,United Arab Emirates:27, West Indies:4, Zimbabwe:9; Africa XI:405 Note: If no value is entered for opposition then all teams are considered
##     host
##     The numerical value of the host country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,Ireland:29,Malaysia:16,New Zealand:5,Pakistan:7, Scotland:30,South Africa:3,Sri Lanka:8,United Arab Emirates:27,West Indies:4, Zimbabwe:9 Note: If no value is entered for host then all host countries are considered
##     dir
##     Name of the directory to store the player data into. If not specified the data is stored in a default directory "../data". Default="../data"
##     file
##     Name of the file to store the data into for e.g. tendulkar.csv. This can be used for subsequent functions. Default="player001.csv"
##     type
##     type of data required. This can be "batting" or "bowling"
##     homeOrAway
##     This is vector with either or all 1,2, 3. 1 is for home 2 is for away, 3 is for neutral venue
##     result
##     This is a vector that can take values 1,2,3,5. 1 - won match 2- lost match 3-tied 5- no result
##     Details
##
##     More details can be found in my short video tutorial in Youtube https://www.youtube.com/watch?v=q9uMPFVsXsI
##
##     Value
##
##     Returns the player's dataframe
##
##     Note
##
##     Maintainer: Tinniam V Ganesh <tvganesh.85@gmail.com>
##
##     Author(s)
##
##     Tinniam V Ganesh
##
##     References
##
##     http://www.espncricinfo.com/ci/content/stats/index.html
##     https://gigadom.wordpress.com/
##
##     See Also
##
##     getPlayerDataSp getPlayerData
##
##     Examples
##
##
##     ## Not run:
##     # Both home and away. Result = won,lost and drawn
##     sehwag =getPlayerDataOD(35263,dir="../cricketr/data", file="sehwag1.csv",
##     type="batting", homeOrAway=[1,2],result=[1,2,3,4])
##
##     # Only away. Get data only for won and lost innings
##     sehwag = getPlayerDataOD(35263,dir="../cricketr/data", file="sehwag2.csv",
##     type="batting",homeOrAway=[2],result=[1,2])
##
##     # Get bowling data and store in file for future
##     malinga = getPlayerData(49758,dir="../cricketr/data",file="malinga1.csv",
##     type="bowling")
##
##     # Get Dhoni's ODI record in Australia against Australua
##     dhoni = getPlayerDataOD(28081,opposition = 2,host=2,dir=".",
##     file="dhoniVsAusinAusOD",type="batting")
##
##     ## End(Not run)

The details below will introduce the different functions that are available in cricpy.

## 4. Get the ODI player data for a player using the function getPlayerDataOD()

Important Note: This needs to be done only once for a player. This function stores the player’s data in the specified CSV file (for e.g. kohli.csv as above) which can then be reused for all other functions. Once we have the data for the players many analyses can be done. This post will use the stored CSV file obtained with a prior getPlayerDataOD for all subsequent analyses

import cricpy.analytics as ca
#sehwag=ca.getPlayerDataOD(35263,dir=".",file="sehwag.csv",type="batting")
#kohli=ca.getPlayerDataOD(253802,dir=".",file="kohli.csv",type="batting")
#jayasuriya=ca.getPlayerDataOD(49209,dir=".",file="jayasuriya.csv",type="batting")
#gayle=ca.getPlayerDataOD(51880,dir=".",file="gayle.csv",type="batting")

Included below are some of the functions that can be used for ODI batsmen and bowlers. For this I have chosen Virat Kohli, ‘the run machine’, who is on track to break many of the Test & ODI records

## 5 Virat Kohli’s performance – Basic Analyses

The 3 plots below provide the following for Virat Kohli

1. Frequency percentage of runs in each run range over the whole career
2. Mean Strike Rate for runs scored in the given range
3. A histogram of runs frequency percentages in runs ranges

import cricpy.analytics as ca
import matplotlib.pyplot as plt
ca.batsmanRunsFreqPerf("./kohli.csv","Virat Kohli")
ca.batsmanMeanStrikeRate("./kohli.csv","Virat Kohli")
ca.batsmanRunsRanges("./kohli.csv","Virat Kohli")

## 6. More analyses

import cricpy.analytics as ca
ca.batsman4s("./kohli.csv","Virat Kohli")
ca.batsman6s("./kohli.csv","Virat Kohli")
ca.batsmanDismissals("./kohli.csv","Virat Kohli")
ca.batsmanScoringRateODTT("./kohli.csv","Virat Kohli")

## 7. 3D scatter plot and prediction plane

The plots below show the 3D scatter plot of Kohli’s Runs versus Balls Faced and Minutes at crease. A linear regression plane is then fitted between Runs and Balls Faced + Minutes at crease

import cricpy.analytics as ca
ca.battingPerf3d("./kohli.csv","Virat Kohli")

## 8. Average runs at different venues

The plot below gives the average runs scored by Kohli at different grounds. The plot also shows the number of innings at each ground as a label on the x-axis.

import cricpy.analytics as ca
ca.batsmanAvgRunsGround("./kohli.csv","Virat Kohli")

## 9. Average runs against different opposing teams

This plot computes the average runs scored by Kohli against different countries.

import cricpy.analytics as ca
ca.batsmanAvgRunsOpposition("./kohli.csv","Virat Kohli")

## 10. Highest Runs Likelihood

The plot below shows the Runs Likelihood for a batsman. For this the performance of Kohli is plotted as a 3D scatter plot with Runs versus Balls Faced + Minutes at crease. Using K-Means, the centroids of 3 clusters are computed and plotted. In this plot Kohli’s highest tendencies are computed and plotted using K-Means

import cricpy.analytics as ca
ca.batsmanRunsLikelihood("./kohli.csv","Virat Kohli")

# A look at the Top 4 batsmen – Kohli, Jayasuriya, Sehwag and Gayle

The following batsmen have been very prolific in ODI cricket and will be used for the analyses

1. Virat Kohli: Runs – 10232, Average: 59.83, Strike rate – 92.88
2. Sanath Jayasuriya: Runs – 13430, Average: 32.36, Strike rate – 91.2
3. Virender Sehwag: Runs – 8273, Average: 35.05, Strike rate – 104.33
4. Chris Gayle: Runs – 9727, Average: 37.12, Strike rate – 85.82

The following plots take a closer look at their performances. The box plots show the median and the 1st and 3rd quartiles of the runs

## 12. Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency

import cricpy.analytics as ca
ca.batsmanPerfBoxHist("./kohli.csv","Virat Kohli")
ca.batsmanPerfBoxHist("./jayasuriya.csv","Sanath jayasuriya")
ca.batsmanPerfBoxHist("./gayle.csv","Chris Gayle")
ca.batsmanPerfBoxHist("./sehwag.csv","Virendar Sehwag")

## 13 Moving Average of runs in career

Take a look at the Moving Average across the career of the Top 4 (ignore the dip at the end of all plots. Need to check why this is so!). Kohli’s performance has been steadily improving over the years, and so has Sehwag’s. Gayle seems to be on the way down

import cricpy.analytics as ca
ca.batsmanMovingAverage("./kohli.csv","Virat Kohli")
ca.batsmanMovingAverage("./jayasuriya.csv","Sanath jayasuriya")
ca.batsmanMovingAverage("./gayle.csv","Chris Gayle")
ca.batsmanMovingAverage("./sehwag.csv","Virendar Sehwag")

## 14 Cumulative Average runs of batsman in career

This function provides the cumulative average runs of the batsman over the career. Kohli seems to be getting better with time and reaches a cumulative average of 45+. Sehwag improves with time and reaches around 35+. Chris Gayle drops from 42 to 35

import cricpy.analytics as ca
ca.batsmanCumulativeAverageRuns("./kohli.csv","Virat Kohli")
ca.batsmanCumulativeAverageRuns("./jayasuriya.csv","Sanath jayasuriya")
ca.batsmanCumulativeAverageRuns("./gayle.csv","Chris Gayle")
ca.batsmanCumulativeAverageRuns("./sehwag.csv","Virendar Sehwag")

## 15 Cumulative Average strike rate of batsman in career

Sehwag has the best strike rate of almost 90. Kohli and Jayasuriya have a cumulative strike rate of 75.
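To make the idea of a "cumulative average" concrete, here is a minimal sketch in plain Python. It assumes the cumulative figure is simply the running mean of the per-innings values; cricpy's own computation may differ, and the function name and the sample numbers are made up for illustration.

```python
def cumulative_mean(values):
    """Return the running mean of `values` after each innings."""
    means = []
    total = 0.0
    for innings_no, value in enumerate(values, start=1):
        total += value
        means.append(total / innings_no)
    return means


# Hypothetical per-innings strike rates, for illustration only
strike_rates = [60.0, 90.0, 120.0, 90.0]
print(cumulative_mean(strike_rates))
```

The same helper applies equally to runs per innings, which is the quantity plotted in the cumulative average runs section.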
import cricpy.analytics as ca
ca.batsmanCumulativeStrikeRate("./kohli.csv","Virat Kohli")
ca.batsmanCumulativeStrikeRate("./jayasuriya.csv","Sanath jayasuriya")
ca.batsmanCumulativeStrikeRate("./gayle.csv","Chris Gayle")
ca.batsmanCumulativeStrikeRate("./sehwag.csv","Virendar Sehwag")

## 16 Relative Batsman Cumulative Average Runs

The plot below compares the relative cumulative average runs of the batsmen. It can be seen that Virat Kohli towers above all others in the runs. He is followed by Chris Gayle and then Sehwag

import cricpy.analytics as ca
frames = ["./sehwag.csv","./gayle.csv","./jayasuriya.csv","./kohli.csv"]
names = ["Sehwag","Gayle","Jayasuriya","Kohli"]
ca.relativeBatsmanCumulativeAvgRuns(frames,names)

## 17 Relative Batsman Strike Rate

The plot below compares the relative cumulative strike rates of the batsmen. It shows that Sehwag has the best strike rate, followed by Jayasuriya

import cricpy.analytics as ca
frames = ["./sehwag.csv","./gayle.csv","./jayasuriya.csv","./kohli.csv"]
names = ["Sehwag","Gayle","Jayasuriya","Kohli"]
ca.relativeBatsmanCumulativeStrikeRate(frames,names)

## 18. 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A 3D prediction plane is fitted

import cricpy.analytics as ca
ca.battingPerf3d("./kohli.csv","Virat Kohli")
ca.battingPerf3d("./jayasuriya.csv","Sanath jayasuriya")
ca.battingPerf3d("./gayle.csv","Chris Gayle")
ca.battingPerf3d("./sehwag.csv","Virendar Sehwag")

## 19. Runs from 4s and 6s

From the plot below it can be seen that Sehwag has more runs by way of 4s than 1s, 2s or 3s. Gayle and Jayasuriya have a large number of 6s

import cricpy.analytics as ca
frames = ["./sehwag.csv","./kohli.csv","./gayle.csv","./jayasuriya.csv"]
names = ["Sehwag","Kohli","Gayle","Jayasuriya"]
ca.batsman4s6s(frames,names)

## 20. Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced + Minutes at crease.

import cricpy.analytics as ca
import numpy as np
import pandas as pd
BF = np.linspace( 10, 400,15)
Mins = np.linspace( 30,600,15)
newDF= pd.DataFrame({'BF':BF,'Mins':Mins})
kohli= ca.batsmanRunsPredict("./kohli.csv",newDF,"Kohli")
print(kohli)
##             BF        Mins        Runs
## 0    10.000000   30.000000    6.807407
## 1    37.857143   70.714286   36.034833
## 2    65.714286  111.428571   65.262259
## 3    93.571429  152.142857   94.489686
## 4   121.428571  192.857143  123.717112
## 5   149.285714  233.571429  152.944538
## 6   177.142857  274.285714  182.171965
## 7   205.000000  315.000000  211.399391
## 8   232.857143  355.714286  240.626817
## 9   260.714286  396.428571  269.854244
## 10  288.571429  437.142857  299.081670
## 11  316.428571  477.857143  328.309096
## 12  344.285714  518.571429  357.536523
## 13  372.142857  559.285714  386.763949
## 14  400.000000  600.000000  415.991375

The fitted model is then used to predict the runs that the batsmen will score for a given Balls faced and Minutes at crease.

## 21 Analysis of Top Bowlers

The following 4 bowlers have had an excellent career and will be used for the analysis

1. Muttiah Muralitharan: Wickets: 534, Average = 23.08, Economy Rate – 3.93
2. Wasim Akram: Wickets: 502, Average = 23.52, Economy Rate – 3.89
3. Shaun Pollock: Wickets: 393, Average = 24.50, Economy Rate – 3.67
4. Javagal Srinath: Wickets: 315, Average = 28.08, Economy Rate – 4.44

How do Muralitharan, Akram, Pollock and Srinath compare with one another with respect to wickets taken and the Economy Rate? The next set of plots compute and plot precisely these analyses.

## 22. Get the bowler’s data

import cricpy.analytics as ca
#akram=ca.getPlayerDataOD(43547,dir=".",file="akram.csv",type="bowling")
#murali=ca.getPlayerDataOD(49636,dir=".",file="murali.csv",type="bowling")
#pollock=ca.getPlayerDataOD(46774,dir=".",file="pollock.csv",type="bowling")
#srinath=ca.getPlayerDataOD(34105,dir=".",file="srinath.csv",type="bowling")

## 23. Wicket Frequency Plot

The plots below compute the percentage frequency of the number of wickets taken, for e.g. 1 wicket x%, 2 wickets y% etc., and plot them as a continuous line for each of the bowlers

import cricpy.analytics as ca
ca.bowlerWktsFreqPercent("./murali.csv","M Muralitharan")
ca.bowlerWktsFreqPercent("./akram.csv","Wasim Akram")
ca.bowlerWktsFreqPercent("./pollock.csv","Shaun Pollock")
ca.bowlerWktsFreqPercent("./srinath.csv","J Srinath")

## 24. Wickets Runs plot

The plot below creates a box plot showing the 1st and 3rd quartile of runs conceded versus the number of wickets taken. Murali’s median runs for wickets is around 40 while for Akram, Pollock and Srinath it is around 32+ runs. The spread around the median is larger for these 3 bowlers in comparison to Murali

import cricpy.analytics as ca
ca.bowlerWktsRunsPlot("./murali.csv","M Muralitharan")
ca.bowlerWktsRunsPlot("./akram.csv","Wasim Akram")
ca.bowlerWktsRunsPlot("./pollock.csv","Shaun Pollock")
ca.bowlerWktsRunsPlot("./srinath.csv","J Srinath")

## 25 Average wickets at different venues

The plot gives the average wickets taken by Muralitharan at different venues. McGrath’s best performances are at Centurion, Lord’s and Port of Spain averaging about 4 wickets. Kapil Dev does well at Kingston and Wellington.
Anderson averages 4 wickets at Dunedin and Nagpur

import cricpy.analytics as ca
ca.bowlerAvgWktsGround("./murali.csv","M Muralitharan")
ca.bowlerAvgWktsGround("./akram.csv","Wasim Akram")
ca.bowlerAvgWktsGround("./pollock.csv","Shaun Pollock")
ca.bowlerAvgWktsGround("./srinath.csv","J Srinath")

## 26 Average wickets against different opposition

The plot gives the average wickets taken by Muralitharan against different countries. The x-axis also includes the number of innings against each team

import cricpy.analytics as ca
ca.bowlerAvgWktsOpposition("./murali.csv","M Muralitharan")
ca.bowlerAvgWktsOpposition("./akram.csv","Wasim Akram")
ca.bowlerAvgWktsOpposition("./pollock.csv","Shaun Pollock")
ca.bowlerAvgWktsOpposition("./srinath.csv","J Srinath")

## 27 Wickets taken moving average

From the plot below it can be seen that James Anderson has had a solid performance over the years

import cricpy.analytics as ca
ca.bowlerMovingAverage("./murali.csv","M Muralitharan")
ca.bowlerMovingAverage("./akram.csv","Wasim Akram")
ca.bowlerMovingAverage("./pollock.csv","Shaun Pollock")
ca.bowlerMovingAverage("./srinath.csv","J Srinath")

## 28 Cumulative average wickets taken

The plots below give the cumulative average wickets taken by the bowlers. Muralitharan has consistently taken wickets at an average of 1.6 wickets per game. Shaun Pollock has an average of 1.5

import cricpy.analytics as ca
ca.bowlerCumulativeAvgWickets("./murali.csv","M Muralitharan")
ca.bowlerCumulativeAvgWickets("./akram.csv","Wasim Akram")
ca.bowlerCumulativeAvgWickets("./pollock.csv","Shaun Pollock")
ca.bowlerCumulativeAvgWickets("./srinath.csv","J Srinath")

## 29 Cumulative average economy rate

The plots below give the cumulative average economy rate of the bowlers. Pollock is the most economical, followed by Akram and then Murali

import cricpy.analytics as ca
ca.bowlerCumulativeAvgEconRate("./murali.csv","M Muralitharan")
ca.bowlerCumulativeAvgEconRate("./akram.csv","Wasim Akram")
ca.bowlerCumulativeAvgEconRate("./pollock.csv","Shaun Pollock")
ca.bowlerCumulativeAvgEconRate("./srinath.csv","J Srinath")

## 30 Relative cumulative average economy rate of bowlers

The Relative cumulative economy rate shows that Pollock is the most economical of the 4 bowlers. He is followed by Akram and then Murali

import cricpy.analytics as ca
frames = ["./srinath.csv","./akram.csv","./murali.csv","pollock.csv"]
names = ["J Srinath","Wasim Akram","M Muralitharan", "S Pollock"]
ca.relativeBowlerCumulativeAvgEconRate(frames,names)

## 31 Relative Economy Rate against wickets taken

Pollock is most economical vs number of wickets taken. Murali has the best figures for 4 wickets taken.

import cricpy.analytics as ca
frames = ["./srinath.csv","./akram.csv","./murali.csv","pollock.csv"]
names = ["J Srinath","Wasim Akram","M Muralitharan", "S Pollock"]
ca.relativeBowlingER(frames,names)

## 32 Relative cumulative average wickets of bowlers in career

The plot below shows that McGrath has the best overall cumulative average wickets. While the bowlers are neck and neck around 130 innings, you can see Muralitharan is most consistent and leads the pack after 150 innings in the number of wickets taken.

import cricpy.analytics as ca
frames = ["./srinath.csv","./akram.csv","./murali.csv","pollock.csv"]
names = ["J Srinath","Wasim Akram","M Muralitharan", "S Pollock"]
ca.relativeBowlerCumulativeAvgWickets(frames,names)

# 33. Key Findings

The plots above capture some of the capabilities and features of my cricpy package. Feel free to install the package and try it out. Please do keep in mind ESPN Cricinfo’s Terms of Use.

Here are the main findings from the analysis above

## Analysis of Top 4 batsmen and bowlers

The analysis of the Top 4 ODI batsmen Kohli, Jayasuriya, Sehwag and Gayle, and of the 4 bowlers Muralitharan, Akram, Pollock and Srinath, shows the following

1. Kohli is a mean run machine and has been consistently piling on runs. Clearly records will lie shattered in the days to come for Kohli
2. Virender Sehwag has the best strike rate of the 4, followed by Jayasuriya and then Kohli
3. Shaun Pollock is the most economical of the bowlers followed by Wasim Akram
4. Muralitharan is the most consistent wicket-taker of the lot.

# Using Linear Programming (LP) for optimizing bowling change or batting lineup in T20 cricket

In my recent post, My travels through the realms of Data Science, Machine Learning, Deep Learning and (AI), I had recounted my journey in the domains of Data Science, Machine Learning (ML), and more recently Deep Learning (DL), all of which are useful while analyzing data. Of late, I have come to the realization that there are many facets to data. To glean insights from data, Data Science, ML and DL alone are not sufficient, and one also needs a good handle on linear programming and optimization. My colleague at IBM Research also concurred with this view and told me he had arrived at this conclusion several years ago.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen and bowlers, besides evaluating team & match performances in Tests, ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr and Beaten by sheer pace-Cricket analytics with yorkr. A must read for any cricket lover! Check it out!!

While ML & DL are very useful and interesting for making inferences and predictions of outputs from input variables, optimization computes the choice of inputs which maximizes or minimizes the output. So I made a small course correction and started on a course from India’s own NPTEL, Introduction to Linear Programming by Prof G. Srinivasan of IIT Madras (highly recommended!). The lectures are delivered with remarkable clarity by the Prof, and I was just about halfway through the course (each lecture is of 50-55 min duration) when I decided that I needed to try to formulate and solve some real world Linear Programming problem.

As usual, I turned towards cricket for some appropriate situations, and sure enough it was there in the open. For this LP formulation I take International T20 and IPL, though International ODI will also work equally well.  You can download the associated code and data for this from Github at LP-cricket-analysis

In T20 matches the captain has to choose how to rotate the bowlers with the aim of restricting the batting side. Conversely, the batsmen need to take advantage of the bowling strength to maximize the runs scored.

Note:
a) A simple and obvious strategy would be
– If the ith bowler’s economy rate is less than the economy rate of the jth bowler i.e.
$er_{i}$ < $er_{j}$ then have bowler ‘i’ bowl more overs as his/her economy rate is better

b) A better strategy would be to consider the economy rate of each bowler against each batsman. How often have we witnessed bowlers with a great bowling average get thrashed time and again by the same batsman, or a bowler who is generally very poor being very effective against a particular batsman, i.e. $er_{ij}$ < $er_{ik}$ where the jth bowler is more effective than the kth bowler against the ith batsman. This now becomes a linear optimization problem, as we can have several combinations of (number of overs × economy rate) for the different bowlers, and we will have to solve this algorithmically to determine the lowest score for the bowling performance or the highest score for the batting order.

This post uses the latter approach to optimize bowling change and batting lineup.

Let us take a hypothetical situation
Assume there are 3 bowlers – $bwlr_{1},bwlr_{2},bwlr_{3}$
and there are 4 batsmen – $bman_{1},bman_{2},bman_{3},bman_{4}$

Let the economy rate $er_{ij}$ be the Economy Rate of the jth bowler to the ith batsman. Also if the remaining overs for the bowlers are $o_{1},o_{2},o_{3}$
and the total number of overs left to be bowled is
$o_{1}+o_{2}+o_{3} = N$ then the question is

a) Given the economy rate of each bowler per batsman, how many overs should each bowler bowl, so that the total runs scored by all the batsmen are minimum?

b) Alternatively, if we know the individual strike rate of a batsman against the individual bowlers, how many overs should each batsman face from each bowler so that the total runs scored is maximized?

## 1. LP Formulation for bowling order

Let the economy rate $er_{ij}$ be the Economy Rate of the jth bowler to the ith batsman.
Objective function : Minimize –
$er_{11}*o_{11} + er_{12}*o_{12} +..+er_{1n}*o_{1n}+ er_{21}*o_{21} + er_{22}*o_{22}+.. + er_{2n}*o_{2n}+..+ er_{m1}*o_{m1}+..+ er_{mn}*o_{mn}$
i.e.
$\sum_{i=1}^{i=m}\sum_{j=1}^{j=n}er_{ij}*o_{ij}$
Constraints
Where $o_{j}$ is the number of overs remaining for the jth bowler against the ‘m’ batsmen
$o_{1j} + o_{2j} + .. + o_{mj} \le o_{j}$
and if the total number of overs remaining to be bowled is N then
$o_{1} + o_{2} +...+ o_{n} = N$ or
$\sum_{j=1}^{j=n} o_{j} =N$
The overs that any bowler can bowl is $o_{j} \ge 0$

## 2. LP Formulation for batting lineup

Let the strike rate $sr_{ij}$  be the Strike Rate of the ith batsman to the jth bowler
Objective function : Maximize –
$sr_{11}*o_{11} + sr_{12}*o_{12} +..+ sr_{1n}*o_{1n}+ sr_{21}*o_{21} + sr_{22}*o_{22}+..+ sr_{2n}*o_{2n}+..+ sr_{m1}*o_{m1}+..+ sr_{mn}*o_{mn}$
i.e.
$\sum_{i=1}^{i=m}\sum_{j=1}^{j=n}sr_{ij}*o_{ij}$
Constraints
Where $o_{j}$ is the number of overs remaining for the jth bowler against the ‘m’ batsmen
$o_{1j} + o_{2j} + .. + o_{mj} \le o_{j}$
and if the total number of overs remaining to be bowled is N then
$o_{1} + o_{2} +...+ o_{n} = N$ or
$\sum_{j=1}^{j=n} o_{j} =N$
The overs that any bowler can bowl is
$o_{j} \ge 0$
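When such a formulation is handed to a solver, the overs variables are usually flattened into one vector ($o_{11}, o_{12}, .., o_{mn}$) and every constraint becomes a row of 0/1 coefficients over that vector. The sketch below, in plain Python, shows how such rows can be generated. The helper names `group_rows` and `unit_rows` are hypothetical and are not part of any LP library.

```python
def group_rows(groups, per_group):
    """One 0/1 row per group of consecutive variables.

    Row g has 1s for the `per_group` variables of group g and 0s elsewhere,
    which is the shape needed for a 'sum of this group <= limit' constraint.
    """
    size = groups * per_group
    rows = []
    for g in range(groups):
        row = [0] * size
        for k in range(per_group):
            row[g * per_group + k] = 1
        rows.append(row)
    return rows


def unit_rows(size):
    """One row per variable, used for per-variable bounds such as o_ij >= 1."""
    return [[1 if col == r else 0 for col in range(size)] for r in range(size)]


# 2 groups of 2 variables: o11, o12 | o21, o22
print(group_rows(2, 2))   # rows for the two group limits
print([1] * 4)            # row for the 'total overs = N' constraint
print(unit_rows(4))       # rows for the individual lower bounds
```

Each printed row corresponds to the coefficient vector of one constraint; the right-hand side (the limit or N) is supplied separately to the solver.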

lpSolveAPI – For these maximization and minimization problems I used the R package lpSolveAPI.

Below I take 2 simple examples (example1 & 2)  to ensure that my LP formulation and solution is correct before applying it on real T20 cricket data (Intl. T20 and IPL)

## 3. LP formulation (Example 1)

Initially I created a test example to ensure that I get the LP formulation and solution correct. Here er1=4 and er2=3 are the economy rates, and o1 & o2 are the overs bowled by bowlers 1 & 2. Also o1+o2=4. The possible combinations in this example are as below

o1 o2 Obj Fun(=4o1+3o2)
1    3      13
2    2      14
3    1      15

library(lpSolveAPI)
library(dplyr)
library(knitr)
lprec <- make.lp(0, 2)
a <-lp.control(lprec, sense="min")
set.objfn(lprec, c(4, 3))  # Economy Rate of 4 and 3 for er1 and er2
add.constraint(lprec, c(1, 1), "=",4)  # o1 + o2 =4
add.constraint(lprec, c(1, 0), ">=",1)  # o1 >= 1
add.constraint(lprec, c(0, 1), ">=",1)  # o2 >= 1
lprec
## Model name:
##             C1    C2
## Minimize     4     3
## R1           1     1   =  4
## R2           1     0  >=  1
## R3           0     1  >=  1
## Kind       Std   Std
## Type      Real  Real
## Upper      Inf   Inf
## Lower        0     0
b <-solve(lprec)
get.objective(lprec) # 13
## [1] 13
get.variables(lprec) # 1    3 
## [1] 1 3

Note 1: In the above example 13 runs is the minimum that can be scored and this requires

LP solution:
Minimum runs=13

• o1=1
• o2=3

Note 2:The numbers in the columns represent the number of overs that need to be bowled by a bowler to the corresponding batsman.
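
As a cross-check, the same answer can be derived by hand. This is only a worked sketch using the numbers of this example. Substituting the equality constraint into the objective gives

$o_{2} = 4 - o_{1}$
$4*o_{1} + 3*o_{2} = 4*o_{1} + 3*(4 - o_{1}) = o_{1} + 12$

With $o_{1} \ge 1$ and $o_{2} \ge 1$, i.e. $1 \le o_{1} \le 3$, the objective grows with $o_{1}$. The minimum is therefore at $o_{1}=1$, $o_{2}=3$, giving $1 + 12 = 13$ runs, which agrees with the solver output.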

## 4. LP formulation (Example 2)

In this formulation there are 2 bowlers and 2 batsmen. o11, o12 are the overs bowled by bowler 1 to batsmen 1 & 2, and o21, o22 are the overs bowled by bowler 2 to batsmen 1 & 2. Here er11=4, er12=2, er21=2, er22=5 and o11+o12+o21+o22=5

The manually computed combinations, with each of o11, o12, o21, o22 being at least 1, are

o11  o12  o21  o22  Runs=(4*o11+2*o12+2*o21+5*o22)
1    1    1    2    18
1    2    1    1    15
2    1    1    1    17
1    1    2    1    15

lprec <- make.lp(0, 4)
a <-lp.control(lprec, sense="min")
set.objfn(lprec, c(4, 2,2,5))
add.constraint(lprec, c(1, 1,0,0), "<=",8)
add.constraint(lprec, c(0, 0,1,1), "<=",7)
add.constraint(lprec, c(1, 1,1,1), "=",5)
add.constraint(lprec, c(1, 0,0,0), ">",1)
add.constraint(lprec, c(0, 1,0,0), ">",1)
add.constraint(lprec, c(0, 0,1,0), ">",1)
add.constraint(lprec, c(0, 0,0,1), ">",1)
lprec
## Model name:
##             C1    C2    C3    C4
## Minimize     4     2     2     5
## R1           1     1     0     0  <=  8
## R2           0     0     1     1  <=  7
## R3           1     1     1     1   =  5
## R4           1     0     0     0  >=  1
## R5           0     1     0     0  >=  1
## R6           0     0     1     0  >=  1
## R7           0     0     0     1  >=  1
## Kind       Std   Std   Std   Std
## Type      Real  Real  Real  Real
## Upper      Inf   Inf   Inf   Inf
## Lower        0     0     0     0
b<-solve(lprec)
get.objective(lprec) 
## [1] 15
get.variables(lprec) 
## [1] 1 2 1 1

Note: In the above example 15 runs is the minimum that can be scored and this requires

LP Solution:
Minimum runs=15

• o11=1
• o12=2
• o21=1
• o22=1

It is also possible to set the minimum number of overs to other values and solve again.
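
For problems this small, the solver result can be sanity-checked by brute force. The sketch below is plain Python, not lpSolveAPI, and it assumes whole overs only, whereas the LP itself allows fractional values. The function name `best_allocations` is hypothetical. The per-bowler limits of 8 and 7 overs are left out because they cannot bind when only 5 overs remain.

```python
from itertools import product


def best_allocations(rates, total, min_each=1):
    """Enumerate whole-over allocations that sum to `total`.

    Returns (minimum runs, list of allocations that achieve it).
    """
    best_runs = None
    best = []
    for overs in product(range(min_each, total + 1), repeat=len(rates)):
        if sum(overs) != total:
            continue  # violates the 'total overs' equality constraint
        runs = sum(rate * o for rate, o in zip(rates, overs))
        if best_runs is None or runs < best_runs:
            best_runs, best = runs, [overs]
        elif runs == best_runs:
            best.append(overs)
    return best_runs, best


# Example 2: er11=4, er12=2, er21=2, er22=5 and 5 overs in total
runs, allocations = best_allocations([4, 2, 2, 5], total=5)
print(runs, allocations)
```

The enumeration reports the same minimum as the solver and also shows that two allocations tie at that minimum, matching the two rows with 15 runs in the manual table. The solver returns one of them.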

## 5. LP formulation for International T20 India vs Australia (Batting lineup)

To analyze batting and bowling lineups in the cricket world I needed to get the ball-by-ball details of runs scored by each batsman against each of the bowlers. Fortunately I had already created this with my R package yorkr. yorkr processes yaml data from Cricsheet. So I copied the data of all matches between Australia and India in International T20s. You can download my processed data for International T20 at Inswinger

load("Australia-India-allMatches.RData")
dim(matches)
## [1] 3541   25

The following functions compute the ‘Strike Rate’ of a batsman, in runs per over, as

$SR=\frac{RunsScored}{overs}$

Also the Economy Rate is computed as

$ER=\frac{RunsConceded}{overs}$

Incidentally, for a given batsman and bowler pair, SR=ER

# Compute the Strike Rate of the batsman
computeSR <- function(batsman1,bowler1){
a <- matches %>% filter(batsman==batsman1 & bowler==bowler1)
a1 <- a %>% summarize(totalRuns=sum(runs),count=n()) %>% mutate(SR=(totalRuns/count)*6)
a1
}

# Compute the Economy Rate of the bowler against the batsman
computeER <- function(batsman1,bowler1){
a <- matches %>% filter(batsman==batsman1 & bowler==bowler1)
a1 <- a %>% summarize(totalRuns=sum(runs),count=n()) %>% mutate(ER=(totalRuns/count)*6)
a1
}
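
The same runs-per-over computation can be sketched in plain Python. This is only an illustration of the arithmetic. The record layout (a list of dicts with `batsman`, `bowler` and `runs` keys) and the sample deliveries are assumptions made for this example and are not the yorkr data format.

```python
def runs_per_over(deliveries, batsman, bowler):
    """Runs per over for one batsman-bowler pair: (total runs / balls) * 6.

    Returns None when the pair never faced each other in `deliveries`.
    """
    balls = [d for d in deliveries
             if d["batsman"] == batsman and d["bowler"] == bowler]
    if not balls:
        return None
    total_runs = sum(d["runs"] for d in balls)
    return total_runs / len(balls) * 6


# Hypothetical ball-by-ball records, for illustration only
deliveries = [
    {"batsman": "A", "bowler": "X", "runs": r} for r in [1, 0, 4, 2, 0, 1]
] + [
    {"batsman": "B", "bowler": "X", "runs": r} for r in [0, 6, 0]
]

print(runs_per_over(deliveries, "A", "X"))  # 8 runs off 6 balls
print(runs_per_over(deliveries, "B", "X"))  # 6 runs off 3 balls
```

Seen from the batsman's side this number is the strike rate used in this post, and seen from the bowler's side it is the economy rate, which is why the two R functions above have the same body.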

Here I compute the Strike Rate of Virat Kohli, Yuvraj Singh and MS Dhoni against Shane Watson, Brett Lee and MA Starc

 # Kohli
kohliWatson<- computeSR("V Kohli","SR Watson")
kohliWatson
##   totalRuns count       SR
## 1        45    37 7.297297
kohliLee <- computeSR("V Kohli","B Lee")
kohliLee
##   totalRuns count       SR
## 1        10     7 8.571429
kohliStarc <- computeSR("V Kohli","MA Starc")
kohliStarc
##   totalRuns count       SR
## 1        11     9 7.333333
# Yuvraj
yuvrajWatson<- computeSR("Yuvraj Singh","SR Watson")
yuvrajWatson
##   totalRuns count       SR
## 1        24    22 6.545455
yuvrajLee <- computeSR("Yuvraj Singh","B Lee")
yuvrajLee
##   totalRuns count       SR
## 1        12     7 10.28571
yuvrajStarc <- computeSR("Yuvraj Singh","MA Starc")
yuvrajStarc
##   totalRuns count SR
## 1        12     8  9
# MS Dhoni
dhoniWatson<- computeSR("MS Dhoni","SR Watson")
dhoniWatson
##   totalRuns count       SR
## 1        33    28 7.071429
dhoniLee <- computeSR("MS Dhoni","B Lee")
dhoniLee
##   totalRuns count  SR
## 1        26    20 7.8
dhoniStarc <- computeSR("MS Dhoni","MA Starc")
dhoniStarc
##   totalRuns count   SR
## 1        11     8 8.25

When we consider the batting lineup, the problem is one of maximization. In the LP formulation below V Kohli has a SR of 7.29, 8.57, 7.33 against Watson, Lee & Starc
Yuvraj has a SR of 6.5, 10.28, 9 against Watson, Lee & Starc
and Dhoni has a SR of 7.07, 7.8,  8.25 against Watson, Lee and Starc

The constraints are Watson, Lee and Starc have 3, 4 & 3 overs remaining respectively. The total number of overs remaining to be bowled is 9. Other constraints could be added, for e.g. that a bowler bowls at least 1 over.
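
For any candidate allocation of overs, the objective is simply the strike rate multiplied by the overs, summed over all batsman-bowler pairs. The plain Python sketch below evaluates one candidate allocation using the totalRuns and count values printed above. The dictionary layout and the function name `expected_runs` are assumptions made for this illustration.

```python
# (totalRuns, count) for each batsman-bowler pair, from the computeSR output above
records = {
    ("Kohli", "Watson"): (45, 37), ("Kohli", "Lee"): (10, 7), ("Kohli", "Starc"): (11, 9),
    ("Yuvraj", "Watson"): (24, 22), ("Yuvraj", "Lee"): (12, 7), ("Yuvraj", "Starc"): (12, 8),
    ("Dhoni", "Watson"): (33, 28), ("Dhoni", "Lee"): (26, 20), ("Dhoni", "Starc"): (11, 8),
}


def expected_runs(allocation):
    """Sum of (runs per over) * (overs) over all pairs; missing pairs count as 0 overs."""
    return sum(runs / balls * 6 * allocation.get(pair, 0)
               for pair, (runs, balls) in records.items())


# One candidate allocation of the 9 overs (overs per batsman-bowler pair)
candidate = {
    ("Kohli", "Watson"): 1, ("Kohli", "Lee"): 2,
    ("Yuvraj", "Watson"): 1, ("Yuvraj", "Lee"): 3,
    ("Dhoni", "Watson"): 1, ("Dhoni", "Starc"): 1,
}
print(sum(candidate.values()), round(expected_runs(candidate), 2))
```

Trying other allocations with this helper is a quick way to see how the total changes before handing the problem to the LP solver.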

Formulating and solving

# 3 batsman x 3 bowlers
lprec <- make.lp(0, 9)
# Maximization
a<-lp.control(lprec, sense="max")

# Set the objective function
set.objfn(lprec, c(kohliWatson$SR, kohliLee$SR, kohliStarc$SR,
                   yuvrajWatson$SR, yuvrajLee$SR, yuvrajStarc$SR,
                   dhoniWatson$SR, dhoniLee$SR, dhoniStarc$SR))

# The variables are grouped by batsman (Kohli, Yuvraj, Dhoni), so these
# constraints cap the overs for each group at 3, 4 and 3 respectively
add.constraint(lprec, c(1,1,1,0,0,0,0,0,0), "<=", 3)
add.constraint(lprec, c(0,0,0,1,1,1,0,0,0), "<=", 4)
add.constraint(lprec, c(0,0,0,0,0,0,1,1,1), "<=", 3)
# o11+o12+o13+o21+o22+o23+o31+o32+o33 = 9 (overs remaining)
add.constraint(lprec, c(1,1,1,1,1,1,1,1,1), "=", 9)

add.constraint(lprec, c(1,0,0,0,0,0,0,0,0), ">=", 1) # o11 >= 1
add.constraint(lprec, c(0,1,0,0,0,0,0,0,0), ">=", 0) # o12 >= 0
add.constraint(lprec, c(0,0,1,0,0,0,0,0,0), ">=", 0) # o13 >= 0
add.constraint(lprec, c(0,0,0,1,0,0,0,0,0), ">=", 1) # o21 >= 1
add.constraint(lprec, c(0,0,0,0,1,0,0,0,0), ">=", 1) # o22 >= 1
add.constraint(lprec, c(0,0,0,0,0,1,0,0,0), ">=", 0) # o23 >= 0
add.constraint(lprec, c(0,0,0,0,0,0,1,0,0), ">=", 1) # o31 >= 1
add.constraint(lprec, c(0,0,0,0,0,0,0,1,0), ">=", 0) # o32 >= 0
add.constraint(lprec, c(0,0,0,0,0,0,0,0,1), ">=", 0) # o33 >= 0

lprec
## Model name:
##   a linear program with 9 decision variables and 13 constraints
b <- solve(lprec)
get.objective(lprec)
## [1] 77.16418
get.variables(lprec)
## [1] 1 2 0 1 3 0 1 0 1

This shows that the maximum runs that can be scored at the current strike rates is 77.16 in 9 overs. The breakup is shown below.

e <- as.data.frame(rbind(c(1,2,0,3),c(1,3,0,4),c(1,0,1,2)))
names(e) <- c("S Watson","B Lee","MA Starc","Overs")
rownames(e) <- c("Kohli","Yuvraj","Dhoni")
e

LP Solution:
Maximum runs that can be scored by India against Australia is 77.164 if the 9 overs to be faced by the batsmen are as below

##        S Watson B Lee MA Starc Overs
## Kohli         1     2        0     3
## Yuvraj        1     3        0     4
## Dhoni         1     0        1     2
#Total overs=9

Note: This assumes that the batsmen perform at their current strike rate. Anything can happen in a real game, but this is nevertheless a fairly reasonable estimate of the performance.

Note 2: The numbers in the columns represent the number of overs that need to be bowled by a bowler to the corresponding batsman.

Note 3: You could try other combinations of overs for the above SR. For the above constraints, 77.16 is the highest score for the given number of overs.

## 6. LP formulation for International T20 India vs Australia (Bowling lineup)

For this I compute how the bowling should be rotated between R Ashwin, RA Jadeja and JJ Bumrah, taking into account their performance against batsmen like Shane Watson, AJ Finch and David Warner. For the bowling performance I take the Economy Rate (ER) of the bowlers. The data is the same as above.

library(dplyr)  # provides %>%, filter(), summarize(), mutate() and n()

# Economy rate (runs per over) conceded by bowler1 to batsman1
computeER <- function(batsman1,bowler1){
    a <- matches %>% filter(batsman==batsman1 & bowler==bowler1)
    a1 <- a %>% summarize(totalRuns=sum(runs),count=n()) %>% mutate(ER=(totalRuns/count)*6)
    a1
}

# RA Jadeja
jadejaWatson <- computeER("SR Watson","RA Jadeja")
jadejaWatson
##   totalRuns count       ER
## 1        60    29 12.41379
jadejaFinch <- computeER("AJ Finch","RA Jadeja")
jadejaFinch
##   totalRuns count       ER
## 1        36    33 6.545455
jadejaWarner <- computeER("DA Warner","RA Jadeja")
jadejaWarner
##   totalRuns count       ER
## 1        23    11 12.54545

# R Ashwin
ashwinWatson <- computeER("SR Watson","R Ashwin")
ashwinWatson
##   totalRuns count       ER
## 1        41    26 9.461538
ashwinFinch <- computeER("AJ Finch","R Ashwin")
ashwinFinch
##   totalRuns count   ER
## 1        63    36 10.5
ashwinWarner <- computeER("DA Warner","R Ashwin")
ashwinWarner
##   totalRuns count       ER
## 1        38    28 8.142857

# JJ Bumrah
bumrahWatson <- computeER("SR Watson","JJ Bumrah")
bumrahWatson
##   totalRuns count  ER
## 1        22    20 6.6
bumrahFinch <- computeER("AJ Finch","JJ Bumrah")
bumrahFinch
##   totalRuns count       ER
## 1        25    19 7.894737
bumrahWarner <- computeER("DA Warner","JJ Bumrah")
bumrahWarner
##   totalRuns count ER
## 1         2     4  3

As can be seen from above, RA Jadeja has an ER of 12.4, 6.54 and 12.54 against Watson, Finch and Warner, while Ashwin has an ER of 9.46, 10.5 and 8.14 against the same batsmen. Similarly, Bumrah has an ER of 6.6, 7.89 and 3 against Watson, Finch and Warner.

The constraints are that Jadeja, Ashwin and Bumrah have 4, 3 & 4 overs remaining and the total overs remaining to be bowled is 10.

Formulating and solving the bowling lineup is shown below

lprec <- make.lp(0, 9)
a <- lp.control(lprec, sense="min")

# Set the objective function
set.objfn(lprec, c(jadejaWatson$ER, jadejaFinch$ER, jadejaWarner$ER,
                   ashwinWatson$ER, ashwinFinch$ER, ashwinWarner$ER,
                   bumrahWatson$ER, bumrahFinch$ER, bumrahWarner$ER))

add.constraint(lprec, c(1, 1,1,0,0,0, 0,0,0), "<=",4) # Jadeja has 4 overs
add.constraint(lprec, c(0,0,0,1,1,1,0,0,0), "<=",3)   # Ashwin has 3 overs left
add.constraint(lprec, c(0,0,0,0,0,0,1,1,1), "<=",4)   # Bumrah has 4 overs left
add.constraint(lprec, c(1,1,1,1,1,1,1,1,1), "=",10) # Total overs = 10

lprec
## Model name:
##   a linear program with 9 decision variables and 13 constraints
b <-solve(lprec)
get.objective(lprec) #  
## [1] 73.58775
get.variables(lprec) # 
## [1] 1 2 1 0 1 1 0 1 3

The minimum runs that will be conceded by these 3 bowlers in 10 overs is 73.58 assuming the bowling is rotated as follows

e <- as.data.frame(rbind(c(1,0,0),c(2,1,1),c(1,1,3),c(4,2,4)))
names(e) <- c("RA Jadeja","R Ashwin","JJ Bumrah")
rownames(e) <- c("S Watson","AJ Finch","DA Warner","Overs")
e 

LP Solution:
Minimum runs that will be conceded by India against Australia is 73.58 in 10 overs if the overs bowled are as follows

##           RA Jadeja R Ashwin JJ Bumrah
## S Watson          1        0         0
## AJ Finch          2        1         1
## DA Warner         1        1         3
## Overs             4        2         4
#Total overs=10  
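
As a quick sanity check on the solution above, the objective value can be recomputed in base R from the run and ball counts printed earlier. This is only a sketch: the names `runs`, `balls`, `er`, `overs` and `alloc` are mine, and the allocation is copied from the `get.variables()` output rather than re-solved.

```r
# Runs conceded and balls bowled, copied from the computeER() output above.
# Order: Jadeja (Watson, Finch, Warner), Ashwin (same), Bumrah (same)
runs  <- c(60, 36, 23, 41, 63, 38, 22, 25, 2)
balls <- c(29, 33, 11, 26, 36, 28, 20, 19, 4)
er    <- runs / balls * 6          # economy rate = runs per over

# Allocation copied from get.variables(lprec)
overs <- c(1, 2, 1, 0, 1, 1, 0, 1, 3)

# Objective = sum over all pairs of (economy rate x overs bowled)
total <- sum(er * overs)

# matrix() fills column-wise, so each column holds one bowler's overs
alloc <- matrix(overs, nrow = 3,
                dimnames = list(c("S Watson", "AJ Finch", "DA Warner"),
                                c("RA Jadeja", "R Ashwin", "JJ Bumrah")))

print(round(total, 5))
print(rbind(alloc, Overs = colSums(alloc)))
```

Building the table from the solution vector, instead of typing the `rbind()` rows by hand, avoids transcription slips when the constraints change.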

## 7. LP formulation for IPL (Mumbai Indians – Kolkata Knight Riders – Bowling lineup)

As in the case of International T20s, I have also processed IPL data using my R package yorkr. yorkr processes YAML data from Cricsheet. The processed data for all IPL matches can be downloaded from GooglyPlus.

load("Mumbai Indians-Kolkata Knight Riders-allMatches.RData")
dim(matches)
## [1] 4237   25
# Compute the Economy Rate of the Mumbai Indian bowlers against Kolkata Knight Riders

# Gambhir
gambhirMalinga <- computeER("G Gambhir","SL Malinga")
gambhirHarbhajan <- computeER("G Gambhir","Harbhajan Singh")
gambhirPollard <- computeER("G Gambhir","KA Pollard")

#Yusuf Pathan
yusufMalinga <- computeER("YK Pathan","SL Malinga")
yusufHarbhajan <- computeER("YK Pathan","Harbhajan Singh")
yusufPollard <- computeER("YK Pathan","KA Pollard")

#JH Kallis
kallisMalinga <- computeER("JH Kallis","SL Malinga")
kallisHarbhajan <- computeER("JH Kallis","Harbhajan Singh")
kallisPollard <- computeER("JH Kallis","KA Pollard")

#RV Uthappa
uthappaMalinga <- computeER("RV Uthappa","SL Malinga")
uthappaHarbhajan <- computeER("RV Uthappa","Harbhajan Singh")
uthappaPollard <- computeER("RV Uthappa","KA Pollard")

Here

gambhirMalinga, yusufMalinga, kallisMalinga, uthappaMalinga are the ERs of Malinga against Gambhir, Yusuf Pathan, Kallis and Uthappa
gambhirHarbhajan, yusufHarbhajan, kallisHarbhajan, uthappaHarbhajan are the ERs of Harbhajan against Gambhir, Yusuf Pathan, Kallis and Uthappa
gambhirPollard, yusufPollard, kallisPollard, uthappaPollard are the ERs of Kieron Pollard against Gambhir, Yusuf Pathan, Kallis and Uthappa

The constraints are that Malinga, Harbhajan and Pollard have 4 overs each and the remaining overs to be bowled total 10.
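
To make the quantity concrete, here is a minimal base R sketch of the same calculation that `computeER()` performs, run on a small made-up ball-by-ball data frame. The data and the helper name `economyRate` are hypothetical; only the formula (runs per ball times 6) comes from this post. It assumes one row per delivery with `batsman`, `bowler` and `runs` columns.

```r
# Hypothetical ball-by-ball data: one row per delivery
deliveries <- data.frame(
  batsman = c("A", "A", "A", "B", "A", "A", "A"),
  bowler  = c("X", "X", "X", "X", "Y", "X", "X"),
  runs    = c(1, 4, 0, 6, 2, 0, 1)
)

# Runs per over conceded by bowler1 to batsman1
economyRate <- function(df, batsman1, bowler1) {
  d <- df[df$batsman == batsman1 & df$bowler == bowler1, ]
  data.frame(totalRuns = sum(d$runs),
             count     = nrow(d),
             ER        = sum(d$runs) / nrow(d) * 6)
}

erAX <- economyRate(deliveries, "A", "X")
print(erAX)
```

Note that the rate is expressed in runs per over, which is why the same number can be read as the batsman's scoring rate against that bowler.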

Formulating and solving this for the bowling lineup of Mumbai Indians against Kolkata Knight Riders

library("lpSolveAPI")
lprec <- make.lp(0, 12)
a <- lp.control(lprec, sense="min")

set.objfn(lprec, c(gambhirMalinga$ER, yusufMalinga$ER,kallisMalinga$ER,uthappaMalinga$ER,
gambhirHarbhajan$ER,yusufHarbhajan$ER,kallisHarbhajan$ER,uthappaHarbhajan$ER,
gambhirPollard$ER,yusufPollard$ER,kallisPollard$ER,uthappaPollard$ER))

add.constraint(lprec, c(1,1,1,1, 0,0,0,0, 0,0,0,0), "<=",4)

lprec
## Model name:
##   a linear program with 12 decision variables and 16 constraints
b <- solve(lprec)
get.objective(lprec)
## [1] 55.57887
get.variables(lprec)
##  [1] 3 1 0 0 0 1 0 1 3 1 0 0
e <- as.data.frame(rbind(c(3,1,0,0,4),c(0, 1, 0,1,2),c(3, 1, 0,0,4)))
names(e) <- c("Gambhir","Yusuf","Kallis","Uthappa","Overs")
rownames(e) <- c("Malinga","Harbhajan","Pollard")
e

LP Solution: Mumbai Indians can restrict Kolkata Knight Riders to 55.58 in 10 overs
if the overs are bowled as below

##           Gambhir Yusuf Kallis Uthappa Overs
## Malinga         3     1      0       0     4
## Harbhajan       0     1      0       1     2
## Pollard         3     1      0       0     4
#Total overs=10  
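
Only the first constraint row is shown above. Assuming the remaining per-bowler rows follow the same block pattern as in the previous section (this is my assumption; those rows are not listed in this post), the rows can be generated rather than typed by hand. The sketch below uses base R only, and the names `A`, `x` and `oversPerBowler` are illustrative. It checks the published allocation against the generated rows.

```r
nBowlers <- 3   # Malinga, Harbhajan, Pollard
nBatsmen <- 4   # Gambhir, Yusuf, Kallis, Uthappa

# One row per bowler, with 1s over that bowler's four decision variables
A <- kronecker(diag(nBowlers), matrix(1, nrow = 1, ncol = nBatsmen))
rownames(A) <- c("Malinga", "Harbhajan", "Pollard")

# Allocation copied from get.variables(lprec)
x <- c(3, 1, 0, 0, 0, 1, 0, 1, 3, 1, 0, 0)

# Overs bowled by each bowler under this allocation
oversPerBowler <- drop(A %*% x)

print(A[1, ])          # same row as the add.constraint() call above
print(oversPerBowler)
```

Each row of `A` could then be passed to `add.constraint(lprec, A[i, ], "<=", 4)` in a loop, which keeps the model in step with the number of bowlers and batsmen.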

## 8. LP formulation for IPL (Mumbai Indians – Kolkata Knight Riders – Batting lineup)

As I mentioned, it is possible to perform a maximization with the same formulation, since computeSR and computeER compute the same quantity.

This just flips the problem around and computes the maximum runs that can be scored given each batsman’s Strike Rate (which is the same number as the bowler’s Economy Rate), i.e.

gambhirMalinga, yusufMalinga, kallisMalinga, uthappaMalinga are the SRs of Gambhir, Yusuf Pathan, Kallis and Uthappa against Malinga
gambhirHarbhajan, yusufHarbhajan, kallisHarbhajan, uthappaHarbhajan are the SRs of Gambhir, Yusuf Pathan, Kallis and Uthappa against Harbhajan
gambhirPollard, yusufPollard, kallisPollard, uthappaPollard are the SRs of Gambhir, Yusuf Pathan, Kallis and Uthappa against Kieron Pollard.

The constraints are that Malinga, Harbhajan and Pollard have 4 overs each and the remaining overs to be bowled total 10.

library("lpSolveAPI")
lprec <- make.lp(0, 12)
a <- lp.control(lprec, sense="max")

# The ER values double as the batsmen's strike rates here
a <- set.objfn(lprec, c(gambhirMalinga$ER, yusufMalinga$ER, kallisMalinga$ER, uthappaMalinga$ER,
                        gambhirHarbhajan$ER, yusufHarbhajan$ER, kallisHarbhajan$ER, uthappaHarbhajan$ER,
                        gambhirPollard$ER, yusufPollard$ER, kallisPollard$ER, uthappaPollard$ER))

add.constraint(lprec, c(1,1,1,1, 0,0,0,0, 0,0,0,0), "<=",4)

lprec
## Model name:
##   a linear program with 12 decision variables and 16 constraints
b <- solve(lprec)
get.objective(lprec)
## [1] 94.22649
get.variables(lprec)
##  [1] 0 3 0 0 0 1 0 3 0 1 3 0
e <- as.data.frame(rbind(c(0,3,0,0,3),c(0, 1, 0,3,4),c(0, 1, 3,0,4)))
names(e) <- c("Gambhir","Yusuf","Kallis","Uthappa","Overs")
rownames(e) <- c("Malinga","Harbhajan","Pollard")
e

LP Solution: Kolkata Knight Riders can score a maximum of 94.22 in 11 overs against Mumbai Indians
if the number of overs KKR face is as below

##           Gambhir Yusuf Kallis Uthappa Overs
## Malinga         0     3      0       0     3
## Harbhajan       0     1      0       3     4
## Pollard         0     1      3       0     4
#Total overs=11  

Conclusion: It is thus possible to determine the optimum number of overs to give to a specific bowler based on his/her Economy Rate against a particular batsman. Similarly, one can determine the maximum runs that can be scored by batsmen based on their Strike Rate against particular bowlers. Cricket, like many other games, is a game of strategy, skill, talent and some amount of luck. So while the LP formulation can provide some direction, one must be aware that anything could happen in a game of cricket!

Thoughts, comments, suggestions welcome!

To see all posts see Index of Posts

# cricketr flexes new muscles: The final analysis

Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

       Jabberwocky by Lewis Carroll


No analysis of cricket is complete without determining how players perform away from home. Playing Test cricket on foreign pitches, in the host country, is a ‘real test’ for both batsmen and bowlers. Players who can perform consistently on both domestic and foreign pitches are the genuinely ‘class’ players. Performance on foreign pitches lets us single out the paper tigers and home-ground bullies among batsmen. Similarly, spinners who perform well only on rank turners at home, or pace bowlers who can swing and generate bounce only on specially prepared pitches, are neither genuine spinners nor real pace bowlers.

So this post helps in identifying those with real strengths, and those who play well only when the conditions are in their favor, on home grounds. This post brings a certain level of finality to the analysis of players with my R package ‘cricketr’.

Besides, I also meant ‘final analysis’ in the literal sense, as I intend to take a long break from cricket analysis/analytics and focus on some other domains like Neural Networks, Deep Learning and Spark.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of the performances of both batsmen and bowlers, besides evaluating team & match performances in Tests, ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the Kindle versions. The books can be accessed at Cricket analytics with cricketr and Beaten by sheer pace-Cricket analytics with yorkr. A must read for any cricket lover! Check it out!!

As already mentioned, my R package ‘cricketr’ uses the statistics info available in ESPN Cricinfo Statsguru. You should be able to install the package from CRAN and use many of the functions available in the package. Please be mindful of ESPN Cricinfo Terms of Use

(Note: This page is also hosted at RPubs as cricketrFinalAnalysis. You can download the PDF file at cricketrFinalAnalysis.)

Important note: Do check out my other posts using cricketr at cricketr-posts

To get the data of a player against a particular country for matches played in the host country, I just had to add 2 extra parameters to the getPlayerData() function. The cricketr package has been updated with the changed functions: getPlayerData() for Tests, getPlayerDataOD() for ODIs and getPlayerDataTT() for Twenty20s. The updated functions will be available in cricketr version 0.0.14.

The data for the following players has already been obtained with the new, changed getPlayerData() function and saved as *.csv files. I will be re-using these files, instead of getting them all over again. Hence the getPlayerData() lines have been commented out below.

library(cricketr)

## 1. Performance of a batsman against a host country in the host country

For example, we can get the data for Sachin Tendulkar for matches played against Australia and in Australia. Here opposition=2 and host=2 indicate that the opposition is Australia and the host country is also Australia.

#tendulkarAus=getPlayerData(35320,opposition=2,host=2,file="tendulkarVsAusInAus.csv",type="batting")

All cricketr functions can be used with this data frame, as before. All the charts show the performance of Tendulkar in Australia against Australia.

par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
batsman4s("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsman6s("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsmanRunsRanges("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsmanDismissals("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsmanAvgRunsGround("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsmanMovingAverage("./data/tendulkarVsAusInAus.csv","Tendulkar")

dev.off()
## null device
##           1

## 2. Relative performances of international batsmen against England in England

While we can analyze the performance of a player against an opposition in some host country, I wanted to compare the relative performances of players, to see how players from different nations play in a host country which is not their home ground.

The following lines get the players’ data for matches played in England and against England. The Oval and Lord’s are famous for generating some dangerous swing and bounce. I chose the following players

1. Sir Don Bradman (Australia)
2. Steve Waugh (Australia)
3. Rahul Dravid (India)
4. Vivian Richards (West Indies)
5. Sachin Tendulkar (India)
#tendulkarEng=getPlayerData(35320,opposition=1,host=1,file="tendulkarVsEngInEng.csv",type="batting")
#srwaughEng=getPlayerData(8192,opposition=1,host=1,file="srwaughVsEngInEng.csv",type="batting")
#dravidEng=getPlayerData(28114,opposition=1,host=1,file="dravidVsEngInEng.csv",type="batting")
#vrichardEng=getPlayerData(52812,opposition=1,host=1,file="vrichardsEngInEng.csv",type="batting")
frames <- list("./data/tendulkarVsEngInEng.csv","./data/bradmanVsEngInEng.csv","./data/srwaughVsEngInEng.csv",
"./data/dravidVsEngInEng.csv","./data/vrichardsEngInEng.csv")
names <- list("S Tendulkar","D Bradman","SR Waugh","R Dravid","Viv Richards")

Lord’s and the Oval in England are some of the best pitches in the world. Scoring on these pitches and in these weather conditions, where there is both swing and bounce, requires excellent batting skills. It can easily be seen that Don Bradman stands head and shoulders above everybody else, with a cumulative average of 100+. He is followed by Viv Richards, who averages around 60. Interestingly, in English conditions, Rahul Dravid edges out Sachin Tendulkar.

relativeBatsmanCumulativeAvgRuns(frames,names)
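
For readers unfamiliar with the term, a cumulative average is the running mean of a batsman’s scores up to each innings. The base R sketch below illustrates the idea on made-up scores. It is not cricketr’s implementation, the vector `runs` is hypothetical, and it ignores not-outs, which a conventional batting average treats differently.

```r
# Hypothetical runs scored in successive innings
runs <- c(10, 50, 0, 100, 40)

# Running mean after each innings: total runs so far / innings so far
cumAvg <- cumsum(runs) / seq_along(runs)
print(cumAvg)
```

A single big score lifts the curve sharply early in a career and much less later on, which is why the plots flatten as the number of innings grows.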

# The other 2 plots, on relative strike rate and cumulative average strike rate,
# show that Viv Richards really blasts the bowling. Viv Richards has a strike rate
# of 70 and Bradman 62+, followed by Tendulkar.
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

## 3. Relative performances of international batsmen against Australia in Australia

The following players from these countries were chosen

1. Sachin Tendulkar (India)
2. Viv Richards (West Indies)
3. David Gower (England)
4. Jacques Kallis (South Africa)
5. Alastair Cook (England)
frames <- list("./data/tendulkarVsAusInAus.csv","./data/vrichardsVAusInAus.csv","./data/dgowerVsAusInAus.csv",
"./data/kallisVsAusInAus.csv","./data/ancookVsWIInWI.csv")
names <- list("S Tendulkar","Viv Richards","David Gower","J Kallis","AN Cook")

Alastair Cook of England has a fantastic cumulative average of 55+ on the pitches of Australia. There is a dip towards the end, but we cannot predict whether it would have continued. AN Cook is followed by Tendulkar, who has a steady average of 50+ runs, and then by Viv Richards.

relativeBatsmanCumulativeAvgRuns(frames,names)

# With respect to cumulative or relative strike rate, Viv Richards is a class apart. He seems to really
# tear into bowlers. David Gower has an excellent strike rate and is followed by Tendulkar
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

## 4. Relative performances of international batsmen against India in India

While England & Australia are famous for bouncy tracks with swing, Indian pitches are renowned for being extraordinary turners. Also, India has always thrown up world class spinners, from the spin quartet of BS Chandrasekhar, Bishan Singh Bedi, EAS Prasanna and S Venkataraghavan, to the times of the dangerous Anil Kumble, and now to the more recent Ravichandran Ashwin and Harbhajan Singh.

A batsman who can score runs in India against Indian spinners has to be really adept at handling all kinds of spin.

While Clive Lloyd & Alvin Kallicharran had the best performance against India, they have not been included as ESPN Cricinfo had many of the columns missing.

So I chose the following international players for the analysis against India

1. Hashim Amla (South Africa)
2. Alastair Cook (England)
3. Matthew Hayden (Australia)
4. Viv Richards (West Indies)
frames <- list("./data/amlaVsIndInInd.csv","./data/ancookVsIndInInd.csv","./data/mhaydenVsIndInInd.csv",
"./data/vrichardsVsIndInInd.csv")
names <- list("H Amla","AN Cook","M Hayden","Viv Richards")

Excluding Clive Lloyd & Alvin Kallicharran, the next best performer against India is Hashim Amla, followed by Alastair Cook and Viv Richards.

relativeBatsmanCumulativeAvgRuns(frames,names)

#With respect to strike rate, there is no contest when Viv Richards is around. He is clearly the best
#striker of the ball regardless of whether it is the pacy wickets of
#Australia/England or the spinning tracks of the subcontinent. After
#Viv Richards, Hayden and Alastair Cook have good cumulative strike rates
#in India
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

## 5. All time greats of Indian batting

I couldn’t resist checking out how the top Indian batsmen perform when playing away from home, so here is a look at how they fare against different host countries.

## 6. Top Indian batsmen against Australia in Australia

The following Indian batsmen were chosen

1. Sunil Gavaskar
2. Sachin Tendulkar
3. Virat Kohli
4. Virender Sehwag
5. VVS Laxman
frames <- list("./data/tendulkarVsAusInAus.csv","./data/gavaskarVsAusInAus.csv","./data/kohliVsAusInAus.csv",
"./data/sehwagVsAusInAus.csv","./data/vvslaxmanVsAusInAus.csv")
names <- list("S Tendulkar","S Gavaskar","V Kohli","V Sehwag","VVS Laxman")

Virat Kohli has the best overall performance against Australia, with a current cumulative average of 60+ runs over the innings played by him (15). With 15 matches, the 2nd best is Virender Sehwag, followed by VVS Laxman. Tendulkar maintains a cumulative average of 48+ runs over 30+ innings.

relativeBatsmanCumulativeAvgRuns(frames,names)

# Sehwag leads the strike rate against host Australia, followed by
# Tendulkar in Australia and then Kohli
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

## 7. Top Indian batsmen against England in England

The top Indian batsmen’s performances against England are shown below

1. Sachin Tendulkar
2. Dilip Vengsarkar
3. Rahul Dravid
4. Sourav Ganguly
5. Virat Kohli
frames <- list("./data/tendulkarVsEngInEng.csv","./data/dravidVsEngInEng.csv","./data/vengsarkarVsEngInEng.csv",
names <- list("S Tendulkar","R Dravid","D Vengsarkar","S Ganguly","S Gavaskar","V Kohli")

Rahul Dravid has the best performance against England and edges out Tendulkar. He is followed by Tendulkar and then Sourav Ganguly. Note: Incidentally, Virat Kohli’s performance against England in England so far has been extremely poor and he averages around 13-15 runs per innings. However, he has a long way to go and I hope he catches up. In any case it will be an uphill climb for Kohli in England.

relativeBatsmanCumulativeAvgRuns(frames,names)

#Tendulkar, Ganguly and Dravid have the best strike rate and in that order.
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

## 8. Top Indian batsmen against West Indies in West Indies

frames <- list("./data/tendulkarVsWInWI.csv","./data/dravidVsWInWI.csv","./data/vvslaxmanVsWIInWI.csv",
names <- list("S Tendulkar","R Dravid","VVS Laxman","S Gavaskar")

Against the West Indies, Sunil Gavaskar is head and shoulders above the rest. Gavaskar has a very impressive cumulative average against West Indies.

relativeBatsmanCumulativeAvgRuns(frames,names)

# VVS Laxman followed by  Tendulkar & then Dravid have a very
# good strike rate against the West Indies
relativeBatsmanCumulativeStrikeRate(frames,names)

## 9. World’s best spinners on tracks suited for pace & bounce

In this part I compare the performances of the top 3 spinners in recent years and check out how they perform on surfaces that are known for pace, and bounce. I have taken the following 3 spinners

1. Anil Kumble (India)
2. M Muralitharan (Sri Lanka)
3. Shane Warne (Australia)
#kumbleEng=getPlayerData(30176  ,opposition=3,host=3,file="kumbleVsEngInEng.csv",type="bowling")
#muraliEng=getPlayerData(49636  ,opposition=3,host=3,file="muraliVsEngInEng.csv",type="bowling")
#warneEng=getPlayerData(8166  ,opposition=3,host=3,file="warneVsEngInEng.csv",type="bowling")

## 10. Top international spinners against England in England

frames <- list("./data/kumbleVsEngInEng.csv","./data/muraliVsEngInEng.csv","./data/warneVsEngInEng.csv")
names <- list("Anil Kumble","M Muralitharan","Shane Warne")

Against England and in England, Muralitharan shines with a cumulative average of nearly 5 wickets per match, with a peak of almost 8 wickets. Shane Warne has a steady average of 5 wickets, followed by Anil Kumble.

relativeBowlerCumulativeAvgWickets(frames,names)

# In order of relative cumulative economy rate, Warne has the best figures, followed by Anil Kumble.
# Muralitharan is much more expensive.
relativeBowlerCumulativeAvgEconRate(frames,names)
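
A cumulative economy rate can be defined in more than one way, and this post does not state which one the plot uses. The base R sketch below contrasts two plausible definitions on made-up figures. The data and all variable names are hypothetical, and neither calculation is claimed to be cricketr’s.

```r
# Hypothetical figures for four successive innings
runsConceded <- c(30, 80, 45, 20)
oversBowled  <- c(10, 20, 15, 10)

# Definition 1: running mean of the per-innings economy rates
econ        <- runsConceded / oversBowled
cumMeanEcon <- cumsum(econ) / seq_along(econ)

# Definition 2: total runs conceded so far / total overs bowled so far
pooledEcon  <- cumsum(runsConceded) / cumsum(oversBowled)

print(round(cumMeanEcon, 3))
print(round(pooledEcon, 3))
```

The two agree only when every innings has the same number of overs. The pooled version gives long spells more weight, which matters when comparing bowlers with very different workloads.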

## 11. Top international spinners against South Africa in South Africa

frames <- list("./data/kumbleVsSAInSA.csv","./data/muraliVsSAInSA.csv","./data/warneVsSAInSA.csv")
names <- list("Anil Kumble","M Muralitharan","Shane Warne")

In South Africa too, Muralitharan has the best wicket-taking performance, averaging about 4 wickets. Warne averages around 3 wickets and Kumble around 2 wickets.

relativeBowlerCumulativeAvgWickets(frames,names)

# Muralitharan is expensive in South Africa too, while Kumble and Warne go neck and neck in the economy rate.
# Kumble edges out Warne and has a better cumulative average economy rate
relativeBowlerCumulativeAvgEconRate(frames,names)

## 11. Top international pacers against India in India

As a final analysis I check how the world’s pacers perform in India against India. Indian pitches are supposed to be flat and devoid of bounce, while being terrific turners. Hence Indian pitches are more suited to spin bowling than pace bowling. This is changing these days.

The best performers against India in India are mostly the deadly pacemen of yesteryear.

For this I have chosen the following bowlers

1. Courtney Walsh (West Indies)
2. Andy Roberts (West Indies)
3. Malcolm Marshall (West Indies)
4. Glenn McGrath (Australia)
#cawalshInd=getPlayerData(53216  ,opposition=6,host=6,file="cawalshVsIndInInd.csv",type="bowling")
#arobertsInd=getPlayerData(52817  ,opposition=6,host=6,file="arobertsIndInInd.csv",type="bowling")
#mmarshallInd=getPlayerData(52419  ,opposition=6,host=6,file="mmarshallVsIndInInd.csv",type="bowling")
#gmccgrathInd=getPlayerData(6565  ,opposition=6,host=6,file="mccgrathVsIndInInd.csv",type="bowling")
frames <- list("./data/cawalshVsIndInInd.csv","./data/arobertsIndInInd.csv","./data/mmarshallVsIndInInd.csv",
"./data/mccgrathVsIndInInd.csv")
names <- list("C Walsh","A Roberts","M Marshall","G McGrath")

Courtney Walsh has the best performance, followed by Andy Roberts and then Malcolm Marshall, who edges ahead of Glenn McGrath.

relativeBowlerCumulativeAvgWickets(frames,names)

#On the other hand McGrath has the best economy rate, followed by A Roberts and then Courtney Walsh
relativeBowlerCumulativeAvgEconRate(frames,names)

## 12. ODI performance of a player against a specific country in the host country

This gets the data for MS Dhoni in ODI matches against Australia and in Australia

#dhoniAusODI=getPlayerDataOD(28081,opposition=2,host=2,file="dhoniVsAusInAusODI.csv",type="batting")

## 13. Twenty20 performance of a player against a specific country in the host country

#dhoniAusTT=getPlayerDataTT(28081,opposition=2,host=2,file="dhoniVsAusInAusTT.csv",type="batting")

All the ODI and Twenty20 functions of cricketr can be used on the above dataframes of MS Dhoni.

### Some key observations

Here are some key observations

1. At the top of the batting spectrum is Don Bradman, with a very impressive average of 100-120 in matches played in England and Australia. Unfortunately, he did not play matches in other countries and on different pitches.
2. Viv Richards has the best cumulative strike rate overall.
3. Muralitharan strikes more often than Kumble or Warne, even on pitches in England, South Africa and West Indies. However, Muralitharan is also the most expensive.
4. Warne and Kumble have a much better economy rate than Muralitharan.
5. Sunil Gavaskar has an extremely impressive performance in West Indies.
6. Rahul Dravid performs much better than Tendulkar in both England and West Indies.
7. Virat Kohli has the best performance against Australia so far, followed by Sehwag, and I hope he maintains his stellar performance. However, Kohli’s performance in England has been very poor.
8. West Indies batsmen and bowlers seem to thrive on Indian pitches, with Clive Lloyd and Alvin Kallicharran at the top of the list.

You may like my Shiny apps on cricket

To see all my posts see Index of posts