Analyzing performances of cricketers using cricketr template

This post includes a template which you can use for analyzing the performances of cricketers, both batsmen and bowlers in Test, ODI and Twenty 20 cricket using my R package cricketr. To see actual usage of functions in the R package cricketr see Introducing cricketr! : An R package to analyze performances of cricketers.

This template can be downloaded from Github at cricketer-template

The ‘cricketr’ package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package supports all formats of the game including Test, ODI and Twenty20 versions.

You should be able to install the package from GitHub and use the many functions available in the package. Please mindful of the ESPN Cricinfo Terms of Use

Take a look at my short video tutorial on my R package cricketr on Youtube – R package cricketr – A short tutorial

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

The cricketr package

The cricketr package has several functions that perform several different analyses on both batsman and bowlers. The package has function that plot percentage frequency runs or wickets, runs likelihood for a batsman, relative run/strike rates of batsman and relative performance/economy rate for bowlers are available.

Other interesting functions include batting performance moving average, forecast and a function to check whether the batsmans in in-form or out-of-form.

The data for a particular player can be obtained with the getPlayerData() function. To do you will need to go to ESPN CricInfo Player and type in the name of the player for e.g Ricky Ponting, Sachin Tendulkar etc. This will bring up a page which have the profile number for the player e.g. for Sachin Tendulkar this would be http://www.espncricinfo.com/india/content/player/35320.html. Hence, Sachin’s profile is 35320. This can be used to get the data for Tendulkar as shown below

The cricketr package is now available from CRAN!!! You should be able to install directly with

1. Install the cricketr package

if (!require("cricketr")){
    install.packages("cricketr",lib = "c:/test")
}
library(cricketr)

The cricketr package includes some pre-packaged sample (.csv) files. You can use these sample to test functions as shown below

# Retrieve the file path of a data file installed with cricketr
#pathToFile <- system.file("data", "tendulkar.csv", package = "cricketr")
#batsman4s(pathToFile, "Sachin Tendulkar")

# The general format is pkg-function(pathToFile,par1,...)
#batsman4s(<path-To-File>,"Sachin Tendulkar")

“` The pre-packaged files can be accessed as shown above. To get the data of any player use the function in Test, ODI and Twenty20 use the following

2. For Test cricket

#tendulkar <- getPlayerData(35320,dir="..",file="tendulkar.csv",type="batting",homeOrAway=c(1,2), result=c(1,2,4))

2a. For ODI cricket

#tendulkarOD <- getPlayerDataOD(35320,dir="..",file="tendulkarOD.csv",type="batting")

2b For Twenty 20 cricket

#tendulkarT20 <- getPlayerDataTT(35320,dir="..",file="tendulkarT20.csv",type="batting")

Analysis of batsmen

Important Note This needs to be done only once for a player. This function stores the player’s data in a CSV file (for e.g. tendulkar.csv as above) which can then be reused for all other functions. Once we have the data for the players many analyses can be done. This post will use the stored CSV file obtained with a prior getPlayerData for all subsequent analyses

Sachin Tendulkar’s performance – Basic Analyses

The 3 plots below provide the following for Tendulkar

  1. Frequency percentage of runs in each run range over the whole career
  2. Mean Strike Rate for runs scored in the given range
  3. A histogram of runs frequency percentages in runs ranges For example

3. Basic analyses

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#batsmanRunsFreqPerf("./tendulkar.csv","Tendulkar")
#batsmanMeanStrikeRate("./tendulkar.csv","Tendulkar")
#batsmanRunsRanges("./tendulkar.csv","Tendulkar")
dev.off()
## null device 
##           1
  1. Player 1
  2. Player 2
  3. Player 3
  4. Player 4

4. More analyses

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#batsman4s("./player1.csv","Player1")
#batsman6s("./player1.csv","Player1")
#batsmanMeanStrikeRate("./player1.csv","Player1")

# For ODI and T20
#batsmanScoringRateODTT("./player1.csv","Player1")
dev.off()
## null device 
##           1
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#batsman4s("./player2.csv","Player2")
#batsman6s("./player2.csv","Player2")
#batsmanMeanStrikeRate("./player2.csv","Player2")
# For ODI and T20
#batsmanScoringRateODTT("./player1.csv","Player1")
dev.off()
## null device 
##           1
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#batsman4s("./player3.csv","Player3")
#batsman6s("./player3.csv","Player3")
#batsmanMeanStrikeRate("./player3.csv","Player3")
# For ODI and T20
#batsmanScoringRateODTT("./player1.csv","Player1")

dev.off()
## null device 
##           1
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#batsman4s("./player4.csv","Player4")
#batsman6s("./player4.csv","Player4")
#batsmanMeanStrikeRate("./player4.csv","Player4")
# For ODI and T20
#batsmanScoringRateODTT("./player1.csv","Player1")
dev.off()
## null device 
##           1

Note: For mean strike rate in ODI and Twenty20 use the function batsmanScoringRateODTT()

5.Boxplot histogram plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency

#batsmanPerfBoxHist("./player1.csv","Player1")
#batsmanPerfBoxHist("./player2.csv","Player2")
#batsmanPerfBoxHist("./player3.csv","Player3")
#batsmanPerfBoxHist("./player4.csv","Player4")

6. Contribution to won and lost matches

For the 2 functions below you will have to use the getPlayerDataSp() function. I have commented this as I already have these files. This function can only be used for Test matches

#player1sp <- getPlayerDataSp(xxxx,tdir=".",tfile="player1sp.csv",ttype="batting")
#player2sp <- getPlayerDataSp(xxxx,tdir=".",tfile="player2sp.csv",ttype="batting")
#player3sp <- getPlayerDataSp(xxxx,tdir=".",tfile="player3sp.csv",ttype="batting")
#player4sp <- getPlayerDataSp(xxxx,tdir=".",tfile="player4sp.csv",ttype="batting")
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanContributionWonLost("player1sp.csv","Player1")
#batsmanContributionWonLost("player2sp.csv","Player2")
#batsmanContributionWonLost("player3sp.csv","Player3")
#batsmanContributionWonLost("player4sp.csv","Player4")
dev.off()
## null device 
##           1

7, Performance at home and overseas

This function also requires the use of getPlayerDataSp() as shown above. This can only be used for Test matches

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanPerfHomeAway("player1sp.csv","Player1")
#batsmanPerfHomeAway("player2sp.csv","Player2")
#batsmanPerfHomeAway("player3sp.csv","Player3")
#batsmanPerfHomeAway("player4sp.csv","Player4")
dev.off()
## null device 
##           1

8. Batsman average at different venues

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanAvgRunsGround("./player1.csv","Player1")
#batsmanAvgRunsGround("./player2.csv","Player2")
#batsmanAvgRunsGround("./player3.csv","Ponting")
#batsmanAvgRunsGround("./player4.csv","Player4")
dev.off()
## null device 
##           1

9. Batsman average against different opposition

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanAvgRunsOpposition("./player1.csv","Player1")
#batsmanAvgRunsOpposition("./player2.csv","Player2")
#batsmanAvgRunsOpposition("./player3.csv","Ponting")
#batsmanAvgRunsOpposition("./player4.csv","Player4")
dev.off()
## null device 
##           1

10. Runs Likelihood of batsman

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanRunsLikelihood("./player1.csv","Player1")
#batsmanRunsLikelihood("./player2.csv","Player2")
#batsmanRunsLikelihood("./player3.csv","Ponting")
#batsmanRunsLikelihood("./player4.csv","Player4")
dev.off()
## null device 
##           1

11. Moving Average of runs in career

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanMovingAverage("./player1.csv","Player1")
#batsmanMovingAverage("./player2.csv","Player2")
#batsmanMovingAverage("./player3.csv","Ponting")
#batsmanMovingAverage("./player4.csv","Player4")
dev.off()
## null device 
##           1

12. Cumulative Average runs of batsman in career

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanCumulativeAverageRuns("./player1.csv","Player1")
#batsmanCumulativeAverageRuns("./player2.csv","Player2")
#batsmanCumulativeAverageRuns("./player3.csv","Ponting")
#batsmanCumulativeAverageRuns("./player4.csv","Player4")
dev.off()
## null device 
##           1

13. Cumulative Average strike rate of batsman in career

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanCumulativeStrikeRate("./player1.csv","Player1")
#batsmanCumulativeStrikeRate("./player2.csv","Player2")
#batsmanCumulativeStrikeRate("./player3.csv","Ponting")
#batsmanCumulativeStrikeRate("./player4.csv","Player4")
dev.off()
## null device 
##           1

14. Future Runs forecast

Here are plots that forecast how the batsman will perform in future. In this case 90% of the career runs trend is uses as the training set. the remaining 10% is the test set.

A Holt-Winters forecating model is used to forecast future performance based on the 90% training set. The forecated runs trend is plotted. The test set is also plotted to see how close the forecast and the actual matches

Take a look at the runs forecasted for the batsman below.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanPerfForecast("./player1.csv","Player1")
#batsmanPerfForecast("./player2.csv","Player2")
#batsmanPerfForecast("./player3.csv","Player3")
#batsmanPerfForecast("./player4.csv","Player4")
dev.off()
## null device 
##           1

15. Relative Mean Strike Rate plot

The plot below compares the Mean Strike Rate of the batsman for each of the runs ranges of 10 and plots them. The plot indicate the following

frames <- list("./player1.csv","./player2.csv","player3.csv","player4.csv")
names <- list("Player1","Player2","Player3","Player4")
#relativeBatsmanSR(frames,names)

16. Relative Runs Frequency plot

The plot below gives the relative Runs Frequency Percetages for each 10 run bucket. The plot below show

frames <- list("./player1.csv","./player2.csv","player3.csv","player4.csv")
names <- list("Player1","Player2","Player3","Player4")
#relativeRunsFreqPerf(frames,names)

17. Relative cumulative average runs in career

frames <- list("./player1.csv","./player2.csv","player3.csv","player4.csv")
names <- list("Player1","Player2","Player3","Player4")
#relativeBatsmanCumulativeAvgRuns(frames,names)

18. Relative cumulative average strike rate in career

frames <- list("./player1.csv","./player2.csv","player3.csv","player4.csv")
names <- list("Player1","Player2","Player3","player4")
#relativeBatsmanCumulativeStrikeRate(frames,names)

19. Check Batsman In-Form or Out-of-Form

The below computation uses Null Hypothesis testing and p-value to determine if the batsman is in-form or out-of-form. For this 90% of the career runs is chosen as the population and the mean computed. The last 10% is chosen to be the sample set and the sample Mean and the sample Standard Deviation are caculated.

The Null Hypothesis (H0) assumes that the batsman continues to stay in-form where the sample mean is within 95% confidence interval of population mean The Alternative (Ha) assumes that the batsman is out of form the sample mean is beyond the 95% confidence interval of the population mean.

A significance value of 0.05 is chosen and p-value us computed If p-value >= .05 – Batsman In-Form If p-value < 0.05 – Batsman Out-of-Form

Note Ideally the p-value should be done for a population that follows the Normal Distribution. But the runs population is usually left skewed. So some correction may be needed. I will revisit this later

This is done for the Top 4 batsman

#checkBatsmanInForm("./player1.csv","Player1")
#checkBatsmanInForm("./player2.csv","Player2")
#checkBatsmanInForm("./player3.csv","Player3")
#checkBatsmanInForm("./player4.csv","Player4")

20. 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
#battingPerf3d("./player1.csv","Player1")
#battingPerf3d("./player2.csv","Player2")
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
#battingPerf3d("./player3.csv","Player3")
#battingPerf3d("./player4.csv","player4")
dev.off()
## null device 
##           1

21. Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
#Player1 <- batsmanRunsPredict("./player1.csv","Player1",newdataframe=newDF)
#Player2 <- batsmanRunsPredict("./player2.csv","Player2",newdataframe=newDF)
#ponting <- batsmanRunsPredict("./player3.csv","Player3",newdataframe=newDF)
#sangakkara <- batsmanRunsPredict("./player4.csv","Player4",newdataframe=newDF)
#batsmen <-cbind(round(Player1$Runs),round(Player2$Runs),round(Player3$Runs),round(Player4$Runs))
#colnames(batsmen) <- c("Player1","Player2","Player3","Player4")
#newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
#colnames(newDF) <- c("BallsFaced","MinsAtCrease")
#predictedRuns <- cbind(newDF,batsmen)
#predictedRuns

Analysis of bowlers

  1. Bowler1
  2. Bowler2
  3. Bowler3
  4. Bowler4

player1 <- getPlayerData(xxxx,dir=“..”,file=“player1.csv”,type=“bowling”) Note For One day you will have to use getPlayerDataOD() and for Twenty20 it is getPlayerDataTT()

21. Wicket Frequency Plot

This plot below computes the percentage frequency of number of wickets taken for e.g 1 wicket x%, 2 wickets y% etc and plots them as a continuous line

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerWktsFreqPercent("./bowler1.csv","Bowler1")
#bowlerWktsFreqPercent("./bowler2.csv","Bowler2")
#bowlerWktsFreqPercent("./bowler3.csv","Bowler3")
dev.off()
## null device 
##           1

22. Wickets Runs plot

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerWktsRunsPlot("./bowler1.csv","Bowler1")
#bowlerWktsRunsPlot("./bowler2.csv","Bowler2")
#bowlerWktsRunsPlot("./bowler3.csv","Bowler3")
dev.off()
## null device 
##           1

23. Average wickets at different venues

#bowlerAvgWktsGround("./bowler3.csv","Bowler3")

24. Average wickets against different opposition

#bowlerAvgWktsOpposition("./bowler3.csv","Bowler3")

25. Wickets taken moving average

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerMovingAverage("./bowler1.csv","Bowler1")
#bowlerMovingAverage("./bowler2.csv","Bowler2")
#bowlerMovingAverage("./bowler3.csv","Bowler3")

dev.off()
## null device 
##           1

26. Cumulative Wickets taken

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerCumulativeAvgWickets("./bowler1.csv","Bowler1")
#bowlerCumulativeAvgWickets("./bowler2.csv","Bowler2")
#bowlerCumulativeAvgWickets("./bowler3.csv","Bowler3")
dev.off()
## null device 
##           1

27. Cumulative Economy rate

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerCumulativeAvgEconRate("./bowler1.csv","Bowler1")
#bowlerCumulativeAvgEconRate("./bowler2.csv","Bowler2")
#bowlerCumulativeAvgEconRate("./bowler3.csv","Bowler3")
dev.off()
## null device 
##           1

28. Future Wickets forecast

Here are plots that forecast how the bowler will perform in future. In this case 90% of the career wickets trend is used as the training set. the remaining 10% is the test set.

A Holt-Winters forecating model is used to forecast future performance based on the 90% training set. The forecated wickets trend is plotted. The test set is also plotted to see how close the forecast and the actual matches

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerPerfForecast("./bowler1.csv","Bowler1")
#bowlerPerfForecast("./bowler2.csv","Bowler2")
#bowlerPerfForecast("./bowler3.csv","Bowler3")
dev.off()
## null device 
##           1

29. Contribution to matches won and lost

As discussed above the next 2 charts require the use of getPlayerDataSp(). This can only be done for Test matches

#bowler1sp <- getPlayerDataSp(xxxx,tdir=".",tfile="bowler1sp.csv",ttype="bowling")
#bowler2sp <- getPlayerDataSp(xxxx,tdir=".",tfile="bowler2sp.csv",ttype="bowling")
#bowler3sp <- getPlayerDataSp(xxxx,tdir=".",tfile="bowler3sp.csv",ttype="bowling")
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerContributionWonLost("bowler1sp","Bowler1")
#bowlerContributionWonLost("bowler2sp","Bowler2")
#bowlerContributionWonLost("bowler3sp","Bowler3")
dev.off()
## null device 
##           1

30. Performance home and overseas.

This can only be done for Test matches

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerPerfHomeAway("bowler1sp","Bowler1")
#bowlerPerfHomeAway("bowler2sp","Bowler2")
#bowlerPerfHomeAway("bowler3sp","Bowler3")
dev.off()
## null device 
##           1

31 Relative Wickets Frequency Percentage

frames <- list("./bowler1.csv","./bowler3.csv","bowler2.csv")
names <- list("Bowler1","Bowler3","Bowler2")
#relativeBowlingPerf(frames,names)

32 Relative Economy Rate against wickets taken

frames <- list("./bowler1.csv","./bowler3.csv","bowler2.csv")
names <- list("Bowler1","Bowler3","Bowler2")
#relativeBowlingER(frames,names)

33 Relative cumulative average wickets of bowlers in career

frames <- list("./bowler1.csv","./bowler3.csv","bowler2.csv")
names <- list("Bowler1","Bowler3","Bowler2")
#relativeBowlerCumulativeAvgWickets(frames,names)

34 Relative cumulative average economy rate of bowlers

frames <- list("./bowler1.csv","./bowler3.csv","bowler2.csv")
names <- list("Bowler1","Bowler3","Bowler2")
#relativeBowlerCumulativeAvgEconRate(frames,names)

35 Check for bowler in-form/out-of-form

The below computation uses Null Hypothesis testing and p-value to determine if the bowler is in-form or out-of-form. For this 90% of the career wickets is chosen as the population and the mean computed. The last 10% is chosen to be the sample set and the sample Mean and the sample Standard Deviation are caculated.

The Null Hypothesis (H0) assumes that the bowler continues to stay in-form where the sample mean is within 95% confidence interval of population mean The Alternative (Ha) assumes that the bowler is out of form the sample mean is beyond the 95% confidence interval of the population mean.

A significance value of 0.05 is chosen and p-value us computed If p-value >= .05 – Batsman In-Form If p-value < 0.05 – Batsman Out-of-Form

Note Ideally the p-value should be done for a population that follows the Normal Distribution. But the runs population is usually left skewed. So some correction may be needed. I will revisit this later

Note: The check for the form status of the bowlers indicate

#checkBowlerInForm("./bowler1.csv","Bowler1")
#checkBowlerInForm("./bowler2.csv","Bowler2")
#checkBowlerInForm("./bowler3.csv","Bowler3")
dev.off()
## null device 
##           1

Analyzing batsmen and bowlers with cricpy template

Introduction

This post shows how you can analyze batsmen and bowlers of Test, ODI and T20s using cricpy templates, using data from ESPN Cricinfo.

The cricpy package

The data for a particular player can be obtained with the getPlayerData() function. To do you will need to go to ESPN CricInfo Player and type in the name of the player for e.g Rahul Dravid, Virat Kohli  etc. This will bring up a page which have the profile number for the player e.g. for Rahul Dravid this would be http://www.espncricinfo.com/india/content/player/28114.html. Hence, Dravid’s profile is 28114. This can be used to get the data for Rahul Dravid as shown below

1. For Test players use batting and bowling.
2. For ODI use batting and bowling
3. For T20 use T20 Batting T20 Bowling

Please mindful of the  ESPN Cricinfo Terms of Use

My posts on Cripy were
a. Introducing cricpy:A python package to analyze performances of cricketers
b. Cricpy takes a swing at the ODIs
c. Cricpy takes guard for the Twenty20s

You can clone/download this cricpy template for your own analysis of players. This can be done using RStudio or IPython notebooks from Github at cricpy-template. You can uncomment the functions and use them.

The cricpy package is now available with pip install cricpy!!!

If you are passionate about cricket, and love analyzing cricket performances, then check out my racy book on cricket ‘Cricket analytics with cricketr and cricpy – Analytics harmony with R & Python’! This book discusses and shows how to use my R package ‘cricketr’ and my Python package ‘cricpy’ to analyze batsmen and bowlers in all formats of the game (Test, ODI and T20). The paperback is available on Amazon at $21.99 and  the kindle version at $9.99/Rs 449/-. A must read for any cricket lover! Check it out!!

Untitled

1 Importing cricpy – Python

# Install the package
# Do a pip install cricpy
# Import cricpy
import cricpy.analytics as ca 
## C:\Users\Ganesh\ANACON~1\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
##   from pandas.core import datetools

2. Invoking functions with Python package cricpy

import cricpy.analytics as ca 
#ca.batsman4s("aplayer.csv","A Player")

3. Getting help from cricpy – Python

import cricpy.analytics as ca
#help(ca.getPlayerData)

The details below will introduce the different functions that are available in cricpy.

4. Get the player data for a player using the function getPlayerData()

Important Note This needs to be done only once for a player. This function stores the player’s data in the specified CSV file (for e.g. dravid.csv as above) which can then be reused for all other functions). Once we have the data for the players many analyses can be done. This post will use the stored CSV file obtained with a prior getPlayerData for all subsequent analyses

4a. For Test players

import cricpy.analytics as ca
#player1 =ca.getPlayerData(profileNo1,dir="..",file="player1.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])
#player1 =ca.getPlayerData(profileNo2,dir="..",file="player2.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])

4b. For ODI players

import cricpy.analytics as ca
#player1 =ca.getPlayerDataOD(profileNo1,dir="..",file="player1.csv",type="batting")
#player1 =ca.getPlayerDataOD(profileNo2,dir="..",file="player2.csv",type="batting"")

4c For T20 players

import cricpy.analytics as ca
#player1 =ca.getPlayerDataTT(profileNo1,dir="..",file="player1.csv",type="batting")
#player1 =ca.getPlayerDataTT(profileNo2,dir="..",file="player2.csv",type="batting"")

5 A Player’s performance – Basic Analyses

The 3 plots below provide the following for Rahul Dravid

  1. Frequency percentage of runs in each run range over the whole career
  2. Mean Strike Rate for runs scored in the given range
  3. A histogram of runs frequency percentages in runs ranges
import cricpy.analytics as ca
import matplotlib.pyplot as plt
#ca.batsmanRunsFreqPerf("aplayer.csv","A Player")
#ca.batsmanMeanStrikeRate("aplayer.csv","A Player")
#ca.batsmanRunsRanges("aplayer.csv","A Player") 

6. More analyses

This gives details on the batsmen’s 4s, 6s and dismissals

import cricpy.analytics as ca
#ca.batsman4s("aplayer.csv","A Player")
#ca.batsman6s("aplayer.csv","A Player") 
#ca.batsmanDismissals("aplayer.csv","A Player")
# The below function is for ODI and T20 only
#ca.batsmanScoringRateODTT("./kohli.csv","Virat Kohli")  

7. 3D scatter plot and prediction plane

The plots below show the 3D scatter plot of Runs versus Balls Faced and Minutes at crease. A linear regression plane is then fitted between Runs and Balls Faced + Minutes at crease

import cricpy.analytics as ca
#ca.battingPerf3d("aplayer.csv","A Player")

8. Average runs at different venues

The plot below gives the average runs scored at different grounds. The plot also the number of innings at each ground as a label at x-axis.

import cricpy.analytics as ca
#ca.batsmanAvgRunsGround("aplayer.csv","A Player")

9. Average runs against different opposing teams

This plot computes the average runs scored against different countries.

import cricpy.analytics as ca
#ca.batsmanAvgRunsOpposition("aplayer.csv","A Player")

10. Highest Runs Likelihood

The plot below shows the Runs Likelihood for a batsman.

import cricpy.analytics as ca
#ca.batsmanRunsLikelihood("aplayer.csv","A Player")

11. A look at the Top 4 batsman

Choose any number of players

1.Player1 2.Player2 3.Player3 …

The following plots take a closer at their performances. The box plots show the median the 1st and 3rd quartile of the runs

12. Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency

import cricpy.analytics as ca
#ca.batsmanPerfBoxHist("aplayer001.csv","A Player001")
#ca.batsmanPerfBoxHist("aplayer002.csv","A Player002")
#ca.batsmanPerfBoxHist("aplayer003.csv","A Player003")
#ca.batsmanPerfBoxHist("aplayer004.csv","A Player004")

13. Get Player Data special

import cricpy.analytics as ca
#player1sp = ca.getPlayerDataSp(profile1,tdir=".",tfile="player1sp.csv",ttype="batting")
#player2sp = ca.getPlayerDataSp(profile2,tdir=".",tfile="player2sp.csv",ttype="batting")
#player3sp = ca.getPlayerDataSp(profile3,tdir=".",tfile="player3sp.csv",ttype="batting")
#player4sp = ca.getPlayerDataSp(profile4,tdir=".",tfile="player4sp.csv",ttype="batting")

14. Contribution to won and lost matches

Note:This can only be used for Test matches

import cricpy.analytics as ca
#ca.batsmanContributionWonLost("player1sp.csv","A Player001")
#ca.batsmanContributionWonLost("player2sp.csv","A Player002")
#ca.batsmanContributionWonLost("player3sp.csv","A Player003")
#ca.batsmanContributionWonLost("player4sp.csv","A Player004")

15. Performance at home and overseas

Note:This can only be used for Test matches This function also requires the use of getPlayerDataSp() as shown above

import cricpy.analytics as ca
#ca.batsmanPerfHomeAway("player1sp.csv","A Player001")
#ca.batsmanPerfHomeAway("player2sp.csv","A Player002")
#ca.batsmanPerfHomeAway("player3sp.csv","A Player003")
#ca.batsmanPerfHomeAway("player4sp.csv","A Player004")

16 Moving Average of runs in career

import cricpy.analytics as ca
#ca.batsmanMovingAverage("aplayer001.csv","A Player001")
#ca.batsmanMovingAverage("aplayer002.csv","A Player002")
#ca.batsmanMovingAverage("aplayer003.csv","A Player003")
#ca.batsmanMovingAverage("aplayer004.csv","A Player004")

17 Cumulative Average runs of batsman in career

This function provides the cumulative average runs of the batsman over the career.

import cricpy.analytics as ca
#ca.batsmanCumulativeAverageRuns("aplayer001.csv","A Player001")
#ca.batsmanCumulativeAverageRuns("aplayer002.csv","A Player002")
#ca.batsmanCumulativeAverageRuns("aplayer003.csv","A Player003")
#ca.batsmanCumulativeAverageRuns("aplayer004.csv","A Player004")

18 Cumulative Average strike rate of batsman in career

.

import cricpy.analytics as ca
#ca.batsmanCumulativeStrikeRate("aplayer001.csv","A Player001")
#ca.batsmanCumulativeStrikeRate("aplayer002.csv","A Player002")
#ca.batsmanCumulativeStrikeRate("aplayer003.csv","A Player003")
#ca.batsmanCumulativeStrikeRate("aplayer004.csv","A Player004")

19 Future Runs forecast

import cricpy.analytics as ca
#ca.batsmanPerfForecast("aplayer001.csv","A Player001")

20 Relative Batsman Cumulative Average Runs

The plot below compares the Relative cumulative average runs of the batsman for each of the runs ranges of 10 and plots them.

import cricpy.analytics as ca
frames = ["aplayer1.csv","aplayer2.csv","aplayer3.csv","aplayer4.csv"]
names = ["A Player1","A Player2","A Player3","A Player4"]
#ca.relativeBatsmanCumulativeAvgRuns(frames,names)

21 Plot of 4s and 6s

import cricpy.analytics as ca
frames = ["aplayer1.csv","aplayer2.csv","aplayer3.csv","aplayer4.csv"]
names = ["A Player1","A Player2","A Player3","A Player4"]
#ca.batsman4s6s(frames,names)

22. Relative Batsman Strike Rate

The plot below gives the relative Runs Frequency Percetages for each 10 run bucket. The plot below show

import cricpy.analytics as ca
frames = ["aplayer1.csv","aplayer2.csv","aplayer3.csv","aplayer4.csv"]
names = ["A Player1","A Player2","A Player3","A Player4"]
#ca.relativeBatsmanCumulativeStrikeRate(frames,names)

23. 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

import cricpy.analytics as ca
#ca.battingPerf3d("aplayer001.csv","A Player001")
#ca.battingPerf3d("aplayer002.csv","A Player002")
#ca.battingPerf3d("aplayer003.csv","A Player003")
#ca.battingPerf3d("aplayer004.csv","A Player004")

24. Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

import cricpy.analytics as ca
import numpy as np
import pandas as pd
BF = np.linspace( 10, 400,15)
Mins = np.linspace( 30,600,15)
newDF= pd.DataFrame({'BF':BF,'Mins':Mins})
#aplayer = ca.batsmanRunsPredict("aplayer.csv",newDF,"A Player")
#print(aplayer)

The fitted model is then used to predict the runs that the batsmen will score for a given Balls faced and Minutes at crease.

25 Analysis of Top 3 wicket takers

Take any number of bowlers from either Test, ODI or T20

  1. Bowler1
  2. Bowler2
  3. Bowler3 …

26. Get the bowler’s data (Test)

This plot below computes the percentage frequency of number of wickets taken for e.g 1 wicket x%, 2 wickets y% etc and plots them as a continuous line

import cricpy.analytics as ca
#abowler1 =ca.getPlayerData(profileNo1,dir=".",file="abowler1.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])
#abowler2 =ca.getPlayerData(profileNo2,dir=".",file="abowler2.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])
#abowler3 =ca.getPlayerData(profile3,dir=".",file="abowler3.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])

26b For ODI bowlers

import cricpy.analytics as ca
#abowler1 =ca.getPlayerDataOD(profileNo1,dir=".",file="abowler1.csv",type="bowling")
#abowler2 =ca.getPlayerDataOD(profileNo2,dir=".",file="abowler2.csv",type="bowling")
#abowler3 =ca.getPlayerDataOD(profile3,dir=".",file="abowler3.csv",type="bowling")

26c For T20 bowlers

import cricpy.analytics as ca
#abowler1 =ca.getPlayerDataTT(profileNo1,dir=".",file="abowler1.csv",type="bowling")
#abowler2 =ca.getPlayerDataTT(profileNo2,dir=".",file="abowler2.csv",type="bowling")
#abowler3 =ca.getPlayerDataTT(profile3,dir=".",file="abowler3.csv",type="bowling")

27. Wicket Frequency Plot

This plot below plots the frequency of wickets taken for each of the bowlers

import cricpy.analytics as ca
#ca.bowlerWktsFreqPercent("abowler1.csv","A Bowler1")
#ca.bowlerWktsFreqPercent("abowler2.csv","A Bowler2")
#ca.bowlerWktsFreqPercent("abowler3.csv","A Bowler3")

28. Wickets Runs plot

The plot below create a box plot showing the 1st and 3rd quartile of runs conceded versus the number of wickets taken

import cricpy.analytics as ca
#ca.bowlerWktsRunsPlot("abowler1.csv","A Bowler1")
#ca.bowlerWktsRunsPlot("abowler2.csv","A Bowler2")
#ca.bowlerWktsRunsPlot("abowler3.csv","A Bowler3")

29 Average wickets at different venues

The plot gives the average wickets taken bat different venues.

import cricpy.analytics as ca
#ca.bowlerAvgWktsGround("abowler1.csv","A Bowler1")
#ca.bowlerAvgWktsGround("abowler2.csv","A Bowler2")
#ca.bowlerAvgWktsGround("abowler3.csv","A Bowler3")

30 Average wickets against different opposition

The plot gives the average wickets taken against different countries.

import cricpy.analytics as ca
#ca.bowlerAvgWktsOpposition("abowler1.csv","A Bowler1")
#ca.bowlerAvgWktsOpposition("abowler2.csv","A Bowler2")
#ca.bowlerAvgWktsOpposition("abowler3.csv","A Bowler3")

31 Wickets taken moving average

import cricpy.analytics as ca
#ca.bowlerMovingAverage("abowler1.csv","A Bowler1")
#ca.bowlerMovingAverage("abowler2.csv","A Bowler2")
#ca.bowlerMovingAverage("abowler3.csv","A Bowler3")

32 Cumulative average wickets taken

The plots below give the cumulative average wickets taken by the bowlers.

import cricpy.analytics as ca
#ca.bowlerCumulativeAvgWickets("abowler1.csv","A Bowler1")
#ca.bowlerCumulativeAvgWickets("abowler2.csv","A Bowler2")
#ca.bowlerCumulativeAvgWickets("abowler3.csv","A Bowler3")

33 Cumulative average economy rate

The plots below give the cumulative average economy rate of the bowlers.

import cricpy.analytics as ca
#ca.bowlerCumulativeAvgEconRate("abowler1.csv","A Bowler1")
#ca.bowlerCumulativeAvgEconRate("abowler2.csv","A Bowler2")
#ca.bowlerCumulativeAvgEconRate("abowler3.csv","A Bowler3")

34 Future Wickets forecast

import cricpy.analytics as ca
#ca.bowlerPerfForecast("abowler1.csv","A bowler1")

35 Get player data special

import cricpy.analytics as ca
#abowler1sp =ca.getPlayerDataSp(profile1,tdir=".",tfile="abowler1sp.csv",ttype="bowling")
#abowler2sp =ca.getPlayerDataSp(profile2,tdir=".",tfile="abowler2sp.csv",ttype="bowling")
#abowler3sp =ca.getPlayerDataSp(profile3,tdir=".",tfile="abowler3sp.csv",ttype="bowling")

36 Contribution to matches won and lost

Note:This can be done only for Test cricketers

import cricpy.analytics as ca
#ca.bowlerContributionWonLost("abowler1sp.csv","A Bowler1")
#ca.bowlerContributionWonLost("abowler2sp.csv","A Bowler2")
#ca.bowlerContributionWonLost("abowler3sp.csv","A Bowler3")

37 Performance home and overseas

Note:This can be done only for Test cricketers

import cricpy.analytics as ca
#ca.bowlerPerfHomeAway("abowler1sp.csv","A Bowler1")
#ca.bowlerPerfHomeAway("abowler2sp.csv","A Bowler2")
#ca.bowlerPerfHomeAway("abowler3sp.csv","A Bowler3")

38 Relative cumulative average economy rate of bowlers

import cricpy.analytics as ca
frames = ["abowler1.csv","abowler2.csv","abowler3.csv"]
names = ["A Bowler1","A Bowler2","A Bowler3"]
#ca.relativeBowlerCumulativeAvgEconRate(frames,names)

39 Relative Economy Rate against wickets taken

import cricpy.analytics as ca
frames = ["abowler1.csv","abowler2.csv","abowler3.csv"]
names = ["A Bowler1","A Bowler2","A Bowler3"]
#ca.relativeBowlingER(frames,names)

40 Relative cumulative average wickets of bowlers in career

import cricpy.analytics as ca
frames = ["abowler1.csv","abowler2.csv","abowler3.csv"]
names = ["A Bowler1","A Bowler2","A Bowler3"]
#ca.relativeBowlerCumulativeAvgWickets(frames,names)

Clone/download this cricpy template for your own analysis of players. This can be done using RStudio or IPython notebooks from Github at cricpy-template

Important note: Do check out my other posts using cricpy at cricpy-posts

Key Findings

Analysis of Top 4 batsman

Analysis of Top 3 bowlers

You may also like
1. My book ‘Deep Learning from first principles:Second Edition’ now on Amazon
2. Presentation on ‘Evolution to LTE’
3. Stacks of protocol stacks – A primer
4. Taking baby steps in Lisp
5. Introducing cricket package yorkr: Part 1- Beaten by sheer pace!

To see all posts click Index of posts

cricketr flexes new muscles: The final analysis

Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

       Jabberwocky by Lewis Carroll
                   

No analysis of cricket is complete, without determining how players would perform in the host country. Playing Test cricket on foreign pitches, in the host country, is a ‘real test’ for both batsmen and bowlers. Players, who can perform consistently both on domestic and foreign pitches are the genuinely ‘class’ players. Player performance on foreign pitches lets us differentiate the paper tigers, and home ground bullies among batsmen. Similarly, spinners who perform well, only on rank turners in home ground or pace bowlers who can only swing and generate bounce on specially prepared pitches are neither  genuine spinners nor  real pace bowlers.

So this post, helps in identifying those with real strengths, and those who play good only when the conditions are in favor, in home grounds. This post brings a certain level of finality to the analysis of players with my R package ‘cricketr’

Besides, I also meant ‘final analysis’ in the literal sense, as I intend to take a long break from cricket analysis/analytics and focus on some other domains like Neural Networks, Deep Learning and Spark.

If you are passionate about cricket, and love analyzing cricket performances, then check out my racy book on cricket ‘Cricket analytics with cricketr and cricpy – Analytics harmony with R & Python’! This book discusses and shows how to use my R package ‘cricketr’ and my Python package ‘cricpy’ to analyze batsmen and bowlers in all formats of the game (Test, ODI and T20). The paperback is available on Amazon at $21.99 and  the kindle version at $9.99/Rs 449/-. A must read for any cricket lover! Check it out!!

Untitled

 

As already mentioned, my R package ‘cricketr’ uses the statistics info available in ESPN Cricinfo Statsguru. You should be able to install the package from CRAN and use many of the functions available in the package. Please be mindful of ESPN Cricinfo Terms of Use

(Note: This page is also hosted at RPubs as cricketrFinalAnalysis. You can download the PDF file at cricketrFinalAnalysis.

Important note: Do check out my other posts using cricketr at cricketr-posts

For getting data of a player against a particular country for the match played in the host country, I just had to add 2 extra parameters to the getPlayerData() function. The cricketr package has been updated with the changed functions for getPlayerData() – Tests, getPlayerDataOD() – ODI and getPlayerDataTT() for the Twenty20s. The updated functions will be available in cricketr Version -0.0.14

The data for the following players have already been obtained with the new, changed getPlayerData() function and have been saved as *.csv files. I will be re-using these files, instead of getting them all over again. Hence the getPlayerData() lines have been commented below

library(cricketr)

1. Performance of a batsman against a host ountry in the host country

For e.g We can the get the data for Sachin Tendulkar for matches played against Australia and in Australia Here opposition=2 and host =2 indicate that the opposition is Australia and the host country is also Australia

#tendulkarAus=getPlayerData(35320,opposition=2,host=2,file="tendulkarVsAusInAus.csv",type="batting")

All cricketr functions can be used with this data frame, as before. All the charts show the performance of Tendulkar in Australia against Australia.

par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
batsman4s("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsman6s("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsmanRunsRanges("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsmanDismissals("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsmanAvgRunsGround("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsmanMovingAverage("./data/tendulkarVsAusInAus.csv","Tendulkar")

dev.off()
## null device 
##           1

2. Relative performances of international batsmen against England in England

While we can analyze the performance of a player against an opposition in some host country, I wanted to compare the relative performances of players, to see how players from different nations play in a host country which is not their home ground.

The following lines gets player’s data of matches played in England and against England.The Oval, Lord’s are famous for generating some dangerous swing and bounce. I chose the following players

  1. Sir Don Bradman (Australia)
  2. Steve Waugh (Australia)
  3. Rahul Dravid (India)
  4. Vivian Richards (West Indies)
  5. Sachin Tendulkar (India)
#tendulkarEng=getPlayerData(35320,opposition=1,host=1,file="tendulkarVsEngInEng.csv",type="batting")
#bradmanEng=getPlayerData(4188,opposition=1,host=1,file="bradmanVsEngInEng.csv",type="batting")
#srwaughEng=getPlayerData(8192,opposition=1,host=1,file="srwaughVsEngInEng.csv",type="batting")
#dravidEng=getPlayerData(28114,opposition=1,host=1,file="dravidVsEngInEng.csv",type="batting")
#vrichardEng=getPlayerData(52812,opposition=1,host=1,file="vrichardsEngInEng.csv",type="batting")
frames <- list("./data/tendulkarVsEngInEng.csv","./data/bradmanVsEngInEng.csv","./data/srwaughVsEngInEng.csv",
               "./data/dravidVsEngInEng.csv","./data/vrichardsEngInEng.csv")
names <- list("S Tendulkar","D Bradman","SR Waugh","R Dravid","Viv Richards")

The Lords and the Oval in England are some of the best pitches in the world. Scoring on these pitches and weather conditions, where there is both swing and bounce really requires excellent batting skills. It can be easily seen that Don Bradman stands heads and shoulders over everybody else, averaging close a cumulative average of 100+. He is followed by Viv Richards, who averages around ~60. Interestingly in English conditions, Rahul Dravid edges out Sachin Tendulkar.

relativeBatsmanCumulativeAvgRuns(frames,names)

# The other 2 plots on relative strike rate and cumulative average strike rate,
shows Viv Richards really  blasts the bowling. Viv Richards has a strike rate 
of 70, while Bradman 62+, followed by Tendulkar.
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

3. Relative performances of international batsmen against Australia in Australia

The following players from these countries were chosen

  1. Sachin Tendulkar (India)
  2. Viv Richard (West Indies)
  3. David Gower (England)
  4. Jacques Kallis (South Africa)
  5. Alastair Cook (Emgland)
frames <- list("./data/tendulkarVsAusInAus.csv","./data/vrichardsVAusInAus.csv","./data/dgowerVsAusInAus.csv",
               "./data/kallisVsAusInAus.csv","./data/ancookVsWIInWI.csv")
names <- list("S Tendulkar","Viv Richards","David Gower","J Kallis","AN Cook")

Alastair Cook of England has fantastic cumulative average of 55+ on the pitches of Australia. There is a dip towards the end, but we cannot predict whether it would have continued. AN Cook is followed by Tendulkar who has a steady average of 50+ runs, after which there is Viv Richards.

relativeBatsmanCumulativeAvgRuns(frames,names)

#With respect to cumulative or relative strike rate Viv Richards is a class apart.He seems to really
#tear into bowlers. David Gower has an excellent strike rate and is followed by Tendulkar
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

4. Relative performances of international batsmen against India in India

While England & Australia are famous for bouncy tracks with swing, Indian pitches are renowed for being extraordinary turners. Also India has always thrown up world class spinners, from the spin quartet of BS Chandraskehar, Bishen Singh Bedi, EAS Prasanna, S Venkatraghavan, to the times of dangerous Anil Kumble, and now to the more recent Ravichander Ashwon and Harbhajan Singh.

A batsmen who can score runs in India against Indian spinners has to be really adept in handling all kinds of spin.

While Clive Lloyd & Alvin Kallicharan had the best performance against India, they have not been included as ESPN Cricinfo had many of the columns missing.

So I chose the following international players for the analysis against India

  1. Hashim Amla (South Africa)
  2. Alastair Cook (England)
  3. Matthew Hayden (Australia)
  4. Viv Richards (West Indies)
frames <- list("./data/amlaVsIndInInd.csv","./data/ancookVsIndInInd.csv","./data/mhaydenVsIndInInd.csv",
               "./data/vrichardsVsIndInInd.csv")
names <- list("H Amla","AN Cook","M Hayden","Viv Riachards")

Excluding Clive Lloyd & Alvin Kallicharan the next best performer against India is Hashim Amla,followed by Alastair Cook, Viv Richards.

relativeBatsmanCumulativeAvgRuns(frames,names)

#With respect to strike rate, there is no contest when Viv Richards is around. He is clearly the best 
#striker of the ball regardless of whether it is the pacy wickets of 
#Australia/England or the spinning tracks of the subcontinent. After 
#Viv Richards, Hayden and Alastair Cook have good cumulative strike rates
#in India
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

5. All time greats of Indian batting

I couldn’t resist checking out how the top Indian batsmen perform when playing in host countries So here is a look at how the top Indian batsmen perform against different host countries

6. Top Indian batsmen against Australia in Australia

The following Indian batsmen were chosen

  1. Sunil Gavaskar
  2. Sachin Tendulkar
  3. Virat Kohli
  4. Virendar Sehwag
  5. VVS Laxman
frames <- list("./data/tendulkarVsAusInAus.csv","./data/gavaskarVsAusInAus.csv","./data/kohliVsAusInAus.csv",
               "./data/sehwagVsAusInAus.csv","./data/vvslaxmanVsAusInAus.csv")
names <- list("S Tendulkar","S Gavaskar","V Kohli","V Sehwag","VVS Laxman")

Virat Kohli has the best overall performance against Australia, with a current cumulative average of 60+ runs for the total number of innings played by him (15). With 15 matches the 2nd best is Virendar Sehwag, followed by VVS Laxman. Tendulkar maintains a cumulative average of 48+ runs for an excess of 30+ innings.

relativeBatsmanCumulativeAvgRuns(frames,names)

# Sehwag leads the strike rate against host Australia, followed by 
# Tendulkar in Australia and then Kohli
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

7. Top Indian batsmen against England in England

The top Indian batmen’s performances against England are shown below

  1. Rahul Dravid
  2. Dilip Vengsarkar
  3. Rahul Dravid
  4. Sourav Ganguly
  5. Virat Kohli
frames <- list("./data/tendulkarVsEngInEng.csv","./data/dravidVsEngInEng.csv","./data/vengsarkarVsEngInEng.csv",
               "./data/gangulyVsEngInEng.csv","./data/gavaskarVsEngInEng.csv","./data/kohliVsEngInEng.csv")
names <- list("S Tendulkar","R Dravid","D Vengsarkar","S Ganguly","S Gavaskar","V Kohli")

Rahul Dravid has the best performance against England and edges out Tendulkar. He is followed by Tendulkar and then Sourav Ganguly. Note:Incidentally Virat Kohli’s performance against England in England so far has been extremely poor and he averages around 13-15 runs per innings. However he has a long way to go and I hope he catches up. In any case it will be an uphill climb for Kohli in England.

relativeBatsmanCumulativeAvgRuns(frames,names)

#Tendulkar, Ganguly and Dravid have the best strike rate and in that order.
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

8. Top Indian batsmen against West Indies in West Indies

frames <- list("./data/tendulkarVsWInWI.csv","./data/dravidVsWInWI.csv","./data/vvslaxmanVsWIInWI.csv",
               "./data/gavaskarVsWIInWI.csv")
names <- list("S Tendulkar","R Dravid","VVS Laxman","S Gavaskar")

Against the West Indies Sunil Gavaskar is heads and shoulders above the rest. Gavaskar has a very impressive cumulative average against West Indies

relativeBatsmanCumulativeAvgRuns(frames,names)

# VVS Laxman followed by  Tendulkar & then Dravid have a very 
# good strike rate against the West Indies
relativeBatsmanCumulativeStrikeRate(frames,names)

9. World’s best spinners on tracks suited for pace & bounce

In this part I compare the performances of the top 3 spinners in recent years and check out how they perform on surfaces that are known for pace, and bounce. I have taken the following 3 spinners

  1. Anil Kumble (India)
  2. M Muralitharan (Sri Lanka)
  3. Shane Warne (Australia)
#kumbleEng=getPlayerData(30176  ,opposition=3,host=3,file="kumbleVsEngInEng.csv",type="bowling")
#muraliEng=getPlayerData(49636  ,opposition=3,host=3,file="muraliVsEngInEng.csv",type="bowling")
#warneEng=getPlayerData(8166  ,opposition=3,host=3,file="warneVsEngInEng.csv",type="bowling")

10. Top international spinners against England in England

frames <- list("./data/kumbleVsEngInEng.csv","./data/muraliVsEngInEng.csv","./data/warneVsEngInEng.csv")
names <- list("Anil KUmble","M Muralitharan","Shane Warne")

Against England and in England, Muralitharan shines with a cumulative average of nearly 5 wickets per match with a peak of almost 8 wickets. Shane Warne has a steady average at 5 wickets and then Anil Kumble.

relativeBowlerCumulativeAvgWickets(frames,names)

# The order relative cumulative Economy rate, Warne has the best figures,followed by Anil Kumble. Muralitharan
# is much more expensive.
relativeBowlerCumulativeAvgEconRate(frames,names)

11. Top international spinners against South Africa in South Africa

frames <- list("./data/kumbleVsSAInSA.csv","./data/muraliVsSAInSA.csv","./data/warneVsSAInSA.csv")
names <- list("Anil Kumble","M Muralitharan","Shane Warne")

In South Africa too, Muralitharan has the best wicket taking performance averaging about 4 wickets. Warne averages around 3 wickets and Kumble around 2 wickets

relativeBowlerCumulativeAvgWickets(frames,names)

# Muralitharan is expensive in South Africa too, while Kumble and Warne go neck-to-neck in the economy rate.
# Kumble edges out Warne and has a better cumulative average economy rate
relativeBowlerCumulativeAvgEconRate(frames,names)

11. Top international pacers against India in India

As a final analysis I check how the world’s pacers perform in India against India. India pitches are supposed to be flat devoid of bounce, while being terrific turners. Hence Indian pitches are more suited to spin bowling than pace bowling. This is changing these days.

The best performers against India in India are mostly the deadly pacemen of yesteryears

For this I have chosen the following bowlers

  1. Courtney Walsh (West Indies)
  2. Andy Roberts (West Indies)
  3. Malcolm Marshall
  4. Glenn McGrath
#cawalshInd=getPlayerData(53216  ,opposition=6,host=6,file="cawalshVsIndInInd.csv",type="bowling")
#arobertsInd=getPlayerData(52817  ,opposition=6,host=6,file="arobertsIndInInd.csv",type="bowling")
#mmarshallInd=getPlayerData(52419  ,opposition=6,host=6,file="mmarshallVsIndInInd.csv",type="bowling")
#gmccgrathInd=getPlayerData(6565  ,opposition=6,host=6,file="mccgrathVsIndInInd.csv",type="bowling")
frames <- list("./data/cawalshVsIndInInd.csv","./data/arobertsIndInInd.csv","./data/mmarshallVsIndInInd.csv",
               "./data/mccgrathVsIndInInd.csv")
names <- list("C Walsh","A Roberts","M Marshall","G McGrath")

Courtney Walsh has the best performance, followed by Andy Roberts followed by Andy Roberts and then Malcom Marshall who tips ahead of Glenn McGrath

relativeBowlerCumulativeAvgWickets(frames,names)

#On the other hand McGrath has the best economy rate, followed by A Roberts and then Courtney Walsh
relativeBowlerCumulativeAvgEconRate(frames,names)

12. ODI performance of a player against a specific country in the host country

This gets the data for MS Dhoni in ODI matches against Australia and in Australia

#dhoniAusODI=getPlayerDataOD(28081,opposition=2,host=2,file="dhoniVsAusInAusODI.csv",type="batting")

13. Twenty 20 performance of a player against a specific country in the host country

#dhoniAusTT=getPlayerDataOD(28081,opposition=2,host=2,file="dhoniVsAusInAusTT.csv",type="batting")

All the ODI and Twenty20 functions of cricketr can be used on the above dataframes of MS Dhoni.

Some key observations

Here are some key observations

  1. At the top of the batting spectrum is Don Bradman with a very impressive average 100-120 in matches played in England and Australia. Unfortunately there weren’t matches he played in other countries and different pitches. 2.Viv Richard has the best cumulative strike rate overall.
  2. Muralitharan strikes more often than Kumble or Warne even in pitches at ENgland, South Africa and West Indies. However Muralitharan is also the most expensive
  3. Warne and Kumble have a much better economy rate than Muralitharan.
  4. Sunil Gavaskar has an extremely impressive performance in West Indies.
  5. Rahul Dravid performs much better than Tendulkar in both England and West Indies.
  6. Virat Kohli has the best performance against Australia so far and hope he maintains his stellar performance followed by Sehwag. However Kohli’s performance in England has been very poor
  7. West Indies batsmen and bowlers seem to thrive on Indian pitches, with Clive Lloyd and Alvin Kalicharan at the top of the list.

You may like my Shiny apps on cricket

  1. Inswinger- Analyzing International. T20s
  2. GooglyPlus – An app for analyzing IPL
  3. Sixer – App based on R package cricketr

Also see

  1. Exploring Quantum Gate operations with QCSimulator
  2. Neural Networks: The mechanics of backpropagation
  3. Re-introducing cricketr! : An R package to analyze performances of cricketers
  4. yorkr crashes the IPL party ! – Part 1
  5. cricketr and yorkr books – Paperback now in Amazon
  6.  Hand detection through Haartraining: A hands-on approach

To see all my posts see Index of posts

 

Re-introducing cricketr! : An R package to analyze performances of cricketers

In this post I re-introduce R package cricketr. I have added 8 new functions to my R package cricketr, available from version cricketr_0.0.13 namely

  1. batsmanCumulativeAverageRuns
  2. batsmanCumulativeStrikeRate
  3. bowlerCumulativeAvgEconRate
  4. bowlerCumulativeAvgWicketRate
  5. relativeBatsmanCumulativeAvgRuns
  6. relativeBatsmanCumulativeStrikeRate
  7. relativeBowlerCumulativeAvgWickets
  8. relativeBowlerCumulativeAvgEconRate

This post updates my earlier post Introducing cricketr:An R package for analyzing performances of cricketrs

Yet all experience is an arch wherethro’
Gleams that untravell’d world whose margin fades
For ever and forever when I move.
How dull it is to pause, to make an end,
To rust unburnish’d, not to shine in use!

Ulysses by Alfred Tennyson

 Introduction

This is an initial post in which I introduce a cricketing package ‘cricketr’ which I have created. This package was a natural culmination to my earlier posts on cricket and my finishing 10 modules of Data Science Specialization, from John Hopkins University at Coursera. The thought of creating this package struck me some time back, and I have finally been able to bring this to fruition.

So here it is. My R package ‘cricketr!!!’

If you are passionate about cricket, and love analyzing cricket performances, then check out my racy book on cricket ‘Cricket analytics with cricketr and cricpy – Analytics harmony with R & Python’! This book discusses and shows how to use my R package ‘cricketr’ and my Python package ‘cricpy’ to analyze batsmen and bowlers in all formats of the game (Test, ODI and T20). The paperback is available on Amazon at $21.99 and  the kindle version at $9.99/Rs 449/-. A must read for any cricket lover! Check it out!!

Untitled

This package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package can handle all formats of the game including Test, ODI and Twenty20 cricket.

You should be able to install the package from GitHub and use  many of the functions available in the package. Please be mindful of  ESPN Cricinfo Terms of Use

Note: This page is also hosted as a GitHub page at cricketr

This post is also hosted on Rpubs at Reintroducing cricketr. You can also down the pdf version of this post at reintroducing_cricketr.pdf

(Take a look at my short video tutorial on my R package cricketr on Youtube – R package cricketr – A short tutorial)

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

Also see my 2nd book “Beaten by sheer pace”  based on my R package yorkr which is now available in paperback and kindle versions at Amazon

Note: If you would like to do a similar analysis for a different set of batsman and bowlers, you can clone/download my skeleton cricketr template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. Just a familiarity with R and R Markdown only is needed.

Important note: Do check out the python avatar of cricketr, ‘cricpy’ in my post ‘Introducing cricpy:A python package to analyze performances of cricketers

 The cricketr package

The cricketr package has several functions that perform several different analyses on both batsman and bowlers. The package has functions that plot percentage frequency runs or wickets, runs likelihood for a batsman, relative run/strike rates of batsman and relative performance/economy rate for bowlers are available.

Other interesting functions include batting performance moving average, forecast and a function to check whether the batsman/bowler is in in-form or out-of-form.

The data for a particular player can be obtained with the getPlayerData() function from the package. To do this you will need to go to ESPN CricInfo Player and type in the name of the player for e.g Ricky Ponting, Sachin Tendulkar etc. This will bring up a page which have the profile number for the player e.g. for Sachin Tendulkar this would be http://www.espncricinfo.com/india/content/player/35320.html. Hence, Sachin’s profile is 35320. This can be used to get the data for Tendulkar as shown below

The cricketr package is now available from  CRAN!!!.  You should be able to install directly with

# Install from CRAN
if (!require("cricketr")){ 
    install.packages("cricketr",lib = "c:/test") 
} 
library(cricketr)

Getting help from cricketr

?getPlayerData

## 
## getPlayerData(profile, opposition='', host='', dir='./data', file='player001.csv', type='batting', homeOrAway=[1, 2], result=[1, 2, 4], create=True)
##     Get the player data from ESPN Cricinfo based on specific inputs and store in a file in a given directory
##     
##     Description
##     
##     Get the player data given the profile of the batsman. The allowed inputs are home,away or both and won,lost or draw of matches. The data is stored in a .csv file in a directory specified. This function also returns a data frame of the player
##     
##     Usage
##     
##     getPlayerData(profile,opposition="",host="",dir="./data",file="player001.csv",
##     type="batting", homeOrAway=c(1,2),result=c(1,2,4))
##     Arguments
##     
##     profile     
##     This is the profile number of the player to get data. This can be obtained from http://www.espncricinfo.com/ci/content/player/index.html. Type the name of the player and click search. This will display the details of the player. Make a note of the profile ID. For e.g For Sachin Tendulkar this turns out to be http://www.espncricinfo.com/india/content/player/35320.html. Hence the profile for Sachin is 35320
##     opposition  
##     The numerical value of the opposition country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,New Zealand:5,Pakistan:7,South Africa:3,Sri Lanka:8, West Indies:4, Zimbabwe:9
##     host        
##     The numerical value of the host country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,New Zealand:5,Pakistan:7,South Africa:3,Sri Lanka:8, West Indies:4, Zimbabwe:9
##     dir 
##     Name of the directory to store the player data into. If not specified the data is stored in a default directory "./data". Default="./data"
##     file        
##     Name of the file to store the data into for e.g. tendulkar.csv. This can be used for subsequent functions. Default="player001.csv"
##     type        
##     type of data required. This can be "batting" or "bowling"
##     homeOrAway  
##     This is a vector with either 1,2 or both. 1 is for home 2 is for away
##     result      
##     This is a vector that can take values 1,2,4. 1 - won match 2- lost match 4- draw
##     Details
##     
##     More details can be found in my short video tutorial in Youtube https://www.youtube.com/watch?v=q9uMPFVsXsI
##     
##     Value
##     
##     Returns the player's dataframe
##     
##     Note
##     
##     Maintainer: Tinniam V Ganesh <tvganesh.85@gmail.com>
##     
##     Author(s)
##     
##     Tinniam V Ganesh
##     
##     References
##     
##     http://www.espncricinfo.com/ci/content/stats/index.html
##     https://gigadom.wordpress.com/
##     
##     See Also
##     
##     getPlayerDataSp
##     
##     Examples
##     
##     ## Not run: 
##     # Both home and away. Result = won,lost and drawn
##     tendulkar = getPlayerData(35320,dir=".", file="tendulkar1.csv",
##     type="batting", homeOrAway=c(1,2),result=c(1,2,4))
##     
##     # Only away. Get data only for won and lost innings
##     tendulkar = getPlayerData(35320,dir=".", file="tendulkar2.csv",
##     type="batting",homeOrAway=c(2),result=c(1,2))
##     
##     # Get bowling data and store in file for future
##     kumble = getPlayerData(30176,dir=".",file="kumble1.csv",
##     type="bowling",homeOrAway=c(1),result=c(1,2))
##     
##     #Get the Tendulkar's Performance against Australia in Australia
##     tendulkar = getPlayerData(35320, opposition = 2,host=2,dir=".", 
##     file="tendulkarVsAusInAus.csv",type="batting")

The cricketr package includes some pre-packaged sample (.csv) files. You can use these sample to test functions  as shown below

# Retrieve the file path of a data file installed with cricketr
pathToFile ,"Sachin Tendulkar")

unnamed-chunk-2-1

Alternatively, the cricketr package can be installed from GitHub with

if (!require("cricketr")){ 
    library(devtools) 
    install_github("tvganesh/cricketr") 
}
library(cricketr)

The pre-packaged files can be accessed as shown above.
To get the data of any player use the function getPlayerData()

tendulkar <- getPlayerData(35320,dir="..",file="tendulkar.csv",type="batting",homeOrAway=c(1,2),
                           result=c(1,2,4))

Important Note This needs to be done only once for a player. This function stores the player’s data in a CSV file (for e.g. tendulkar.csv as above) which can then be reused for all other functions. Once we have the data for the players many analyses can be done. This post will use the stored CSV file obtained with a prior getPlayerData for all subsequent analyses

Sachin Tendulkar’s performance – Basic Analyses

The 3 plots below provide the following for Tendulkar

  1. Frequency percentage of runs in each run range over the whole career
  2. Mean Strike Rate for runs scored in the given range
  3. A histogram of runs frequency percentages in runs ranges
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./tendulkar.csv","Sachin Tendulkar")
batsmanMeanStrikeRate("./tendulkar.csv","Sachin Tendulkar")
batsmanRunsRanges("./tendulkar.csv","Sachin Tendulkar")

tendulkar-batting-1

dev.off()
## null device 
##           1

More analyses

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./tendulkar.csv","Tendulkar")
batsman6s("./tendulkar.csv","Tendulkar")
batsmanDismissals("./tendulkar.csv","Tendulkar")

tendulkar-4s6sout-1

 

3D scatter plot and prediction plane

The plots below show the 3D scatter plot of Sachin’s Runs versus Balls Faced and Minutes at crease. A linear regression model is then fitted between Runs and Balls Faced + Minutes at crease

battingPerf3d("./tendulkar.csv","Sachin Tendulkar")

tendulkar-3d-1

Average runs at different venues

The plot below gives the average runs scored by Tendulkar at different grounds. The plot also displays the number of innings at each ground as a label at x-axis. It can be seen Tendulkar did great in Colombo (SSC), Melbourne ifor matches overseas and Mumbai, Mohali and Bangalore at home

batsmanAvgRunsGround("./tendulkar.csv","Sachin Tendulkar")
tendulkar-avggrd-1

Average runs against different opposing teams

This plot computes the average runs scored by Tendulkar against different countries. The x-axis also gives the number of innings against each team

batsmanAvgRunsOpposition("./tendulkar.csv","Tendulkar")
tendulkar-avgopn-1

Highest Runs Likelihood

The plot below shows the Runs Likelihood for a batsman. For this the performance of Sachin is plotted as a 3D scatter plot with Runs versus Balls Faced + Minutes at crease using. K-Means. The centroids of 3 clusters are computed and plotted. In this plot. Sachin Tendulkar’s highest tendencies are computed and plotted using K-Means

batsmanRunsLikelihood("./tendulkar.csv","Sachin Tendulkar")

tendulkar-kmeans-1

## Summary of  Sachin Tendulkar 's runs scoring likelihood
## **************************************************
## 
## There is a 16.51 % likelihood that Sachin Tendulkar  will make  139 Runs in  251 balls over 353  Minutes 
## There is a 58.41 % likelihood that Sachin Tendulkar  will make  16 Runs in  31 balls over  44  Minutes 
## There is a 25.08 % likelihood that Sachin Tendulkar  will make  66 Runs in  122 balls over 167  Minutes

A look at the Top 4 batsman – Tendulkar, Kallis, Ponting and Sangakkara

The batsmen with the most hundreds in test cricket are

  1. Sachin Tendulkar :Average:53.78,100’s – 51, 50’s – 68
  2. Jacques Kallis : Average: 55.47, 100’s – 45, 50’s – 58
  3. Ricky Ponting : Average: 51.85, 100’s – 41 , 50’s – 62
  4. Kumara Sangakarra: Average: 58.04 ,100’s – 38 , 50’s – 52

in that order.

The following plots take a closer at their performances. The box plots show the mean (red line) and median (blue line). The two ends of the boxplot display the 25th and 75th percentile.

Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency. The calculated Mean differ from the stated means possibly because of data cleaning. Also not sure how the means were arrived at ESPN Cricinfo for e.g. when considering not out..

batsmanPerfBoxHist("./tendulkar.csv","Sachin Tendulkar")

tkps-boxhist-1

batsmanPerfBoxHist("./kallis.csv","Jacques Kallis")

tkps-boxhist-2

batsmanPerfBoxHist("./ponting.csv","Ricky Ponting")

tkps-boxhist-3

batsmanPerfBoxHist("./sangakkara.csv","K Sangakkara")

tkps-boxhist-4

Contribution to won and lost matches

The plot below shows the contribution of Tendulkar, Kallis, Ponting and Sangakarra in matches won and lost. The plots show the range of runs scored as a boxplot (25th & 75th percentile) and the mean scored. The total matches won and lost are also printed in the plot.

All the players have scored more in the matches they won than the matches they lost. Ricky Ponting is the only batsman who seems to have more matches won to his credit than others. This could also be because he was a member of strong Australian team

For the next 2 functions below you will have to use the getPlayerDataSp() function. I
have commented this as I already have these files

tendulkarsp 
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanContributionWonLost("tendulkarsp.csv","Tendulkar")
batsmanContributionWonLost("kallissp.csv","Kallis")
batsmanContributionWonLost("pontingsp.csv","Ponting")
batsmanContributionWonLost("sangakkarasp.csv","Sangakarra")

tkps-wonlost-1

dev.off()
## null device 
##           1

Performance at home and overseas

From the plot below it can be seen
Tendulkar has more matches overseas than at home and his performance is consistent in all venues at home or abroad. Ponting has lesser innings than Tendulkar and has an equally good performance at home and overseas.Kallis and Sangakkara’s performance abroad is lower than the performance at home.

This function also requires the use of getPlayerDataSp() as shown above

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfHomeAway("tendulkarsp.csv","Tendulkar")
batsmanPerfHomeAway("kallissp.csv","Kallis")
batsmanPerfHomeAway("pontingsp.csv","Ponting")
batsmanPerfHomeAway("sangakkarasp.csv","Sangakarra")
dev.off()
tkps-homeaway-1
dev.off()
## null device 
##           1
 

Moving Average of runs in career

Take a look at the Moving Average across the career of the Top 4. Clearly . Kallis and Sangakkara have a few more years of great batting ahead. They seem to average on 50. . Tendulkar and Ponting definitely show a slump in the later years

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./tendulkar.csv","Sachin Tendulkar")
batsmanMovingAverage("./kallis.csv","Jacques Kallis")
batsmanMovingAverage("./ponting.csv","Ricky Ponting")
batsmanMovingAverage("./sangakkara.csv","K Sangakkara")

tkps-ma-1

dev.off()
## null device 
##           1

Cumulative Average runs of batsman in career

This function provides the cumulative average runs of the batsman over the career. Tendulkar averages around 50, while Sangakkarra touches 55 towards the end of his career

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("./tendulkar.csv","Tendulkar")

tkps-car-1

batsmanCumulativeAverageRuns("./kallis.csv","Kallis")

tkps-car-2

batsmanCumulativeAverageRuns("./ponting.csv","Ponting")

tkps-car-3

batsmanCumulativeAverageRuns("./sangakkara.csv","Sangakkara")

tkps-car-4

dev.off()
## null device 
##           1

Cumulative Average strike rate of batsman in career

This function gives the cumulative strike rate of the batsman over the career

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("./tendulkar.csv","Tendulkar")

tkps-casr-1

batsmanCumulativeStrikeRate("./kallis.csv","Kallis")

tkps-casr-2

batsmanCumulativeStrikeRate("./ponting.csv","Ponting")

tkps-casr-3

batsmanCumulativeStrikeRate("./sangakkara.csv","Sangakkara")

tkps-casr-4

dev.off()
## null device 
##           1

Future Runs forecast

Here are plots that forecast how the batsman will perform in future. In this case 90% of the career runs trend is uses as the training set. the remaining 10% is the test set.

A Holt-Winters forecating model is used to forecast future performance based on the 90% training set. The forecated runs trend is plotted. The test set is also plotted to see how close the forecast and the actual matches

Take a look at the runs forecasted for the batsman below.

  • Tendulkar’s forecasted performance seems to tally with his actual performance with an average of 50
  • Kallis the forecasted runs are higher than the actual runs he scored
  • Ponting seems to have a good run in the future
  • Sangakkara has a decent run in the future averaging 50 runs
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfForecast("./kallis.csv","Jacques Kallis")
batsmanPerfForecast("./ponting.csv","Ricky Ponting")
batsmanPerfForecast("./sangakkara.csv","K Sangakkara")

tkps-perffcst-1

dev.off()
## null device 
##           1

Relative Mean Strike Rate plot

The plot below compares the Mean Strike Rate of the batsman for each of the runs ranges of 10 and plots them. The plot indicate the following Range 0 – 50 Runs – Ponting leads followed by Tendulkar Range 50 -100 Runs – Ponting followed by Sangakkara Range 100 – 150 – Ponting and then Tendulkar

frames <- list("./tendulkar.csv","./kallis.csv","ponting.csv","sangakkara.csv")
names <- list("Tendulkar","Kallis","Ponting","Sangakkara")
relativeBatsmanSR(frames,names)

tkps-relSR-1

Relative Runs Frequency plot

The plot below gives the relative Runs Frequency Percetages for each 10 run bucket. The plot below show

Sangakkara leads followed by Ponting

frames <- list("./tendulkar.csv","./kallis.csv","ponting.csv","sangakkara.csv")
names <- list("Tendulkar","Kallis","Ponting","Sangakkara")
relativeRunsFreqPerf(frames,names)

tkps-relRunFreq-1

Relative cumulative average runs in career

The plot below compares the relative cumulative runs of the batsmen over the career. While Tendulkar seems to lead over the others with a cumulative average of 50, we can see that Sangakkara goes over everybody else between 180-220th inning. It is likely that Sangakkarra may have overtaken Tendulkar if he had played more

frames <- list("./tendulkar.csv","./kallis.csv","ponting.csv","sangakkara.csv")
names <- list("Tendulkar","Kallis","Ponting","Sangakkara")
relativeBatsmanCumulativeAvgRuns(frames,names)

tkps-relcar-11

Relative cumulative average strike rate in career

As seen in earlier charts Ponting has the best overall strike rate, followed by Sangakkara and then Tendulkar

frames <- list("./tendulkar.csv","./kallis.csv","ponting.csv","sangakkara.csv")
names <- list("Tendulkar","Kallis","Ponting","Sangakkara")
relativeBatsmanCumulativeStrikeRate(frames,names)

tkps-relcsr-1

Check Batsman In-Form or Out-of-Form

The below computation uses Null Hypothesis testing and p-value to determine if the batsman is in-form or out-of-form. For this 90% of the career runs is chosen as the population and the mean computed. The last 10% is chosen to be the sample set and the sample Mean and the sample Standard Deviation are caculated.

The Null Hypothesis (H0) assumes that the batsman continues to stay in-form where the sample mean is within 95% confidence interval of population mean The Alternative (Ha) assumes that the batsman is out of form the sample mean is beyond the 95% confidence interval of the population mean.

A significance value of 0.05 is chosen and p-value us computed If p-value >= .05 – Batsman In-Form If p-value < 0.05 – Batsman Out-of-Form

Note Ideally the p-value should be done for a population that follows the Normal Distribution. But the runs population is usually left skewed. So some correction may be needed. I will revisit this later

This is done for the Top 4 batsman

checkBatsmanInForm("./tendulkar.csv","Sachin Tendulkar")
## *******************************************************************************************
## 
## Population size: 294  Mean of population: 50.48 
## Sample size: 33  Mean of sample: 32.42 SD of sample: 29.8 
## 
## Null hypothesis H0 : Sachin Tendulkar 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Sachin Tendulkar 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Sachin Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713  is less than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./kallis.csv","Jacques Kallis")
## *******************************************************************************************
## 
## Population size: 240  Mean of population: 47.5 
## Sample size: 27  Mean of sample: 47.11 SD of sample: 59.19 
## 
## Null hypothesis H0 : Jacques Kallis 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Jacques Kallis 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Jacques Kallis 's Form Status: In-Form because the p value: 0.48647  is greater than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./ponting.csv","Ricky Ponting")
## *******************************************************************************************
## 
## Population size: 251  Mean of population: 47.5 
## Sample size: 28  Mean of sample: 36.25 SD of sample: 48.11 
## 
## Null hypothesis H0 : Ricky Ponting 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Ricky Ponting 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Ricky Ponting 's Form Status: In-Form because the p value: 0.113115  is greater than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./sangakkara.csv","K Sangakkara")
## *******************************************************************************************
## 
## Population size: 193  Mean of population: 51.92 
## Sample size: 22  Mean of sample: 71.73 SD of sample: 82.87 
## 
## Null hypothesis H0 : K Sangakkara 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : K Sangakkara 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "K Sangakkara 's Form Status: In-Form because the p value: 0.862862  is greater than alpha=  0.05"
## *******************************************************************************************

3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./tendulkar.csv","Tendulkar")
battingPerf3d("./kallis.csv","Kallis")
plot-3-1par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./ponting.csv","Ponting")
battingPerf3d("./sangakkara.csv","Sangakkara")
plot-4-1dev.off()
## null device 
##           1

Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease. A sample sequence of Balls Faced(BF) and Minutes at crease (Mins) is setup as shown below. The fitted model is used to predict the runs for these values

BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
tendulkar <- batsmanRunsPredict("./tendulkar.csv","Tendulkar",newdataframe=newDF)
kallis <- batsmanRunsPredict("./kallis.csv","Kallis",newdataframe=newDF)
ponting <- batsmanRunsPredict("./ponting.csv","Ponting",newdataframe=newDF)
sangakkara <- batsmanRunsPredict("./sangakkara.csv","Sangakkara",newdataframe=newDF)

The fitted model is then used to predict the runs that the batsmen will score for a given Balls faced and Minutes at crease. It can be seen Ponting has the will score the highest for a given Balls Faced and Minutes at crease.

Ponting is followed by Tendulkar who has Sangakkara close on his heels and finally we have Kallis. This is intuitive as we have already seen that Ponting has a highest strike rate.

batsmen <-cbind(round(tendulkar$Runs),round(kallis$Runs),round(ponting$Runs),round(sangakkara$Runs))
colnames(batsmen) <- c("Tendulkar","Kallis","Ponting","Sangakkara")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Tendulkar Kallis Ponting Sangakkara
## 1          10           30         7      6       9          2
## 2          38           71        23     20      25         18
## 3          66          111        39     34      42         34
## 4          94          152        54     48      59         50
## 5         121          193        70     62      76         66
## 6         149          234        86     76      93         82
## 7         177          274       102     90     110         98
## 8         205          315       118    104     127        114
## 9         233          356       134    118     144        130
## 10        261          396       150    132     161        146
## 11        289          437       165    146     178        162
## 12        316          478       181    159     194        178
## 13        344          519       197    173     211        194
## 14        372          559       213    187     228        210
## 15        400          600       229    201     245        226

Checkout my book ‘Deep Learning from first principles Second Edition- In vectorized Python, R and Octave’.  My book is available on Amazon  as paperback ($18.99) and in kindle version($9.99/Rs449).

You may also like my companion book “Practical Machine Learning with R and Python:Second Edition- Machine Learning in stereo” available in Amazon in paperback($12.99) and Kindle($9.99/Rs449) versions.

Analysis of Top 3 wicket takers

The top 3 wicket takes in test history are
1. M Muralitharan:Wickets: 800, Average = 22.72, Economy Rate – 2.47
2. Shane Warne: Wickets: 708, Average = 25.41, Economy Rate – 2.65
3. Anil Kumble: Wickets: 619, Average = 29.65, Economy Rate – 2.69

How do Anil Kumble, Shane Warne and M Muralitharan compare with one another with respect to wickets taken and the Economy Rate. The next set of plots compute and plot precisely these analyses.

Wicket Frequency Plot

This plot below computes the percentage frequency of number of wickets taken for e.g 1 wicket x%, 2 wickets y% etc and plots them as a continuous line

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./kumble.csv","Anil Kumble")
bowlerWktsFreqPercent("./warne.csv","Shane Warne")
bowlerWktsFreqPercent("./murali.csv","M Muralitharan")

relBowlFP-1

dev.off()
## null device 
##           1

Wickets Runs plot

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerWktsRunsPlot("./kumble.csv","Kumble")
bowlerWktsRunsPlot("./warne.csv","Warne")
bowlerWktsRunsPlot("./murali.csv","Muralitharan")
wktsrun-1
dev.off()
## null device 
##           1

Average wickets at different venues

The plot gives the average wickets taken by Muralitharan at different venues. Muralitharan has taken an average of 8 and 6 wickets at Oval & Wellington respectively in 2 different innings. His best performances are at Kandy and Colombo (SSC)

bowlerAvgWktsGround("./murali.csv","Muralitharan")
avgWktshrg-1

Average wickets against different opposition

The plot gives the average wickets taken by Muralitharan against different countries. The x-axis also includes the number of innings against each team

bowlerAvgWktsOpposition("./murali.csv","Muralitharan")
avgWktoppn-1

Wickets taken moving average

From th eplot below it can be see 1. Shane Warne’s performance at the time of his retirement was still at a peak of 3 wickets 2. M Muralitharan seems to have become ineffective over time with his peak years being 2004-2006 3. Anil Kumble also seems to slump down and become less effective.

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./kumble.csv","Anil Kumble")
bowlerMovingAverage("./warne.csv","Shane Warne")
bowlerMovingAverage("./murali.csv","M Muralitharan")

tkps-bowlma-1

dev.off()
## null device 
##           1

Cumulative average wickets taken

The plots below give the cumulative average wickets taken by the bowlers

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerCumulativeAvgWickets("./kumble.csv","Kumble")

kwm-bowlcaw-1

bowlerCumulativeAvgWickets("./warne.csv","Warne")

kwm-bowlcaw-2

bowlerCumulativeAvgWickets("./murali.csv","Muralitharan")

kwm-bowlcaw-3

dev.off()
## null device 
##           1

Cumulative average economy rate

The plots below give the cumulative average economy rate of the bowlers

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerCumulativeAvgEconRate("./kumble.csv","Kumble")

kwm-bowlcer-1

bowlerCumulativeAvgEconRate("./warne.csv","Warne")

kwm-bowlcer-2

bowlerCumulativeAvgEconRate("./murali.csv","Muralitharan")

kwm-bowlcer-3

dev.off()
## null device 
##           1

Future Wickets forecast

Here are plots that forecast how the bowler will perform in future. In this case 90% of the career wickets trend is used as the training set. the remaining 10% is the test set.

A Holt-Winters forecating model is used to forecast future performance based on the 90% training set. The forecated wickets trend is plotted. The test set is also plotted to see how close the forecast and the actual matches

Take a look at the wickets forecasted for the bowlers below. – Shane Warne and Muralitharan have a fairly consistent forecast – Kumble forecast shows a small dip

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerPerfForecast("./kumble.csv","Anil Kumble")
bowlerPerfForecast("./warne.csv","Shane Warne")
bowlerPerfForecast("./murali.csv","M Muralitharan")

kwm-perffcst-1

dev.off()
## null device 
##           1

Contribution to matches won and lost

The plot below is extremely interesting
1. Kumble wickets range from 2 to 4 wickets in matches wons with a mean of 3
2. Warne wickets in won matches range from 1 to 4 with more matches won. Clearly there are other bowlers contributing to the wins, possibly the pacers
3. Muralitharan wickets range in winning matches is more than the other 2 and ranges ranges 3 to 5 and clearly had a hand (pun unintended) in Sri Lanka’s wins

As discussed above the next 2 charts require the use of getPlayerDataSp()

kumblesp 
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerContributionWonLost("kumblesp.csv","Kumble")
bowlerContributionWonLost("warnesp.csv","Warne")
bowlerContributionWonLost("muralisp.csv","Murali")

kwm-wl-1

dev.off()
## null device 
##           1

Performance home and overseas

From the plot below it can be seen that Kumble & Warne have played more matches overseas than Muralitharan. Both Kumble and Warne show an average of 2 wickers overseas,  Murali on the other hand has an average of 2.5 wickets overseas but a slightly less number of matches than Kumble & Warne

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerPerfHomeAway("kumblesp.csv","Kumble")
bowlerPerfHomeAway("warnesp.csv","Warne")
bowlerPerfHomeAway("muralisp.csv","Murali")

kwm-ha-1
dev.off()
## null device 
##           1
 

Relative Wickets Frequency Percentage

The Relative Wickets Percentage plot shows that M Muralitharan has a large percentage of wickets in the 3-8 wicket range

frames <- list("./kumble.csv","./murali.csv","warne.csv")
names <- list("Anil KUmble","M Muralitharan","Shane Warne")
relativeBowlingPerf(frames,names)

relBowlPerf-1

Relative Economy Rate against wickets taken

Clearly from the plot below it can be seen that Muralitharan has the best Economy Rate among the three

frames <- list("./kumble.csv","./murali.csv","warne.csv")
names <- list("Anil KUmble","M Muralitharan","Shane Warne")
relativeBowlingER(frames,names)

relBowlER-1

Relative cumulative average wickets of bowlers in career

The plot below shows that Murali has the best cumulative average wickets taken followed by Kumble and then Warne

frames <- list("./kumble.csv","./murali.csv","warne.csv")
names <- list("Anil KUmble","M Muralitharan","Shane Warne")
relativeBowlerCumulativeAvgWickets(frames,names)

rbcaw-1

Relative cumulative average economy rate of bowlers

Muralitharan has the best economy rate followed by Warne and then Kumble

frames <- list("./kumble.csv","./murali.csv","warne.csv")
names <- list("Anil KUmble","M Muralitharan","Shane Warne")
relativeBowlerCumulativeAvgEconRate(frames,names)

rbcer-1

Check for bowler in-form/out-of-form

The below computation uses Null Hypothesis testing and p-value to determine if the bowler is in-form or out-of-form. For this 90% of the career wickets is chosen as the population and the mean computed. The last 10% is chosen to be the sample set and the sample Mean and the sample Standard Deviation are caculated.

The Null Hypothesis (H0) assumes that the bowler continues to stay in-form where the sample mean is within 95% confidence interval of population mean The Alternative (Ha) assumes that the bowler is out of form the sample mean is beyond the 95% confidence interval of the population mean.

A significance value of 0.05 is chosen and p-value us computed If p-value >= .05 – Batsman In-Form If p-value < 0.05 – Batsman Out-of-Form

Note Ideally the p-value should be done for a population that follows the Normal Distribution. But the runs population is usually left skewed. So some correction may be needed. I will revisit this later

Note: The check for the form status of the bowlers indicate 1. That both Kumble and Muralitharan were out of form. This also shows in the moving average plot 2. Warne is still in great form and could have continued for a few more years. Too bad we didn’t see the magic later

checkBowlerInForm("./kumble.csv","Anil Kumble")
## *******************************************************************************************
## 
## Population size: 212  Mean of population: 2.69 
## Sample size: 24  Mean of sample: 2.04 SD of sample: 1.55 
## 
## Null hypothesis H0 : Anil Kumble 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Anil Kumble 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Anil Kumble 's Form Status: Out-of-Form because the p value: 0.02549  is less than alpha=  0.05"
## *******************************************************************************************
checkBowlerInForm("./warne.csv","Shane Warne")
## *******************************************************************************************
## 
## Population size: 240  Mean of population: 2.55 
## Sample size: 27  Mean of sample: 2.56 SD of sample: 1.8 
## 
## Null hypothesis H0 : Shane Warne 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Shane Warne 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Shane Warne 's Form Status: In-Form because the p value: 0.511409  is greater than alpha=  0.05"
## *******************************************************************************************
checkBowlerInForm("./murali.csv","M Muralitharan")
## *******************************************************************************************
## 
## Population size: 207  Mean of population: 3.55 
## Sample size: 23  Mean of sample: 2.87 SD of sample: 1.74 
## 
## Null hypothesis H0 : M Muralitharan 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : M Muralitharan 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "M Muralitharan 's Form Status: Out-of-Form because the p value: 0.036828  is less than alpha=  0.05"
## *******************************************************************************************
dev.off()
## null device 
##           1

Key Findings

The plots above capture some of the capabilities and features of my cricketr package. Feel free to install the package and try it out. Please do keep in mind ESPN Cricinfo’s Terms of Use.
Here are the main findings from the analysis above

Analysis of Top 4 batsman

The analysis of the Top 4 test batsman Tendulkar, Kallis, Ponting and Sangakkara show the folliwing

  1. Sangakkara has the highest average, followed by Tendulkar, Kallis and then Ponting.
  2. Ponting has the highest strike rate followed by Tendulkar,Sangakkara and then Kallis
  3. The predicted runs for a given Balls faced and Minutes at crease is highest for Ponting, followed by Tendulkar, Sangakkara and Kallis
  4. The moving average for Tendulkar and Ponting shows a downward trend while Kallis and Sangakkara retired too soon
  5. Tendulkar was out of form about the time of retirement while the rest were in-form. But this result has to be taken along with the moving average plot. Ponting was clearly on the way out.
  6. The home and overseas performance indicate that Tendulkar is the clear leader. He has the highest number of matches played overseas and his performance has been consistent. He is followed by Ponting, Kallis and finally Sangakkara

Analysis of Top 3 legs spinners

The analysis of Anil Kumble, Shane Warne and M Muralitharan show the following

  1. Muralitharan has the highest wickets and best economy rate followed by Warne and Kumble
  2. Muralitharan has higher wickets frequency percentage between 3 to 8 wickets
  3. Muralitharan has the best Economy Rate for wickets between 2 to 7
  4. The moving average plot shows that the time was up for Kumble and Muralitharan but Warne had a few years ahead
  5. The check for form status shows that Muralitharan and Kumble time was over while Warne still in great form
  6. Kumble’s has more matches abroad than the other 2, yet Kumble averages of 3 wickets at home and 2 wickets overseas liek Warne . Murali has played few matches but has an average of 4 wickets at home and 3 wickets overseas.

Final thoughts

Here are my final thoughts

Batting

Among the 4 batsman Tendulkar, Kallis, Ponting and Sangakkara the clear leader is Tendulkar for the following reasons

  1. Tendulkar has the highest test centuries and runs of all time.Tendulkar’s average is 2nd to Sangakkara, Tendulkar’s predicted runs for a given Balls faced and Minutes at Crease is 2nd and is behind Ponting. Also Tendulkar’s performance at home and overseas are consistent throughtout despite the fact that he has a highest number of overseas matches
  2. Ponting takes the 2nd spot with the 2nd highest number of centuries, 1st in Strike Rate and 2nd in home and away performance.
  3. The 3rd spot goes to Sangakkara, with the highest average, 3rd highest number of centuries, reasonable run frequency percentage in different run ranges. However he has a fewer number of matches overseas and his performance overseas is significantly lower than at home
  4. Kallis has the 2nd highest number of centuries but his performance overseas and strike rate are behind others
  5. Finally Kallis and Sangakkara had a few good years of batting still left in them (pity they retired!) while Tendulkar and Ponting’s time was up
  6. While Tendulkars cumulative average stays around 50 runs, Sangakkara briefly overtakes Tendulkar towards the end of his career. Sangakkara may have finished with a better average if he had played for a few more years
  7. Ponting has the best overall strike rate followed by Sangakkara

Bowling

Muralitharan leads the way followed closely by Warne and finally Kumble. The reasons are

  1. Muralitharan has the highest number of test wickets with the best Wickets percentage and the best Economy Rate. Murali on average gas taken 4 wickets at home and 3 wickets overseas
  2. Warne follows Murali in the highest wickets taken, however Warne has less matches overseas than Murali and average 3 wickets home and 2 wickets overseas
  3. Kumble has the 3rd highest wickets, with 3 wickets on an average at home and 2 wickets overseas. However Kumble has played more matches overseas than the other two. In that respect his performance is great. Also Kumble has played less matches at home otherwise his numbers would have looked even better.
  4. Also while Kumble and Muralitharan’s career was on the decline , Warne was going great and had a couple of years ahead.
  5. Muralitharan has the best cumulative wicket rate and economy rate. Kumble has a better wicket rate than Warne but is more expensive than Warne

You can download this analysis at Introducing cricketrYou can download this analysis at Re-Introducing cricketr

Important note: Do check out my other posts using cricketr at cricketr-posts

Also see

1.Introducing cricket package yorkr-Part1:Beaten by sheer pace!.
2.yorkr pads up for the Twenty20s: Part 1- Analyzing team“s match performance.
3.yorkr crashes the IPL party !Part 1
4.Introducing cricketr! : An R package to analyze performances of cricketers
5.Beaten by sheer pace! Cricket analytics with yorkr in paperback and Kindle versions
6. Cricket analytics with cricketr in paperback and Kindle versions

You may also like
1. A crime map of India in R: Crimes against women
2.  What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
3.  Bend it like Bluemix, MongoDB with autoscaling – Part 2
4. Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
5. Thinking Web Scale (TWS-3): Map-Reduce – Bring compute to data
6. Deblurring with OpenCV:Weiner filter reloaded
7. Fun simulation of a Chain in Androidhttp://www.r-bloggers.com/introducing-cricketr-an-r-package-to-analyze-performances-of-cricketers/

yorkr ranks ODI batsmen and bowlers

This is the last and final post in which yorkr ranks ODI batsmen and bowlers. These are based on match data from Cricsheet. The ranking is done on

  1. average runs and average strike rate for batsmen and
  2. average wickets and average economy rate for bowlers.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen, bowlers besides evaluating team & match performances in Tests , ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr  and Beaten by sheer pace-Cricket analytics with yorkr  A must read for any cricket lover! Check it out!!

1

nd $4.99/Rs 320 and $6.99/Rs448 respectively

 

This post has also been published in RPubs RankODIPlayers. You can download this as a pdf file at RankODIPlayers.pdf.

Checkout my interactive Shiny apps GooglyPlus (plots & tables) and Googly (only plots) which can be used to analyze IPL players, teams and matches.

You can take a look at the code at rankODIPlayers (available in yorkr_0.0.5)

rm(list=ls())
library(yorkr)
library(dplyr)
source("rankODIBatsmen.R")
source("rankODIBowlers.R")

Rank ODI batsmen

The top 3 ODI batsmen are hashim Amla (SA), Matther Hayden(Aus) & Virat Kohli (Ind) . Note: For ODI a a cutoff of at least 50 matches played was chosen.

ODIBatsmanRank <- rankODIBatsmen()
as.data.frame(ODIBatsmanRank[1:30,])
##            batsman matches meanRuns    meanSR
## 1          HM Amla     185 51.96216  84.15508
## 2        ML Hayden      79 50.08861  81.20646
## 3          V Kohli     279 48.51971  78.55197
## 4   AB de Villiers     253 47.93676  95.05561
## 5     SR Tendulkar     151 45.82119  79.62311
## 6         S Dhawan     116 45.03448  81.54043
## 7         V Sehwag     167 44.49102 106.27563
## 8          JE Root     111 43.64865  81.66054
## 9        Q de Kock      85 43.61176  82.55235
## 10       IJL Trott     113 43.36283  70.69761
## 11   KC Sangakkara     293 42.81911  75.10420
## 12      TM Dilshan     283 41.76678  89.70360
## 13   KS Williamson     146 41.24658  73.49267
## 14   S Chanderpaul      93 40.07527  70.59613
## 15        HH Gibbs      75 40.00000  79.03813
## 16     Salman Butt      57 39.85965  59.29807
## 17    Anamul Haque      58 39.72414  56.45224
## 18      RT Ponting     238 38.88235  71.94294
## 19       JH Kallis     136 38.77941  67.17794
## 20        MS Dhoni     328 38.57927  90.30555
## 21      MJ Guptill     199 38.54774  73.88090
## 22       DA Warner     138 38.52174  87.24978
## 23 Mohammad Yousuf      94 38.44681  72.69851
## 24        JD Ryder      66 38.40909  91.29667
## 25       GJ Bailey     133 38.38346  75.74519
## 26       G Gambhir     209 37.83254  75.15483
## 27      AJ Strauss     122 37.80328  71.54844
## 28       MJ Clarke     301 37.67442  69.78415
## 29       SR Watson     274 37.08029  83.46489
## 30        AJ Finch     103 36.36893  79.49845

Rank ODI bowlers

The top 3 ODI bowlers are R J Harris (Aus), MJ Henry(NZ) and MA Starc(Aus). Mohammed Shami is 4th and Amit Mishra is 8th A cutoff of 20 matches was considered for bowlers

ODIBowlersRank <- rankODIBowlers()
## [1] 35072     3
## [1] "C:/software/cricket-package/york-test/yorkrData/ODI/ODI-matches"
as.data.frame(ODIBowlersRank[1:30,])
##               bowler matches meanWickets   meanER
## 1  Mustafizur Rahman      56    4.000000 4.293214
## 2           JH Davey      53    3.528302 4.455094
## 3          RJ Harris      94    3.276596 4.361489
## 4           MA Starc     208    3.144231 4.425865
## 5           MJ Henry      88    3.125000 4.961250
## 6         A Flintoff     139    2.956835 4.283022
## 7           A Mishra     106    2.886792 4.365849
## 8     Mohammed Shami     144    2.777778 5.609306
## 9     MJ McClenaghan     165    2.751515 5.640424
## 10          CJ McKay     230    2.704348       NA
## 11       MF Maharoof     114    2.701754 4.427018
## 12       Imran Tahir     156    2.660256 4.461923
## 13        BAW Mendis     234    2.641026 4.532308
## 14     RK Kleinveldt      54    2.629630 4.306667
## 15      Arafat Sunny      62    2.612903 4.103226
## 16         JE Taylor     156    2.602564 5.115192
## 17           AJ Hall      55    2.600000 3.879091
## 18        WD Parnell     129    2.596899 5.477597
## 19         CR Woakes     129    2.596899 5.340620
## 20      DE Bollinger     152    2.592105 4.282763
## 21        Wahab Riaz     206    2.567961 5.431748
## 22        PJ Cummins     148    2.567568 5.715405
## 23         R Rampaul     173    2.549133 4.726590
## 24      Taskin Ahmed      56    2.535714 5.325357
## 25          DW Steyn     292    2.534247 4.534007
## 26      JR Hazlewood      64    2.531250 4.392500
## 27        Abdur Rauf      84    2.523810 4.786667
## 28           SW Tait     141    2.517730 5.173191
## 29      Hamid Hassan     106    2.509434 4.686038
## 30        SL Malinga     419    2.498807 4.968974

Hope you have fun with my yorkr package.!

Important note: Do check out my other posts using yorkr at yorkr-posts

yorkr ranks T20 batsmen and bowlers

Here is another short post which ranks T20 batsmen and bowlers. These are based on match data from Cricsheet. The ranking is done on

  1. average runs and average strike rate for batsmen and
  2. average wickets and average economy rate for bowlers.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen, bowlers besides evaluating team & match performances in Tests , ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr  and Beaten by sheer pace-Cricket analytics with yorkr  A must read for any cricket lover! Check it out!!

1

.99/Rs 320 and $6.99/Rs448 respectively

 

This post has also been published in RPubs RankT20Players. You can download this as a pdf file at RankT20Players.pdf.

Checkout my interactive Shiny apps GooglyPlus (plots & tables) and Googly (only plots) which can be used to analyze IPL players, teams and matches.

You can take a look at the code at rankT20Players (available in yorkr_0.0.5)

rm(list=ls())
library(yorkr)
library(dplyr)
source("rankT20Batsmen.R")
source("rankT20Bowlers.R")

Rank T20 batsmen

Virat Kohli (Ind), Chris Gayle (WI) and Kevin Pietersen (Eng) top the T20 rankings. Virat Kohli stands tall among the batsmen with a average of 39.1935, followed by Chris Gayle who has an average of 32.69 and finally Kevin Pietersen.

Note: For T20 a cutoff of at least 30 matches played was chosen.

T20BatsmanRank <- rankT20Batsmen()
as.data.frame(T20BatsmanRank[1:30,])
##             batsman matches meanRuns   meanSR
## 1           V Kohli      31 39.19355 128.8371
## 2          CH Gayle      43 32.69767 119.6467
## 3      KP Pietersen      37 32.43243 138.6732
## 4     KS Williamson      31 32.25806 130.1255
## 5  Mohammad Shahzad      33 31.66667 115.4582
## 6       BB McCullum      69 30.98551 126.0610
## 7        MJ Guptill      54 30.83333 120.0669
## 8          AD Hales      37 30.75676 115.3511
## 9       H Masakadza      38 29.26316 109.6182
## 10         GC Smith      32 27.59375 114.1831
## 11        DA Warner      56 27.53571 123.2209
## 12        JP Duminy      58 26.84483 117.3952
## 13 DPMD Jayawardene      51 26.47059 112.4257
## 14        SR Watson      50 26.30000 118.9464
## 15    KC Sangakkara      52 26.23077 112.4665
## 16       TM Dilshan      66 26.18182 102.5683
## 17         SK Raina      43 25.90698 124.3044
## 18        RG Sharma      41 25.68293 123.3983
## 19        G Gambhir      36 25.66667 117.5764
## 20     Yuvraj Singh      41 25.12195 119.5846
## 21    Misbah-ul-Haq      32 25.09375 106.6762
## 22       EJG Morgan      52 24.71154 121.1462
## 23       MN Samuels      40 24.35000 105.8547
## 24       MEK Hussey      30 24.03333 129.1250
## 25    Ahmed Shehzad      41 23.82927 100.8805
## 26  Shakib Al Hasan      40 23.35000 109.3798
## 27          HM Amla      30 23.33333 111.2513
## 28         CL White      45 22.73333       NA
## 29      LMP Simmons      33 22.54545       NA
## 30       Umar Akmal      69 22.20290 108.3590

Rank T20 bowlers

The top 3 T20 bowlers are BAW Mendis (SL) Umar Gul (Pak) and Steyn(SA). R Ashwin is 13th. As with batsmen, a minimum of 30 matches played was taken into consideration.

T20BowlersRank <- rankT20Bowlers()
as.data.frame(T20BowlersRank[1:30,])
##             bowler matches meanWickets   meanER
## 1       BAW Mendis      36   1.6944444 6.581111
## 2         Umar Gul      57   1.5964912 7.306842
## 3         DW Steyn      38   1.5526316 6.407632
## 4      Saeed Ajmal      63   1.4920635 6.316190
## 5       SL Malinga      59   1.4576271 7.163898
## 6       TG Southee      37   1.4054054 8.840000
## 7       MG Johnson      30   1.4000000 7.080667
## 8         GP Swann      38   1.3947368 6.576842
## 9      JW Dernbach      33   1.3636364 8.550303
## 10        M Morkel      39   1.3333333 7.384872
## 11 Shakib Al Hasan      37   1.2972973 6.648649
## 12       SP Narine      32   1.2500000 5.757812
## 13        R Ashwin      33   1.2424242 7.247273
## 14 KMDN Kulasekara      42   1.2380952 6.938095
## 15       SCJ Broad      55   1.2363636 7.832182
## 16      WD Parnell      34   1.2058824 8.227941
## 17        KD Mills      41   1.1951220 8.077317
## 18      DL Vettori      34   1.1470588 5.708235
## 19   Shahid Afridi      85   1.1294118 6.748000
## 20       SR Watson      44   1.1136364 8.015227
## 21   Sohail Tanvir      48   1.1041667 7.354167
## 22   Sohail Tanvir      48   1.1041667 7.354167
## 23     NL McCullum      56   1.0535714 7.246964
## 24     NLTC Perera      34   1.0294118 8.916471
## 25         J Botha      39   1.0256410 6.647436
## 26        DJ Bravo      45   1.0222222 8.630000
## 27   Mohammad Nabi      32   0.9687500 7.208437
## 28       DJG Sammy      55   0.8909091 7.899818
## 29 Mohammad Hafeez      56   0.8392857 6.996964
## 30      AD Mathews      44   0.7954545 6.827727

Conclusion

Conclusion

As expected Virat Kohli stands head and shoulders above the rest. Hamid Hasan and Mohammed Shami figuring the top T20 bowlers was a bit of a surprise to me.

Important note: Do check out my other posts using yorkr at yorkr-posts

Watch this space!

  1. Introducing cricket package yorkr-Part1:Beaten by sheer pace!.
  2. yorkr pads up for the Twenty20s: Part 1- Analyzing team“s match performance.
  3. yorkr crashes the IPL party !Part 1
  4. Introducing cricketr! : An R package to analyze performances of cricketers
  5. Cricket analytics with cricketr in paperback and Kindle versions

yorkr ranks IPL batsmen and bowlers

Here is a short post which ranks IPL batsmen and bowlers. These are based on match data from Cricsheet. Ranking batsmen and bowlers in IPL is more challenging as the players can belong to different teams in different years. Hence I create a combined data frame of the batsmen and bowlers regardless of their IPL teams and calculate a) average runs and average strike rate for batsmen and c) average wickets and d) average economy rate for bowlers.

I will be doing this ranking for T20 and ODI batting and bowling performances shortly.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen, bowlers besides evaluating team & match performances in Tests , ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr  and Beaten by sheer pace-Cricket analytics with yorkr  A must read for any cricket lover! Check it out!!

1

 

This post has also been published in RPubs RankIPLPlayers. You can download this as a pdf file at RankIPLPlayers.pdf.

You can take a look at the code at rankIPLPlayers (should be available in yorkr_0.0.5)

Checkout my interactive Shiny apps GooglyPlus (plots & tables) and Googly (only plots) which can be used to analyze IPL players, teams and matches.

The results are slightly surprising

rm(list=ls())
library(yorkr)
library(dplyr)
setwd("C:/software/cricket-package/cricsheet/cleanup/IPL/rank")
source("rankIPLBatsmen.R")
source("rankIPLBowlers.R")

Rank IPL batsmen

Chris Gayle, MEK Hussey and Shane Watson are top 3 IPL batsmen. Gayle towers over the others in mean runs and mean strike rate. Surprisingly Ajinkya Rahane is the top Indian T20 batsman, if we leave out Sachin Tendulkar (who tops India yet again!). The other top IPL T20 batsmen are Raina, Gambhir, Rohit Sharma in that order. Virat Kohli comes a distant 14th.

iplBatsmanRank <- rankIPLBatsmen()
as.data.frame(iplBatsmanRank[1:30,])
##             batsman matches meanRuns    meanSR
## 1          CH Gayle     128 40.00781 144.92188
## 2        MEK Hussey      64 33.57812 107.23500
## 3         SR Watson      75 31.46667 129.97733
## 4      SR Tendulkar     127 29.74803 108.86622
## 5         AM Rahane      77 29.14286 101.40065
## 6         DA Warner     134 29.10448 118.38313
## 7         JP Duminy      94 28.77660 124.61702
## 8          SK Raina     128 28.62500 122.12656
## 9         G Gambhir     210 28.13810 108.78090
## 10        RG Sharma     181 28.07182 118.57801
## 11         DR Smith      78 27.82051 119.64462
## 12      BB McCullum      98 27.81633 114.91255
## 13         S Dhawan     109 27.74312 112.21000
## 14          V Kohli     188 27.56915 113.81261
## 15   AB de Villiers     150 27.46000 136.70860
## 16         R Dravid     104 27.02885 107.78923
## 17        JH Kallis     167 26.54491  94.65641
## 18         V Sehwag     174 26.39655 140.29011
## 19       RV Uthappa     166 26.27711 120.48506
## 20       SC Ganguly      86 25.98837  96.39849
## 21     AC Gilchrist      81 25.77778 122.69074
## 22    KC Sangakkara      70 25.67143 112.97529
## 23         MS Dhoni     119 25.29412 130.99832
## 24       TM Dilshan      82 24.13415 101.12634
## 25          M Vijay      96 23.92708 102.01771
## 26        AT Rayudu     146 23.63014 117.91000
## 27 DPMD Jayawardene     109 22.95413 110.73862
## 28        MK Pandey     105 22.71429        NA
## 29     Yuvraj Singh     112 22.48214 114.51018
## 30      S Badrinath      66 22.22727 114.97061

Rank IPL bowlers

The top 3 IPL T20 bowlers are SL Malinga,SP Narine and DJ Bravo.

Don’t get hung up on the decimals in the average wickets for the bowlers. All it implies is that if 2 bowlers have average wickets of 1.0 and 1.5, it implies that in 2 matches the 1st bowler will take 2 wickets and the 2nd bowler will take 3 wickets.

iplBowlersRank <- rankIPLBowlers()
as.data.frame(iplBowlersRank[1:30,])
##             bowler matches meanWickets   meanER
## 1       SL Malinga      96    1.645833 6.545208
## 2        SP Narine      54    1.555556 5.967593
## 3         DJ Bravo      58    1.517241 7.929310
## 4         M Morkel      37    1.405405 7.626216
## 5        IK Pathan      40    1.400000 7.579250
## 6         RP Singh      42    1.357143 7.966429
## 7         MM Patel      31    1.354839 7.282581
## 8  Shakib Al Hasan      32    1.343750 6.911250
## 9    R Vinay Kumar      63    1.317460 8.342540
## 10       MM Sharma      46    1.304348 7.740652
## 11         P Awana      33    1.303030 8.325758
## 12        MM Patel      30    1.300000 7.569667
## 13          Z Khan      41    1.292683 7.735854
## 14        A Mishra      43    1.255814 7.226512
## 15         PP Ojha      53    1.245283 7.268679
## 16     JP Faulkner      40    1.225000 8.502250
## 17     DS Kulkarni      32    1.156250 8.372188
## 18        UT Yadav      46    1.152174 8.394783
## 19        A Kumble      41    1.146341 6.567073
## 20       JA Morkel      73    1.136986 8.131370
## 21        SK Warne      53    1.132075 7.277170
## 22 Harbhajan Singh     107    1.102804 7.014953
## 23        L Balaji      34    1.088235 7.186176
## 24        R Ashwin      92    1.065217 6.812391
## 25        AR Patel      31    1.064516 7.137097
## 26  M Muralitharan      39    1.051282 6.470256
## 27         P Kumar      36    1.027778 8.148056
## 28       PP Chawla      85    1.023529 8.017765
## 29       SR Watson      67    1.014925 7.695224
## 30        DJ Bravo      30    1.000000 7.966333

Conclusion: The results are somewhat surprising. The ranking was based on data from Cricsheet. The data in this site are available from 2008-2015. I hope to do this ranking for T20 and ODIs shortly

Important note: Do check out my other posts using yorkr at yorkr-posts

Watch this space!

  1. Introducing cricket package yorkr-Part1:Beaten by sheer pace!.
  2. yorkr pads up for the Twenty20s: Part 1- Analyzing team“s match performance.
  3. yorkr crashes the IPL party !Part 1
  4. Introducing cricketr! : An R package to analyze performances of cricketers
  5. Cricket analytics with cricketr in paperback and Kindle versions