Analyzing performances of cricketers using cricketr template
This post includes a template which you can use for analyzing the performances of cricketers, both batsmen and bowlers in Test, ODI and Twenty 20 cricket using my R package cricketr. To see actual usage of functions in the R package cricketr see Introducing cricketr! : An R package to analyze performances of cricketers.
This template can be downloaded from Github at cricketer-template
The ‘cricketr’ package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package supports all formats of the game including Test, ODI and Twenty20 versions.
You should be able to install the package from GitHub and use the many functions available in the package. Please be mindful of the ESPN Cricinfo Terms of Use.
Take a look at my short video tutorial on my R package cricketr on Youtube – R package cricketr – A short tutorial
Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar
Important note 1: The latest release of ‘cricketr’ now includes the ability to analyze performances of teams!! See Cricketr adds team analytics to its repertoire!!!
Important note 2 : Cricketr can now do a more fine-grained analysis of players, see Cricketr learns new tricks : Performs fine-grained analysis of players
Important note 3: Do check out the python avatar of cricketr, ‘cricpy’ in my post ‘Introducing cricpy:A python package to analyze performances of cricketers’
The cricketr package
The cricketr package has several functions that perform different analyses on both batsmen and bowlers. The package has functions that plot the percentage frequency of runs or wickets, the runs likelihood for a batsman, the relative run/strike rates of batsmen and the relative performance/economy rate for bowlers.
Other interesting functions include batting performance moving average, forecast and a function to check whether the batsman is in-form or out-of-form.
The data for a particular player can be obtained with the getPlayerData() function. To do this you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Ricky Ponting, Sachin Tendulkar etc. This will bring up a page which has the profile number for the player, e.g. for Sachin Tendulkar this would be http://www.espncricinfo.com/india/content/player/35320.html. Hence, Sachin’s profile is 35320. This can be used to get the data for Tendulkar as shown below
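The profile number is simply the numeric part at the end of the player page URL. As a minimal sketch, it could be pulled out programmatically. The helper profileFromUrl() is hypothetical and not part of cricketr, and it assumes the URL ends in /player/<number>.html as in the example above.

```r
# Hypothetical helper (not part of cricketr): extract the numeric profile id
# from an ESPN Cricinfo player URL of the form .../player/<id>.html
profileFromUrl <- function(url) {
  # Capture the digits between "/player/" and ".html"
  id <- sub("^.*/player/([0-9]+)\\.html$", "\\1", url)
  # sub() returns its input unchanged when the pattern does not match
  if (identical(id, url)) stop("URL does not look like a player profile page")
  as.integer(id)
}

tendulkarId <- profileFromUrl("http://www.espncricinfo.com/india/content/player/35320.html")
tendulkarId
```

The resulting number is what would be passed as the first argument of getPlayerData().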
The cricketr package is now available from CRAN!!! You should be able to install directly with
1. Install the cricketr package
if (!require("cricketr")){
    install.packages("cricketr",lib = "c:/test")
}
library(cricketr)
The cricketr package includes some pre-packaged sample (.csv) files. You can use these samples to test functions as shown below
# Retrieve the file path of a data file installed with cricketr
#pathToFile <- system.file("data", "tendulkar.csv", package = "cricketr")
#batsman4s(pathToFile, "Sachin Tendulkar")
# The general format is pkg-function(pathToFile,par1,...)
#batsman4s(<path-To-File>,"Sachin Tendulkar")
The pre-packaged files can be accessed as shown above. To get the data of any player in Test, ODI and Twenty20 cricket use the following
2. For Test cricket
#tendulkar <- getPlayerData(35320,dir="..",file="tendulkar.csv",type="batting",homeOrAway=c(1,2), result=c(1,2,4))
2a. For ODI cricket
#tendulkarOD <- getPlayerDataOD(35320,dir="..",file="tendulkarOD.csv",type="batting")
2b. For Twenty20 cricket
#tendulkarT20 <- getPlayerDataTT(35320,dir="..",file="tendulkarT20.csv",type="batting")
Analysis of batsmen
Important Note: This needs to be done only once for a player. This function stores the player’s data in a CSV file (e.g. tendulkar.csv as above) which can then be reused for all other functions. Once we have the data for the players many analyses can be done. This post will use the stored CSV file obtained with a prior getPlayerData for all subsequent analyses
Sachin Tendulkar’s performance – Basic Analyses
The 3 plots below provide the following for Tendulkar
- Frequency percentage of runs in each run range over the whole career
- Mean Strike Rate for runs scored in the given range
- A histogram of runs frequency percentages in runs ranges
3. Basic analyses
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#batsmanRunsFreqPerf("./tendulkar.csv","Tendulkar")
#batsmanMeanStrikeRate("./tendulkar.csv","Tendulkar")
#batsmanRunsRanges("./tendulkar.csv","Tendulkar")
dev.off()
## null device
## 1
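To make the idea of a frequency percentage per run range concrete, here is a small sketch on made-up scores. It is not the package's implementation. The bucket width of 10 runs and the sample scores are assumptions for illustration.

```r
# Made-up innings scores, purely for illustration
runs <- c(0, 4, 12, 18, 25, 33, 41, 47, 52, 68, 75, 103)

# Bucket the scores into ranges of 10 runs: [0,10), [10,20), ...
breaks <- seq(0, 110, by = 10)
ranges <- cut(runs, breaks = breaks, right = FALSE)

# Percentage of innings that fall in each range
freqPercent <- round(100 * prop.table(table(ranges)), 2)
freqPercent
```

Each entry is the share of innings in which the batsman finished in that run range.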
- Player 1
- Player 2
- Player 3
- Player 4
4. More analyses
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#batsman4s("./player1.csv","Player1")
#batsman6s("./player1.csv","Player1")
#batsmanMeanStrikeRate("./player1.csv","Player1")
# For ODI and T20
#batsmanScoringRateODTT("./player1.csv","Player1")
dev.off()
## null device
## 1
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#batsman4s("./player2.csv","Player2")
#batsman6s("./player2.csv","Player2")
#batsmanMeanStrikeRate("./player2.csv","Player2")
# For ODI and T20
#batsmanScoringRateODTT("./player2.csv","Player2")
dev.off()
## null device
## 1
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#batsman4s("./player3.csv","Player3")
#batsman6s("./player3.csv","Player3")
#batsmanMeanStrikeRate("./player3.csv","Player3")
# For ODI and T20
#batsmanScoringRateODTT("./player3.csv","Player3")
dev.off()
## null device
## 1
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#batsman4s("./player4.csv","Player4")
#batsman6s("./player4.csv","Player4")
#batsmanMeanStrikeRate("./player4.csv","Player4")
# For ODI and T20
#batsmanScoringRateODTT("./player4.csv","Player4")
dev.off()
## null device
## 1
Note: For mean strike rate in ODI and Twenty20 use the function batsmanScoringRateODTT()
5. Boxplot histogram plot
This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency
#batsmanPerfBoxHist("./player1.csv","Player1")
#batsmanPerfBoxHist("./player2.csv","Player2")
#batsmanPerfBoxHist("./player3.csv","Player3")
#batsmanPerfBoxHist("./player4.csv","Player4")
6. Contribution to won and lost matches
For the 2 functions below you will have to use the getPlayerDataSp() function. I have commented this as I already have these files. This function can only be used for Test matches
#player1sp <- getPlayerDataSp(xxxx,tdir=".",tfile="player1sp.csv",ttype="batting")
#player2sp <- getPlayerDataSp(xxxx,tdir=".",tfile="player2sp.csv",ttype="batting")
#player3sp <- getPlayerDataSp(xxxx,tdir=".",tfile="player3sp.csv",ttype="batting")
#player4sp <- getPlayerDataSp(xxxx,tdir=".",tfile="player4sp.csv",ttype="batting")
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanContributionWonLost("player1sp.csv","Player1")
#batsmanContributionWonLost("player2sp.csv","Player2")
#batsmanContributionWonLost("player3sp.csv","Player3")
#batsmanContributionWonLost("player4sp.csv","Player4")
dev.off()
## null device
## 1
7. Performance at home and overseas
This function also requires the use of getPlayerDataSp() as shown above. This can only be used for Test matches
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanPerfHomeAway("player1sp.csv","Player1")
#batsmanPerfHomeAway("player2sp.csv","Player2")
#batsmanPerfHomeAway("player3sp.csv","Player3")
#batsmanPerfHomeAway("player4sp.csv","Player4")
dev.off()
## null device
## 1
8. Batsman average at different venues
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanAvgRunsGround("./player1.csv","Player1")
#batsmanAvgRunsGround("./player2.csv","Player2")
#batsmanAvgRunsGround("./player3.csv","Player3")
#batsmanAvgRunsGround("./player4.csv","Player4")
dev.off()
## null device
## 1
9. Batsman average against different opposition
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanAvgRunsOpposition("./player1.csv","Player1")
#batsmanAvgRunsOpposition("./player2.csv","Player2")
#batsmanAvgRunsOpposition("./player3.csv","Player3")
#batsmanAvgRunsOpposition("./player4.csv","Player4")
dev.off()
## null device
## 1
10. Runs Likelihood of batsman
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanRunsLikelihood("./player1.csv","Player1")
#batsmanRunsLikelihood("./player2.csv","Player2")
#batsmanRunsLikelihood("./player3.csv","Player3")
#batsmanRunsLikelihood("./player4.csv","Player4")
dev.off()
## null device
## 1
11. Moving Average of runs in career
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanMovingAverage("./player1.csv","Player1")
#batsmanMovingAverage("./player2.csv","Player2")
#batsmanMovingAverage("./player3.csv","Player3")
#batsmanMovingAverage("./player4.csv","Player4")
dev.off()
## null device
## 1
12. Cumulative Average runs of batsman in career
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanCumulativeAverageRuns("./player1.csv","Player1")
#batsmanCumulativeAverageRuns("./player2.csv","Player2")
#batsmanCumulativeAverageRuns("./player3.csv","Player3")
#batsmanCumulativeAverageRuns("./player4.csv","Player4")
dev.off()
## null device
## 1
13. Cumulative Average strike rate of batsman in career
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanCumulativeStrikeRate("./player1.csv","Player1")
#batsmanCumulativeStrikeRate("./player2.csv","Player2")
#batsmanCumulativeStrikeRate("./player3.csv","Player3")
#batsmanCumulativeStrikeRate("./player4.csv","Player4")
dev.off()
## null device
## 1
14. Future Runs forecast
Here are plots that forecast how the batsman will perform in future. In this case 90% of the career runs trend is used as the training set. The remaining 10% is the test set.
A Holt-Winters forecasting model is used to forecast future performance based on the 90% training set. The forecasted runs trend is plotted. The test set is also plotted to see how closely the forecast and the actual values match
Take a look at the runs forecasted for the batsman below.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanPerfForecast("./player1.csv","Player1")
#batsmanPerfForecast("./player2.csv","Player2")
#batsmanPerfForecast("./player3.csv","Player3")
#batsmanPerfForecast("./player4.csv","Player4")
dev.off()
## null device
## 1
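The 90/10 split and the Holt-Winters fit described in this section can be sketched with base R alone. This is only an illustration on a synthetic series, not the code inside batsmanPerfForecast(). The synthetic data, the non-seasonal model (gamma = FALSE) and the choice beta = FALSE are all assumptions made to keep the sketch small.

```r
# Synthetic runs-per-innings series, used only to show the mechanics
set.seed(42)
runs <- pmax(0, round(40 + 10 * sin(seq_len(100) / 8) + rnorm(100, sd = 15)))

# 90% of the career is the training set, the last 10% is the test set
nTrain <- floor(0.9 * length(runs))
train <- ts(runs[1:nTrain])
test  <- runs[(nTrain + 1):length(runs)]

# Non-seasonal Holt-Winters (gamma = FALSE); beta = FALSE reduces it to
# simple exponential smoothing, an assumption made for this sketch
fit <- HoltWinters(train, beta = FALSE, gamma = FALSE)

# Forecast as many innings ahead as there are in the test set
forecastRuns <- predict(fit, n.ahead = length(test))

# Plot the fitted trend with the forecast, then overlay the held-out test set
plot(fit, forecastRuns)
lines(seq(nTrain + 1, length(runs)), test, col = "blue")
```

Comparing the forecast line with the blue test-set line gives a rough visual check of how well the trend carries forward.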
15. Relative Mean Strike Rate plot
The plot below compares the Mean Strike Rate of the batsmen for each runs range of width 10. The plot indicates the following
frames <- list("./player1.csv","./player2.csv","player3.csv","player4.csv")
names <- list("Player1","Player2","Player3","Player4")
#relativeBatsmanSR(frames,names)
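For a single batsman, the mean strike rate per runs range of 10 could be computed roughly as below. The data and the column names Runs and SR are assumptions for illustration. The package works from the stored CSV file and may differ in detail.

```r
# Made-up innings: runs scored and strike rate (SR) for each innings
innings <- data.frame(
  Runs = c(3, 8, 14, 19, 22, 27, 35, 41, 48, 56),
  SR   = c(30, 50, 45, 55, 60, 52, 58, 62, 66, 70)
)

# Assign each innings to a runs range of width 10: [0,10), [10,20), ...
innings$Range <- cut(innings$Runs, breaks = seq(0, 60, by = 10), right = FALSE)

# Mean strike rate within each runs range
meanSR <- tapply(innings$SR, innings$Range, mean)
meanSR
```

Repeating this for each player and plotting the resulting curves together gives the kind of relative comparison this section describes.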
16. Relative Runs Frequency plot
The plot below gives the relative Runs Frequency Percentages for each 10 run bucket. The plot below shows
frames <- list("./player1.csv","./player2.csv","player3.csv","player4.csv")
names <- list("Player1","Player2","Player3","Player4")
#relativeRunsFreqPerf(frames,names)
17. Relative cumulative average runs in career
frames <- list("./player1.csv","./player2.csv","player3.csv","player4.csv")
names <- list("Player1","Player2","Player3","Player4")
#relativeBatsmanCumulativeAvgRuns(frames,names)
18. Relative cumulative average strike rate in career
frames <- list("./player1.csv","./player2.csv","player3.csv","player4.csv")
names <- list("Player1","Player2","Player3","Player4")
#relativeBatsmanCumulativeStrikeRate(frames,names)
19. Check Batsman In-Form or Out-of-Form
The computation below uses Null Hypothesis testing and the p-value to determine if the batsman is in-form or out-of-form. For this, 90% of the career runs is chosen as the population and its mean is computed. The last 10% is chosen to be the sample set, and the sample Mean and the sample Standard Deviation are calculated.
The Null Hypothesis (H0) assumes that the batsman continues to stay in-form, where the sample mean is within the 95% confidence interval of the population mean. The Alternative (Ha) assumes that the batsman is out of form, where the sample mean is beyond the 95% confidence interval of the population mean.
A significance value of 0.05 is chosen and the p-value is computed.
If p-value >= 0.05 – Batsman In-Form
If p-value < 0.05 – Batsman Out-of-Form
Note: Ideally the p-value should be computed for a population that follows the Normal Distribution. But the runs population is usually right skewed (many low scores and a few large ones), so some correction may be needed. I will revisit this later
This is done for the Top 4 batsmen
#checkBatsmanInForm("./player1.csv","Player1")
#checkBatsmanInForm("./player2.csv","Player2")
#checkBatsmanInForm("./player3.csv","Player3")
#checkBatsmanInForm("./player4.csv","Player4")
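The in-form check can be sketched as a one-sided test on the last 10% of innings. The sketch below assumes a lower-tail t-test with n - 1 degrees of freedom and uses made-up scores. The exact computation inside checkBatsmanInForm() may differ.

```r
# Made-up career scores: 90 earlier innings followed by 10 recent low scores
runs <- c(rep(c(40, 60), 45), rep(c(5, 15), 5))

# First 90% is treated as the population, the last 10% as the sample
nPop <- floor(0.9 * length(runs))
population <- runs[1:nPop]
recent <- runs[(nPop + 1):length(runs)]

popMean <- mean(population)
sampleMean <- mean(recent)
sampleSD <- sd(recent)
n <- length(recent)

# Test statistic and lower-tail p-value from the t distribution (n - 1 df)
tStat <- (sampleMean - popMean) / (sampleSD / sqrt(n))
pValue <- pt(tStat, df = n - 1)

# Compare against the 0.05 significance level
status <- if (pValue >= 0.05) "In-Form" else "Out-of-Form"
status
```

With these made-up numbers the recent scores sit far below the career mean, so the sketch reports Out-of-Form.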
20. 3D plot of Runs vs Balls Faced and Minutes at Crease
The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
#battingPerf3d("./player1.csv","Player1")
#battingPerf3d("./player2.csv","Player2")
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
#battingPerf3d("./player3.csv","Player3")
#battingPerf3d("./player4.csv","Player4")
dev.off()
## null device
## 1
21. Predicting Runs given Balls Faced and Minutes at Crease
A multi-variate regression plane is fitted between Runs and Balls faced + Minutes at crease.
BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
#Player1 <- batsmanRunsPredict("./player1.csv","Player1",newdataframe=newDF)
#Player2 <- batsmanRunsPredict("./player2.csv","Player2",newdataframe=newDF)
#Player3 <- batsmanRunsPredict("./player3.csv","Player3",newdataframe=newDF)
#Player4 <- batsmanRunsPredict("./player4.csv","Player4",newdataframe=newDF)
#batsmen <-cbind(round(Player1$Runs),round(Player2$Runs),round(Player3$Runs),round(Player4$Runs))
#colnames(batsmen) <- c("Player1","Player2","Player3","Player4")
#newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
#colnames(newDF) <- c("BallsFaced","MinsAtCrease")
#predictedRuns <- cbind(newDF,batsmen)
#predictedRuns
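The regression plane itself can be illustrated with lm() on a toy data set. This is a sketch under assumptions: the innings data is made up, and the column names BF, Mins and Runs simply mirror the names used in newDF above.

```r
# Made-up innings data; column names BF, Mins and Runs are assumptions
innings <- data.frame(
  BF   = c(10, 30, 50, 80, 120, 200, 260, 320),
  Mins = c(20, 35, 90, 100, 200, 250, 400, 410),
  Runs = c(7, 15, 31, 43, 71, 111, 150, 180)
)

# Fit the regression plane Runs ~ BF + Mins
fit <- lm(Runs ~ BF + Mins, data = innings)

# Predict runs for a grid of balls faced and minutes at crease
BF <- seq(10, 400, length.out = 15)
Mins <- seq(30, 600, length.out = 15)
newDF <- data.frame(BF, Mins)
predicted <- round(predict(fit, newdata = newDF))
cbind(newDF, Runs = predicted)
```

The final table has one row per (balls faced, minutes) pair with the runs the fitted plane predicts for it.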
Analysis of bowlers
- Bowler1
- Bowler2
- Bowler3
- Bowler4
#bowler1 <- getPlayerData(xxxx,dir="..",file="bowler1.csv",type="bowling")
Note: For One day you will have to use getPlayerDataOD() and for Twenty20 it is getPlayerDataTT()
21. Wicket Frequency Plot
The plot below computes the percentage frequency of the number of wickets taken, e.g. 1 wicket x%, 2 wickets y% etc., and plots them as a continuous line
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerWktsFreqPercent("./bowler1.csv","Bowler1")
#bowlerWktsFreqPercent("./bowler2.csv","Bowler2")
#bowlerWktsFreqPercent("./bowler3.csv","Bowler3")
dev.off()
## null device
## 1
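As a rough illustration of the wicket frequency percentage, the sketch below tabulates made-up wickets per innings and plots the percentages as a line. The data is invented and the plotting choices are assumptions, not the package's own.

```r
# Made-up wickets per innings, for illustration only
wickets <- c(0, 1, 1, 2, 3, 1, 0, 4, 2, 2, 5, 1, 3, 2, 0)

# Percentage of innings in which 0, 1, 2, ... wickets were taken
wktsFreqPercent <- 100 * prop.table(table(wickets))

# Plot the percentages as a continuous line
plot(as.integer(names(wktsFreqPercent)), as.numeric(wktsFreqPercent),
     type = "o", xlab = "Wickets", ylab = "Percentage of innings")
```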
22. Wickets Runs plot
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerWktsRunsPlot("./bowler1.csv","Bowler1")
#bowlerWktsRunsPlot("./bowler2.csv","Bowler2")
#bowlerWktsRunsPlot("./bowler3.csv","Bowler3")
dev.off()
## null device
## 1
23. Average wickets at different venues
#bowlerAvgWktsGround("./bowler3.csv","Bowler3")
24. Average wickets against different opposition
#bowlerAvgWktsOpposition("./bowler3.csv","Bowler3")
25. Wickets taken moving average
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerMovingAverage("./bowler1.csv","Bowler1")
#bowlerMovingAverage("./bowler2.csv","Bowler2")
#bowlerMovingAverage("./bowler3.csv","Bowler3")
dev.off()
## null device
## 1
26. Cumulative Wickets taken
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerCumulativeAvgWickets("./bowler1.csv","Bowler1")
#bowlerCumulativeAvgWickets("./bowler2.csv","Bowler2")
#bowlerCumulativeAvgWickets("./bowler3.csv","Bowler3")
dev.off()
## null device
## 1
27. Cumulative Economy rate
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerCumulativeAvgEconRate("./bowler1.csv","Bowler1")
#bowlerCumulativeAvgEconRate("./bowler2.csv","Bowler2")
#bowlerCumulativeAvgEconRate("./bowler3.csv","Bowler3")
dev.off()
## null device
## 1
28. Future Wickets forecast
Here are plots that forecast how the bowler will perform in future. In this case 90% of the career wickets trend is used as the training set. The remaining 10% is the test set.
A Holt-Winters forecasting model is used to forecast future performance based on the 90% training set. The forecasted wickets trend is plotted. The test set is also plotted to see how closely the forecast and the actual values match
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerPerfForecast("./bowler1.csv","Bowler1")
#bowlerPerfForecast("./bowler2.csv","Bowler2")
#bowlerPerfForecast("./bowler3.csv","Bowler3")
dev.off()
## null device
## 1
29. Contribution to matches won and lost
As discussed above the next 2 charts require the use of getPlayerDataSp(). This can only be done for Test matches
#bowler1sp <- getPlayerDataSp(xxxx,tdir=".",tfile="bowler1sp.csv",ttype="bowling")
#bowler2sp <- getPlayerDataSp(xxxx,tdir=".",tfile="bowler2sp.csv",ttype="bowling")
#bowler3sp <- getPlayerDataSp(xxxx,tdir=".",tfile="bowler3sp.csv",ttype="bowling")
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerContributionWonLost("bowler1sp.csv","Bowler1")
#bowlerContributionWonLost("bowler2sp.csv","Bowler2")
#bowlerContributionWonLost("bowler3sp.csv","Bowler3")
dev.off()
## null device
## 1
30. Performance home and overseas.
This can only be done for Test matches
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerPerfHomeAway("bowler1sp.csv","Bowler1")
#bowlerPerfHomeAway("bowler2sp.csv","Bowler2")
#bowlerPerfHomeAway("bowler3sp.csv","Bowler3")
dev.off()
## null device
## 1
31 Relative Wickets Frequency Percentage
frames <- list("./bowler1.csv","./bowler3.csv","bowler2.csv")
names <- list("Bowler1","Bowler3","Bowler2")
#relativeBowlingPerf(frames,names)
32 Relative Economy Rate against wickets taken
frames <- list("./bowler1.csv","./bowler3.csv","bowler2.csv")
names <- list("Bowler1","Bowler3","Bowler2")
#relativeBowlingER(frames,names)
33 Relative cumulative average wickets of bowlers in career
frames <- list("./bowler1.csv","./bowler3.csv","bowler2.csv")
names <- list("Bowler1","Bowler3","Bowler2")
#relativeBowlerCumulativeAvgWickets(frames,names)
34 Relative cumulative average economy rate of bowlers
frames <- list("./bowler1.csv","./bowler3.csv","bowler2.csv")
names <- list("Bowler1","Bowler3","Bowler2")
#relativeBowlerCumulativeAvgEconRate(frames,names)
35 Check for bowler in-form/out-of-form
The computation below uses Null Hypothesis testing and the p-value to determine if the bowler is in-form or out-of-form. For this, 90% of the career wickets is chosen as the population and its mean is computed. The last 10% is chosen to be the sample set, and the sample Mean and the sample Standard Deviation are calculated.
The Null Hypothesis (H0) assumes that the bowler continues to stay in-form, where the sample mean is within the 95% confidence interval of the population mean. The Alternative (Ha) assumes that the bowler is out of form, where the sample mean is beyond the 95% confidence interval of the population mean.
A significance value of 0.05 is chosen and the p-value is computed.
If p-value >= 0.05 – Bowler In-Form
If p-value < 0.05 – Bowler Out-of-Form
Note: Ideally the p-value should be computed for a population that follows the Normal Distribution. But the wickets population is usually skewed, so some correction may be needed. I will revisit this later
Note: The check for the form status of the bowlers indicates
#checkBowlerInForm("./bowler1.csv","Bowler1")
#checkBowlerInForm("./bowler2.csv","Bowler2")
#checkBowlerInForm("./bowler3.csv","Bowler3")
dev.off()
## null device
## 1
Key Findings
Analysis of batsmen
Analysis of bowlers
Also see
1. Re-introducing cricketr! : An R package to analyze performances of cricketers
2. Using Linear Programming (LP) for optimizing bowling change or batting lineup in T20 cricket
3. Googly: An interactive app for analyzing IPL players, matches and teams using R package yorkr
4. My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon
5. yorkpy takes a hat-trick, bowls out Intl. T20s, BBL and Natwest T20!!!
6. My book ‘Deep Learning from first principles:Second Edition’ now on Amazon
7. Introducing cricpy:A python package to analyze performances of cricketers
The Clash of the Titans in Test and ODI cricket
Who looks outside, dreams; who looks inside, awakes.
Show me a sane man and I will cure him for you.
Carl Jung
We’re made of star stuff. We are a way for the cosmos to know itself.
If you want to make an apple pie from scratch, you must first create the universe.
Carl Sagan
Introduction
The biggest nag in the collective psyche of the cricketing fraternity these days is whether Virat Kohli has surpassed Sachin Tendulkar. This question has been troubling cricket lovers the world over, and particularly in India, for quite a while. It has only grown stronger with Kohli’s 41st ODI century and with Michael Vaughan bestowing the GOAT title on Virat Kohli for ODI cricket. Hence, I decided to do my bit in addressing this by analyzing Kohli’s and Tendulkar’s performance in ODI cricket. I also wanted to address who is the best among the cricketing idols of India in Test cricket, namely Sunil Gavaskar, Sachin Tendulkar and Virat Kohli. Hence this post has 2 parts
- Analysis of Tendulkar, Gavaskar and Kohli in Test cricket
- Analysis of Tendulkar and Kohli in ODIs
In this post, I analyze the performances of these titans in Test and ODI cricket using my R package cricketr. Some may feel that comparisons are not possible as these batsmen are from different eras. To some extent this is true. I would give some leeway to Gavaskar as he had to bat in a pre-helmet era. But with Tendulkar and Kohli a fair and objective comparison is possible. There were pre-eminent bowlers in the times of Tendulkar as there are now.
From the analysis below, it can be seen that Tendulkar is ahead of everybody else in Test cricket. However it must be noted that Tendulkar’s performance deteriorated towards the end of his career. Such was not the case with Gavaskar. Kohli has some catching up to do and he still has a lot of Test cricket in him.
In ODIs Kohli can be seen to be pulling ahead of Tendulkar in several aspects.
My R package cricketr can be installed directly from CRAN and you can use it to analyze cricketers.
This package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package supports all formats of the game including Test, ODI and Twenty20 versions.
You should be able to install the package from GitHub and use the many functions available in the package. Please be mindful of the ESPN Cricinfo Terms of Use.
Important note 1: The latest release of ‘cricketr’ now includes the ability to analyze performances of teams!! See Cricketr adds team analytics to its repertoire!!!
Important note 2 : Cricketr can now do a more fine-grained analysis of players, see Cricketr learns new tricks : Performs fine-grained analysis of players
Important note 3: Do check out the python avatar of cricketr, ‘cricpy’ in my post ‘Introducing cricpy:A python package to analyze performances of cricketers’
Take a look at my short video tutorial on my R package cricketr on Youtube – R package cricketr – A short tutorial
Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar
Note 1: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricketr template from Github (which is the R Markdown file I have used for the analysis below).
Note 2: I sprinkle the charts with my observations. Feel free to look at them more closely and come to your conclusions.
If you are passionate about cricket, and love analyzing cricket performances, then check out my racy book on cricket ‘Cricket analytics with cricketr and cricpy – Analytics harmony with R & Python’! This book discusses and shows how to use my R package ‘cricketr’ and my Python package ‘cricpy’ to analyze batsmen and bowlers in all formats of the game (Test, ODI and T20). The paperback is available on Amazon at $21.99 and the kindle version at $9.99/Rs 449/-. A must read for any cricket lover! Check it out!!
- This post is also available at Rpubs Clash of the Titans
- You can download this in PDF format at Clash of the Titans
- You can download this R Markdown file from Github at Clash of titans
1 Load the cricketr package
if (!require("cricketr")){
    install.packages("cricketr",lib = "c:/test")
}
library(cricketr)
A Test cricket – Analysis of Gavaskar, Tendulkar and Kohli
2. Get player data
tendulkar <- getPlayerData(35320,dir=".",file="tendulkar.csv",type="batting")
kohli <- getPlayerData(253802,dir=".",file="kohli.csv",type="batting")
gavaskar <- getPlayerData(28794,dir=".",file="gavaskar.csv",type="batting")
3a. Basic analyses for Tendulkar
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./tendulkar.csv","Tendulkar")
batsmanMeanStrikeRate("./tendulkar.csv","Tendulkar")
batsmanRunsRanges("./tendulkar.csv","Tendulkar")
dev.off()
3b Basic analyses for Kohli
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./kohli.csv","Kohli")
batsmanMeanStrikeRate("./kohli.csv","Kohli")
batsmanRunsRanges("./kohli.csv","Kohli")
dev.off()
3c Basic analyses for Gavaskar
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./gavaskar.csv","Gavaskar")
batsmanMeanStrikeRate("./gavaskar.csv","Gavaskar")
batsmanRunsRanges("./gavaskar.csv","Gavaskar")
dev.off()
4a.More analyses for Tendulkar
It can be seen that Tendulkar and Gavaskar have been bowled more often than Kohli. Also, Kohli does not have as many sixes in Test cricket as Tendulkar and Gavaskar
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./tendulkar.csv","Tendulkar")
batsman6s("./tendulkar.csv","Tendulkar")
batsmanDismissals("./tendulkar.csv","Tendulkar")
dev.off()
4b. More analyses for Kohli
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./kohli.csv","Kohli")
batsman6s("./kohli.csv","Kohli")
batsmanDismissals("./kohli.csv","Kohli")
dev.off()
4c More analyses for Gavaskar
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./gavaskar.csv","Gavaskar")
batsman6s("./gavaskar.csv","Gavaskar")
batsmanDismissals("./gavaskar.csv","Gavaskar")
dev.off()
5 Performance of batsmen on different grounds
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./tendulkar.csv","Tendulkar")
batsmanAvgRunsGround("./kohli.csv","Kohli")
batsmanAvgRunsGround("./gavaskar.csv","Gavaskar")
#dev.off()
6. Performance of batsmen against different Opposition
- Tendulkar averages 50 against the following countries – Australia, Bangladesh, England, Sri Lanka, West Indies and Zimbabwe
- Kohli averages almost 50 against all the nations he has played – Australia, Bangladesh, England, New Zealand, Sri Lanka and West Indies
- Gavaskar averages 50 against Australia, Pakistan, West Indies, Sri Lanka
par(mar=c(4,4,2,2))
batsmanAvgRunsOpposition("./tendulkar.csv","Tendulkar")
batsmanAvgRunsOpposition("./kohli.csv","Kohli")
batsmanAvgRunsOpposition("./gavaskar.csv","Gavaskar")
7. Get player data special
This is required for the next 2 function calls
tendulkarsp <- getPlayerDataSp(35320,tdir=".",tfile="tendulkarsp.csv",ttype="batting")
kohlisp <- getPlayerDataSp(253802,tdir=".",tfile="kohlisp.csv",ttype="batting")
gavaskarsp <- getPlayerDataSp(28794,tdir=".",tfile="gavaskarsp.csv",ttype="batting")
#dev.off()
8 Get contribution of batsmen in matches won and lost
Kohli has had an equal contribution in won and lost matches. Tendulkar’s runs seem not to have helped in winning as much, as only 50% of the matches he has played have been won
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanContributionWonLost("tendulkarsp.csv","Tendulkar")
batsmanContributionWonLost("./kohlisp.csv","Kohli")
batsmanContributionWonLost("./gavaskarsp.csv","Gavaskar")
9 Performance of batsmen at home and overseas
The boxplots show that Kohli performs better overseas than at home. The 3rd quartile is higher, though the median seems to be lower overseas. For Tendulkar the performance is similar both ways. Gavaskar’s median runs scored overseas is higher.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfHomeAway("tendulkarsp.csv","Tendulkar")
batsmanPerfHomeAway("./kohlisp.csv","Kohli")
batsmanPerfHomeAway("./gavaskarsp.csv","Gavaskar")
10. Moving average of runs
Gavaskar’s moving average was very good at the time of his retirement. Kohli seems to be going very strong. Tendulkar’s performance shows signs of deterioration around the time of his retirement.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./tendulkar.csv","Tendulkar")
batsmanMovingAverage("./kohli.csv","Kohli")
batsmanMovingAverage("./gavaskar.csv","Gavaskar")
#dev.off()
11 Boxplot and histogram of runs
Kohli has a marginally higher average (50.69) than Tendulkar (48.65), while Gavaskar averages 46. The median runs are the same for Tendulkar and Kohli at 32
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfBoxHist("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfBoxHist("./kohli.csv","Kohli")
batsmanPerfBoxHist("./gavaskar.csv","Gavaskar")
12 Cumulative average Runs for batsmen
Looking at the cumulative average runs we can see a gradual drop in the cumulative average for Tendulkar while Kohli and Gavaskar’s performance seems to be getting better
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("./tendulkar.csv","Tendulkar")
batsmanCumulativeAverageRuns("./kohli.csv","Kohli")
batsmanCumulativeAverageRuns("./gavaskar.csv","Gavaskar")
13. Cumulative average strike rate of batsmen
Tendulkar’s strike rate is better than that of Kohli and Gavaskar
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("./tendulkar.csv","Tendulkar")
batsmanCumulativeStrikeRate("./kohli.csv","Kohli")
batsmanCumulativeStrikeRate("./gavaskar.csv","Gavaskar")
14 Performance forecast of batsmen
The forecasted performance for Kohli and Gavaskar is higher than that of Tendulkar
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfForecast("./kohli.csv","Kohli")
batsmanPerfForecast("./gavaskar.csv","Gavaskar")
#dev.off()
15. Relative strike rate of batsmen
par(mar=c(4,4,2,2))
frames <- list("./tendulkar.csv","./kohli.csv","gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeBatsmanSR(frames,names)
#dev.off()
16. Relative Runs frequency of batsmen
par(mar=c(4,4,2,2))
frames <- list("./tendulkar.csv","./kohli.csv","gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeRunsFreqPerf(frames,names)
#dev.off()

17. Relative cumulative average runs of batsmen
Tendulkar leads the way here, but it can be seen that Kohli is catching up.
par(mar=c(4,4,2,2))
frames <- list("./tendulkar.csv","./kohli.csv","gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeBatsmanCumulativeAvgRuns(frames,names)
#dev.off()

18. Relative cumulative average strike rate
Tendulkar has better strike rate than the other two.
par(mar=c(4,4,2,2))
frames <- list("./tendulkar.csv","./kohli.csv","gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeBatsmanCumulativeStrikeRate(frames,names)
#dev.off()

19. Check batsman in form
As in the moving average and performance forecast and cumulative average runs, Kohli and Gavaskar are in-form while Tendulkar was out-of-form towards the end.
checkBatsmanInForm("./tendulkar.csv","Sachin Tendulkar")
## [1] "**************************** Form status of Sachin Tendulkar ****************************
\n\n Population size: 294 Mean of population: 50.48 \n Sample size: 33 Mean of sample: 32.42 SD of
sample: 29.8 \n\n Null hypothesis H0 : Sachin Tendulkar 's sample average is within 95% confidence interval
of population average\n Alternative hypothesis Ha : Sachin Tendulkar 's sample average is below
the 95% confidence interval of population average\n\n
Sachin Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713 is less than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("./kohli.csv","Kohli")
## [1] "**************************** Form status of Kohli ****************************\n\n Population size: 117
Mean of population: 50.35 \n Sample size: 13 Mean of sample: 53.77 SD of sample: 46.15 \n\n Null
hypothesis H0 : Kohli 's sample average is within 95% confidence interval of population average\n
Alternative hypothesis Ha : Kohli 's sample average is below the 95% confidence interval of population
average\n\n Kohli 's Form Status: In-Form because the p value: 0.603244 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("./gavaskar.csv","Gavaskar")
## [1] "**************************** Form status of Gavaskar ****************************\n\n
Population size: 125 Mean of population: 44.67 \n Sample size: 14 Mean of sample: 57.86 SD of sample:
58.55 \n\n Null hypothesis H0 : Gavaskar 's sample average is within 95% confidence interval of population
average\n Alternative hypothesis Ha : Gavaskar 's sample average is below the 95% confidence interval of
population average\n\n Gavaskar 's Form Status: In-Form because the p value: 0.793276 is greater
than alpha= 0.05 \n *******************************************************************************************\n\n"
#dev.off()
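The printed summaries above are consistent with a one-sided, one-sample t-test of the recent sample mean against the career mean. The Python sketch below is my reading of that output, not cricketr's actual code: it recomputes the t statistic and lower-tail p value from the rounded figures shown above, and the helper names t_cdf and lower_tail_p are made up for illustration.

```python
import math

def t_cdf(t, df, steps=2000):
    """CDF of Student's t at t, using Simpson's rule on the density (steps must be even)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    total = pdf(0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def lower_tail_p(pop_mean, sample_mean, sample_sd, n):
    """t statistic and P(T <= t) for H0: sample mean is not below the population mean."""
    t = (sample_mean - pop_mean) / (sample_sd / math.sqrt(n))
    return t, t_cdf(t, n - 1)

# Rounded figures copied from the checkBatsmanInForm output above
t_sachin, p_sachin = lower_tail_p(50.48, 32.42, 29.8, 33)
t_kohli, p_kohli = lower_tail_p(50.35, 53.77, 46.15, 13)
print(f"Tendulkar: t = {t_sachin:.3f}, p = {p_sachin:.6f}")
print(f"Kohli:     t = {t_kohli:.3f}, p = {p_kohli:.6f}")
```

With these inputs the p values come out close to the ones printed above (small differences are expected because the inputs are rounded), and a p value below alpha = 0.05 is what flags a batsman as Out-of-Form.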
20a. Performance 3D
A 3D regression plane is fitted between the Balls faced, Minutes at crease and Runs scored.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
battingPerf3d("./tendulkar.csv","Sachin Tendulkar")
battingPerf3d("./kohli.csv","Kohli")
battingPerf3d("./gavaskar.csv","Gavaskar")
#dev.off()
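As a rough illustration of what fitting a regression plane means here, the Python sketch below fits Runs = b0 + b1*BF + b2*Mins by ordinary least squares. It is not cricketr's code: the function names and the five innings are made up, and the innings were chosen to lie exactly on a known plane so that the fit can be checked.

```python
def fit_plane(bf, mins, runs):
    """Least-squares fit of runs = b0 + b1*bf + b2*mins via the normal equations."""
    rows = [(1.0, x, y) for x, y in zip(bf, mins)]
    # Build the augmented system [X^T X | X^T y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * z for r, z in zip(rows, runs))] for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    # Back substitution
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        coef[i] = (a[i][3] - sum(a[i][j] * coef[j] for j in range(i + 1, 3))) / a[i][i]
    return coef

def predict(coef, bf, mins):
    """Runs predicted by the fitted plane for one (balls faced, minutes) pair."""
    return coef[0] + coef[1] * bf + coef[2] * mins

# Hypothetical innings generated from runs = 2 + 0.5*BF + 0.05*Mins
bf = [10, 40, 80, 120, 200]
mins = [20, 70, 100, 190, 260]
runs = [2 + 0.5 * b + 0.05 * m for b, m in zip(bf, mins)]
coef = fit_plane(bf, mins, runs)
print([round(c, 4) for c in coef])
print(round(predict(coef, 100, 150), 2))
```

On real innings the points scatter around the plane, and the fitted coefficients give the average gain in runs per extra ball faced and per extra minute at the crease.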
20b. Runs likelihood
This function uses K-Means clustering to determine the runs the batsmen are likely to score.
par(mar=c(4,4,2,2))
batsmanRunsLikelihood("./tendulkar.csv","Tendulkar")
## Summary of Tendulkar 's runs scoring likelihood
## **************************************************
##
## There is a 16.51 % likelihood that Tendulkar will make 139 Runs in 251 balls over 353 Minutes
## There is a 25.08 % likelihood that Tendulkar will make 66 Runs in 122 balls over 167 Minutes
## There is a 58.41 % likelihood that Tendulkar will make 16 Runs in 31 balls over 44 Minutes
batsmanRunsLikelihood("./kohli.csv","Kohli")
## Summary of Kohli 's runs scoring likelihood
## **************************************************
##
## There is a 20 % likelihood that Kohli will make 143 Runs in 232 balls over 330 Minutes
## There is a 33.85 % likelihood that Kohli will make 51 Runs in 92 balls over 127 Minutes
## There is a 46.15 % likelihood that Kohli will make 11 Runs in 24 balls over 31 Minutes
batsmanRunsLikelihood("./gavaskar.csv","Gavaskar")
## Summary of Gavaskar 's runs scoring likelihood
## **************************************************
##
## There is a 33.81 % likelihood that Gavaskar will make 69 Runs in 159 balls over 214 Minutes
## There is a 8.63 % likelihood that Gavaskar will make 172 Runs in 364 balls over 506 Minutes
## There is a 57.55 % likelihood that Gavaskar will make 13 Runs in 35 balls over 48 Minutes
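One way to read the summaries above: each innings is a point (Balls faced, Minutes, Runs), the points are grouped into 3 clusters, and each line reports a cluster's share of innings together with its centre. The Python sketch below shows that idea with a small hand-rolled K-Means. It is an assumption about how the numbers arise, not cricketr's implementation, and the ten innings are invented for illustration.

```python
def kmeans(points, k=3, iters=50):
    """Plain Lloyd's algorithm on tuples; needs k >= 2. Returns (centroids, clusters)."""
    # Deterministic start: spread the initial centroids over the innings sorted by runs
    ordered = sorted(points, key=lambda p: p[2])
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Recompute centres; an empty cluster keeps its previous centre
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
               for cl, c in zip(clusters, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

# Hypothetical innings as (balls faced, minutes at crease, runs)
innings = [(20, 30, 10), (25, 35, 12), (30, 45, 15), (35, 50, 18), (15, 20, 5),
           (28, 40, 14), (110, 150, 60), (120, 170, 66), (130, 180, 70), (250, 350, 140)]
centroids, clusters = kmeans(innings, k=3)
shares = [100 * len(cl) / len(innings) for cl in clusters]
for share, (bf, mins, runs) in zip(shares, centroids):
    print(f"There is a {share:.2f} % likelihood of {runs:.0f} Runs "
          f"in {bf:.0f} balls over {mins:.0f} Minutes")
```

For this toy data the clusters hold 6, 3 and 1 innings, which gives shares of 60 %, 30 % and 10 %.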
21. Predict runs for a random combination of Balls faced and Minutes at crease
BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
tendulkar <- batsmanRunsPredict("./tendulkar.csv","Tendulkar",newdataframe=newDF)
kohli <- batsmanRunsPredict("./kohli.csv","Kohli",newdataframe=newDF)
gavaskar <- batsmanRunsPredict("./gavaskar.csv","Gavaskar",newdataframe=newDF)
batsmen <-cbind(round(tendulkar$Runs),round(kohli$Runs),round(gavaskar$Runs))
colnames(batsmen) <- c("Tendulkar","Kohli","Gavaskar")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
## BallsFaced MinsAtCrease Tendulkar Kohli Gavaskar
## 1 10 30 7 6 4
## 2 38 71 23 24 17
## 3 66 111 39 42 30
## 4 94 152 54 60 43
## 5 121 193 70 78 56
## 6 149 234 86 96 69
## 7 177 274 102 114 82
## 8 205 315 118 132 95
## 9 233 356 134 150 108
## 10 261 396 150 168 121
## 11 289 437 165 186 134
## 12 316 478 181 204 147
## 13 344 519 197 222 160
## 14 372 559 213 240 173
## 15 400 600 229 258 186
#dev.off()
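The BallsFaced and MinsAtCrease columns above are simply the rounded output of the two seq() calls. For readers following along in Python, the sketch below rebuilds the same grid; the helper name seq is my own stand-in for R's seq(from, to, length=n).

```python
def seq(start, stop, length):
    """Evenly spaced values from start to stop inclusive, like R's seq(..., length=n)."""
    step = (stop - start) / (length - 1)
    return [start + i * step for i in range(length)]

balls_faced = [round(x) for x in seq(10, 400, 15)]
mins_at_crease = [round(x) for x in seq(30, 600, 15)]
for bf, mins in zip(balls_faced, mins_at_crease):
    print(bf, mins)
```

Each (BallsFaced, MinsAtCrease) pair is then fed to the fitted model to get the predicted runs for the three batsmen.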
Key findings
- Kohli has a marginally higher average than Tendulkar
- Tendulkar has the best strike rate of the three.
- The cumulative average runs and the performance forecast for Kohli and Gavaskar show an improving trend, while Tendulkar’s numbers deteriorate towards the end of his career
- Kohli is fast catching up with Tendulkar on cumulative average runs vs innings in career.
B ODI Cricket – Analysis of Tendulkar and Kohli
The functions below get the ODI data for Tendulkar and Kohli as CSV files so that the analyses can be done
22 Get player data for ODIs
tendulkarOD <- getPlayerDataOD(35320,dir=".",file="tendulkarOD.csv",type="batting")
kohliOD <- getPlayerDataOD(253802,dir=".",file="kohliOD.csv",type="batting")
#dev.off()
23a Basic performance of Tendulkar in ODI
par(mfrow=c(3,2))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./tendulkarOD.csv","Tendulkar")
batsmanRunsRanges("./tendulkarOD.csv","Tendulkar")
batsman4s("./tendulkarOD.csv","Tendulkar")
batsman6s("./tendulkarOD.csv","Tendulkar")
batsmanScoringRateODTT("./tendulkarOD.csv","Tendulkar")
#dev.off()
23b. Basic performance of Kohli in ODI
par(mfrow=c(3,2))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./kohliOD.csv","Kohli")
batsmanRunsRanges("./kohliOD.csv","Kohli")
batsman4s("./kohliOD.csv","Kohli")
batsman6s("./kohliOD.csv","Kohli")
batsmanScoringRateODTT("./kohliOD.csv","Kohli")
#dev.off()
24. Performance forecast in ODIs
Kohli’s forecasted runs are much higher than Tendulkar’s in ODIs
par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkarOD.csv","Tendulkar")
batsmanPerfForecast("./kohliOD.csv","Kohli")
25. Batting performance
A 3D regression plane is fitted between Balls faced, Minutes at crease and Runs scored.
par(mar=c(4,4,2,2))
battingPerf3d("./tendulkarOD.csv","Tendulkar")
battingPerf3d("./kohliOD.csv","Kohli")
26. Predicting runs scored for the ODI batsmen
Kohli will score more runs than Tendulkar for the same minutes at crease and balls faced.
BF <- seq( 10, 200,length=10)
Mins <- seq(30,220,length=10)
newDF <- data.frame(BF,Mins)
tendulkarDF <- batsmanRunsPredict("./tendulkarOD.csv","Tendulkar",newdataframe=newDF)
kohliDF <- batsmanRunsPredict("./kohliOD.csv","Kohli",newdataframe=newDF)
batsmen <-cbind(round(tendulkarDF$Runs),round(kohliDF$Runs))
colnames(batsmen) <- c("Tendulkar","Kohli")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
## BallsFaced MinsAtCrease Tendulkar Kohli
## 1 10 30 7 8
## 2 31 51 26 28
## 3 52 72 45 48
## 4 73 93 64 68
## 5 94 114 83 88
## 6 116 136 102 108
## 7 137 157 121 128
## 8 158 178 140 149
## 9 179 199 159 169
## 10 200 220 178 189
27. Runs likelihood for the ODI batsmen
Tendulkar has clusters around 13, 53 and 111 runs while Kohli has clusters around 13, 63 and 116. So it is more likely that Kohli will tend to score higher.
par(mar=c(4,4,2,2))
batsmanRunsLikelihood("./tendulkarOD.csv","Tendulkar")
## Summary of Tendulkar 's runs scoring likelihood
## **************************************************
##
## There is a 18.09 % likelihood that Tendulkar will make 111 Runs in 118 balls over 172 Minutes
## There is a 28.39 % likelihood that Tendulkar will make 53 Runs in 63 balls over 95 Minutes
## There is a 53.52 % likelihood that Tendulkar will make 13 Runs in 18 balls over 27 Minutes
batsmanRunsLikelihood("./kohliOD.csv","Kohli")
## Summary of Kohli 's runs scoring likelihood
## **************************************************
##
## There is a 31.41 % likelihood that Kohli will make 63 Runs in 69 balls over 97 Minutes
## There is a 49.74 % likelihood that Kohli will make 13 Runs in 18 balls over 24 Minutes
## There is a 18.85 % likelihood that Kohli will make 116 Runs in 113 balls over 163 Minutes
28a. Runs in different venues for the ODI batsmen
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./tendulkarOD.csv","Tendulkar")
batsmanAvgRunsGround("./kohliOD.csv","Kohli")
28b. Runs against different opposition for the ODI batsmen
Tendulkar has a 50+ average against Bermuda, Kenya and Namibia, while Kohli has a 50+ average against New Zealand, West Indies, South Africa, Zimbabwe and Bangladesh.
par(mar=c(4,4,2,2))
batsmanAvgRunsOpposition("./tendulkarOD.csv","Tendulkar")
batsmanAvgRunsOpposition("./kohliOD.csv","Kohli")
29. Moving average of runs for the ODI batsmen
Tendulkar’s moving average shows an improvement (50+) towards the end of his career, but Kohli’s shows a marked increase to 60+ currently.
par(mar=c(4,4,2,2))
batsmanMovingAverage("./tendulkarOD.csv","Tendulkar")
batsmanMovingAverage("./kohliOD.csv","Kohli")
30. Cumulative average runs of ODI batsmen
Tendulkar plateaus at 40+ while Kohli’s cumulative average runs goes up and up!!!
par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("./tendulkarOD.csv","Tendulkar")
batsmanCumulativeAverageRuns("./kohliOD.csv","Kohli")
31 Cumulative strike rate of ODI batsmen
par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("./tendulkarOD.csv","Tendulkar")
batsmanCumulativeStrikeRate("./kohliOD.csv","Kohli")
32. Relative batsmen strike rate
par(mar=c(4,4,2,2))
frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeBatsmanSRODTT(frames,names)
#dev.off()

33. Relative Run Frequency percentages
par(mar=c(4,4,2,2))
frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeRunsFreqPerfODTT(frames,names)
#dev.off()

34. Relative cumulative average runs of ODI batsmen
Kohli breaks away from Tendulkar in cumulative average runs after 100 innings
par(mar=c(4,4,2,2))
frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeBatsmanCumulativeAvgRuns(frames,names)
#dev.off()

35. Relative cumulative strike rate of ODI batsmen
This seems to be a tussle, with Kohli having an edge till about 40 innings and Tendulkar leading from 40+ to 180 innings. Kohli just seems to be edging forward.
par(mar=c(4,4,2,2))
frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeBatsmanCumulativeStrikeRate(frames,names)
#dev.off()

36. Batsmen 4s and 6s
par(mar=c(4,4,2,2))
frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
batsman4s6s(frames,names)
## Tendulkar Kohli
## Runs(1s,2s,3s) 66.29 69.67
## 4s 29.65 25.90
## 6s 4.06 4.43
#dev.off()
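The table above splits each batsman's runs into the share scored in 4s, in 6s and by running between the wickets. How batsman4s6s computes this internally is not shown in this post; the Python sketch below is one natural way to get such a split, and the career totals in it are invented purely for illustration.

```python
def boundary_split(total_runs, fours, sixes):
    """Percentage of runs scored by running, in 4s and in 6s."""
    runs_4s = 4 * fours
    runs_6s = 6 * sixes
    runs_other = total_runs - runs_4s - runs_6s
    parts = {"Runs(1s,2s,3s)": runs_other, "4s": runs_4s, "6s": runs_6s}
    return {name: round(100 * value / total_runs, 2) for name, value in parts.items()}

# Hypothetical totals: 1000 runs including 70 fours and 10 sixes
split = boundary_split(1000, 70, 10)
for name, pct in split.items():
    print(name, pct)
```

The three percentages always add up to 100, as they do in the table above.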
37. Check ODI batsmen form
par(mar=c(4,4,2,2))
checkBatsmanInForm("./tendulkar.csv","Tendulkar")
## [1] "**************************** Form status of Tendulkar ********
********************\n\n Population size: 294 Mean of population: 50.48 \n
Sample size: 33 Mean of sample: 32.42 SD of sample: 29.8 \n\n
Null hypothesis H0 : Tendulkar 's sample average is within 95% confidence
interval of population average\n Alternative hypothesis
Ha : Tendulkar 's sample average is below the 95% confidence interval
of population average\n\n Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713 is less than alpha= 0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("./kohli.csv","Kohli")
## [1] "**************************** Form status of Kohli ***********
*****************\n\n Population size: 117 Mean of population: 50.35 \n
Sample size: 13 Mean of sample: 53.77 SD of sample: 46.15 \n\n
Null hypothesis H0 : Kohli 's sample average is within 95% confidence
interval of population average\n Alternative hypothesis
Ha : Kohli 's sample average is below the 95% confidence interval
of population average\n\n Kohli 's Form Status: In-Form because
the p value: 0.603244 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
#dev.off()
Key Findings
- Kohli has a better performance against oppositions like West Indies, South Africa and New Zealand
- Kohli breaks away from Tendulkar in cumulative average runs
- Tendulkar has been leading on strike rate but Kohli in recent times seems to be breaking loose.
Check out some other players with my R package cricketr
Important note: Do check out my other posts using cricketr at cricketr-posts
Analyzing T20 matches with yorkpy templates
1. Introduction
In this post I create yorkpy templates for end-to-end analysis of any T20 matches that are available on Cricsheet in yaml format. These templates can be used to analyze Intl. T20, IPL, BBL and Natwest T20 matches. In fact they can be used for any T20 games which have been saved in the yaml format specified by Cricsheet.
Note: yorkpy is the clone of my R package yorkr, see yorkr pads up for the Twenty20s: Part 1- Analyzing team's match performance
With these templates you can convert all T20 match data in yaml format to Pandas dataframes and save them as CSV. Note: The data for Intl T20, IPL, BBL and Natwest T20 have already been converted and are available at allYorkpyData. This template is also available on Github at yorkpyTemplate. The template includes the following steps
- Template for conversion and setup
- Analysis of Any T20 match
- Analysis of a T20 team in all matches against another T20 team
- Analysis of a T20 team in all matches against all other teams
- Analysis of T20 batsmen and bowlers
You can recreate the files as more matches are added to the Cricsheet site in IPL 2017 and future seasons. This post contains all the steps needed for detailed analysis of IPL matches, teams and IPL players. This will also be my reference if I decide to analyze IPL in the future!
Install yorkpy with pip install yorkpy
Data conversion of the yaml files has to be done before any analysis of T20 batsmen or bowlers, of any T20 match, of matches between any 2 T20 teams, or of a team's performance against all other teams can be done.
The first step is to convert the YAML files that are available on Cricsheet for the different T20 leagues, namely Intl. T20, IPL, BBL and Natwest T20. For the initial data setup we need to use slightly different functions for each of the T20 leagues since the teams are different. The function to convert yaml to Pandas dataframes and save as CSV is common to all leagues.
A. For International T20
import yorkpy.analytics as yka
# Convert yaml to pandas and save as CSV
#yka.convertAllYaml2PandasDataframesT20(".", "..\\data1")
# Save all matches between any 2 Intl T20 countries
#yka.saveAllMatchesBetween2IntlT20s(dir1)
#Save all matches between an Intl.T20 country and all other countries
#yka.saveAllMatchesAllOppositionIntlT20(dir1)
# Get batting details for a country
#yka.getTeamBattingDetails(<country>,dir=dir1, save=True)
#Get bowling details
#yka.getTeamBowlingDetails(<country>,dir=dir1, save=True)
B. For Indian Premier League (IPL)
import yorkpy.analytics as yka
# Convert yaml to pandas and save as CSV
#yka.convertAllYaml2PandasDataframesT20(".", "..\\data1")
# Save all matches between any 2 IPL teams
#yka.saveAllMatchesBetween2IPLTeams(dir1)
#Save all matches between an IPL team and all other teams
#yka.saveAllMatchesAllOppositionIPLT20(dir1)
# Get batting details for an IPL team
#yka.getTeamBattingDetails(<team1>,dir=dir1, save=True)
#Get bowling details for an IPL team
#yka.getTeamBowlingDetails(<team1>,dir=dir1, save=True)
C. For Big Bash League (BBL)
import yorkpy.analytics as yka
# Convert yaml to pandas and save as CSV
#yka.convertAllYaml2PandasDataframesT20(".", "..\\data1")
# Save all matches between any 2 BBL teams
#yka.saveAllMatchesBetween2BBLTeams(dir1)
#Save all matches between a BBL team and all other teams
#yka.saveAllMatchesAllOppositionBBLT20(dir1)
# Get batting details for a BBL team
#yka.getTeamBattingDetails(<team1>,dir=dir1, save=True)
#Get bowling details for a BBL team
#yka.getTeamBowlingDetails(<team1>,dir=dir1, save=True)
D. For Natwest T20
import yorkpy.analytics as yka
# Convert yaml to pandas and save as CSV
#yka.convertAllYaml2PandasDataframesT20(".", "..\\data1")
# Save all matches between any 2 NWB teams
#yka.saveAllMatchesBetween2NWBTeams(dir1)
#Save all matches between an NWB team and all other teams
#yka.saveAllMatchesAllOppositionNWBT20(dir1)
# Get batting details for an NWB team
#yka.getTeamBattingDetails(<team1>,dir=dir1, save=True)
#Get bowling details for an NWB team
#yka.getTeamBowlingDetails(<team1>,dir=dir1, save=True)
Once the conversion has been done and the data has been set up, we can use any of the yorkpy functions for the 4 leagues (Intl. T20, IPL, BBL or Natwest T20). There are four classes of functions, and they can be used for any of these leagues.
- Class 1 – Functions that analyze a single T20 match
- Class 2 – Functions that analyze the performance of a T20 team in all matches against another T20 team
- Class 3 – Functions that analyze the performance of a T20 team against all other teams
- Class 4 – Functions that analyze individual T20 batsmen or bowlers
2. Class 1 functions
These functions analyze a single T20 match (Intl T20, BBL, IPL or Natwest T20). To see actual usage of Class 1 functions see Pitching yorkpy … short of good length to IPL – Part 1
import pandas as pd
import yorkpy.analytics as yka
# Get scorecard
#match=pd.read_csv("<match.csv>")
#scorecard,extras=yka.teamBattingScorecardMatch(match,<team1>)
#Get partnership
#match=pd.read_csv("<match.csv>")
#yka.teamBatsmenPartnershipMatch(match,<team1>,<team2>,plot=True/False)
#Batsmen vs bowler
#match=pd.read_csv("<match.csv>")
#yka.teamBatsmenVsBowlersMatch(match,<team1>,<team2>,plot=True/False)
#Bowling scorecard
#match=pd.read_csv("<match.csv>")
#a=yka.teamBowlingScorecardMatch(match,<team1>)
#Wicket Kind
#match=pd.read_csv("<match.csv>")
#yka.teamBowlingWicketKindMatch(match,<team1>,<team2>)
#Wicket Match
#match=pd.read_csv("<match.csv>")
#yka.teamBowlingWicketMatch(match,<team1>,<team2>,plot=True/False)
#Bowler vs Batsman
#match=pd.read_csv("<match.csv>")
#yka.teamBowlersVsBatsmenMatch(match,<team1>,<team2>)
#Match worm chart
#match=pd.read_csv("<match.csv>")
#yka.matchWormChart(match,<team1>,<team2>)
3. Class 2 functions
This set of functions analyzes the performance of a T20 team (e.g. an Intl. T20, IPL, BBL or Natwest T20 team) in all matches against another T20 team. To see usages of Class 2 functions see Pitching yorkpy…on the middle and outside off-stump to IPL – Part 2
import pandas as pd
import yorkpy.analytics as yka
# Batting partnerships - Table
#team1_team2_matches = pd.read_csv("<matches_between_2_teams.csv>")
#m=yka.teamBatsmenPartnershiOppnAllMatches(team1_team2_matches,<team1/team2>,report=<"summary"/"detailed">, top=<N>)
# Batting partnerships - Plot
#team1_team2_matches = pd.read_csv("<matches_between_2_teams.csv>")
#yka.teamBatsmenPartnershipOppnAllMatchesChart(team1_team2_matches,<team1>,<team2>, plot=<True/False>, top=<N>, partnershipRuns=<M>)
#Batsmen vs Bowlers
#team1_team2_matches = pd.read_csv("<matches_between_2_teams.csv>")
#yka.teamBatsmenVsBowlersOppnAllMatches(team1_team2_matches,<team1>,<team2>, plot=<True/False>, top=<N>,runsScored=<M>)
# Batting scorecard
#team1_team2_matches = pd.read_csv("<matches_between_2_teams.csv>")
#scorecard=yka.teamBattingScorecardOppnAllMatches(team1_team2_matches,<team1>,<team2>)
#Bowling scorecard
#team1_team2_matches = pd.read_csv("<matches_between_2_teams.csv>")
#scorecard=yka.teamBowlingScorecardOppnAllMatches(team1_team2_matches,<team1>,<team2>)
#Bowling wicket kind
#team1_team2_matches = pd.read_csv("<matches_between_2_teams.csv>")
#yka.teamBowlingWicketKindOppositionAllMatches(team1_team2_matches,<team1>,<team2>,plot=<True/False>,top=<N>,wickets=<M>)
#Bowler vs batsman
#team1_team2_matches = pd.read_csv("<matches_between_2_teams.csv>")
#yka.teamBowlersVsBatsmenOppnAllMatches(team1_team2_matches,<team1>,<team2>,plot=<True/False>,top=<N>,runsConceded=<M>)
# Wins vs losses
#team1_team2_matches = pd.read_csv("<matches_between_2_teams.csv>")
#yka.plotWinLossBetweenTeams(team1_team2_matches,<team1>,<team2>)
#Wins by win type
#team1_team2_matches = pd.read_csv("<matches_between_2_teams.csv>")
#yka.plotWinsByRunOrWickets(team1_team2_matches,<team1>)
#Wins by toss decision
#team1_team2_matches = pd.read_csv("<matches_between_2_teams.csv>")
#yka.plotWinsbyTossDecision(team1_team2_matches,<team1>,tossDecision=<"field"/"bat">)
4. Class 3 functions
This set of functions deals with analyzing the performance of a T20 team (Intl. T20, IPL, BBL or Natwest T20) in all matches against all other teams. To see usages of Class 3 functions see Pitching yorkpy…swinging away from the leg stump to IPL – Part 3. After all matches between a team and all its oppositions have been saved, we can use this data.
import pandas as pd
import yorkpy.analytics as yka
#Batsman partnerships
#allmatches = pd.read_csv("<allmatchesForteam>")
#m=yka.teamBatsmenPartnershiAllOppnAllMatches(allmatches,<team1>,report=<"summary"/"detailed">, top=<N>,partnershipRuns=<M>)
#Batsmen vs Bowlers
#allmatches = pd.read_csv("<allmatchesForteam>")
#yka.teamBatsmenVsBowlersAllOppnAllMatches(allmatches,<team1>,plot=<True/False>,top=<N>,runsScored=<M>)
#Batting scorecard
#allmatches = pd.read_csv("<allmatchesForteam>")
#scorecard=yka.teamBattingScorecardAllOppnAllMatches(allmatches,<team1>)
#Bowling scorecard
#allmatches = pd.read_csv("<allmatchesForteam>")
#scorecard=yka.teamBowlingScorecardAllOppnAllMatches(allmatches,<team1>)
#Bowling wicket kind
#allmatches = pd.read_csv("<allmatchesForteam>")
#yka.teamBowlingWicketKindAllOppnAllMatches(allmatches,<team1>,plot=<True/False>,top=<N>,wickets=<M>)
# Bowler vs Batsmen
#allmatches = pd.read_csv("<allmatchesForteam>")
#yka.teamBowlersVsBatsmenAllOppnAllMatches(allmatches,<team1>,plot=<True/False>,top=<N>,runsConceded=<M>)
# Wins vs losses
#allmatches = pd.read_csv("<allmatchesForteam>")
#yka.plotWinLossByTeamAllOpposition(allmatches,<team1>,plot=<"summary"/"detailed">)
# Wins by win type
#allmatches = pd.read_csv("<allmatchesForteam>")
#yka.plotWinsByRunOrWicketsAllOpposition(allmatches,<team1>)
# Wins by toss decision
#allmatches = pd.read_csv("<allmatchesForteam>")
#yka.plotWinsbyTossDecisionAllOpposition(allmatches,<team1>,tossDecision=<"bat"/"field">,plot=<"summary"/"detailed">)
5. Class 4 functions
This set of functions is used for analyzing individual batsmen and bowlers. From the converted xxx-BattingDetails.csv and xxx-BowlingDetails.csv we can get the batsman and bowler details as shown below. Subsequently we can perform analyses of the individual batsman and bowler. To see actual usages of Class 4 functions see Pitching yorkpy … in the block hole – Part 4
import yorkpy.analytics as yka
#Batsman analyses
#Get batsman Dataframe
#batsmanDF=yka.getBatsmanDetails(<team1>,<batsman>,dir=dir1)
#Batsman Runs vs Deliveries
#yka.batsmanRunsVsDeliveries(batsmanDF,<batsmanName>)
#Batsman fours and sixes
#yka.batsmanFoursSixes(batsmanDF,<batsmanName>)
#Batsman dismissals
#yka.batsmanDismissals(batsmanDF,<batsmanName>)
#Batsman Runs vs Strike Rate
#yka.batsmanRunsVsStrikeRate(batsmanDF,<batsmanName>)
#Batsman Moving average
#yka.batsmanMovingAverage(batsmanDF,<batsmanName>)
#Batsman Cumulative average
#yka.batsmanCumulativeAverageRuns(batsmanDF,<batsmanName>)
#Batsman Cumulative Strike rate
#yka.batsmanCumulativeStrikeRate(batsmanDF,<batsmanName>)
#Batsman Runs against opposition
#yka.batsmanRunsAgainstOpposition(batsmanDF,<batsmanName>)
#Batsman Runs at venue
#yka.batsmanRunsVenue(batsmanDF,<batsmanName>)
#Bowler analyses
#Get bowler dataframe
#bowlerDF=yka.getBowlerWicketDetails(<team1>,<bowler>,dir=dir1)
#Mean economy rate
#yka.bowlerMeanEconomyRate(bowlerDF,<bowlerName>)
#Mean Runs conceded
#yka.bowlerMeanRunsConceded(bowlerDF,<bowlerName>)
#Moving average of wickets
#yka.bowlerMovingAverage(bowlerDF,<bowlerName>)
# Cumulative average of wickets
#yka.bowlerCumulativeAvgWickets(bowlerDF,<bowlerName>)
# Cumulative economy rate
#yka.bowlerCumulativeAvgEconRate(bowlerDF,<bowlerName>)
# Wicket plot
#yka.bowlerWicketPlot(bowlerDF,<bowlerName>)
# Wicket against opposition
#yka.bowlerWicketsAgainstOpposition(bowlerDF,<bowlerName>)
# Wickets at venue
#yka.bowlerWicketsVenue(bowlerDF,<bowlerName>)
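Several of the Class 4 functions above plot moving and cumulative statistics. To make the terms concrete, here is a plain-Python sketch of the textbook definitions. It is not taken from yorkpy, whose plots may smooth or window the data differently, and the four innings are made up.

```python
from itertools import accumulate

def cumulative_average(runs):
    """Average runs per innings after each innings."""
    return [total / (i + 1) for i, total in enumerate(accumulate(runs))]

def moving_average(runs, window=3):
    """Mean of the last `window` innings, starting once a full window is available."""
    return [sum(runs[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(runs))]

def cumulative_strike_rate(runs, balls):
    """Runs per 100 balls over all innings played so far."""
    return [100 * r / b for r, b in zip(accumulate(runs), accumulate(balls))]

# Hypothetical innings-by-innings figures
runs = [10, 50, 30, 70]
balls = [20, 30, 30, 20]
print(cumulative_average(runs))
print(moving_average(runs, window=2))
print(cumulative_strike_rate(runs, balls))
```

The cumulative curves settle down as a career gets longer, while the moving average keeps reacting to recent form, which is why the two are useful side by side.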
Important note: Do check out my other posts using yorkpy at yorkpy-posts
Conclusion
With the above templates detailed analysis can be done on
- A T20 match
- Performance of a team in all matches against another team
- Performance of a team in all matches against all other teams
- Individual batting and bowling performances
yorkpy takes a hat-trick, bowls out Intl. T20s, BBL and Natwest T20!!!
“Dear, dear! How queer everything is to-day! And yesterday things went on just as usual. I wonder if I’ve been changed in the night? Let me think: was I the same when I got up this morning? I almost think I can remember feeling a little different. But if I’m not the same, the next question is ’Who in the world am I? Ah, that’s the great puzzle!”
Alice's adventures in Wonderland, Lewis Carroll
1. Introduction
In this post, yorkpy clean bowls the following T20 formats, namely International T20s, Big Bash League and Natwest T20 Blast. I take yorkpy on a spin through these T20 leagues. In the post below, I choose a random set of about 10-12 of the overall 63 functions that yorkpy has, and execute them for each of the different T20 leagues – Intl T20s, BBL and Natwest T20s. yorkpy is the python avatar of my R package yorkr, see Introducing cricket package yorkr: Part 1- Beaten by sheer pace!
There were a couple of new functions that needed to be added for each of the T20 leagues – Intl T20, BBL and Natwest T20 to take into account the different teams in each of these leagues. Further, some bugs were also ironed out in the latest version of yorkpy. yorkpy uses data from Cricsheet. The match data is in the form of YAML files. yorkpy converts these YAML files to dataframes. YAML files are very detailed and include a ball-by-ball account of the match.
– You can clone/fork the latest code for yorkpy from github yorkpy
– This post has also been published in RPubs at yorkpy takes a hat-trick
– You can download the PDF version of this post at yorkpy takes a hat-trick
The data for IPL, Intl. T20, BBL and Natwest T20 have already been converted into pandas dataframes and saved as CSVs. You can download the converted files from Github at allYorkpyT20Data (https://github.com/tvganesh/allYorkpyT20Data)
yorkpy has the following 4 main classes of functions
A.Functions analyzing individual T20 match (Class 1)
This was demonstrated in Pitching yorkpy … short of good length to IPL – Part 1. The functions deal with individual T20 matches. The functions are
- convertYaml2PandasDataframeT20()
- convertAllYaml2PandasDataframesT20()
- teamBattingScorecardMatch()
- teamBatsmenPartnershipMatch()
- teamBatsmenVsBowlersMatch()
- teamBowlingScorecardMatch()
- teamBowlingWicketKindMatch()
- teamBowlingWicketRunsMatch()
- teamBowlingWicketMatch()
- teamBowlersVsBatsmenMatch()
- matchWormChart()
B. Functions that analyze all matches between 2 T20 teams (Class 2)
Pitching yorkpy…on the middle and outside off-stump to IPL – Part 2 included functions that analyze head-to-head confrontation between any 2 T20 teams. The functions are
- getAllMatchesBetweenTeams()
- saveAllMatchesBetween2IPLTeams()
- teamBatsmenPartnershiOppnAllMatches()
- teamBatsmenPartnershipOppnAllMatchesChart()
- teamBatsmenVsBowlersOppnAllMatches()
- teamBattingScorecardOppnAllMatches()
- teamBowlingScorecardOppnAllMatches()
- teamBowlingWicketKindOppositionAllMatches()
- teamBowlersVsBatsmenOppnAllMatches()
- plotWinLossBetweenTeams()
- plotWinsByRunOrWickets()
- plotWinsbyTossDecision()
C. Functions that analyze the performance of a T20 team against all other teams (Class 3)
The post Pitching yorkpy…swinging away from the leg stump to IPL – Part 3 is based on the Class 3 set of functions shown below
- getAllMatchesAllOpposition()
- saveAllMatchesAllOppositionIPLT20()
- teamBatsmenPartnershiAllOppnAllMatches()
- teamBatsmenPartnershipAllOppnAllMatchesChart()
- teamBatsmenVsBowlersAllOppnAllMatches()
- teamBattingScorecardAllOppnAllMatches()
- teamBowlingScorecardAllOppnAllMatches()
- teamBowlingWicketKindAllOppnAllMatches()
- teamBowlersVsBatsmenAllOppnAllMatches()
- plotWinLossByTeamAllOpposition()
- plotWinsByRunOrWicketsAllOpposition()
- plotWinsbyTossDecisionAllOpposition()
D. Functions that analyze performances of T20 batsmen and bowlers (Class 4)
This set of functions analyzes individual batsmen and bowlers and has been used in Pitching yorkpy … in the block hole – Part 4. The functions are
- getTeamBattingDetails()
- getBatsmanDetails()
- batsmanRunsVsDeliveries()
- batsmanFoursSixes()
- batsmanDismissals()
- batsmanRunsVsStrikeRate()
- batsmanMovingAverage()
- batsmanCumulativeAverageRuns()
- batsmanCumulativeStrikeRate()
- batsmanRunsAgainstOpposition()
- batsmanRunsVenue()
- getTeamBowlingDetails()
- getBowlerWicketDetails()
- bowlerMeanEconomyRate()
- bowlerMeanRunsConceded()
- bowlerMovingAverage()
- bowlerCumulativeAvgWickets()
- bowlerCumulativeAvgEconRate()
- bowlerWicketPlot()
- bowlerWicketsAgainstOpposition()
- bowlerWicketsVenue()
Additional new functions were added to handle Intl T20s, Big Bash League and Natwest T20 Blast, since the teams are different. They are
59. saveAllMatchesBetween2IntlT20s()
60. saveAllMatchesAllOppositionIntlT20()
61. saveAllMatchesBetween2BBLTeams()
62. saveAllMatchesAllOppositionBBLT20()
63. saveAllMatchesBetween2NWBTeams()
64. saveAllMatchesAllOppositionNWBT20()
All other functions can be used as is! You can get the help of any function in yorkpy using
import yorkpy.analytics as yka
help(yka.teamBatsmenPartnershiOppnAllMatches)
## Help on function teamBatsmenPartnershiOppnAllMatches in module yorkpy.analytics:
##
## teamBatsmenPartnershiOppnAllMatches(matches, theTeam, report='summary', top=5)
## Team batting partnership against a opposition all IPL matches
##
## Description
##
## This function computes the performance of batsmen against all bowlers of an oppositions in
## all matches. This function returns a dataframe
##
## Usage
##
## teamBatsmenPartnershiOppnAllMatches(matches,theTeam,report="summary")
## Arguments
##
## matches
## All the matches of the team against the oppositions
## theTeam
## The team for which the the batting partnerships are sought
## report
## If the report="summary" then the list of top batsmen with the highest partnerships
## is displayed. If report="detailed" then the detailed break up of partnership is returned
## as a dataframe
## top
## The number of players to be displayed from the top
## Value
##
## partnerships The data frame of the partnerships
##
## Note
##
## Maintainer: Tinniam V Ganesh tvganesh.85@gmail.com
##
## Author(s)
##
## Tinniam V Ganesh
##
## References
##
## http://cricsheet.org/
## https://gigadom.wordpress.com/
##
##
## See Also
##
## teamBatsmenVsBowlersOppnAllMatchesPlot
## teamBatsmenPartnershipOppnAllMatchesChart
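help() shows one function at a time. To skim everything a module offers, Python's standard inspect module can list the public functions with the first line of each docstring. The sketch below is generic and uses a hypothetical helper name list_functions; it is demonstrated on the standard json module so that it runs without yorkpy, and passing yorkpy.analytics instead should work the same way if yorkpy is installed.

```python
import inspect
import json

def list_functions(module):
    """Return (name, first docstring line) for public functions defined in module."""
    out = []
    for name, fn in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions imported from other modules
        if name.startswith("_") or fn.__module__ != module.__name__:
            continue
        doc = (inspect.getdoc(fn) or "").strip().splitlines()
        out.append((name, doc[0] if doc else ""))
    return out

for name, summary in list_functions(json):
    print(name, "-", summary)

# With yorkpy installed:
# import yorkpy.analytics as yka
# for name, summary in list_functions(yka):
#     print(name, "-", summary)
```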
As I mentioned above, I will be randomly choosing a set of 12 functions from Classes 1, 2, 3 and 4 for each of the T20 leagues (Intl T20, BBL and NWB T20) for analysis.
2. International T20s
The following functions were added for handling Intl. T20s
- saveAllMatchesBetween2IntlT20s()
- saveAllMatchesAllOppositionIntlT20()
These handle the countries in Intl. T20s listed below:
Afghanistan, Australia, Bangladesh, Bermuda, Canada, England, Hong Kong, India, Ireland, Kenya, Nepal, Netherlands, New Zealand, Oman, Pakistan, Scotland, South Africa, Sri Lanka, United Arab Emirates, West Indies, Zimbabwe
import os
#os.chdir('C:\\software\\cricket-package\\yorkpyT20\\t20s')
#import yorkpy.analytics as yka
#1. Convert all YAML files to dataframes and CSV
#yka.convertAllYaml2PandasDataframesT20(".", "..\\data1")
#dir1='C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches'
#2. Save all matches between 2 T20 teams
#yka.saveAllMatchesBetween2IntlT20s(dir1)
#3. Save all matches between a T20 team and all other teams
#dir1='C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches'
#yka.saveAllMatchesAllOppositionIntlT20(dir1)
#4. Get batting details
#dir1='C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches'
#yka.getTeamBattingDetails("Afghanistan",dir=dir1, save=True)
#yka.getTeamBattingDetails("Australia",dir=dir1,save=True)
#yka.getTeamBattingDetails("Bangladesh",dir=dir1,save=True)
#...
#5. Get bowling details
#dir1='C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches'
#yka.getTeamBowlingDetails("Afghanistan",dir=dir1, save=True)
#yka.getTeamBowlingDetails("Australia",dir=dir1,save=True)
#yka.getTeamBowlingDetails("Bangladesh",dir=dir1,save=True)
# ...
To use the yorkpy functions for a new league, the YAML files first have to be converted into the format that the yorkpy functions expect, as in the commented steps above. This creates the files that are used in the functions below.
Once the data is converted you can use the yorkpy functions. The data has already been converted for Intl. T20 and is available on Github at IntlT20.
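The snippets in this post build file paths by hand with Windows separators. As a minimal sketch, assuming the single-match files are named `<team1>-<team2>-<date>.csv` as in the examples here, the path can be assembled from its parts with pathlib. The helper match_csv_path is hypothetical and not part of yorkpy.

```python
from pathlib import PureWindowsPath


def match_csv_path(data_dir, team1, team2, date):
    """Build the path of a single-match CSV named '<team1>-<team2>-<date>.csv'."""
    filename = f"{team1}-{team2}-{date}.csv"
    return PureWindowsPath(data_dir) / filename


path = match_csv_path(
    r"C:\software\cricket-package\yorkpyT20\IntlT20-Matches",
    "India", "New Zealand", "2007-09-16",
)
print(path)
```

The string form, str(path), can then be handed to pd.read_csv in place of the hand-written os.path.join call.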
2.1 Intl. T20 – Team score card (Class 1)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches"
path=os.path.join(dir1,".\\India-New Zealand-2007-09-16.csv")
ind_nz=pd.read_csv(path)
scorecard,extras=yka.teamBattingScorecardMatch(ind_nz,"India")
print(scorecard)
## batsman runs balls 4s 6s SR
## 0 G Gambhir 51 34 5 2 150.000000
## 1 V Sehwag 40 18 6 2 222.222222
## 2 RV Uthappa 0 2 0 0 0.000000
## 3 MS Dhoni 24 20 2 0 120.000000
## 4 Yuvraj Singh 5 7 0 0 71.428571
## 5 KD Karthik 17 12 3 0 141.666667
## 6 IK Pathan 11 10 2 0 110.000000
## 7 AB Agarkar 1 2 0 0 50.000000
## 8 Harbhajan Singh 7 6 1 0 116.666667
## 9 S Sreesanth 19 10 4 0 190.000000
## 10 RP Singh 1 1 0 0 100.000000
print(extras)
## total wides noballs legbyes byes penalty extras
## 0 370 6 0 8 0 0 14
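The SR column in the scorecard above is consistent with the usual definition of strike rate, runs per 100 balls faced. The sketch below recomputes it for a few rows; returning 0.0 when no ball was faced is my assumption, not something confirmed from the yorkpy source.

```python
def strike_rate(runs, balls):
    """Runs scored per 100 balls faced; 0.0 when no ball was faced (assumption)."""
    if balls == 0:
        return 0.0
    return runs / balls * 100


# (batsman, runs, balls) copied from the scorecard above
rows = [("G Gambhir", 51, 34), ("V Sehwag", 40, 18), ("RV Uthappa", 0, 2)]
for batsman, runs, balls in rows:
    print(f"{batsman:12s} {strike_rate(runs, balls):.6f}")
```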
2.2 Intl. T20 -Team batsmen partnership (Class 1)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches"
path=os.path.join(dir1,".\\South Africa-Australia-2009-03-27.csv")
sa_aus=pd.read_csv(path)
yka.teamBatsmenPartnershipMatch(sa_aus,'Australia','South Africa',plot=True)
2.3 Intl. T20 -Team bowling scorecard match (Class 1)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches"
path=os.path.join(dir1,".\\Sri Lanka-West Indies-2012-09-28.csv")
sl_wi=pd.read_csv(path)
a=yka.teamBowlingScorecardMatch(sl_wi,'Sri Lanka')
print(a)
## bowler overs runs maidens wicket econrate
## 0 A Mohammed 2 13 0 0 6.5
## 1 SA Campbelle 1 8 0 1 8.0
## 2 SC Selman 1 3 0 0 3.0
## 3 SF Daley 2 5 0 1 2.5
## 4 SR Taylor 2 4 0 1 2.0
## 5 TD Smartt 2 17 0 0 8.5
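The econrate column above matches runs conceded divided by overs bowled. A minimal sketch that recomputes it from the printed rows follows; the zero-overs guard is an assumption added for safety.

```python
def economy_rate(runs_conceded, overs):
    """Runs conceded per over; 0.0 when no over was bowled (assumption)."""
    if overs == 0:
        return 0.0
    return runs_conceded / overs


# (bowler, overs, runs) copied from the bowling scorecard above
rows = [("A Mohammed", 2, 13), ("SF Daley", 2, 5), ("TD Smartt", 2, 17)]
for bowler, overs, runs in rows:
    print(f"{bowler:12s} {economy_rate(runs, overs):.1f}")
```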
2.4 Intl. T20 -Match Worm chart (Class 1)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches"
path=os.path.join(dir1,".\\England-India-2012-09-29.csv")
eng_ind=pd.read_csv(path)
yka.matchWormChart(eng_ind,"England", "India")
path=os.path.join(dir1,".\\Bangladesh-Ireland-2015-12-05.csv")
ban_ire=pd.read_csv(path)
yka.matchWormChart(ban_ire,"Bangladesh", "Ireland")
2.5 Intl. T20 -Team Batting partnerships all matches 2 teams (Class 2)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-allMatchesBetween2Teams"
path=os.path.join(dir1,"India-England-allMatches.csv")
ind_eng_matches = pd.read_csv(path)
theTeam='India'
m=yka.teamBatsmenPartnershiOppnAllMatches(ind_eng_matches,theTeam,report="detailed", top=4)
print(m)
## batsman totalPartnershipRuns non_striker partnershipRuns
## 0 SK Raina 265 G Gambhir 2
## 1 SK Raina 265 KL Rahul 40
## 2 SK Raina 265 MK Tiwary 24
## 3 SK Raina 265 MS Dhoni 124
## 4 SK Raina 265 P Kumar 0
## 5 SK Raina 265 PP Chawla 4
## 6 SK Raina 265 R Ashwin 1
## 7 SK Raina 265 RG Sharma 16
## 8 SK Raina 265 V Kohli 47
## 9 SK Raina 265 Yuvraj Singh 7
## 10 MS Dhoni 264 A Mishra 1
## 11 MS Dhoni 264 AT Rayudu 18
## 12 MS Dhoni 264 HH Pandya 8
## 13 MS Dhoni 264 IK Pathan 2
## 14 MS Dhoni 264 JJ Bumrah 2
## 15 MS Dhoni 264 MK Pandey 3
## 16 MS Dhoni 264 Parvez Rasool 21
## 17 MS Dhoni 264 R Ashwin 11
## 18 MS Dhoni 264 RA Jadeja 11
## 19 MS Dhoni 264 RG Sharma 9
## 20 MS Dhoni 264 RR Pant 6
## 21 MS Dhoni 264 RV Uthappa 5
## 22 MS Dhoni 264 SK Raina 98
## 23 MS Dhoni 264 YK Pathan 36
## 24 MS Dhoni 264 Yuvraj Singh 33
## 25 V Kohli 236 AM Rahane 3
## 26 V Kohli 236 G Gambhir 78
## 27 V Kohli 236 KL Rahul 46
## 28 V Kohli 236 RG Sharma 2
## 29 V Kohli 236 RV Uthappa 4
## 30 V Kohli 236 S Dhawan 45
## 31 V Kohli 236 SK Raina 48
## 32 V Kohli 236 Yuvraj Singh 10
## 33 M Raj 176 A Sharma 2
## 34 M Raj 176 H Kaur 18
## 35 M Raj 176 J Goswami 6
## 36 M Raj 176 KV Jain 5
## 37 M Raj 176 L Kumari 5
## 38 M Raj 176 N Niranjana 3
## 39 M Raj 176 N Tanwar 17
## 40 M Raj 176 PG Raut 41
## 41 M Raj 176 R Malhotra 5
## 42 M Raj 176 S Mandhana 8
## 43 M Raj 176 S Naik 10
## 44 M Raj 176 S Pandey 19
## 45 M Raj 176 SK Naidu 37
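In the detailed report above, totalPartnershipRuns is the sum of the partnershipRuns rows for each batsman. The sketch below checks this for two batsmen using rows copied from the output; it uses only the standard library and does not call yorkpy.

```python
from collections import defaultdict

# (batsman, non_striker, partnershipRuns) rows copied from the output above
rows = [
    ("SK Raina", "G Gambhir", 2), ("SK Raina", "KL Rahul", 40),
    ("SK Raina", "MK Tiwary", 24), ("SK Raina", "MS Dhoni", 124),
    ("SK Raina", "P Kumar", 0), ("SK Raina", "PP Chawla", 4),
    ("SK Raina", "R Ashwin", 1), ("SK Raina", "RG Sharma", 16),
    ("SK Raina", "V Kohli", 47), ("SK Raina", "Yuvraj Singh", 7),
    ("V Kohli", "AM Rahane", 3), ("V Kohli", "G Gambhir", 78),
    ("V Kohli", "KL Rahul", 46), ("V Kohli", "RG Sharma", 2),
    ("V Kohli", "RV Uthappa", 4), ("V Kohli", "S Dhawan", 45),
    ("V Kohli", "SK Raina", 48), ("V Kohli", "Yuvraj Singh", 10),
]

# Add up the partnership runs per batsman.
totals = defaultdict(int)
for batsman, _non_striker, runs in rows:
    totals[batsman] += runs

print(dict(totals))
```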
2.6 Intl. T20 -Team Batsmen vs Bowlers all matches 2 teams (Class 2)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-allMatchesBetween2Teams"
path=os.path.join(dir1,"Ireland-Netherlands-allMatches.csv")
ire_nl_matches = pd.read_csv(path)
yka.teamBatsmenVsBowlersOppnAllMatches(ire_nl_matches,'Ireland',"Netherlands",plot=True,top=3,runsScored=10)
2.7 Intl. T20 -Team Bowling scorecard all matches 2 teams (Class 2)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-allMatchesBetween2Teams"
path=os.path.join(dir1,"Bangladesh-Nepal-allMatches.csv")
bang_nep_matches = pd.read_csv(path)
scorecard=yka.teamBowlingScorecardOppnAllMatches(bang_nep_matches,'Bangladesh',"Nepal")
print(scorecard)
## bowler overs runs maidens wicket econrate
## 0 B Regmi 3 14 0 1 4.666667
## 3 SP Gauchan 4 40 0 1 10.000000
## 1 JK Mukhiya 2 16 0 0 8.000000
## 2 P Khadka 3 23 0 0 7.666667
## 4 Sagar Pun 1 16 0 0 16.000000
## 5 Sompal Kami 2 21 0 0 10.500000
2.8 Intl. T20 -Team Batsmen vs Bowlers all Oppositions (Class 3)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-allMatchesAllOpposition\\"
path=os.path.join(dir1,"Australia-allMatchesAllOpposition.csv")
aus_matches = pd.read_csv(path)
yka.teamBatsmenVsBowlersAllOppnAllMatches(aus_matches,"Australia",plot=True,top=3,runsScored=40)
2.9 Intl. T20 -Wins vs Losses of a team against all other teams (Class 3)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-allMatchesAllOpposition\\"
path=os.path.join(dir1,"South Africa-allMatchesAllOpposition.csv")
sa_matches = pd.read_csv(path)
team1='South Africa'
yka.plotWinLossByTeamAllOpposition(sa_matches,team1,plot="detailed")
2.10 Intl. T20 -Batsmen analysis (Class 4)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-BattingBowlingDetails\\"
# Rohit Sharma
name="RG Sharma"
team='India'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeAverageRuns(df,name)
# MJ Guptill
name="MJ Guptill"
team='New Zealand'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)
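A cumulative average is the running mean of a player's scores after each innings. The sketch below shows the idea on made-up numbers. It is an illustration of the concept only: how batsmanCumulativeAverageRuns computes its curve internally is an assumption on my part.

```python
from itertools import accumulate


def cumulative_average(values):
    """Running mean: the average of the first n values, for n = 1..len(values)."""
    return [total / n for n, total in enumerate(accumulate(values), start=1)]


runs = [10, 30, 20, 60]  # hypothetical innings scores, not real data
print(cumulative_average(runs))
```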
2.11 Intl. T20 -Bowler analysis (Class 4)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-BattingBowlingDetails\\"
# Shakib Al Hasan
name="Shakib Al Hasan"
team='Bangladesh'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanEconomyRate(df,name)
# SL Malinga
name="SL Malinga"
team='Sri Lanka'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsAgainstOpposition(df,name)
3. Big Bash League
The following functions were added to handle BBL teams
- saveAllMatchesBetween2BBLTeams()
- saveAllMatchesAllOppositionBBLT20()
The BBL teams included are Adelaide Strikers, Brisbane Heat, Hobart Hurricanes, Melbourne Renegades, Melbourne Stars, Perth Scorchers, Sydney Sixers, Sydney Thunder
To use the yorkpy functions, the YAML files first have to be converted into pandas dataframes and then saved as CSV, as shown below
import os
import yorkpy.analytics as yka
os.chdir('C:\\software\\cricket-package\\yorkpyBBL\\bbl')
#1. Convert all YAML files to dataframes and save as CSV
#yka.convertAllYaml2PandasDataframesT20(".", "..\\BBLT20-Matches")
#2. Save all matches between 2 BBL teams
dir1='C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches'
#yka.saveAllMatchesBetween2BBLTeams(dir1)
#3. Save T20 matches between a BBL team and all other teams
dir1='C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches'
#yka.saveAllMatchesAllOppositionBBLT20(dir1)
#4. Get the batting details
dir1='C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches'
#yka.getTeamBattingDetails("Adelaide Strikers",dir=dir1, save=True)
#yka.getTeamBattingDetails("Brisbane Heat",dir=dir1,save=True)
#yka.getTeamBattingDetails("Hobart Hurricanes",dir=dir1,save=True)
#...
#5. Get the bowling details
dir1='C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches'
#yka.getTeamBowlingDetails("Adelaide Strikers",dir=dir1, save=True)
#yka.getTeamBowlingDetails("Brisbane Heat",dir=dir1,save=True)
#yka.getTeamBowlingDetails("Hobart Hurricanes",dir=dir1,save=True)
#...
The functions below perform analysis on the generated files from above. The YAML files have already been converted and are available at Github at BBL
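The single-match files used below follow the pattern `<team1>-<team2>-YYYY-MM-DD.csv`. As a small sketch, assuming team names never contain a hyphen, a file name can be split back into its parts like this. The helper parse_match_filename is hypothetical and not part of yorkpy.

```python
def parse_match_filename(filename):
    """Split '<team1>-<team2>-YYYY-MM-DD.csv' into (team1, team2, date)."""
    stem = filename[:-len(".csv")] if filename.endswith(".csv") else filename
    # The date occupies the last three hyphen-separated fields.
    head, year, month, day = stem.rsplit("-", 3)
    # Assumption: team names contain spaces but no hyphens.
    team1, team2 = head.split("-", 1)
    return team1, team2, f"{year}-{month}-{day}"


print(parse_match_filename("Adelaide Strikers-Brisbane Heat-2012-12-13.csv"))
```

This makes it easy to filter a directory listing by team or season before loading any CSV.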
3.1 Big Bash League – Team score card (Class 1)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches"
path=os.path.join(dir1,".\\Adelaide Strikers-Brisbane Heat-2012-12-13.csv")
as_bh=pd.read_csv(path)
scorecard,extras=yka.teamBattingScorecardMatch(as_bh,"Brisbane Heat")
print(scorecard)
## batsman runs balls 4s 6s SR
## 0 LA Pomersbach 65 42 8 2 154.761905
## 1 JR Hopes 1 2 0 0 50.000000
## 2 JA Burns 37 31 2 2 119.354839
## 3 DT Christian 12 15 0 0 80.000000
## 4 NLTC Perera 12 4 0 2 300.000000
## 5 CA Lynn 19 18 1 1 105.555556
## 6 BCJ Cutting 13 5 0 2 260.000000
## 7 PJ Forrest 12 8 0 1 150.000000
## 8 CD Hartley 5 2 1 0 250.000000
print(extras)
## total wides noballs legbyes byes penalty extras
## 0 371 10 2 5 0 0 17
3.2 Big Bash League -Team batsmen vs Bowlers (Class 1)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches"
path=os.path.join(dir1,".\\Hobart Hurricanes-Melbourne Renegades-2012-01-18.csv")
hh_mr=pd.read_csv(path)
yka.teamBatsmenVsBowlersMatch(hh_mr,'Hobart Hurricanes','Melbourne Renegades',plot=True)
3.3 Big Bash League -Team bowling scorecard match (Class 1)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches"
path=os.path.join(dir1,".\\Melbourne Stars-Sydney Thunder-2016-01-24.csv")
ms_st=pd.read_csv(path)
a=yka.teamBowlingScorecardMatch(ms_st,'Sydney Thunder')
print(a)
## bowler overs runs maidens wicket econrate
## 0 A Zampa 4 32 0 2 8.000000
## 1 BW Hilfenhaus 2 21 0 0 10.500000
## 2 DJ Hussey 1 9 0 1 9.000000
## 3 DJ Worrall 3 42 0 0 14.000000
## 4 EP Gulbis 2 19 0 0 9.500000
## 5 MA Beer 3 25 0 1 8.333333
## 6 MP Stoinis 4 30 0 3 7.500000
3.4 Big Bash League – Match Worm chart (Class 1)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches"
path=os.path.join(dir1,".\\Sydney Sixers-Melbourne Stars-2011-12-27.csv")
ss_ms=pd.read_csv(path)
yka.matchWormChart(ss_ms,"Melbourne Stars", "Sydney Sixers")
path=os.path.join(dir1,".\\Hobart Hurricanes-Brisbane Heat-2015-01-02.csv")
hh_bh=pd.read_csv(path)
yka.matchWormChart(hh_bh,"Hobart Hurricanes", "Brisbane Heat")
3.5 Big Bash League -Team Batting partnerships all matches 2 teams (Class 2)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-allMatchesBetween2Teams"
path=os.path.join(dir1,"Brisbane Heat-Adelaide Strikers-allMatches.csv")
bh_as_matches = pd.read_csv(path)
yka.teamBatsmenPartnershipOppnAllMatchesChart(bh_as_matches,"Brisbane Heat","Adelaide Strikers",plot=True, top=4, partnershipRuns=20)
3.6 Big Bash League -Team Bowling wicket kind all matches 2 teams (Class 2)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-allMatchesBetween2Teams"
path=os.path.join(dir1,"Sydney Sixers-Perth Scorchers-allMatches.csv")
ss_ps_matches = pd.read_csv(path)
yka.teamBowlingWicketKindOppositionAllMatches(ss_ps_matches,'Perth Scorchers','Sydney Sixers',plot=True,top=5,wickets=1)
3.7 Big Bash League -Team Bowling scorecard all teams (Class 3)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-allMatchesAllOpposition"
path=os.path.join(dir1,"Hobart Hurricanes-allMatchesAllOpposition.csv")
hh_matches = pd.read_csv(path)
scorecard=yka.teamBowlingScorecardAllOppnAllMatches(hh_matches,"Hobart Hurricanes")
print(scorecard)
## bowler overs runs maidens wicket econrate
## 16 B Lee 20 132 0 9 6.600000
## 30 CJ McKay 13 110 0 9 8.461538
## 88 NJ Rimmington 16 103 1 9 6.437500
## 67 JW Hastings 15 88 0 8 5.866667
## 63 JP Faulkner 15 146 0 7 9.733333
## 27 CJ Gannon 17 147 1 7 8.647059
## 93 NM Lyon 8 51 0 7 6.375000
## 20 BCJ Cutting 27 226 0 7 8.370370
## 48 GB Hogg 22 167 0 7 7.590909
## 107 SM Boland 12 96 0 7 8.000000
## 15 B Laughlin 13 99 0 7 7.615385
## 87 MT Steketee 15 134 0 5 8.933333
## 121 Yasir Arafat 9 48 0 4 5.333333
## 96 PJ Cummins 8 83 0 4 10.375000
## 46 Fawad Ahmed 11 64 0 4 5.818182
## 76 MA Beer 12 63 0 4 5.250000
## 108 SNJ O'Keefe 15 104 0 4 6.933333
## 75 M Muralitharan 7 31 0 4 4.428571
## 10 AJ Tye 16 127 0 4 7.937500
## 52 J Botha 13 94 0 4 7.230769
## 56 JL Pattinson 7 71 0 4 10.142857
## 62 JP Behrendorff 16 119 0 4 7.437500
## 3 AC Agar 12 87 0 4 7.250000
## 24 BM Edmondson 4 40 0 4 10.000000
## 37 DJ Hussey 8 47 0 3 5.875000
## 49 GJ Maxwell 8 65 0 3 8.125000
## 84 MN Samuels 4 22 0 3 5.500000
## 81 MG Neser 5 54 0 3 10.800000
## 44 DT Christian 9 114 0 3 12.666667
## 50 GS Sandhu 7 51 0 3 7.285714
## .. ... ... ... ... ... ...
## 43 DP Nannes 8 58 0 1 7.250000
## 51 IA Moran 4 25 0 1 6.250000
## 55 JK Lalor 10 82 0 1 8.200000
## 54 JH Kallis 3 18 0 1 6.000000
## 73 LR Butterworth 4 25 0 1 6.250000
## 4 AC McDermott 2 28 0 1 14.000000
## 70 LA Doran 4 38 0 1 9.500000
## 69 KW Richardson 6 44 0 1 7.333333
## 119 WD Sheridan 2 6 0 0 3.000000
## 2 AB McDonald 1 15 0 0 15.000000
## 115 TD Andrews 3 23 0 0 7.666667
## 11 AK Heal 4 33 0 0 8.250000
## 7 AD Russell 4 40 0 0 10.000000
## 8 AJ Finch 2 15 0 0 7.500000
## 9 AJ Turner 3 28 0 0 9.333333
## 60 JM Mennie 1 20 0 0 20.000000
## 18 BA Stokes 1 9 0 0 9.000000
## 26 CH Gayle 1 16 0 0 16.000000
## 28 CJ Green 4 44 0 0 11.000000
## 95 PD Collingwood 2 20 0 0 10.000000
## 31 CJ Simmons 4 21 0 0 5.250000
## 59 JM Holland 3 34 0 0 11.333333
## 36 DJ Bravo 6 64 0 0 10.666667
## 38 DJ Pattinson 2 16 0 0 8.000000
## 41 DJ Worrall 8 90 0 0 11.250000
## 72 LN O'Connor 6 56 0 0 9.333333
## 71 LJ Wright 3 27 0 0 9.000000
## 68 KA Pollard 1 7 0 0 7.000000
## 58 JM Herrick 4 23 0 0 5.750000
## 92 NM Hauritz 5 42 0 0 8.400000
##
## [122 rows x 6 columns]
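The scorecard above is ordered by wickets taken, highest first. The sketch below reproduces that ordering on four rows copied from the output, using plain Python rather than yorkpy or pandas. Python's sort is stable, so bowlers with equal wickets keep their original relative order.

```python
# Rows copied from the scorecard above, deliberately shuffled
rows = [
    {"bowler": "GB Hogg", "overs": 22, "runs": 167, "wicket": 7},
    {"bowler": "B Lee", "overs": 20, "runs": 132, "wicket": 9},
    {"bowler": "AJ Finch", "overs": 2, "runs": 15, "wicket": 0},
    {"bowler": "JW Hastings", "overs": 15, "runs": 88, "wicket": 8},
]

# Sort by wickets, highest first.
ranked = sorted(rows, key=lambda r: r["wicket"], reverse=True)
top = [r["bowler"] for r in ranked]
print(top)
```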
3.8 Big Bash League -Plot wins vs losses against all teams(Class 3)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-allMatchesAllOpposition"
path=os.path.join(dir1,"Sydney Sixers-allMatchesAllOpposition.csv")
ss_matches = pd.read_csv(path)
yka.plotWinLossByTeamAllOpposition(ss_matches,'Sydney Sixers')
3.9 Big Bash League -Wins by runs or wickets against all teams (Class 3)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-allMatchesAllOpposition"
path=os.path.join(dir1,"Adelaide Strikers-allMatchesAllOpposition.csv")
as_matches = pd.read_csv(path)
yka.plotWinsByRunOrWicketsAllOpposition(as_matches,'Adelaide Strikers')
3.10 Big Bash League -Batsmen Analysis (Class 4)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-BattingBowlingDetails"
# CA Lynn
name="CA Lynn"
team='Brisbane Heat'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsStrikeRate(df,name)
# UT Khawaja
name="UT Khawaja"
team='Sydney Thunder'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)
3.11 Big Bash League – Bowler analysis (Class 4)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-BattingBowlingDetails"
# CJ McKay
name="CJ McKay"
team='Sydney Thunder'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgWickets(df,name)
# AU Rashid
name="AU Rashid"
team='Adelaide Strikers'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgEconRate(df,name)
4. Natwest T20 Blast
The following functions were added to handle Natwest T20 teams
- saveAllMatchesBetween2NWBTeams()
- saveAllMatchesAllOppositionNWBT20()
The Natwest teams are
Derbyshire, Durham, Essex, Glamorgan, Gloucestershire, Hampshire, Kent, Lancashire, Leicestershire, Middlesex, Northamptonshire, Nottinghamshire, Somerset, Surrey, Sussex, Warwickshire, Worcestershire, Yorkshire
In order to perform analysis with yorkpy, the YAML data has to be converted to pandas dataframes and saved as CSV as shown
#import os
#import yorkpy.analytics as yka
#os.chdir('C:\\software\\cricket-package\\yorkpyNWB\\nwb')
#1. Convert YAML to dataframes and save as CSV
#yka.convertAllYaml2PandasDataframesT20(".", "..\\NWBT20-Matches")
#2. Save all matches between 2 NWBT20 teams
#dir1='C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches'
#yka.saveAllMatchesBetween2NWBTeams(dir1)
#3. Save all matches between a NWB T20 team and all other teams
#dir1='C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches'
#yka.saveAllMatchesAllOppositionNWBT20(dir1)
#4. Compute the batting details
dir1='C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches'
#yka.getTeamBattingDetails("Derbyshire",dir=dir1, save=True)
#yka.getTeamBattingDetails("Durham",dir=dir1,save=True)
#yka.getTeamBattingDetails("Essex",dir=dir1,save=True)
#...
#5. Compute bowling details
dir1='C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches'
#yka.getTeamBowlingDetails("Derbyshire",dir=dir1, save=True)
#yka.getTeamBowlingDetails("Durham",dir=dir1,save=True)
#yka.getTeamBowlingDetails("Essex",dir=dir1,save=True)
#...
Once the data is converted all yorkpy functions can be used. This has already been done and is available at github NWB
4.1 Natwest T20 Blast – Team score card (Class 1)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches"
path=os.path.join(dir1,".\\Durham-Yorkshire-2016-08-20.csv")
d_y=pd.read_csv(path)
scorecard,extras=yka.teamBattingScorecardMatch(d_y,"Durham")
print(scorecard)
## batsman runs balls 4s 6s SR
## 0 MD Stoneman 25 20 4 0 125.000000
## 1 KK Jennings 11 13 1 0 84.615385
## 2 BA Stokes 56 37 4 3 151.351351
## 3 MJ Richardson 29 23 4 1 126.086957
## 4 JTA Burnham 17 15 1 1 113.333333
## 5 RD Pringle 10 9 1 0 111.111111
## 6 PD Collingwood 2 3 0 0 66.666667
## 7 U Arshad 1 1 0 0 100.000000
print(extras)
## total wides noballs legbyes byes penalty extras
## 0 305 2 0 5 0 0 7
4.2 Natwest T20 Blast -Team batsmen vs Bowlers (Class 1)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches"
path=os.path.join(dir1,".\\Derbyshire-Lancashire-2016-07-13.csv")
d_l=pd.read_csv(path)
yka.teamBatsmenVsBowlersMatch(d_l,'Lancashire','Derbyshire',plot=True)
4.3 Natwest T20 Blast -Team bowling scorecard match (Class 1)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches"
path=os.path.join(dir1,".\\Essex-Surrey-2016-05-20.csv")
e_s=pd.read_csv(path)
a=yka.teamBowlingScorecardMatch(e_s,'Essex')
print(a)
## bowler overs runs maidens wicket econrate
## 0 Azhar Mahmood 3 38 0 4 12.666667
## 1 GJ Batty 4 33 0 1 8.250000
## 2 JE Burke 1 18 0 0 18.000000
## 3 MW Pillans 3 28 0 0 9.333333
## 4 SM Curran 4 23 0 2 5.750000
## 5 TK Curran 4 21 0 3 5.250000
4.4 Natwest T20 Blast -Match Worm chart (Class 1)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches"
path=os.path.join(dir1,".\\Gloucestershire-Glamorgan-2016-06-10.csv")
glo_gla=pd.read_csv(path)
yka.matchWormChart(glo_gla,"Gloucestershire", "Glamorgan")
path=os.path.join(dir1,".\\Leicestershire-Northamptonshire-2016-05-20.csv")
lei_nor=pd.read_csv(path)
yka.matchWormChart(lei_nor,"Northamptonshire", "Leicestershire")
4.5 Natwest T20 Blast -Team Batting partnerships all matches 2 teams (Class 2)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-allMatchesBetween2Teams"
path=os.path.join(dir1,"Hampshire-Sussex-allMatches.csv")
h_s_matches = pd.read_csv(path)
yka.teamBatsmenPartnershipOppnAllMatchesChart(h_s_matches,"Hampshire","Sussex",plot=True, top=4, partnershipRuns=10)
4.6 Natwest T20 Blast -Team Bowlers vs Batsmen all matches 2 teams (Class 2)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-allMatchesBetween2Teams"
path=os.path.join(dir1,"Kent-Somerset-allMatches.csv")
k_s_matches = pd.read_csv(path)
yka.teamBowlersVsBatsmenOppnAllMatches(k_s_matches,'Kent','Somerset',plot=True,
top=5,runsConceded=10)
4.7 Natwest T20 Blast -Team Bowling scorecard all teams (Class 3)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-allMatchesAllOpposition"
path=os.path.join(dir1,"Middlesex-allMatchesAllOpposition.csv")
m_matches = pd.read_csv(path)
scorecard=yka.teamBowlingScorecardAllOppnAllMatches(m_matches,"Middlesex")
print(scorecard)
## bowler overs runs maidens wicket econrate
## 1 AJ Tye 8 75 0 6 9.375000
## 5 BAC Howell 8 41 0 5 5.125000
## 26 GR Napier 7 65 0 5 9.285714
## 15 DI Stevens 4 31 0 4 7.750000
## 19 DW Lawrence 6 37 0 4 6.166667
## 32 JW Dernbach 4 33 0 3 8.250000
## 7 BTJ Wheal 4 43 0 3 10.750000
## 18 DR Briggs 4 24 0 3 6.000000
## 50 RK Kleinveldt 4 24 0 3 6.000000
## 46 R McLaren 7 59 0 3 8.428571
## 47 R Rampaul 3 21 0 3 7.000000
## 34 L Gregory 6 51 0 2 8.500000
## 33 KMDN Kulasekara 2 24 0 2 12.000000
## 40 MG Hogan 3 17 0 2 5.666667
## 43 MTC Waller 4 31 0 2 7.750000
## 49 RJ Gleeson 4 20 0 2 5.000000
## 48 RE van der Merwe 5 24 0 2 4.800000
## 51 RN ten Doeschate 4 32 0 2 8.000000
## 53 S Prasanna 4 20 0 2 5.000000
## 56 SW Tait 3 17 0 2 5.666667
## 57 Shahid Afridi 8 55 0 2 6.875000
## 59 T van der Gugten 3 13 1 2 4.333333
## 64 TS Mills 3 34 0 2 11.333333
## 65 WAT Beer 4 23 0 2 5.750000
## 31 JH Davey 4 28 0 2 7.000000
## 68 ZS Ansari 3 16 0 2 5.333333
## 25 GM Andrew 3 19 0 2 6.333333
## 23 GJ Batty 6 55 0 2 9.166667
## 16 DJ Bravo 3 27 0 2 9.000000
## 41 MR Quinn 6 65 0 1 10.833333
## .. ... ... ... ... ... ...
## 24 GL van Buuren 7 49 0 1 7.000000
## 37 MD Hunn 3 35 0 1 11.666667
## 36 LC Norwell 6 62 0 1 10.333333
## 29 JC Tredwell 4 35 0 1 8.750000
## 35 LA Dawson 6 53 0 1 8.833333
## 62 TL Best 4 51 0 0 12.750000
## 58 T Westley 2 12 0 0 6.000000
## 4 Azharullah 3 24 0 0 8.000000
## 60 TD Groenewald 1 21 0 0 21.000000
## 61 TK Curran 4 35 0 0 8.750000
## 38 MD Taylor 3 30 0 0 10.000000
## 30 JG Myburgh 1 5 0 0 5.000000
## 8 C Overton 2 18 0 0 9.000000
## 2 Ashar Zaidi 1 5 0 0 5.000000
## 66 WR Smith 2 25 0 0 12.500000
## 28 J Overton 2 24 0 0 12.000000
## 6 BJ Taylor 1 6 0 0 6.000000
## 22 GG White 4 31 0 0 7.750000
## 55 SP Crook 1 9 0 0 9.000000
## 39 ME Claydon 4 40 0 0 10.000000
## 52 RS Bopara 4 32 0 0 8.000000
## 10 CD Nash 2 19 0 0 9.500000
## 11 CH Morris 4 36 0 0 9.000000
## 12 DA Cosker 3 32 0 0 10.666667
## 13 DA Griffiths 4 39 0 0 9.750000
## 45 PD Trego 1 11 0 0 11.000000
## 44 PA van Meekeren 2 19 0 0 9.500000
## 42 MS Crane 2 25 0 0 12.500000
## 20 FK Cowdrey 1 19 0 0 19.000000
## 14 DD Masters 2 16 0 0 8.000000
##
## [69 rows x 6 columns]
4.8 Natwest T20 Blast -Plot wins vs losses against all teams(Class 3)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-allMatchesAllOpposition"
path=os.path.join(dir1,"Warwickshire-allMatchesAllOpposition.csv")
w_matches = pd.read_csv(path)
yka.plotWinLossByTeamAllOpposition(w_matches,'Warwickshire')
4.9 Natwest T20 Blast -Batsmen Analysis (Class 4)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-BattingBowlingDetails"
# M Klinger
name="M Klinger"
team='Gloucestershire'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)
# CA Ingram
name="CA Ingram"
team='Glamorgan'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)
4.10 Natwest T20 Blast -Bowler analysis (Class 4)
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-BattingBowlingDetails"
# BAC Howell
name="BAC Howell"
team='Gloucestershire'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgEconRate(df,name)
# GR Napier
name="GR Napier"
team='Essex'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsVenue(df,name)
Note: yorkpy will work for all T20 leagues which are in YAML format as specified in Cricsheet.
You can clone/fork the latest code for yorkpy from github yorkpy
The data for IPL, Intl. T20, BBL and Natwest T20 have already been converted into pandas dataframes and saved as CSVs. You can download the converted files from Github at allYorkpyT20Data (https://github.com/tvganesh/allYorkpyT20Data)
Conclusion
This post shows the kind of detailed analysis that can be performed with yorkpy. In fact, with all the converted data it should also be possible to train a Machine Learning model, which I will probably keep for another day. You could go ahead and use the data in other innovative ways. Do keep me posted if you do!!
Important note: Do check out my other posts using yorkpy at yorkpy-posts
Have fun with yorkpy!!
Pitching yorkpy … in the block hole – Part 4
A good programmer is someone who always looks both ways before crossing a one-way street. Doug Linder
There are two ways to write error-free programs; only the third one works. Alan J. Perlis
In order to understand recursion, one must first understand recursion. Anonymous
This is the fourth and final part of the series on my Python package yorkpy. In this part yorkpy, the Python avatar of my R package yorkr (see Introducing cricket package yorkr: Part 1- Beaten by sheer pace!), develops wings and is prepared for take-off. The yorkpy package uses data from Cricsheet.
You can clone/download the code at Github yorkpy
This post has been published to RPubs at yorkpy-Part4
You can download this post as PDF at IPLT20-yorkpy-part4
You can download all the data used in this post and the previous post at yorkpyData
This post is a continuation of the earlier posts on yorkpy
1. Pitching yorkpy . short of good length to IPL – Part 1 In this part I included functions that convert the YAML data of IPL matches into pandas dataframes, which are then saved as CSV. This part can perform analysis of individual IPL matches. Note: The converted data is available at yorkpyData
2. Pitching yorkpy.on the middle and outside off-stump to IPL – Part 2 This part included functions to create a large data frame for head-to-head confrontations between any 2 IPL teams, say CSK-MI, DD-KKR etc., which can be saved as CSV. Analysis is then performed on these team-2-team confrontations. Note: The converted data is available at yorkpyData
3. Pitching yorkpy.swinging away from the leg stump to IPL – Part 3 The 3rd part includes the performance of any IPL team against all other IPL teams. The data can also be saved as CSV. Note: The converted data is available at yorkpyData
Note: If you would like to do a similar analysis for a different set of batsman and bowlers, you can clone/download my skeleton yorkpy-template from Github (which is the R Markdown file I have used for the analysis below).
This 4th and final part includes analysis of batting and bowling performances of any IPL player. The batting and bowling details for all teams have already been converted and are available at IPLT20-Batting-BowlingDetails
This part includes the following new functions
Batsman functions
- batsmanRunsVsDeliveries
- batsmanFoursSixes
- batsmanDismissals
- batsmanRunsVsStrikeRate
- batsmanMovingAverage
- batsmanCumulativeAverageRuns
- batsmanCumulativeStrikeRate
- batsmanRunsAgainstOpposition
- batsmanRunsVenue
Bowler functions
- bowlerMeanEconomyRate
- bowlerMeanRunsConceded
- bowlerMovingAverage
- bowlerCumulativeAvgWickets
- bowlerCumulativeAvgEconRate
- bowlerWicketPlot
- bowlerWicketsAgainstOpposition
- bowlerWicketsVenue
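Several of the functions listed above smooth a per-innings series, for example the moving average. As a rough illustration of the idea, here is a simple moving average over made-up scores. The window size and the plain arithmetic mean are assumptions; batsmanMovingAverage and bowlerMovingAverage may use a different window or smoothing method.

```python
def moving_average(values, window=3):
    """Simple moving average: mean of each run of `window` consecutive values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


runs = [12, 45, 3, 60, 30]  # hypothetical innings scores, not real data
print(moving_average(runs, window=3))
```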
A. Batsman functions
1. Get IPL Team Batting details
The function below gets the overall IPL team batting details based on the CSV files that were saved for IPL T20 matches. These files are also available on Github at yorkpyData. The batting details of the IPL team in each match are computed and combined into one large data frame. This can be saved as a CSV file named, for example, Delhi Daredevils-BattingDetails.csv.
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
#csk_details = yka.getTeamBattingDetails("Chennai Super Kings",dir=dir1, save=True)
#dd_details = yka.getTeamBattingDetails("Delhi Daredevils",dir=dir1,save=True)
#kkr_details = yka.getTeamBattingDetails("Kolkata Knight Riders",dir=dir1,save=True)
2. Get IPL batsman details
This function is used to get the individual IPL T20 batting record for the specified batsman of a team, as in the snippet below.
For the batsmen functions below I have chosen Rishabh Pant, Kane Williamson and Ambati Rayudu for the analysis as they top the batting lists. You can choose any IPL batsmen for the analysis
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
rpant=yka.getBatsmanDetails(team,name,dir=dir1)
3. Batsman Runs vs Deliveries (in IPL matches)
This function plots the runs vs deliveries faced for a batsman
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsDeliveries(df,name)
# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsDeliveries(df,name)
#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsDeliveries(df,name)
4. Batsman fours and sixes (in IPL matches)
This plots the fours, sixes and the total runs for a batsman
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanFoursSixes(df,name)
# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanFoursSixes(df,name)
#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanFoursSixes(df,name)
5. Batsman dismissals (in IPL matches)
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanDismissals(df,name)
# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanDismissals(df,name)
#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanDismissals(df,name)
6. Batsman Runs vs Strike Rate (in IPL matches)
The plots below give the Runs vs Strike rate for batsmen
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsStrikeRate(df,name)
# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsStrikeRate(df,name)
#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsStrikeRate(df,name)
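For readers new to the metric, strike rate is conventionally the runs scored per 100 balls faced. The snippet below is a minimal sketch of that arithmetic and is independent of yorkpy; the helper name strike_rate and the innings figures are made up for illustration.

```python
def strike_rate(runs, balls):
    """Runs scored per 100 balls faced; 0.0 when no balls were faced."""
    if balls == 0:
        return 0.0
    return 100.0 * runs / balls


# Hypothetical innings as (runs, balls faced) pairs.
innings = [(45, 30), (12, 15), (78, 41)]
for runs, balls in innings:
    print(runs, balls, round(strike_rate(runs, balls), 2))
```

Plotting runs against this value per innings gives the kind of chart produced above.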
7. Batsman Moving average of runs (in IPL matches)
The plots below compute and plot the moving average of batsmen
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanMovingAverage(df,name)
# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanMovingAverage(df,name)
#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanMovingAverage(df,name)
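The window and smoothing used by yorkpy are not shown in this post, so the snippet below is only a sketch of the general idea, assuming a simple trailing window over the runs scored in successive innings. The function name moving_average and the run values are hypothetical.

```python
def moving_average(values, window=3):
    """Trailing moving average.

    Early entries, where fewer than `window` values exist,
    average whatever is available so far.
    """
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hypothetical runs in five successive innings.
runs = [10, 50, 30, 70, 0]
print(moving_average(runs, window=3))
```

A larger window smooths out single big scores; a smaller one follows recent form more closely.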
8. Batsman Cumulative average of runs (in IPL matches)
The functions below plot the cumulative average of the batsmen
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeAverageRuns(df,name)
# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeAverageRuns(df,name)
#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeAverageRuns(df,name)
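A cumulative average differs from a moving average in that every earlier innings stays in the calculation. The sketch below assumes the plain definition (running total divided by the number of innings so far); whether yorkpy uses exactly this definition is an assumption, and the name cumulative_average is hypothetical.

```python
from itertools import accumulate


def cumulative_average(values):
    """Running mean: total so far divided by the count so far."""
    totals = accumulate(values)
    return [total / n for n, total in enumerate(totals, start=1)]


# Hypothetical runs in four successive innings.
runs = [10, 50, 30, 70]
print(cumulative_average(runs))
```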
9. Batsman Cumulative Strike Rate (in IPL matches)
The functions below plot the cumulative strike rate of the batsmen
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)
# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)
#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)
10. Batsman performance against opposition (in IPL matches)
The plots below show how the batsmen performed against other IPL teams
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)
# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)
#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)
11. Batsman performance at different venues (in IPL matches)
The plots below show how the batsmen performed at different venues
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVenue(df,name)
# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVenue(df,name)
#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVenue(df,name)
B. Bowler functions
12. Get bowling details in IPL matches
The function below gets the overall IPL T20 bowling details of a team from the CSV files saved for IPL T20 matches. These files are also available on Github at yorkpyData. The bowling details of the team are computed for each match, and one large data frame is created by stacking the individual data frames. This can be saved as a CSV file, e.g. Chennai Super Kings-BowlingDetails.csv
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
#kkr_bowling = yka.getTeamBowlingDetails("Kolkata Knight Riders",dir=dir1,save=True)
#csk_bowling = yka.getTeamBowlingDetails("Chennai Super Kings",dir=dir1,save=True)
#kxip_bowling = yka.getTeamBowlingDetails("Kings XI Punjab",dir=dir1,save=True)
13. Get bowling details of the individual IPL bowlers
This function is used to get the individual bowling record for a specified bowler of the team, as in the calls below.
The plots below deal with bowlers' performance. For this analysis I have chosen Amit Mishra, Piyush Chawla and Bhuvneshwar Kumar. You can choose any other IPL bowler.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
#df=yka.getBowlerWicketDetails(team,name,dir=dir1)
14. Bowler Economy Rate (in IPL matches)
The plots below show the economy rate of the selected bowlers
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanEconomyRate(df,name)
# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanEconomyRate(df,name)
#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanEconomyRate(df,name)
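Economy rate is conventionally the runs conceded per over, with six legal deliveries to an over. The snippet below sketches that arithmetic only; it does not reproduce yorkpy's internals, and the helper name economy_rate and the spell figures are made up.

```python
def economy_rate(runs_conceded, legal_balls):
    """Runs conceded per six-ball over; 0.0 if nothing was bowled."""
    if legal_balls == 0:
        return 0.0
    overs = legal_balls / 6
    return runs_conceded / overs


# Hypothetical spells as (runs conceded, legal balls bowled).
spells = [(28, 24), (19, 18), (41, 24)]
for runs, balls in spells:
    print(runs, balls, round(economy_rate(runs, balls), 2))
```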
15. Bowler Mean Runs conceded (in IPL matches)
The plots below show the mean runs conceded by the selected bowlers
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanRunsConceded(df,name)
# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanRunsConceded(df,name)
#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanRunsConceded(df,name)
16. Moving average of wickets for bowler (in IPL matches)
The moving averages of the bowlers' wickets are plotted below
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMovingAverage(df,name)
# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMovingAverage(df,name)
#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMovingAverage(df,name)
17. Cumulative average wickets for bowler (in IPL matches)
The cumulative average wickets for each bowler is computed and plotted
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgWickets(df,name)
# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgWickets(df,name)
#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgWickets(df,name)
18. Cumulative average economy rate for bowler (in IPL matches)
The plots below give the cumulative average economy rate for each bowler
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgEconRate(df,name)
# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgEconRate(df,name)
#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgEconRate(df,name)
19. Bowler wicket plot (in IPL matches)
The plots below give the over vs wickets for bowlers
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketPlot(df,name)
# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketPlot(df,name)
#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketPlot(df,name)
20. Bowler wicket against opposition (in IPL matches)
The performance of the bowlers against different IPL teams is shown below
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsAgainstOpposition(df,name)
# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsAgainstOpposition(df,name)
#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsAgainstOpposition(df,name)
21. Bowler wicket in different venues (in IPL matches)
The plots below show how the bowlers perform at different venues
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsVenue(df,name)
# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsVenue(df,name)
#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsVenue(df,name)
Note: You can clone/download the code at Github yorkpy
Important note: Do check out my other posts using yorkpy at yorkpy-posts
Conclusion: This concludes this tour of the python package yorkpy. Go ahead and give yorkpy a spin!
Also see
1. Take 4+: Presentations on ‘Elements of Neural Networks and Deep Learning’ – Parts 1-8
2. My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon
3. Hand detection through Haartraining: A hands-on approach
4. My book ‘Deep Learning from first principles:Second Edition’ now on Amazon
5. Big Data-1: Move into the big league:Graduate from Python to Pyspark
6. Cricpy takes a swing at the ODIs
To see all posts click Index of posts
Take 4+: Presentations on ‘Elements of Neural Networks and Deep Learning’ – Parts 1-8
“Lights, camera and … action – Take 4+!”
This post includes a rework of all the presentations of ‘Elements of Neural Networks and Deep Learning’ Parts 1-8, since my earlier presentations had some missing parts, omissions and occasional errors. So I have re-recorded all the presentations.
This series of presentations does a deep-dive into Deep Learning networks starting from the fundamentals. The equations required for performing learning in an L-layer Deep Learning network are derived in detail, starting from the basics. Further, the presentations also discuss multi-class classification, regularization techniques, and gradient descent optimization methods in deep networks. Finally the presentations also touch on how Deep Learning Networks can be tuned.
The corresponding implementations in vectorized R, Python and Octave are available in my book ‘Deep Learning from first principles:Second edition- In vectorized Python, R and Octave‘
1. Elements of Neural Networks and Deep Learning – Part 1
This presentation introduces Neural Networks and Deep Learning. It looks at the history of Neural Networks and Perceptrons, explains why Deep Learning networks are required, and concludes with simple toy examples of a Neural Network and how they compute. This part also includes a small digression on the basics of Machine Learning and how the algorithm learns from a data set
2. Elements of Neural Networks and Deep Learning – Part 2
This presentation takes logistic regression as an example and creates an equivalent 2 layer Neural network. The presentation also takes a look at forward & backward propagation and how the cost is minimized using gradient descent
The implementations of the discussed 2 layer Neural Network in vectorized R, Python and Octave are available in my post ‘Deep Learning from first principles in Python, R and Octave – Part 1‘
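To make the forward propagation, backward propagation and gradient descent steps concrete, here is a minimal pure-Python sketch of logistic regression on a single feature. It is not the vectorized implementation from the post or the book; the function names, the toy data and the learning rate are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit weight w and bias b by batch gradient descent."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(epochs):
        # Forward propagation: activation for every example.
        a = [sigmoid(w * x + b) for x in xs]
        # Backward propagation: gradients of the mean cross-entropy cost.
        dw = sum((ai - yi) * xi for ai, yi, xi in zip(a, ys, xs)) / m
        db = sum(ai - yi for ai, yi in zip(a, ys)) / m
        # Gradient descent update.
        w -= lr * dw
        b -= lr * db
    return w, b


# Toy, linearly separable data (made up).
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b = train_logistic(xs, ys)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
print(predictions)
```

The gradient expressions use the standard result that, for a sigmoid output with cross-entropy loss, the derivative of the loss with respect to z is simply (a - y).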
3. Elements of Neural Networks and Deep Learning – Part 3
This 3rd part discusses a primitive neural network with an input layer, output layer and a hidden layer. The neural network uses tanh activation in the hidden layer and a sigmoid activation in the output layer. The equations for forward and backward propagation are derived.
To see the implementations for the above discussed video see my post ‘Deep Learning from first principles in Python, R and Octave – Part 2‘
4. Elements of Neural Network and Deep Learning – Part 4
This presentation is a continuation of my 3rd presentation in which I derived the equations for a simple 3 layer Neural Network with 1 hidden layer. In this video presentation, I discuss step-by-step the derivations for an L-Layer, multi-unit Deep Learning Network, with any activation function g(z)
The implementations of L-Layer, multi-unit Deep Learning Network in vectorized R, Python and Octave are available in my post Deep Learning from first principles in Python, R and Octave – Part 3
5. Elements of Neural Network and Deep Learning – Part 5
This presentation discusses multi-class classification using the Softmax function. The detailed derivation for the Jacobian of the Softmax is discussed, and subsequently the derivative of cross-entropy loss is also discussed in detail. Finally, the complete set of equations for a Neural Network with multi-class classification is derived.
The corresponding implementations in vectorized R, Python and Octave are available in the following posts
a. Deep Learning from first principles in Python, R and Octave – Part 4
b. Deep Learning from first principles in Python, R and Octave – Part 5
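As a small companion to the Softmax derivation mentioned above, the sketch below computes the Softmax and its Jacobian in plain Python, using the standard result that entry (i, j) of the Jacobian is s_i * (delta_ij - s_j). The function names are hypothetical and this is not the vectorized code from the linked posts.

```python
import math


def softmax(z):
    """Softmax of a list of scores, shifted by the max for stability."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(z):
    """Matrix of partial derivatives d s_i / d z_j = s_i * (delta_ij - s_j)."""
    s = softmax(z)
    n = len(s)
    return [
        [s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
        for i in range(n)
    ]


scores = [2.0, 1.0, 0.1]
print(softmax(scores))
print(softmax_jacobian(scores))
```

Because the Softmax outputs always sum to 1, each row of the Jacobian sums to 0, which is a handy sanity check on any implementation.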
6. Elements of Neural Networks and Deep Learning – Part 6
This part discusses initialization methods, specifically He and Xavier. The presentation also focuses on how to prevent over-fitting using regularization. Lastly, the dropout method of regularization is also discussed
The corresponding implementations in vectorized R, Python and Octave of the above discussed methods are available in my post Deep Learning from first principles in Python, R and Octave – Part 6
7. Elements of Neural Networks and Deep Learning – Part 7
This presentation introduces exponentially weighted moving average and shows how this is used in different approaches to gradient descent optimization. The key techniques discussed are learning rate decay, momentum method, rmsprop and adam.
The equivalent implementations of the gradient descent optimization techniques in R, Python and Octave can be seen in my post Deep Learning from first principles in Python, R and Octave – Part 7
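The exponentially weighted moving average behind these optimizers follows the recurrence v_t = beta * v_(t-1) + (1 - beta) * x_t. The sketch below shows that recurrence and one common form of the momentum update built on it. The function names, hyper-parameter values and the toy objective f(x) = x^2 are assumptions for illustration, not the implementation from the linked post.

```python
def ewma(values, beta=0.9):
    """Exponentially weighted moving average, starting from v = 0."""
    v = 0.0
    averaged = []
    for x in values:
        v = beta * v + (1.0 - beta) * x
        averaged.append(v)
    return averaged


def momentum_descent(grad, x0, lr=0.1, beta=0.9, steps=200):
    """Gradient descent where each step follows the EWMA of the gradients."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v + (1.0 - beta) * grad(x)
        x -= lr * v
    return x


print(ewma([10.0, 10.0, 10.0], beta=0.5))
# Minimise f(x) = x**2, whose gradient is 2*x, starting from x = 5.
print(momentum_descent(lambda x: 2.0 * x, x0=5.0))
```

Starting the average at zero biases the early values towards zero, which is why methods such as Adam add a bias-correction term.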
8. Elements of Neural Networks and Deep Learning – Part 8
This last part touches on the method to adopt while tuning hyper-parameters in Deep Learning networks
Check out my book ‘Deep Learning from first principles: Second Edition – In vectorized Python, R and Octave’. My book starts with the implementation of a simple 2-layer Neural Network and works its way to a generic L-Layer Deep Learning Network, with all the bells and whistles. The derivations have been discussed in detail. The code has been extensively commented and included in its entirety in the Appendix sections. My book is available on Amazon as paperback ($18.99) and in Kindle version ($9.99/Rs449).
This concludes this series of presentations on ‘Elements of Neural Networks and Deep Learning’
Also
1. My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon
2. Introducing cricpy:A python package to analyze performances of cricketers
3. Natural language processing: What would Shakespeare say?
4. Big Data-2: Move into the big league:Graduate from R to SparkR
5. Presentation on Wireless Technologies – Part 1
6. Introducing cricketr! : An R package to analyze performances of cricketers
To see all posts click Index of posts
Pitching yorkpy…swinging away from the leg stump to IPL – Part 3
Clocks offer at best a convenient fiction. They imply that time ticks steadily, predictably forward, when our experience shows that it often does the opposite: it stretches and compresses, skips a beat and doubles back.
David Eagleman
Memory is the space in which a thing happens for a second time
Paul Auster
Introduction
In this 3rd post, yorkpy, the python avatar of my R package yorkr develops more muscle. The first two posts of yorkpy were
1. Pitching yorkpy…short of good length to IPL – Part 1 This post dealt with functions which perform analytics on an IPL match between any 2 IPL teams
2. Pitching yorkpy…on the middle and outside off-stump to IPL – Part 2 The second post dealt with analytics on all matches between any 2 IPL teams.
This third post deals with analyses and analytics of an IPL team in all matches against all other IPL teams. The data for yorkpy comes from Cricsheet. The data in Cricsheet are in the form of yaml files. These files have already been converted to dataframes and stored as CSV files, as seen in the earlier posts. You can download all the data used in this post and the previous post at yorkpyData
The signatures of yorkpy and yorkr are identical and will work in almost the same way. However there may be some unique functions in yorkr & yorkpy, based on what my thought process was on that day!
-You can clone/download the code at Github yorkpy
-This post has been published to RPubs at yorkpy-Part3
-Download this post as PDF at IPLT20-yorkpy-part3
Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton yorkpy-template from Github (which is the R Markdown file I have used for the analysis below).
The IPL T20 functions in yorkpy are shown below
2. Get data for all T20 matches between an IPL team and all other IPL teams
We can get all IPL T20 matches between an IPL team and all other teams using the function below. The dir parameter should point to the folder which has the IPL T20 CSV files of the individual matches (see Pitching yorkpy…short of good length to IPL-Part 1). This function creates a data frame of all the IPL T20 matches between the IPL team and all other teams, and also saves the data frame as a CSV file if save=True. If save=False the data frame is just returned and not saved.
import pandas as pd
import os
import yorkpy.analytics as yka
#dir1= "C:\\software\\cricket-package\\yorkpyPkg\\yorkpyData\\IPLConverted"
#yka.getAllMatchesAllOpposition("Kolkata Knight Riders",dir=dir1,save=True)
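The save=True / save=False behaviour described above can be pictured with the standard-library sketch below. It is not yorkpy's implementation: the function name combine_match_csvs, the out_dir parameter and the assumption that every per-match file has a 'team' column with identical headers are all hypothetical.

```python
import csv
import glob
import os


def combine_match_csvs(match_dir, team, save=False, out_dir="."):
    """Stack the rows of every per-match CSV in match_dir that involves `team`.

    Assumes (hypothetically) a 'team' column naming the batting side of
    each delivery, and the same columns in every file.
    """
    combined = []
    for path in sorted(glob.glob(os.path.join(match_dir, "*.csv"))):
        with open(path, newline="") as handle:
            rows = list(csv.DictReader(handle))
        # Keep the whole match if the team batted in it at any point.
        if any(row.get("team") == team for row in rows):
            combined.extend(rows)
    if save and combined:
        out_path = os.path.join(out_dir, team + "-allMatchesAllOpposition.csv")
        with open(out_path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(combined[0].keys()))
            writer.writeheader()
            writer.writerows(combined)
    # The combined rows are returned whether or not they were saved.
    return combined
```

Writing the combined file to a folder other than match_dir avoids it being picked up as a match file on the next run.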
3. Save data for all matches between an IPL team and all oppositions
This can be done locally using the function below. You could use this function to combine all IPL matches of an IPL team against all other IPL teams
import pandas as pd
import os
import yorkpy.analytics as yka
#dir1= "C:\\software\\cricket-package\\yorkpyPkg\\yorkpyData\\IPLConverted"
#yka.saveAllMatchesAllOppositionIPLT20(dir1)
Note: In the functions below, I have randomly chosen an IPL team for the analyses. You are free to choose any IPL team for your analysis
4. Team Batsmen partnership in Twenty20 (all matches against all IPL teams – summary)
The function below computes the highest partnerships for an IPL team against all other IPL teams, e.g. the batsmen with the highest partnerships from Chennai Super Kings in all matches against all other IPL teams. Any other IPL team could have also been chosen.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Chennai Super Kings-allMatchesAllOpposition.csv")
csk_matches = pd.read_csv(path)
m=yka.teamBatsmenPartnershiAllOppnAllMatches(csk_matches,'Chennai Super Kings',report="summary")
print(m)
## batsman totalPartnershipRuns
## 42 SK Raina 3699
## 28 MS Dhoni 2986
## 25 MEK Hussey 1768
## 24 M Vijay 1600
## 36 S Badrinath 1441
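Conceptually, a summary of this kind totals the partnership runs for each batsman and sorts the totals in descending order. The sketch below illustrates that aggregation on made-up records; the record layout and the name summarise_partnerships are hypothetical and do not reflect how yorkpy stores its data.

```python
from collections import defaultdict

# Made-up partnership records: (batsman, non_striker, runs scored together).
partnerships = [
    ("Batsman A", "Batsman B", 120),
    ("Batsman A", "Batsman C", 45),
    ("Batsman B", "Batsman A", 95),
    ("Batsman C", "Batsman A", 30),
]


def summarise_partnerships(records):
    """Total partnership runs per batsman, highest total first."""
    totals = defaultdict(int)
    for batsman, _non_striker, runs in records:
        totals[batsman] += runs
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(summarise_partnerships(partnerships))
```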
5. Team Batsmen partnership in Twenty20 (all matches against all IPL teams -detailed)
The function below gives the detailed breakup of partnerships for Mumbai Indians against all other IPL teams
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Mumbai Indians-allMatchesAllOpposition.csv")
mi_matches = pd.read_csv(path)
theTeam='Mumbai Indians'
m=yka.teamBatsmenPartnershiAllOppnAllMatches(mi_matches,theTeam,report="detailed", top=3)
print(m)
## batsman totalPartnershipRuns non_striker partnershipRuns
## 0 RG Sharma 3037.0 A Symonds 142.0
## 1 RG Sharma 3037.0 AC Blizzard 5.0
## 2 RG Sharma 3037.0 AJ Finch 2.0
## 3 RG Sharma 3037.0 AP Tare 32.0
## 4 RG Sharma 3037.0 AT Rayudu 566.0
## 5 RG Sharma 3037.0 BR Dunk 1.0
## 6 RG Sharma 3037.0 CJ Anderson 183.0
## 7 RG Sharma 3037.0 CM Gautam 22.0
## 8 RG Sharma 3037.0 DR Smith 50.0
## 9 RG Sharma 3037.0 GJ Maxwell 6.0
## 10 RG Sharma 3037.0 HH Gibbs 109.0
## 11 RG Sharma 3037.0 HH Pandya 105.0
## 12 RG Sharma 3037.0 Harbhajan Singh 86.0
## 13 RG Sharma 3037.0 JC Buttler 105.0
## 14 RG Sharma 3037.0 JEC Franklin 50.0
## 15 RG Sharma 3037.0 KA Pollard 633.0
## 16 RG Sharma 3037.0 KD Karthik 170.0
## 17 RG Sharma 3037.0 KH Pandya 34.0
## 18 RG Sharma 3037.0 KV Sharma 33.0
## 19 RG Sharma 3037.0 LMP Simmons 172.0
## 20 RG Sharma 3037.0 MEK Hussey 21.0
## 21 RG Sharma 3037.0 MJ Guptill 61.0
## 22 RG Sharma 3037.0 MJ McClenaghan 2.0
## 23 RG Sharma 3037.0 N Rana 25.0
## 24 RG Sharma 3037.0 PA Patel 103.0
## 25 RG Sharma 3037.0 RE Levi 25.0
## 26 RG Sharma 3037.0 SL Malinga 0.0
## 27 RG Sharma 3037.0 SR Tendulkar 208.0
## 28 RG Sharma 3037.0 SS Tiwary 27.0
## 29 RG Sharma 3037.0 TL Suman 7.0
## .. ... ... ... ...
## 70 KA Pollard 2344.0 CJ Anderson 82.0
## 71 KA Pollard 2344.0 CM Gautam 16.0
## 72 KA Pollard 2344.0 DR Smith 10.0
## 73 KA Pollard 2344.0 DS Kulkarni 15.0
## 74 KA Pollard 2344.0 HH Pandya 158.0
## 75 KA Pollard 2344.0 Harbhajan Singh 158.0
## 76 KA Pollard 2344.0 J Suchith 26.0
## 77 KA Pollard 2344.0 JC Buttler 37.0
## 78 KA Pollard 2344.0 JEC Franklin 38.0
## 79 KA Pollard 2344.0 JP Duminy 63.0
## 80 KA Pollard 2344.0 KD Karthik 40.0
## 81 KA Pollard 2344.0 KH Pandya 111.0
## 82 KA Pollard 2344.0 KV Sharma 13.0
## 83 KA Pollard 2344.0 LMP Simmons 77.0
## 84 KA Pollard 2344.0 MEK Hussey 10.0
## 85 KA Pollard 2344.0 MG Johnson 1.0
## 86 KA Pollard 2344.0 N Rana 60.0
## 87 KA Pollard 2344.0 PA Patel 18.0
## 88 KA Pollard 2344.0 PP Ojha 12.0
## 89 KA Pollard 2344.0 R Dhawan 25.0
## 90 KA Pollard 2344.0 R McLaren 20.0
## 91 KA Pollard 2344.0 R Sathish 27.0
## 92 KA Pollard 2344.0 RG Sharma 587.0
## 93 KA Pollard 2344.0 RJ Peterson 0.0
## 94 KA Pollard 2344.0 S Dhawan 20.0
## 95 KA Pollard 2344.0 SL Malinga 14.0
## 96 KA Pollard 2344.0 SR Tendulkar 69.0
## 97 KA Pollard 2344.0 SS Tiwary 42.0
## 98 KA Pollard 2344.0 TL Suman 2.0
## 99 KA Pollard 2344.0 Z Khan 1.0
##
## [100 rows x 4 columns]
6. Team Batsmen partnership in Twenty20 – Chart (all matches against all IPL teams)
The function below plots the partnerships of an IPL team against all other IPL teams
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Delhi Daredevils-allMatchesAllOpposition.csv")
dd_matches = pd.read_csv(path)
yka.teamBatsmenPartnershipAllOppnAllMatchesChart(dd_matches,'Delhi Daredevils', plot=True, top=4, partnershipRuns=100)
7. Team Batsmen partnership in Twenty20 – Dataframe (all matches against all IPL teams)
This function does not plot the data but returns the dataframe to the user to plot or manipulate.
Note: Many of the plotting functions include an additional parameter, e.g. plot, which is either True or False. The default value is plot=True. When plot=True the plot will be displayed. When plot=False the data frame will be returned to the user. The user can use this to create interactive charts. The parameter top= specifies the number of top batsmen that need to be included in the chart, and partnershipRuns gives the minimum cutoff runs in partnerships to be considered
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Kochi Tuskers Kerala-allMatchesAllOpposition.csv")
ktk_matches = pd.read_csv(path)
m=yka.teamBatsmenPartnershipAllOppnAllMatchesChart(ktk_matches,'Kochi Tuskers Kerala', plot=False, top=3, partnershipRuns=100)
print(m)
## batsman non_striker partnershipRuns
## 0 BB McCullum BJ Hodge 17.0
## 1 BB McCullum DPMD Jayawardene 160.0
## 2 BB McCullum M Klinger 67.0
## 3 BB McCullum PA Patel 40.0
## 4 BB McCullum RA Jadeja 19.0
## 5 BB McCullum VVS Laxman 41.0
## 6 BB McCullum Y Gnaneswara Rao 13.0
## 7 DPMD Jayawardene BB McCullum 152.0
## 8 DPMD Jayawardene BJ Hodge 41.0
## 9 DPMD Jayawardene KM Jadhav 4.0
## 10 DPMD Jayawardene M Klinger 28.0
## 11 DPMD Jayawardene OA Shah 9.0
## 12 DPMD Jayawardene PA Patel 25.0
## 13 DPMD Jayawardene RA Jadeja 18.0
## 14 DPMD Jayawardene RV Gomez 10.0
## 15 DPMD Jayawardene VVS Laxman 12.0
## 16 BJ Hodge BB McCullum 18.0
## 17 BJ Hodge DPMD Jayawardene 47.0
## 18 BJ Hodge KM Jadhav 2.0
## 19 BJ Hodge OA Shah 19.0
## 20 BJ Hodge PA Patel 79.0
## 21 BJ Hodge RA Jadeja 99.0
## 22 BJ Hodge RV Gomez 21.0
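Judging by the output above, the frame returned with plot=False still appears to contain partnerships below the partnershipRuns cutoff, so you may want to filter it yourself before charting. The snippet below is a sketch of that step using plain pandas on a few rows copied from the output; the variable names big and totals are my own.

```python
import pandas as pd

# A few rows copied from the Kochi Tuskers Kerala output above.
m = pd.DataFrame({
    "batsman": ["BB McCullum", "BB McCullum", "DPMD Jayawardene",
                "BJ Hodge", "BJ Hodge"],
    "non_striker": ["DPMD Jayawardene", "M Klinger", "BB McCullum",
                    "PA Patel", "RA Jadeja"],
    "partnershipRuns": [160.0, 67.0, 152.0, 79.0, 99.0],
})

# Keep only partnerships at or above the cutoff, largest first.
big = m[m["partnershipRuns"] >= 100].sort_values(
    "partnershipRuns", ascending=False)

# Total partnership runs per batsman across the rows shown here.
totals = (m.groupby("batsman")["partnershipRuns"]
            .sum()
            .sort_values(ascending=False))

print(big)
print(totals)
```

Either result can then be handed to the plotting library of your choice to build an interactive chart.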
8. Team batsmen versus bowler in Twenty20-Chart (all matches against all IPL teams)
The plots below provide information on how each of the top batsmen of the IPL team fared against the opposition bowlers of all other IPL teams.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Royal Challengers Bangalore-allMatchesAllOpposition.csv")
rcb_matches = pd.read_csv(path)
yka.teamBatsmenVsBowlersAllOppnAllMatches(rcb_matches,"Royal Challengers Bangalore",plot=True,top=3,runsScored=60)
9. Team batsmen versus bowler in Twenty20-Dataframe (all matches against all IPL teams)
With plot=False, this function returns a data frame of the runs scored by the top batsmen of an IPL team against each bowler of all other IPL teams
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Kings XI Punjab-allMatchesAllOpposition.csv")
kxip_matches = pd.read_csv(path)
m=yka.teamBatsmenVsBowlersAllOppnAllMatches(kxip_matches,'Kings XI Punjab',plot=False,top=2,runsScored=50)
print(m)
## batsman bowler runsScored
## 0 SE Marsh A Chandila 20.0
## 1 SE Marsh A Choudhary 1.0
## 2 SE Marsh A Kumble 37.0
## 3 SE Marsh A Mishra 0.0
## 4 SE Marsh A Mithun 9.0
## 5 SE Marsh A Nehra 33.0
## 6 SE Marsh A Singh 2.0
## 7 SE Marsh A Symonds 5.0
## 8 SE Marsh AA Chavan 19.0
## 9 SE Marsh AA Jhunjhunwala 15.0
## 10 SE Marsh AB Agarkar 27.0
## 11 SE Marsh AB Dinda 31.0
## 12 SE Marsh AB McDonald 9.0
## 13 SE Marsh AC Thomas 1.0
## 14 SE Marsh AD Mathews 7.0
## 15 SE Marsh AD Russell 8.0
## 16 SE Marsh AJ Tye 0.0
## 17 SE Marsh AL Menaria 6.0
## 18 SE Marsh AM Salvi 8.0
## 19 SE Marsh AN Ahmed 16.0
## 20 SE Marsh AS Raut 7.0
## 21 SE Marsh Ankit Sharma 2.0
## 22 SE Marsh Ankit Soni 11.0
## 23 SE Marsh B Kumar 10.0
## 24 SE Marsh B Lee 1.0
## 25 SE Marsh BAW Mendis 11.0
## 26 SE Marsh BB Sran 3.0
## 27 SE Marsh BJ Hodge 18.0
## 28 SE Marsh Basil Thampi 17.0
## 29 SE Marsh C de Grandhomme 8.0
## .. ... ... ...
## 235 DA Miller R Sharma 7.0
## 236 DA Miller R Tewatia 3.0
## 237 DA Miller R Vinay Kumar 30.0
## 238 DA Miller RA Jadeja 84.0
## 239 DA Miller RD Chahar 3.0
## 240 DA Miller RE van der Merwe 5.0
## 241 DA Miller RN ten Doeschate 1.0
## 242 DA Miller RP Singh 35.0
## 243 DA Miller Rashid Khan 0.0
## 244 DA Miller S Aravind 7.0
## 245 DA Miller S Kaul 23.0
## 246 DA Miller S Kaushik 8.0
## 247 DA Miller S Ladda 6.0
## 248 DA Miller S Nadeem 11.0
## 249 DA Miller SK Raina 2.0
## 250 DA Miller SL Malinga 9.0
## 251 DA Miller SMSM Senanayake 6.0
## 252 DA Miller SP Narine 10.0
## 253 DA Miller SR Watson 16.0
## 254 DA Miller STR Binny 14.0
## 255 DA Miller Shakib Al Hasan 3.0
## 256 DA Miller TA Boult 20.0
## 257 DA Miller TG Southee 11.0
## 258 DA Miller UT Yadav 51.0
## 259 DA Miller VR Aaron 19.0
## 260 DA Miller VS Malik 3.0
## 261 DA Miller YK Pathan 0.0
## 262 DA Miller YS Chahal 35.0
## 263 DA Miller Yuvraj Singh 11.0
## 264 DA Miller Z Khan 2.0
##
## [265 rows x 3 columns]
10. Team batting scorecard(all matches against all IPL teams)
This function provides the overall batting scorecard for an IPL team in all matches against all other IPL teams. The scorecard below shows the top batsmen for Kolkata Knight Riders.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Kolkata Knight Riders-allMatchesAllOpposition.csv")
kkr_matches = pd.read_csv(path)
scorecard=yka.teamBattingScorecardAllOppnAllMatches(kkr_matches,'Kolkata Knight Riders')
print(scorecard)
## batsman runs balls 4s 6s SR
## 19 G Gambhir 3035.0 2533 352 46 119.818397
## 17 YK Pathan 1893.0 1421 150 86 133.216045
## 22 RV Uthappa 1806.0 1311 200 54 137.757437
## 16 JH Kallis 1295.0 1237 128 23 104.688763
## 23 MK Pandey 1270.0 1048 103 38 121.183206
## 0 SC Ganguly 1031.0 977 105 36 105.527124
## 12 MK Tiwary 1002.0 921 86 23 108.794788
## 1 BB McCullum 882.0 754 92 32 116.976127
## 25 SA Yadav 608.0 474 54 21 128.270042
## 15 MS Bisla 543.0 518 60 16 104.826255
## 26 AD Russell 516.0 308 45 34 167.532468
## 4 DJ Hussey 511.0 417 31 28 122.541966
## 24 Shakib Al Hasan 498.0 399 44 15 124.812030
## 10 BJ Hodge 476.0 430 47 10 110.697674
## 11 CH Gayle 463.0 350 45 26 132.285714
## 18 EJG Morgan 444.0 373 45 16 119.034853
## 54 CA Lynn 378.0 250 30 23 151.200000
## 6 LR Shukla 374.0 320 31 15 116.875000
## 29 RN ten Doeschate 326.0 238 26 15 136.974790
## 21 DB Das 304.0 267 23 16 113.857678
## 3 WP Saha 298.0 213 24 12 139.906103
## 28 SP Narine 271.0 193 36 12 140.414508
## 13 AD Mathews 249.0 211 20 8 118.009479
## 33 Salman Butt 193.0 172 30 2 112.209302
## 41 MN van Wyk 167.0 135 19 1 123.703704
## 7 AB Agarkar 160.0 137 12 5 116.788321
## 20 R Bhatia 159.0 134 15 3 118.656716
## 51 C de Grandhomme 126.0 92 10 6 136.956522
## 39 CA Pujara 122.0 119 14 3 102.521008
## 40 OA Shah 115.0 96 7 5 119.791667
## .. ... ... ... ... .. ...
## 50 JO Holder 22.0 20 2 1 110.000000
## 65 Kuldeep Yadav 20.0 22 2 0 90.909091
## 71 BJ Haddin 18.0 11 2 1 163.636364
## 70 NM Coulter-Nile 14.0 13 0 2 107.692308
## 47 L Balaji 13.0 12 1 0 108.333333
## 55 SMSM Senanayake 10.0 17 0 0 58.823529
## 53 M Morkel 9.0 8 0 0 112.500000
## 62 AN Ghosh 7.0 8 1 0 87.500000
## 32 GB Hogg 7.0 6 0 0 116.666667
## 56 MV Boucher 6.0 6 0 0 100.000000
## 77 Azhar Mahmood 6.0 8 1 0 75.000000
## 78 DM Bravo 6.0 5 1 0 120.000000
## 68 SS Shaikh 6.0 7 1 0 85.714286
## 66 TA Boult 5.0 8 0 0 62.500000
## 76 Mohammed Shami 5.0 10 0 0 50.000000
## 80 P Dogra 5.0 8 0 0 62.500000
## 69 R Vinay Kumar 4.0 7 0 0 57.142857
## 75 AS Rajpoot 4.0 7 1 0 57.142857
## 43 Mandeep Singh 4.0 11 1 0 36.363636
## 37 AB Dinda 4.0 8 0 0 50.000000
## 79 PJ Sangwan 4.0 2 1 0 200.000000
## 73 R McLaren 3.0 6 0 0 50.000000
## 67 SB Bangar 2.0 9 0 0 22.222222
## 57 RS Gavaskar 2.0 8 0 0 25.000000
## 72 Shoaib Akhtar 2.0 8 0 0 25.000000
## 38 Mashrafe Mortaza 2.0 2 0 0 100.000000
## 63 BAW Mendis 1.0 2 0 0 50.000000
## 58 SE Bond 1.0 2 0 0 50.000000
## 44 CK Langeveldt 0.0 1 0 0 0.000000
## 30 PJ Cummins 0.0 2 0 0 0.000000
##
## [81 rows x 6 columns]
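The SR column in the scorecard above is consistent with the usual definition of strike rate, runs scored per 100 balls faced. The snippet below is a minimal sketch of that calculation in plain Python; `strike_rate` is a hypothetical helper written for illustration and is not part of yorkpy.

```python
def strike_rate(runs, balls):
    """Runs scored per 100 balls faced; None when no balls were faced."""
    if balls == 0:
        return None  # avoid a division by zero for batsmen who faced no balls
    return 100.0 * runs / balls


# Figures taken from the Kolkata Knight Riders scorecard above
print(round(strike_rate(3035, 2533), 6))  # G Gambhir
print(round(strike_rate(516, 308), 6))    # AD Russell
```

The two printed values can be compared against the SR column for G Gambhir and AD Russell.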
10a. Team batting scorecard(all matches against all IPL teams)
The output below shows the batting scorecard of Chennai Super Kings against all other IPL teams.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Chennai Super Kings-allMatchesAllOpposition.csv")
csk_matches = pd.read_csv(path)
scorecard=yka.teamBattingScorecardAllOppnAllMatches(csk_matches,'Chennai Super Kings')
print(scorecard)
## batsman runs balls 4s 6s SR
## 3 SK Raina 3699 2735 322 150 135.246801
## 5 MS Dhoni 2986 2199 218 126 135.788995
## 17 MEK Hussey 1768 1461 181 45 121.013005
## 11 M Vijay 1600 1289 141 66 124.127230
## 4 S Badrinath 1441 1245 154 28 115.742972
## 9 ML Hayden 1107 838 121 44 132.100239
## 18 F du Plessis 1081 867 92 29 124.682814
## 25 DR Smith 965 766 102 50 125.979112
## 26 BB McCullum 841 634 83 42 132.649842
## 6 JA Morkel 827 591 51 48 139.932318
## 20 DJ Bravo 706 543 54 30 130.018416
## 19 RA Jadeja 670 533 46 23 125.703565
## 0 PA Patel 516 529 67 7 97.542533
## 2 SP Fleming 196 171 27 3 114.619883
## 13 R Ashwin 190 208 19 1 91.346154
## 21 S Vidyut 145 115 21 3 126.086957
## 31 WP Saha 144 138 8 8 104.347826
## 1 S Anirudha 133 116 9 7 114.655172
## 33 DJ Hussey 116 96 8 6 120.833333
## 38 P Negi 116 77 10 5 150.649351
## 10 JDP Oram 106 107 6 5 99.065421
## 29 GJ Bailey 63 67 9 0 94.029851
## 22 A Flintoff 62 57 5 2 108.771930
## 8 MS Gony 50 39 2 5 128.205128
## 7 Joginder Sharma 36 30 1 2 120.000000
## 27 M Manhas 35 26 3 1 134.615385
## 28 MM Sharma 29 26 1 2 111.538462
## 23 SB Jakati 27 28 3 0 96.428571
## 12 JM Kemp 26 25 1 1 104.000000
## 14 L Balaji 22 35 1 1 62.857143
## 24 DE Bollinger 21 23 1 1 91.304348
## 41 CK Kapugedera 16 24 0 0 66.666667
## 37 CH Morris 14 17 0 0 82.352941
## 30 T Thushara 12 19 0 0 63.157895
## 42 M Ntini 11 19 2 0 57.894737
## 15 M Muralitharan 9 13 1 0 69.230769
## 32 KMDN Kulasekara 5 3 1 0 166.666667
## 34 SB Styris 5 2 1 0 250.000000
## 35 B Laughlin 4 9 0 0 44.444444
## 16 S Tyagi 3 4 0 0 75.000000
## 45 KB Arun Karthik 3 5 0 0 60.000000
## 36 AS Rajpoot 2 6 0 0 33.333333
## 43 RG More 2 2 0 0 100.000000
## 44 S Randiv 2 4 0 0 50.000000
## 39 A Nehra 1 7 0 0 14.285714
## 40 A Mukund 0 1 0 0 0.000000
11.Team Bowling scorecard (all matches against all IPL teams)
The output below gives the bowling performance of an IPL team against all other IPL teams
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Sunrisers Hyderabad-allMatchesAllOpposition.csv")
srh_matches = pd.read_csv(path)
scorecard=yka.teamBowlingScorecardAllOppnAllMatches(srh_matches,'Sunrisers Hyderabad')
## C:\Users\Ganesh\ANACON~1\lib\site-packages\yorkpy\analytics.py:564: SettingWithCopyWarning:
## A value is trying to be set on a copy of a slice from a DataFrame.
## Try using .loc[row_indexer,col_indexer] = value instead
##
## See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
## df1['over']=df1.delivery.astype(int)
## C:\Users\Ganesh\ANACON~1\lib\site-packages\yorkpy\analytics.py:567: SettingWithCopyWarning:
## A value is trying to be set on a copy of a slice from a DataFrame.
## Try using .loc[row_indexer,col_indexer] = value instead
##
## See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
## df1['runsConceded']=df1['runs'] + df1['wides'] + df1['noballs']
print(scorecard)
## bowler overs runs maidens wicket econrate
## 60 JP Faulkner 28 192 0 15 6.857143
## 83 MM Sharma 37 334 0 13 9.027027
## 119 SL Malinga 31 215 0 13 6.935484
## 123 SR Watson 30 281 0 13 9.366667
## 90 NM Coulter-Nile 24 166 0 12 6.916667
## 31 DJ Bravo 26 184 0 12 7.076923
## 135 UT Yadav 37 297 0 12 8.027027
## 125 Sandeep Sharma 32 280 0 11 8.750000
## 75 M Morkel 25 195 0 9 7.800000
## 81 MJ McClenaghan 24 175 0 9 7.291667
## 5 AB Dinda 23 165 0 9 7.173913
## 55 JD Unadkat 20 167 0 8 8.350000
## 36 DS Kulkarni 28 200 0 8 7.142857
## 25 CH Morris 24 190 0 7 7.916667
## 101 R Bhatia 18 128 0 7 7.111111
## 70 Kuldeep Yadav 16 129 0 7 8.062500
## 11 AR Patel 27 208 0 7 7.703704
## 122 SP Narine 43 282 0 7 6.558140
## 141 YS Chahal 26 224 0 6 8.615385
## 44 Harbhajan Singh 39 264 0 6 6.769231
## 96 PP Chawla 21 140 0 6 6.666667
## 4 A Zampa 4 19 0 6 4.750000
## 126 Shakib Al Hasan 14 99 1 6 7.071429
## 80 MG Johnson 20 155 0 6 7.750000
## 59 JP Duminy 10 80 0 5 8.000000
## 58 JO Holder 15 113 0 5 7.533333
## 92 P Kumar 23 173 0 5 7.521739
## 100 R Ashwin 28 142 0 5 5.071429
## 2 A Mishra 18 144 0 4 8.000000
## 106 R Vinay Kumar 19 154 0 4 8.105263
## .. ... ... ... ... ... ...
## 6 AD Mascarenhas 4 25 0 0 6.250000
## 13 Ankit Soni 2 31 0 0 15.500000
## 132 TM Head 1 11 0 0 11.000000
## 10 AN Ahmed 6 63 0 0 10.500000
## 131 TM Dilshan 1 10 0 0 10.000000
## 134 Tejas Baroka 3 33 0 0 11.000000
## 73 M Ashwin 1 6 0 0 6.000000
## 109 RG Sharma 1 5 0 0 5.000000
## 22 Basil Thampi 2 21 0 0 10.500000
## 23 C Munro 1 8 0 0 8.000000
## 68 KV Sharma 2 19 0 0 9.500000
## 77 M Vijay 4 24 0 0 6.000000
## 66 KJ Abbott 3 34 0 0 11.333333
## 65 KH Pandya 2 17 0 0 8.500000
## 82 MM Patel 3 22 0 0 7.333333
## 62 K Rabada 4 59 0 0 14.750000
## 85 MP Stoinis 3 28 0 0 9.333333
## 54 JA Morkel 3 35 0 0 11.666667
## 46 I Sharma 8 64 0 0 8.000000
## 94 PJ Cummins 4 37 0 0 9.250000
## 95 PJ Sangwan 8 82 0 0 10.250000
## 103 R Sathish 1 9 0 0 9.000000
## 38 DW Steyn 2 17 0 0 8.500000
## 108 RG More 2 28 0 0 14.000000
## 34 DJG Sammy 2 18 0 0 9.000000
## 33 DJ Muthuswami 2 20 0 0 10.000000
## 32 DJ Hooda 5 45 0 0 9.000000
## 24 CH Gayle 3 24 0 0 8.000000
## 116 SA Abbott 2 21 0 0 10.500000
## 72 LR Shukla 2 28 0 0 14.000000
##
## [144 rows x 6 columns]
11a.Team Bowling scorecard (all matches against all IPL teams)
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Rajasthan Royals-allMatchesAllOpposition.csv")
rr_matches = pd.read_csv(path)
scorecard=yka.teamBowlingScorecardAllOppnAllMatches(rr_matches,'Rajasthan Royals')
print(scorecard)
## bowler overs runs maidens wicket econrate
## 2 A Mishra 63 426 0 29 6.761905
## 66 JA Morkel 38 301 0 16 7.921053
## 129 R Vinay Kumar 48 406 0 15 8.458333
## 135 RP Singh 41 255 0 14 6.219512
## 95 MF Maharoof 23 139 0 14 6.043478
## 118 PP Chawla 45 353 0 14 7.844444
## 130 RA Jadeja 32 227 0 14 7.093750
## 50 DW Steyn 43 232 0 13 5.395349
## 56 Harbhajan Singh 45 341 0 12 7.577778
## 1 A Kumble 21 108 1 12 5.142857
## 159 SL Malinga 49 363 0 12 7.408163
## 60 IK Pathan 37 279 0 11 7.540541
## 82 KA Pollard 21 201 0 11 9.571429
## 119 PP Ojha 46 426 0 11 9.260870
## 121 R Ashwin 29 222 0 11 7.655172
## 22 B Kumar 31 233 0 11 7.516129
## 3 A Nehra 32 214 0 11 6.687500
## 41 DJ Bravo 30 292 0 10 9.733333
## 110 P Kumar 48 329 1 10 6.854167
## 58 I Sharma 37 284 0 9 7.675676
## 168 Shakib Al Hasan 25 153 0 9 6.120000
## 87 L Balaji 33 277 0 9 8.393939
## 122 R Bhatia 19 121 0 8 6.368421
## 48 DS Kulkarni 21 148 0 8 7.047619
## 101 MM Sharma 20 142 0 8 7.100000
## 174 UT Yadav 25 203 0 8 8.120000
## 15 AR Patel 16 110 0 7 6.875000
## 133 RJ Harris 16 132 0 7 8.250000
## 72 JH Kallis 37 254 0 7 6.864865
## 192 Z Khan 33 213 0 7 6.454545
## .. ... ... ... ... ... ...
## 170 Shoaib Ahmed 2 19 0 0 9.500000
## 54 GS Sandhu 4 49 0 0 12.250000
## 139 RV Gomez 1 9 0 0 9.000000
## 163 SPD Smith 0 5 0 0 inf
## 115 PC Valthaty 3 35 0 0 11.666667
## 34 CJ Anderson 4 26 0 0 6.500000
## 81 K Upadhyay 3 29 0 0 9.666667
## 79 K Goel 1 11 0 0 11.000000
## 28 BJ Rohrer 1 12 0 0 12.000000
## 78 Joginder Sharma 2 23 0 0 11.500000
## 99 MK Tiwary 2 28 0 0 14.000000
## 26 BE Hendricks 4 57 0 0 14.250000
## 102 MR Marsh 1 10 0 0 10.000000
## 106 NL McCullum 3 22 0 0 7.333333
## 113 P Prasanth 1 18 0 0 18.000000
## 114 P Suyal 4 45 0 0 11.250000
## 46 DP Vijaykumar 1 10 0 0 10.000000
## 154 SB Styris 2 14 0 0 7.000000
## 71 JEC Franklin 3 32 0 0 10.666667
## 70 JE Taylor 3 22 0 0 7.333333
## 18 Ankit Sharma 4 33 0 0 8.250000
## 134 RN ten Doeschate 2 14 0 0 7.000000
## 16 Abdur Razzak 2 29 0 0 14.500000
## 65 J Theron 6 48 0 0 8.000000
## 146 S Narwal 2 17 0 0 8.500000
## 63 J Botha 1 19 0 0 19.000000
## 149 S Tyagi 8 65 0 0 8.125000
## 151 SB Bangar 2 20 0 0 10.000000
## 13 AM Nayar 2 7 0 0 3.500000
## 0 A Ashish Reddy 3 22 0 0 7.333333
##
## [193 rows x 6 columns]
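The `inf` in the econrate column for SPD Smith above appears to come from dividing 5 runs by 0 counted overs. The sketch below shows one way to guard against this when recomputing the economy rate from the returned rows. It assumes econrate is simply runs divided by overs, which matches the printed figures; `economy_rate` is a hypothetical helper and not part of yorkpy.

```python
def economy_rate(runs_conceded, overs):
    """Runs conceded per over; None when no overs are recorded."""
    if overs == 0:
        return None  # report a missing value instead of inf
    return runs_conceded / overs


# Figures taken from the Rajasthan Royals output above
print(round(economy_rate(426, 63), 6))  # A Mishra
print(economy_rate(5, 0))               # SPD Smith, 0 overs recorded
```

Returning `None` keeps such rows from distorting a sort or an average on the economy rate.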
12. Team Bowling wicket kind -Chart (all matches against all IPL teams)
The function computes and displays the kinds of wickets taken (bowled, caught, lbw, etc.) by an IPL team in all matches against all other IPL teams.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Gujarat Lions-allMatchesAllOpposition.csv")
gl_matches = pd.read_csv(path)
yka.teamBowlingWicketKindAllOppnAllMatches(gl_matches,'Gujarat Lions',plot=True,top=5,wickets=2)
13. Team Bowling wicket kind -Dataframe (all matches against all IPL teams)
This gives the type of wickets taken for an IPL team against all other IPL teams.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Rising Pune Supergiants-allMatchesAllOpposition.csv")
rps_matches = pd.read_csv(path)
m=yka.teamBowlingWicketKindAllOppnAllMatches(rps_matches,'Rising Pune Supergiants',plot=False,top=4,wickets=10)
print(m)
## bowler kind wickets
## 0 A Nehra caught 4
## 1 A Nehra run out 2
## 2 MM Sharma caught 3
## 3 MM Sharma caught and bowled 1
## 4 MM Sharma run out 1
## 5 SR Watson bowled 1
## 6 SR Watson caught 4
## 7 KW Richardson caught 3
## 8 KW Richardson retired hurt 1
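The dataframe above also lists kinds such as run out and retired hurt, which are not conventionally credited to the bowler. If you want bowler-credited wickets only, the returned rows can be filtered before totalling. The snippet below is a minimal sketch in plain Python using the rows printed above; the helper name and the set of excluded kinds are my own assumptions, not yorkpy behaviour.

```python
from collections import defaultdict

# Dismissal kinds that are conventionally not credited to the bowler (assumed list)
NOT_BOWLER_WICKETS = {"run out", "retired hurt"}

# (bowler, kind, wickets) rows copied from the output above
rows = [
    ("A Nehra", "caught", 4),
    ("A Nehra", "run out", 2),
    ("MM Sharma", "caught", 3),
    ("MM Sharma", "caught and bowled", 1),
    ("MM Sharma", "run out", 1),
    ("SR Watson", "bowled", 1),
    ("SR Watson", "caught", 4),
    ("KW Richardson", "caught", 3),
    ("KW Richardson", "retired hurt", 1),
]


def bowler_wickets(rows):
    """Total wickets per bowler, skipping kinds not credited to the bowler."""
    totals = defaultdict(int)
    for bowler, kind, wickets in rows:
        if kind not in NOT_BOWLER_WICKETS:
            totals[bowler] += wickets
    return dict(totals)


print(bowler_wickets(rows))
```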
14 Team Bowler vs Batsman – Plot (all matches against all IPL teams)
The function below gives the performance of bowlers against batsmen in all matches against all other IPL teams.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Rising Pune Supergiants-allMatchesAllOpposition.csv")
rps_matches = pd.read_csv(path)
yka.teamBowlersVsBatsmenAllOppnAllMatches(rps_matches,'Rising Pune Supergiants',plot=True,top=5,runsConceded=10)
15 Team Bowler vs Batsman – Dataframe (all matches against all IPL teams)
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Deccan Chargers-allMatchesAllOpposition.csv")
dc_matches = pd.read_csv(path)
m=yka.teamBowlersVsBatsmenAllOppnAllMatches(dc_matches,'Deccan Chargers',plot=False,top=2,runsConceded=30)
print(m)
## bowler batsman runsConceded
## 0 P Kumar A Ashish Reddy 6.0
## 1 P Kumar A Symonds 15.0
## 2 P Kumar AA Bilakhia 12.0
## 3 P Kumar AA Jhunjhunwala 1.0
## 4 P Kumar AC Gilchrist 20.0
## 5 P Kumar Anirudh Singh 11.0
## 6 P Kumar B Chipli 1.0
## 7 P Kumar CL White 11.0
## 8 P Kumar DB Ravi Teja 15.0
## 9 P Kumar DJ Harris 2.0
## 10 P Kumar DR Smith 5.0
## 11 P Kumar FH Edwards 3.0
## 12 P Kumar HH Gibbs 46.0
## 13 P Kumar J Theron 0.0
## 14 P Kumar JP Duminy 4.0
## 15 P Kumar KC Sangakkara 15.0
## 16 P Kumar MD Mishra 4.0
## 17 P Kumar PA Patel 9.0
## 18 P Kumar RG Sharma 36.0
## 19 P Kumar RJ Harris 3.0
## 20 P Kumar S Dhawan 37.0
## 21 P Kumar S Sohal 6.0
## 22 P Kumar SB Styris 6.0
## 23 P Kumar Shahid Afridi 0.0
## 24 P Kumar TL Suman 22.0
## 25 P Kumar VVS Laxman 5.0
## 26 P Kumar Y Venugopal Rao 1.0
## 27 PP Chawla A Ashish Reddy 2.0
## 28 PP Chawla A Symonds 35.0
## 29 PP Chawla AA Jhunjhunwala 6.0
## 30 PP Chawla AC Gilchrist 4.0
## 31 PP Chawla B Chipli 8.0
## 32 PP Chawla CL White 16.0
## 33 PP Chawla DB Ravi Teja 30.0
## 34 PP Chawla DJ Harris 9.0
## 35 PP Chawla DNT Zoysa 1.0
## 36 PP Chawla HH Gibbs 30.0
## 37 PP Chawla JP Duminy 10.0
## 38 PP Chawla KC Sangakkara 1.0
## 39 PP Chawla MR Marsh 1.0
## 40 PP Chawla PA Patel 4.0
## 41 PP Chawla PA Reddy 8.0
## 42 PP Chawla RG Sharma 50.0
## 43 PP Chawla S Dhawan 33.0
## 44 PP Chawla SB Bangar 1.0
## 45 PP Chawla TL Suman 17.0
## 46 PP Chawla VVS Laxman 7.0
## 47 PP Chawla Y Venugopal Rao 3.0
16 Team Wins and Losses – Summary (all matches against all IPL teams)
The function below computes and plots the number of wins and losses between an IPL team and all other IPL teams in all matches. The summary gives only the wins, losses and ties.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Chennai Super Kings-allMatchesAllOpposition.csv")
csk_matches = pd.read_csv(path)
team1='Chennai Super Kings'
yka.plotWinLossByTeamAllOpposition(csk_matches,team1,plot="summary")
16a Team Wins and Losses – Detailed (all matches against all IPL teams)
The function below computes and plots the number of wins and losses between an IPL team and all other IPL teams in all matches. This gives a breakup of which team won against this team.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Chennai Super Kings-allMatchesAllOpposition.csv")
csk_matches = pd.read_csv(path)
team1='Chennai Super Kings'
yka.plotWinLossByTeamAllOpposition(csk_matches,team1,plot="detailed")
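The difference between the "summary" and "detailed" views can be sketched in plain Python as two ways of tallying the same match results. The records, team names and the `win_loss` helper below are made up purely for illustration; they are not real IPL results and do not reflect how yorkpy stores or computes this.

```python
from collections import Counter

# Hypothetical (opponent, winner) records for "Team A"; winner None means a tie
matches = [
    ("Team B", "Team A"),
    ("Team B", "Team B"),
    ("Team C", "Team A"),
    ("Team D", None),
]


def win_loss(matches, team, report="summary"):
    """Tally results as win/loss/tie (summary) or by winning team (detailed)."""
    counts = Counter()
    for opponent, winner in matches:
        if winner is None:
            key = "tie"
        elif report == "summary":
            key = "win" if winner == team else "loss"
        else:
            key = winner  # detailed view: which team won each match
        counts[key] += 1
    return dict(counts)


print(win_loss(matches, "Team A", report="summary"))
print(win_loss(matches, "Team A", report="detailed"))
```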
16b Team Wins and Losses – Summary (all matches against all IPL teams)
This plot gives the wins vs losses of Mumbai Indians (MI) against all other IPL teams.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Mumbai Indians-allMatchesAllOpposition.csv")
mi_matches = pd.read_csv(path)
team1='Mumbai Indians'
yka.plotWinLossByTeamAllOpposition(mi_matches,team1,plot="summary")
16c Team Wins and Losses – Detailed (all matches against all IPL teams)
The function below computes and plots the number of wins and losses between an IPL team and all other IPL teams in all matches. This gives the breakup of MI wins, losses and ties.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Mumbai Indians-allMatchesAllOpposition.csv")
mi_matches = pd.read_csv(path)
team1='Mumbai Indians'
yka.plotWinLossByTeamAllOpposition(mi_matches,team1,plot="detailed")
17 Team Wins by win type (all matches against all IPL teams)
This function shows how the wins came about, whether by runs or by wickets, in all matches played against all other IPL teams.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Royal Challengers Bangalore-allMatchesAllOpposition.csv")
rcb_matches = pd.read_csv(path)
yka.plotWinsByRunOrWicketsAllOpposition(rcb_matches,'Royal Challengers Bangalore')
18 Team Wins by toss decision (summary) (all matches against all IPL teams)
This shows how Royal Challengers Bangalore fared when it chose to field on winning the toss.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Royal Challengers Bangalore-allMatchesAllOpposition.csv")
rcb_matches = pd.read_csv(path)
yka.plotWinsbyTossDecisionAllOpposition(rcb_matches,'Royal Challengers Bangalore',tossDecision='field',plot='summary')
18a. Team Wins by toss decision (detailed) (all matches against all IPL teams)
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Kings XI Punjab-allMatchesAllOpposition.csv")
kxip_matches = pd.read_csv(path)
yka.plotWinsbyTossDecisionAllOpposition(kxip_matches,'Kings XI Punjab',tossDecision='field',plot='detailed')
19 Team Wins by toss decision (summary) (all matches against all IPL teams)
This plot shows how Mumbai Indians fared when it chose to bat on winning the toss against all other IPL teams.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Mumbai Indians-allMatchesAllOpposition.csv")
mi_matches = pd.read_csv(path)
yka.plotWinsbyTossDecisionAllOpposition(mi_matches,'Mumbai Indians',tossDecision='bat',plot='summary')
20 Team Wins by toss decision (detailed)(all matches against all IPL teams)
This plot shows how Kings XI Punjab fared when it chose to bat on winning the toss.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data2"
path=os.path.join(dir1,"Kings XI Punjab-allMatchesAllOpposition.csv")
kxip_matches = pd.read_csv(path)
yka.plotWinsbyTossDecisionAllOpposition(kxip_matches,'Kings XI Punjab',tossDecision='bat',plot='detailed')
Feel free to clone/download the code from Github yorkpy
Conclusion
This post included analysis of an IPL team against all other IPL teams. You can download the data for this and the earlier posts from [yorkpyData](https://github.com/tvganesh/yorkpyData)
The code can be cloned/downloaded from Github
Important note: Do check out my other posts using yorkpy at yorkpy-posts
To be continued. Watch this space!
Pitching yorkpy…on the middle and outside off-stump to IPL – Part 2
When you come to a fork in the road, take it.
You’ve got to be very careful if you don’t know where you are going, because you might not get there
Yogi Berra
Try taking his (Rahul Dravid’s) wicket in the first 15 minutes. If you can’t then only try to take the remaining wickets
Steve Waugh
Introduction
This post is a follow-up to my previous post, Pitching yorkpy…short of good length to IPL-Part 1, in which I analyzed individual IPL matches. In this 2nd post I analyze the data in all matches between any 2 IPL teams, say CSK-RCB, MI-KKR or DD-RPS and so on. As I have already mentioned, yorkpy is the Python clone of my R package yorkr, and this post is almost a mirror image of my post with yorkr, namely yorkr crashes the IPL party! – Part 2. The function signatures in yorkpy and yorkr are identical, and the functions work in almost the same way. yorkpy, like yorkr, uses data from Cricsheet.
You can clone/download the code at Github yorkpy
This post has been published to RPubs at yorkpy-Part2
You can download this post as PDF at IPLT20-yorkpy-part2
You can download all the data used in this post and the previous post at yorkpyData
Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton yorkpy-template from Github (which is the R Markdown file I have used for the analysis below).
2. Get data for all T20 matches between 2 teams
We can get all IPL T20 matches between any 2 teams using the function below. The dir parameter should point to the folder which has the IPL T20 csv files of the individual matches (see Pitching yorkpy…short of good length to IPL-Part 1). This function creates a data frame of all the IPL T20 matches and also saves the dataframe as a CSV file if save=True. If save=False the dataframe is just returned and not saved.
import pandas as pd
import os
import yorkpy.analytics as yka
#dir1= "C:\\software\\cricket-package\\yorkpyPkg\\yorkpyData\\IPLConverted"
#yka.getAllMatchesBetweenTeams("Kolkata Knight Riders","Delhi Daredevils",dir=dir1,save=True)
3. Save data for all matches between all combination of 2 teams
This can be done locally using the function below. You could use this function to combine all IPL Twenty20 matches between any 2 IPL teams into a single dataframe and save it in the current folder. The dataframes for all combinations have already been generated and are available as CSV files in Github at yorkpyData
import pandas as pd
import os
import yorkpy.analytics as yka
#dir1= "C:\\software\\cricket-package\\yorkpyPkg\\yorkpyData\\IPLConverted"
#yka.saveAllMatchesBetween2IPLTeams(dir1)
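To get a feel for how many files "all combinations of 2 teams" produces, the pairs can be enumerated with `itertools.combinations`. The sketch below uses a subset of 4 teams and follows the `<team1>-<team2>-allMatches.csv` naming seen in the file names used in this post. The order of the two names within each file name is an assumption, and `pair_filenames` is a hypothetical helper, not a yorkpy function.

```python
from itertools import combinations

# A small subset of IPL teams, for illustration only
teams = [
    "Chennai Super Kings",
    "Delhi Daredevils",
    "Kolkata Knight Riders",
    "Mumbai Indians",
]


def pair_filenames(teams):
    """One file name per unordered pair of teams; n teams give n*(n-1)/2 pairs."""
    return [f"{a}-{b}-allMatches.csv" for a, b in combinations(teams, 2)]


for name in pair_filenames(teams):
    print(name)
```

With 4 teams this yields 6 pairs. The actual files in yorkpyData may order the two team names differently, so check the names there before loading.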
Note: In the functions below, I have randomly chosen 2 IPL teams and analyzed how they have performed against each other in different areas. You are free to choose any combination of 2 IPL teams for your analysis.
4.Team Batsmen partnership in Twenty20 (all matches with opposing IPL team – summary)
The function below computes the highest partnerships between the 2 IPL teams Chennai Super Kings and Delhi Daredevils. Any other 2 IPL teams could also have been chosen. With report=‘summary’, the output lists the top batsmen for Delhi Daredevils by total partnership runs, led by Sehwag, Gambhir and Dinesh Karthik.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data1"
path=os.path.join(dir1,"Chennai Super Kings-Delhi Daredevils-allMatches.csv")
csk_dd_matches = pd.read_csv(path)
m=yka.teamBatsmenPartnershiOppnAllMatches(csk_dd_matches,'Delhi Daredevils',report="summary")
print(m)
## batsman totalPartnershipRuns
## 49 V Sehwag 233
## 12 G Gambhir 200
## 21 KD Karthik 180
## 10 DA Warner 134
## 4 AB de Villiers 133
5. Team Batsmen partnership in Twenty20 (all matches with opposing IPL team -detailed)
The function below gives the detailed breakup of partnerships between Deccan Chargers and Mumbai Indians for Deccan Chargers.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data1"
path=os.path.join(dir1,"Deccan Chargers-Mumbai Indians-allMatches.csv")
dc_mi_matches = pd.read_csv(path)
theTeam='Deccan Chargers'
m=yka.teamBatsmenPartnershiOppnAllMatches(dc_mi_matches,theTeam,report="detailed", top=4)
print(m)
## batsman totalPartnershipRuns non_striker partnershipRuns
## 0 AC Gilchrist 201 A Symonds 0
## 1 AC Gilchrist 201 HH Gibbs 53
## 2 AC Gilchrist 201 MD Mishra 0
## 3 AC Gilchrist 201 RG Sharma 20
## 4 AC Gilchrist 201 Shahid Afridi 6
## 5 AC Gilchrist 201 TL Suman 7
## 6 AC Gilchrist 201 VVS Laxman 115
## 7 S Dhawan 122 A Mishra 9
## 8 S Dhawan 122 B Chipli 1
## 9 S Dhawan 122 CL White 2
## 10 S Dhawan 122 DT Christian 52
## 11 S Dhawan 122 IR Jaggi 2
## 12 S Dhawan 122 JP Duminy 9
## 13 S Dhawan 122 KC Sangakkara 16
## 14 S Dhawan 122 PA Patel 22
## 15 S Dhawan 122 S Sohal 9
## 16 RG Sharma 103 A Symonds 11
## 17 RG Sharma 103 AC Gilchrist 18
## 18 RG Sharma 103 DR Smith 6
## 19 RG Sharma 103 HH Gibbs 3
## 20 RG Sharma 103 Jaskaran Singh 15
## 21 RG Sharma 103 KAJ Roach 4
## 22 RG Sharma 103 LPC Silva 0
## 23 RG Sharma 103 TL Suman 14
## 24 RG Sharma 103 Y Venugopal Rao 32
## 25 HH Gibbs 102 AC Gilchrist 40
## 26 HH Gibbs 102 DR Smith 24
## 27 HH Gibbs 102 MD Mishra 27
## 28 HH Gibbs 102 RG Sharma 8
## 29 HH Gibbs 102 VVS Laxman 1
## 30 HH Gibbs 102 Y Venugopal Rao 2
6. Team Batsmen partnership in Twenty20 – Chart (all matches with opposing IPL team)
The function below plots the partnerships in all matches between 2 IPL teams as a chart.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data1"
path=os.path.join(dir1,"Gujarat Lions-Kings XI Punjab-allMatches.csv")
gl_kxip_matches = pd.read_csv(path)
yka.teamBatsmenPartnershipOppnAllMatchesChart(gl_kxip_matches,'Kings XI Punjab','Gujarat Lions', plot=True, top=4, partnershipRuns=20)
7.Team Batsmen partnership in Twenty20 – Dataframe (all matches with opposing IPL team)
This function does not plot the data but returns the dataframe to the user to plot or manipulate.
Note: Many of the plotting functions take additional parameters, e.g. plot, which is either True or False. The default value is plot=True. When plot=True the plot will be displayed. When plot=False the data frame will be returned to the user, who can use it to create interactive charts. The parameter top= specifies the number of top batsmen to include in the chart, and partnershipRuns gives the minimum cutoff runs in partnerships to be considered.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data1"
path=os.path.join(dir1,"Kolkata Knight Riders-Rising Pune Supergiants-allMatches.csv")
kkr_rps_matches = pd.read_csv(path)
m=yka.teamBatsmenPartnershipOppnAllMatchesChart(kkr_rps_matches,'Rising Pune Supergiants','Kolkata Knight Riders', plot=False, top=5, partnershipRuns=20)
print(m)
## batsman non_striker partnershipRuns
## 0 AM Rahane F du Plessis 20
## 1 AM Rahane JA Morkel 16
## 2 AM Rahane NLTC Perera 6
## 3 AM Rahane SPD Smith 25
## 4 AM Rahane UT Khawaja 2
## 5 GJ Bailey IK Pathan 4
## 6 GJ Bailey SS Tiwary 28
## 7 GJ Bailey UT Khawaja 1
## 8 MS Dhoni IK Pathan 5
## 9 MS Dhoni JA Morkel 1
## 10 MS Dhoni NLTC Perera 2
## 11 MS Dhoni R Ashwin 1
## 12 MS Dhoni R Bhatia 22
## 13 SPD Smith AM Rahane 31
## 14 NLTC Perera AM Rahane 12
## 15 NLTC Perera MS Dhoni 13
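The printed rows include pairs well below 20 runs, so the partnershipRuns cutoff does not seem to apply to individual pairs. One reading that is consistent with the output is that the cutoff applies to each batsman's total partnership runs, after which the top batsmen by total are kept. The sketch below illustrates that reading in plain Python using the rows above. It is an inference from the output, not taken from yorkpy's source, and `top_partnerships` is a hypothetical helper.

```python
from collections import defaultdict

# (batsman, non_striker, partnershipRuns) rows copied from the output above
pairs = [
    ("AM Rahane", "F du Plessis", 20), ("AM Rahane", "JA Morkel", 16),
    ("AM Rahane", "NLTC Perera", 6), ("AM Rahane", "SPD Smith", 25),
    ("AM Rahane", "UT Khawaja", 2), ("GJ Bailey", "IK Pathan", 4),
    ("GJ Bailey", "SS Tiwary", 28), ("GJ Bailey", "UT Khawaja", 1),
    ("MS Dhoni", "IK Pathan", 5), ("MS Dhoni", "JA Morkel", 1),
    ("MS Dhoni", "NLTC Perera", 2), ("MS Dhoni", "R Ashwin", 1),
    ("MS Dhoni", "R Bhatia", 22), ("SPD Smith", "AM Rahane", 31),
    ("NLTC Perera", "AM Rahane", 12), ("NLTC Perera", "MS Dhoni", 13),
]


def top_partnerships(pairs, top, min_total):
    """Keep rows of the `top` batsmen whose total partnership runs exceed min_total."""
    totals = defaultdict(int)
    for batsman, _, runs in pairs:
        totals[batsman] += runs
    # Assumed rule: strict cutoff on the total, then rank by total (descending)
    eligible = [b for b in totals if totals[b] > min_total]
    ranked = sorted(eligible, key=lambda b: -totals[b])[:top]
    return [row for b in ranked for row in pairs if row[0] == b]


for row in top_partnerships(pairs, top=2, min_total=20):
    print(row)
```

With top=2 only the rows for the two batsmen with the highest totals remain. Whether the real cutoff is strict or inclusive is not known from the output.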
8. Team batsmen versus bowler in Twenty20-Chart (all matches with opposing IPL team)
The plots below provide information on how each of the top batsmen of the IPL teams fared against the opposition bowlers
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data1"
path=os.path.join(dir1,"Rajasthan Royals-Royal Challengers Bangalore-allMatches.csv")
rr_rcb_matches = pd.read_csv(path)
yka.teamBatsmenVsBowlersOppnAllMatches(rr_rcb_matches,'Rajasthan Royals',"Royal Challengers Bangalore",plot=True,top=3,runsScored=20)
9 Team batsmen versus bowler in Twenty20-Dataframe (all matches with opposing IPL team)
With plot=False, the function returns a dataframe of the runs scored by each of the top batsmen against each opposition bowler, here for Delhi Daredevils batsmen against Mumbai Indians bowlers.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data1"
path=os.path.join(dir1,"Mumbai Indians-Delhi Daredevils-allMatches.csv")
mi_dd_matches = pd.read_csv(path)
m=yka.teamBatsmenVsBowlersOppnAllMatches(mi_dd_matches,'Delhi Daredevils',"Mumbai Indians",plot=False,top=2,runsScored=50)
print(m)
## batsman bowler runsScored
## 0 V Sehwag A Nehra 6.0
## 1 V Sehwag AG Murtaza 6.0
## 2 V Sehwag AM Nayar 14.0
## 3 V Sehwag CJ McKay 10.0
## 4 V Sehwag CRD Fernando 9.0
## 5 V Sehwag DJ Bravo 9.0
## 6 V Sehwag DJ Thornely 0.0
## 7 V Sehwag DR Smith 13.0
## 8 V Sehwag DS Kulkarni 20.0
## 9 V Sehwag Harbhajan Singh 54.0
## 10 V Sehwag JJ Bumrah 19.0
## 11 V Sehwag KA Pollard 37.0
## 12 V Sehwag MM Patel 27.0
## 13 V Sehwag PP Ojha 7.0
## 14 V Sehwag R Shukla 9.0
## 15 V Sehwag RJ Peterson 7.0
## 16 V Sehwag RP Singh 28.0
## 17 V Sehwag SL Malinga 32.0
## 18 V Sehwag SM Pollock 25.0
## 19 V Sehwag ST Jayasuriya 29.0
## 20 V Sehwag Z Khan 14.0
## 21 JP Duminy CJ Anderson 3.0
## 22 JP Duminy HH Pandya 7.0
## 23 JP Duminy Harbhajan Singh 29.0
## 24 JP Duminy J Suchith 5.0
## 25 JP Duminy JJ Bumrah 70.0
## 26 JP Duminy KA Pollard 29.0
## 27 JP Duminy KH Pandya 8.0
## 28 JP Duminy M de Lange 6.0
## 29 JP Duminy MJ McClenaghan 14.0
## 30 JP Duminy N Rana 1.0
## 31 JP Duminy PP Ojha 16.0
## 32 JP Duminy R Vinay Kumar 18.0
## 33 JP Duminy RG Sharma 3.0
## 34 JP Duminy S Gopal 8.0
## 35 JP Duminy SL Malinga 8.0
## 36 JP Duminy TG Southee 3.0
10. Team batting scorecard(all matches with opposing IPL team)
This function provides the overall scorecard for an IPL team in all matches against another IPL team. In the snippet below the batting scorecard of RCB is shown against CSK. Kohli, Gayle and de Villiers lead the pack.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data1"
path=os.path.join(dir1,"Royal Challengers Bangalore-Chennai Super Kings-allMatches.csv")
rcb_csk_matches = pd.read_csv(path)
scorecard=yka.teamBattingScorecardOppnAllMatches(rcb_csk_matches,'Royal Challengers Bangalore',"Chennai Super Kings")
print(scorecard)
## batsman runs balls 4s 6s SR
## 5 V Kohli 706 570 51 30 123.859649
## 20 CH Gayle 270 228 12 23 118.421053
## 19 AB de Villiers 241 157 26 9 153.503185
## 6 R Dravid 133 117 18 0 113.675214
## 3 JH Kallis 123 113 21 0 108.849558
## 22 MA Agarwal 120 104 15 4 115.384615
## 2 LRPL Taylor 117 102 5 6 114.705882
## 11 RV Uthappa 115 77 7 8 149.350649
## 21 SS Tiwary 86 88 4 3 97.727273
## 17 MK Pandey 73 72 10 0 101.388889
## 32 KD Karthik 61 58 9 0 105.172414
## 34 D Wiese 51 43 4 2 118.604651
## 33 SN Khan 50 36 5 1 138.888889
## 1 W Jaffer 50 36 5 2 138.888889
## 7 P Kumar 39 25 2 2 156.000000
## 28 Yuvraj Singh 38 33 2 1 115.151515
## 4 MV Boucher 37 33 4 1 112.121212
## 23 LA Pomersbach 31 21 2 2 147.619048
## 8 Z Khan 29 27 3 0 107.407407
## 12 KP Pietersen 23 15 2 1 153.333333
## 38 CL White 21 13 2 1 161.538462
## 26 YV Takawale 19 17 4 0 111.764706
## 31 MS Bisla 17 14 3 0 121.428571
## 14 R Vinay Kumar 17 10 1 1 170.000000
## 25 RR Rossouw 15 13 1 1 115.384615
## 40 AUK Pathan 14 6 2 1 233.333333
## 42 JJ van der Wath 14 11 1 1 127.272727
## 27 VH Zol 13 12 0 1 108.333333
## 30 MA Starc 13 16 1 0 81.250000
## 24 MC Henriques 12 4 3 0 300.000000
## 44 A Mithun 11 8 2 0 137.500000
## 50 PA Patel 10 14 2 0 71.428571
## 36 SP Goswami 10 19 1 0 52.631579
## 0 B Chipli 8 12 1 0 66.666667
## 9 B Akhil 8 12 1 0 66.666667
## 29 S Rana 6 8 0 0 75.000000
## 16 RE van der Merwe 5 12 0 0 41.666667
## 49 KB Arun Karthik 5 5 0 0 100.000000
## 54 Mandeep Singh 4 7 0 0 57.142857
## 37 Misbah-ul-Haq 4 6 0 0 66.666667
## 52 NJ Maddinson 4 7 1 0 57.142857
## 51 AN Ahmed 4 1 1 0 400.000000
## 15 A Kumble 3 6 0 0 50.000000
## 43 DL Vettori 3 4 0 0 75.000000
## 47 DT Christian 2 2 0 0 100.000000
## 45 J Syed Mohammad 2 3 0 0 66.666667
## 35 HV Patel 2 5 0 0 40.000000
## 41 CA Pujara 2 6 0 0 33.333333
## 10 DW Steyn 1 5 0 0 20.000000
## 18 EJG Morgan 1 4 0 0 25.000000
## 46 RR Bhatkal 0 2 0 0 0.000000
## 48 R Rampaul 0 6 0 0 0.000000
## 13 R Bishnoi 0 1 0 0 0.000000
## 39 TM Dilshan 0 1 0 0 0.000000
## 53 Iqbal Abdulla 0 3 0 0 0.000000
## 55 S Aravind 0 1 0 0 0.000000
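The SR column in the scorecard above is the batting strike rate, that is, runs scored per 100 balls faced. The sketch below recomputes it for a few rows copied from the table. The helper name strike_rate is my own for illustration and is not part of yorkpy.

```python
def strike_rate(runs: int, balls: int) -> float:
    """Runs scored per 100 balls faced; 0.0 when no balls were faced."""
    if balls == 0:
        return 0.0
    return runs / balls * 100


# (batsman, runs, balls) rows copied from the scorecard above
rows = [
    ("V Kohli", 706, 570),
    ("CH Gayle", 270, 228),
    ("AB de Villiers", 241, 157),
]

for batsman, runs, balls in rows:
    print(f"{batsman:15s} {strike_rate(runs, balls):.6f}")
```

The printed values should agree with the SR column for the same batsmen.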
11. Team Bowling scorecard (all matches with opposing IPL team)
The output below gives the bowling scorecard from all matches between Rajasthan Royals and Kolkata Knight Riders.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data1"
path=os.path.join(dir1,"Kolkata Knight Riders-Rajasthan Royals-allMatches.csv")
kkr_rr_matches = pd.read_csv(path)
scorecard=yka.teamBowlingScorecardOppnAllMatches(kkr_rr_matches,'Rajasthan Royals',"Kolkata Knight Riders")
print(scorecard)
## bowler overs runs maidens wicket econrate
## 31 Shakib Al Hasan 25 153 0 9 6.120000
## 12 I Sharma 15 118 0 6 7.866667
## 33 Umar Gul 8 61 0 6 7.625000
## 29 SP Narine 24 155 0 6 6.458333
## 1 AB Dinda 20 126 0 6 6.300000
## 23 R Vinay Kumar 8 72 0 5 9.000000
## 22 R Bhatia 15 104 0 5 6.933333
## 0 AB Agarkar 12 105 0 4 8.750000
## 17 LR Shukla 12 87 0 4 7.250000
## 6 B Lee 15 90 0 4 6.000000
## 3 AD Russell 7 59 0 4 8.428571
## 34 YK Pathan 8 61 0 4 7.625000
## 14 JD Unadkat 4 26 0 3 6.500000
## 15 JH Kallis 20 149 0 3 7.450000
## 16 L Balaji 11 73 0 3 6.636364
## 27 SE Bond 8 52 1 3 6.500000
## 10 CK Langeveldt 4 15 0 3 3.750000
## 13 Iqbal Abdulla 10 70 0 3 7.000000
## 28 SMSM Senanayake 4 26 0 2 6.500000
## 7 BAW Mendis 4 19 0 2 4.750000
## 18 M Kartik 8 56 0 2 7.000000
## 4 Anureet Singh 4 35 0 2 8.750000
## 32 UT Yadav 7 67 0 2 9.571429
## 30 SS Sarkar 3 15 0 1 5.000000
## 26 SC Ganguly 6 61 0 1 10.166667
## 5 Azhar Mahmood 3 41 0 1 13.666667
## 19 M Morkel 8 78 0 1 9.750000
## 11 DJ Hussey 2 26 0 0 13.000000
## 2 AD Mathews 3 33 0 0 11.000000
## 8 BJ Hodge 2 34 0 0 17.000000
## 25 S Narwal 2 17 0 0 8.500000
## 24 RN ten Doeschate 2 14 0 0 7.000000
## 21 PP Chawla 4 39 0 0 9.750000
## 20 Mohammed Shami 3 26 0 0 8.666667
## 9 CH Gayle 4 20 0 0 5.000000
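The econrate column above is the economy rate, runs conceded per over bowled. A minimal sketch that recomputes it for a few rows of the table follows. The function name economy_rate is an illustrative assumption, not a yorkpy function.

```python
def economy_rate(runs_conceded: int, overs: float) -> float:
    """Runs conceded per over; 0.0 when no overs were bowled."""
    if overs == 0:
        return 0.0
    return runs_conceded / overs


# (bowler, overs, runs) rows copied from the scorecard above
rows = [
    ("Shakib Al Hasan", 25, 153),
    ("I Sharma", 15, 118),
    ("CK Langeveldt", 4, 15),
]

for bowler, overs, runs in rows:
    print(f"{bowler:16s} {economy_rate(runs, overs):.6f}")
```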
12. Team Bowling wicket kind - Chart (all matches with opposing IPL team)
This function computes and displays the kinds of wickets taken (bowled, caught, lbw, etc.) by an IPL team in all matches against another IPL team.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data1"
path=os.path.join(dir1,"Chennai Super Kings-Rajasthan Royals-allMatches.csv")
csk_rr_matches = pd.read_csv(path)
yka.teamBowlingWicketKindOppositionAllMatches(csk_rr_matches,'Chennai Super Kings','Rajasthan Royals',plot=True,top=5,wickets=1)
13. Team Bowling wicket kind - Dataframe (all matches with opposing IPL team)
With plot=False the same function returns the kinds of wickets taken as a dataframe instead of a chart.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data1"
path=os.path.join(dir1,"Delhi Daredevils-Pune Warriors-allMatches.csv")
dd_pw_matches = pd.read_csv(path)
m=yka.teamBowlingWicketKindOppositionAllMatches(dd_pw_matches,'Pune Warriors','Delhi Daredevils',plot=False,top=4,wickets=1)
print(m)
## bowler kind wickets
## 0 IK Pathan bowled 1
## 1 IK Pathan caught 3
## 2 M Morkel bowled 1
## 3 M Morkel caught 3
## 4 S Nadeem bowled 1
## 5 S Nadeem caught 2
## 6 UT Yadav caught 3
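To make the shape of this dataframe concrete, the sketch below counts wicket kinds per bowler from a handful of made-up dismissal records. It is not how yorkpy computes the result. The records and the helper wicket_kinds are hypothetical and only illustrate the grouping that produces bowler/kind/wickets rows.

```python
from collections import Counter

# Hypothetical dismissal records: (bowler, kind of wicket)
dismissals = [
    ("IK Pathan", "caught"),
    ("IK Pathan", "bowled"),
    ("IK Pathan", "caught"),
    ("M Morkel", "caught"),
    ("UT Yadav", "caught"),
    ("IK Pathan", "caught"),
]


def wicket_kinds(records):
    """Return sorted (bowler, kind, wickets) rows from (bowler, kind) pairs."""
    counts = Counter(records)
    return sorted((bowler, kind, n) for (bowler, kind), n in counts.items())


for row in wicket_kinds(dismissals):
    print(row)
```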
14. Team Bowler vs Batsman - Plot (all matches with opposing IPL team)
The function below plots the performance of a team's bowlers against the batsmen of another IPL team in all matches between the two teams.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data1"
path=os.path.join(dir1,"Sunrisers Hyderabad-Kolkata Knight Riders-allMatches.csv")
srh_kkr_matches = pd.read_csv(path)
yka.teamBowlersVsBatsmenOppnAllMatches(srh_kkr_matches,'Sunrisers Hyderabad','Kolkata Knight Riders',plot=True,top=5,runsConceded=10)
15. Team Bowler vs Batsman - Dataframe (all matches with opposing IPL team)
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data1"
path=os.path.join(dir1,"Royal Challengers Bangalore-Kings XI Punjab-allMatches.csv")
rcb_kxip_matches = pd.read_csv(path)
m=yka.teamBowlersVsBatsmenOppnAllMatches(rcb_kxip_matches,'Royal Challengers Bangalore','Kings XI Punjab',plot=False,top=1,runsConceded=30)
print(m)
## bowler batsman runsConceded
## 0 PP Chawla A Kumble 1
## 1 PP Chawla A Mithun 1
## 2 PP Chawla AB McDonald 3
## 3 PP Chawla AB de Villiers 29
## 4 PP Chawla CA Pujara 13
## 5 PP Chawla CH Gayle 62
## 6 PP Chawla CK Langeveldt 1
## 7 PP Chawla CL White 3
## 8 PP Chawla DL Vettori 1
## 9 PP Chawla DT Patil 4
## 10 PP Chawla JH Kallis 17
## 11 PP Chawla JJ van der Wath 1
## 12 PP Chawla KB Arun Karthik 4
## 13 PP Chawla KP Pietersen 14
## 14 PP Chawla LRPL Taylor 6
## 15 PP Chawla M Kaif 2
## 16 PP Chawla MK Pandey 10
## 17 PP Chawla MV Boucher 9
## 18 PP Chawla Misbah-ul-Haq 0
## 19 PP Chawla P Kumar 0
## 20 PP Chawla R Dravid 28
## 21 PP Chawla RE van der Merwe 7
## 22 PP Chawla RV Uthappa 19
## 23 PP Chawla SS Tiwary 6
## 24 PP Chawla V Kohli 56
## 25 PP Chawla Z Khan 0
16. Team Wins and Losses (all matches with opposing IPL team)
The function below computes and plots the number of wins and losses in head-to-head matches between two IPL teams.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data1"
path=os.path.join(dir1,"Chennai Super Kings-Delhi Daredevils-allMatches.csv")
csk_dd_matches = pd.read_csv(path)
yka.plotWinLossBetweenTeams(csk_dd_matches,'Chennai Super Kings','Delhi Daredevils')
17. Team Wins by win type (all matches with opposing IPL team)
This function shows how each win happened, whether by runs or by wickets.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data1"
path=os.path.join(dir1,"Chennai Super Kings-Delhi Daredevils-allMatches.csv")
csk_dd_matches = pd.read_csv(path)
yka.plotWinsByRunOrWickets(csk_dd_matches,'Chennai Super Kings')
18. Team Wins by toss decision - field (all matches with opposing IPL team)
This plot shows how Rajasthan Royals fared when it chose to field on winning the toss.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data1"
path=os.path.join(dir1,"Rajasthan Royals-Kings XI Punjab-allMatches.csv")
rr_kxip_matches = pd.read_csv(path)
yka.plotWinsbyTossDecision(rr_kxip_matches,'Rajasthan Royals',tossDecision='field')
19. Team Wins by toss decision - bat (all matches with opposing IPL team)
This plot shows how Mumbai Indians fared when it chose to bat on winning the toss.
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data1"
path=os.path.join(dir1,"Mumbai Indians-Royal Challengers Bangalore-allMatches.csv")
mi_rcb_matches = pd.read_csv(path)
yka.plotWinsbyTossDecision(mi_rcb_matches,'Mumbai Indians',tossDecision='bat')
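The idea behind the toss-decision plots can be sketched in plain Python: keep only the matches where the team won the toss and made the given decision, then count how many of those it went on to win. The records and key names below are hypothetical and do not reflect the column names of the yorkpy CSV files. The helper wins_by_toss_decision is likewise an illustration, not the library's implementation.

```python
# Hypothetical match summaries; the keys are illustrative only
matches = [
    {"toss_winner": "Rajasthan Royals", "toss_decision": "field", "winner": "Rajasthan Royals"},
    {"toss_winner": "Rajasthan Royals", "toss_decision": "field", "winner": "Kings XI Punjab"},
    {"toss_winner": "Rajasthan Royals", "toss_decision": "bat", "winner": "Rajasthan Royals"},
    {"toss_winner": "Kings XI Punjab", "toss_decision": "field", "winner": "Kings XI Punjab"},
]


def wins_by_toss_decision(matches, team, toss_decision):
    """Count wins and losses for `team` when it won the toss and chose `toss_decision`.

    Matches without a winner (ties, no results) would be counted as lost here;
    a fuller version would treat them separately.
    """
    relevant = [
        m for m in matches
        if m["toss_winner"] == team and m["toss_decision"] == toss_decision
    ]
    wins = sum(1 for m in relevant if m["winner"] == team)
    return {"won": wins, "lost": len(relevant) - wins}


result = wins_by_toss_decision(matches, "Rajasthan Royals", "field")
print(result)
```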
Feel free to clone/download the code from Github yorkpy
Important note: Do check out my other posts using yorkpy at yorkpy-posts
Conclusion
This post included analysis of all IPL matches between any 2 IPL teams. The data for the analysis can be downloaded from [yorkpyData](https://github.com/tvganesh/yorkpyData)
To be continued. Watch this space!
Also see
1. My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon
2. My book ‘Deep Learning from first principles:Second Edition’ now on Amazon
3. Spicing up a IBM Bluemix cloud app with MongoDB and NodeExpress
4. Introducing cricpy:A python package to analyze performances of cricketers
5. Introducing cricket package yorkr: Part 3-Foxed by flight!
6. Hand detection through Haartraining: A hands-on approach
To see all posts click Index of posts
My presentations on ‘Elements of Neural Networks & Deep Learning’ -Parts 6,7,8
This is the final set of presentations in my series ‘Elements of Neural Networks and Deep Learning’. This set follows the earlier 2 sets of presentations namely
1. My presentations on ‘Elements of Neural Networks & Deep Learning’ -Part1,2,3
2. My presentations on ‘Elements of Neural Networks & Deep Learning’ -Parts 4,5
In this final set of presentations I discuss initialization methods and regularization techniques including dropout. Next I discuss gradient descent optimization methods like momentum, rmsprop, adam etc. Lastly, I briefly touch on hyper-parameter tuning approaches. The corresponding implementations in vectorized R, Python and Octave are available in my book ‘Deep Learning from first principles:Second edition- In vectorized Python, R and Octave‘
1. Elements of Neural Networks and Deep Learning – Part 6
This part discusses initialization methods, specifically He and Xavier. The presentation also focuses on how to prevent over-fitting using regularization. Lastly, the dropout method of regularization is also discussed.
The corresponding implementations in vectorized R, Python and Octave of the above discussed methods are available in my post Deep Learning from first principles in Python, R and Octave – Part 6
2. Elements of Neural Networks and Deep Learning – Part 7
This presentation introduces exponentially weighted moving average and shows how this is used in different approaches to gradient descent optimization. The key techniques discussed are learning rate decay, momentum method, rmsprop and adam.
The equivalent implementations of the gradient descent optimization techniques in R, Python and Octave can be seen in my post Deep Learning from first principles in Python, R and Octave – Part 7
3. Elements of Neural Networks and Deep Learning – Part 8
This last part touches upon hyper-parameter tuning in Deep Learning networks
This concludes this series of presentations on ‘Elements of Neural Networks and Deep Learning’
Important note: Do check out my later version of these videos at Take 4+: Presentations on ‘Elements of Neural Networks and Deep Learning’ – Parts 1-8. These have more content and also include some corrections. Check it out!
Check out my book ‘Deep Learning from first principles: Second Edition – In vectorized Python, R and Octave’. My book starts with the implementation of a simple 2-layer Neural Network and works its way to a generic L-Layer Deep Learning Network, with all the bells and whistles. The derivations have been discussed in detail. The code has been extensively commented and included in its entirety in the Appendix sections. My book is available on Amazon as paperback ($18.99) and in kindle version ($9.99/Rs449).
See also
1. My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon
2. Big Data-1: Move into the big league:Graduate from Python to Pyspark
3. My travels through the realms of Data Science, Machine Learning, Deep Learning and (AI)
4. Revisiting crimes against women in India
5. Introducing cricket package yorkr: Part 1- Beaten by sheer pace!
6. Deblurring with OpenCV: Weiner filter reloaded
7. Taking a closer look at Quantum gates and their operations
To see all posts click Index of posts
Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems. – Jamie Zawinski
Some programmers, when confronted with a problem, think “I know, I’ll use floating point arithmetic.” Now they have 1.999999999997 problems. – @tomscott
Some people, when confronted with a problem, think “I know, I’ll use multithreading”. Nothhw tpe yawrve o oblems. – @d6
Some people, when confronted with a problem, think “I know, I’ll use versioning.” Now they have 2.1.0 problems. – @JaesCoyle
Some people, when faced with a problem, think, “I know, I’ll use binary.” Now they have 10 problems. – @nedbat
Introduction
The power of Spark, which operates on in-memory datasets, comes from storing data as collections called Resilient Distributed Datasets (RDDs), which are themselves distributed in partitions across a cluster. RDDs are a fast way of processing data, as the data is operated on in parallel based on the map-reduce paradigm. RDDs can be used when the operations are low level, and are typically used on unstructured data like logs or text. For structured and semi-structured data, Spark has a higher-level abstraction called Dataframes. Handling data through Dataframes is extremely fast as the queries are optimized by the Catalyst optimization engine, and the performance can be orders of magnitude faster than RDDs. In addition, Dataframes use Tungsten, which handles memory management and garbage collection more effectively.
The picture below shows the performance improvement achieved with Dataframes over RDDs
Benefits from Project Tungsten

Note: The above data and graph are taken from the course Big Data Analysis with Apache Spark at edX, UC Berkeley
This post is a continuation of my 2 earlier posts
1. Big Data-1: Move into the big league:Graduate from Python to Pyspark
2. Big Data-2: Move into the big league:Graduate from R to SparkR
In this post I perform equivalent operations on a small dataset using RDDs, Dataframes in Pyspark & SparkR and HiveQL. As in some of my earlier posts, I have used the tendulkar.csv file for this post. The dataset is small and allows me to do almost everything, from data cleaning and data transformation to grouping.
You can clone or fork the notebooks from Github at Big Data:Part 3
The notebooks have also been published and can be accessed below
1. RDD – Select all columns of tables
1b.RDD – Select columns 1 to 4
[['Runs', 'Mins', 'BF', '4s'],
 ['15', '28', '24', '2'],
 ['DNB', '-', '-', '-'],
 ['59', '254', '172', '4'],
 ['8', '24', '16', '1']]
1c. RDD – Select specific columns 0, 10
[('Ground', 'Runs'),
 ('Karachi', '15'),
 ('Karachi', 'DNB'),
 ('Faisalabad', '59'),
 ('Faisalabad', '8')]
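The two RDD outputs above are what one gets by mapping every row to a slice or to a tuple of chosen fields. The sketch below shows that idea in plain Python so that it runs without a Spark installation. The comments name the corresponding standard RDD call (map), but the exact code in the notebooks may differ. The sample rows are copied from the tendulkar.csv output shown in this post.

```python
# Rows as lists of strings, the shape an RDD has after each CSV line is split on ","
rows = [
    ["Runs", "Mins", "BF", "4s", "6s", "SR", "Pos", "Dismissal",
     "Inns", "Opposition", "Ground", "Start Date"],
    ["15", "28", "24", "2", "0", "62.5", "6", "bowled",
     "2", "v Pakistan", "Karachi", "15-Nov-89"],
    ["DNB", "-", "-", "-", "-", "-", "-", "-",
     "4", "v Pakistan", "Karachi", "15-Nov-89"],
]

# Roughly rdd.map(lambda row: row[0:4]) in PySpark: keep the first four columns
first_four = [row[0:4] for row in rows]

# Roughly rdd.map(lambda row: (row[10], row[0])): pick Ground (10) and Runs (0)
ground_runs = [(row[10], row[0]) for row in rows]

print(first_four)
print(ground_runs)
```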
2. Dataframe:Pyspark – Select all columns
+----+----+---+---+---+-----+---+---------+----+----------+----------+----------+
|Runs|Mins| BF| 4s| 6s|   SR|Pos|Dismissal|Inns|Opposition|    Ground|Start Date|
+----+----+---+---+---+-----+---+---------+----+----------+----------+----------+
|  15|  28| 24|  2|  0| 62.5|  6|   bowled|   2|v Pakistan|   Karachi| 15-Nov-89|
| DNB|   -|  -|  -|  -|    -|  -|        -|   4|v Pakistan|   Karachi| 15-Nov-89|
|  59| 254|172|  4|  0| 34.3|  6|      lbw|   1|v Pakistan|Faisalabad| 23-Nov-89|
|   8|  24| 16|  1|  0|   50|  6|  run out|   3|v Pakistan|Faisalabad| 23-Nov-89|
|  41| 124| 90|  5|  0|45.55|  7|   bowled|   1|v Pakistan|    Lahore|  1-Dec-89|
+----+----+---+---+---+-----+---+---------+----+----------+----------+----------+
only showing top 5 rows
2a. Dataframe:Pyspark- Select specific columns
+----+---+----+
|Runs| BF|Mins|
+----+---+----+
|  15| 24|  28|
| DNB|  -|   -|
|  59|172| 254|
|   8| 16|  24|
|  41| 90| 124|
+----+---+----+
3. Dataframe:SparkR – Select all columns
3a. Dataframe:SparkR- Select specific columns
1   15  24   28
2  DNB   -    -
3   59 172  254
4    8  16   24
5   41  90  124
6   35  51   74
4. Hive QL – Select all columns
+----+----+---+---+---+-----+---+---------+----+----------+----------+----------+
|Runs|Mins|BF |4s |6s |SR   |Pos|Dismissal|Inns|Opposition|Ground    |Start Date|
+----+----+---+---+---+-----+---+---------+----+----------+----------+----------+
|15  |28  |24 |2  |0  |62.5 |6  |bowled   |2   |v Pakistan|Karachi   |15-Nov-89 |
|DNB |-   |-  |-  |-  |-    |-  |-        |4   |v Pakistan|Karachi   |15-Nov-89 |
|59  |254 |172|4  |0  |34.3 |6  |lbw      |1   |v Pakistan|Faisalabad|23-Nov-89 |
|8   |24  |16 |1  |0  |50   |6  |run out  |3   |v Pakistan|Faisalabad|23-Nov-89 |
|41  |124 |90 |5  |0  |45.55|7  |bowled   |1   |v Pakistan|Lahore    |1-Dec-89  |
+----+----+---+---+---+-----+---+---------+----+----------+----------+----------+
4a. Hive QL – Select specific columns
+----+---+----+
|Runs|BF |Mins|
+----+---+----+
|15  |24 |28  |
|DNB |-  |-   |
|59  |172|254 |
|8   |16 |24  |
|41  |90 |124 |
+----+---+----+
5. RDD – Filter rows on specific condition
[['Runs', 'Mins', 'BF', '4s', '6s', 'SR', 'Pos', 'Dismissal', 'Inns', 'Opposition', 'Ground', 'Start Date'],
 ['15', '28', '24', '2', '0', '62.5', '6', 'bowled', '2', 'v Pakistan', 'Karachi', '15-Nov-89'],
 ['DNB', '-', '-', '-', '-', '-', '-', '-', '4', 'v Pakistan', 'Karachi', '15-Nov-89'],
 ['59', '254', '172', '4', '0', '34.3', '6', 'lbw', '1', 'v Pakistan', 'Faisalabad', '23-Nov-89'],
 ['8', '24', '16', '1', '0', '50', '6', 'run out', '3', 'v Pakistan', 'Faisalabad', '23-Nov-89']]
5a. Dataframe:Pyspark – Filter rows on specific condition
+----+----+---+---+---+-----+---+---------+----+----------+----------+----------+
|Runs|Mins| BF| 4s| 6s|   SR|Pos|Dismissal|Inns|Opposition|    Ground|Start Date|
+----+----+---+---+---+-----+---+---------+----+----------+----------+----------+
|  15|  28| 24|  2|  0| 62.5|  6|   bowled|   2|v Pakistan|   Karachi| 15-Nov-89|
|  59| 254|172|  4|  0| 34.3|  6|      lbw|   1|v Pakistan|Faisalabad| 23-Nov-89|
|   8|  24| 16|  1|  0|   50|  6|  run out|   3|v Pakistan|Faisalabad| 23-Nov-89|
|  41| 124| 90|  5|  0|45.55|  7|   bowled|   1|v Pakistan|    Lahore|  1-Dec-89|
|  35|  74| 51|  5|  0|68.62|  6|      lbw|   1|v Pakistan|   Sialkot|  9-Dec-89|
+----+----+---+---+---+-----+---+---------+----+----------+----------+----------+
only showing top 5 rows
5b. Dataframe:SparkR – Filter rows on specific condition
5c. Hive QL – Filter rows on specific condition
+----+---+----+
|Runs|BF |Mins|
+----+---+----+
|15  |24 |28  |
|59  |172|254 |
|8   |16 |24  |
|41  |90 |124 |
|35  |51 |74  |
|57  |134|193 |
|0   |1  |1   |
|24  |44 |50  |
|88  |266|324 |
|5   |13 |15  |
+----+---+----+
only showing top 10 rows
6. RDD – Find rows where Runs > 50
6a. Dataframe:Pyspark – Find rows where Runs >50
from pyspark.sql import SparkSession
+----+----+---+---+---+-----+---+---------+----+--------------+------------+----------+
|Runs|Mins| BF| 4s| 6s|   SR|Pos|Dismissal|Inns|    Opposition|      Ground|Start Date|
+----+----+---+---+---+-----+---+---------+----+--------------+------------+----------+
|  59| 254|172|  4|  0| 34.3|  6|      lbw|   1|    v Pakistan|  Faisalabad| 23-Nov-89|
|  57| 193|134|  6|  0|42.53|  6|   caught|   3|    v Pakistan|     Sialkot|  9-Dec-89|
|  88| 324|266|  5|  0|33.08|  6|   caught|   1| v New Zealand|      Napier|  9-Feb-90|
|  68| 216|136|  8|  0|   50|  6|   caught|   2|     v England|  Manchester|  9-Aug-90|
| 114| 228|161| 16|  0| 70.8|  4|   caught|   2|   v Australia|       Perth|  1-Feb-92|
| 111| 373|270| 19|  0|41.11|  4|   caught|   2|v South Africa|Johannesburg| 26-Nov-92|
|  73| 272|208|  8|  1|35.09|  5|   caught|   2|v South Africa|   Cape Town|  2-Jan-93|
|  50| 158|118|  6|  0|42.37|  4|   caught|   1|     v England|     Kolkata| 29-Jan-93|
| 165| 361|296| 24|  1|55.74|  4|   caught|   1|     v England|     Chennai| 11-Feb-93|
|  78| 285|213| 10|  0|36.61|  4|      lbw|   2|     v England|      Mumbai| 19-Feb-93|
+----+----+---+---+---+-----+---+---------+----+--------------+------------+----------+
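Because the Runs column holds text such as DNB alongside numbers, a numeric filter has to convert or skip such entries first. The output above also includes an innings of exactly 50, which suggests the condition applied was Runs >= 50. That is an inference from the output, not something the post states. The sketch below shows the convert-then-filter step in plain Python. The helper to_runs and the handling of a trailing "*" for not-out scores are my own assumptions.

```python
def to_runs(value):
    """Convert a Runs entry to int, or None for non-numeric entries such as 'DNB'.

    A trailing '*' (not out) is stripped before conversion; this is an assumption
    about how such scores are written in the data.
    """
    cleaned = value.rstrip("*")
    return int(cleaned) if cleaned.isdigit() else None


innings = ["15", "DNB", "59", "8", "50", "100*"]

# Keep innings of 50 or more; non-numeric entries are treated as 0 and dropped
fifty_plus = [v for v in innings if (to_runs(v) or 0) >= 50]
print(fifty_plus)
```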
6b. Dataframe:SparkR – Find rows where Runs >50
7. RDD – groupByKey() and reduceByKey()
('Lahore', 17.0),
('Adelaide', 32.6),
('Colombo (SSC)', 77.55555555555556),
('Nagpur', 64.66666666666667),
('Auckland', 5.0),
('Bloemfontein', 85.0),
('Centurion', 73.5),
('Faisalabad', 27.0),
('Bridgetown', 26.0)]
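A per-ground mean like the one above is commonly obtained with reduceByKey by carrying a (sum, count) pair for each key and dividing at the end. The plain-Python sketch below mimics that pattern so it can be run without Spark. The function reduce_by_key is a stand-in I wrote for illustration, and the sample pairs are toy data rather than the full dataset, so the means differ from the output above.

```python
def reduce_by_key(pairs, func):
    """Mimic the behaviour of RDD.reduceByKey on a list of (key, value) pairs."""
    out = {}
    for key, value in pairs:
        out[key] = func(out[key], value) if key in out else value
    return out


# Toy (ground, runs) pairs
pairs = [("Karachi", 15), ("Faisalabad", 59), ("Faisalabad", 8), ("Lahore", 41)]

# Carry (sum, count) per ground so the mean can be derived after the reduction
sum_count = reduce_by_key(
    [(ground, (runs, 1)) for ground, runs in pairs],
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
means = {ground: total / count for ground, (total, count) in sum_count.items()}
print(means)
```

Carrying the count alongside the sum matters because a mean of means would be wrong whenever grounds have different numbers of innings.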
7a. Dataframe:Pyspark – Compute mean, min and max
+-------------+-----------------+---------+---------+
|       Ground|        avg(Runs)|min(Runs)|max(Runs)|
+-------------+-----------------+---------+---------+
|    Bangalore|          54.3125|        0|       96|
|     Adelaide|             32.6|        0|       61|
|Colombo (PSS)|             37.2|       14|       71|
| Christchurch|             12.0|        0|       24|
|     Auckland|              5.0|        5|        5|
|      Chennai|           60.625|        0|       81|
|    Centurion|             73.5|      111|       36|
|     Brisbane|7.666666666666667|        0|        7|
|   Birmingham|            46.75|        1|       40|
|    Ahmedabad|           40.125|      100|        8|
|Colombo (RPS)|            143.0|      143|      143|
|   Chittagong|             57.8|      101|       36|
|    Cape Town|69.85714285714286|       14|        9|
|   Bridgetown|             26.0|        0|       92|
|     Bulawayo|             55.0|       36|       74|
|        Delhi|39.94736842105263|        0|       76|
|   Chandigarh|             11.0|       11|       11|
| Bloemfontein|             85.0|       15|      155|
|Colombo (SSC)|77.55555555555556|      104|        8|
|      Cuttack|              2.0|        2|        2|
+-------------+-----------------+---------+---------+
only showing top 20 rows
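Several rows above show min(Runs) larger than max(Runs), for example Centurion with 111 and 36. That is the pattern one would expect if Runs is still a string column when min and max are taken, since strings compare character by character. I have not confirmed this against the notebook, so treat it as a likely explanation only. The plain-Python sketch below reproduces the effect with the two Centurion values. In PySpark the usual remedy would be to cast the column to an integer type before aggregating.

```python
runs_as_text = ["111", "36"]

# String comparison: "1" sorts before "3", so "111" is the minimum
print(min(runs_as_text), max(runs_as_text))

# Numeric comparison after conversion gives the expected order
runs_as_int = [int(r) for r in runs_as_text]
print(min(runs_as_int), max(runs_as_int))
```

The mean is unaffected in the table, which fits this reading: 73.5 is the average of 111 and 36.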
7b. Dataframe:SparkR – Compute mean, min and max
Also see
1. My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon
2. My book ‘Deep Learning from first principles:Second Edition’ now on Amazon
3. The Clash of the Titans in Test and ODI cricket
4. Introducing QCSimulator: A 5-qubit quantum computing simulator in R
5. Latency, throughput implications for the Cloud
6. Simulating a Web Joint in Android
7. Pitching yorkpy … short of good length to IPL – Part 1
To see all posts click Index of Posts