Benford’s law meets IPL, Intl. T20 and ODI cricket

“To grasp how different a million is from a billion, think about it like this: A million seconds is a little under two weeks; a billion seconds is about thirty-two years.”

“One of the pleasures of looking at the world through mathematical eyes is that you can see certain patterns that would otherwise be hidden.”

               Steven Strogatz, Prof at Cornell University

Introduction

Within the last two weeks, two of my friends introduced me to Benford’s Law. I looked it up on Google and was quite intrigued. Subsequently, another friend asked me to check out the ‘Digits’ episode of the Netflix series “Connected”, hosted by Latif Nasser, which I strongly recommend you watch.

Benford’s Law, also called the Newcomb–Benford law, the law of anomalous numbers, or the First Digit Law, states that when dealing with quantities obtained from Nature, the frequency of appearance of each digit in the first significant place is logarithmic. For example, in sets that obey the law, the number 1 appears as the leading significant digit about 30.1% of the time, the number 2 about 17.6%, the number 3 about 12.5%, all the way down to the number 9 at 4.6%. This logarithmic pattern is observed in many natural datasets, such as population densities, river lengths, heights of skyscrapers and tax returns. What is really curious about this law is that when we measure the lengths of rivers, the law holds regardless of the units used. So the lengths of the rivers would obey the law whether we measure in meters, feet or miles. There is something almost mystical about this law.
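To get a feel for this independence from units, here is a minimal sketch in R. It uses simulated values spread evenly over several orders of magnitude, not real river data, and the helper names first_digit and digit_share are made up for illustration. The leading-digit shares should stay close to the Benford percentages whether the same values are expressed in meters, feet or miles.

```r
# Illustrative only: simulated "river lengths" in meters, not real measurements.
set.seed(42)
lengths_m <- 10^runif(100000, min = 2, max = 6)  # evenly spread over 4 orders of magnitude

# Leading significant digit of a positive number
first_digit <- function(x) {
  floor(x / 10^floor(log10(x)))
}

# Share of each leading digit 1..9
digit_share <- function(x) {
  as.numeric(table(factor(first_digit(x), levels = 1:9))) / length(x)
}

expected <- log10(1 + 1 / (1:9))

comparison <- data.frame(
  digit   = 1:9,
  benford = round(expected, 3),
  meters  = round(digit_share(lengths_m), 3),
  feet    = round(digit_share(lengths_m * 3.28084), 3),
  miles   = round(digit_share(lengths_m * 0.000621371), 3)
)
print(comparison)
```

The simulated data is log-uniform by construction, which is why it follows the law so closely. Real datasets will show larger deviations.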

The law has also been used widely to detect financial fraud, manipulation of tax statements, bots on Twitter, fake accounts in social networks, image manipulation etc. In this age of deep fakes, the ability to detect fake images will assume paramount importance. While deviations from Benford’s Law do not always signify fraud, to a large extent they point to an aberration. Prof Nigrini, of Cape Town, used this law to identify financial discrepancies in Enron’s financial statements, which were at the heart of the infamous scandal. Also, the 2009 Iranian election was flagged as possibly fraudulent because the first-digit percentages did not conform to those specified by Benford’s Law.

While it cannot be said with absolute certainty, marked deviations from Benford’s law could indicate that there has been manipulation of natural processes. Possibly, Benford’s law could be used to detect large-scale match-fixing in cricket tournaments. However, we cannot look at this in isolation, and other statistical and forensic methods may be required to determine if there is fraud. Here is an interesting paper: Promises and perils of Benford’s law.

A set of numbers is said to satisfy Benford’s law if the leading digit d (d ∈ {1, …, 9}) occurs with probability

P(d)=log_{10}(1+1/d)

The law also holds for numbers in other bases. In base b ≥ 2, the leading digit d (d ∈ {1, …, b − 1}) occurs with probability

P(d)=log_{b}(1+1/d)
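The two formulas are easy to evaluate directly. The following is a small sketch in base R; the function name benford_prob is a hypothetical helper and not part of any package.

```r
# Expected leading-digit probabilities under Benford's law for base b >= 2.
# In base b the leading digit d runs from 1 to b - 1.
benford_prob <- function(base = 10) {
  stopifnot(base >= 2)
  d <- 1:(base - 1)
  p <- log(1 + 1 / d, base = base)
  names(p) <- d
  p
}

round(100 * benford_prob(10), 1)  # percentages for digits 1..9 in base 10
sum(benford_prob(10))             # the probabilities add up to 1
benford_prob(2)                   # in base 2 the leading digit is always 1
```

For base 10 the percentages come out as 30.1 for digit 1, 17.6 for digit 2, 12.5 for digit 3 and so on down to 4.6 for digit 9, matching the figures quoted earlier.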

Interestingly, this law also applies to sports, for example to the number of points scored in basketball. I was curious to see if it applied to cricket. Using my R package yorkr, I had already converted all T20 and ODI data from Cricsheet, which is available at yorkrData2020. I wanted to check if Benford’s Law worked on the runs scored, or deliveries faced, by batsmen at team level or at tournament level (IPL, Intl. T20 or ODI).

Thankfully, R has a package, benford.analysis, to check whether data behaves in accordance with Benford’s Law, and I have used this package in my post.

This post is also available in RPubs as Benford’s Law meets IPL, Intl. T20 and ODI.

library(data.table)
library(reshape2)
library(dplyr)
library(benford.analysis)
library(yorkr)

In this post, I check a random selection of the data against Benford’s law. The fully converted dataset is available in yorkrData2020, mentioned above. You can try any dataset, including ODI (men, women), Intl. T20 (men, women), IPL, BBL, PSL, NTB and WBB.

1. Check the runs distribution by Royal Challengers Bangalore

We can see that the behaviour is as expected under Benford’s law, with minor deviations.

load("/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/ipl/iplBattingBowlingDetails/Royal Challengers Bangalore-BattingDetails.RData")
rcbRunsTrends = benford(battingDetails$runs, number.of.digits = 1, discrete = T, sign = "positive") 
rcbRunsTrends
## 
## Benford object:
##  
## Data: battingDetails$runs 
## Number of observations used = 1205 
## Number of obs. for second order = 99 
## First digits analysed = 1
## 
## Mantissa: 
## 
##    Statistic  Value
##         Mean  0.458
##          Var  0.091
##  Ex.Kurtosis -1.213
##     Skewness -0.025
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1      1         14.26
## 2      7         13.88
## 3      9          8.14
## 4      6          5.33
## 5      4          4.78
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  battingDetails$runs
## X-squared = 5.2091, df = 8, p-value = 0.735
## 
## 
##  Mantissa Arc Test
## 
## data:  battingDetails$runs
## L2 = 0.0022852, df = 2, p-value = 0.06369
## 
## Mean Absolute Deviation (MAD): 0.004941381
## MAD Conformity - Nigrini (2012): Close conformity
## Distortion Factor: -18.8725
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
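The Mean Absolute Deviation (MAD) reported above can be reproduced by hand, which makes it clearer what the number means. The sketch below uses a toy vector in place of battingDetails$runs, and the conformity cut-offs are my assumption of the first-digit thresholds from Nigrini (2012); they are consistent with the labels printed in this post, but check them against the package documentation before relying on them.

```r
# Toy stand-in for battingDetails$runs; replace with the real column.
runs <- c(1, 12, 4, 0, 35, 7, 18, 2, 101, 23, 9, 14, 1, 6, 40)

# Leading digit of each positive value (zeros are dropped, as with sign = "positive")
leading_digit <- function(x) {
  x <- x[x > 0]
  floor(x / 10^floor(log10(x)))
}

digits   <- leading_digit(runs)
observed <- as.numeric(table(factor(digits, levels = 1:9))) / length(digits)
expected <- log10(1 + 1 / (1:9))

# Mean absolute difference between observed and expected proportions
mad <- mean(abs(observed - expected))

# Assumed first-digit cut-offs (Nigrini 2012): 0.006, 0.012, 0.015
mad_conformity <- function(mad) {
  as.character(cut(mad,
                   breaks = c(0, 0.006, 0.012, 0.015, Inf),
                   labels = c("Close conformity", "Acceptable conformity",
                              "Marginally acceptable conformity", "Nonconformity"),
                   include.lowest = TRUE))
}

print(mad)
print(mad_conformity(mad))
```

With only 14 usable values the toy vector is far too small for a meaningful verdict. The point is only to show how the statistic is built.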

2. Check the ‘balls played’ distribution by Royal Challengers Bangalore

load("/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/ipl/iplBattingBowlingDetails/Royal Challengers Bangalore-BattingDetails.RData")
rcbBallsPlayedTrends = benford(battingDetails$ballsPlayed, number.of.digits = 1, discrete = T, sign = "positive") 
plot(rcbBallsPlayedTrends)

 

3. Check the runs distribution by Chennai Super Kings

The trend seems to deviate from the expected behaviour to some extent in the counts for leading digits 5 and 7.

load("/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/ipl/iplBattingBowlingDetails/Chennai Super Kings-BattingDetails.RData")
cskRunsTrends = benford(battingDetails$runs, number.of.digits = 1, discrete = T, sign = "positive") 
cskRunsTrends
## 
## Benford object:
##  
## Data: battingDetails$runs 
## Number of observations used = 1054 
## Number of obs. for second order = 94 
## First digits analysed = 1
## 
## Mantissa: 
## 
##    Statistic  Value
##         Mean  0.466
##          Var  0.081
##  Ex.Kurtosis -1.100
##     Skewness -0.054
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1      5         27.54
## 2      2         18.40
## 3      1         17.29
## 4      9         14.23
## 5      7         14.12
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  battingDetails$runs
## X-squared = 22.862, df = 8, p-value = 0.003545
## 
## 
##  Mantissa Arc Test
## 
## data:  battingDetails$runs
## L2 = 0.002376, df = 2, p-value = 0.08173
## 
## Mean Absolute Deviation (MAD): 0.01309597
## MAD Conformity - Nigrini (2012): Marginally acceptable conformity
## Distortion Factor: -17.90664
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!

4. Check runs distribution in all of Indian Premier League (IPL)

battingDF <- NULL
teams <-c("Chennai Super Kings","Deccan Chargers","Delhi Daredevils",
          "Kings XI Punjab", 'Kochi Tuskers Kerala',"Kolkata Knight Riders",
          "Mumbai Indians", "Pune Warriors","Rajasthan Royals",
          "Royal Challengers Bangalore","Sunrisers Hyderabad","Gujarat Lions",
          "Rising Pune Supergiants")


setwd("/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/ipl/iplBattingBowlingDetails")
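for(team in teams){
  # Reset so that a failed load does not reuse the previous team's data
  battingDetails <- NULL
  val <- paste(team,"-BattingDetails.RData",sep="")
  print(val)
  # load() creates battingDetails; a missing file is reported and skipped
  tryCatch(load(val),
           error = function(e) {
             print("No data1")
           }
  )
  details <- battingDetails
  battingDF <- rbind(battingDF,details)
}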
## [1] "Chennai Super Kings-BattingDetails.RData"
## [1] "Deccan Chargers-BattingDetails.RData"
## [1] "Delhi Daredevils-BattingDetails.RData"
## [1] "Kings XI Punjab-BattingDetails.RData"
## [1] "Kochi Tuskers Kerala-BattingDetails.RData"
## [1] "Kolkata Knight Riders-BattingDetails.RData"
## [1] "Mumbai Indians-BattingDetails.RData"
## [1] "Pune Warriors-BattingDetails.RData"
## [1] "Rajasthan Royals-BattingDetails.RData"
## [1] "Royal Challengers Bangalore-BattingDetails.RData"
## [1] "Sunrisers Hyderabad-BattingDetails.RData"
## [1] "Gujarat Lions-BattingDetails.RData"
## [1] "Rising Pune Supergiants-BattingDetails.RData"
trends = benford(battingDF$runs, number.of.digits = 1, discrete = T, sign = "positive") 
trends
## 
## Benford object:
##  
## Data: battingDF$runs 
## Number of observations used = 10129 
## Number of obs. for second order = 123 
## First digits analysed = 1
## 
## Mantissa: 
## 
##    Statistic   Value
##         Mean  0.4521
##          Var  0.0856
##  Ex.Kurtosis -1.1570
##     Skewness -0.0033
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1      2        159.37
## 2      9        121.48
## 3      7         93.40
## 4      8         83.12
## 5      1         61.87
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  battingDF$runs
## X-squared = 78.166, df = 8, p-value = 1.143e-13
## 
## 
##  Mantissa Arc Test
## 
## data:  battingDF$runs
## L2 = 5.8237e-05, df = 2, p-value = 0.5544
## 
## Mean Absolute Deviation (MAD): 0.006627966
## MAD Conformity - Nigrini (2012): Acceptable conformity
## Distortion Factor: -20.90333
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
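The output above pairs a tiny chi-squared p-value with an “Acceptable conformity” MAD, which looks contradictory at first. The sketch below, with made-up proportions, illustrates why: for the same relative deviations from Benford’s law, the chi-squared statistic grows in proportion to the number of observations, while the MAD does not change. The shift vector and the function name chisq_stat are hypothetical.

```r
expected <- log10(1 + 1 / (1:9))

# Hypothetical observed proportions: a small, fixed departure from Benford
shift    <- c(-0.010, 0.015, 0.000, 0.000, 0.000, 0.000, -0.005, -0.005, 0.005)
observed <- expected + shift   # still sums to 1 because shift sums to 0

# Pearson chi-squared statistic for n observations with these proportions
chisq_stat <- function(n) {
  sum((n * observed - n * expected)^2 / (n * expected))
}

mad <- mean(abs(observed - expected))  # independent of n

for (n in c(1000, 10000, 20000)) {
  stat <- chisq_stat(n)
  p    <- pchisq(stat, df = 8, lower.tail = FALSE)
  cat(sprintf("n = %d  X-squared = %.2f  p-value = %.3g  MAD = %.5f\n",
              as.integer(n), stat, p, mad))
}
```

This is the reason the package reminds us not to focus on p-values when the dataset is large.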

5. Check Benford’s law in India’s Intl. T20 matches

setwd("/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/t20/t20BattingBowlingDetails")
load("India-BattingDetails.RData")

indiaTrends = benford(battingDetails$runs, number.of.digits = 1, discrete = T, sign = "positive") 
plot(indiaTrends)

 

6. Check Benford’s law in all of Intl. T20

setwd("/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/t20/t20BattingBowlingDetails")
teams <-c("Australia","India","Pakistan","West Indies", 'Sri Lanka',
          "England", "Bangladesh","Netherlands","Scotland", "Afghanistan",
          "Zimbabwe","Ireland","New Zealand","South Africa","Canada",
          "Bermuda","Kenya","Hong Kong","Nepal","Oman","Papua New Guinea",
          "United Arab Emirates","Namibia","Cayman Islands","Singapore",
          "United States of America","Bhutan","Maldives","Botswana","Nigeria",
          "Denmark","Germany","Jersey","Norway","Qatar","Malaysia","Vanuatu",
          "Thailand")

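# Start from an empty data frame; otherwise the IPL rows accumulated in
# battingDF in step 4 would be carried into the Intl. T20 analysis
battingDF <- NULL
for(team in teams){
  # Reset so that a failed load does not reuse the previous team's data
  battingDetails <- NULL
  val <- paste(team,"-BattingDetails.RData",sep="")
  print(val)
  # load() creates battingDetails; a missing file is reported and skipped
  tryCatch(load(val),
           error = function(e) {
             print("No data1")
           }
  )
  details <- battingDetails
  battingDF <- rbind(battingDF,details)
}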
intlT20Trends = benford(battingDF$runs, number.of.digits = 1, discrete = T, sign = "positive") 
intlT20Trends
## 
## Benford object:
##  
## Data: battingDF$runs 
## Number of observations used = 21833 
## Number of obs. for second order = 131 
## First digits analysed = 1
## 
## Mantissa: 
## 
##    Statistic  Value
##         Mean  0.447
##          Var  0.085
##  Ex.Kurtosis -1.158
##     Skewness  0.018
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1      2        361.40
## 2      9        276.02
## 3      1        264.61
## 4      7        210.14
## 5      8        198.81
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  battingDF$runs
## X-squared = 202.29, df = 8, p-value < 2.2e-16
## 
## 
##  Mantissa Arc Test
## 
## data:  battingDF$runs
## L2 = 5.3983e-06, df = 2, p-value = 0.8888
## 
## Mean Absolute Deviation (MAD): 0.007821098
## MAD Conformity - Nigrini (2012): Acceptable conformity
## Distortion Factor: -24.11086
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
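Steps 4 and 6 repeat the same loading loop, so it could be wrapped in a small helper. The following is only a sketch: loadBattingDetails is a hypothetical function and not part of yorkr, and it assumes each file is named "<team>-BattingDetails.RData" and contains a data frame called battingDetails, as in the code above.

```r
# Hypothetical helper: load "<team>-BattingDetails.RData" for each team from
# `dir` and stack the battingDetails data frames. Missing files are skipped.
loadBattingDetails <- function(teams, dir = ".") {
  pieces <- lapply(teams, function(team) {
    path <- file.path(dir, paste0(team, "-BattingDetails.RData"))
    if (!file.exists(path)) {
      message("No data for ", team)
      return(NULL)
    }
    env <- new.env()
    load(path, envir = env)   # load into a private environment
    env$battingDetails
  })
  do.call(rbind, pieces)      # NULL entries are ignored by rbind
}
```

Because every call builds a fresh result, something like battingDF <- loadBattingDetails(teams, dir) would avoid accidentally mixing rows from one tournament into the next.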

Conclusion

Maths rules our lives, more than we are aware, more than we like to admit. It is there in all of nature. Whether it is the recursive patterns of Mandelbrot sets, the intrinsic notion of beauty through the golden ratio, the murmuration of swallows, the synchronous blinking of fireflies or the near universality of Benford’s law on natural datasets, mathematics governs us.

Isn’t it strange that while we humans pride ourselves on free will, the runs scored by batsmen in particular formats conform to Benford’s law for the first digits? It almost looks as if the runs that will be scored are, to some extent, predetermined to fall within specified ranges obeying Benford’s law. So much for choice.

Something to be pondered over!


Introducing GooglyPlusPlus!!!

“We can lift ourselves out of ignorance, we can find ourselves as creatures of excellence and intelligence and skill.”
“Heaven is not a place, and it is not a time. Heaven is being perfect.”
“Your whole body, from wingtip to wingtip, is nothing more than your thought itself, in a form you can see. Break the chains of your thought, and you break the chains of your body, too.”

From Jonathan Livingston Seagull, by Richard Bach

Introduction

The metamorphosis is complete, from eggs to the butterfly! My R package yorkr went on to become Googly, then GooglyPlus and now, finally, GooglyPlusPlus. My latest R Shiny app now provides interactive visualisation of almost all data in Cricsheet. GooglyPlusPlus visualizes the following matches:

1. ODI (men)
2. ODI (women)
3. Intl. T20 (men)
4. Intl. T20 (women)
5. IPL (Indian Premier League)
6. BBL (Big Bash League)
7. NTB (Natwest T20)
8. PSL (Pakistan Super League)
9. WBBL – Women’s BBL

GooglyPlusPlus is entirely based on my R package yorkr. To know more about yorkr, see ‘Revitalizing R package yorkr’ and the 25+ posts on yorkr in Index of posts.

This Shiny app was quite involved, and it took a lot of work to keep things organised and separate for the different forms of cricket. Anyway it is done and I am happy with the outcome.

Before you use the app, I would suggest that you take a look at the video “How to use GooglyPlusPlus?”. In this video, I show the different features of GooglyPlusPlus and how to navigate through them.

Check out GooglyPlusPlus Shiny at GooglyPlusPlus

You can clone/fork and play around with the code of GooglyPlusPlus here at Github

A. Highlights of GooglyPlusPlus.

The R Shiny app GooglyPlusPlus has the following main pages for the 9 different cricket formats. See below

 

Important note: Below I include some random output from the GooglyPlusPlus app for different match formats; however, there are many more features in GooglyPlusPlus.

1. Indian Premier League (IPL)

a. IPL batsman – Batsman Runs vs Deliveries

 

b. IPL Match – Match batting scorecard

 

c. Head-to-head between 2 IPL Teams – Team Batsmen Batting Partnership All Matches

 

d. Overall Performance – Team Bowling Scorecard Overall

 

2. International T20 Men

a. Batsman Function – Runs vs Strike rate

 

b. Bowler Function – Mean Economy Rate

 

 

3. International T20 (Women)

a. Batsman Functions – Batsman Cumulative Average Runs

 

 

b. Intl T20 Women’s match – Match worm Graph

 

4. Big Bash League (BBL)

a. Head-to-Head: Team batsmen batting partnerships

 

b. Overall Performance – Team batsmen vs bowlers

 

 

5. Natwest T20 (NTB)

a. Head-to-head : Team bowlers vs batsmen

 

b. Batsman Runs vs Deliveries

 

 

6. Pakistan Super League (PSL)

a. Overall Performance – Batsmen Partnership

 

b. Bowling Scorecard

 

7. Women’s Big Bash League (WBBL)

a. Bowler wicket against opposition

 

 

8. One Day International (ODI) Men

a. Batsman Runs Against Opposition

 

b. Team Batsmen against bowlers

 

 

9. One Day International (ODI) Women

a. Match Batting Scorecard

b. Batsman Cumulative Strike Rate

 

Conclusion

There you have it. I have randomly shown 2 functions for each cricket format. There are many functions in each tab for the different match formats, namely IPL, BBL, Intl. T20 (men, women), PSL etc. Go ahead and give GooglyPlusPlus a spin!

To try out GooglyPlusPlus click GooglyPlusPlus. Don’t forget to check out the video How to use GooglyPlusPlus?

You can clone/fork the code from Github at GooglyPlusPlus

Hope you have fun with GooglyPlusPlus!!


It’s a wrap! yorkr wraps up BBL, NTB, PSL and WBB!!!

“Do not take life too seriously. You will never get out of it alive.” – Elbert Hubbard

“How many people here have telekinetic powers? Raise my hand.” – Emo Philips

“Have you ever noticed that anybody driving slower than you is an idiot, and anyone going faster than you is a maniac?” – George Carlin

 

It’s a wrap!!! In my previous post, Revitalizing yorkr, I showed how you can use yorkr functions for Intl. ODI, Intl. T20 and IPL. In my next post, yorkr rocks women’s ODI and women’s Intl T20, yorkr handled women’s ODI and Intl. T20. In this post, yorkr wraps up the remaining T20 formats, namely

  1. Big Bash League (BBL)
  2. Natwest Super T20 (NTB)
  3. Pakistan Super League (PSL)
  4. Women’s Big Bash League (WBB)

The data for all the above T20 formats are taken from Cricsheet.

- All the data has been converted and is available on Github at yorkrData2020, organized as below. You can use any of the 90+ yorkr functions on the converted data.

[Screenshot: folder organization of the converted data in yorkrData2020]

- This post has been published on RPubs at yorkrWrapUpT20formats

- You can download a PDF version of this file at yorkrWrapsUpT20Formats

  • For ODI matches, men’s and women’s, use
  1. ODI-Part1, 2. ODI-Part2, 3. ODI-Part3, 4. ODI-Part4
  • For any of the T20 formats you can use the following posts
  1. T20-Part1, 2. T20-Part2, 3. T20-Part3, 4. T20-Part4

or you can use these templates: Intl. T20, or similar to IPL T20.

I am going to randomly pick 2 yorkr functions for each of the T20 formats BBL, NTB, PSL and WBB to demonstrate yorkr below; however, you can use any of the 90+ yorkr functions.

install.packages("../../../yorkrgit/yorkr_0.0.9.tar.gz",repos = NULL, type="source")
library(yorkr)
library(dplyr)

Note: In the following T20 formats I have randomly picked 2 of the 90+ yorkr functions

A. Big Bash League (BBL)

A1. Batting Scorecard

load("../../../yorkrData2020/bbl/bblMatches/Adelaide Strikers-Brisbane Heat-2017-12-31.RData")
as_bh <- overs
teamBattingScorecardMatch(as_bh,'Adelaide Strikers')
## Total= 139
## # A tibble: 9 x 5
##   batsman      ballsPlayed fours sixes  runs
##   <chr>              <int> <dbl> <dbl> <dbl>
## 1 AT Carey               6     0     0     2
## 2 CA Ingram             21     2     0    23
## 3 J Weatherald          14     2     1    20
## 4 JS Lehmann            17     3     0    22
## 5 JW Wells              13     1     0    12
## 6 MG Neser              25     3     2    40
## 7 PM Siddle              1     0     0     1
## 8 Rashid Khan            2     0     1     6
## 9 TM Head               17     0     0    13

A2. Batting Partnership

load("../../../yorkrData2020/bbl/bblMatches2Teams/Melbourne Renegades-Sydney Sixers-allMatches.RData")
mr_ss_matches <- matches
m <-teamBatsmenPartnershiOppnAllMatches(mr_ss_matches,'Sydney Sixers',report="summary")
m
## # A tibble: 28 x 2
##    batsman      totalRuns
##    <chr>            <dbl>
##  1 MC Henriques       277
##  2 JR Philippe        186
##  3 NJ Maddinson       183
##  4 MJ Lumb            165
##  5 DP Hughes          158
##  6 JC Silk            141
##  7 SPD Smith          116
##  8 JM Vince            97
##  9 TK Curran           68
## 10 J Botha             33
## # … with 18 more rows

B. Natwest T20 (NTB)

B1. Team Match Partnership

load("../../../yorkrData2020/ntb/ntbMatches/Derbyshire-Nottinghamshire-2019-07-26.RData")
db_nt <-overs
teamBatsmenPartnershipMatch(db_nt,"Derbyshire","Nottinghamshire")

B2. Batsmen vs Bowlers

load("../../../yorkrData2020/ntb/ntbMatches2Teams/Birmingham Bears-Leicestershire-allMatches.RData")
bb_le_matches <- matches
teamBatsmenVsBowlersOppnAllMatches(bb_le_matches,"Birmingham Bears","Leicestershire",top=3)

C. Pakistan Super League (PSL)

C1. Individual performance of Babar Azam

library(grid)
library(gridExtra)

babar <- getBatsmanDetails(team="Karachi Kings",name="Babar Azam",dir="../../../yorkrData2020/psl/pslBattingBowlingDetails/")
## [1] "../../../yorkrData2020/psl/pslBattingBowlingDetails//Karachi Kings-BattingDetails.RData"
print(dim(babar))
## [1] 40 15
p1 <-batsmanRunsVsStrikeRate(babar,"Babar Azam")
p2 <-batsmanMovingAverage(babar,"Babar Azam")
p3 <- batsmanCumulativeAverageRuns(babar,"Babar Azam")
grid.arrange(p1,p2,p3, ncol=2)

C2. Bowling performance against all oppositions

load("../../../yorkrData2020/psl/pslMatches2Teams/Lahore Qalandars-Multan Sultans-allMatches.RData")
lq_ms_matches <- matches
teamBowlingPerfOppnAllMatches(lq_ms_matches,"Lahore Qalanders","Multan Sultans")
## # A tibble: 40 x 5
##    bowler              overs maidens  runs wickets
##    <chr>               <int>   <int> <dbl>   <dbl>
##  1 Shaheen Shah Afridi    11       1   134      11
##  2 Junaid Khan             5       0   154       8
##  3 Imran Tahir             5       0    74       6
##  4 Mohammad Ilyas          5       0    93       4
##  5 Haris Rauf              7       0   154       3
##  6 D Wiese                 7       0    92       3
##  7 Mohammad Irfan          5       0    91       3
##  8 S Lamichhane            5       0    74       3
##  9 SP Narine               8       0    48       3
## 10 MM Ali                  3       0    30       3
## # … with 30 more rows

D. Women’s Big Bash League (WBB)

D1. Bowling scorecard

load("../../../yorkrData2020/wbb/wbbMatches/Hobart Hurricanes-Brisbane Heat-2018-12-30.RData")
hh_bh_match <- overs
teamBowlingScorecardMatch(hh_bh_match,'Brisbane Heat')
## # A tibble: 6 x 5
##   bowler      overs maidens  runs wickets
##   <chr>       <int>   <int> <dbl>   <dbl>
## 1 DM Kimmince     3       0    31       2
## 2 GM Harris       4       0    23       3
## 3 H Birkett       1       0     7       0
## 4 JL Barsby       3       0    21       0
## 5 JL Jonassen     4       0    33       0
## 6 SJ Johnson      4       0    17       0

D2. Team batsmen partnerships

load("../../../yorkrData2020/wbb/wbbAllMatchesAllTeams/allMatchesAllOpposition-Perth Scorchers.RData")
ps_matches <- matches
teamBatsmenPartnershipAllOppnAllMatchesPlot(ps_matches,"Perth Scorchers",main="Perth Scorchers")

As mentioned above, I have randomly picked 2 yorkr functions for each of the T20 formats. You can use any of the 90+ functions for analysis of matches, teams, batsmen and bowlers.

1a. Ranking Big Bash League (BBL) batsmen

dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/bbl/bblMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/bbl/bblBattingBowlingDetails"
rankBBLBatsmen(dir=dir,odir=odir,minMatches=30)
## # A tibble: 62 x 4
##    batsman      matches meanRuns meanSR
##    <chr>          <int>    <dbl>  <dbl>
##  1 DJM Short         44     41.6   126.
##  2 SE Marsh          48     39.1   120.
##  3 AJ Finch          60     36.0   130.
##  4 AT Carey          36     35.9   129.
##  5 KP Pietersen      31     33.5   118.
##  6 UT Khawaja        40     31.5   112.
##  7 BJ Hodge          38     31.5   127.
##  8 CA Lynn           72     31.3   128.
##  9 MP Stoinis        53     30.7   112.
## 10 TM Head           45     30     131.
## # … with 52 more rows
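To make the ranking table easier to interpret, here is a rough sketch of how such a ranking could be computed from innings-level data in base R. This is not the actual implementation of rankBBLBatsmen; the toy data, the column names and the sort order (mean runs, then mean strike rate) are all assumptions for illustration.

```r
# Toy innings-level data; the column names are assumptions for illustration.
innings <- data.frame(
  batsman    = c("A", "A", "A", "B", "B", "C", "C", "C"),
  runs       = c(40, 10, 55, 30, 32, 5, 80, 20),
  strikeRate = c(130, 90, 150, 120, 110, 60, 160, 100)
)

# Hypothetical ranking: keep batsmen with at least minMatches innings,
# then sort by mean runs and, for ties, by mean strike rate.
rankBatsmenSketch <- function(df, minMatches) {
  matches  <- aggregate(runs ~ batsman, data = df, FUN = length)
  meanRuns <- aggregate(runs ~ batsman, data = df, FUN = mean)
  meanSR   <- aggregate(strikeRate ~ batsman, data = df, FUN = mean)
  out <- data.frame(batsman  = matches$batsman,
                    matches  = matches$runs,
                    meanRuns = meanRuns$runs,
                    meanSR   = meanSR$strikeRate)
  out <- out[out$matches >= minMatches, ]
  out[order(-out$meanRuns, -out$meanSR), ]
}

rankBatsmenSketch(innings, minMatches = 3)
```

The minMatches filter matters: without it, a batsman with one big innings would top the table.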

1b. Ranking Big Bash League (BBL) bowlers

dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/bbl/bblMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/bbl/bblBattingBowlingDetails"
rankBBLBowlers(dir=dir,odir=odir,minMatches=25)
## # A tibble: 53 x 4
##    bowler         matches totalWickets meanER
##    <chr>            <int>        <dbl>  <dbl>
##  1 SA Abbott           60           90   8.42
##  2 AJ Tye              45           69   7.32
##  3 B Laughlin          48           66   7.96
##  4 BCJ Cutting         71           63   8.87
##  5 BJ Dwarshuis        54           62   7.87
##  6 MG Neser            54           57   8.36
##  7 Rashid Khan         40           55   6.32
##  8 JP Behrendorff      41           53   6.55
##  9 SNJ O'Keefe         53           52   6.76
## 10 A Zampa             42           51   7.34
## # … with 43 more rows

2a. Ranking Natwest T20 League (NTB) batsmen

dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/ntb/ntbMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/ntb/ntbBattingBowlingDetails"

rankNTBBatsmen(dir=dir,odir=odir,minMatches=20)
## # A tibble: 42 x 4
##    batsman          matches meanRuns meanSR
##    <chr>              <int>    <dbl>  <dbl>
##  1 SR Hain               24     34.6   107.
##  2 M Klinger             26     34.1   118.
##  3 MH Wessels            26     33.9   122.
##  4 DJ Bell-Drummond      21     33.1   112.
##  5 DJ Malan              26     33     129.
##  6 T Kohler-Cadmore      23     33.0   118.
##  7 A Lyth                22     31.4   150.
##  8 JJ Cobb               26     30.7   110.
##  9 CA Ingram             25     30.5   153.
## 10 IA Cockbain           26     29.8   121.
## # … with 32 more rows

2b. Ranking Natwest T20 League (NTB) bowlers

dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/ntb/ntbMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/ntb/ntbBattingBowlingDetails"

rankNTBBowlers(dir=dir,odir=odir,minMatches=20)
## # A tibble: 23 x 4
##    bowler          matches totalWickets meanER
##    <chr>             <int>        <dbl>  <dbl>
##  1 HF Gurney            23           45   8.63
##  2 AJ Tye               26           40   7.81
##  3 TS Roland-Jones      26           37   8.10
##  4 BAC Howell           20           35   6.89
##  5 TT Bresnan           21           31   8.82
##  6 MJJ Critchley        25           31   7.33
##  7 LA Dawson            24           30   6.80
##  8 TK Curran            23           28   8.19
##  9 NA Sowter            25           28   8.09
## 10 MTC Waller           25           27   7.59
## # … with 13 more rows

3a. Ranking Pakistan Super League (PSL) batsmen

dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/psl/pslMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/psl/pslBattingBowlingDetails"

rankPSLBatsmen(dir=dir,odir=odir,minMatches=15)
## # A tibble: 47 x 4
##    batsman      matches meanRuns meanSR
##    <chr>          <int>    <dbl>  <dbl>
##  1 Babar Azam        40     33.7   102.
##  2 L Ronchi          31     32.9   143.
##  3 DR Smith          24     30.8   111.
##  4 JJ Roy            15     30.6   123.
##  5 Kamran Akmal      46     30.1   112.
##  6 SR Watson         40     29.2   126.
##  7 Shoaib Malik      35     28.1   113.
##  8 Fakhar Zaman      38     27.6   119.
##  9 Imam-ul-Haq       15     27.4   115.
## 10 RR Rossouw        36     27.0   130.
## # … with 37 more rows

3b. Ranking Pakistan Super League (PSL) bowlers

dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/psl/pslMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/psl/pslBattingBowlingDetails"

rankPSLBowlers(dir=dir,odir=odir,minMatches=15)
## # A tibble: 25 x 4
##    bowler              matches totalWickets meanER
##    <chr>                 <int>        <dbl>  <dbl>
##  1 Wahab Riaz               44           70   6.94
##  2 Hasan Ali                41           61   7.43
##  3 Faheem Ashraf            30           50   7.84
##  4 Mohammad Amir            38           48   7.16
##  5 Usman Shinwari           26           43   8.64
##  6 Mohammad Sami            29           40   7.60
##  7 Shadab Khan              40           38   7.57
##  8 Shaheen Shah Afridi      24           34   7.88
##  9 Rumman Raees             24           33   7.77
## 10 Mohammad Hasnain         16           28   8.65
## # … with 15 more rows

4a. Ranking Women’s Big Bash League (WBB) batsmen

dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/wbb/wbbMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/wbb/wbbBattingBowlingDetails"
rankWBBBatsmen(dir=dir,odir=odir,minMatches=15)
## # A tibble: 36 x 4
##    batsman    matches meanRuns meanSR
##    <chr>        <int>    <dbl>  <dbl>
##  1 BL Mooney       27     46.7  129. 
##  2 SFM Devine      22     43.5  111. 
##  3 EA Perry        16     41.1   97.1
##  4 MM Lanning      19     38     98.2
##  5 JE Cameron      22     32.9  127. 
##  6 DN Wyatt        24     32    112. 
##  7 AE Jones        17     28.9  107. 
##  8 AJ Healy        19     28.4  122. 
##  9 M du Preez      19     27    101. 
## 10 L Lee           18     26.9   98.9
## # … with 26 more rows

4b. Ranking Women’s Big Bash League (WBB) bowlers

dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/wbb/wbbMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/wbb/wbbBattingBowlingDetails"
rankWBBBowlers(dir=dir,odir=odir,minMatches=15)
## # A tibble: 31 x 4
##    bowler      matches totalWickets meanER
##    <chr>         <int>        <dbl>  <dbl>
##  1 M Strano         23           37   7.25
##  2 DM Kimmince      24           36   7.46
##  3 SJ Coyte         22           29   7.59
##  4 JL Jonassen      24           28   6.81
##  5 SJ Johnson       24           27   6.61
##  6 ML Schutt        22           26   6.03
##  7 SFM Devine       22           24   7.58
##  8 M Brown          23           23   7.33
##  9 M Kapp           19           23   5.05
## 10 H Graham         19           22   7.68
## # … with 21 more rows

Conclusion

yorkr can handle ODI and T20 matches in the format used by Cricsheet. In my posts, I have shown how yorkr can be used for Intl. ODI and Intl. T20 for both men and women. yorkr can also handle all T20 formats like IPL T20, BBL, Natwest T20, PSL and women’s BBL. Go ahead, take yorkr for a ride and check out your favorite teams and players.

Hope you have fun!!!


yorkr rocks women’s One Day International (ODI) and International T20!!

“Life is not measured by the number of breaths we take, but by the moments that take our breath away.” Maya Angelou

“Life shrinks or expands in proportion to one’s courage.” Anais Nin

“Devotion to the truth is the hallmark of morality; there is no greater, nobler, more heroic form of devotion than the act of a man who assumes the responsibility of thinking.” Ayn Rand in Atlas Shrugged

Introduction

yorkr, this time, rocks women’s cricket!!! In this post, my R package yorkr analyzes women’s One Day International and International T20 matches. The latest changes to yorkr, as mentioned in my last post Revitalizing R package yorkr, included modifications to segregate men’s and women’s ODI and T20 matches into separate folders while converting them from YAML to R data frames. As the data was already converted, I could just use yorkr’s 90+ functions to analyze the women’s ODI and women’s T20 matches. The data for this is taken from Cricsheet.

My R package yorkr has 4 classes of functions

ODI Functions

  • Class 1: Analysis of ODI matches – See ODI-Part 1
  • Class 2: Analysis of all ODI matches between 2 ODI teams – See ODI Part 2
  • Class 3 : Analysis of all matches played by an ODI team against all other ODI teams – See ODI Part 3
  • Class 4 : Analysis of ODI batsmen and bowlers – See ODI Part 4

Note
-The converted data is available at yorkrData
-This RMarkdown file has been published at RPubs at yorkrAnalyzesWomensODIT20
-You can download this as a PDF at yorkrAnalyzesWomensODIT20

install.packages("../../../yorkrgit/yorkr_0.0.9.tar.gz",repos = NULL, type="source")

1. Analysis of women’s ODI matches

library(yorkr)

Save all matches between 2 teams

#saveAllMatchesBetweenTeams("../../../yorkrData2020/odi/odiWomenMatches/","../../../yorkrData2020/odi/odiWomenMatches2Teams/")

Save all matches played by an ODI team against all other ODI teams

#saveAllMatchesAllOpposition("../../../yorkrData2020/odi/odiWomenMatches/","../../../yorkrData2020/odi/odiWomenAllMatchesAllTeams/")

Since there are several functions in each class, I have randomly selected a few functions to demonstrate yorkr’s analysis.

ODI Match Analysis (Class 1)

In the functions below, women’s ODI matches are analyzed, such as the Australia-India ODI of 7 Feb 2016.

1. Scorecard

load("../../../yorkrData2020/odi/odiWomenMatches/Australia-India-2016-02-07.RData")
aus_ind <- overs
teamBattingScorecardMatch(aus_ind,'India')
## Total= 223
## # A tibble: 7 x 5
##   batsman         ballsPlayed fours sixes  runs
##   <chr>                 <int> <int> <dbl> <dbl>
## 1 H Kaur                   42     2     0    22
## 2 J Goswami                 4     1     0     4
## 3 M Raj                   113    12     0    89
## 4 PG Raut                  31     2     0    24
## 5 S Mandhana               52     7     0    55
## 6 S Pandey                 18     2     0    17
## 7 V Krishnamurthy          21     2     0    12
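
To make the scorecard columns concrete, here is a minimal base-R sketch of how per-batsman totals could be aggregated from ball-by-ball data. The toy data frame and its column names (`batsman`, `runs`) are assumptions for illustration only. They are not the columns of the yorkr `overs` data frame, and this is not yorkr's implementation.

```r
# Hypothetical ball-by-ball data: one row per delivery faced
balls <- data.frame(
  batsman = c("A", "A", "B", "A", "B", "B", "A"),
  runs    = c(4, 0, 1, 6, 4, 0, 1),
  stringsAsFactors = FALSE
)

# Build one row per batsman: balls faced, fours, sixes and total runs
scorecard <- do.call(rbind, lapply(split(balls, balls$batsman), function(d) {
  data.frame(
    batsman     = d$batsman[1],
    ballsPlayed = nrow(d),
    fours       = sum(d$runs == 4),
    sixes       = sum(d$runs == 6),
    runs        = sum(d$runs),
    stringsAsFactors = FALSE
  )
}))
rownames(scorecard) <- NULL
print(scorecard)
```

A real scorecard would also need to exclude wides from the balls faced, which this toy version ignores.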

2. Batting Partnerships

These are the batting partnerships for India in this match against Australia. Mithali Raj tops the list, with partnerships with Smriti Mandhana, Harmanpreet Kaur and Punam Raut. Smriti Mandhana has the next highest partnership total.

teamBatsmenPartnershipMatch(aus_ind,"India","Australia")

Analyze bowling in the women’s ODI England-New Zealand match on 15 Feb 2013

3. Wicket kind

load("../../../yorkrData2020/odi/odiWomenMatches/England-New Zealand-2013-02-15.RData")
eng_nz <- overs
teamBowlingWicketKindMatch(eng_nz,"England","New Zealand")
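
A wicket-kind chart is essentially a count of dismissals per bowler and dismissal type. The sketch below shows that counting step in base R on a made-up table. The column names `bowler` and `wicketKind` are hypothetical, and this is only an illustration of the idea, not yorkr's code.

```r
# Hypothetical dismissals: bowler and kind of wicket
wickets <- data.frame(
  bowler     = c("X", "X", "Y", "X", "Y", "Z"),
  wicketKind = c("caught", "bowled", "caught", "caught", "lbw", "run out"),
  stringsAsFactors = FALSE
)

# Cross-tabulate bowlers against dismissal types
kinds <- table(wickets$bowler, wickets$wicketKind)
print(kinds)

# Run outs are not credited to the bowler, so drop them before totalling
credited <- wickets[wickets$wicketKind != "run out", ]
credited_counts <- table(credited$bowler)
print(credited_counts)
```

A table like `kinds` can be passed to `barplot()` to get a stacked bar chart per bowler.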

4. Match worm graph

Plot the match worm graph for Pakistan-South Africa women’s ODI 25 Jun 2017

load("../../../yorkrData2020/odi/odiWomenMatches/Pakistan-South Africa-2017-06-25.RData")
pak_sa <-overs
matchWormGraph(pak_sa,'Pakistan',"South Africa")
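
A worm graph plots each team's running total of runs against the over number. As a rough sketch of the underlying computation, with made-up runs per over (the numbers and variable names are assumptions, not data from this match):

```r
# Hypothetical runs scored in each over by two teams (toy numbers)
runs_team1 <- c(4, 7, 2, 10, 6)
runs_team2 <- c(6, 3, 8, 1, 12)

# The "worm" is the running total of runs after each over
worm <- data.frame(
  over  = seq_along(runs_team1),
  team1 = cumsum(runs_team1),
  team2 = cumsum(runs_team2)
)
print(worm)
```

Plotting `team1` and `team2` against `over` as two lines, for example with `matplot()`, gives the familiar worm shape.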

Analysis of team in all matches against another team (Class 2)

5. Team Batsmen partnerships

The functions below analyze all matches between South Africa and Sri Lanka.

load("../../../yorkrData2020/odi/odiWomenMatches2Teams/South Africa-Sri Lanka-allMatches.RData")
sa_sl_matches <- matches
m <-teamBatsmenPartnershiOppnAllMatches(sa_sl_matches,'South Africa',report="summary")
m
## # A tibble: 16 x 2
##    batsman        totalRuns
##    <chr>              <dbl>
##  1 M du Preez           241
##  2 M Kapp               194
##  3 L Wolvaardt          168
##  4 D van Niekerk        138
##  5 L Lee                138
##  6 T Chetty             136
##  7 A Steyn              118
##  8 L Goodall             89
##  9 S Luus                71
## 10 N de Klerk            35
## 11 CL Tryon              15
## 12 F Tunnicliffe         15
## 13 S Ismail               9
## 14 M Klaas                2
## 15 Y Fourie               1
## 16 B Bezuidenhout         0
teamBatsmenPartnershipOppnAllMatchesChart(sa_sl_matches,"Sri Lanka","South Africa")

6. Team bowler wicketkind

The plot below gives the performance of Indian women ODI bowlers in all ODI matches against England. The top wicket takers are Jhulan Goswami, Ekta Bisht and Gouher Sultana.

load("../../../yorkrData2020/odi/odiWomenMatches2Teams/India-England-allMatches.RData")
ind_eng_matches <- matches
teamBowlersWicketsOppnAllMatches(ind_eng_matches,"India","England")
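
Each `load()` call above creates an object with a fixed name (`overs` or `matches`) in the workspace, which is then copied to a more descriptive variable. One optional way to avoid overwriting that shared name is to load into a temporary environment. The helper `load_object` below is hypothetical and is not part of yorkr. The demo writes its own toy `.RData` file so that it runs without the Cricsheet data.

```r
# Hypothetical helper (not part of yorkr): return one object from an .RData
# file without leaving it in the calling environment under its saved name
load_object <- function(path, name) {
  env <- new.env()
  load(path, envir = env)
  get(name, envir = env)
}

# Self-contained demo: save a toy data frame under the name `matches`
matches <- data.frame(team = c("India", "England"), runs = c(250, 245))
tmp <- tempfile(fileext = ".RData")
save(matches, file = tmp)
rm(matches)

# Load it straight into a descriptive variable
ind_eng_matches <- load_object(tmp, "matches")
print(ind_eng_matches)
```

With the real data, the call would take the path of the saved file and the name `"matches"` or `"overs"`, depending on what the file contains.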

Performance of women ODI teams against all other teams in all matches (Class 3)

7. Overall batting scorecard

The top scorers for West Indies across all ODI matches are 1. Stafanie Taylor 2. Deandra Dottin 3. Hayley Matthews

load("../../../yorkrData2020/odi/odiWomenAllMatchesAllTeams/allMatchesAllOpposition-West Indies.RData")
wi_matches <- matches
m <-teamBattingScorecardAllOppnAllMatches(wi_matches,theTeam="West Indies")
## Total= 4629
m
## # A tibble: 31 x 5
##    batsman          ballsPlayed fours sixes  runs
##    <chr>                  <int> <int> <int> <dbl>
##  1 SR Taylor               1087    83     7   766
##  2 DJS Dottin               778    69    21   641
##  3 HK Matthews              734    71     4   527
##  4 SA Campbelle             649    39     4   396
##  5 Kycia A Knight           517    35     2   284
##  6 CN Nation                554    31     1   274
##  7 Kyshona A Knight         578    35    NA   264
##  8 MR Aguilleira            481    20     3   252
##  9 B Cooper                 289    19     3   176
## 10 NY McLean                230    18     2   155
## # … with 21 more rows

Individual batsman and bowler performances (Class 4)

8. Batsmen performances

The functions below perform individual batsman and bowler analysis. I chose the following top women ODI batsmen

  1. Mithali Raj (Ind) has the highest ODI runs with a career average of 50.64
  2. Charlotte Edwards (Eng)
  3. Suzie Bates (NZ)
#india_details <- getTeamBattingDetails("India",dir="../../../yorkrData2020/odi/odiWomenMatches", save=TRUE,odir="../../../yorkrData2020/odi/odiWomenBattingBowlingDetails")
#eng_details <- getTeamBattingDetails("England",dir="../../../yorkrData2020/odi/odiWomenMatches", save=TRUE,odir="../../../yorkrData2020/odi/odiWomenBattingBowlingDetails")
#nz_details <- getTeamBattingDetails("New Zealand",dir="../../../yorkrData2020/odi/odiWomenMatches", save=TRUE,odir="../../../yorkrData2020/odi/odiWomenBattingBowlingDetails")

mithali <- getBatsmanDetails(team="India",name="M Raj",dir="../../../yorkrData2020/odi/odiWomenBattingBowlingDetails")
## [1] "../../../yorkrData2020/odi/odiWomenBattingBowlingDetails/India-BattingDetails.RData"
charlotte <- getBatsmanDetails(team="England",name="CM Edwards",dir="../../../yorkrData2020/odi/odiWomenBattingBowlingDetails")
## [1] "../../../yorkrData2020/odi/odiWomenBattingBowlingDetails/England-BattingDetails.RData"
suzie<- getBatsmanDetails(team="New Zealand",name="SW Bates",dir="../../../yorkrData2020/odi/odiWomenBattingBowlingDetails")
## [1] "../../../yorkrData2020/odi/odiWomenBattingBowlingDetails/New Zealand-BattingDetails.RData"

Plot Runs vs Strike Rate

library(grid)
library(gridExtra)
p1 <-batsmanRunsVsStrikeRate(mithali,"Mithali Raj")
p2 <- batsmanRunsVsStrikeRate(charlotte, "Charlotte E")
p3 <- batsmanRunsVsStrikeRate(suzie, "Suzie Bates")
grid.arrange(p1,p2,p3, ncol=2)

Plot the moving average

p1 <-batsmanMovingAverage(mithali,"Mithali Raj")
p2 <- batsmanMovingAverage(charlotte, "Charlotte E")
p3 <- batsmanMovingAverage(suzie, "Suzie Bates")
grid.arrange(p1,p2,p3, ncol=2)

Plot the cumulative average runs

p1 <-batsmanCumulativeAverageRuns(mithali,"Mithali Raj")
p2 <- batsmanCumulativeAverageRuns(charlotte, "Charlotte E")
p3 <- batsmanCumulativeAverageRuns(suzie, "Suzie Bates")
grid.arrange(p1,p2,p3, ncol=2)
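
The difference between a moving average and a cumulative average is easiest to see on a small vector. The sketch below uses made-up innings scores and a simple trailing window of 3 innings. That window is an assumption for illustration, and yorkr's plots may smooth the data differently.

```r
# Hypothetical runs scored by a batter in consecutive innings
runs <- c(12, 45, 0, 78, 33, 51)

# Cumulative average: mean of all innings up to and including each one
cum_avg <- cumsum(runs) / seq_along(runs)

# Simple trailing moving average over a window of 3 innings
# (the first two entries are NA because the window is not yet full)
window <- 3
mov_avg <- as.numeric(stats::filter(runs, rep(1 / window, window), sides = 1))

print(round(cum_avg, 2))
print(round(mov_avg, 2))
```

The cumulative average settles down as the number of innings grows, while the moving average keeps reacting to recent form.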

Analyze ODI bowler performances

9. Bowler performances

The following 3 bowlers have been chosen for analysis

  1. Jhulan Goswami (Ind) is the highest overall wicket taker with 225 wickets
  2. Anisa Mohammed (WI)
  3. Sana Mir (Pak)
#india_details <- getTeamBowlingDetails("India",dir="../../../yorkrData2020/odi/odiWomenMatches", save=TRUE,odir="../../../yorkrData2020/odi/odiWomenBattingBowlingDetails")
#wi_details <- getTeamBowlingDetails("West Indies",dir="../../../yorkrData2020/odi/odiWomenMatches", save=TRUE,odir="../../../yorkrData2020/odi/odiWomenBattingBowlingDetails")
#pak_details <- getTeamBowlingDetails("Pakistan",dir="../../../yorkrData2020/odi/odiWomenMatches", save=TRUE,odir="../../../yorkrData2020/odi/odiWomenBattingBowlingDetails")

jhulan <- getBowlerWicketDetails(team="India",name="J Goswami",dir="../../../yorkrData2020/odi/odiWomenBattingBowlingDetails")
anisa <- getBowlerWicketDetails(team="West Indies",name="A Mohammed",dir="../../../yorkrData2020/odi/odiWomenBattingBowlingDetails")
sana <- getBowlerWicketDetails(team="Pakistan",name="Sana Mir",dir="../../../yorkrData2020/odi/odiWomenBattingBowlingDetails")

Plot the bowler Mean Economy Rate

p1<-bowlerMeanEconomyRate(jhulan,"Jhulan G")
p2<-bowlerMeanEconomyRate(anisa, "Anisa M")
p3<-bowlerMeanEconomyRate(sana, "Sana Mir")
grid.arrange(p1,p2,p3, ncol=2)

Plot the cumulative average wickets taken by the bowlers

p1<-bowlerCumulativeAvgWickets(jhulan,"Jhulan G")
p2<-bowlerCumulativeAvgWickets(anisa, "Anisa M")
p3<-bowlerCumulativeAvgWickets(sana, "Sana Mir")
grid.arrange(p1,p2,p3, ncol=2)
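
For reference, economy rate is runs conceded per over bowled, and a cumulative average of wickets is a running mean of wickets per match. A minimal sketch on invented bowling figures follows. The data frame and its column names are assumptions, and yorkr may group or weight the values differently in its plots.

```r
# Hypothetical bowling figures for one bowler across four matches
spells <- data.frame(
  overs   = c(10, 8, 10, 6),
  runs    = c(38, 41, 29, 30),
  wickets = c(2, 0, 3, 1)
)

# Economy rate in each match: runs conceded per over bowled
spells$economyRate <- spells$runs / spells$overs

# Mean economy rate across matches (a mean of the per-match rates)
mean_er <- mean(spells$economyRate)

# Cumulative average wickets: running mean of wickets per match
spells$cumAvgWickets <- cumsum(spells$wickets) / seq_len(nrow(spells))

print(spells)
print(round(mean_er, 2))
```

Note that a mean of per-match rates is not the same as total runs divided by total overs; which of the two is used changes the number slightly.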

2. Analysis of women’s International Twenty 20 matches

I have chosen some random yorkr functions to show the analysis of T20 players and matches

T20 Functions

There are the following classes of T20 functions

  • Class 1: Analysis of T20 matches – See T20-Part 1
  • Class 2: Analysis of all T20 matches between 2 T20 teams – See T20 Part 2
  • Class 3 : Analysis of all matches played by a T20 team against all other T20 teams – See T20 Part 3
  • Class 4 : Analysis of T20 batsmen and bowlers – See T20 Part 4

You can also refer to the yorkr template that I created: Analysis of International T20 matches with yorkr templates

Save all matches between teams

#saveAllMatchesBetweenTeams("../../../yorkrData2020/t20/t20WomenMatches/","../../../yorkrData2020/t20/t20WomenMatches2Teams/")

Save all T20 matches played by a team against all other teams

#saveAllMatchesAllOpposition("../../../yorkrData2020/t20/t20WomenMatches/","../../../yorkrData2020/t20/t20WomenAllMatchesAllTeams/")

T20 Match Analysis (Class 1)

10. Batting scorecard

Print the scorecard for the Bangladesh-Ireland match played on 3 Apr 2014

load("../../../yorkrData2020/t20/t20WomenMatches/Bangladesh-Ireland-2014-04-03.RData")
ban_ire <- overs
teamBattingScorecardMatch(ban_ire,'Bangladesh')
## Total= 95
## # A tibble: 9 x 5
##   batsman         ballsPlayed fours sixes  runs
##   <chr>                 <int> <dbl> <dbl> <dbl>
## 1 Ayasha Rahman            19     2     0    12
## 2 Fahima Khatun             2     0     0     0
## 3 Lata Mondal              12     1     0     8
## 4 Panna Ghosh               3     0     0     4
## 5 Rumana Ahmed             14     3     0    16
## 6 Salma Khatun              6     1     0     7
## 7 Shaila Sharmin            7     0     0     6
## 8 Shamima Sultana          11     0     0     7
## 9 Sharmin Akhter           46     3     0    35

Plot the performance of T20 batsmen against bowlers in the Germany-Netherlands match.

load("../../../yorkrData2020/t20/t20WomenMatches/Germany-Netherlands-2019-06-27.RData")
ger_net <- overs
teamBatsmenVsBowlersMatch(ger_net,'Netherlands',"Germany",plot=TRUE)

11. Bowling scorecard

Print the bowling scorecard of Hong Kong-Kuwait T20 match played on 25 Feb 2019

load("../../../yorkrData2020/t20/t20WomenMatches/Hong Kong-Kuwait-2019-02-25.RData")
hk_kuw <-overs
teamBowlingScorecardMatch(hk_kuw,'Hong Kong')
## # A tibble: 5 x 5
##   bowler      overs maidens  runs wickets
##   <chr>       <int>   <int> <dbl>   <int>
## 1 Chan Ka Man     2       0     5       1
## 2 KY Chan         3       1     2       4
## 3 M Hill          2       0     6       1
## 4 M Wai Siu       2       0    11       1
## 5 M Yousaf        1       1     0       3

Head to head between 2 women’s T20 teams (Class 2)

12. Team batting partnerships

Print the partnership among Indian T20 women in all matches against England

load("../../../yorkrData2020/t20/t20WomenMatches2Teams/India-England-allMatches.RData")
ind_eng_matches <- matches
m <-teamBatsmenPartnershiOppnAllMatches(ind_eng_matches,'India',report="detailed")
m[1:30,]
##       batsman      nonStriker partnershipRuns totalRuns
## 1       M Raj        A Sharma               2       233
## 2       M Raj      BS Fulmali              25       233
## 3       M Raj       DB Sharma              16       233
## 4       M Raj          H Kaur              18       233
## 5       M Raj       J Goswami               6       233
## 6       M Raj         KV Jain               5       233
## 7       M Raj        L Kumari               5       233
## 8       M Raj     N Niranjana               3       233
## 9       M Raj        N Tanwar              17       233
## 10      M Raj         PG Raut              41       233
## 11      M Raj      R Malhotra               5       233
## 12      M Raj      S Mandhana              17       233
## 13      M Raj          S Naik              10       233
## 14      M Raj        S Pandey              19       233
## 15      M Raj        SK Naidu              37       233
## 16      M Raj V Krishnamurthy               7       233
## 17 S Mandhana          H Deol              20       145
## 18 S Mandhana    JI Rodrigues              47       145
## 19 S Mandhana           M Raj              32       145
## 20 S Mandhana   Shafali Verma              46       145
## 21     H Kaur        A Sharma               1       137
## 22     H Kaur        AA Patil               8       137
## 23     H Kaur       DB Sharma              14       137
## 24     H Kaur         E Bisht               3       137
## 25     H Kaur       J Goswami              11       137
## 26     H Kaur    JI Rodrigues              12       137
## 27     H Kaur           M Raj              19       137
## 28     H Kaur      MR Meshram              33       137
## 29     H Kaur        N Tanwar               2       137
## 30     H Kaur         PG Raut               0       137
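
In the detailed report above, `totalRuns` is repeated on every row of a batsman and equals the sum of that batsman's `partnershipRuns` (M Raj's 16 partnership rows add up to 233). The sketch below reproduces that shape in base R from a made-up ball-by-ball table. The input column names are assumptions, and this is not how yorkr itself builds the report.

```r
# Hypothetical ball-by-ball data with striker, non-striker and runs scored
balls <- data.frame(
  batsman    = c("A", "A", "A", "B", "B", "A"),
  nonStriker = c("B", "B", "C", "A", "A", "C"),
  runs       = c(4, 1, 2, 6, 0, 3),
  stringsAsFactors = FALSE
)

# Runs scored by each batsman alongside each partner
detailed <- aggregate(runs ~ batsman + nonStriker, data = balls, FUN = sum)
names(detailed)[names(detailed) == "runs"] <- "partnershipRuns"

# Total runs per batsman across all partners
totals <- aggregate(runs ~ batsman, data = balls, FUN = sum)
names(totals)[names(totals) == "runs"] <- "totalRuns"

# Attach the totals, then sort by total runs and partner name
detailed <- merge(detailed, totals, by = "batsman")
detailed <- detailed[order(-detailed$totalRuns, detailed$nonStriker), ]
rownames(detailed) <- NULL
print(detailed)
```

The summary report corresponds to the `totals` table alone, sorted by `totalRuns`.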

13. Team batting partnerships (plot)

Plot the batting partnerships of Indian T20 women against England

The best batsmen are Mithali Raj, Smriti Mandhana and Harmanpreet Kaur, in that order.

teamBatsmenPartnershipOppnAllMatchesChart(ind_eng_matches,"India","England")

14. Team Wicketkind

Plot the wicket kind taken by the bowlers of Scotland against USA

load("../../../yorkrData2020/t20/t20WomenMatches2Teams/Scotland-United States of America-allMatches.RData")
sco_usa_matches <- matches
teamBowlersWicketsOppnAllMatches(sco_usa_matches,"Scotland","United States of America")

Performance of teams against all other teams in all T20 matches (Class 3)

15. Overall team scorecard

Print the batting scorecard of Zimbabwe against all other teams

load("../../../yorkrData2020/t20/t20WomenAllMatchesAllTeams/allMatchesAllOpposition-Zimbabwe.RData")
zim_matches <- matches
m <-teamBattingScorecardAllOppnAllMatches(zim_matches,theTeam="Zimbabwe")
## Total= 571
m
## # A tibble: 7 x 5
##   batsman      ballsPlayed fours sixes  runs
##   <chr>              <int> <int> <int> <dbl>
## 1 SM Mayers            181    20     3   216
## 2 M Mupachikwa         139     9    NA   125
## 3 CS Mugeri             88     9     2   119
## 4 M Musonda             38     2     1    46
## 5 J Nkomo               25     3    NA    34
## 6 A Ndiraya             14     3    NA    18
## 7 AC Mushangwe          13    NA    NA    13
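
The `NA` entries in the `fours` and `sixes` columns above presumably mean that no boundaries of that type were recorded for the batter. If you prefer zeros, the counts can be cleaned up after the fact. This sketch assumes that `NA` really does mean zero here, and it uses a toy copy of the first three rows rather than the actual tibble.

```r
# Toy scorecard copied from the first rows of the Zimbabwe output above
m <- data.frame(
  batsman = c("SM Mayers", "M Mupachikwa", "CS Mugeri"),
  fours   = c(20L, 9L, 9L),
  sixes   = c(3L, NA, 2L),
  stringsAsFactors = FALSE
)

# Treat a missing boundary count as zero (an assumption about what NA means)
count_cols <- c("fours", "sixes")
m[count_cols] <- lapply(m[count_cols], function(x) ifelse(is.na(x), 0L, x))
print(m)
```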

15. Team batting partnerships

Print the batting partnership of West Indies. The best performances are by 1. Stafanie Taylor 2. Deandra Dottin 3. Hayley Matthews

load("../../../yorkrData2020/t20/t20WomenAllMatchesAllTeams/allMatchesAllOpposition-West Indies.RData")
wi_matches <- matches
m <- teamBatsmenPartnershipAllOppnAllMatches(wi_matches,theTeam='West Indies')
m
## # A tibble: 29 x 2
##    batsman        totalRuns
##    <chr>              <dbl>
##  1 SR Taylor           1199
##  2 DJS Dottin           912
##  3 HK Matthews          458
##  4 SA Campbelle         407
##  5 B Cooper             300
##  6 SACA King            287
##  7 MR Aguilleira        250
##  8 CN Nation            243
##  9 Kycia A Knight       240
## 10 NY McLean            142
## # … with 19 more rows

16. Team bowling wicketkind

The plot below shows the women T20 bowlers who have performed the best against India, namely 1. Katherine Brunt (Eng) 2. Ellyse Perry (Aus) 3. Anya Shrubsole (Eng)

load("../../../yorkrData2020/t20/t20WomenAllMatchesAllTeams/allMatchesAllOpposition-India.RData")
ind_matches <- matches
teamBowlingWicketKindAllOppnAllMatches(ind_matches,t1="India",t2="All")

Analyze women T20 batsmen & bowlers (Class 4)

17. T20 batsmen performances

The following 4 players were chosen

  1. Harmanpreet Kaur (Ind)
  2. Suzie Bates (NZ)
  3. Meg Lanning (Aus)
  4. Stafanie Taylor (WI)
#india_details <- getTeamBattingDetails("India",dir="../../../yorkrData2020/t20/t20WomenMatches", save=TRUE,odir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")
#eng_details <- getTeamBattingDetails("England",dir="../../../yorkrData2020/t20/t20WomenMatches", save=TRUE,odir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")
#aus_details <- getTeamBattingDetails("Australia",dir="../../../yorkrData2020/t20/t20WomenMatches", save=TRUE,odir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")
#wi_details <-  getTeamBattingDetails("West Indies",dir="../../../yorkrData2020/t20/t20WomenMatches", save=TRUE,odir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")
#nz_details <-  getTeamBattingDetails("New Zealand",dir="../../../yorkrData2020/t20/t20WomenMatches", save=TRUE,odir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")

harmanpreet <- getBatsmanDetails(team="India",name="H Kaur",dir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")
## [1] "../../../yorkrData2020/t20/t20WomenBattingBowlingDetails/India-BattingDetails.RData"
suzie <- getBatsmanDetails(team="New Zealand",name="SW Bates",dir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")
## [1] "../../../yorkrData2020/t20/t20WomenBattingBowlingDetails/New Zealand-BattingDetails.RData"
meg <- getBatsmanDetails(team="Australia",name="MM Lanning",dir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")
## [1] "../../../yorkrData2020/t20/t20WomenBattingBowlingDetails/Australia-BattingDetails.RData"
stafanie <- getBatsmanDetails(team="West Indies",name="SR Taylor",dir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")
## [1] "../../../yorkrData2020/t20/t20WomenBattingBowlingDetails/West Indies-BattingDetails.RData"

Plot the performance of the players against opposition.

batsmanRunsAgainstOpposition(harmanpreet,"Harmanpreet")

batsmanRunsAgainstOpposition(suzie,"Suzie Bates")

batsmanRunsAgainstOpposition(stafanie,"Stafanie Taylor")

batsmanRunsAgainstOpposition(meg,"Meg Lanning")

Plot the cumulative strike rate of the players. Meg Lanning has the best strike rate of the lot. Stafanie and Suzie also touch a strike rate of 100

p1<-batsmanCumulativeStrikeRate(harmanpreet,"Harmanpreet")
p2<-batsmanCumulativeStrikeRate(suzie,"Suzie Bates")
p3<-batsmanCumulativeStrikeRate(stafanie,"Stafanie Taylor")
p4 <-batsmanCumulativeStrikeRate(meg,"Meg Lanning")
grid.arrange(p1,p2,p3,p4, ncol=2)
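
Strike rate is runs scored per 100 balls faced. One plausible way to form a cumulative strike rate is to divide running runs by running balls, as sketched below on invented innings. This is an assumption about the definition: yorkr may instead average the per-innings strike rates, which gives somewhat different numbers.

```r
# Hypothetical innings-by-innings runs and balls faced
innings <- data.frame(
  runs  = c(20, 35, 8, 60),
  balls = c(25, 30, 12, 40)
)

# Cumulative strike rate: running runs per 100 running balls faced
innings$cumStrikeRate <- 100 * cumsum(innings$runs) / cumsum(innings$balls)
print(round(innings$cumStrikeRate, 1))
```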

Analyze women’s T20 bowlers.

18. T20 bowler performances

The following bowlers were chosen for analysis

  1. Poonam Yadav (Ind)
  2. Anisa Mohammed (WI)
  3. Ellyse Perry (Aus)
  4. Anya Shrubsole (England)
#india_details <- getTeamBowlingDetails("India",dir="../../../yorkrData2020/t20/t20WomenMatches", save=TRUE,odir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")
#wi_details <- getTeamBowlingDetails("West Indies",dir="../../../yorkrData2020/t20/t20WomenMatches", save=TRUE,odir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")
#aus_details <- getTeamBowlingDetails("Australia",dir="../../../yorkrData2020/t20/t20WomenMatches", save=TRUE,odir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")
#eng_details <- getTeamBowlingDetails("England",dir="../../../yorkrData2020/t20/t20WomenMatches", save=TRUE,odir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")

poonam <- getBowlerWicketDetails(team="India",name="Poonam Yadav",dir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")
anisa <- getBowlerWicketDetails(team="West Indies",name="A Mohammed",dir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")
ellyse <- getBowlerWicketDetails(team="Australia",name="EA Perry",dir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")
anya <- getBowlerWicketDetails(team="England",name="A Shrubsole",dir="../../../yorkrData2020/t20/t20WomenBattingBowlingDetails")

Plot the bowler’s moving average

p1<-bowlerMovingAverage(poonam,"Poonam Yadav")
p2<-bowlerMovingAverage(anisa,"Anisa M")
p3 <-bowlerMovingAverage(ellyse,"Ellyse Perry")
p4 <-bowlerMovingAverage(anya,"Anya Shrubsole")
grid.arrange(p1,p2,p3,p4, ncol=2)

Plot the bowlers’ cumulative average wickets

p1<-bowlerCumulativeAvgWickets(poonam,"Poonam Yadav")
p2<-bowlerCumulativeAvgWickets(anisa,"Anisa M")
p3 <-bowlerCumulativeAvgWickets(ellyse,"Ellyse Perry")
p4 <-bowlerCumulativeAvgWickets(anya,"Anya Shrubsole")
grid.arrange(p1,p2,p3,p4, ncol=2)

3a. Rank women ODI batsmen

Note: Mithali Raj (Ind) tops the ODI table with the most runs and highest average in ODI. The Cricsheet data does not have the earlier years in which she played. Hence you may see a much lower average for Mithali Raj

library(yorkr)
dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/odi/odiWomenMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/odi/odiWomenBattingBowlingDetails"

rankODIBatsmen(dir=dir,odir=odir,minMatches=30)

## # A tibble: 24 x 4
##    batsman          matches meanRuns meanSR
##    <chr>              <int>    <dbl>  <dbl>
##  1 AE Satterthwaite      32     61.5   81.2
##  2 MM Lanning            47     49.5   85.0
##  3 TT Beaumont           35     45.8   68.5
##  4 EA Perry              42     45.7   74.3
##  5 SW Bates              42     44.0   70.9
##  6 NR Sciver             35     43.0   94.7
##  7 M Raj                 35     42.8   64.1
##  8 AC Jayangani          48     38.6   59.9
##  9 NE Bolton             32     36.5   60.5
## 10 T Chetty              34     33.1   70.3
## # … with 14 more rows
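
The ranking above is in descending order of `meanRuns` and only includes batters who meet the `minMatches` threshold. A rough base-R sketch of that kind of ranking, on invented per-innings data, is shown below. The input columns and the tie-break on strike rate are assumptions and not necessarily what `rankODIBatsmen` does internally.

```r
# Hypothetical per-innings records for three batters
innings <- data.frame(
  batsman    = c("A", "A", "A", "B", "B", "C", "C", "C"),
  runs       = c(50, 30, 40, 80, 10, 20, 25, 15),
  strikeRate = c(90, 70, 80, 110, 60, 75, 85, 65),
  stringsAsFactors = FALSE
)
minMatches <- 3   # keep only batters with at least this many innings

# Summarise each batter: innings count, mean runs, mean strike rate
ranked <- do.call(rbind, lapply(split(innings, innings$batsman), function(d) {
  data.frame(batsman = d$batsman[1], matches = nrow(d),
             meanRuns = mean(d$runs), meanSR = mean(d$strikeRate),
             stringsAsFactors = FALSE)
}))

# Apply the threshold, then sort by mean runs and break ties on strike rate
ranked <- ranked[ranked$matches >= minMatches, ]
ranked <- ranked[order(-ranked$meanRuns, -ranked$meanSR), ]
rownames(ranked) <- NULL
print(ranked)
```

Batter B has the highest single score but is dropped by the threshold, which is why the choice of `minMatches` changes who appears in the table.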

3b. Rank women ODI bowlers

Note: Jhulan Goswami tops the ODI bowlers with the most wickets. However the rank below is based on the available data in Cricsheet

library(yorkr)
dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/odi/odiWomenMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/odi/odiWomenBattingBowlingDetails"

rankODIBowlers(dir=dir,odir=odir,minMatches=30)

## # A tibble: 19 x 4
##    bowler        matches totalWickets meanER
##    <chr>           <int>        <dbl>  <dbl>
##  1 JL Jonassen        44           76   3.90
##  2 M Kapp             49           70   3.80
##  3 S Ismail           44           65   3.82
##  4 KH Brunt           42           62   3.57
##  5 EA Perry           43           58   4.44
##  6 A Shrubsole        41           58   4.07
##  7 J Goswami          33           58   3.59
##  8 S Luus             41           54   4.82
##  9 D van Niekerk      40           53   3.84
## 10 ML Schutt          35           48   4.46
## 11 A Khaka            33           47   4.08
## 12 JL Gunn            30           43   4.23
## 13 I Ranaweera        35           42   4.89
## 14 Sana Mir           32           41   4.29
## 15 LA Marsh           30           40   4.16
## 16 NR Sciver          36           37   4.64
## 17 NR Sciver          36           37   4.64
## 18 NR Sciver          36           37   4.64
## 19 NR Sciver          36           37   4.64
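
Rows 16 to 19 of the output above are identical (NR Sciver appears four times). If such repeats are not wanted, `unique()` removes exact duplicate rows. The toy table below copies the last few rows of the output; whether the repetition is an artifact of the ranking function or of the data is not something this sketch can tell.

```r
# Toy ranking table with a repeated final row, copied from the output above
ranking <- data.frame(
  bowler       = c("LA Marsh", "NR Sciver", "NR Sciver", "NR Sciver", "NR Sciver"),
  matches      = c(30, 36, 36, 36, 36),
  totalWickets = c(40, 37, 37, 37, 37),
  meanER       = c(4.16, 4.64, 4.64, 4.64, 4.64),
  stringsAsFactors = FALSE
)

# unique() keeps the first copy of each identical row
deduped <- unique(ranking)
rownames(deduped) <- NULL
print(deduped)
```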

4a. Rank women T20 batsmen

library(yorkr)
dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/t20/t20WomenMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/t20/t20WomenBattingBowlingDetails"

rankT20Batsmen(dir=dir,odir=odir,minMatches=30)

## # A tibble: 30 x 4
##    batsman       matches meanRuns meanSR
##    <chr>           <int>    <dbl>  <dbl>
##  1 SR Taylor          39     33.1   96.7
##  2 MM Lanning         53     29.3  102. 
##  3 EJ Villani         32     28.2   94.8
##  4 D van Niekerk      41     27.3   88.2
##  5 SJ Taylor          46     26.7  100. 
##  6 SW Bates           35     26.1   99.8
##  7 AC Jayangani       41     25.5   94.7
##  8 Bismah Maroof      52     24.5   83.0
##  9 DJS Dottin         38     24    109. 
## 10 CM Edwards         44     23.7   94.1
## # … with 20 more rows

4b. Rank women T20 bowlers

library(yorkr)
dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/t20/t20WomenMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/t20/t20WomenBattingBowlingDetails"

rankT20Bowlers(dir=dir,odir=odir,minMatches=30)
## # A tibble: 20 x 4
##    bowler           matches totalWickets meanER
##    <chr>              <int>        <dbl>  <dbl>
##  1 A Shrubsole           50           76   5.95
##  2 Nida Dar              50           59   5.99
##  3 KH Brunt              49           57   5.93
##  4 JL Jonassen           50           55   5.31
##  5 EA Perry              51           52   5.67
##  6 S Ismail              50           52   5.40
##  7 ML Schutt             39           50   6.17
##  8 D van Niekerk         39           47   5.45
##  9 D Hazell              35           44   4.95
## 10 NR Sciver             44           43   6.30
## 11 JL Gunn               30           41   6.14
## 12 A Mohammed            43           41   5.80
## 13 M Kapp                31           39   5.08
## 14 Asmavia Iqbal         33           36   6.39
## 15 Sana Mir              46           36   5.85
## 16 HASD Siriwardene      35           33   6.31
## 17 EA Osborne            30           31   5.62
## 18 S Luus                37           29   7.13
## 19 KDU Prabodhani        33           25   4.87
## 20 Bismah Maroof         35           22   6.49

Conclusion

While I have just shown how to use a small subset of functions, you can use the entire set of yorkr functions to analyze individual matches, head-to-head confrontations between two teams, the performance of a team against all other teams, and finally the performance of individual batsmen and bowlers in women’s ODI and T20 games.

You may also like

  1. Understanding Neural Style Transfer with Tensorflow and Keras
  2. Using Reinforcement Learning to solve Gridworld
  3. Big Data-4: Webserver log analysis with RDDs, Pyspark, SparkR and SparklyR
  4. Cricpy takes a swing at the ODIs
  5. GooglyPlus: yorkr analyzes IPL players, teams, matches with plots and tables
  6. Cricketr adds team analytics to its repertoire!!!
  7. Deep Learning from first principles in Python, R and Octave – Part 8
  8. Natural language processing: What would Shakespeare say?
  9. Simulating an Edge Shape in Android

To see all posts click Index of posts

My book ‘Cricket analytics with cricketr and cricpy’ is now on Amazon

‘Cricket analytics with cricketr and cricpy – Analytics harmony with R and Python’ is now available on Amazon in both paperback ($21.99) and kindle ($9.99/Rs 449) versions. The book includes analysis of cricketers using both my R package ‘cricketr’ and my python package ‘cricpy’ for all formats of the game, namely Test, ODI and T20. Both packages use data from ESPN Cricinfo Statsguru.

Pick up your copy today!

The book includes the following chapters

CONTENTS

Introduction
1. Cricket analytics with cricketr
1.1. Introducing cricketr! : An R package to analyze performances of cricketers
1.2. Taking cricketr for a spin – Part 1
1.2. cricketr digs the Ashes!
1.3. cricketr plays the ODIs!
1.4. cricketr adapts to the Twenty20 International!
1.5. Sixer – R package cricketr’s new Shiny avatar
1.6. Re-introducing cricketr! : An R package to analyze performances of cricketers
1.7. cricketr sizes up legendary All-rounders of yesteryear
1.8. cricketr flexes new muscles: The final analysis
1.9. The Clash of the Titans in Test and ODI cricket
1.10. Analyzing performances of cricketers using cricketr template
2. Cricket analytics with cricpy
2.1 Introducing cricpy: A python package to analyze performances of cricketers
2.2 Cricpy takes a swing at the ODIs
Analysis of Top 4 batsman
2.3 Cricpy takes guard for the Twenty20s
2.4 Analyzing batsmen and bowlers with cricpy template
9. Average runs against different opposing teams
3. Other cricket posts in R
3.1 Analyzing cricket’s batting legends – Through the mirage with R
3.2 Mirror, mirror … the best batsman of them all?
4. Appendix
Cricket analysis with Machine Learning using Octave
4.1 Informed choices through Machine Learning – Analyzing Kohli, Tendulkar and Dravid
4.2 Informed choices through Machine Learning-2 Pitting together Kumble, Kapil, Chandra
Further reading
Important Links

Also see
1. My book “Deep Learning from first principles” now on Amazon
2. Practical Machine Learning with R and Python – Part 1
3. Revisiting World Bank data analysis with WDI and gVisMotionChart
4. Natural language processing: What would Shakespeare say?
5. Optimal Cloud Computing
6. Pitching yorkpy … short of good length to IPL – Part 1
7. Computer Vision: Ramblings on derivatives, histograms and contours

To see all posts click Index of posts

The Clash of the Titans in Test and ODI cricket

Who looks outside, dreams; who looks inside, awakes.
Show me a sane man and I will cure him for you.

            Carl Jung 

 

We’re made of star stuff. We are a way for the cosmos to know itself.
If you want to make an apple pie from scratch, you must first create the universe.

            Carl Sagan

Introduction

The biggest nag in the collective psyche of the cricketing fraternity these days is whether Virat Kohli has surpassed Sachin Tendulkar. This question has been troubling cricket lovers the world over, and particularly in India, for quite a while. It has only grown stronger with Kohli’s 41st ODI century and with Michael Vaughan bestowing the GOAT title on Virat Kohli for ODI cricket. Hence, I decided to do my bit in addressing this by analyzing Kohli’s and Tendulkar’s performances in ODI cricket. I also wanted to identify the best among the cricketing idols of India in Test cricket, namely Sunil Gavaskar, Sachin Tendulkar and Virat Kohli. Hence this post has 2 parts

  1. Analysis of Tendulkar, Gavaskar and Kohli in Test cricket
  2. Analysis of Tendulkar and Kohli in ODIs

In this post, I analyze the performances of these titans in Test and ODI cricket using my R package cricketr. Some may feel that comparisons are not possible as these batsmen are from different eras, and to some extent this is true. I would give some leeway to Gavaskar as he had to bat in a pre-helmet era. But with Tendulkar and Kohli a fair and objective comparison is possible. There were pre-eminent bowlers in the times of Tendulkar as there are now.

From the analysis below, it can be seen that Tendulkar is ahead of everybody else in Test cricket. However, it must be noted that Tendulkar’s performance deteriorated towards the end of his career. Such was not the case with Gavaskar. Kohli has some catching up to do and he still has a lot of Test cricket in him.

In ODIs, Kohli can be seen to be pulling ahead of Tendulkar in several aspects.

My R package cricketr can be installed directly from CRAN and you can use it to analyze cricketers.

This package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package supports all formats of the game including Test, ODI and Twenty20 versions.

You should be able to install the package from GitHub and use the many functions available in the package. Please be mindful of the ESPN Cricinfo Terms of Use.

Important note 1: The latest release of ‘cricketr’ now includes the ability to analyze performances of teams!! See Cricketr adds team analytics to its repertoire!!!

Important note 2 : Cricketr can now do a more fine-grained analysis of players, see Cricketr learns new tricks : Performs fine-grained analysis of players

Important note 3: Do check out the python avatar of cricketr, ‘cricpy’, in my post ‘Introducing cricpy: A python package to analyze performances of cricketers’

Take a look at my short video tutorial on my R package cricketr on Youtube – R package cricketr – A short tutorial

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

Note 1: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricketr template from Github (which is the R Markdown file I have used for the analysis below).

Note 2: I sprinkle the charts with my observations. Feel free to look at them more closely and come to your conclusions.

If you are passionate about cricket, and love analyzing cricket performances, then check out my racy book on cricket ‘Cricket analytics with cricketr and cricpy – Analytics harmony with R & Python’! This book discusses and shows how to use my R package ‘cricketr’ and my Python package ‘cricpy’ to analyze batsmen and bowlers in all formats of the game (Test, ODI and T20). The paperback is available on Amazon at $21.99 and  the kindle version at $9.99/Rs 449/-. A must read for any cricket lover! Check it out!!

1 Load the cricketr package

# Install cricketr from CRAN if it is not already installed
if (!require("cricketr")){
    install.packages("cricketr")
}
library(cricketr)

A Test cricket  – Analysis of Gavaskar, Tendulkar and Kohli

2. Get player data

tendulkar <- getPlayerData(35320,dir=".",file="tendulkar.csv",type="batting")
kohli <- getPlayerData(253802,dir=".",file="kohli.csv",type="batting")
gavaskar <- getPlayerData(28794,dir=".",file="gavaskar.csv",type="batting")

3a. Basic analyses for Tendulkar

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./tendulkar.csv","Tendulkar")
batsmanMeanStrikeRate("./tendulkar.csv","Tendulkar")
batsmanRunsRanges("./tendulkar.csv","Tendulkar")
dev.off()

3b Basic analyses for Kohli

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./kohli.csv","Kohli")
batsmanMeanStrikeRate("./kohli.csv","Kohli")
batsmanRunsRanges("./kohli.csv","Kohli")
dev.off()

3c Basic analyses for Gavaskar

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./gavaskar.csv","Gavaskar")
batsmanMeanStrikeRate("./gavaskar.csv","Gavaskar")
batsmanRunsRanges("./gavaskar.csv","Gavaskar")
dev.off()

4a. More analyses for Tendulkar

It can be seen that Tendulkar and Gavaskar have been bowled more often than Kohli. Also, Kohli does not have as many sixes in Test cricket as Tendulkar and Gavaskar.

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./tendulkar.csv","Tendulkar")
batsman6s("./tendulkar.csv","Tendulkar")
batsmanDismissals("./tendulkar.csv","Tendulkar")
dev.off()

4b. More analyses for Kohli

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./kohli.csv","Kohli")
batsman6s("./kohli.csv","Kohli")
batsmanDismissals("./kohli.csv","Kohli")
dev.off()

4c More analyses for Gavaskar

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./gavaskar.csv","Gavaskar")
batsman6s("./gavaskar.csv","Gavaskar")
batsmanDismissals("./gavaskar.csv","Gavaskar")
dev.off()

5 Performance of batsmen on different grounds

par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./tendulkar.csv","Tendulkar")
batsmanAvgRunsGround("./kohli.csv","Kohli")
batsmanAvgRunsGround("./gavaskar.csv","Gavaskar")

#dev.off()

6. Performance of batsmen against different Opposition

  1. Tendulkar averages 50 against the following countries – Australia, Bangladesh, England, Sri Lanka, West Indies and Zimbabwe
  2. Kohli averages almost 50 against all the nations he has played – Australia, Bangladesh, England, New Zealand, Sri Lanka and West Indies
  3. Gavaskar averages 50 against Australia, Pakistan, West Indies and Sri Lanka

par(mar=c(4,4,2,2))
batsmanAvgRunsOpposition("./tendulkar.csv","Tendulkar")
batsmanAvgRunsOpposition("./kohli.csv","Kohli")
batsmanAvgRunsOpposition("./gavaskar.csv","Gavaskar")

7. Get player data special

This is required for the next 2 function calls

tendulkarsp <- getPlayerDataSp(35320,tdir=".",tfile="tendulkarsp.csv",ttype="batting")
kohlisp <- getPlayerDataSp(253802,tdir=".",tfile="kohlisp.csv",ttype="batting")
gavaskarsp <- getPlayerDataSp(28794,tdir=".",tfile="gavaskarsp.csv",ttype="batting")

#dev.off()

8 Get contribution of batsmen in matches won and lost

Kohli has contributed equally in won and lost matches. Tendulkar’s runs do not seem to have helped as much in winning, as only 50% of the matches he has played have been won.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))

batsmanContributionWonLost("tendulkarsp.csv","Tendulkar")
batsmanContributionWonLost("./kohlisp.csv","Kohli")
batsmanContributionWonLost("./gavaskarsp.csv","Gavaskar")
  

9 Performance of batsmen at home and overseas

The boxplots show that Kohli performs better overseas than at home. The 3rd quartile is higher, though the median seems to be lower overseas. For Tendulkar the performance is similar both ways. Gavaskar’s median runs scored overseas is higher.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))


batsmanPerfHomeAway("tendulkarsp.csv","Tendulkar")
batsmanPerfHomeAway("./kohlisp.csv","Kohli")
batsmanPerfHomeAway("./gavaskarsp.csv","Gavaskar")

10. Moving average of runs

Gavaskar’s moving average was very good at the time of his retirement. Kohli seems to be going very strong. Tendulkar’s performance shows signs of deterioration around the time of his retirement.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))

batsmanMovingAverage("./tendulkar.csv","Tendulkar")
batsmanMovingAverage("./kohli.csv","Kohli")
batsmanMovingAverage("./gavaskar.csv","Gavaskar")

#dev.off()
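
To make the moving-average idea concrete, here is a minimal sketch in Python (the language of cricpy, the Python avatar of cricketr). The scores and the window of 3 innings are made up for illustration, and a plain trailing window is my assumption; batsmanMovingAverage may smooth the data differently.

```python
def moving_average(scores, window):
    """Return the trailing moving average of `scores` over `window` innings."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    for end in range(window, len(scores) + 1):
        chunk = scores[end - window:end]      # the last `window` innings
        averages.append(sum(chunk) / window)
    return averages

# Hypothetical innings scores, oldest first (not real data)
scores = [10, 50, 30, 70, 90, 20]
print(moving_average(scores, window=3))
```

A falling tail in such a series is what the charts above show for Tendulkar near retirement.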

11 Boxplot and histogram of runs

Kohli has a marginally higher average (50.69) than Tendulkar (48.65), while Gavaskar averages 46. The median runs are the same for Tendulkar and Kohli at 32.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfBoxHist("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfBoxHist("./kohli.csv","Kohli")
batsmanPerfBoxHist("./gavaskar.csv","Gavaskar")

12 Cumulative average Runs for batsmen

Looking at the cumulative average runs, we can see a gradual drop for Tendulkar, while Kohli’s and Gavaskar’s performances seem to be getting better.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("./tendulkar.csv","Tendulkar")
batsmanCumulativeAverageRuns("./kohli.csv","Kohli")
batsmanCumulativeAverageRuns("./gavaskar.csv","Gavaskar")
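
The cumulative average is simply the running mean after each innings. A small Python sketch with made-up scores is given below. Note the assumption that every innings counts in the denominator; a conventional batting average would exclude not-outs.

```python
from itertools import accumulate

def cumulative_average(scores):
    """Return the running mean of `scores` after each innings."""
    totals = accumulate(scores)               # running total of runs
    return [total / n for n, total in enumerate(totals, start=1)]

# Hypothetical scores, oldest first (not real data)
scores = [100, 50, 0, 10]
print(cumulative_average(scores))
```

Because each new innings carries less weight as the career grows, this curve moves slowly, which is why a late-career dip shows up as a gradual drop.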

13. Cumulative average strike rate of batsmen

Tendulkar’s strike rate is better than Kohli’s and Gavaskar’s.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("./tendulkar.csv","Tendulkar")
batsmanCumulativeStrikeRate("./kohli.csv","Kohli")
batsmanCumulativeStrikeRate("./gavaskar.csv","Gavaskar")

14 Performance forecast of batsmen

The forecasted performance for Kohli and Gavaskar is higher than that of Tendulkar

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfForecast("./kohli.csv","Kohli")
batsmanPerfForecast("./gavaskar.csv","Gavaskar")

#dev.off()

15. Relative strike rate of batsmen

par(mar=c(4,4,2,2))

frames <- list("./tendulkar.csv","./kohli.csv","gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeBatsmanSR(frames,names)
#dev.off()

16. Relative Runs frequency of batsmen

par(mar=c(4,4,2,2))
frames <- list("./tendulkar.csv","./kohli.csv","gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeRunsFreqPerf(frames,names)
#dev.off()

17. Relative cumulative average runs of batsmen

Tendulkar leads the way here, but Kohli can be seen catching up.

par(mar=c(4,4,2,2))
frames <- list("./tendulkar.csv","./kohli.csv","gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeBatsmanCumulativeAvgRuns(frames,names)
#dev.off()

18. Relative cumulative average strike rate

Tendulkar has better strike rate than the other two.

par(mar=c(4,4,2,2))
frames <- list("./tendulkar.csv","./kohli.csv","gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeBatsmanCumulativeStrikeRate(frames,names)
#dev.off()

19. Check batsman in form

As in the moving average and performance forecast and cumulative average runs, Kohli and Gavaskar are in-form while Tendulkar was out-of-form towards the end.

checkBatsmanInForm("./tendulkar.csv","Sachin Tendulkar")
## [1] "**************************** Form status of Sachin Tendulkar ****************************
\n\n Population size: 294  Mean of population: 50.48 \n Sample size: 33  Mean of sample: 32.42 SD of 
sample: 29.8 \n\n Null hypothesis H0 : Sachin Tendulkar 's sample average is within 95% confidence interval 
of population average\n Alternative hypothesis Ha : Sachin Tendulkar 's sample average is below 
the 95% confidence interval of population average\n\n 
Sachin Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713  is less than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("./kohli.csv","Kohli")
## [1] "**************************** Form status of Kohli ****************************\n\n Population size: 117
  Mean of population: 50.35 \n Sample size: 13  Mean of sample: 53.77 SD of sample: 46.15 \n\n Null 
hypothesis H0 : Kohli 's sample average is within 95% confidence interval of population average\n 
Alternative hypothesis Ha : Kohli 's sample average is below the 95% confidence interval of population
 average\n\n Kohli 's Form Status: In-Form because the p value: 0.603244  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("./gavaskar.csv","Gavaskar")
## [1] "**************************** Form status of Gavaskar ****************************\n\n 
Population size: 125  Mean of population: 44.67 \n Sample size: 14  Mean of sample: 57.86 SD of sample:
 58.55 \n\n Null hypothesis H0 : Gavaskar 's sample average is within 95% confidence interval of population
 average\n Alternative hypothesis Ha : Gavaskar 's sample average is below the 95% confidence interval of 
population average\n\n Gavaskar 's Form Status: In-Form because the p value: 0.793276  is greater 
than alpha=  0.05 \n *******************************************************************************************\n\n"
#dev.off()
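
The output above reads like a one-sided test of the recent innings (the sample) against the career mean (the population). The Python sketch below reruns that idea from the summary numbers printed for Tendulkar. It assumes a one-sample t-test with a lower tail, which is my reading of the output and not necessarily what checkBatsmanInForm does internally; the t distribution is integrated numerically so that only the standard library is needed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_lower_tail(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on [0, |t|]; `steps` must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                      # integral of the density from 0 to |t|
    return 0.5 - area if t < 0 else 0.5 + area

# Summary numbers copied from the Tendulkar output above
pop_mean, n, sample_mean, sample_sd = 50.48, 33, 32.42, 29.8
t_stat = (sample_mean - pop_mean) / (sample_sd / math.sqrt(n))
p_value = t_lower_tail(t_stat, df=n - 1)
status = "Out-of-Form" if p_value < 0.05 else "In-Form"
print(round(t_stat, 2), round(p_value, 6), status)
```

A sample mean above the population mean always gives a p value above 0.5 in this setup, which is consistent with Kohli and Gavaskar being reported as In-Form.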

20. Performance 3D

A 3D regression plane is fitted between the Balls faced, Minutes at crease and Runs scored.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
battingPerf3d("./tendulkar.csv","Sachin Tendulkar")
battingPerf3d("./kohli.csv","Kohli")
battingPerf3d("./gavaskar.csv","Gavaskar")
#dev.off()

20. Runs likelihood

This function uses K-Means clustering to determine the runs the batsmen are likely to score.

par(mar=c(4,4,2,2))
batsmanRunsLikelihood("./tendulkar.csv","Tendulkar")
## Summary of  Tendulkar 's runs scoring likelihood
## **************************************************
## 
## There is a 16.51 % likelihood that Tendulkar  will make  139 Runs in  251 balls over 353  Minutes 
## There is a 25.08 % likelihood that Tendulkar  will make  66 Runs in  122 balls over  167  Minutes 
## There is a 58.41 % likelihood that Tendulkar  will make  16 Runs in  31 balls over 44  Minutes
batsmanRunsLikelihood("./kohli.csv","Kohli")
## Summary of  Kohli 's runs scoring likelihood
## **************************************************
## 
## There is a 20 % likelihood that Kohli  will make  143 Runs in  232 balls over 330  Minutes 
## There is a 33.85 % likelihood that Kohli  will make  51 Runs in  92 balls over  127  Minutes 
## There is a 46.15 % likelihood that Kohli  will make  11 Runs in  24 balls over 31  Minutes
batsmanRunsLikelihood("./gavaskar.csv","Gavaskar")
## Summary of  Gavaskar 's runs scoring likelihood
## **************************************************
## 
## There is a 33.81 % likelihood that Gavaskar  will make  69 Runs in  159 balls over 214  Minutes 
## There is a 8.63 % likelihood that Gavaskar  will make  172 Runs in  364 balls over  506  Minutes 
## There is a 57.55 % likelihood that Gavaskar  will make  13 Runs in  35 balls over 48  Minutes
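
The percentages above can be read as the share of innings that fall in each K-Means cluster. Here is a minimal Python sketch of that reading. It clusters on runs alone with made-up scores and hand-picked starting centres, all of which are simplifying assumptions; the summaries above also report balls and minutes for each cluster.

```python
def kmeans_1d(values, centres, iterations=20):
    """Lloyd's algorithm in one dimension, starting from the given centres."""
    centres = list(centres)
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for v in values:
            # Assign each score to its nearest centre
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty)
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters

# Hypothetical scores (not real data)
runs = [0, 5, 8, 12, 15, 45, 50, 55, 60, 110, 120, 130]
start = [min(runs), sum(runs) / len(runs), max(runs)]
centres, clusters = kmeans_1d(runs, start)
for centre, cluster in zip(centres, clusters):
    share = 100 * len(cluster) / len(runs)
    print(f"There is a {share:.2f} % likelihood of a score around {centre:.0f} runs")
```

The largest cluster is usually the low-score one, as in the summaries above, simply because most innings end early.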

21. Predict runs for a random combination of Balls faced and Minutes at crease

BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
tendulkar <- batsmanRunsPredict("./tendulkar.csv","Tendulkar",newdataframe=newDF)
kohli <- batsmanRunsPredict("./kohli.csv","Kohli",newdataframe=newDF)
gavaskar <- batsmanRunsPredict("./gavaskar.csv","Gavaskar",newdataframe=newDF)
batsmen <-cbind(round(tendulkar$Runs),round(kohli$Runs),round(gavaskar$Runs))
colnames(batsmen) <- c("Tendulkar","Kohli","Gavaskar")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Tendulkar Kohli Gavaskar
## 1          10           30         7     6        4
## 2          38           71        23    24       17
## 3          66          111        39    42       30
## 4          94          152        54    60       43
## 5         121          193        70    78       56
## 6         149          234        86    96       69
## 7         177          274       102   114       82
## 8         205          315       118   132       95
## 9         233          356       134   150      108
## 10        261          396       150   168      121
## 11        289          437       165   186      134
## 12        316          478       181   204      147
## 13        344          519       197   222      160
## 14        372          559       213   240      173
## 15        400          600       229   258      186
#dev.off()
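
The predictions above come from a plane fitted through Balls faced, Minutes at crease and Runs. As a rough illustration, the Python sketch below fits such a plane by ordinary least squares using the normal equations. The data is made up and lies exactly on a known plane so the result is easy to check; fit_plane is a hypothetical helper, not a cricketr or cricpy function, and it assumes the inputs are not perfectly collinear.

```python
def fit_plane(bf, mins, runs):
    """Least-squares fit of runs = a + b*bf + c*mins via the normal equations."""
    rows = [(1.0, x, y) for x, y in zip(bf, mins)]
    # Augmented 3x4 matrix for (X'X) beta = X'y
    aug = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
           + [sum(row[i] * z for row, z in zip(rows, runs))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(3):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]

# Made-up innings that lie exactly on the plane runs = 2 + 0.5*bf + 0.1*mins
bf = [10, 40, 80, 120, 200]
mins = [20, 50, 130, 150, 290]
runs = [9, 27, 55, 77, 131]

coeffs = fit_plane(bf, mins, runs)
predicted = coeffs[0] + coeffs[1] * 100 + coeffs[2] * 150   # 100 balls, 150 minutes
print([round(c, 3) for c in coeffs], round(predicted, 1))
```

With real innings the points scatter around the plane, so the fitted coefficients summarise a batsman's typical scoring rate and should not be read as an exact rule.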

Key findings

  1. Kohli has a marginally higher average than Tendulkar
  2. Tendulkar has the best strike rate of the three.
  3. The cumulative average runs and the performance forecast for Kohli and Gavaskar show an improving trend, while Tendulkar’s numbers deteriorate towards the end of his career
  4. Kohli is fast catching up with Tendulkar on cumulative average runs vs innings in career.

B ODI Cricket – Analysis of Tendulkar and Kohli

The functions below get the ODI data for Tendulkar and Kohli as CSV files so that the analyses can be done

22 Get player data for ODIs

tendulkarOD <- getPlayerDataOD(35320,dir=".",file="tendulkarOD.csv",type="batting")
kohliOD <- getPlayerDataOD(253802,dir=".",file="kohliOD.csv",type="batting")

#dev.off()

23a Basic performance of Tendulkar in ODI

par(mfrow=c(3,2))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./tendulkarOD.csv","Tendulkar")
batsmanRunsRanges("./tendulkarOD.csv","Tendulkar")
batsman4s("./tendulkarOD.csv","Tendulkar")
batsman6s("./tendulkarOD.csv","Tendulkar")
batsmanScoringRateODTT("./tendulkarOD.csv","Tendulkar")
#dev.off()

23b. Basic performance of Kohli in ODI

par(mfrow=c(3,2))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./kohliOD.csv","Kohli")
batsmanRunsRanges("./kohliOD.csv","Kohli")
batsman4s("./kohliOD.csv","Kohli")
batsman6s("./kohliOD.csv","Kohli")
batsmanScoringRateODTT("./kohliOD.csv","Kohli")
#dev.off()

24. Performance forecast in ODIs

Kohli’s forecasted runs are much higher than Tendulkar’s in ODIs

par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkarOD.csv","Tendulkar")
batsmanPerfForecast("./kohliOD.csv","Kohli")

25. Batting performance

A 3D regression plane is fitted between Balls faced, Minutes at crease and Runs scored.

par(mar=c(4,4,2,2))
battingPerf3d("./tendulkarOD.csv","Tendulkar")
battingPerf3d("./kohliOD.csv","Kohli")

26. Predicting runs scored for the ODI batsmen

Kohli will score more runs than Tendulkar for the same minutes at crease and balls faced.

BF <- seq( 10, 200,length=10)
Mins <- seq(30,220,length=10)
newDF <- data.frame(BF,Mins)
tendulkarDF <- batsmanRunsPredict("./tendulkarOD.csv","Tendulkar",newdataframe=newDF)
kohliDF <- batsmanRunsPredict("./kohliOD.csv","Kohli",newdataframe=newDF)
batsmen <-cbind(round(tendulkarDF$Runs),round(kohliDF$Runs))
colnames(batsmen) <- c("Tendulkar","Kohli")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Tendulkar Kohli
## 1          10           30         7     8
## 2          31           51        26    28
## 3          52           72        45    48
## 4          73           93        64    68
## 5          94          114        83    88
## 6         116          136       102   108
## 7         137          157       121   128
## 8         158          178       140   149
## 9         179          199       159   169
## 10        200          220       178   189

27. Runs likelihood for the ODI batsmen

Tendulkar has clusters around 13, 53 and 111 runs while Kohli has clusters around 13, 63 and 116. So it is more likely that Kohli will tend to score higher.

par(mar=c(4,4,2,2))
batsmanRunsLikelihood("./tendulkarOD.csv","Tendulkar")
## Summary of  Tendulkar 's runs scoring likelihood
## **************************************************
## 
## There is a 18.09 % likelihood that Tendulkar  will make  111 Runs in  118 balls over 172  Minutes 
## There is a 28.39 % likelihood that Tendulkar  will make  53 Runs in  63 balls over  95  Minutes 
## There is a 53.52 % likelihood that Tendulkar  will make  13 Runs in  18 balls over 27  Minutes
batsmanRunsLikelihood("./kohliOD.csv","Kohli")
## Summary of  Kohli 's runs scoring likelihood
## **************************************************
## 
## There is a 31.41 % likelihood that Kohli  will make  63 Runs in  69 balls over 97  Minutes 
## There is a 49.74 % likelihood that Kohli  will make  13 Runs in  18 balls over  24  Minutes 
## There is a 18.85 % likelihood that Kohli  will make  116 Runs in  113 balls over 163  Minutes

28. Runs in different venues for the ODI batsmen

par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./tendulkarOD.csv","Tendulkar")
batsmanAvgRunsGround("./kohliOD.csv","Kohli")

28. Runs against different opposition for the ODI batsmen

Tendulkar has a 50+ average against Bermuda, Kenya and Namibia, while Kohli has a 50+ average against New Zealand, West Indies, South Africa, Zimbabwe and Bangladesh.

par(mar=c(4,4,2,2))
batsmanAvgRunsOpposition("./tendulkarOD.csv","Tendulkar")
batsmanAvgRunsOpposition("./kohliOD.csv","Kohli")

29. Moving average of runs for the ODI batsmen

Tendulkar’s moving average shows an improvement (50+) towards the end of his career, but Kohli shows a marked increase (60+) currently.

par(mar=c(4,4,2,2))
batsmanMovingAverage("./tendulkarOD.csv","Tendulkar")
batsmanMovingAverage("./kohliOD.csv","Kohli")

30. Cumulative average runs of ODI batsmen

Tendulkar plateaus at 40+ while Kohli’s cumulative average runs goes up and up!!!

par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("./tendulkarOD.csv","Tendulkar")
batsmanCumulativeAverageRuns("./kohliOD.csv","Kohli")

31 Cumulative strike rate of ODI batsmen

par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("./tendulkarOD.csv","Tendulkar")
batsmanCumulativeStrikeRate("./kohliOD.csv","Kohli")

32. Relative batsmen strike rate

par(mar=c(4,4,2,2))

frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeBatsmanSRODTT(frames,names)
#dev.off()

33. Relative Run Frequency percentages

par(mar=c(4,4,2,2))

frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeRunsFreqPerfODTT(frames,names)
#dev.off()

34. Relative cumulative average runs of ODI batsmen

Kohli breaks away from Tendulkar in cumulative average runs after 100 innings

par(mar=c(4,4,2,2))

frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeBatsmanCumulativeAvgRuns(frames,names)
#dev.off()

35. Relative cumulative strike rate of ODI batsmen

This seems to be a tussle, with Kohli having an edge till about 40 innings and Tendulkar leading from 40+ to 180 innings. Kohli just seems to be edging forward.

par(mar=c(4,4,2,2))

frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeBatsmanCumulativeStrikeRate(frames,names)
#dev.off()

36. Batsmen 4s and 6s

par(mar=c(4,4,2,2))

frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
batsman4s6s(frames,names)
##                Tendulkar Kohli
## Runs(1s,2s,3s)     66.29 69.67
## 4s                 29.65 25.90
## 6s                  4.06  4.43
#dev.off()
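
The table above splits a batsman's runs into those scored by running, from 4s and from 6s. A short Python sketch of that arithmetic follows; the career totals are invented for illustration and boundary_split is a hypothetical helper, not a package function.

```python
def boundary_split(total_runs, fours, sixes):
    """Percentage of runs from running (1s, 2s, 3s), from 4s and from 6s."""
    runs_4s = 4 * fours
    runs_6s = 6 * sixes
    runs_other = total_runs - runs_4s - runs_6s
    return {
        "Runs(1s,2s,3s)": round(100 * runs_other / total_runs, 2),
        "4s": round(100 * runs_4s / total_runs, 2),
        "6s": round(100 * runs_6s / total_runs, 2),
    }

# Hypothetical career totals (not real figures)
print(boundary_split(total_runs=1000, fours=70, sixes=10))
```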

37. Check ODI batsmen form

Note: the calls below read tendulkar.csv and kohli.csv, the Test files from section 2, so the output repeats the Test form status from section 19. To check ODI form, pass tendulkarOD.csv and kohliOD.csv instead.

par(mar=c(4,4,2,2))

checkBatsmanInForm("./tendulkar.csv","Tendulkar")
## [1] "**************************** Form status of Tendulkar ********
********************\n\n Population size: 294  Mean of population: 50.48 \n
 Sample size: 33  Mean of sample: 32.42 SD of sample: 29.8 \n\n 
Null hypothesis H0 : Tendulkar 's sample average is within 95% confidence
 interval of population average\n Alternative hypothesis 
Ha : Tendulkar 's sample average is below the 95% confidence interval 
of population average\n\n Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713  is less than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("./kohli.csv","Kohli")
## [1] "**************************** Form status of Kohli ***********
*****************\n\n Population size: 117  Mean of population: 50.35 \n
 Sample size: 13  Mean of sample: 53.77 SD of sample: 46.15 \n\n 
Null hypothesis H0 : Kohli 's sample average is within 95% confidence 
interval of population average\n Alternative hypothesis 
Ha : Kohli 's sample average is below the 95% confidence interval 
of population average\n\n Kohli 's Form Status: In-Form because 
the p value: 0.603244  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
#dev.off()

Key Findings

  1. Kohli has a better performance against oppositions like West Indies, South Africa and New Zealand
  2. Kohli breaks away from Tendulkar in cumulative average runs
  3. Tendulkar has been leading on strike rate, but Kohli in recent times seems to be breaking loose.

Check out some other players with my R package cricketr

Important note: Do check out my other posts using cricketr at cricketr-posts

Also see

  1. My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon
  2. A primer on Qubits, Quantum gates and Quantum Operations
  3. De-blurring revisited with Wiener filter using OpenCV
  4. Deep Learning from first principles in Python, R and Octave – Part 4
  5. The Many Faces of Latency
  6. Fun simulation of a Chain in Android
  7. Presentation on Wireless Technologies – Part 1
  8. yorkr crashes the IPL party ! – Part 1

To see all posts click Index of posts

Analyzing batsmen and bowlers with cricpy template

Introduction

This post shows how you can analyze batsmen and bowlers of Test, ODI and T20s using cricpy templates, using data from ESPN Cricinfo.

The cricpy package

The data for a particular player can be obtained with the getPlayerData() function. To do this you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Rahul Dravid, Virat Kohli etc. This will bring up a page which has the profile number for the player, e.g. for Rahul Dravid this would be http://www.espncricinfo.com/india/content/player/28114.html. Hence, Dravid’s profile number is 28114. This can be used to get the data for Rahul Dravid as shown below

1. For Test players use batting and bowling.
2. For ODI use batting and bowling
3. For T20 use T20 Batting T20 Bowling
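
If you have the player page URL handy, the profile number can be pulled out of it with a regular expression. This is only a convenience sketch: profile_number is a hypothetical helper, not part of cricpy, and it assumes the URL has the same shape as the Dravid example above.

```python
import re

def profile_number(url):
    """Extract the numeric profile number from a player URL ending in /player/<number>.html."""
    match = re.search(r"/player/(\d+)\.html", url)
    if match is None:
        raise ValueError(f"no profile number found in {url!r}")
    return int(match.group(1))

print(profile_number("http://www.espncricinfo.com/india/content/player/28114.html"))
```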

Please be mindful of the ESPN Cricinfo Terms of Use

My posts on cricpy were
a. Introducing cricpy:A python package to analyze performances of cricketers
b. Cricpy takes a swing at the ODIs
c. Cricpy takes guard for the Twenty20s

You can clone/download this cricpy template from Github at cricpy-template for your own analysis of players, using either RStudio or IPython notebooks. You can uncomment the functions and use them.

Cricpy can now analyze performances of teams in Test, ODI and T20 cricket see Cricpy adds team analytics to its arsenal!!

This post is also hosted on Rpubs at Int

The cricpy package is now available with pip install cricpy!!!

If you are passionate about cricket, and love analyzing cricket performances, then check out my racy book on cricket ‘Cricket analytics with cricketr and cricpy – Analytics harmony with R & Python’! This book discusses and shows how to use my R package ‘cricketr’ and my Python package ‘cricpy’ to analyze batsmen and bowlers in all formats of the game (Test, ODI and T20). The paperback is available on Amazon at $21.99 and  the kindle version at $9.99/Rs 449/-. A must read for any cricket lover! Check it out!!

1 Importing cricpy – Python

# Install the package
# Do a pip install cricpy
# Import cricpy
import cricpy.analytics as ca 
## C:\Users\Ganesh\ANACON~1\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
##   from pandas.core import datetools

2. Invoking functions with Python package cricpy

import cricpy.analytics as ca 
#ca.batsman4s("aplayer.csv","A Player")

3. Getting help from cricpy – Python

import cricpy.analytics as ca
#help(ca.getPlayerData)

The details below will introduce the different functions that are available in cricpy.

4. Get the player data for a player using the function getPlayerData()

Important Note: This needs to be done only once for a player. This function stores the player’s data in the specified CSV file (e.g. dravid.csv as above), which can then be reused for all other functions. Once we have the data for the players, many analyses can be done. This post will use the stored CSV file obtained with a prior getPlayerData for all subsequent analyses.
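
One way to make sure the download really happens only once is to check for the CSV file before fetching. The sketch below shows the pattern; fetch_once is a hypothetical helper of my own, and the commented call simply mirrors the getPlayerData usage shown in this post.

```python
import os

def fetch_once(csv_path, fetch):
    """Call `fetch()` only if `csv_path` does not exist yet; return True if it was called."""
    if os.path.exists(csv_path):
        return False          # data already stored, reuse the CSV
    fetch()
    return True

# With cricpy this might look like the following (28114 is Dravid's profile number):
# import cricpy.analytics as ca
# fetch_once("dravid.csv",
#            lambda: ca.getPlayerData(28114, dir=".", file="dravid.csv", type="batting",
#                                     homeOrAway=[1, 2], result=[1, 2, 4]))
```

This also keeps repeated runs of a notebook from hitting ESPN Cricinfo again and again.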

4a. For Test players

import cricpy.analytics as ca
#player1 =ca.getPlayerData(profileNo1,dir="..",file="player1.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])
#player2 =ca.getPlayerData(profileNo2,dir="..",file="player2.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])

4b. For ODI players

import cricpy.analytics as ca
#player1 =ca.getPlayerDataOD(profileNo1,dir="..",file="player1.csv",type="batting")
#player2 =ca.getPlayerDataOD(profileNo2,dir="..",file="player2.csv",type="batting")

4c For T20 players

import cricpy.analytics as ca
#player1 =ca.getPlayerDataTT(profileNo1,dir="..",file="player1.csv",type="batting")
#player2 =ca.getPlayerDataTT(profileNo2,dir="..",file="player2.csv",type="batting")

5 A Player’s performance – Basic Analyses

The 3 plots below provide the following for Rahul Dravid

  1. Frequency percentage of runs in each run range over the whole career
  2. Mean Strike Rate for runs scored in the given range
  3. A histogram of runs frequency percentages in runs ranges

import cricpy.analytics as ca
import matplotlib.pyplot as plt
#ca.batsmanRunsFreqPerf("aplayer.csv","A Player")
#ca.batsmanMeanStrikeRate("aplayer.csv","A Player")
#ca.batsmanRunsRanges("aplayer.csv","A Player") 
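
To see what a frequency percentage per run range means, here is a small standalone sketch. The scores are invented and the bucket width of 10 runs is an assumption; run_range_percentages is a hypothetical helper, not a cricpy function.

```python
from collections import Counter

def run_range_percentages(scores, width=10):
    """Percentage of innings falling in each run range of `width` runs."""
    buckets = Counter((s // width) * width for s in scores)
    total = len(scores)
    return {f"{low}-{low + width - 1}": round(100 * n / total, 2)
            for low, n in sorted(buckets.items())}

# Hypothetical innings scores (not real data)
scores = [0, 4, 9, 12, 18, 35, 36, 51, 104, 7]
print(run_range_percentages(scores))
```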

6. More analyses

This gives details on the batsmen’s 4s, 6s and dismissals

import cricpy.analytics as ca
#ca.batsman4s("aplayer.csv","A Player")
#ca.batsman6s("aplayer.csv","A Player") 
#ca.batsmanDismissals("aplayer.csv","A Player")
# The below function is for ODI and T20 only
#ca.batsmanScoringRateODTT("./kohli.csv","Virat Kohli")  

7. 3D scatter plot and prediction plane

The plots below show the 3D scatter plot of Runs versus Balls Faced and Minutes at crease. A linear regression plane is then fitted between Runs and Balls Faced + Minutes at crease

import cricpy.analytics as ca
#ca.battingPerf3d("aplayer.csv","A Player")

8. Average runs at different venues

The plot below gives the average runs scored at different grounds. The plot also shows the number of innings at each ground as a label on the x-axis.

import cricpy.analytics as ca
#ca.batsmanAvgRunsGround("aplayer.csv","A Player")

9. Average runs against different opposing teams

This plot computes the average runs scored against different countries.

import cricpy.analytics as ca
#ca.batsmanAvgRunsOpposition("aplayer.csv","A Player")
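
The grouping behind such a plot can be sketched in a few lines. The innings below are invented, average_by_opposition is a hypothetical helper, and it takes a plain mean of runs per innings, which is an assumption about how the average is computed.

```python
from collections import defaultdict

def average_by_opposition(innings):
    """Mean runs per innings for each opposition; `innings` holds (opposition, runs) pairs."""
    totals = defaultdict(lambda: [0, 0])      # opposition -> [runs, innings played]
    for opposition, runs in innings:
        totals[opposition][0] += runs
        totals[opposition][1] += 1
    return {team: runs / count for team, (runs, count) in totals.items()}

# Hypothetical innings (not real data)
innings = [("Australia", 40), ("England", 10), ("Australia", 60),
           ("England", 30), ("Pakistan", 75)]
print(average_by_opposition(innings))
```

Averages built on only one or two innings, like Pakistan here, deserve less weight than those built on many.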

10. Highest Runs Likelihood

The plot below shows the Runs Likelihood for a batsman.

import cricpy.analytics as ca
#ca.batsmanRunsLikelihood("aplayer.csv","A Player")

11. A look at the Top 4 batsmen

Choose any number of players

1.Player1 2.Player2 3.Player3 …

The following plots take a closer look at their performances. The box plots show the median and the 1st and 3rd quartiles of the runs.

12. Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency

import cricpy.analytics as ca
#ca.batsmanPerfBoxHist("aplayer001.csv","A Player001")
#ca.batsmanPerfBoxHist("aplayer002.csv","A Player002")
#ca.batsmanPerfBoxHist("aplayer003.csv","A Player003")
#ca.batsmanPerfBoxHist("aplayer004.csv","A Player004")
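
The three numbers a box plot is built on can be computed directly with Python's statistics module. The scores are made up, and the "inclusive" quantile method is my choice for the sketch; a plotting library may interpolate quartiles slightly differently.

```python
import statistics

def box_summary(scores):
    """Return (1st quartile, median, 3rd quartile) of the scores."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q1, median, q3

# Hypothetical scores (not real data)
print(box_summary([0, 10, 20, 30, 40]))
```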

13. Get Player Data special

import cricpy.analytics as ca
#player1sp = ca.getPlayerDataSp(profile1,tdir=".",tfile="player1sp.csv",ttype="batting")
#player2sp = ca.getPlayerDataSp(profile2,tdir=".",tfile="player2sp.csv",ttype="batting")
#player3sp = ca.getPlayerDataSp(profile3,tdir=".",tfile="player3sp.csv",ttype="batting")
#player4sp = ca.getPlayerDataSp(profile4,tdir=".",tfile="player4sp.csv",ttype="batting")

14. Contribution to won and lost matches

Note: This can only be used for Test matches

import cricpy.analytics as ca
#ca.batsmanContributionWonLost("player1sp.csv","A Player001")
#ca.batsmanContributionWonLost("player2sp.csv","A Player002")
#ca.batsmanContributionWonLost("player3sp.csv","A Player003")
#ca.batsmanContributionWonLost("player4sp.csv","A Player004")

15. Performance at home and overseas

Note: This can only be used for Test matches. This function also requires the use of getPlayerDataSp() as shown above.

import cricpy.analytics as ca
#ca.batsmanPerfHomeAway("player1sp.csv","A Player001")
#ca.batsmanPerfHomeAway("player2sp.csv","A Player002")
#ca.batsmanPerfHomeAway("player3sp.csv","A Player003")
#ca.batsmanPerfHomeAway("player4sp.csv","A Player004")

16 Moving Average of runs in career

import cricpy.analytics as ca
#ca.batsmanMovingAverage("aplayer001.csv","A Player001")
#ca.batsmanMovingAverage("aplayer002.csv","A Player002")
#ca.batsmanMovingAverage("aplayer003.csv","A Player003")
#ca.batsmanMovingAverage("aplayer004.csv","A Player004")

17 Cumulative Average runs of batsman in career

This function provides the cumulative average runs of the batsman over the career.

import cricpy.analytics as ca
#ca.batsmanCumulativeAverageRuns("aplayer001.csv","A Player001")
#ca.batsmanCumulativeAverageRuns("aplayer002.csv","A Player002")
#ca.batsmanCumulativeAverageRuns("aplayer003.csv","A Player003")
#ca.batsmanCumulativeAverageRuns("aplayer004.csv","A Player004")

18 Cumulative Average strike rate of batsman in career

import cricpy.analytics as ca
#ca.batsmanCumulativeStrikeRate("aplayer001.csv","A Player001")
#ca.batsmanCumulativeStrikeRate("aplayer002.csv","A Player002")
#ca.batsmanCumulativeStrikeRate("aplayer003.csv","A Player003")
#ca.batsmanCumulativeStrikeRate("aplayer004.csv","A Player004")

19 Future Runs forecast

import cricpy.analytics as ca
#ca.batsmanPerfForecast("aplayer001.csv","A Player001")

20 Relative Batsman Cumulative Average Runs

The plot below compares the Relative cumulative average runs of the batsman for each of the runs ranges of 10 and plots them.

import cricpy.analytics as ca
frames = ["aplayer1.csv","aplayer2.csv","aplayer3.csv","aplayer4.csv"]
names = ["A Player1","A Player2","A Player3","A Player4"]
#ca.relativeBatsmanCumulativeAvgRuns(frames,names)

21 Plot of 4s and 6s

import cricpy.analytics as ca
frames = ["aplayer1.csv","aplayer2.csv","aplayer3.csv","aplayer4.csv"]
names = ["A Player1","A Player2","A Player3","A Player4"]
#ca.batsman4s6s(frames,names)

22. Relative Batsman Strike Rate

The plot below compares the relative cumulative strike rates of the batsmen.

import cricpy.analytics as ca
frames = ["aplayer1.csv","aplayer2.csv","aplayer3.csv","aplayer4.csv"]
names = ["A Player1","A Player2","A Player3","A Player4"]
#ca.relativeBatsmanCumulativeStrikeRate(frames,names)

23. 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

import cricpy.analytics as ca
#ca.battingPerf3d("aplayer001.csv","A Player001")
#ca.battingPerf3d("aplayer002.csv","A Player002")
#ca.battingPerf3d("aplayer003.csv","A Player003")
#ca.battingPerf3d("aplayer004.csv","A Player004")

24. Predicting Runs given Balls Faced and Minutes at Crease

A multivariate regression plane is fitted between Runs and Balls faced + Minutes at crease.

import cricpy.analytics as ca
import numpy as np
import pandas as pd
BF = np.linspace( 10, 400,15)
Mins = np.linspace( 30,600,15)
newDF= pd.DataFrame({'BF':BF,'Mins':Mins})
#aplayer = ca.batsmanRunsPredict("aplayer.csv",newDF,"A Player")
#print(aplayer)

The fitted model is then used to predict the runs that the batsmen will score for a given Balls faced and Minutes at crease.

25 Analysis of Top 3 wicket takers

Take any number of bowlers from either Test, ODI or T20

  1. Bowler1
  2. Bowler2
  3. Bowler3 …

26. Get the bowler’s data (Test)

The calls below fetch each bowler's Test data once and store it in a CSV file for the plots that follow

import cricpy.analytics as ca
#abowler1 =ca.getPlayerData(profileNo1,dir=".",file="abowler1.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])
#abowler2 =ca.getPlayerData(profileNo2,dir=".",file="abowler2.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])
#abowler3 =ca.getPlayerData(profile3,dir=".",file="abowler3.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])

26b For ODI bowlers

import cricpy.analytics as ca
#abowler1 =ca.getPlayerDataOD(profileNo1,dir=".",file="abowler1.csv",type="bowling")
#abowler2 =ca.getPlayerDataOD(profileNo2,dir=".",file="abowler2.csv",type="bowling")
#abowler3 =ca.getPlayerDataOD(profile3,dir=".",file="abowler3.csv",type="bowling")

26c For T20 bowlers

import cricpy.analytics as ca
#abowler1 =ca.getPlayerDataTT(profileNo1,dir=".",file="abowler1.csv",type="bowling")
#abowler2 =ca.getPlayerDataTT(profileNo2,dir=".",file="abowler2.csv",type="bowling")
#abowler3 =ca.getPlayerDataTT(profile3,dir=".",file="abowler3.csv",type="bowling")

27. Wicket Frequency Plot

This plot below plots the frequency of wickets taken for each of the bowlers

import cricpy.analytics as ca
#ca.bowlerWktsFreqPercent("abowler1.csv","A Bowler1")
#ca.bowlerWktsFreqPercent("abowler2.csv","A Bowler2")
#ca.bowlerWktsFreqPercent("abowler3.csv","A Bowler3")

28. Wickets Runs plot

The plot below creates a box plot showing the median and the 1st and 3rd quartiles of runs conceded versus the number of wickets taken

import cricpy.analytics as ca
#ca.bowlerWktsRunsPlot("abowler1.csv","A Bowler1")
#ca.bowlerWktsRunsPlot("abowler2.csv","A Bowler2")
#ca.bowlerWktsRunsPlot("abowler3.csv","A Bowler3")

29 Average wickets at different venues

The plot gives the average wickets taken at different venues.

import cricpy.analytics as ca
#ca.bowlerAvgWktsGround("abowler1.csv","A Bowler1")
#ca.bowlerAvgWktsGround("abowler2.csv","A Bowler2")
#ca.bowlerAvgWktsGround("abowler3.csv","A Bowler3")

30 Average wickets against different opposition

The plot gives the average wickets taken against different countries.

import cricpy.analytics as ca
#ca.bowlerAvgWktsOpposition("abowler1.csv","A Bowler1")
#ca.bowlerAvgWktsOpposition("abowler2.csv","A Bowler2")
#ca.bowlerAvgWktsOpposition("abowler3.csv","A Bowler3")

31 Wickets taken moving average

import cricpy.analytics as ca
#ca.bowlerMovingAverage("abowler1.csv","A Bowler1")
#ca.bowlerMovingAverage("abowler2.csv","A Bowler2")
#ca.bowlerMovingAverage("abowler3.csv","A Bowler3")

32 Cumulative average wickets taken

The plots below give the cumulative average wickets taken by the bowlers.

import cricpy.analytics as ca
#ca.bowlerCumulativeAvgWickets("abowler1.csv","A Bowler1")
#ca.bowlerCumulativeAvgWickets("abowler2.csv","A Bowler2")
#ca.bowlerCumulativeAvgWickets("abowler3.csv","A Bowler3")

33 Cumulative average economy rate

The plots below give the cumulative average economy rate of the bowlers.

import cricpy.analytics as ca
#ca.bowlerCumulativeAvgEconRate("abowler1.csv","A Bowler1")
#ca.bowlerCumulativeAvgEconRate("abowler2.csv","A Bowler2")
#ca.bowlerCumulativeAvgEconRate("abowler3.csv","A Bowler3")

34 Future Wickets forecast

import cricpy.analytics as ca
#ca.bowlerPerfForecast("abowler1.csv","A bowler1")

35 Get player data special

import cricpy.analytics as ca
#abowler1sp =ca.getPlayerDataSp(profile1,tdir=".",tfile="abowler1sp.csv",ttype="bowling")
#abowler2sp =ca.getPlayerDataSp(profile2,tdir=".",tfile="abowler2sp.csv",ttype="bowling")
#abowler3sp =ca.getPlayerDataSp(profile3,tdir=".",tfile="abowler3sp.csv",ttype="bowling")

36 Contribution to matches won and lost

Note: This can be done only for Test cricketers

import cricpy.analytics as ca
#ca.bowlerContributionWonLost("abowler1sp.csv","A Bowler1")
#ca.bowlerContributionWonLost("abowler2sp.csv","A Bowler2")
#ca.bowlerContributionWonLost("abowler3sp.csv","A Bowler3")

37 Performance home and overseas

Note: This can be done only for Test cricketers

import cricpy.analytics as ca
#ca.bowlerPerfHomeAway("abowler1sp.csv","A Bowler1")
#ca.bowlerPerfHomeAway("abowler2sp.csv","A Bowler2")
#ca.bowlerPerfHomeAway("abowler3sp.csv","A Bowler3")

38 Relative cumulative average economy rate of bowlers

import cricpy.analytics as ca
frames = ["abowler1.csv","abowler2.csv","abowler3.csv"]
names = ["A Bowler1","A Bowler2","A Bowler3"]
#ca.relativeBowlerCumulativeAvgEconRate(frames,names)

39 Relative Economy Rate against wickets taken

import cricpy.analytics as ca
frames = ["abowler1.csv","abowler2.csv","abowler3.csv"]
names = ["A Bowler1","A Bowler2","A Bowler3"]
#ca.relativeBowlingER(frames,names)

40 Relative cumulative average wickets of bowlers in career

import cricpy.analytics as ca
frames = ["abowler1.csv","abowler2.csv","abowler3.csv"]
names = ["A Bowler1","A Bowler2","A Bowler3"]
#ca.relativeBowlerCumulativeAvgWickets(frames,names)

Clone/download this cricpy template for your own analysis of players. This can be done using RStudio or IPython notebooks from Github at cricpy-template

Important note: Do check out my other posts using cricpy at cricpy-posts

Key Findings

Analysis of Top 4 batsman

Analysis of Top 3 bowlers

You may also like
1. My book ‘Deep Learning from first principles:Second Edition’ now on Amazon
2. Presentation on ‘Evolution to LTE’
3. Stacks of protocol stacks – A primer
4. Taking baby steps in Lisp
5. Introducing cricket package yorkr: Part 1- Beaten by sheer pace!

To see all posts click Index of posts

Cricpy takes a swing at the ODIs

“No computer has ever been designed that is ever aware of what it’s doing; but most of the time, we aren’t either.” Marvin Minsky

“The competent programmer is fully aware of the limited size of his own skull. He therefore approaches his task with full humility, and avoids clever tricks like the plague” Edsger Dijkstra

Introduction

In this post, cricpy, the Python avatar of my R package cricketr, learns some new tricks to be able to handle ODI matches. To know more about my R package cricketr see Re-introducing cricketr! : An R package to analyze performances of cricketers

Cricpy uses the statistics available in ESPN Cricinfo Statsguru. The original version of this package supported only Test cricket; this post extends it to ODIs

You should be able to install the package using pip install cricpy and use the many functions available in the package. Please be mindful of the ESPN Cricinfo Terms of Use

Cricpy can now analyze performances of teams in Test, ODI and T20 cricket see Cricpy adds team analytics to its arsenal!!

To know how to use cricpy see Introducing cricpy:A python package to analyze performances of cricketers. To the original version of cricpy, I have added 3 new functions for ODI. The earlier functions work for Test and ODI.

This post is also hosted on Rpubs at Cricpy takes a swing at the ODIs. You can also download the pdf version of this post at cricpy-odi.pdf

You can fork/clone the package at Github cricpy

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricpy-template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. The functions can be executed in RStudio or in an IPython notebook.

If you are passionate about cricket, and love analyzing cricket performances, then check out my racy book on cricket ‘Cricket analytics with cricketr and cricpy – Analytics harmony with R & Python’! This book discusses and shows how to use my R package ‘cricketr’ and my Python package ‘cricpy’ to analyze batsmen and bowlers in all formats of the game (Test, ODI and T20). The paperback is available on Amazon at $21.99 and  the kindle version at $9.99/Rs 449/-. A must read for any cricket lover! Check it out!!

The cricpy package

The data for a particular player in ODI can be obtained with the getPlayerDataOD() function. To do so, you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Virat Kohli, Virendar Sehwag, Chris Gayle etc. This will bring up a page which has the profile number for the player, e.g. for Virat Kohli this would be http://www.espncricinfo.com/india/content/player/253802.html. Hence, Kohli’s profile is 253802. This can be used to get the data for Virat Kohli as shown below
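
If you prefer not to copy the profile number by hand, it can be pulled out of the player URL. The snippet below is only a sketch: the helper name profile_from_url is made up for illustration and is not part of cricpy, and it assumes the URL keeps the .../player/<number>.html shape shown above.

```python
import re

def profile_from_url(url):
    """Return the numeric profile ID embedded in a player page URL.

    Hypothetical helper (not part of cricpy); assumes the URL ends in
    /player/<digits>.html as in the examples in this post.
    """
    match = re.search(r"/player/(\d+)\.html", url)
    if match is None:
        raise ValueError(f"no profile number found in {url!r}")
    return int(match.group(1))

kohli_profile = profile_from_url(
    "http://www.espncricinfo.com/india/content/player/253802.html")
print(kohli_profile)  # 253802
```

The returned integer is what getPlayerDataOD() expects as its first argument.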

The cricpy package is a clone of my R package cricketr. The signatures of all the Python functions are identical to those of its clone ‘cricketr’, with only the necessary variations between Python and R. It may be useful to look at my post R vs Python: Different similarities and similar differences. In fact, if you are familiar with one of the languages, you can look up the package in the other and you will notice the parallel constructs.

Note: The charts are self-explanatory and I have not added much of my own interpretation to them. Do look at the plots closely and check out the performances for yourself.

1 Importing cricpy – Python

# Install the package
# Do a pip install cricpy
# Import cricpy
import cricpy.analytics as ca 

2. Invoking functions with Python package cricpy

import cricpy.analytics as ca 
ca.batsman4s("./kohli.csv","Virat Kohli")

3. Getting help from cricpy – Python

import cricpy.analytics as ca 
help(ca.getPlayerDataOD)
## Help on function getPlayerDataOD in module cricpy.analytics:
## 
## getPlayerDataOD(profile, opposition='', host='', dir='./data', file='player001.csv', type='batting', homeOrAway=[1, 2, 3], result=[1, 2, 3, 5], create=True)
##     Get the One day player data from ESPN Cricinfo based on specific inputs and store in a file in a given directory
##     
##     Description
##     
##     Get the player data given the profile of the batsman. The allowed inputs are home,away or both and won,lost or draw of matches. The data is stored in a .csv file in a directory specified. This function also returns a data frame of the player
##     
##     Usage
##     
##     getPlayerDataOD(profile, opposition="",host="",dir = "../", file = "player001.csv", 
##     type = "batting", homeOrAway = c(1, 2, 3), result = c(1, 2, 3,5))
##     Arguments
##     
##     profile     
##     This is the profile number of the player to get data. This can be obtained from http://www.espncricinfo.com/ci/content/player/index.html. Type the name of the player and click search. This will display the details of the player. Make a note of the profile ID. For e.g For Virender Sehwag this turns out to be http://www.espncricinfo.com/india/content/player/35263.html. Hence the profile for Sehwag is 35263
##     opposition      The numerical value of the opposition country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,Bermuda:12, England:1,Hong Kong:19,India:6,Ireland:29, Netherlands:15,New Zealand:5,Pakistan:7,Scotland:30,South Africa:3,Sri Lanka:8,United Arab Emirates:27, West Indies:4, Zimbabwe:9; Africa XI:405 Note: If no value is entered for opposition then all teams are considered
##     host            The numerical value of the host country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,Ireland:29,Malaysia:16,New Zealand:5,Pakistan:7, Scotland:30,South Africa:3,Sri Lanka:8,United Arab Emirates:27,West Indies:4, Zimbabwe:9 Note: If no value is entered for host then all host countries are considered
##     dir 
##     Name of the directory to store the player data into. If not specified the data is stored in a default directory "../data". Default="../data"
##     file        
##     Name of the file to store the data into for e.g. tendulkar.csv. This can be used for subsequent functions. Default="player001.csv"
##     type        
##     type of data required. This can be "batting" or "bowling"
##     homeOrAway  
##     This is vector with either or all 1,2, 3. 1 is for home 2 is for away, 3 is for neutral venue
##     result      
##     This is a vector that can take values 1,2,3,5. 1 - won match 2- lost match 3-tied 5- no result
##     Details
##     
##     More details can be found in my short video tutorial in Youtube https://www.youtube.com/watch?v=q9uMPFVsXsI
##     
##     Value
##     
##     Returns the player's dataframe
##     
##     Note
##     
##     Maintainer: Tinniam V Ganesh <tvganesh.85@gmail.com>
##     
##     Author(s)
##     
##     Tinniam V Ganesh
##     
##     References
##     
##     http://www.espncricinfo.com/ci/content/stats/index.html
##     https://gigadom.wordpress.com/
##     
##     See Also
##     
##     getPlayerDataSp getPlayerData
##     
##     Examples
##     
##     
##     ## Not run: 
##     # Both home and away. Result = won,lost and drawn
##     sehwag =getPlayerDataOD(35263,dir="../cricketr/data", file="sehwag1.csv",
##     type="batting", homeOrAway=[1,2],result=[1,2,3,4])
##     
##     # Only away. Get data only for won and lost innings
##     sehwag = getPlayerDataOD(35263,dir="../cricketr/data", file="sehwag2.csv",
##     type="batting",homeOrAway=[2],result=[1,2])
##     
##     # Get bowling data and store in file for future
##     malinga = getPlayerData(49758,dir="../cricketr/data",file="malinga1.csv",
##     type="bowling")
##     
##     # Get Dhoni's ODI record in Australia against Australua
##     dhoni = getPlayerDataOD(28081,opposition = 2,host=2,dir=".",
##     file="dhoniVsAusinAusOD",type="batting")
##     
##     ## End(Not run)
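
The numeric codes for homeOrAway and result in the help text above are easy to mix up. One way to keep calls readable is to give the codes names first. This is a sketch only: the dictionaries and the codes() helper are illustrative and not part of cricpy, and the values are taken from the help text as printed.

```python
# Codes as listed in the help text for getPlayerDataOD()
HOME_OR_AWAY = {"home": 1, "away": 2, "neutral": 3}
RESULT = {"won": 1, "lost": 2, "tied": 3, "no result": 5}

def codes(mapping, *labels):
    """Translate readable labels into the numeric list the function expects."""
    return [mapping[label] for label in labels]

# Away matches only, won or lost (hypothetical variable name)
away_wins_and_losses = {
    "homeOrAway": codes(HOME_OR_AWAY, "away"),
    "result": codes(RESULT, "won", "lost"),
}
print(away_wins_and_losses)
# {'homeOrAway': [2], 'result': [1, 2]}
```

The dictionary can then be unpacked into the call, for example ca.getPlayerDataOD(35263, dir=".", file="sehwag2.csv", type="batting", **away_wins_and_losses).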

The details below will introduce the different functions that are available in cricpy.

4. Get the ODI player data for a player using the function getPlayerDataOD()

Important note: This needs to be done only once for a player. This function stores the player’s data in the specified CSV file (e.g. kohli.csv as above), which can then be reused for all other functions. Once we have the data for the players, many analyses can be done. This post will use the stored CSV file obtained with a prior getPlayerDataOD for all subsequent analyses

import cricpy.analytics as ca
#sehwag=ca.getPlayerDataOD(35263,dir=".",file="sehwag.csv",type="batting")
#kohli=ca.getPlayerDataOD(253802,dir=".",file="kohli.csv",type="batting")
#jayasuriya=ca.getPlayerDataOD(49209,dir=".",file="jayasuriya.csv",type="batting")
#gayle=ca.getPlayerDataOD(51880,dir=".",file="gayle.csv",type="batting")
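
The "fetch once, reuse the CSV" pattern can be made explicit with a small wrapper. The sketch below is an assumption about how one might organise this, not something cricpy provides: load_or_fetch and fake_fetch are hypothetical names, and the stand-in fetcher writes a made-up two-column file so that the example runs without network access.

```python
import csv
import os
import tempfile

def load_or_fetch(path, fetch):
    """Return rows from `path`, calling `fetch(path)` only if the file is missing.

    `fetch` is a hypothetical callable that writes the CSV, e.g. a thin
    wrapper around ca.getPlayerDataOD(...).
    """
    if not os.path.exists(path):
        fetch(path)
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))

# Stand-in fetcher (made-up data) so the sketch runs offline.
calls = []
def fake_fetch(path):
    calls.append(path)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Runs", "BF"])
        writer.writerow(["45", "50"])

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "player.csv")
    first = load_or_fetch(target, fake_fetch)   # triggers the fetch
    second = load_or_fetch(target, fake_fetch)  # served from disk

print(len(calls))  # 1
```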

Included below are some of the functions that can be used for ODI batsmen and bowlers. For this I have chosen Virat Kohli, ‘the run machine’, who is on track to break many of the Test & ODI records

5 Virat Kohli’s performance – Basic Analyses

The 3 plots below provide the following for Virat Kohli

  1. Frequency percentage of runs in each run range over the whole career
  2. Mean Strike Rate for runs scored in the given range
  3. A histogram of runs frequency percentages in runs ranges

import cricpy.analytics as ca
import matplotlib.pyplot as plt
ca.batsmanRunsFreqPerf("./kohli.csv","Virat Kohli")

ca.batsmanMeanStrikeRate("./kohli.csv","Virat Kohli")

ca.batsmanRunsRanges("./kohli.csv","Virat Kohli")
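
To make "frequency percentage of runs in each run range" concrete, here is a rough sketch of the bucketing. It is an illustration under my own assumptions (10-run buckets, made-up scores) and not the code cricpy runs internally; the function name is hypothetical.

```python
from collections import Counter

def runs_range_percentages(scores, width=10):
    """Percentage of innings falling in each `width`-run range."""
    buckets = Counter((score // width) * width for score in scores)
    total = len(scores)
    return {
        f"{low}-{low + width - 1}": 100.0 * count / total
        for low, count in sorted(buckets.items())
    }

# Made-up scores, for illustration only.
sample_scores = [4, 7, 12, 35, 38, 51, 102, 0]
percentages = runs_range_percentages(sample_scores)
for label, pct in percentages.items():
    print(f"{label:>8}: {pct:.1f}%")
```

With these sample scores, three of the eight innings fall in the 0-9 range, so that bucket accounts for 37.5%.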

6. More analyses

import cricpy.analytics as ca
ca.batsman4s("./kohli.csv","Virat Kohli")

ca.batsman6s("./kohli.csv","Virat Kohli")

ca.batsmanDismissals("./kohli.csv","Virat Kohli")

ca.batsmanScoringRateODTT("./kohli.csv","Virat Kohli")


7. 3D scatter plot and prediction plane

The plots below show the 3D scatter plot of Kohli’s Runs versus Balls Faced and Minutes at crease. A linear regression plane is then fitted between Runs and Balls Faced + Minutes at crease

import cricpy.analytics as ca
ca.battingPerf3d("./kohli.csv","Virat Kohli")

8. Average runs at different venues

The plot below gives the average runs scored by Kohli at different grounds. The plot also shows the number of innings at each ground as a label on the x-axis.

import cricpy.analytics as ca
ca.batsmanAvgRunsGround("./kohli.csv","Virat Kohli")

9. Average runs against different opposing teams

This plot computes the average runs scored by Kohli against different countries.

import cricpy.analytics as ca
ca.batsmanAvgRunsOpposition("./kohli.csv","Virat Kohli")

10. Highest Runs Likelihood

The plot below shows the Runs Likelihood for a batsman. For this, Kohli’s performance is plotted as a 3D scatter plot of Runs versus Balls Faced and Minutes at crease. K-Means clustering is then used to compute and plot the centroids of 3 clusters, which represent Kohli’s highest tendencies

import cricpy.analytics as ca
ca.batsmanRunsLikelihood("./kohli.csv","Virat Kohli")
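
For readers unfamiliar with K-Means, the idea of 3 centroids can be sketched in a few lines. This is a bare-bones Lloyd's algorithm with a simple deterministic start, written for illustration; it is an assumption on my part and not necessarily how batsmanRunsLikelihood clusters the data. The innings are made-up (balls faced, minutes, runs) triples.

```python
import math

def kmeans(points, k=3, iterations=10):
    """Lloyd's algorithm with a simple deterministic start."""
    ordered = sorted(points, key=lambda p: p[-1])
    n = len(ordered)
    # Start from k points spread evenly through the data sorted by runs.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids

# Made-up innings: (balls faced, minutes at crease, runs)
sample_innings = [
    (10, 15, 5), (12, 20, 8), (8, 12, 3),
    (50, 70, 45), (55, 80, 52), (48, 65, 40),
    (110, 150, 105), (120, 160, 118), (105, 140, 100),
]
centroids = kmeans(sample_innings)
for bf, mins, runs in sorted(centroids):
    print(f"BF={bf:.1f}  Mins={mins:.1f}  Runs={runs:.1f}")
```

Each centroid can be read as a "typical" innings: a short stay, a middling knock and a long innings.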

11. A look at the Top 4 batsmen – Kohli, Jayasuriya, Sehwag and Gayle

The following batsmen have been very prolific in ODI cricket and will be used for the analyses

  1. Virat Kohli: Runs 10232, Average 59.83, Strike rate 92.88
  2. Sanath Jayasuriya: Runs 13430, Average 32.36, Strike rate 91.2
  3. Virendar Sehwag: Runs 8273, Average 35.05, Strike rate 104.33
  4. Chris Gayle: Runs 9727, Average 37.12, Strike rate 85.82

The following plots take a closer look at their performances. The box plots show the median and the 1st and 3rd quartiles of the runs

12. Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency

import cricpy.analytics as ca
ca.batsmanPerfBoxHist("./kohli.csv","Virat Kohli")

ca.batsmanPerfBoxHist("./jayasuriya.csv","Sanath Jayasuriya")

ca.batsmanPerfBoxHist("./gayle.csv","Chris Gayle")

ca.batsmanPerfBoxHist("./sehwag.csv","Virendar Sehwag")

13 Moving Average of runs in career

Take a look at the Moving Average across the career of the Top 4 (ignore the dip at the end of all plots. Need to check why this is so!). Kohli’s performance has been steadily improving over the years, as has Sehwag’s. Gayle seems to be on the way down

import cricpy.analytics as ca
ca.batsmanMovingAverage("./kohli.csv","Virat Kohli")

ca.batsmanMovingAverage("./jayasuriya.csv","Sanath Jayasuriya")

ca.batsmanMovingAverage("./gayle.csv","Chris Gayle")

ca.batsmanMovingAverage("./sehwag.csv","Virendar Sehwag")
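
As a reminder of what a moving average does, here is a plain trailing-window mean. This is a simplification assumed for illustration; the smoothing cricpy applies in batsmanMovingAverage may well differ, and the function name and window size below are my own choices.

```python
from collections import deque

def moving_average(values, window=5):
    """Mean of the most recent `window` values at each innings."""
    recent = deque(maxlen=window)
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages

# Made-up runs per innings
sample_runs = [10, 20, 30, 40, 50, 60]
print(moving_average(sample_runs, window=3))
# [10.0, 15.0, 20.0, 30.0, 40.0, 50.0]
```

Note that the first few values are averaged over fewer innings than the window size, which is one design choice among several for handling the start of a career.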

14 Cumulative Average runs of batsman in career

This function provides the cumulative average runs of the batsman over the career. Kohli seems to be getting better with time and reaches a cumulative average of 45+. Sehwag improves with time and reaches around 35+. Chris Gayle drops from 42 to 35

import cricpy.analytics as ca
ca.batsmanCumulativeAverageRuns("./kohli.csv","Virat Kohli")

ca.batsmanCumulativeAverageRuns("./jayasuriya.csv","Sanath Jayasuriya")

ca.batsmanCumulativeAverageRuns("./gayle.csv","Chris Gayle")

ca.batsmanCumulativeAverageRuns("./sehwag.csv","Virendar Sehwag")
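
The cumulative average is simply the running mean of runs after each innings. The sketch below treats every innings equally and ignores not-outs, which is an assumption made for simplicity and may not match how cricpy computes it.

```python
from itertools import accumulate

def cumulative_average(values):
    """Running mean after each innings."""
    return [total / count
            for count, total in enumerate(accumulate(values), start=1)]

# Made-up runs per innings
print(cumulative_average([10, 20, 60, 30]))
# [10.0, 15.0, 30.0, 30.0]
```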

15 Cumulative Average strike rate of batsman in career

Sehwag has the best strike rate of almost 90. Kohli and Jayasuriya have a cumulative strike rate of 75.

import cricpy.analytics as ca
ca.batsmanCumulativeStrikeRate("./kohli.csv","Virat Kohli")

ca.batsmanCumulativeStrikeRate("./jayasuriya.csv","Sanath Jayasuriya")

ca.batsmanCumulativeStrikeRate("./gayle.csv","Chris Gayle")

ca.batsmanCumulativeStrikeRate("./sehwag.csv","Virendar Sehwag")

16 Relative Batsman Cumulative Average Runs

The plot below compares the relative cumulative average runs of the batsmen. It can be seen that Virat Kohli towers above all others in the runs. He is followed by Chris Gayle and then Sehwag

import cricpy.analytics as ca
frames = ["./sehwag.csv","./gayle.csv","./jayasuriya.csv","./kohli.csv"]
names = ["Sehwag","Gayle","Jayasuriya","Kohli"]
ca.relativeBatsmanCumulativeAvgRuns(frames,names)

17. Relative Batsman Strike Rate

The plot below compares the relative cumulative strike rates of the batsmen. It shows that Sehwag has the best strike rate, followed by Jayasuriya

import cricpy.analytics as ca
frames = ["./sehwag.csv","./gayle.csv","./jayasuriya.csv","./kohli.csv"]
names = ["Sehwag","Gayle","Jayasuriya","Kohli"]
ca.relativeBatsmanCumulativeStrikeRate(frames,names)

18. 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A 3D prediction plane is fitted

import cricpy.analytics as ca
ca.battingPerf3d("./kohli.csv","Virat Kohli")

ca.battingPerf3d("./jayasuriya.csv","Sanath Jayasuriya")

ca.battingPerf3d("./gayle.csv","Chris Gayle")

ca.battingPerf3d("./sehwag.csv","Virendar Sehwag")

19. Plot of 4s and 6s

From the plot below it can be seen that Sehwag has more runs by way of 4s than 1s, 2s or 3s. Gayle and Jayasuriya have a large number of 6s

import cricpy.analytics as ca
frames = ["./sehwag.csv","./kohli.csv","./gayle.csv","./jayasuriya.csv"]
names = ["Sehwag","Kohli","Gayle","Jayasuriya"]
ca.batsman4s6s(frames,names)

20. Predicting Runs given Balls Faced and Minutes at Crease

A multivariate regression plane is fitted between Runs and Balls faced + Minutes at crease.

import cricpy.analytics as ca
import numpy as np
import pandas as pd
BF = np.linspace( 10, 400,15)
Mins = np.linspace( 30,600,15)
newDF= pd.DataFrame({'BF':BF,'Mins':Mins})
kohli= ca.batsmanRunsPredict("./kohli.csv",newDF,"Kohli")
print(kohli)
##             BF        Mins        Runs
## 0    10.000000   30.000000    6.807407
## 1    37.857143   70.714286   36.034833
## 2    65.714286  111.428571   65.262259
## 3    93.571429  152.142857   94.489686
## 4   121.428571  192.857143  123.717112
## 5   149.285714  233.571429  152.944538
## 6   177.142857  274.285714  182.171965
## 7   205.000000  315.000000  211.399391
## 8   232.857143  355.714286  240.626817
## 9   260.714286  396.428571  269.854244
## 10  288.571429  437.142857  299.081670
## 11  316.428571  477.857143  328.309096
## 12  344.285714  518.571429  357.536523
## 13  372.142857  559.285714  386.763949
## 14  400.000000  600.000000  415.991375

The fitted model is then used to predict the runs that the batsman will score for a given Balls faced and Minutes at crease.
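
To show what fitting a plane and predicting from it involves, here is a self-contained least-squares sketch that solves the normal equations directly. It is written for illustration under my own assumptions (made-up innings lying exactly on a known plane, a hypothetical fit_plane helper) and is not the code behind batsmanRunsPredict.

```python
def fit_plane(bf, mins, runs):
    """Least-squares fit of runs = a + b*bf + c*mins via the normal equations."""
    rows = [(1.0, x, y) for x, y in zip(bf, mins)]
    # Build the augmented 3x4 system (X^T X | X^T y).
    system = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * t for r, t in zip(rows, runs))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for row in range(3):
            if row != col:
                factor = system[row][col] / system[col][col]
                system[row] = [a - factor * b
                               for a, b in zip(system[row], system[col])]
    return [system[i][3] / system[i][i] for i in range(3)]

# Made-up innings generated from runs = 2 + 0.8*BF + 0.1*Mins
sample_bf = [10, 20, 30, 40, 50]
sample_mins = [15, 25, 50, 45, 80]
sample_runs = [11.5, 20.5, 31.0, 38.5, 50.0]

a, b, c = fit_plane(sample_bf, sample_mins, sample_runs)
predicted = a + b * 60 + c * 90  # prediction for 60 balls and 90 minutes
print(f"{predicted:.1f}")  # about 59.0
```

The fit needs innings where Balls Faced and Minutes are not perfectly proportional; otherwise the plane is not uniquely determined.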

21 Analysis of Top Bowlers

The following 4 bowlers have had an excellent career and will be used for the analysis

  1. Muttiah Muralitharan: Wickets 534, Average 23.08, Economy Rate 3.93
  2. Wasim Akram: Wickets 502, Average 23.52, Economy Rate 3.89
  3. Shaun Pollock: Wickets 393, Average 24.50, Economy Rate 3.67
  4. Javagal Srinath: Wickets 315, Average 28.08, Economy Rate 4.44

How do Muralitharan, Akram, Pollock and Srinath compare with one another with respect to wickets taken and the Economy Rate? The next set of plots compute and plot precisely these analyses.

22. Get the bowler’s data

The calls below fetch each bowler’s ODI bowling data once and store it in a CSV file for the plots that follow

import cricpy.analytics as ca
#akram=ca.getPlayerDataOD(43547,dir=".",file="akram.csv",type="bowling")
#murali=ca.getPlayerDataOD(49636,dir=".",file="murali.csv",type="bowling")
#pollock=ca.getPlayerDataOD(46774,dir=".",file="pollock.csv",type="bowling")
#srinath=ca.getPlayerDataOD(34105,dir=".",file="srinath.csv",type="bowling")

23. Wicket Frequency Plot

The plots below show the percentage frequency of the number of wickets taken (e.g. 1 wicket x%, 2 wickets y% etc.) for each of the bowlers

import cricpy.analytics as ca
ca.bowlerWktsFreqPercent("./murali.csv","M Muralitharan")

ca.bowlerWktsFreqPercent("./akram.csv","Wasim Akram")

ca.bowlerWktsFreqPercent("./pollock.csv","Shaun Pollock")

ca.bowlerWktsFreqPercent("./srinath.csv","J Srinath")

24. Wickets Runs plot

The plots below are box plots showing the median and the 1st and 3rd quartiles of runs conceded versus the number of wickets taken. Murali’s median runs conceded is around 40, while for Akram, Pollock and Srinath it is around 32+ runs. The spread around the median is larger for these 3 bowlers in comparison to Murali

import cricpy.analytics as ca
ca.bowlerWktsRunsPlot("./murali.csv","M Muralitharan")

ca.bowlerWktsRunsPlot("./akram.csv","Wasim Akram")

ca.bowlerWktsRunsPlot("./pollock.csv","Shaun Pollock")

ca.bowlerWktsRunsPlot("./srinath.csv","J Srinath")
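
The numbers behind such a box plot are the quartiles of runs conceded, grouped by wickets taken. The sketch below uses Python's statistics.quantiles with its default method, which may give slightly different quartiles from the plotting library; the function name and the spells are made up for illustration.

```python
from collections import defaultdict
from statistics import quantiles

def runs_quartiles_by_wickets(spells):
    """Map wickets taken -> (Q1, median, Q3) of runs conceded."""
    grouped = defaultdict(list)
    for wickets, runs in spells:
        grouped[wickets].append(runs)
    return {
        wickets: tuple(quantiles(runs, n=4))
        for wickets, runs in sorted(grouped.items())
        if len(runs) >= 2  # quantiles() needs at least two values
    }

# Made-up spells: (wickets taken, runs conceded)
sample_spells = [(1, 20), (1, 30), (1, 40), (1, 50), (1, 60),
                 (2, 35), (2, 45), (2, 40), (3, 28)]
quartiles = runs_quartiles_by_wickets(sample_spells)
print(quartiles[1])  # (25.0, 40.0, 55.0)
```

Groups with a single spell (3 wickets in this sample) are skipped, since quartiles are not meaningful for one value.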

25 Average wickets at different venues

The plots give the average wickets taken by each of the bowlers at different venues

import cricpy.analytics as ca
ca.bowlerAvgWktsGround("./murali.csv","M Muralitharan")

ca.bowlerAvgWktsGround("./akram.csv","Wasim Akram")

ca.bowlerAvgWktsGround("./pollock.csv","Shaun Pollock")

ca.bowlerAvgWktsGround("./srinath.csv","J Srinath")

26 Average wickets against different opposition

The plots give the average wickets taken by each of the bowlers against different countries. The x-axis also includes the number of innings against each team

import cricpy.analytics as ca
ca.bowlerAvgWktsOpposition("./murali.csv","M Muralitharan")

ca.bowlerAvgWktsOpposition("./akram.csv","Wasim Akram")

ca.bowlerAvgWktsOpposition("./pollock.csv","Shaun Pollock")

ca.bowlerAvgWktsOpposition("./srinath.csv","J Srinath")

27 Wickets taken moving average

The plots below show the moving average of wickets taken by each of the bowlers over their careers

import cricpy.analytics as ca
ca.bowlerMovingAverage("./murali.csv","M Muralitharan")

ca.bowlerMovingAverage("./akram.csv","Wasim Akram")

ca.bowlerMovingAverage("./pollock.csv","Shaun Pollock")

ca.bowlerMovingAverage("./srinath.csv","J Srinath")

28 Cumulative average wickets taken

The plots below give the cumulative average wickets taken by the bowlers. Muralitharan has consistently taken wickets at an average of 1.6 wickets per game. Shaun Pollock has an average of 1.5

import cricpy.analytics as ca
ca.bowlerCumulativeAvgWickets("./murali.csv","M Muralitharan")

ca.bowlerCumulativeAvgWickets("./akram.csv","Wasim Akram")

ca.bowlerCumulativeAvgWickets("./pollock.csv","Shaun Pollock")

ca.bowlerCumulativeAvgWickets("./srinath.csv","J Srinath")

29 Cumulative average economy rate

The plots below give the cumulative average economy rate of the bowlers. Pollock is the most economical, followed by Akram and then Murali

import cricpy.analytics as ca
ca.bowlerCumulativeAvgEconRate("./murali.csv","M Muralitharan")

ca.bowlerCumulativeAvgEconRate("./akram.csv","Wasim Akram")

ca.bowlerCumulativeAvgEconRate("./pollock.csv","Shaun Pollock")

ca.bowlerCumulativeAvgEconRate("./srinath.csv","J Srinath")
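
Economy rate is runs conceded per over, and scorecards write partial overs as, say, 9.3 for 9 overs and 3 balls. The sketch below shows one plausible way to compute a cumulative economy rate from such figures. Both the overs notation in the input and the definition used (total runs over total overs so far) are assumptions for illustration and may differ from what cricpy does.

```python
def overs_to_balls(overs):
    """Convert scorecard notation such as '9.3' (9 overs, 3 balls) to balls."""
    whole, _, part = str(overs).partition(".")
    return int(whole) * 6 + int(part or 0)

def cumulative_economy(spells):
    """Running runs-per-over after each spell of (overs, runs conceded)."""
    rates, balls, runs = [], 0, 0
    for overs, conceded in spells:
        balls += overs_to_balls(overs)
        runs += conceded
        rates.append(6 * runs / balls)
    return rates

# Made-up spells: (overs bowled, runs conceded)
rates = cumulative_economy([("10", 40), ("9.3", 38), ("8", 20)])
print([round(rate, 2) for rate in rates])
# [4.0, 4.0, 3.56]
```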

30 Relative cumulative average economy rate of bowlers

The Relative cumulative economy rate shows that Pollock is the most economical of the 4 bowlers. He is followed by Akram and then Murali

import cricpy.analytics as ca
frames = ["./srinath.csv","./akram.csv","./murali.csv","./pollock.csv"]
names = ["J Srinath","Wasim Akram","M Muralitharan", "S Pollock"]
ca.relativeBowlerCumulativeAvgEconRate(frames,names)

31 Relative Economy Rate against wickets taken

Pollock is most economical vs number of wickets taken. Murali has the best figures for 4 wickets taken.

import cricpy.analytics as ca
frames = ["./srinath.csv","./akram.csv","./murali.csv","./pollock.csv"]
names = ["J Srinath","Wasim Akram","M Muralitharan", "S Pollock"]
ca.relativeBowlingER(frames,names)

32 Relative cumulative average wickets of bowlers in career

The plot below compares the cumulative average wickets of the bowlers. While the bowlers are neck and neck around 130 innings, you can see Muralitharan is the most consistent and leads the pack after 150 innings in the number of wickets taken.

import cricpy.analytics as ca
frames = ["./srinath.csv","./akram.csv","./murali.csv","./pollock.csv"]
names = ["J Srinath","Wasim Akram","M Muralitharan", "S Pollock"]
ca.relativeBowlerCumulativeAvgWickets(frames,names)

33. Key Findings

The plots above capture some of the capabilities and features of my cricpy package. Feel free to install the package and try it out. Please do keep in mind ESPN Cricinfo’s Terms of Use.

Here are the main findings from the analysis above

Analysis of Top 4 batsmen and bowlers

The analysis of the Top 4 ODI batsmen (Kohli, Jayasuriya, Sehwag and Gayle) and the 4 ODI bowlers (Muralitharan, Akram, Pollock and Srinath) shows the following

  1. Kohli is a mean run machine and has been consistently piling on runs. Clearly, records will lie shattered in the days to come for Kohli
  2. Virendar Sehwag has the best strike rate of the 4, followed by Jayasuriya and then Kohli
  3. Shaun Pollock is the most economical of the bowlers, followed by Wasim Akram
  4. Muralitharan is the most consistent wicket-taker of the lot.

Important note: Do check out my other posts using cricpy at cricpy-posts

Also see
1. Architecting a cloud based IP Multimedia System (IMS)
2. Exploring Quantum Gate operations with QCSimulator
3. Dabbling with Wiener filter using OpenCV
4. Deep Learning from first principles in Python, R and Octave – Part 5
5. Big Data-2: Move into the big league:Graduate from R to SparkR
6. Singularity
7. Practical Machine Learning with R and Python – Part 4
8. Literacy in India – A deepR dive
9. Modeling a Car in Android

To see all posts click Index of Posts

 

yorkr is generic!

The features and functionality in my yorkr package are now complete. My R package yorkr is totally generic, which means that it can be used for all ODI and T20 matches, whether professional or amateur, men's or women's, international or domestic. The main requirement is that the match data be created as a YAML file in the format used by Cricsheet (Required yaml format for the match data).

If you are passionate about cricket and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of the performances of both batsmen and bowlers, besides evaluating team and match performances in Tests, ODIs, T20s and IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99 (Rs 320) and $6.99 (Rs 448) respectively for the Kindle versions. The books can be accessed at Cricket analytics with cricketr and Beaten by sheer pace-Cricket analytics with yorkr. A must read for any cricket lover! Check it out!!

I have successfully used my R functions for the Indian Premier League (IPL) matches with changes only to the convertAllYamlFiles2RDataFramesXX function (please see the posts below).

The convertAllYamlFiles2RDataframes and convertAllYamlFiles2RDataFramesT20 functions will have to be customized for the names of the teams playing in the domestic professional or amateur matches. All other classes of functions, namely Class 1, Class 2, Class 3 and Class 4, as discussed in my post Introducing cricket package yorkr-Part 1: Beaten by sheer pace, can be used as is without any changes.

There are numerous professional and amateur T20 matches played around the world. Here is a list of domestic T20 tournaments that are played around the world (from Wikipedia). The yorkr package can be used for any of these matches once the match data is saved as YAML as mentioned above.

So do go ahead and have fun, analyzing cricket performances with yorkr!

Please take a look at my posts on how to use yorkr for ODI and Twenty20 matches.

  1. Introducing cricket package yorkr:Part 1- Beaten by sheer pace!
  2. Introducing cricket package yorkr:Part 2- Trapped leg before wicket!
  3. Introducing cricket package yorkr:Part 3- foxed by flight!
  4. Introducing cricket package yorkr:Part 4-In the block hole!
  5. yorkr pads up for the Twenty20s: Part 1- Analyzing team’s match performance
  6. yorkr pads up for the Twenty20s: Part 2-Head to head confrontation between teams
  7. yorkr pads up for the Twenty20s:Part 3:Overall team performance against all oppositions!
  8. yorkr pads up for Twenty20s:Part 4- Individual batting and bowling performances!
  9. yorkr crashes the IPL party ! – Part 1
  10. yorkr crashes the IPL party! – Part 2
  11. yorkr crashes the IPL party! – Part 3
  12. yorkr crashes the IPL party! – Part 4
  13. yorkr ranks IPL batsmen and bowlers
  14. yorkr ranks T20 batsmen and bowlers
  15. yorkr ranks ODI batsmen and bowlers

yorkr ranks ODI batsmen and bowlers

This is the final post, in which yorkr ranks ODI batsmen and bowlers. The rankings are based on match data from Cricsheet. The ranking is done on

  1. average runs and average strike rate for batsmen and
  2. average wickets and average economy rate for bowlers.
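
The ranking idea above can be sketched in a few lines. The yorkr functions themselves are written in R. The Python snippet below is only an illustration under stated assumptions, not yorkr's implementation: the function name rank_batsmen, the input layout and the tie-breaking on strike rate are all hypothetical, and the sample records are made up.

```python
from collections import defaultdict
from statistics import mean


def rank_batsmen(records, min_matches=50):
    """Rank batsmen by mean runs, then by mean strike rate.

    `records` is an iterable of (batsman, runs, strike_rate) tuples,
    one per match. Players with fewer than `min_matches` are dropped.
    """
    per_player = defaultdict(list)
    for name, runs, strike_rate in records:
        per_player[name].append((runs, strike_rate))

    table = []
    for name, rows in per_player.items():
        if len(rows) < min_matches:
            continue  # below the cutoff, so not ranked
        table.append({
            "batsman": name,
            "matches": len(rows),
            "meanRuns": mean([r for r, _ in rows]),
            "meanSR": mean([s for _, s in rows]),
        })

    # Highest mean runs first; mean strike rate breaks ties (an assumption)
    table.sort(key=lambda row: (row["meanRuns"], row["meanSR"]), reverse=True)
    return table


# Made-up records; a cutoff of 2 matches keeps the example small
records = [
    ("A", 50, 90.0), ("A", 30, 80.0),
    ("B", 60, 70.0), ("B", 40, 100.0),
    ("C", 100, 150.0),
]
for row in rank_batsmen(records, min_matches=2):
    print(row)
```

The same pattern would apply to bowlers, with mean wickets and mean economy rate in place of mean runs and mean strike rate. Note that the cutoff matters: player C has the highest figures in the sample but is dropped for having played only one match.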


This post has also been published in RPubs RankODIPlayers. You can download this as a pdf file at RankODIPlayers.pdf.

Check out my interactive Shiny apps GooglyPlus (plots & tables) and Googly (only plots), which can be used to analyze IPL players, teams and matches.

You can take a look at the code at rankODIPlayers (available in yorkr_0.0.5)

rm(list=ls())
library(yorkr)
library(dplyr)
source("rankODIBatsmen.R")
source("rankODIBowlers.R")

Rank ODI batsmen

The top 3 ODI batsmen are Hashim Amla (SA), Matthew Hayden (Aus) and Virat Kohli (Ind). Note: For ODIs, a cutoff of at least 50 matches played was chosen.

ODIBatsmanRank <- rankODIBatsmen()
as.data.frame(ODIBatsmanRank[1:30,])
##            batsman matches meanRuns    meanSR
## 1          HM Amla     185 51.96216  84.15508
## 2        ML Hayden      79 50.08861  81.20646
## 3          V Kohli     279 48.51971  78.55197
## 4   AB de Villiers     253 47.93676  95.05561
## 5     SR Tendulkar     151 45.82119  79.62311
## 6         S Dhawan     116 45.03448  81.54043
## 7         V Sehwag     167 44.49102 106.27563
## 8          JE Root     111 43.64865  81.66054
## 9        Q de Kock      85 43.61176  82.55235
## 10       IJL Trott     113 43.36283  70.69761
## 11   KC Sangakkara     293 42.81911  75.10420
## 12      TM Dilshan     283 41.76678  89.70360
## 13   KS Williamson     146 41.24658  73.49267
## 14   S Chanderpaul      93 40.07527  70.59613
## 15        HH Gibbs      75 40.00000  79.03813
## 16     Salman Butt      57 39.85965  59.29807
## 17    Anamul Haque      58 39.72414  56.45224
## 18      RT Ponting     238 38.88235  71.94294
## 19       JH Kallis     136 38.77941  67.17794
## 20        MS Dhoni     328 38.57927  90.30555
## 21      MJ Guptill     199 38.54774  73.88090
## 22       DA Warner     138 38.52174  87.24978
## 23 Mohammad Yousuf      94 38.44681  72.69851
## 24        JD Ryder      66 38.40909  91.29667
## 25       GJ Bailey     133 38.38346  75.74519
## 26       G Gambhir     209 37.83254  75.15483
## 27      AJ Strauss     122 37.80328  71.54844
## 28       MJ Clarke     301 37.67442  69.78415
## 29       SR Watson     274 37.08029  83.46489
## 30        AJ Finch     103 36.36893  79.49845

Rank ODI bowlers

The top 3 ODI bowlers in the table below are Mustafizur Rahman (Ban), JH Davey (Sco) and RJ Harris (Aus). Amit Mishra is 7th and Mohammed Shami is 8th. A cutoff of 20 matches was considered for bowlers.

ODIBowlersRank <- rankODIBowlers()
## [1] 35072     3
## [1] "C:/software/cricket-package/york-test/yorkrData/ODI/ODI-matches"
as.data.frame(ODIBowlersRank[1:30,])
##               bowler matches meanWickets   meanER
## 1  Mustafizur Rahman      56    4.000000 4.293214
## 2           JH Davey      53    3.528302 4.455094
## 3          RJ Harris      94    3.276596 4.361489
## 4           MA Starc     208    3.144231 4.425865
## 5           MJ Henry      88    3.125000 4.961250
## 6         A Flintoff     139    2.956835 4.283022
## 7           A Mishra     106    2.886792 4.365849
## 8     Mohammed Shami     144    2.777778 5.609306
## 9     MJ McClenaghan     165    2.751515 5.640424
## 10          CJ McKay     230    2.704348       NA
## 11       MF Maharoof     114    2.701754 4.427018
## 12       Imran Tahir     156    2.660256 4.461923
## 13        BAW Mendis     234    2.641026 4.532308
## 14     RK Kleinveldt      54    2.629630 4.306667
## 15      Arafat Sunny      62    2.612903 4.103226
## 16         JE Taylor     156    2.602564 5.115192
## 17           AJ Hall      55    2.600000 3.879091
## 18        WD Parnell     129    2.596899 5.477597
## 19         CR Woakes     129    2.596899 5.340620
## 20      DE Bollinger     152    2.592105 4.282763
## 21        Wahab Riaz     206    2.567961 5.431748
## 22        PJ Cummins     148    2.567568 5.715405
## 23         R Rampaul     173    2.549133 4.726590
## 24      Taskin Ahmed      56    2.535714 5.325357
## 25          DW Steyn     292    2.534247 4.534007
## 26      JR Hazlewood      64    2.531250 4.392500
## 27        Abdur Rauf      84    2.523810 4.786667
## 28           SW Tait     141    2.517730 5.173191
## 29      Hamid Hassan     106    2.509434 4.686038
## 30        SL Malinga     419    2.498807 4.968974

Hope you have fun with my yorkr package!

Important note: Do check out my other posts using yorkr at yorkr-posts