yorkr ranks T20 batsmen and bowlers

Here is another short post which ranks T20 batsmen and bowlers. These are based on match data from Cricsheet. The ranking is done on

  1. average runs and average strike rate for batsmen and
  2. average wickets and average economy rate for bowlers.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen and bowlers, besides evaluating team & match performances in Tests, ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the Kindle versions. The books can be accessed at Cricket analytics with cricketr and Beaten by sheer pace-Cricket analytics with yorkr. A must-read for any cricket lover! Check it out!!

This post has also been published in RPubs RankT20Players. You can download this as a pdf file at RankT20Players.pdf.

Check out my interactive Shiny apps GooglyPlus (plots & tables) and Googly (only plots), which can be used to analyze IPL players, teams and matches.

You can take a look at the code at rankT20Players (available in yorkr_0.0.5)

rm(list=ls())
library(yorkr)
library(dplyr)
source("rankT20Batsmen.R")
source("rankT20Bowlers.R")

Rank T20 batsmen

Virat Kohli (Ind), Chris Gayle (WI) and Kevin Pietersen (Eng) top the T20 rankings. Virat Kohli stands tall among the batsmen with an average of 39.19 runs per match, followed by Chris Gayle (32.70) and Kevin Pietersen (32.43).

Note: For T20 a cutoff of at least 30 matches played was chosen.

T20BatsmanRank <- rankT20Batsmen()
as.data.frame(T20BatsmanRank[1:30,])
##             batsman matches meanRuns   meanSR
## 1           V Kohli      31 39.19355 128.8371
## 2          CH Gayle      43 32.69767 119.6467
## 3      KP Pietersen      37 32.43243 138.6732
## 4     KS Williamson      31 32.25806 130.1255
## 5  Mohammad Shahzad      33 31.66667 115.4582
## 6       BB McCullum      69 30.98551 126.0610
## 7        MJ Guptill      54 30.83333 120.0669
## 8          AD Hales      37 30.75676 115.3511
## 9       H Masakadza      38 29.26316 109.6182
## 10         GC Smith      32 27.59375 114.1831
## 11        DA Warner      56 27.53571 123.2209
## 12        JP Duminy      58 26.84483 117.3952
## 13 DPMD Jayawardene      51 26.47059 112.4257
## 14        SR Watson      50 26.30000 118.9464
## 15    KC Sangakkara      52 26.23077 112.4665
## 16       TM Dilshan      66 26.18182 102.5683
## 17         SK Raina      43 25.90698 124.3044
## 18        RG Sharma      41 25.68293 123.3983
## 19        G Gambhir      36 25.66667 117.5764
## 20     Yuvraj Singh      41 25.12195 119.5846
## 21    Misbah-ul-Haq      32 25.09375 106.6762
## 22       EJG Morgan      52 24.71154 121.1462
## 23       MN Samuels      40 24.35000 105.8547
## 24       MEK Hussey      30 24.03333 129.1250
## 25    Ahmed Shehzad      41 23.82927 100.8805
## 26  Shakib Al Hasan      40 23.35000 109.3798
## 27          HM Amla      30 23.33333 111.2513
## 28         CL White      45 22.73333       NA
## 29      LMP Simmons      33 22.54545       NA
## 30       Umar Akmal      69 22.20290 108.3590
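The ranking itself is easy to reproduce on any per-match table. Below is a minimal base-R sketch, not the code in rankT20Batsmen.R: it assumes a hypothetical data frame with one row per batsman per match (columns batsman, runs, SR), applies a minimum-matches cutoff through a minMatches parameter, and sorts by mean runs. Passing na.rm = TRUE when averaging strike rates is one way to avoid NA values like those shown for CL White and LMP Simmons, assuming those arise from missing per-match strike rates.

```r
# Minimal sketch (not the yorkr implementation): rank batsmen by mean runs
# per match, keeping only players with at least `minMatches` matches.
rankBatsmenSketch <- function(df, minMatches = 30) {
  matches  <- tapply(df$runs, df$batsman, length)
  meanRuns <- tapply(df$runs, df$batsman, mean)
  # na.rm = TRUE skips matches with no strike rate (e.g. 0 balls faced)
  meanSR   <- tapply(df$SR, df$batsman, mean, na.rm = TRUE)
  out <- data.frame(batsman  = names(matches),
                    matches  = as.vector(matches),
                    meanRuns = as.vector(meanRuns),
                    meanSR   = as.vector(meanSR))
  out <- out[out$matches >= minMatches, ]   # apply the cutoff
  out <- out[order(-out$meanRuns), ]        # highest mean runs first
  rownames(out) <- NULL
  out
}

# Hypothetical toy data: one row per batsman per match
batting <- data.frame(
  batsman = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  runs    = c(40, 10, 55, 70, 5, 20, 30, 0, 25),
  SR      = c(120, 90, 140, 150, 60, 100, 110, NA, 95)
)
ranked <- rankBatsmenSketch(batting, minMatches = 3)
ranked
```

With the toy data, batsman B is dropped by the cutoff and A ranks above C; with real data, minMatches = 30 would correspond to the cutoff used above.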

Rank T20 bowlers

The top 3 T20 bowlers are BAW Mendis (SL), Umar Gul (Pak) and DW Steyn (SA). R Ashwin is 13th. As with batsmen, only bowlers who have played at least 30 matches were considered.

T20BowlersRank <- rankT20Bowlers()
as.data.frame(T20BowlersRank[1:30,])
##             bowler matches meanWickets   meanER
## 1       BAW Mendis      36   1.6944444 6.581111
## 2         Umar Gul      57   1.5964912 7.306842
## 3         DW Steyn      38   1.5526316 6.407632
## 4      Saeed Ajmal      63   1.4920635 6.316190
## 5       SL Malinga      59   1.4576271 7.163898
## 6       TG Southee      37   1.4054054 8.840000
## 7       MG Johnson      30   1.4000000 7.080667
## 8         GP Swann      38   1.3947368 6.576842
## 9      JW Dernbach      33   1.3636364 8.550303
## 10        M Morkel      39   1.3333333 7.384872
## 11 Shakib Al Hasan      37   1.2972973 6.648649
## 12       SP Narine      32   1.2500000 5.757812
## 13        R Ashwin      33   1.2424242 7.247273
## 14 KMDN Kulasekara      42   1.2380952 6.938095
## 15       SCJ Broad      55   1.2363636 7.832182
## 16      WD Parnell      34   1.2058824 8.227941
## 17        KD Mills      41   1.1951220 8.077317
## 18      DL Vettori      34   1.1470588 5.708235
## 19   Shahid Afridi      85   1.1294118 6.748000
## 20       SR Watson      44   1.1136364 8.015227
## 21   Sohail Tanvir      48   1.1041667 7.354167
## 22   Sohail Tanvir      48   1.1041667 7.354167
## 23     NL McCullum      56   1.0535714 7.246964
## 24     NLTC Perera      34   1.0294118 8.916471
## 25         J Botha      39   1.0256410 6.647436
## 26        DJ Bravo      45   1.0222222 8.630000
## 27   Mohammad Nabi      32   0.9687500 7.208437
## 28       DJG Sammy      55   0.8909091 7.899818
## 29 Mohammad Hafeez      56   0.8392857 6.996964
## 30      AD Mathews      44   0.7954545 6.827727
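Bowler ranking follows the same pattern. This is a minimal sketch using aggregate() on a hypothetical per-match table (columns bowler, wickets, ER). Breaking ties in mean wickets by the lower mean economy rate is an assumption of this sketch, not necessarily what rankT20Bowlers.R does.

```r
# Minimal sketch: rank bowlers by mean wickets per match (descending),
# breaking ties by mean economy rate (ascending, i.e. cheaper first).
rankBowlersSketch <- function(df, minMatches = 30) {
  means  <- aggregate(cbind(wickets, ER) ~ bowler, data = df, FUN = mean)
  counts <- aggregate(wickets ~ bowler, data = df, FUN = length)
  names(counts)[2] <- "matches"
  out <- merge(counts, means, by = "bowler")
  names(out) <- c("bowler", "matches", "meanWickets", "meanER")
  out <- out[out$matches >= minMatches, ]
  out <- out[order(-out$meanWickets, out$meanER), ]
  rownames(out) <- NULL
  out
}

# Hypothetical toy data: all three bowlers average 1.5 wickets per match
bowling <- data.frame(
  bowler  = c("P", "P", "Q", "Q", "R", "R"),
  wickets = c(2, 1, 1, 2, 3, 0),
  ER      = c(6.0, 7.0, 5.5, 6.5, 8.0, 9.0)
)
rankedBowlers <- rankBowlersSketch(bowling, minMatches = 2)
rankedBowlers
```

Since all three toy bowlers tie on 1.5 wickets per match, the order (Q, P, R) is decided entirely by economy rate.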

Conclusion

As expected, Virat Kohli stands head and shoulders above the rest. Hamid Hasan and Mohammed Shami figuring among the top T20 bowlers was a bit of a surprise to me.

Important note: Do check out my other posts using yorkr at yorkr-posts

Watch this space!

  1. Introducing cricket package yorkr-Part1:Beaten by sheer pace!.
  2. yorkr pads up for the Twenty20s: Part 1- Analyzing team’s match performance.
  3. yorkr crashes the IPL party !Part 1
  4. Introducing cricketr! : An R package to analyze performances of cricketers
  5. Cricket analytics with cricketr in paperback and Kindle versions

yorkr ranks IPL batsmen and bowlers

Here is a short post which ranks IPL batsmen and bowlers. These rankings are based on match data from Cricsheet. Ranking batsmen and bowlers in IPL is more challenging, as players can belong to different teams in different years. Hence I create a combined data frame of the batsmen and bowlers regardless of their IPL teams and calculate a) average runs and b) average strike rate for batsmen, and c) average wickets and d) average economy rate for bowlers.
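The key step described above, pooling each player's records regardless of franchise, can be sketched in base R. The two team data frames and their column names below are hypothetical; the point is that after rbind-ing all teams together, grouping on the player name alone merges a player's seasons with different teams.

```r
# Hypothetical per-franchise batting records; player "X" changed teams
teamA <- data.frame(batsman = c("X", "Y"), runs = c(50, 20), team = "TeamA")
teamB <- data.frame(batsman = c("X", "Z"), runs = c(30, 40), team = "TeamB")

# Stack all teams into one data frame, then group by player only
combined <- do.call(rbind, list(teamA, teamB))
matches  <- tapply(combined$runs, combined$batsman, length)
meanRuns <- tapply(combined$runs, combined$batsman, mean)
data.frame(batsman  = names(meanRuns),
           matches  = as.vector(matches),
           meanRuns = as.vector(meanRuns))
```

Player X ends up with 2 matches and a mean of 40 runs, even though the two innings were recorded under different teams.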

I will be doing this ranking for T20 and ODI batting and bowling performances shortly.

This post has also been published in RPubs RankIPLPlayers. You can download this as a pdf file at RankIPLPlayers.pdf.

You can take a look at the code at rankIPLPlayers (should be available in yorkr_0.0.5)

The results are slightly surprising.

rm(list=ls())
library(yorkr)
library(dplyr)
setwd("C:/software/cricket-package/cricsheet/cleanup/IPL/rank")
source("rankIPLBatsmen.R")
source("rankIPLBowlers.R")

Rank IPL batsmen

Chris Gayle, MEK Hussey and Shane Watson are the top 3 IPL batsmen. Gayle towers over the others in both mean runs and mean strike rate. Surprisingly, Ajinkya Rahane is the top Indian T20 batsman if we leave out Sachin Tendulkar (who tops India yet again!). The next best Indian IPL batsmen are Raina, Gambhir and Rohit Sharma, in that order. Virat Kohli comes a distant 14th.

iplBatsmanRank <- rankIPLBatsmen()
as.data.frame(iplBatsmanRank[1:30,])
##             batsman matches meanRuns    meanSR
## 1          CH Gayle     128 40.00781 144.92188
## 2        MEK Hussey      64 33.57812 107.23500
## 3         SR Watson      75 31.46667 129.97733
## 4      SR Tendulkar     127 29.74803 108.86622
## 5         AM Rahane      77 29.14286 101.40065
## 6         DA Warner     134 29.10448 118.38313
## 7         JP Duminy      94 28.77660 124.61702
## 8          SK Raina     128 28.62500 122.12656
## 9         G Gambhir     210 28.13810 108.78090
## 10        RG Sharma     181 28.07182 118.57801
## 11         DR Smith      78 27.82051 119.64462
## 12      BB McCullum      98 27.81633 114.91255
## 13         S Dhawan     109 27.74312 112.21000
## 14          V Kohli     188 27.56915 113.81261
## 15   AB de Villiers     150 27.46000 136.70860
## 16         R Dravid     104 27.02885 107.78923
## 17        JH Kallis     167 26.54491  94.65641
## 18         V Sehwag     174 26.39655 140.29011
## 19       RV Uthappa     166 26.27711 120.48506
## 20       SC Ganguly      86 25.98837  96.39849
## 21     AC Gilchrist      81 25.77778 122.69074
## 22    KC Sangakkara      70 25.67143 112.97529
## 23         MS Dhoni     119 25.29412 130.99832
## 24       TM Dilshan      82 24.13415 101.12634
## 25          M Vijay      96 23.92708 102.01771
## 26        AT Rayudu     146 23.63014 117.91000
## 27 DPMD Jayawardene     109 22.95413 110.73862
## 28        MK Pandey     105 22.71429        NA
## 29     Yuvraj Singh     112 22.48214 114.51018
## 30      S Badrinath      66 22.22727 114.97061

Rank IPL bowlers

The top 3 IPL T20 bowlers are SL Malinga, SP Narine and DJ Bravo.

Don’t get hung up on the decimals in the average wickets for the bowlers. They simply mean that if 2 bowlers have average wickets of 1.0 and 1.5, then over 2 matches the 1st bowler would take 2 wickets and the 2nd bowler 3 wickets.
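To make this concrete: assuming meanWickets is total wickets divided by matches played, multiplying it back by a number of matches gives the expected wickets over that many matches. The helper below is hypothetical and not part of yorkr.

```r
# Hypothetical helper: expected wickets over n matches at a given average
expectedWickets <- function(meanWickets, nMatches) meanWickets * nMatches

expectedWickets(c(1.0, 1.5), 2)        # 2 and 3 wickets over 2 matches
# SL Malinga's row above: 1.645833 wickets/match over 96 matches
round(expectedWickets(1.645833, 96))   # about 158 wickets in total
```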

iplBowlersRank <- rankIPLBowlers()
as.data.frame(iplBowlersRank[1:30,])
##             bowler matches meanWickets   meanER
## 1       SL Malinga      96    1.645833 6.545208
## 2        SP Narine      54    1.555556 5.967593
## 3         DJ Bravo      58    1.517241 7.929310
## 4         M Morkel      37    1.405405 7.626216
## 5        IK Pathan      40    1.400000 7.579250
## 6         RP Singh      42    1.357143 7.966429
## 7         MM Patel      31    1.354839 7.282581
## 8  Shakib Al Hasan      32    1.343750 6.911250
## 9    R Vinay Kumar      63    1.317460 8.342540
## 10       MM Sharma      46    1.304348 7.740652
## 11         P Awana      33    1.303030 8.325758
## 12        MM Patel      30    1.300000 7.569667
## 13          Z Khan      41    1.292683 7.735854
## 14        A Mishra      43    1.255814 7.226512
## 15         PP Ojha      53    1.245283 7.268679
## 16     JP Faulkner      40    1.225000 8.502250
## 17     DS Kulkarni      32    1.156250 8.372188
## 18        UT Yadav      46    1.152174 8.394783
## 19        A Kumble      41    1.146341 6.567073
## 20       JA Morkel      73    1.136986 8.131370
## 21        SK Warne      53    1.132075 7.277170
## 22 Harbhajan Singh     107    1.102804 7.014953
## 23        L Balaji      34    1.088235 7.186176
## 24        R Ashwin      92    1.065217 6.812391
## 25        AR Patel      31    1.064516 7.137097
## 26  M Muralitharan      39    1.051282 6.470256
## 27         P Kumar      36    1.027778 8.148056
## 28       PP Chawla      85    1.023529 8.017765
## 29       SR Watson      67    1.014925 7.695224
## 30        DJ Bravo      30    1.000000 7.966333

Conclusion: The results are somewhat surprising. The ranking was based on data from Cricsheet, and the IPL data on the site cover 2008-2015. I hope to do this ranking for T20 and ODIs shortly.

Introducing cricket package yorkr:Part 4-In the block hole!

Introduction

“The nitrogen in our DNA, the calcium in our teeth, the iron in our blood, the carbon in our apple pies were made in the interiors of collapsing stars. We are made of starstuff.”

“If you wish to make an apple pie from scratch, you must first invent the universe.”

“We are like butterflies who flutter for a day and think it is forever.”

“The absence of evidence is not the evidence of absence.”

“We are star stuff which has taken its destiny into its own hands.”

                              Cosmos - Carl Sagan

This post is the 4th and possibly the last part of my introduction to my latest cricket package yorkr. The 3 earlier parts were

  1. Introducing cricket package yorkr-Part1:Beaten by sheer pace!.
  2. Introducing cricket package yorkr: Part 2-Trapped leg before wicket!
  3. Introducing cricket package yorkr: Part 3-Foxed by flight!

The 1st part included functions dealing with a specific match, and the 2nd part dealt with functions for matches between 2 opposing teams. The 3rd part dealt with functions for a team in all matches against all oppositions. This 4th part covers individual batting and bowling performances in ODI matches, namely the Class 4 functions.

This post has also been published at RPubs yorkr-Part4 and can also be downloaded as a PDF document from yorkr-Part4.pdf.

You can clone/fork the code for the package yorkr from Github at yorkr-package

Important note 1: Do check out all the posts on the Python avatar of yorkr, namely ‘yorkpy’, in my post ‘Pitching yorkpy … short of good length to IPL – Part 1’.

Batsman functions

  1. batsmanRunsVsDeliveries
  2. batsmanFoursSixes
  3. batsmanDismissals
  4. batsmanRunsVsStrikeRate
  5. batsmanMovingAverage
  6. batsmanCumulativeAverageRuns
  7. batsmanCumulativeStrikeRate
  8. batsmanRunsAgainstOpposition
  9. batsmanRunsVenue
  10. batsmanRunsPredict

Bowler functions

  1. bowlerMeanEconomyRate
  2. bowlerMeanRunsConceded
  3. bowlerMovingAverage
  4. bowlerCumulativeAvgWickets
  5. bowlerCumulativeAvgEconRate
  6. bowlerWicketPlot
  7. bowlerWicketsAgainstOpposition
  8. bowlerWicketsVenue
  9. bowlerWktsPredict

Note: The yorkr package in its current avatar only supports ODI, T20 and IPL T20 matches.

library(yorkr)
library(gridExtra)
library(rpart.plot)
library(dplyr)
library(ggplot2)
rm(list=ls())

A. Batsman functions

1. Get Team Batting details

The function below gets the overall batting details of a team from the RData files of ODI matches. These are currently also available on GitHub at (https://github.com/tvganesh/yorkrData/tree/master/ODI/ODI-matches). However, you may have to regenerate them as future matches are added! The batting details of the team in each match are extracted, and one large data frame is created by rbind-ing the individual data frames. This can be saved as an RData file.

setwd("C:/software/cricket-package/york-test/yorkrData/ODI/ODI-matches")
india_details <- getTeamBattingDetails("India",dir=".", save=TRUE)
dim(india_details)
## [1] 11085    15
sa_details <- getTeamBattingDetails("South Africa",dir=".",save=TRUE)
dim(sa_details)
## [1] 6375   15
nz_details <- getTeamBattingDetails("New Zealand",dir=".",save=TRUE)
dim(nz_details)
## [1] 6262   15
eng_details <- getTeamBattingDetails("England",dir=".",save=TRUE)
dim(eng_details)
## [1] 9001   15
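The rbind-and-save pattern described above can be illustrated in plain base R. This sketch uses two hypothetical per-match data frames and a temporary file name of my own choosing. It only shows the pattern (stack with do.call(rbind, ...), then save()/load()), not getTeamBattingDetails' internals.

```r
# Hypothetical per-match batting frames (toy data, not yorkr output)
perMatch <- list(
  data.frame(batsman = c("A", "B"), runs = c(10, 25), date = as.Date("2015-01-01")),
  data.frame(batsman = c("A", "C"), runs = c(40, 5),  date = as.Date("2015-02-01"))
)

# Stack the per-match frames into one team-level data frame
teamDetails <- do.call(rbind, perMatch)

# Save as an RData file and reload it (file name is illustrative)
outFile <- file.path(tempdir(), "Example-BattingDetails.RData")
save(teamDetails, file = outFile)
rm(teamDetails)
load(outFile)      # restores the object `teamDetails`
dim(teamDetails)   # 4 rows, 3 columns
```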

2. Get batsman details

This function gets the individual batting record of a specified batsman of a country, as in the calls below. For analyzing the batting performances the following cricketers have been chosen

  1. Virat Kohli (Ind)
  2. M S Dhoni (Ind)
  3. AB De Villiers (SA)
  4. Q De Kock (SA)
  5. J Root (Eng)
  6. M J Guptill (NZ)
setwd("C:/software/cricket-package/york-test/yorkrData/ODI/ODI-matches")
kohli <- getBatsmanDetails(team="India",name="Kohli",dir=".")
## [1] "./India-BattingDetails.RData"
dhoni <- getBatsmanDetails(team="India",name="Dhoni")
## [1] "./India-BattingDetails.RData"
devilliers <-  getBatsmanDetails(team="South Africa",name="Villiers",dir=".")
## [1] "./South Africa-BattingDetails.RData"
deKock <-  getBatsmanDetails(team="South Africa",name="Kock",dir=".")
## [1] "./South Africa-BattingDetails.RData"
root <-  getBatsmanDetails(team="England",name="Root",dir=".")
## [1] "./England-BattingDetails.RData"
guptill <-  getBatsmanDetails(team="New Zealand",name="Guptill",dir=".")
## [1] "./New Zealand-BattingDetails.RData"

3. Runs versus deliveries

Kohli, De Villiers and Guptill have a good cluster of points that head towards 150 runs at 150 deliveries.

p1 <-batsmanRunsVsDeliveries(kohli,"Kohli")
p2 <- batsmanRunsVsDeliveries(dhoni, "Dhoni")
p3 <- batsmanRunsVsDeliveries(devilliers,"De Villiers")
p4 <- batsmanRunsVsDeliveries(deKock,"Q de Kock")
p5 <- batsmanRunsVsDeliveries(root,"JE Root")
p6 <- batsmanRunsVsDeliveries(guptill,"MJ Guptill")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

4. Batsman Total runs, Fours and Sixes

The plots below show the total runs, fours and sixes by the batsmen

kohli46 <- select(kohli,batsman,ballsPlayed,fours,sixes,runs)
p1 <- batsmanFoursSixes(kohli46,"Kohli")
dhoni46 <- select(dhoni,batsman,ballsPlayed,fours,sixes,runs)
p2 <- batsmanFoursSixes(dhoni46,"Dhoni")
devilliers46 <- select(devilliers,batsman,ballsPlayed,fours,sixes,runs)
p3 <- batsmanFoursSixes(devilliers46, "De Villiers")
deKock46 <- select(deKock,batsman,ballsPlayed,fours,sixes,runs)
p4 <- batsmanFoursSixes(deKock46,"Q de Kock")
root46 <- select(root,batsman,ballsPlayed,fours,sixes,runs)
p5 <- batsmanFoursSixes(root46,"JE Root")
guptill46 <- select(guptill,batsman,ballsPlayed,fours,sixes,runs)
p6 <- batsmanFoursSixes(guptill46,"MJ Guptill")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

5. Batsman dismissals

The type of dismissal for each batsman is shown below

p1 <-batsmanDismissals(kohli,"Kohli")
p2 <- batsmanDismissals(dhoni, "Dhoni")
p3 <- batsmanDismissals(devilliers, "De Villiers")
p4 <- batsmanDismissals(deKock,"Q de Kock")
p5 <- batsmanDismissals(root,"JE Root")
p6 <- batsmanDismissals(guptill,"MJ Guptill")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

6. Runs versus Strike Rate

De Villiers has the best strike rate of the six, as more of his points lie towards the right side of the plot for the same runs. Kohli and Dhoni do well too. Q de Kock and Joe Root also have a very good spread of points, though they have played fewer innings.

p1 <-batsmanRunsVsStrikeRate(kohli,"Kohli")
p2 <- batsmanRunsVsStrikeRate(dhoni, "Dhoni")
p3 <- batsmanRunsVsStrikeRate(devilliers, "De Villiers")
p4 <- batsmanRunsVsStrikeRate(deKock,"Q de Kock")
p5 <- batsmanRunsVsStrikeRate(root,"JE Root")
p6 <- batsmanRunsVsStrikeRate(guptill,"MJ Guptill")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)
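The strike rate on these plots is runs scored per 100 balls faced. A minimal base-R version, assuming balls faced is available per innings; innings with no balls faced get NA rather than a division by zero. The function name is hypothetical.

```r
# Strike rate = 100 * runs / balls faced; NA when no balls were faced
strikeRate <- function(runs, balls) {
  ifelse(balls > 0, 100 * runs / balls, NA_real_)
}

strikeRate(runs = c(64, 0, 30), balls = c(66, 0, 20))
```

The three innings give a strike rate of about 96.97, NA and 150 respectively.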

7. Batsman moving average

Kohli’s moving average is on a gentle increase from below 50 to around the 60s. Joe Root’s performance is impressive, with his moving average of late tending towards the 70s. Q de Kock seemed to have a slump around 2015, but his performance is on the increase. De Villiers consistently averages around 50. Dhoni has also had a stable run over the last several years.

p1 <-batsmanMovingAverage(kohli,"Kohli")
p2 <- batsmanMovingAverage(dhoni, "Dhoni")
p3 <- batsmanMovingAverage(devilliers, "De Villiers")
p4 <- batsmanMovingAverage(deKock,"Q de Kock")
p5 <- batsmanMovingAverage(root,"JE Root")
p6 <- batsmanMovingAverage(guptill,"MJ Guptill")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)
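A moving average smooths the innings-by-innings scores. The sketch below is a simple trailing mean over the last k innings using stats::filter, written with the stats:: prefix because dplyr, loaded above, masks filter. Whether batsmanMovingAverage uses this window or a different smoother is not stated in this post, so treat k and the method as assumptions.

```r
# Trailing mean over the last k innings; the first k - 1 values are NA
movingAverage <- function(runs, k = 10) {
  as.numeric(stats::filter(runs, rep(1 / k, k), sides = 1))
}

movingAverage(c(10, 20, 30, 40), k = 2)   # NA 15 25 35
```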

8. Batsman cumulative average

The functions below plot the cumulative average of runs scored. As can be seen, Kohli and De Villiers have a cumulative average of around 48-50. Q de Kock seems to have had a rocky career with several highs and lows, as his cumulative average oscillates between 40 and 45. Root steadily improves to a cumulative average of around 42-43 from his 50th innings onwards.

p1 <-batsmanCumulativeAverageRuns(kohli,"Kohli")
p2 <- batsmanCumulativeAverageRuns(dhoni, "Dhoni")
p3 <- batsmanCumulativeAverageRuns(devilliers, "De Villiers")
p4 <- batsmanCumulativeAverageRuns(deKock,"Q de Kock")
p5 <- batsmanCumulativeAverageRuns(root,"JE Root")
p6 <- batsmanCumulativeAverageRuns(guptill,"MJ Guptill")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)
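The cumulative average after innings n is the total runs so far divided by n. A one-line base-R sketch follows. It divides by innings played, not by dismissals as the conventional batting average does; which of the two batsmanCumulativeAverageRuns uses is an assumption here.

```r
# Running mean of runs: cumulative runs / number of innings so far
cumulativeAverage <- function(runs) cumsum(runs) / seq_along(runs)

cumulativeAverage(c(50, 30, 70, 10))   # 50 40 50 40
```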

9. Cumulative Average Strike Rate

The plots below show the cumulative average strike rate of the batsmen. Dhoni and De Villiers have the best cumulative average strike rate, around 90 runs per 100 balls. The rest average a strike rate of around 80. Guptill shows a slump towards the latter part of his career.

p1 <-batsmanCumulativeStrikeRate(kohli,"Kohli")
p2 <- batsmanCumulativeStrikeRate(dhoni, "Dhoni")
p3 <- batsmanCumulativeStrikeRate(devilliers, "De Villiers")
p4 <- batsmanCumulativeStrikeRate(deKock,"Q de Kock")
p5 <- batsmanCumulativeStrikeRate(root,"JE Root")
p6 <- batsmanCumulativeStrikeRate(guptill,"MJ Guptill")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)
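There are two reasonable ways to define a cumulative strike rate, and they can differ noticeably. The sketch contrasts a running mean of per-innings strike rates with the aggregate rate (total runs over total balls, times 100). The figures are hypothetical, and which definition batsmanCumulativeStrikeRate uses is not confirmed here.

```r
runs  <- c(20, 100, 10)   # hypothetical runs per innings
balls <- c(40, 80, 5)     # hypothetical balls faced per innings

perInningsSR  <- 100 * runs / balls                              # 50 125 200
runningMeanSR <- cumsum(perInningsSR) / seq_along(perInningsSR)  # 50 87.5 125
aggregateSR   <- 100 * cumsum(runs) / cumsum(balls)              # 50 100 104
```

The short, quick innings (10 off 5 balls) moves the running mean far more than the aggregate rate, which weights each innings by the balls faced.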

10. Batsman runs against opposition

Kohli’s best performances are against Australia, West Indies and Sri Lanka

batsmanRunsAgainstOpposition(kohli,"Kohli")

batsmanRunsAgainstOpposition(dhoni, "Dhoni")

De Villiers’ best performances are against Australia, Pakistan and West Indies

batsmanRunsAgainstOpposition(devilliers, "De Villiers")

Quinton de Kock averages almost 100 runs against India and 75 runs against England

batsmanRunsAgainstOpposition(deKock, "Q de Kock")

Root’s best performances are against South Africa, Sri Lanka and West Indies

batsmanRunsAgainstOpposition(root, "JE Root")

batsmanRunsAgainstOpposition(guptill, "MJ Guptill")

11. Runs at different venues

The plots below give the performances of the batsmen at different grounds.

batsmanRunsVenue(kohli,"Kohli")

batsmanRunsVenue(dhoni, "Dhoni")

batsmanRunsVenue(devilliers, "De Villiers")

batsmanRunsVenue(deKock, "Q de Kock")

batsmanRunsVenue(root, "JE Root")

batsmanRunsVenue(guptill, "MJ Guptill")

12. Predict number of runs to deliveries

The plots below use an rpart decision tree to predict the number of runs (shown in the leaf nodes) scored for a given number of deliveries faced. For example, Kohli takes 66 deliveries to score 64 runs, and for a higher number of deliveries scores around 115 runs. Devilliers needs

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsPredict(kohli,"Kohli")
batsmanRunsPredict(dhoni, "Dhoni")
batsmanRunsPredict(devilliers, "De Villiers")

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsPredict(deKock,"Q de Kock")
batsmanRunsPredict(root,"JE Root")
batsmanRunsPredict(guptill,"MJ Guptill")
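batsmanRunsPredict fits a decision tree from the rpart package. Below is a self-contained sketch of the same idea on simulated data: a regression tree (method = "anova", since runs is numeric) predicting runs from balls faced. The simulated data and column names are hypothetical, and the yorkr function's exact formula and settings may differ.

```r
library(rpart)

set.seed(42)
# Simulated innings: runs grow roughly in line with balls faced
balls   <- sample(5:150, 200, replace = TRUE)
runs    <- pmax(round(0.9 * balls + rnorm(200, sd = 8)), 0)
innings <- data.frame(balls = balls, runs = runs)

# Regression tree: each leaf holds the mean runs for a range of balls faced
fit <- rpart(runs ~ balls, data = innings, method = "anova")

# Predicted runs for innings of 20, 66 and 140 balls
pred <- predict(fit, newdata = data.frame(balls = c(20, 66, 140)))
pred
```

With rpart.plot loaded, as above, rpart.plot(fit) draws the fitted tree with the mean runs in each leaf.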

B. Bowler functions

13. Get bowling details

The function below gets the overall bowling details of a team from the RData files of ODI matches. These are currently also available on GitHub at (https://github.com/tvganesh/yorkrData/tree/master/ODI/ODI-matches). The bowling details of the team in each match are extracted, and one large data frame is created by rbind-ing the individual data frames. This can be saved as an RData file.

setwd("C:/software/cricket-package/york-test/yorkrData/ODI/ODI-matches")
ind_bowling <- getTeamBowlingDetails("India",dir=".",save=TRUE)
dim(ind_bowling)
## [1] 7816   12
aus_bowling <- getTeamBowlingDetails("Australia",dir=".",save=TRUE)
dim(aus_bowling)
## [1] 9191   12
ban_bowling <- getTeamBowlingDetails("Bangladesh",dir=".",save=TRUE)
dim(ban_bowling)
## [1] 5665   12
sa_bowling <- getTeamBowlingDetails("South Africa",dir=".",save=TRUE)
dim(sa_bowling)
## [1] 3806   12
sl_bowling <- getTeamBowlingDetails("Sri Lanka",dir=".",save=TRUE)
dim(sl_bowling)
## [1] 3964   12

14. Get bowling details of the individual bowlers

This function gets the individual bowling record of a specified bowler of a country, as in the calls below. For analyzing the bowling performances the following cricketers have been chosen

  1. R A Jadeja (Ind)
  2. Ravichandran Ashwin (Ind)
  3. Mitchell Starc (Aus)
  4. Shakib Al Hasan (Ban)
  5. Ajantha Mendis (SL)
  6. Dale Steyn (SA)
jadeja <- getBowlerWicketDetails(team="India",name="Jadeja",dir=".")
ashwin <- getBowlerWicketDetails(team="India",name="Ashwin",dir=".")
starc <-  getBowlerWicketDetails(team="Australia",name="Starc",dir=".")
shakib <-  getBowlerWicketDetails(team="Bangladesh",name="Shakib",dir=".")
mendis <-  getBowlerWicketDetails(team="Sri Lanka",name="Mendis",dir=".")
steyn <-  getBowlerWicketDetails(team="South Africa",name="Steyn",dir=".")

15. Bowler Mean Economy Rate

Shakib Al Hasan is expensive in the first 3 overs, after which he is very economical, with an economy rate of 3-4. Starc and Steyn average an ER of around 4.0.

p1<-bowlerMeanEconomyRate(jadeja,"RA Jadeja")
p2<-bowlerMeanEconomyRate(ashwin, "R Ashwin")
p3<-bowlerMeanEconomyRate(starc, "MA Starc")
p4<-bowlerMeanEconomyRate(shakib, "Shakib Al Hasan")
p5<-bowlerMeanEconomyRate(mendis, "A Mendis")
p6<-bowlerMeanEconomyRate(steyn, "D Steyn")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)
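The mean economy rate plot groups a bowler's matches by the number of overs bowled. Below is a minimal base-R sketch with hypothetical spells. It assumes whole overs; a spell such as 9.3 overs would first need converting to balls.

```r
# Hypothetical spells: overs bowled and runs conceded in each match
spells <- data.frame(
  overs = c(10, 10, 8, 8, 4),
  runs  = c(40, 50, 32, 48, 30)
)
spells$ER <- spells$runs / spells$overs   # 4.0 5.0 4.0 6.0 7.5

# Mean economy rate for each number of overs bowled
meanER <- tapply(spells$ER, spells$overs, mean)
meanER   # 4 overs: 7.5, 8 overs: 5.0, 10 overs: 4.5
```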

16. Bowler Mean Runs conceded

Ashwin is expensive around 6 & 7 overs

p1<-bowlerMeanRunsConceded(jadeja,"RA Jadeja")
p2<-bowlerMeanRunsConceded(ashwin, "R Ashwin")
p3<-bowlerMeanRunsConceded(starc, "M A Starc")
p4<-bowlerMeanRunsConceded(shakib, "Shakib Al Hasan")
p5<-bowlerMeanRunsConceded(mendis, "A Mendis")
p6<-bowlerMeanRunsConceded(steyn, "D Steyn")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

17. Bowler Moving average

RA Jadeja’s and Mendis’ performances have dipped considerably, while Ashwin and Shakib are improving. Starc averages around 4 wickets.

p1<-bowlerMovingAverage(jadeja,"RA Jadeja")
p2<-bowlerMovingAverage(ashwin, "Ashwin")
p3<-bowlerMovingAverage(starc, "M A Starc")
p4<-bowlerMovingAverage(shakib, "Shakib Al Hasan")
p5<-bowlerMovingAverage(mendis, "Ajantha Mendis")
p6<-bowlerMovingAverage(steyn, "Dale Steyn")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

18. Bowler cumulative average wickets

Starc is clearly the most consistent performer, averaging around 3 wickets per match over his career, while Jadeja averages around 2.0. Ashwin seems to have dropped from 2.4 to 2.0 wickets, while Mendis drops from a high of 3.5 to 2.2 wickets. The fractional wickets only indicate a tendency to take another wicket.

p1<-bowlerCumulativeAvgWickets(jadeja,"RA Jadeja")
p2<-bowlerCumulativeAvgWickets(ashwin, "Ashwin")
p3<-bowlerCumulativeAvgWickets(starc, "M A Starc")
p4<-bowlerCumulativeAvgWickets(shakib, "Shakib Al Hasan")
p5<-bowlerCumulativeAvgWickets(mendis, "Ajantha Mendis")
p6<-bowlerCumulativeAvgWickets(steyn, "Dale Steyn")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

19. Bowler cumulative Economy Rate (ER)

The plots below are interesting. All of the bowlers seem to average around 4.5 runs/over. RA Jadeja’s ER improves and heads to 4.5, while Mendis gets more expensive as his career progresses, rising from an ER of 3.0 towards 4.5.

p1<-bowlerCumulativeAvgEconRate(jadeja,"RA Jadeja")
p2<-bowlerCumulativeAvgEconRate(ashwin, "Ashwin")
p3<-bowlerCumulativeAvgEconRate(starc, "M A Starc")
p4<-bowlerCumulativeAvgEconRate(shakib, "Shakib Al Hasan")
p5<-bowlerCumulativeAvgEconRate(mendis, "Ajantha Mendis")
p6<-bowlerCumulativeAvgEconRate(steyn, "Dale Steyn")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

20. Bowler wicket plot

The plot below gives the average wickets versus number of overs.

p1<-bowlerWicketPlot(jadeja,"RA Jadeja")
p2<-bowlerWicketPlot(ashwin, "Ashwin")
p3<-bowlerWicketPlot(starc, "M A Starc")
p4<-bowlerWicketPlot(shakib, "Shakib Al Hasan")
p5<-bowlerWicketPlot(mendis, "Ajantha Mendis")
p6<-bowlerWicketPlot(steyn, "Dale Steyn")
grid.arrange(p1,p2,p3,p4,p5,p6, ncol=3)

21. Bowler wickets against opposition

# Jadeja's best performances are against England, Pakistan and West Indies
bowlerWicketsAgainstOpposition(jadeja,"RA Jadeja")

# Ashwin's best performances are against England, Pakistan and South Africa
bowlerWicketsAgainstOpposition(ashwin, "Ashwin")

#Starc has good performances against India, New Zealand, Pakistan, West Indies
bowlerWicketsAgainstOpposition(starc, "M A Starc")

bowlerWicketsAgainstOpposition(shakib,"Shakib Al Hasan")

bowlerWicketsAgainstOpposition(mendis, "Ajantha Mendis")

#Steyn has good performances against India, Sri Lanka, Pakistan, West Indies
bowlerWicketsAgainstOpposition(steyn, "Dale Steyn")

22. Bowler wickets at cricket grounds

bowlerWicketsVenue(jadeja,"RA Jadeja")

bowlerWicketsVenue(ashwin, "Ashwin")

bowlerWicketsVenue(starc, "M A Starc")
## Warning: Removed 2 rows containing missing values (geom_bar).

bowlerWicketsVenue(shakib,"Shakib Al Hasan")

bowlerWicketsVenue(mendis, "Ajantha Mendis")

bowlerWicketsVenue(steyn, "Dale Steyn")

23. Get delivery wickets for bowlers

This function creates a data frame of deliveries and the wickets taken.

setwd("C:/software/cricket-package/york-test/yorkrData/ODI/ODI-matches")
jadeja1 <- getDeliveryWickets(team="India",dir=".",name="Jadeja",save=FALSE)
ashwin1 <- getDeliveryWickets(team="India",dir=".",name="Ashwin",save=FALSE)
starc1 <- getDeliveryWickets(team="Australia",dir=".",name="MA Starc",save=FALSE)
shakib1 <- getDeliveryWickets(team="Bangladesh",dir=".",name="Shakib",save=FALSE)
mendis1 <- getDeliveryWickets(team="Sri Lanka",dir=".",name="Mendis",save=FALSE)
steyn1 <- getDeliveryWickets(team="South Africa",dir=".",name="Steyn",save=FALSE)
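Conceptually, a delivery-wickets table records, for each match, the delivery on which each successive wicket fell. Below is a base-R sketch. It assumes each match is a hypothetical logical vector marking the balls on which the bowler took a wicket; this is not getDeliveryWickets' actual input format, and the function name is illustrative.

```r
# For each match, find the delivery number of each wicket (1st, 2nd, ...)
deliveryWicketsSketch <- function(matches) {
  rows <- lapply(seq_along(matches), function(i) {
    d <- which(matches[[i]])                 # deliveries that took a wicket
    if (length(d) == 0) return(NULL)         # wicketless match: no rows
    data.frame(match = i, delivery = d, wicketNo = seq_along(d))
  })
  do.call(rbind, Filter(Negate(is.null), rows))
}

# Hypothetical matches: TRUE marks a wicket-taking delivery
matches <- list(
  c(FALSE, FALSE, TRUE, rep(FALSE, 10), TRUE),  # wickets on balls 3 and 14
  rep(FALSE, 6),                                 # no wickets
  c(rep(FALSE, 19), TRUE)                        # wicket on ball 20
)
dw <- deliveryWicketsSketch(matches)
dw
mean(dw$delivery[dw$wicketNo == 1])   # average balls to 1st wicket: 11.5
```

A table of this shape, delivery against wicket number, is the kind of input a deliveries-to-wickets model needs.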

24. Predict number of deliveries to wickets

# Jadeja and Ashwin need around 22 to 28 deliveries to make a breakthrough
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktsPredict(jadeja1,"RA Jadeja")
bowlerWktsPredict(ashwin1,"R Ashwin")

# Starc and Shakib provide an early breakthrough, producing a wicket in around 16 balls. Starc's 2nd wicket comes around the 30th delivery
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktsPredict(starc1,"MA Starc")
bowlerWktsPredict(shakib1,"Shakib Al Hasan")

#Steyn and Mendis take 20 deliveries to get their 1st wicket
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktsPredict(mendis1,"A Mendis")
bowlerWktsPredict(steyn1,"D Steyn")

Conclusion

This concludes the 4-part introduction to my new R cricket package yorkr for ODIs. I will be enhancing the package to handle Twenty20 and IPL matches soon. You can fork/clone the code from GitHub at yorkr.

The yaml data from Cricsheet have already been converted into R-consumable data frames. The converted data can be downloaded from GitHub at yorkrData. There are 3 folders: ODI matches, ODI matches between 2 teams (oppnAllMatches), and ODI matches between a team and the rest of the world (all matches, all oppositions).

As I have already mentioned, I have around 67 functions for analysis, but I am certain that the data has a lot more secrets waiting to be tapped. So please do go ahead and run any machine learning or statistical learning algorithms on it. If you do come up with interesting insights, I would appreciate it if you attribute the source to Cricsheet (http://cricsheet.org), my package yorkr and my blog Giga thoughts, besides dropping me a note.

Hope you have a great time with my yorkr package!

Important note: Do check out my other posts using yorkr at yorkr-posts

Also see

  1. Introducing cricketr! : An R package to analyze performances of cricketers
  2. Cricket analytics with cricketr in paperback and Kindle versions
  3. My TEDx talk on the “Internet of Things”
  4. Bend it like Bluemix,MongoDB with autoscaling – Part 1
  5. The mind of a programmer
  6. Fun simulation of a chain in Android
  7. Taking cricketr for a spin-Part 1
  8. Latency,throughput implications for the cloud
  9. Hand detection through haar-training: A hands-on approach
  10. Cricket analytics with cricketr

Introducing cricket package yorkr: Part 1- Beaten by sheer pace!

“We need to regard statistical intuition with proper suspicion and replace impression formation by computation wherever possible”

“We are pattern seekers, believers in a coherent world”

“The hot hand is entirely in the eyes of the beholders, who are consistently too quick to perceive order and causality in randomness. The hot hand is a massive and widespread cognitive illusion”

                   "Thinking, Fast and Slow - Daniel Kahneman"

Introduction

Yorker (noun): A yorker is a bowling delivery in cricket that pitches at or around the batsman’s toes. Also known as a ‘toe crusher’.

My package ‘yorkr’ is now available on CRAN. This package is based on data from Cricsheet, which has the data of ODI, Test, Twenty20 and IPL matches as yaml files. The yorkr package provides functions to convert the yaml files into entities that are more easily consumed in R, namely dataframes. In fact, all ODI matches have already been converted and are available for use at yorkrData. However, as future matches are added to Cricsheet, you will have to convert the match files yourself. More details below.


This post can be viewed at RPubs at yorkr-Part1 or can also be downloaded as a PDF document yorkr-1.pdf

Check out my interactive Shiny apps GooglyPlus2021 (interactive plots) and GooglyPlusPlus2021 (analysis in specific intervals), which can be used to analyze IPL players, teams and matches.

Important note: Do check out the Python avatar of cricketr, ‘cricpy’, in my post ‘Introducing cricpy: A python package to analyze performances of cricketers’

Important note 1: Do check out all the posts on the Python avatar of yorkr, namely ‘yorkpy’, in my post ‘Pitching yorkpy … short of good length to IPL – Part 1’

1. First things first

  1. yorkr currently has a total of 70 functions. I have intentionally avoided abbreviating function names by dropping vowels, as is the usual practice in coding, because the resulting abbreviated names would be very difficult to remember and use. So instead of naming a function tmBmenPrtshpOppnAllMtches(), I have used the longer form, e.g. teamBatsmenPartnershipOppnAllmatches(), which is much clearer and more intuitive. Moreover, RStudio's autocompletion lists the functions that share the same prefix, so one does not need to type in the entire function name.
  2. The package yorkr has 4 classes of functions:
  • Class 1- Team performances in a match
  • Class 2- Team performances in all matches against a single opposition (e.g. all matches of India vs Australia or all matches of England vs Pakistan etc.)
  • Class 3- Team performance in all matches against all oppositions (India vs All, Pakistan vs All etc.)
  • Class 4- Individual performances of batsmen and bowlers

In this post I will be looking into Class 1 functions, namely the performances of opposing teams in a single match.

The functions are:

  1. teamBattingScorecardMatch()
  2. teamBatsmenPartnershipMatch()
  3. teamBatsmenVsBowlersMatch()
  4. teamBowlingScorecardMatch()
  5. teamBowlingWicketKindMatch()
  6. teamBowlingWicketRunsMatch()
  7. teamBowlingWicketMatch()
  8. teamBowlersVsBatsmenMatch()
  9. matchWormGraph()

2. Install the package from CRAN

install.packages("yorkr")   # one-time install from CRAN
library(yorkr)
rm(list=ls())

3. Convert and save yaml file to dataframe

This function converts a yaml file in the format specified by Cricsheet into a dataframe, which is saved as an RData file in the target directory. The name of the file has the format team1-team2-date.RData, as seen below.

convertYaml2RDataframe("225171.yaml","./source","./data")
## [1] "./source/225171.yaml"
## [1] "first loop"
## [1] "second loop"
setwd("./data")
dir()
## [1] "Australia-India-2012-02-12.RData"      
## [2] "Bangladesh-Zimbabwe-2009-10-27.RData"  
## [3] "convertedFiles.txt"                    
## [4] "England-New Zealand-2007-01-30.RData"  
## [5] "Ireland-England-2006-06-13.RData"      
## [6] "Pakistan-South Africa-2013-11-08.RData"
## [7] "Sri Lanka-West Indies-2011-02-06.RData"
setwd("..")
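Since every converted file follows the team1-team2-date.RData naming shown above, the teams and date of a match can be recovered from the file name alone, for example to index a folder of converted matches. The helper parseMatchFile() below is a hypothetical sketch (it is not part of yorkr), assuming team names never contain a hyphen and dates are in YYYY-MM-DD form.

```r
# Hypothetical helper (not part of yorkr): split a converted match file name
# of the form team1-team2-date.RData into its parts.
# Assumes team names contain no hyphens and the date is YYYY-MM-DD.
parseMatchFile <- function(fileName) {
  pattern <- "^(.+)-(.+)-([0-9]{4}-[0-9]{2}-[0-9]{2})\\.RData$"
  parts <- regmatches(fileName, regexec(pattern, fileName))[[1]]
  if (length(parts) == 0) {
    stop("Not a converted match file: ", fileName)
  }
  # parts[1] is the full match; parts[2:4] are the captured groups
  list(team1 = parts[2], team2 = parts[3], date = as.Date(parts[4]))
}

# Index a directory listing, skipping non-match files such as convertedFiles.txt
files <- c("Australia-India-2012-02-12.RData",
           "convertedFiles.txt",
           "Sri Lanka-West Indies-2011-02-06.RData")
matchFiles <- files[grepl("\\.RData$", files)]
matches <- do.call(rbind, lapply(matchFiles, function(f) {
  p <- parseMatchFile(f)
  data.frame(team1 = p$team1, team2 = p$team2, date = p$date)
}))
print(matches)
```

With real data, files would come from dir("./data").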

4. Convert and save all yaml files to dataframes

This function converts all the yaml files in a source directory into dataframes and saves them in the target directory, with the names as mentioned above.

convertAllYaml2RDataframes("./source",targetDirMen=".",targetDirWomen=".")
## [1] 1
## i= 1   file= ./source/225171.yaml 
## [1] "first loop"
## [1] "second loop"
## [1] 633  25

5. yorkrData – A Github repository

Cricsheet has ODI matches from 2006 onwards. There are a total of 1167 ODI matches (files), of which 34 yaml files had format problems and were skipped. I have already converted the remaining 1133 yaml files in the ODI directory of Cricsheet to dataframes and saved them as RData, so they are available for use. All the converted RData files can be accessed from my Github link yorkrData under the folder ODI-matches. You will need to use the functions to convert new match files as they are added to Cricsheet. There is also a file named ‘convertedFiles’, which has the name of the original file and the converted file, as below:

convertedFiles

  • 225171.yaml:Ireland-England-2006-06-13.RData
  • 225245.yaml:England-Pakistan-2006-08-30.RData
  • 225246.yaml:England-Pakistan-2006-09-02.RData …
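Each entry pairs a Cricsheet yaml file with the RData file produced from it, so the list can be turned into a lookup table. The sketch below assumes the file holds one "yaml:RData" entry per line, as in the entries listed above; in practice the lines would come from readLines("convertedFiles.txt").

```r
# Sketch: turn "yaml:RData" entries into a lookup table.
# Assumes one entry per line; these three are the ones listed above.
entries <- c("225171.yaml:Ireland-England-2006-06-13.RData",
             "225245.yaml:England-Pakistan-2006-08-30.RData",
             "225246.yaml:England-Pakistan-2006-09-02.RData")
# Split each entry on its first ':' only
yamlFile <- sub(":.*$", "", entries)
rdataFile <- sub("^[^:]*:", "", entries)
lookup <- data.frame(yamlFile, rdataFile, stringsAsFactors = FALSE)
# Which RData file was produced from a given Cricsheet yaml file?
lookup$rdataFile[lookup$yamlFile == "225245.yaml"]
```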

You can download the zip of the files and use them directly in the functions as shown below.

Note 1: The package in its current form handles ODIs, T20s and IPL T20 matches.

Note 2: The link to the converted dataframes has been provided above. The dataframes are around 600 rows x 25 columns. In this post I have created 10 functions that analyze team performances in a match. However, you are free to slice and dice the dataframes in any way you like. If you do come up with interesting analyses, please do attribute the source of the data to Cricsheet, my package yorkr and my blog. I would appreciate it if you could send me a note.

6. Load the match data as dataframes

As mentioned above, in this post I will be using the functions from Class 1, with the match data from 5 random matches between 10 different teams/countries. For this I will directly use the converted RData files rather than getting the data through getMatchDetails().

With the RData files, the data can be loaded in 2 ways:

A. With getMatchDetails()

  1. With getMatchDetails(), using the 2 teams and the date on which the match occurred
aus_ind <- getMatchDetails("Australia","India","2012-02-12",dir="./data")

or

B. Directly load RData into your code.

The match details will be loaded into a dataframe called ‘overs’, which you can assign to a suitable name as below.

The randomly selected matches are

  • Australia vs India – 2012-02-12, Adelaide
  • England vs New Zealand – 2007-01-30, Perth
  • Pakistan vs South Africa – 2013-11-08, UAE
  • Sri Lanka vs West Indies – 2011-02-06, Colombo (SSC)
  • Bangladesh vs Zimbabwe – 2009-10-27, Dhaka

Directly load RData from file

load("./data/Australia-India-2012-02-12.RData")
aus_ind <- overs
load("./data/England-New Zealand-2007-01-30.RData")
eng_nz <- overs
load("./data/Pakistan-South Africa-2013-11-08.RData")
pak_sa <- overs
load("./data/Sri Lanka-West Indies-2011-02-06.RData")
sl_wi<- overs
load("./data/Bangladesh-Zimbabwe-2009-10-27.RData")
ban_zim <- overs
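Because load() restores the saved object under its original name, each call above overwrites overs, which is why it is copied to a new name straight away. A small wrapper can load a file into a private environment and return the dataframe directly. loadMatch() below is a hypothetical helper, not a yorkr function, and assumes the object saved in each file is named overs, as described above.

```r
# Hypothetical helper (not part of yorkr): load a converted match file into a
# private environment and return the saved object, leaving the workspace untouched.
loadMatch <- function(path, objectName = "overs") {
  env <- new.env()
  loaded <- load(path, envir = env)   # load() returns the names it restored
  if (!objectName %in% loaded) {
    stop("No object '", objectName, "' in ", path)
  }
  get(objectName, envir = env)
}

# Self-contained demo: save a tiny stand-in 'overs' dataframe, then reload it
overs <- data.frame(batsman = c("G Gambhir", "V Sehwag"), runs = c(92, 20))
tmp <- tempfile(fileext = ".RData")
save(overs, file = tmp)
rm(overs)
aus_ind_demo <- loadMatch(tmp)
print(aus_ind_demo)
```

With the real data this would read, for example, aus_ind <- loadMatch("./data/Australia-India-2012-02-12.RData").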

7. Team batting scorecard

Compute and display the batting scorecard of each team in the match. The top scorers are G Gambhir (Ind, 92), DJ Hussey (Aus, 72), Q de Kock (SA, 112) and KC Sangakkara (SL, 75).

teamBattingScorecardMatch(aus_ind,'India')
## Total= 258
## Source: local data frame [8 x 5]
## 
##     batsman ballsPlayed fours sixes  runs
##      (fctr)       (int) (dbl) (dbl) (dbl)
## 1 G Gambhir         110     7     0    92
## 2  V Sehwag          20     3     0    20
## 3   V Kohli          28     1     0    18
## 4 RG Sharma          41     1     1    33
## 5  SK Raina          30     3     1    38
## 6  MS Dhoni          57     0     1    44
## 7 RA Jadeja           8     0     0    12
## 8  R Ashwin           2     0     0     1
teamBattingScorecardMatch(aus_ind,'Australia')
## Total= 260
## Source: local data frame [9 x 5]
## 
##        batsman ballsPlayed fours sixes  runs
##         (fctr)       (int) (dbl) (dbl) (dbl)
## 1    DA Warner          23     2     0    18
## 2   RT Ponting          13     1     0     6
## 3    MJ Clarke          43     5     0    38
## 4   PJ Forrest          83     5     2    66
## 5    DJ Hussey          76     5     0    72
## 6 DT Christian          36     2     0    39
## 7      MS Wade          17     1     0    16
## 8    RJ Harris           2     0     0     2
## 9     CJ McKay           3     0     0     3
teamBattingScorecardMatch(pak_sa,'South Africa')
## Total= 256
## Source: local data frame [7 x 5]
## 
##          batsman ballsPlayed fours sixes  runs
##           (fctr)       (int) (dbl) (dbl) (dbl)
## 1      Q de Kock         132     9     1   112
## 2        HM Amla          50     6     0    46
## 3   F du Plessis          21     1     0    10
## 4 AB de Villiers          40     2     0    30
## 5      DA Miller           9     0     0     5
## 6      JP Duminy          20     1     1    25
## 7      R McLaren          21     3     1    28
teamBattingScorecardMatch(sl_wi,'Sri Lanka')
## Total= 261
## Source: local data frame [10 x 5]
## 
##             batsman ballsPlayed fours sixes  runs
##              (fctr)       (int) (dbl) (dbl) (dbl)
## 1       WU Tharanga          50     5     0    39
## 2        TM Dilshan          27     2     1    30
## 3     KC Sangakkara         103     4     1    75
## 4  DPMD Jayawardene          52     2     0    44
## 5     CK Kapugedera          17     0     0    17
## 6    TT Samaraweera           7     0     0     4
## 7       NLTC Perera           8     0     0     6
## 8        AD Mathews          22     1     1    36
## 9      HMRKB Herath           4     0     0     2
## 10       BAW Mendis           6     1     0     8
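The printed Total matches the sum of the batsmen's runs in each scorecard (for India, 92 + 20 + ... + 1 = 258), so it appears not to include extras. The sketch below re-enters the India scorecard from the output above by hand and adds each batsman's strike rate (runs per 100 balls), a column the scorecard itself does not show.

```r
# Sketch: India's batting scorecard, copied from the output above
india <- data.frame(
  batsman = c("G Gambhir", "V Sehwag", "V Kohli", "RG Sharma",
              "SK Raina", "MS Dhoni", "RA Jadeja", "R Ashwin"),
  ballsPlayed = c(110, 20, 28, 41, 30, 57, 8, 2),
  runs = c(92, 20, 18, 33, 38, 44, 12, 1),
  stringsAsFactors = FALSE
)
sum(india$runs)   # 258, the same as "Total= 258" above
# Strike rate = runs scored per 100 balls faced
india$strikeRate <- round(100 * india$runs / india$ballsPlayed, 2)
india[order(-india$strikeRate), c("batsman", "runs", "strikeRate")]
```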

8. Plot the team batting partnerships

The functions below plot the team batting partnerships in the match. Note: Many of the plot functions include an additional parameter, plot, which is either TRUE or FALSE. The default value is plot=TRUE. When plot=TRUE the plot is displayed; when plot=FALSE the data frame is returned to the user, who can use it to create an interactive chart with one of the packages like rCharts, ggvis, googleVis or plotly.

teamBatsmenPartnershipMatch(pak_sa,"Pakistan","South Africa")


teamBatsmenPartnershipMatch(eng_nz,"New Zealand","England",plot=TRUE)


teamBatsmenPartnershipMatch(ban_zim,"Bangladesh","Zimbabwe",plot=FALSE)
##              batsman        nonStriker runs
## 1        Tamim Iqbal   Junaid Siddique    0
## 2        Tamim Iqbal Mohammad Ashraful    5
## 3    Junaid Siddique       Tamim Iqbal    0
## 4  Mohammad Ashraful       Tamim Iqbal    0
## 5  Mohammad Ashraful     Raqibul Hasan   20
## 6      Raqibul Hasan Mohammad Ashraful   13
## 7      Raqibul Hasan   Shakib Al Hasan    3
## 8    Shakib Al Hasan     Raqibul Hasan   12
## 9    Shakib Al Hasan   Mushfiqur Rahim    1
## 10   Mushfiqur Rahim   Shakib Al Hasan    1
## 11   Mushfiqur Rahim       Naeem Islam   30
## 12   Mushfiqur Rahim      Abdur Razzak    6
## 13   Mushfiqur Rahim      Dolar Mahmud   11
## 14   Mushfiqur Rahim     Rubel Hossain    8
## 15       Mahmudullah   Mushfiqur Rahim    4
## 16       Naeem Islam   Mushfiqur Rahim   21
## 17      Abdur Razzak   Mushfiqur Rahim    3
## 18      Dolar Mahmud   Mushfiqur Rahim   41
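The plot=FALSE output above can be summarised further in base R. The sketch below copies those rows and computes (a) each batsman's runs across his partnerships and (b) partnership totals regardless of who was on strike. It assumes, as the column names suggest, that each row gives the runs the batsman scored while batting with that non-striker.

```r
# Sketch: Bangladesh partnership rows, copied from the plot=FALSE output above
partnerships <- data.frame(
  batsman = c("Tamim Iqbal", "Tamim Iqbal", "Junaid Siddique",
              "Mohammad Ashraful", "Mohammad Ashraful", "Raqibul Hasan",
              "Raqibul Hasan", "Shakib Al Hasan", "Shakib Al Hasan",
              "Mushfiqur Rahim", "Mushfiqur Rahim", "Mushfiqur Rahim",
              "Mushfiqur Rahim", "Mushfiqur Rahim", "Mahmudullah",
              "Naeem Islam", "Abdur Razzak", "Dolar Mahmud"),
  nonStriker = c("Junaid Siddique", "Mohammad Ashraful", "Tamim Iqbal",
                 "Tamim Iqbal", "Raqibul Hasan", "Mohammad Ashraful",
                 "Shakib Al Hasan", "Raqibul Hasan", "Mushfiqur Rahim",
                 "Shakib Al Hasan", "Naeem Islam", "Abdur Razzak",
                 "Dolar Mahmud", "Rubel Hossain", "Mushfiqur Rahim",
                 "Mushfiqur Rahim", "Mushfiqur Rahim", "Mushfiqur Rahim"),
  runs = c(0, 5, 0, 0, 20, 13, 3, 12, 1, 1, 30, 6, 11, 8, 4, 21, 3, 41),
  stringsAsFactors = FALSE
)
# (a) Runs each batsman made across all his partnerships
runsByBatsman <- aggregate(runs ~ batsman, data = partnerships, FUN = sum)
# (b) Order-independent key, so (A, B) and (B, A) count as one partnership
partnerships$pair <- apply(partnerships[, c("batsman", "nonStriker")], 1,
                           function(p) paste(sort(p), collapse = " & "))
pairTotals <- aggregate(runs ~ pair, data = partnerships, FUN = sum)
pairTotals <- pairTotals[order(-pairTotals$runs), ]
head(pairTotals, 3)
```

On these rows the largest stand is Dolar Mahmud & Mushfiqur Rahim (11 + 41 = 52 runs).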
teamBatsmenPartnershipMatch(aus_ind,"India","Australia", plot=TRUE)


9. Batsmen vs Bowler

The function below computes and plots the performances of the batsmen vs the bowlers. As before the plot parameter can be set to TRUE or FALSE. By default it is plot=TRUE

teamBatsmenVsBowlersMatch(pak_sa,'Pakistan',"South Africa", plot=TRUE)


teamBatsmenVsBowlersMatch(aus_ind,'Australia',"India",plot=TRUE)


teamBatsmenVsBowlersMatch(ban_zim,'Zimbabwe',"Bangladesh", plot=TRUE)


m <- teamBatsmenVsBowlersMatch(sl_wi,'West Indies',"Sri Lanka", plot=FALSE)
m
## Source: local data frame [35 x 3]
## Groups: batsman [?]
## 
##      batsman        bowler runsConceded
##       (fctr)        (fctr)        (dbl)
## 1   CH Gayle  CRD Fernando            0
## 2   DM Bravo  CRD Fernando           15
## 3   DM Bravo   NLTC Perera           21
## 4   DM Bravo    AD Mathews           10
## 5   DM Bravo    BAW Mendis           11
## 6   DM Bravo CK Kapugedera            1
## 7   DM Bravo    TM Dilshan            5
## 8   DM Bravo  HMRKB Herath           16
## 9  AB Barath   NLTC Perera            0
## 10 RR Sarwan  CRD Fernando            6
## ..       ...           ...          ...
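The long batsman/bowler/runsConceded format returned with plot=FALSE can be pivoted into a batsman x bowler matrix with xtabs(). The sketch below uses only the 10 rows printed above (the other 25 rows are truncated in the output), so its totals are partial.

```r
# Sketch: the first 10 rows of the West Indies batsmen-vs-bowlers output above
bvb <- data.frame(
  batsman = c("CH Gayle", rep("DM Bravo", 7), "AB Barath", "RR Sarwan"),
  bowler = c("CRD Fernando", "CRD Fernando", "NLTC Perera", "AD Mathews",
             "BAW Mendis", "CK Kapugedera", "TM Dilshan", "HMRKB Herath",
             "NLTC Perera", "CRD Fernando"),
  runsConceded = c(0, 15, 21, 10, 11, 1, 5, 16, 0, 6),
  stringsAsFactors = FALSE
)
# Batsman x bowler matrix of runs; pairs that never faced each other also show 0
runsMatrix <- xtabs(runsConceded ~ batsman + bowler, data = bvb)
print(runsMatrix)
# Row sums: runs each batsman took off these bowlers (partial, 10 rows only)
rowSums(runsMatrix)
```

One caveat of this layout: a 0 cell cannot distinguish "faced but scored nothing" (CH Gayle vs CRD Fernando) from "never faced".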

10. Bowling Scorecard

This function provides each bowler's performance in the match: the number of overs bowled, maidens, runs conceded and wickets taken.

teamBowlingScorecardMatch(eng_nz,'England')
## Source: local data frame [6 x 5]
## 
##           bowler overs maidens  runs wickets
##           (fctr) (int)   (int) (dbl)   (dbl)
## 1    LE Plunkett     9       0    54       3
## 2    CT Tremlett    10       0    72       1
## 3     A Flintoff    10       0    66       0
## 4     MS Panesar    10       2    35       2
## 5  JWM Dalrymple     5       0    43       0
## 6 PD Collingwood     6       0    36       1
teamBowlingScorecardMatch(eng_nz,'New Zealand')
## Source: local data frame [6 x 5]
## 
##         bowler overs maidens  runs wickets
##         (fctr) (int)   (int) (dbl)   (dbl)
## 1 JEC Franklin     8       1    45       1
## 2      SE Bond    10       0    58       1
## 3     JDP Oram     5       0    23       0
## 4     JS Patel    10       0    53       1
## 5   DL Vettori    10       0    40       3
## 6  CD McMillan     7       1    38       2
teamBowlingScorecardMatch(aus_ind,'Australia')
## Source: local data frame [6 x 5]
## 
##         bowler overs maidens  runs wickets
##         (fctr) (int)   (int) (dbl)   (dbl)
## 1    RJ Harris    10       0    57       1
## 2     MA Starc     8       0    49       0
## 3     CJ McKay    10       1    53       3
## 4 DT Christian    10       0    45       0
## 5    DJ Hussey     3       0    13       0
## 6   XJ Doherty     9       0    51       2
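The scorecard gives overs and runs but not the economy rate (runs conceded per over). The sketch below derives it for the England bowlers printed above. Overs in cricket notation count balls after the decimal point (8.3 means 8 overs and 3 balls), so the hypothetical helper oversToBalls() converts to balls first; the overs in this scorecard happen to be whole numbers.

```r
# Hypothetical helper: cricket overs notation (8.3 = 8 overs + 3 balls) to balls
oversToBalls <- function(overs) {
  floor(overs) * 6 + round((overs - floor(overs)) * 10)
}

# England bowling scorecard, copied from the output above
england <- data.frame(
  bowler = c("LE Plunkett", "CT Tremlett", "A Flintoff",
             "MS Panesar", "JWM Dalrymple", "PD Collingwood"),
  overs = c(9, 10, 10, 10, 5, 6),
  runs = c(54, 72, 66, 35, 43, 36),
  wickets = c(3, 1, 0, 2, 0, 1),
  stringsAsFactors = FALSE
)
# Economy rate = runs conceded per 6 balls
england$economy <- england$runs / (oversToBalls(england$overs) / 6)
england[order(england$economy), c("bowler", "overs", "runs", "economy")]
# Economy rate for the whole innings (306 runs in 50 overs)
sum(england$runs) / (sum(oversToBalls(england$overs)) / 6)
```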

11. Wicket Kind

The plots below show the kind of wickets taken by each bowler (caught, bowled, lbw etc.)

teamBowlingWicketKindMatch(aus_ind,"India","Australia")


teamBowlingWicketKindMatch(aus_ind,"Australia","India")


teamBowlingWicketKindMatch(pak_sa,"South Africa","Pakistan")


m <-teamBowlingWicketKindMatch(sl_wi,"Sri Lanka",plot=FALSE)
m
##           bowler wicketKind wicketPlayerOut runs
## 1   CRD Fernando     bowled        CH Gayle   45
## 2    NLTC Perera     caught       AB Barath   36
## 3   HMRKB Herath        lbw       RR Sarwan   54
## 4     BAW Mendis     caught   S Chanderpaul   46
## 5    NLTC Perera        lbw        DM Bravo   36
## 6    NLTC Perera     caught       DJG Sammy   36
## 7   CRD Fernando     caught        DJ Bravo   45
## 8     BAW Mendis     caught       NO Miller   46
## 9     BAW Mendis     caught        CS Baugh   46
## 10    BAW Mendis     caught         SJ Benn   46
## 11    AD Mathews   noWicket        noWicket   33
## 12 CK Kapugedera   noWicket        noWicket    7
## 13    TM Dilshan   noWicket        noWicket   25
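In the plot=FALSE output, bowlers who did not take a wicket appear with wicketKind "noWicket", so those rows should be dropped before counting. The sketch below copies the bowler and wicketKind columns printed above and tabulates how the West Indies wickets fell and how many each Sri Lankan bowler took.

```r
# Sketch: bowler and wicketKind columns, copied from the output above
wk <- data.frame(
  bowler = c("CRD Fernando", "NLTC Perera", "HMRKB Herath", "BAW Mendis",
             "NLTC Perera", "NLTC Perera", "CRD Fernando", "BAW Mendis",
             "BAW Mendis", "BAW Mendis", "AD Mathews", "CK Kapugedera",
             "TM Dilshan"),
  wicketKind = c("bowled", "caught", "lbw", "caught", "lbw", "caught",
                 "caught", "caught", "caught", "caught",
                 "noWicket", "noWicket", "noWicket"),
  stringsAsFactors = FALSE
)
# Drop the placeholder rows for wicketless bowlers
wickets <- wk[wk$wicketKind != "noWicket", ]
table(wickets$wicketKind)                          # how the wickets fell
sort(table(wickets$bowler), decreasing = TRUE)     # wickets per bowler
```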

12. Wicket vs Runs conceded

The plots below provide the wickets taken and the runs conceded by the bowler in the match

teamBowlingWicketRunsMatch(pak_sa,"Pakistan","South Africa")


teamBowlingWicketRunsMatch(aus_ind,"Australia","India")


m <-teamBowlingWicketRunsMatch(sl_wi,"West Indies","Sri Lanka", plot=FALSE)
m
## Source: local data frame [6 x 5]
## 
##      bowler overs maidens  runs wickets
##      (fctr) (int)   (int) (dbl)   (chr)
## 1 R Rampaul     5       0    44       1
## 2 DJG Sammy    10       1    61       1
## 3  DJ Bravo    10       0    58       3
## 4  CH Gayle    10       0    34       0
## 5   SJ Benn    10       1    38       4
## 6 NO Miller     5       0    35       0
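Note that the wickets column above is of type chr, so it has to be converted before doing arithmetic on it. (Had it been a factor, as.numeric() alone would return the level codes; as.numeric(as.character(x)) is the safe form.) The sketch below re-enters the West Indies figures and computes runs conceded per wicket, which is undefined for a bowler without a wicket.

```r
# Sketch: West Indies bowling figures, copied from the output above
wi <- data.frame(
  bowler = c("R Rampaul", "DJG Sammy", "DJ Bravo", "CH Gayle", "SJ Benn", "NO Miller"),
  runs = c(44, 61, 58, 34, 38, 35),
  wickets = c("1", "1", "3", "0", "4", "0"),   # character, as in the output
  stringsAsFactors = FALSE
)
wi$wickets <- as.numeric(wi$wickets)
# Runs conceded per wicket; NA for bowlers who did not take a wicket
wi$runsPerWicket <- ifelse(wi$wickets > 0, wi$runs / wi$wickets, NA)
wi[order(wi$runsPerWicket), ]   # NA rows are sorted last
```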

13. Wickets taken by bowler

The plots show the wickets taken by each bowler.

m <-teamBowlingWicketMatch(eng_nz,'England',"New Zealand", plot=FALSE)
m
##           bowler wicketKind wicketPlayerOut runs
## 1    LE Plunkett        lbw      SP Fleming   54
## 2    LE Plunkett     caught       PG Fulton   54
## 3 PD Collingwood     caught     LRPL Taylor   36
## 4     MS Panesar    stumped     CD McMillan   35
## 5    LE Plunkett     caught       L Vincent   54
## 6     MS Panesar     caught     BB McCullum   35
## 7    CT Tremlett     caught    JEC Franklin   72
## 8     A Flintoff   noWicket        noWicket   66
## 9  JWM Dalrymple   noWicket        noWicket   43
teamBowlingWicketMatch(sl_wi,"Sri Lanka","West Indies")


teamBowlingWicketMatch(eng_nz,"New Zealand","England")


14. Bowler vs Batsmen

The functions compute and display how the different bowlers of the country performed against the batting opposition.

teamBowlersVsBatsmenMatch(ban_zim,"Bangladesh","Zimbabwe")


teamBowlersVsBatsmenMatch(aus_ind,"India","Australia")


teamBowlersVsBatsmenMatch(eng_nz,"England","New Zealand")


m <- teamBowlersVsBatsmenMatch(pak_sa,"Pakistan",plot=FALSE)
m
## Source: local data frame [30 x 3]
## Groups: bowler [?]
## 
##            bowler        batsman runsConceded
##            (fctr)         (fctr)        (dbl)
## 1  Mohammad Irfan      Q de Kock           25
## 2  Mohammad Irfan        HM Amla           17
## 3  Mohammad Irfan   F du Plessis            0
## 4  Mohammad Irfan AB de Villiers            9
## 5   Sohail Tanvir      Q de Kock           11
## 6   Sohail Tanvir        HM Amla            6
## 7   Sohail Tanvir      JP Duminy            9
## 8   Sohail Tanvir      R McLaren           12
## 9     Junaid Khan      Q de Kock           24
## 10    Junaid Khan        HM Amla            6
## ..            ...            ...          ...

15. Match worm graph

The plots below provide the match worm graph for the matches

matchWormGraph(aus_ind,'Australia',"India")


matchWormGraph(sl_wi,'Sri Lanka',"West Indies")

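A worm graph plots each team's cumulative runs against overs, so the two innings can be compared at any stage of the match. The sketch below shows that computation with cumsum() on made-up per-over runs for two placeholder teams; it is not match data, and matchWormGraph() works from the full match dataframe, so its internals may differ.

```r
# Hypothetical runs scored in each of the first 6 overs by two teams
runsTeamA <- c(4, 6, 2, 10, 8, 5)
runsTeamB <- c(3, 9, 7, 1, 12, 4)
wormA <- cumsum(runsTeamA)   # cumulative runs after each over
wormB <- cumsum(runsTeamB)
overNo <- seq_along(runsTeamA)
# Draw only in an interactive session
if (interactive()) {
  plot(overNo, wormA, type = "l", col = "blue",
       ylim = range(0, wormA, wormB), xlab = "Overs", ylab = "Cumulative runs",
       main = "Worm graph (hypothetical data)")
  lines(overNo, wormB, col = "red")
  legend("topleft", legend = c("Team A", "Team B"),
         col = c("blue", "red"), lty = 1)
}
```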

Conclusion

This post covered all the yorkr functions that analyze a single match between 2 opposing countries. As mentioned above, the yaml match files have already been converted to dataframes and are available for download from Github. Go ahead and give it a try!

To be continued. Watch this space!


You may also like

Masters of Spin: Unraveling the web with R

Here is a look at some of the masters of spin bowling in cricket. Specifically, this post analyzes 3 giants of spin bowling in recent times, namely Shane Warne of Australia, Muthiah Muralitharan of Sri Lanka and our very own Anil Kumble of India. “Who is the best leggie?” has been a hot topic in cricket in recent years. As in my earlier post “Analyzing cricket’s batting legends: Through the mirage with R”, I was not interested in gross statistics like most wickets taken.

In this post I try to analyze how each bowler has performed over his entire Test career. All 3 bowlers have bowled in around 240 innings. All other things being equal, it does make sense to look a little deeper into what their performance numbers reveal about them. As in my earlier posts, the data has been taken from ESPN Cricinfo’s Statsguru.


I have chosen these 3 spinners for the following reasons:

Shane Warne: Clearly a deadly spinner who can turn the ball at absurd angles
Muthiah Muralitharan: While controversy dogged Muralitharan, he was virtually unplayable at many cricketing venues
Anil Kumble: A master spinner whose chess-like strategy usually outwitted the best of batsmen

According to my analysis below, the King of Spin is Muthiah Muralitharan. This is shown in the final charts, where the performances of the 3 bowlers are plotted on a single graph. Muralitharan is a much more lethal bowler: he takes 3 to 7 wickets in an innings more frequently than the other two. In addition, Muralitharan has the lowest mean economy rate amongst the 3 for wickets in the range 3 to 7. Feel free to add your own thoughts, comments and dissent.

The code for this implementation is available on GitHub at mastersOfSpin. Feel free to clone, fork or hack the code to your own needs. You should be able to use the code as-is on other bowlers with little or no modification.

So here goes

Wickets frequency percentage vs Wickets plot
For this plot I determine how frequently the bowler takes ‘n’ wickets in an innings and express this as a percentage of all the innings in his career. In R this is done as follows:

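# Create a table of wickets vs the frequency of the wickets
wktsDF <- as.data.frame(table(as.numeric(as.character(bowler$Wkts))))
colnames(wktsDF) <- c("Wickets", "Freq")
# Calculate wickets percentage
wktsDF$freqPercent <- wktsDF$Freq / sum(wktsDF$Freq) * 100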

and plot this as a graph.

This is shown for Warne below
1) Shane Warne –  Wickets Frequency percentage vs Wickets plot


Wickets – Mean Economy rate chart
This chart plots the mean economy rate for ‘n’ wickets for the bowler. For example, to do this for 3 wickets for Shane Warne, a list is created of the economy rates in every innings in which Warne took 3 wickets. The average of this list is then computed and stored against Warne’s 3 wickets. This is done for every wicket count in Warne’s career. The R snippet for this implementation is shown below:

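# bowler: Statsguru innings data with Wkts and Econ columns (column names assumed)
wkts <- as.numeric(as.character(bowler$Wkts))
econRate <- NULL
for (i in 0:max(wkts, na.rm = TRUE)) {
  # Create a vector of economy rates for innings with 'i' wickets
  b <- as.numeric(as.character(bowler$Econ[which(wkts == i)]))
  # Compute the mean economy rate for 'i' wickets
  econRate[i+1] <- mean(b)
  print(econRate[i+1])
}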
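Both computations above can also be written without an explicit loop, using table() for the wicket frequencies and tapply() for the mean economy rate per wicket count. The sketch below runs on a small hypothetical set of innings (not real data for any of the 3 bowlers) and assumes that non-numeric entries such as "-" should be dropped. Unlike the loop, tapply() only returns wicket counts the bowler actually achieved, rather than NaN for the missing ones.

```r
# Hypothetical innings-by-innings figures for one bowler (not real data)
bowler <- data.frame(Wkts = c("0", "3", "3", "1", "0", "5", "-"),
                     Econ = c("3.20", "2.50", "2.90", "3.00", "4.00", "2.10", "-"),
                     stringsAsFactors = FALSE)
# Non-numeric entries become NA and are dropped
wkts <- suppressWarnings(as.numeric(bowler$Wkts))
econ <- suppressWarnings(as.numeric(bowler$Econ))
keep <- !is.na(wkts) & !is.na(econ)
wkts <- wkts[keep]
econ <- econ[keep]
# Percentage of innings in which the bowler took 'n' wickets
freqPercent <- 100 * table(wkts) / length(wkts)
# Mean economy rate for each wicket count, in one call instead of a loop
meanEcon <- tapply(econ, wkts, mean)
print(round(freqPercent, 2))
print(meanEcon)
```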

Shane Warne –  Wickets vs Mean Economy rate
This plot for Shane Warne is shown below


The plots for M Muralitharan and Anil Kumble are included below.

2) M Muralitharan – Wickets Frequency percentage vs Wickets plot

M Muralitharan – Wickets vs Mean Economy rate


3) Anil Kumble – Wickets Frequency percentage vs Wickets plot

Anil Kumble – Wickets vs Mean Economy rate

Finally, the relative performance of the bowlers is shown by plotting the wicket frequencies, and the mean economy rate vs wickets, of all 3 bowlers on single charts.

This is shown below

Relative wicket percentages

Relative mean economy rate

As can be seen in the 2 charts above, M Muralitharan not only takes 3 to 7 wickets in an innings more frequently than the other two, he also has a much lower mean economy rate.

You can clone/fork the R code from GitHub at mastersOfSpin

Conclusion: The performance of Muthiah Muralitharan is clearly superior to that of both Shane Warne and Anil Kumble. In my opinion, the king of spin is M Muralitharan, followed by Shane Warne and finally Anil Kumble.

Feel free to dispute my claims. Comments and suggestions are more than welcome.

Also see

1. Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
2. Informed choices through Machine Learning-2: Pitting together Kumble, Kapil, Chandra
3. Analyzing cricket’s batting legends – Through the mirage with R

You may also like
1. A peek into literacy in India:Statistical learning with R
2. A crime map of India in R: Crimes against women
3.  What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
4.  Bend it like Bluemix, MongoDB with autoscaling – Part 2