yorkr pads up for the Twenty20s: Part 2-Head to head confrontation between teams

Alice:“How long is forever”? White Rabbit:“Sometimes, just one second.”

Alice :“Where should I go?” The Cheshire Cat: “That depends on where you want to end up.”

“I’m not strange, weird, off, nor crazy, my reality is just different from yours.”

        Alice through the looking glass - Lewis Carroll

Introduction

In this post, my R package ‘yorkr’ continues to bat in the Twenty20s. This post is a continuation of my earlier post – yorkr pads up for the Twenty20s: Part 1- Analyzing team's match performance. This post deals with Class 2 functions, namely the performances of a team in all T20 matches against a single opposition, e.g. all T20 matches of India-Australia, Pakistan-West Indies etc. You can clone/fork the code for my package yorkr from Github at yorkr

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of the performances of both batsmen and bowlers, besides evaluating team & match performances in Tests, ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the Kindle versions. The books can be accessed at Cricket analytics with cricketr and Beaten by sheer pace-Cricket analytics with yorkr. A must read for any cricket lover! Check it out!!

This post has also been published at RPubs yorkrT20-Part2 and can also be downloaded as a PDF document from yorkrT20-Part2.pdf

Check out my interactive Shiny apps GooglyPlus (plots & tables) and Googly (only plots) which can be used to analyze IPL players, teams and matches.

Note: To do similar analysis you can use my yorkrT20templates. See my post Analysis of International T20 matches with yorkr templates

Important note 1: Do check out all the posts on the Python avatar of yorkr, namely ‘yorkpy’, in my post ‘Pitching yorkpy … short of good length to IPL – Part 1’

The list of functions in Class 2 is

  1. teamBatsmenPartnershiOppnAllMatches()
  2. teamBatsmenPartnershipOppnAllMatchesChart()
  3. teamBatsmenVsBowlersOppnAllMatches()
  4. teamBattingScorecardOppnAllMatches()
  5. teamBowlingPerfOppnAllMatches()
  6. teamBowlersWicketsOppnAllMatches()
  7. teamBowlersVsBatsmenOppnAllMatches()
  8. teamBowlersWicketKindOppnAllMatches()
  9. teamBowlersWicketRunsOppnAllMatches()
  10. plotWinLossBetweenTeams()

1. Install the package from CRAN

# install.packages("yorkr")   # run once if the package is not yet installed
library(yorkr)
rm(list=ls())

2. Get data for all T20 matches between 2 teams

We can get all T20 matches between any 2 teams using the function below. The dir parameter should point to the folder which has the T20 RData files of the individual matches. This function creates a data frame of all the T20 matches and also saves the data frame as RData. The call below gets all matches between India and Australia.

setwd("C:/software/cricket-package/york-test/yorkrData/Twenty20/T20-matches")
matches <- getAllMatchesBetweenTeams("Australia","India",dir=".")
dim(matches)
## [1] 2829   25
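
The idea of picking out the matches between two teams and stacking them into one data frame can be sketched in base R. This is only an illustration on toy data, not yorkr's implementation: the three small data frames, the reduced set of columns and the helper isBetween() are all assumptions made for the example.

```r
# Toy per-match data frames standing in for the individual match RData files.
# The columns are a small illustrative subset (assumption), not the full 25.
match1 <- data.frame(team = c("India", "Australia"),
                     batsman = c("V Kohli", "SR Watson"),
                     runs = c(4, 6),
                     team1 = "India", team2 = "Australia",
                     stringsAsFactors = FALSE)
match2 <- data.frame(team = c("India", "Pakistan"),
                     batsman = c("RG Sharma", "Umar Akmal"),
                     runs = c(1, 2),
                     team1 = "India", team2 = "Pakistan",
                     stringsAsFactors = FALSE)
match3 <- data.frame(team = c("Australia", "India"),
                     batsman = c("AJ Finch", "MS Dhoni"),
                     runs = c(2, 1),
                     team1 = "Australia", team2 = "India",
                     stringsAsFactors = FALSE)

# Hypothetical helper: TRUE if the match was played by the two teams,
# regardless of the order in which they are listed
isBetween <- function(m, teamA, teamB) {
  setequal(c(m$team1[1], m$team2[1]), c(teamA, teamB))
}

allMatches <- list(match1, match2, match3)
selected <- Filter(function(m) isBetween(m, "Australia", "India"), allMatches)

# Stack the selected matches into a single data frame
combined <- do.call(rbind, selected)
dim(combined)
```

Here two of the three toy matches qualify, so the combined data frame has 4 rows and 5 columns.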

I have, however, already saved the Twenty20 matches for all possible combinations of opposing countries. The data for these matches for the individual teams/countries can be obtained from Github in the folder T20-allmatches-between-two-teams

3. Save data for all matches between all combination of 2 teams

This can be done locally using the function below. You could use this function to combine all Twenty20 matches between any 2 teams into a single dataframe and save it in the current folder. The current implementation expects that the RData files of individual matches are in the ../data folder. Since I have already converted these, I will not be running this again.

#saveAllMatchesBetweenTeams(dir=".",odir=".")
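
To get a feel for what "all combinations of 2 teams" means, the pairs can be enumerated with base R's combn(). The file-name pattern below is copied from the RData files loaded later in this post; the short team list and the assumption that each pair maps to exactly one file are mine, and the sketch says nothing about how saveAllMatchesBetweenTeams() works internally.

```r
# A short, illustrative list of teams (the real list is much longer)
teams <- c("India", "Australia", "England", "New Zealand")

# Every unordered pair of teams
pairs <- combn(teams, 2, simplify = FALSE)

# Build a file name per pair, following the pattern seen in this post,
# e.g. "India-Australia-allMatches.RData"
fileNames <- vapply(pairs,
                    function(p) paste0(p[1], "-", p[2], "-allMatches.RData"),
                    character(1))

length(fileNames)
fileNames[1]
```

With 4 teams there are 6 pairs, and the first file name is "India-Australia-allMatches.RData".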

4. Load data directly for all matches between 2 teams

As in my earlier post, I pick all Twenty20 matches between 2 random teams. I load the data directly from the stored RData files. When we load the RData file, a “matches” object will be created. This object can be stored for the appropriate teams as below.

# Load T20 matches between teams
setwd("C:/software/cricket-package/york-test/yorkrData/Twenty20/T20-allmatches-between-two-teams")
load("India-Australia-allMatches.RData")
aus_ind_matches <- matches
dim(aus_ind_matches)
## [1] 2829   25
load("England-New Zealand-allMatches.RData")
eng_nz_matches <- matches
dim(eng_nz_matches)
## [1] 2760   25
load("Pakistan-South Africa-allMatches.RData")
pak_sa_matches <- matches
dim(pak_sa_matches)
## [1] 2308   25
load("Sri Lanka-West Indies-allMatches.RData")
sl_wi_matches <- matches
dim(sl_wi_matches)
## [1] 1909   25
load("Bangladesh-Ireland-allMatches.RData")
ban_ire_matches <-matches
dim(ban_ire_matches)
## [1] 479  25
load("Scotland-Canada-allMatches.RData")
sco_can_matches <-matches
dim(sco_can_matches)
## [1] 250  25
load("Netherlands-Afghanistan-allMatches.RData")
nl_afg_matches <- matches
dim(nl_afg_matches)
## [1] 927  25
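
Since every load() overwrites the same “matches” object, a small wrapper that loads into a private environment can make the pattern above less error-prone. The helper loadMatches() is hypothetical (it is not part of yorkr), and the temporary file is only there so that the sketch runs on its own.

```r
# Hypothetical helper: load an RData file into its own environment and
# return the 'matches' object, leaving the global workspace untouched
loadMatches <- function(file) {
  env <- new.env()
  load(file, envir = env)
  get("matches", envir = env)
}

# Self-contained demonstration with a toy 'matches' object
matches <- data.frame(batsman = c("A", "B"), runs = c(1, 4))
tmp <- file.path(tempdir(), "TeamA-TeamB-allMatches.RData")
save(matches, file = tmp)
rm(matches)

toy_matches <- loadMatches(tmp)
dim(toy_matches)
```

With the real files this would read, for example, aus_ind_matches <- loadMatches("India-Australia-allMatches.RData").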

5. Team Batsmen partnership in Twenty20 (all matches with opposition)

This function will create a report of the batting partnerships of a team. The report can be brief or detailed depending on the parameter ‘report’. The top batsmen in India-Australia clashes are Shane Watson & AJ Finch for Australia and Virat Kohli & Yuvraj Singh for India.

m<- teamBatsmenPartnershiOppnAllMatches(aus_ind_matches,'Australia',report="summary")
m
## Source: local data frame [40 x 2]
## 
##         batsman totalRuns
##          (fctr)     (dbl)
## 1     SR Watson       284
## 2      AJ Finch       249
## 3     DA Warner       204
## 4       MS Wade       125
## 5     DJ Hussey       101
## 6     ML Hayden        79
## 7    RT Ponting        76
## 8     MJ Clarke        65
## 9     A Symonds        63
## 10 AC Gilchrist        59
## ..          ...       ...
m <-teamBatsmenPartnershiOppnAllMatches(aus_ind_matches,'India',report="summary")
m
## Source: local data frame [23 x 2]
## 
##         batsman totalRuns
##          (fctr)     (dbl)
## 1       V Kohli       319
## 2  Yuvraj Singh       262
## 3     RG Sharma       252
## 4      MS Dhoni       213
## 5     G Gambhir       198
## 6      SK Raina       160
## 7      S Dhawan       105
## 8    RV Uthappa        70
## 9     IK Pathan        57
## 10     V Sehwag        41
## ..          ...       ...
m <-teamBatsmenPartnershiOppnAllMatches(aus_ind_matches,'Australia',report="detailed")
m[1:30,]
##      batsman   nonStriker partnershipRuns totalRuns
## 1  SR Watson     AJ Finch              21       284
## 2  SR Watson   GJ Maxwell               3       284
## 3  SR Watson    DA Warner             127       284
## 4  SR Watson     SE Marsh              41       284
## 5  SR Watson      TM Head              63       284
## 6  SR Watson      CA Lynn              23       284
## 7  SR Watson   UT Khawaja               2       284
## 8  SR Watson  CT Bancroft               4       284
## 9   AJ Finch    BJ Haddin              15       249
## 10  AJ Finch NJ Maddinson              21       249
## 11  AJ Finch    SR Watson              25       249
## 12  AJ Finch   GJ Maxwell              12       249
## 13  AJ Finch MC Henriques              21       249
## 14  AJ Finch    DA Warner              44       249
## 15  AJ Finch    DJ Hussey              25       249
## 16  AJ Finch      MS Wade               1       249
## 17  AJ Finch     SE Marsh              66       249
## 18  AJ Finch    SPD Smith              16       249
## 19  AJ Finch      TM Head               0       249
## 20  AJ Finch      CA Lynn               3       249
## 21 DA Warner     AJ Finch              30       204
## 22 DA Warner    SR Watson             110       204
## 23 DA Warner   GJ Maxwell              11       204
## 24 DA Warner    DJ Hussey              22       204
## 25 DA Warner     CL White               6       204
## 26 DA Warner      MS Wade              25       204
## 27   MS Wade     AJ Finch               2       125
## 28   MS Wade  JP Faulkner               6       125
## 29   MS Wade    DA Warner              12       125
## 30   MS Wade    DJ Hussey              54       125
m <-teamBatsmenPartnershiOppnAllMatches(pak_sa_matches,'Pakistan',report="summary")
m
## Source: local data frame [24 x 2]
## 
##            batsman totalRuns
##             (fctr)     (dbl)
## 1       Umar Akmal       255
## 2  Mohammad Hafeez       205
## 3    Shahid Afridi       165
## 4    Ahmed Shehzad        85
## 5     Shoaib Malik        80
## 6    Nasir Jamshed        69
## 7    Misbah-ul-Haq        63
## 8     Kamran Akmal        62
## 9     Abdul Razzaq        62
## 10  Sohaib Maqsood        41
## ..             ...       ...
m <-teamBatsmenPartnershiOppnAllMatches(eng_nz_matches,'England',report="summary")
m
## Source: local data frame [35 x 2]
## 
##           batsman totalRuns
##            (fctr)     (dbl)
## 1       LJ Wright       273
## 2        AD Hales       194
## 3         MJ Lumb       188
## 4      EJG Morgan       152
## 5      JC Buttler       140
## 6    KP Pietersen       112
## 7         OA Shah        91
## 8  PD Collingwood        86
## 9         IR Bell        73
## 10        JE Root        68
## ..            ...       ...
m <-teamBatsmenPartnershiOppnAllMatches(sl_wi_matches,'Sri Lanka',report="summary")
m[1:20,]
## Source: local data frame [20 x 2]
## 
##             batsman totalRuns
##              (fctr)     (dbl)
## 1        TM Dilshan       334
## 2  DPMD Jayawardene       202
## 3     KC Sangakkara       135
## 4     ST Jayasuriya       111
## 5        AD Mathews        98
## 6       MDKJ Perera        78
## 7  DSNFG Jayasuriya        66
## 8   HDRL Thirimanne        48
## 9      LD Chandimal        41
## 10  KMDN Kulasekara        30
## 11        LPC Silva        18
## 12        J Mubarak        15
## 13  TAM Siriwardana        15
## 14    CK Kapugedera         8
## 15       SL Malinga         7
## 16       S Prasanna         6
## 17      BMAJ Mendis         3
## 18      NLTC Perera         3
## 19  SMSM Senanayake         3
## 20     PVD Chameera         3
m <- teamBatsmenPartnershiOppnAllMatches(ban_ire_matches,"Ireland",report="summary")
m
## Source: local data frame [11 x 2]
## 
##            batsman totalRuns
##             (fctr)     (dbl)
## 1        GC Wilson        51
## 2  WTS Porterfield        49
## 3       NJ O'Brien        48
## 4       KJ O'Brien        39
## 5        JF Mooney        18
## 6      MC Sorensen        12
## 7         EC Joyce        11
## 8      DT Johnston         7
## 9      PR Stirling         4
## 10         JP Bray         2
## 11       AR Cusack         1
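
The difference between the "summary" and "detailed" reports can be illustrated with a few lines of base R on toy ball-by-ball data. This is a sketch of the kind of aggregation involved, not the code inside yorkr; the six deliveries and their run values are invented for the example.

```r
# Toy ball-by-ball data (invented values)
balls <- data.frame(
  batsman    = c("SR Watson", "SR Watson", "SR Watson",
                 "AJ Finch", "AJ Finch", "DA Warner"),
  nonStriker = c("DA Warner", "DA Warner", "AJ Finch",
                 "SR Watson", "SR Watson", "SR Watson"),
  runs       = c(4, 6, 1, 2, 4, 1),
  stringsAsFactors = FALSE)

# "summary" style: total runs per batsman
totals <- aggregate(runs ~ batsman, data = balls, FUN = sum)
names(totals)[2] <- "totalRuns"
summaryReport <- totals[order(-totals$totalRuns), ]

# "detailed" style: runs per batsman/non-striker pair, plus the batsman's total
partnerships <- aggregate(runs ~ batsman + nonStriker, data = balls, FUN = sum)
names(partnerships)[3] <- "partnershipRuns"
detailedReport <- merge(partnerships, totals, by = "batsman")
detailedReport <- detailedReport[order(-detailedReport$totalRuns,
                                       -detailedReport$partnershipRuns), ]

summaryReport
detailedReport
```

In the toy data SR Watson has 11 runs in total, 10 of them with DA Warner at the other end.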

6. Team batsmen partnership chart in Twenty20 (all matches with opposition)

This is plotted graphically in the charts below. Kohli & Yuvraj top the list for India.

teamBatsmenPartnershipOppnAllMatchesChart(aus_ind_matches,"India","Australia")

teamBatsmenPartnership-1

teamBatsmenPartnershipOppnAllMatchesChart(pak_sa_matches,main="South Africa",opposition="Pakistan")

teamBatsmenPartnership-2

m<- teamBatsmenPartnershipOppnAllMatchesChart(eng_nz_matches,"New Zealand",opposition="England",plot=FALSE)
m[1:30,]
##          batsman    nonStriker runs
## 1  HD Rutherford    MJ Guptill   69
## 2  HD Rutherford   BB McCullum   61
## 3    BB McCullum    MJ Guptill   53
## 4     MJ Guptill HD Rutherford   52
## 5    BB McCullum KS Williamson   51
## 6    BB McCullum HD Rutherford   49
## 7    LRPL Taylor   BB McCullum   49
## 8    BB McCullum   LRPL Taylor   46
## 9     MJ Guptill   BB McCullum   41
## 10     SB Styris   CD McMillan   40
## 11   CD McMillan      JDP Oram   38
## 12  JEC Franklin   LRPL Taylor   33
## 13   LRPL Taylor KS Williamson   32
## 14 KS Williamson   LRPL Taylor   32
## 15     SB Styris   LRPL Taylor   31
## 16   LRPL Taylor     SB Styris   30
## 17   BB McCullum      JD Ryder   29
## 18      JDP Oram      JS Patel   28
## 19      JD Ryder   BB McCullum   27
## 20   BB McCullum  JEC Franklin   26
## 21      DR Flynn     SB Styris   22
## 22    TWM Latham   LRPL Taylor   22
## 23 KS Williamson    MJ Santner   21
## 24  JEC Franklin   NL McCullum   21
## 25       C Munro    MJ Guptill   21
## 26   LRPL Taylor        JM How   19
## 27   LRPL Taylor    MJ Guptill   19
## 28   CD McMillan     SB Styris   19
## 29    MJ Guptill  JEC Franklin   19
## 30   BB McCullum     SB Styris   18
teamBatsmenPartnershipOppnAllMatchesChart(sl_wi_matches,"Sri Lanka","West Indies")

teamBatsmenPartnership-3

teamBatsmenPartnershipOppnAllMatchesChart(ban_ire_matches,"Bangladesh","Ireland")

teamBatsmenPartnership-4

7. Team batsmen versus bowler in Twenty20 (all matches with opposition)

The plots below provide information on how each of the top batsmen fared against the opposition bowlers

teamBatsmenVsBowlersOppnAllMatches(aus_ind_matches,"India","Australia")

batsmenvsBowler-1

teamBatsmenVsBowlersOppnAllMatches(pak_sa_matches,"South Africa","Pakistan",top=3)

batsmenvsBowler-2

m <- teamBatsmenVsBowlersOppnAllMatches(eng_nz_matches,"England","New Zealand",top=10,plot=FALSE)
m
## Source: local data frame [113 x 3]
## Groups: batsman [1]
## 
##      batsman       bowler  runs
##       (fctr)       (fctr) (dbl)
## 1  LJ Wright      SE Bond     1
## 2  LJ Wright MR Gillespie    17
## 3  LJ Wright     JDP Oram     4
## 4  LJ Wright    CS Martin    19
## 5  LJ Wright   DL Vettori    18
## 6  LJ Wright    SB Styris    14
## 7  LJ Wright     KD Mills    23
## 8  LJ Wright     MJ Mason     4
## 9  LJ Wright  NL McCullum    42
## 10 LJ Wright    IG Butler    15
## ..       ...          ...   ...
teamBatsmenVsBowlersOppnAllMatches(sl_wi_matches,"Sri Lanka","West Indies")

batsmenvsBowler-3

teamBatsmenVsBowlersOppnAllMatches(ban_ire_matches,"Bangladesh","Ireland")

batsmenvsBowler-4
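
The role of the ‘top’ parameter can be pictured as a two-step aggregation: first pick the highest-scoring batsmen, then break their runs down by bowler. The sketch below does this in base R on invented deliveries; it is my reading of the output above and not necessarily how yorkr computes it.

```r
# Toy ball-by-ball data (invented values)
balls <- data.frame(
  batsman = c("LJ Wright", "LJ Wright", "LJ Wright",
              "AD Hales", "AD Hales", "JE Root"),
  bowler  = c("KD Mills", "NL McCullum", "KD Mills",
              "KD Mills", "NL McCullum", "KD Mills"),
  runs    = c(4, 6, 2, 1, 4, 1),
  stringsAsFactors = FALSE)

top <- 2

# Step 1: the 'top' batsmen by total runs
totals <- aggregate(runs ~ batsman, data = balls, FUN = sum)
topBatsmen <- head(totals[order(-totals$runs), "batsman"], top)

# Step 2: runs scored by each of those batsmen against each bowler
vsBowlers <- aggregate(runs ~ batsman + bowler,
                       data = balls[balls$batsman %in% topBatsmen, ],
                       FUN = sum)
vsBowlers
```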

8. Team batting scorecard in Twenty20 (all matches with opposition)

The following tables give the overall performances of the country's batsmen against the opposition. For India-Australia matches Virat Kohli, Yuvraj Singh and Rohit Sharma lead the way for India. For Australia it is Shane Watson, AJ Finch and DA Warner. In South Africa-Pakistan matches JP Duminy and Q de Kock top the list for South Africa.

a <-teamBattingScorecardOppnAllMatches(aus_ind_matches,main="India",opposition="Australia")
## Total= 1787
a
## Source: local data frame [23 x 5]
## 
##         batsman ballsPlayed fours sixes  runs
##          (fctr)       (int) (int) (int) (dbl)
## 1       V Kohli         225    27     7   319
## 2  Yuvraj Singh         151    21    18   262
## 3     RG Sharma         175    20    12   252
## 4      MS Dhoni         189    15     7   213
## 5     G Gambhir         174    25     1   198
## 6      SK Raina         117    17     3   160
## 7      S Dhawan          65    12     3   105
## 8    RV Uthappa          54     7     3    70
## 9     IK Pathan          58     2     1    57
## 10     V Sehwag          38     5     1    41
## ..          ...         ...   ...   ...   ...
teamBattingScorecardOppnAllMatches(aus_ind_matches,"Australia","India")
## Total= 1767
## Source: local data frame [40 x 5]
## 
##         batsman ballsPlayed fours sixes  runs
##          (fctr)       (int) (int) (int) (dbl)
## 1     SR Watson         173    16    20   284
## 2      AJ Finch         164    33     5   249
## 3     DA Warner         134    14    14   204
## 4       MS Wade          93     6     5   125
## 5     DJ Hussey          81     5     6   101
## 6     ML Hayden          63     5     6    79
## 7    RT Ponting          52    13    NA    76
## 8     MJ Clarke          54     3     1    65
## 9     A Symonds          43     4     2    63
## 10 AC Gilchrist          38     7     3    59
## ..          ...         ...   ...   ...   ...
teamBattingScorecardOppnAllMatches(pak_sa_matches,"South Africa","Pakistan")
## Total= 1265
## Source: local data frame [27 x 5]
## 
##           batsman ballsPlayed fours sixes  runs
##            (fctr)       (int) (int) (int) (dbl)
## 1       JP Duminy         178    14     7   214
## 2       Q de Kock         110    21     2   147
## 3         HM Amla         114    17     2   146
## 4  AB de Villiers         116    10     5   144
## 5    F du Plessis         121     6     4   129
## 6       JH Kallis          92     9     2    98
## 7       CA Ingram          55     8     3    77
## 8        GC Smith          78     9    NA    74
## 9       DA Miller          54     7     2    73
## 10  RK Kleinveldt           7     1     3    22
## ..            ...         ...   ...   ...   ...
teamBattingScorecardOppnAllMatches(sl_wi_matches,"West Indies","Sri Lanka")
## Total= 1017
## Source: local data frame [20 x 5]
## 
##          batsman ballsPlayed fours sixes  runs
##           (fctr)       (int) (int) (int) (dbl)
## 1       DJ Bravo         173    17     9   218
## 2     MN Samuels         132     9     8   157
## 3   ADS Fletcher          74    10     7   109
## 4       CH Gayle          91     9     2    76
## 5     KA Pollard          61     6     2    65
## 6      RR Sarwan          66     2    NA    61
## 7       D Ramdin          30     3     2    47
## 8      J Charles          51     3     3    46
## 9      DJG Sammy          34     4    NA    45
## 10    AD Russell          32    NA     4    44
## 11   LMP Simmons          29     5    NA    33
## 12     JE Taylor          23     2    NA    24
## 13     SP Narine          15     2     1    23
## 14 S Chanderpaul          28     1     1    19
## 15      DR Smith          14     1     1    17
## 16   XM Marshall          12     2    NA    14
## 17       SJ Benn           8     1    NA     6
## 18      D Bishoo           5     1    NA     6
## 19      WW Hinds           7     1    NA     5
## 20     JO Holder           4    NA    NA     2
teamBattingScorecardOppnAllMatches(eng_nz_matches,"England","New Zealand")
## Total= 1943
## Source: local data frame [35 x 5]
## 
##           batsman ballsPlayed fours sixes  runs
##            (fctr)       (int) (int) (int) (dbl)
## 1       LJ Wright         167    28    12   273
## 2        AD Hales         125    22     7   194
## 3         MJ Lumb         129    15    11   188
## 4      EJG Morgan         141    12     5   152
## 5      JC Buttler          83    16     5   140
## 6    KP Pietersen          83    13     2   112
## 7         OA Shah          68     6     4    91
## 8  PD Collingwood          61     6     4    86
## 9         IR Bell          60    11     1    73
## 10        JE Root          45     8     1    68
## ..            ...         ...   ...   ...   ...
teamBatsmenPartnershiOppnAllMatches(sco_can_matches,"Scotland","Canada")
## Source: local data frame [8 x 2]
## 
##         batsman totalRuns
##          (fctr)     (dbl)
## 1 RD Berrington        47
## 2    KJ Coetzer        22
## 3    JH Stander        21
## 4      DF Watts        18
## 5   R Flannigan        15
## 6    CS MacLeod         2
## 7        RM Haq         2
## 8    PL Mommsen         0
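
A scorecard of this shape (balls played, fours, sixes, runs) can be derived from ball-by-ball data with a split-and-summarise step. The following is a simplified base R sketch on invented deliveries: it counts every delivery as a ball played and ignores extras, which a real scorecard would have to handle. Note that this sketch reports 0 where a batsman hit no fours or sixes; the NA entries in the tables above appear to mark the same situation, though that is my interpretation.

```r
# Toy ball-by-ball data (invented values)
balls <- data.frame(
  batsman = c("V Kohli", "V Kohli", "V Kohli", "Yuvraj Singh", "Yuvraj Singh"),
  runs    = c(4, 1, 6, 6, 0),
  stringsAsFactors = FALSE)

# One summary row per batsman
scorecard <- do.call(rbind, lapply(split(balls, balls$batsman), function(d) {
  data.frame(batsman     = d$batsman[1],
             ballsPlayed = nrow(d),
             fours       = sum(d$runs == 4),
             sixes       = sum(d$runs == 6),
             runs        = sum(d$runs),
             stringsAsFactors = FALSE)
}))

# Highest scorer first
scorecard <- scorecard[order(-scorecard$runs), ]
rownames(scorecard) <- NULL
scorecard
```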

9. Team performances of bowlers (all matches with opposition)

Like the function above, the following tables provide the top bowlers of the countries in the matches against the opposition. In India-Australia matches RA Jadeja leads for India, in Pakistan-South Africa matches Saeed Ajmal tops for Pakistan, and so on.

teamBowlingPerfOppnAllMatches(aus_ind_matches,"India","Australia")
## Source: local data frame [26 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1        RA Jadeja    13       0   219       8
## 2         R Ashwin    12       0   232       7
## 3        JJ Bumrah     5       0   103       6
## 4    R Vinay Kumar     6       0    79       6
## 5         R Sharma     5       0    56       5
## 6          A Nehra     9       0   127       4
## 7     Yuvraj Singh     5       0    72       4
## 8          B Kumar     5       0    42       4
## 9        IK Pathan     5       0   115       3
## 10 Harbhajan Singh     9       1    83       3
## ..             ...   ...     ...   ...     ...
teamBowlingPerfOppnAllMatches(pak_sa_matches,main="Pakistan",opposition="South Africa")
## Source: local data frame [17 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1      Saeed Ajmal     8       1   202      10
## 2  Mohammad Hafeez    10       0   178       9
## 3    Shahid Afridi    11       0   200       6
## 4         Umar Gul     3       0    93       6
## 5    Sohail Tanvir     6       0   103       3
## 6      Junaid Khan     4       0    75       3
## 7    Shoaib Akhtar     1       0    65       3
## 8    Mohammad Amir     1       0    63       2
## 9   Bilawal Bhatti     5       0    54       2
## 10    Abdur Rehman     1       0    53       2
## 11    Yasir Arafat     3       0    25       2
## 12    Abdul Razzaq     2       0    69       1
## 13  Mohammad Irfan     3       0    46       1
## 14       Anwar Ali     2       0    22       0
## 15    Shoaib Malik     3       0    17       0
## 16      Fawad Alam     1       0    15       0
## 17      Raza Hasan     3       1    12       0
teamBowlingPerfOppnAllMatches(eng_nz_matches,"New Zealand","England")
## Source: local data frame [26 x 5]
## 
##            bowler overs maidens  runs wickets
##            (fctr) (int)   (int) (dbl)   (dbl)
## 1        KD Mills     8       0   199       5
## 2  MJ McClenaghan    10       0   189       5
## 3      TG Southee    13       0   183       5
## 4      DL Vettori     1       0    91       5
## 5    JEC Franklin     2       0    53       5
## 6     NL McCullum     9       0   281       4
## 7       CS Martin     6       0   116       4
## 8         SE Bond     1       0    49       4
## 9       IG Butler     1       0    95       3
## 10      SB Styris     4       0    80       3
## ..            ...   ...     ...   ...     ...
teamBowlingPerfOppnAllMatches(sl_wi_matches,"Sri Lanka","West Indies")
## Source: local data frame [16 x 5]
## 
##              bowler overs maidens  runs wickets
##              (fctr) (int)   (int) (dbl)   (dbl)
## 1        BAW Mendis     8       1    82      10
## 2        SL Malinga     7       0   217       9
## 3        AD Mathews     7       0    87       6
## 4   TAM Siriwardana     4       0    58       5
## 5   SMSM Senanayake     4       0    90       4
## 6    M Muralitharan     1       0    76       4
## 7   KMDN Kulasekara     7       0   158       2
## 8      PVD Chameera     4       0    66       2
## 9           I Udana     1       0    56       1
## 10 DSNFG Jayasuriya     4       0    38       1
## 11      BMAJ Mendis     2       0    32       1
## 12      A Dananjaya     3       0    16       1
## 13       S Prasanna     1       0    15       1
## 14     HMRKB Herath     3       0    43       0
## 15    ST Jayasuriya     1       0    34       0
## 16      NLTC Perera     1       0    13       0
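
The runs and wickets columns of such a table can be sketched as a per-bowler aggregation. The example below uses invented deliveries and follows the usual scoring convention that a run out is not credited to the bowler; whether yorkr applies the same rule is not stated in this post, and the "not-out" label for deliveries without a wicket is an assumption about the data.

```r
# Toy ball-by-ball data (invented values)
balls <- data.frame(
  bowler     = c("RA Jadeja", "RA Jadeja", "RA Jadeja", "R Ashwin", "R Ashwin"),
  runs       = c(1, 0, 4, 6, 0),
  wicketKind = c("not-out", "caught", "not-out", "not-out", "run out"),
  stringsAsFactors = FALSE)

# Dismissals credited to the bowler (assumed convention)
creditedKinds <- c("caught", "bowled", "lbw", "stumped",
                   "caught and bowled", "hit wicket")
balls$wicket <- as.integer(balls$wicketKind %in% creditedKinds)

# Runs conceded and wickets taken per bowler
perf <- aggregate(cbind(runs, wicket) ~ bowler, data = balls, FUN = sum)
names(perf)[3] <- "wickets"

# Most wickets first, fewer runs conceded breaking ties
perf <- perf[order(-perf$wickets, perf$runs), ]
perf
```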

10. Team bowler’s wickets in Twenty20 (all matches with opposition)

This provides a graphical plot of the tables above

teamBowlersWicketsOppnAllMatches(aus_ind_matches,"India","Australia")

bowlerWicketsOppn-1

teamBowlersWicketsOppnAllMatches(aus_ind_matches,"Australia","India")

bowlerWicketsOppn-2

teamBowlersWicketsOppnAllMatches(pak_sa_matches,"South Africa","Pakistan",top=10)

bowlerWicketsOppn-3

m <-teamBowlersWicketsOppnAllMatches(eng_nz_matches,"England","New Zealand",plot=FALSE)
m
## Source: local data frame [20 x 2]
## 
##            bowler wickets
##            (fctr)   (int)
## 1       SCJ Broad      12
## 2     JM Anderson       7
## 3     JW Dernbach       7
## 4        GP Swann       6
## 5       LJ Wright       5
## 6   RJ Sidebottom       4
## 7         ST Finn       4
## 8         MA Wood       4
## 9  AD Mascarenhas       3
## 10 PD Collingwood       3
## 11      DJ Willey       3
## 12       DL Maddy       2
## 13     TT Bresnan       2
## 14      BA Stokes       2
## 15    JC Tredwell       2
## 16     A Flintoff       1
## 17      DR Briggs       1
## 18      WB Rankin       1
## 19      AU Rashid       1
## 20        JE Root       1
teamBowlersWicketsOppnAllMatches(ban_ire_matches,"Bangladesh","Ireland",top=3)

bowlerWicketsOppn-4

11. Team bowler vs batsmen in Twenty20(all matches with opposition)

These plots show how the bowlers fared against the batsmen. They show which of the opposing team's batsmen were able to score the most runs.

teamBowlersVsBatsmenOppnAllMatches(aus_ind_matches,'India',"Australia",top=5)

bowlerVsBatsmen-1

teamBowlersVsBatsmenOppnAllMatches(pak_sa_matches,"Pakistan","South Africa",top=3)

bowlerVsBatsmen-2

teamBowlersVsBatsmenOppnAllMatches(eng_nz_matches,"England","New Zealand")

bowlerVsBatsmen-3

teamBowlersVsBatsmenOppnAllMatches(eng_nz_matches,"New Zealand","England")

bowlerVsBatsmen-4

12. Team bowler’s wicket kind in Twenty20(caught,bowled,etc) (all matches with opposition)

The charts below show the kind of wickets taken by each bowler (caught, bowled, lbw etc)

teamBowlersWicketKindOppnAllMatches(aus_ind_matches,"India","Australia",plot=TRUE)

bowlerWickets-1

m <- teamBowlersWicketKindOppnAllMatches(aus_ind_matches,"Australia","India",plot=FALSE)
m[1:30,]
##             bowler wicketKind wicketPlayerOut runs
## 1            B Lee     caught        V Sehwag  133
## 2        MJ Clarke     caught      RV Uthappa   27
## 3    BW Hilfenhaus     caught       G Gambhir   28
## 4         CJ McKay     caught       RG Sharma   75
## 5  NM Coulter-Nile     caught        SK Raina   44
## 6       XJ Doherty    stumped        S Dhawan   76
## 7         CJ McKay     caught         V Kohli   75
## 8       MG Johnson     caught        V Sehwag   54
## 9       MG Johnson     caught       G Gambhir   54
## 10      MG Johnson    run out      RV Uthappa   54
## 11       MJ Clarke     caught    Yuvraj Singh   27
## 12      MG Johnson    run out        MS Dhoni   54
## 13           B Lee    run out        V Sehwag  133
## 14      NW Bracken     caught       G Gambhir   68
## 15           B Lee     bowled      KD Karthik  133
## 16      NW Bracken     caught      RV Uthappa   68
## 17        JR Hopes     bowled       RG Sharma   10
## 18       DJ Hussey     caught        MS Dhoni   24
## 19       AA Noffke     caught         P Kumar   23
## 20        AC Voges     caught Harbhajan Singh    5
## 21        AC Voges     caught     S Sreesanth    5
## 22      NW Bracken     caught       IK Pathan   68
## 23       DP Nannes     caught         M Vijay   25
## 24       DP Nannes     caught       G Gambhir   25
## 25         SW Tait     caught        SK Raina  112
## 26       DP Nannes     bowled    Yuvraj Singh   25
## 27       SPD Smith     caught        MS Dhoni   34
## 28      MG Johnson     caught       YK Pathan   54
## 29       SR Watson    run out       RA Jadeja  201
## 30       SR Watson     caught Harbhajan Singh  201
teamBowlersWicketKindOppnAllMatches(sl_wi_matches,"Sri Lanka",'West Indies',plot=TRUE)

bowlerWickets-2
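
When plot=FALSE returns the raw rows, a cross-tabulation gives the counts behind the chart. The sketch below re-types the first ten rows of the Australia-India table shown above into a small data frame and counts the wicket kinds per bowler with base R's table(); it is an illustration and not part of yorkr.

```r
# First ten rows of the Australia-India wicket-kind output above
wk <- data.frame(
  bowler = c("B Lee", "MJ Clarke", "BW Hilfenhaus", "CJ McKay",
             "NM Coulter-Nile", "XJ Doherty", "CJ McKay",
             "MG Johnson", "MG Johnson", "MG Johnson"),
  wicketKind = c("caught", "caught", "caught", "caught",
                 "caught", "stumped", "caught",
                 "caught", "caught", "run out"),
  stringsAsFactors = FALSE)

# Bowler x wicket-kind counts
counts <- table(wk$bowler, wk$wicketKind)
counts
```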

13. Team bowler’s wickets taken and runs conceded in Twenty20 (all matches with opposition)

teamBowlersWicketRunsOppnAllMatches(aus_ind_matches,"India","Australia")

wicketRuns-1

m <-teamBowlersWicketRunsOppnAllMatches(pak_sa_matches,"Pakistan","South Africa",plot=FALSE)
m[1:30,]
## Source: local data frame [30 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1     Abdul Razzaq     2       0    69       1
## 2    Mohammad Amir     1       0    63       2
## 3    Shahid Afridi    11       0   200       6
## 4      Saeed Ajmal     8       1   202      10
## 5     Shoaib Malik     3       0    17       0
## 6         Umar Gul     3       0    93       6
## 7       Fawad Alam     1       0    15       0
## 8     Abdur Rehman     1       0    53       2
## 9  Mohammad Hafeez    10       0   178       9
## 10   Shoaib Akhtar     1       0    65       3
## ..             ...   ...     ...   ...     ...
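
One simple way to relate the two quantities is runs conceded per wicket. The sketch below re-types the ten Pakistan rows printed above and computes that ratio in base R; the runsPerWicket column is my own addition for illustration and is not produced by yorkr. Bowlers without a wicket get NA rather than a division by zero.

```r
# The ten rows of the Pakistan-South Africa output above
bowling <- data.frame(
  bowler  = c("Abdul Razzaq", "Mohammad Amir", "Shahid Afridi", "Saeed Ajmal",
              "Shoaib Malik", "Umar Gul", "Fawad Alam", "Abdur Rehman",
              "Mohammad Hafeez", "Shoaib Akhtar"),
  runs    = c(69, 63, 200, 202, 17, 93, 15, 53, 178, 65),
  wickets = c(1, 2, 6, 10, 0, 6, 0, 2, 9, 3),
  stringsAsFactors = FALSE)

# Runs conceded per wicket; NA when no wicket was taken
bowling$runsPerWicket <- ifelse(bowling$wickets > 0,
                                round(bowling$runs / bowling$wickets, 1),
                                NA)

# Lowest (best) ratio first; NA rows sort to the end
ranked <- bowling[order(bowling$runsPerWicket), ]
head(ranked, 3)
```

On these figures Umar Gul (15.5), Mohammad Hafeez (19.8) and Saeed Ajmal (20.2) come out on top.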

14. Plot of wins vs losses between teams in Twenty20.

setwd("C:/software/cricket-package/york-test/yorkrData/Twenty20/T20-matches")
plotWinLossBetweenTeams("India","Sri Lanka")

winsLosses-1

plotWinLossBetweenTeams('Pakistan',"South Africa",".")

winsLosses-2

plotWinLossBetweenTeams('England',"New Zealand",".")

winsLosses-3

plotWinLossBetweenTeams("Australia","West Indies",".")

winsLosses-4

plotWinLossBetweenTeams('Bangladesh',"Zimbabwe",".")

winsLosses-5

plotWinLossBetweenTeams('Scotland',"Ireland",".")

winsLosses-6
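
The counts behind a win/loss plot can be sketched as a tabulation of the winner of each match. The five results below are invented, and the "NoResult" label for abandoned matches is a hypothetical placeholder, not necessarily the value used in the yorkr data.

```r
# One row per match (invented results)
results <- data.frame(
  team1  = rep("India", 5),
  team2  = rep("Sri Lanka", 5),
  winner = c("India", "Sri Lanka", "India", "NoResult", "India"),
  stringsAsFactors = FALSE)

# Number of matches per outcome
winLoss <- table(results$winner)
winLoss

# barplot(winLoss) would draw these counts as a simple bar chart
```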

Conclusion

This post included all functions for all Twenty20 matches between any 2 opposing countries. As before, the data frames are already available. You can load the data and begin to use them. If more insights can be drawn from the data frames, do go ahead. But please do attribute the source to Cricsheet (http://cricsheet.org), my package yorkr and my blog. Do give the functions a spin for yourself!

Important note: Do check out my other posts using yorkr at yorkr-posts

Cricket analytics with cricketr in paperback and Kindle versions

My book “Cricket analytics with cricketr” is now available in paperback and Kindle versions. The paperback is available from Amazon (US, UK and Europe) for $ 48.99. The Kindle version can be downloaded from the Kindle store for $2.50 (Rs 169/-). Do pick your copy. It should be a good read for a Sunday afternoon.

This book of mine contains my posts based on my R package ‘cricketr’ now in CRAN. The package cricketr can analyze both batsmen and bowlers for all formats of the game Test, ODI and Twenty20. The package uses the data from ESPN Cricinfo. The analyses include runs frequency charts, performances of batsmen and bowlers in different grounds and against different teams, moving  average of  runs/wickets over the career, mean strike rate, mean economy rate and so on.

The book includes the following chapters based on my R package cricketr. There are 2 additional articles where I use Machine Learning with the package Octave.

CONTENTS
Cricket Analytics with cricketr 11
1.1. Introducing cricketr! : An R package to analyze performances of cricketers 11
1.2. Taking cricketr for a spin – Part 1 49
1.2. cricketr digs the Ashes! 70
1.3. cricketr plays the ODIs! 99
1.4. cricketr adapts to the Twenty20 International! 141
1.5. Sixer – R package cricketr’s new Shiny avatar 170
2. Other cricket posts in R 180
2.1. Analyzing cricket’s batting legends – Through the mirage with R 180
2.2. Mirror, mirror … the best batsman of them all? 206
3. Appendix 220
Cricket analysis with Machine Learning using Octave 220
3.1. Informed choices through Machine Learning – Analyzing Kohli, Tendulkar and Dravid 221
3.2. Informed choices through Machine Learning-2 Pitting together Kumble, Kapil, Chandra 234
Further reading 248
Important Links 249

I do hope you have a great time reading it. Do pick up your copy. Feel free to get in touch with me with your comments and suggestions.  I have more interesting things lined up for the future.

Watch this space!

cricketr plays the ODIs!

Published in R bloggers: cricketr plays the ODIs

Introduction

In this post my package ‘cricketr’ takes a swing at One Day Internationals (ODIs). Like Test batsmen who adapt to ODIs with some innovative strokes, the cricketr package has some additional functions and some modified functions to handle the high strike and economy rates in ODIs. As before I have chosen my top 4 ODI batsmen and top 4 ODI bowlers.

If you are passionate about cricket, and love analyzing cricket performances, then check out my racy book on cricket ‘Cricket analytics with cricketr and cricpy – Analytics harmony with R & Python’! This book discusses and shows how to use my R package ‘cricketr’ and my Python package ‘cricpy’ to analyze batsmen and bowlers in all formats of the game (Test, ODI and T20). The paperback is available on Amazon at $21.99 and  the kindle version at $9.99/Rs 449/-. A must read for any cricket lover! Check it out!!

Important note 1: The latest release of ‘cricketr’ now includes the ability to analyze performances of teams!! See Cricketr adds team analytics to its repertoire!!!

Important note 2 : Cricketr can now do a more fine-grained analysis of players, see Cricketr learns new tricks : Performs fine-grained analysis of players

Important note 3: Do check out the python avatar of cricketr, ‘cricpy’, in my post ‘Introducing cricpy: A python package to analyze performances of cricketers’

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

You can also read this post at Rpubs as odi-cricketr. Download this report as a PDF file from odi-cricketr.pdf

Important note: Do check out my other posts using cricketr at cricketr-posts

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricketr template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. Only a familiarity with R and R Markdown is needed.

Batsmen

  1. Virender Sehwag (Ind)
  2. AB Devilliers (SA)
  3. Chris Gayle (WI)
  4. Glenn Maxwell (Aus)

Bowlers

  1. Mitchell Johnson (Aus)
  2. Lasith Malinga (SL)
  3. Dale Steyn (SA)
  4. Tim Southee (NZ)

I have sprinkled the plots with a few of my comments. Feel free to draw your conclusions! The analysis is included below.

The profile for Virender Sehwag is 35263. This can be used to get the ODI data for Sehwag. For a batsman the type should be “batting” and for a bowler the type should be “bowling”. The function to use is getPlayerDataOD().

The package can be installed directly from CRAN

# Install cricketr into the default library if it is not already available
if (!require("cricketr")){ 
    install.packages("cricketr") 
} 
library(cricketr)

or from Github

library(devtools)
install_github("tvganesh/cricketr")
library(cricketr)

The One Day data for a particular player can be obtained with the getPlayerDataOD() function. To do this you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Virender Sehwag. This will bring up a page which has the profile number for the player, e.g. for Virender Sehwag this would be http://www.espncricinfo.com/india/content/player/35263.html. Hence, Sehwag’s profile is 35263. This can be used to get the data for Virender Sehwag as shown below

sehwag <- getPlayerDataOD(35263,dir=".",file="sehwag.csv",type="batting")

Analyses of Batsmen

The following plots give the analysis of the 4 ODI batsmen

  1. Virender Sehwag (Ind) – Innings = 245, Runs = 8586, Average = 35.05, Strike Rate = 104.33
  2. AB Devilliers (SA) – Innings = 179, Runs = 7941, Average = 53.65, Strike Rate = 99.12
  3. Chris Gayle (WI) – Innings = 264, Runs = 9221, Average = 37.65, Strike Rate = 85.11
  4. Glenn Maxwell (Aus) – Innings = 45, Runs = 1367, Average = 35.02, Strike Rate = 126.69

Plot of 4s, 6s and the scoring rate in ODIs

The 3 charts below give the number of

  1. 4s vs Runs scored
  2. 6s vs Runs scored
  3. Balls faced vs Runs scored

A regression line is fitted in each of these plots for each of the ODI batsmen.

A. Virender Sehwag

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./sehwag.csv","Sehwag")
batsman6s("./sehwag.csv","Sehwag")
batsmanScoringRateODTT("./sehwag.csv","Sehwag")

sehwag-4s6sSR-1

dev.off()
## null device 
##           1

B. AB Devilliers

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./devilliers.csv","Devilliers")
batsman6s("./devilliers.csv","Devilliers")
batsmanScoringRateODTT("./devilliers.csv","Devilliers")

devillier-4s6SR-1

dev.off()
## null device 
##           1

C. Chris Gayle

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./gayle.csv","Gayle")
batsman6s("./gayle.csv","Gayle")
batsmanScoringRateODTT("./gayle.csv","Gayle")

gayle-4s6sSR-1

dev.off()
## null device 
##           1

D. Glenn Maxwell

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./maxwell.csv","Maxwell")
batsman6s("./maxwell.csv","Maxwell")
batsmanScoringRateODTT("./maxwell.csv","Maxwell")

maxwell-4s6sout-1

dev.off()
## null device 
##           1

Relative Mean Strike Rate

In this first plot I plot the Mean Strike Rate of the batsmen. It can be seen that Maxwell has an awesome strike rate in ODIs. However we need to keep in mind that Maxwell has played far fewer innings (only 45). He is followed by Sehwag (245 innings), who also has an excellent strike rate till 100 runs, and then we have Devilliers who roars ahead. This is also seen in the overall strike rates listed above.

par(mar=c(4,4,2,2))
frames <- list("./sehwag.csv","./devilliers.csv","gayle.csv","maxwell.csv")
names <- list("Sehwag","Devilliers","Gayle","Maxwell")
relativeBatsmanSRODTT(frames,names)

plot-1-1
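As a rough illustration of what a mean strike rate plot summarizes, here is a minimal base R sketch. The innings data is made up, and I am assuming that innings are grouped into 20-run ranges with the strike rate (runs per 100 balls) averaged in each range; relativeBatsmanSRODTT() may compute this differently.

```r
# Hypothetical innings; real data would come from getPlayerDataOD()
innings <- data.frame(
  Runs = c(12, 45, 101, 7, 63, 88, 30, 119),
  BF   = c(15, 40, 95, 10, 55, 80, 33, 100)
)

# Strike rate = runs scored per 100 balls faced
innings$SR <- 100 * innings$Runs / innings$BF

# Group innings into 20-run ranges (an assumed bucket size) and average the SR
innings$RunRange <- cut(innings$Runs, breaks = seq(0, 120, by = 20), right = FALSE)
meanSR <- tapply(innings$SR, innings$RunRange, mean)
print(round(meanSR, 2))

# Overall strike rate across all innings
overallSR <- 100 * sum(innings$Runs) / sum(innings$BF)
print(round(overallSR, 2))
```

Note that the overall strike rate is computed from the totals, not as the mean of the per-innings strike rates, so short innings do not dominate it.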

Relative Runs Frequency Percentage

Sehwag leads in the percentage of runs in 10 run ranges up to 50 runs. Maxwell and Devilliers lead in the 55-66 and 66-85 ranges respectively.

frames <- list("./sehwag.csv","./devilliers.csv","gayle.csv","maxwell.csv")
names <- list("Sehwag","Devilliers","Gayle","Maxwell")
relativeRunsFreqPerfODTT(frames,names)

plot-2-1

Percentage of 4s,6s in the runs scored

The plot below shows the percentage of runs made by the batsmen by way of 1s, 2s, 3s, 4s and 6s. It can be seen that Sehwag has the highest percentage of 4s (33.36%) in his overall runs in ODIs. Maxwell has the highest percentage of 6s (13.36%) in his ODI career. If we take the overall 4s+6s then Sehwag leads with 33.36 + 5.95 = 39.31%, followed by Gayle with 27.80 + 10.15 = 37.95%.

frames <- list("./sehwag.csv","./devilliers.csv","gayle.csv","maxwell.csv")
names <- list("Sehwag","Devilliers","Gayle","Maxwell")
runs4s6s <-batsman4s6s(frames,names)

plot-46s-1

print(runs4s6s)
##                Sehwag Devilliers Gayle Maxwell
## Runs(1s,2s,3s)  60.69      67.39 62.05   62.11
## 4s              33.36      24.28 27.80   24.53
## 6s               5.95       8.32 10.15   13.36
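The 4s+6s totals quoted above can be checked directly from the printed table. The sketch below re-enters those percentages by hand in base R (it does not call batsman4s6s()) and ranks the batsmen by their combined boundary percentage.

```r
# Percentages copied from the printed runs4s6s output
runs4s6s <- matrix(
  c(60.69, 33.36,  5.95,   # Sehwag
    67.39, 24.28,  8.32,   # Devilliers
    62.05, 27.80, 10.15,   # Gayle
    62.11, 24.53, 13.36),  # Maxwell
  nrow = 3,
  dimnames = list(c("Runs(1s,2s,3s)", "4s", "6s"),
                  c("Sehwag", "Devilliers", "Gayle", "Maxwell"))
)

# Share of runs that came from boundaries (4s + 6s)
boundaryPct <- runs4s6s["4s", ] + runs4s6s["6s", ]
print(sort(boundaryPct, decreasing = TRUE))
```

Maxwell (37.89%) comes in just behind Gayle on this measure, with Devilliers (32.60%) relying the least on boundaries.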
 

Runs forecast

The forecasts for the batsmen are shown below.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./sehwag.csv","Sehwag")
batsmanPerfForecast("./devilliers.csv","Devilliers")
batsmanPerfForecast("./gayle.csv","Gayle")
batsmanPerfForecast("./maxwell.csv","Maxwell")

swcr-perf-1

dev.off()
## null device 
##           1

3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls Faced and Minutes at Crease. A prediction plane is fitted.

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./sehwag.csv","V Sehwag")
battingPerf3d("./devilliers.csv","AB Devilliers")

plot-3-1

dev.off()
## null device 
##           1
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./gayle.csv","C Gayle")
battingPerf3d("./maxwell.csv","G Maxwell")

plot-4-1

dev.off()
## null device 
##           1

Predicting Runs given Balls Faced and Minutes at Crease

A multivariate regression plane is fitted between Runs and Balls Faced + Minutes at Crease.

BF <- seq( 10, 200,length=10)
Mins <- seq(30,220,length=10)
newDF <- data.frame(BF,Mins)

sehwag <- batsmanRunsPredict("./sehwag.csv","Sehwag",newdataframe=newDF)
devilliers <- batsmanRunsPredict("./devilliers.csv","Devilliers",newdataframe=newDF)
gayle <- batsmanRunsPredict("./gayle.csv","Gayle",newdataframe=newDF)
maxwell <- batsmanRunsPredict("./maxwell.csv","Maxwell",newdataframe=newDF)

The fitted model is then used to predict the runs that the batsmen will score for a hypothetical Balls Faced and Minutes at Crease. It can be seen that Maxwell sets a searing pace in the predicted runs for a given Balls Faced and Minutes at Crease, followed by Sehwag. But we have to keep in mind that Maxwell has only around 1/5th of the innings of Sehwag (45 to Sehwag’s 245 innings). They are followed by Devilliers and then finally Gayle.

batsmen <-cbind(round(sehwag$Runs),round(devilliers$Runs),round(gayle$Runs),round(maxwell$Runs))
colnames(batsmen) <- c("Sehwag","Devilliers","Gayle","Maxwell")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Sehwag Devilliers Gayle Maxwell
## 1          10           30     11         12    11      18
## 2          31           51     33         32    28      43
## 3          52           72     55         52    46      67
## 4          73           93     77         71    63      92
## 5          94          114    100         91    81     117
## 6         116          136    122        111    98     141
## 7         137          157    144        130   116     166
## 8         158          178    167        150   133     191
## 9         179          199    189        170   151     215
## 10        200          220    211        190   168     240
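For readers curious about what such a prediction involves, here is a minimal sketch using lm() and predict() from base R. The innings are simulated, the column names Runs, BF and Mins are my own choice, and I am only assuming that batsmanRunsPredict() fits a linear model of this form; its internals may differ.

```r
set.seed(42)

# Simulated innings (hypothetical data, not from ESPN Cricinfo)
BF   <- sample(5:150, 60, replace = TRUE)
Mins <- pmax(1, round(BF * 1.4 + rnorm(60, sd = 8)))
Runs <- round(0.9 * BF + 0.1 * Mins + rnorm(60, sd = 6))
innings <- data.frame(Runs, BF, Mins)

# Fit a regression plane: Runs as a linear function of BF and Mins
fit <- lm(Runs ~ BF + Mins, data = innings)

# Predict runs for the same grid of Balls Faced and Minutes used above
newDF <- data.frame(BF   = seq(10, 200, length = 10),
                    Mins = seq(30, 220, length = 10))
newDF$Runs <- round(predict(fit, newdata = newDF))
print(newDF)
```

Because the grid extends to 200 balls, the later rows are extrapolations well beyond most ODI innings and should be read with that in mind.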

Highest runs likelihood

The plots below show the runs likelihood of each batsman. This uses K-Means. It can be seen that Devilliers has almost a 30% likelihood (29.84%) of making around 90 runs. Gayle and Sehwag have a 32.69% and 35.22% likelihood respectively of making around 45 runs.

A. Virender Sehwag

batsmanRunsLikelihood("./sehwag.csv","Sehwag")

## Summary of  Sehwag 's runs scoring likelihood
## **************************************************
## 
## There is a 35.22 % likelihood that Sehwag  will make  46 Runs in  44 balls over 67  Minutes 
## There is a 9.43 % likelihood that Sehwag  will make  119 Runs in  106 balls over  158  Minutes 
## There is a 55.35 % likelihood that Sehwag  will make  12 Runs in  13 balls over 18  Minutes

B. AB Devilliers

batsmanRunsLikelihood("./devilliers.csv","Devilliers")

## Summary of  Devilliers 's runs scoring likelihood
## **************************************************
## 
## There is a 30.65 % likelihood that Devilliers  will make  44 Runs in  43 balls over 60  Minutes 
## There is a 29.84 % likelihood that Devilliers  will make  91 Runs in  88 balls over  124  Minutes 
## There is a 39.52 % likelihood that Devilliers  will make  11 Runs in  15 balls over 21  Minutes

C. Chris Gayle

batsmanRunsLikelihood("./gayle.csv","Gayle")

## Summary of  Gayle 's runs scoring likelihood
## **************************************************
## 
## There is a 32.69 % likelihood that Gayle  will make  47 Runs in  51 balls over 72  Minutes 
## There is a 54.49 % likelihood that Gayle  will make  10 Runs in  15 balls over  20  Minutes 
## There is a 12.82 % likelihood that Gayle  will make  109 Runs in  119 balls over 172  Minutes

D. Glenn Maxwell

batsmanRunsLikelihood("./maxwell.csv","Maxwell")

## Summary of  Maxwell 's runs scoring likelihood
## **************************************************
## 
## There is a 34.38 % likelihood that Maxwell  will make  39 Runs in  29 balls over 35  Minutes 
## There is a 15.62 % likelihood that Maxwell  will make  89 Runs in  55 balls over  69  Minutes 
## There is a 50 % likelihood that Maxwell  will make  6 Runs in  7 balls over 9  Minutes
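The summaries above read like cluster centres with the share of innings in each cluster. A minimal sketch of that idea with base R's kmeans() follows. The innings are simulated, and the choice of 3 clusters over Runs, BF and Mins is an assumption on my part based on the three lines printed for each batsman; batsmanRunsLikelihood() may do this differently.

```r
set.seed(1)

# Simulated innings from three scoring regimes (hypothetical data)
innings <- data.frame(
  Runs = c(rpois(30, 10), rpois(20, 45), rpois(10, 100))
)
innings$BF   <- round(innings$Runs * 1.1) + 2
innings$Mins <- round(innings$BF * 1.4)

# Cluster the innings into 3 groups; nstart guards against a poor random start
km <- kmeans(innings[, c("Runs", "BF", "Mins")], centers = 3, nstart = 10)

# Likelihood = percentage of innings falling in each cluster
likelihood <- round(100 * km$size / nrow(innings), 2)
summaryDF <- data.frame(round(km$centers), Likelihood = likelihood)
print(summaryDF[order(summaryDF$Runs), ])
```

Each row can then be read the same way as the output above: a given percentage likelihood of scoring about Runs in about BF balls over about Mins minutes.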

Average runs at ground and against opposition

A. Virender Sehwag

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./sehwag.csv","Sehwag")
batsmanAvgRunsOpposition("./sehwag.csv","Sehwag")

avgrg-1-1

dev.off()
## null device 
##           1

B. AB Devilliers

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./devilliers.csv","Devilliers")
batsmanAvgRunsOpposition("./devilliers.csv","Devilliers")

avgrg-2-1

dev.off()
## null device 
##           1

C. Chris Gayle

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./gayle.csv","Gayle")
batsmanAvgRunsOpposition("./gayle.csv","Gayle")

avgrg-3-1

dev.off()
## null device 
##           1

D. Glenn Maxwell

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./maxwell.csv","Maxwell")
batsmanAvgRunsOpposition("./maxwell.csv","Maxwell")

avgrg-4-1

dev.off()
## null device 
##           1

Moving Average of runs over career

The moving average for the 4 batsmen indicates the following

1. The moving averages of Devilliers and Maxwell are on the way up.
2. Sehwag shows a slight downward trend from his 2nd peak in 2011.
3. Gayle has maintained a consistent 45 runs for the last few years.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./sehwag.csv","Sehwag")
batsmanMovingAverage("./devilliers.csv","Devilliers")
batsmanMovingAverage("./gayle.csv","Gayle")
batsmanMovingAverage("./maxwell.csv","Maxwell")

sdgm-ma-1

dev.off()
## null device 
##           1
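To make the idea of a moving average concrete, here is a small base R sketch with made-up scores. I am assuming a simple trailing window of 4 innings purely for illustration; batsmanMovingAverage() may well use a different smoother.

```r
# Hypothetical runs in consecutive innings
runs <- c(12, 45, 101, 7, 63, 88, 30, 119, 0, 54, 22, 76)

# Trailing moving average: mean of the current and previous k - 1 innings
k  <- 4
ma <- as.numeric(stats::filter(runs, rep(1 / k, k), sides = 1))
print(round(ma, 2))
```

The first k - 1 values are NA because a full window is not yet available; a wider window gives a smoother but slower-reacting curve.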

Check batsmen in-form, out-of-form

  1. Maxwell, Devilliers, Sehwag are in-form. This is also evident from the moving average plot
  2. Gayle is out-of-form
checkBatsmanInForm("./sehwag.csv","Sehwag")
## *******************************************************************************************
## 
## Population size: 143  Mean of population: 33.76 
## Sample size: 16  Mean of sample: 37.44 SD of sample: 55.15 
## 
## Null hypothesis H0 : Sehwag 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Sehwag 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Sehwag 's Form Status: In-Form because the p value: 0.603525  is greater than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./devilliers.csv","Devilliers")
## *******************************************************************************************
## 
## Population size: 111  Mean of population: 43.5 
## Sample size: 13  Mean of sample: 57.62 SD of sample: 40.69 
## 
## Null hypothesis H0 : Devilliers 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Devilliers 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Devilliers 's Form Status: In-Form because the p value: 0.883541  is greater than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./gayle.csv","Gayle")
## *******************************************************************************************
## 
## Population size: 140  Mean of population: 37.1 
## Sample size: 16  Mean of sample: 17.25 SD of sample: 20.25 
## 
## Null hypothesis H0 : Gayle 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Gayle 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Gayle 's Form Status: Out-of-Form because the p value: 0.000609  is less than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./maxwell.csv","Maxwell")
## *******************************************************************************************
## 
## Population size: 28  Mean of population: 25.25 
## Sample size: 4  Mean of sample: 64.25 SD of sample: 36.97 
## 
## Null hypothesis H0 : Maxwell 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Maxwell 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Maxwell 's Form Status: In-Form because the p value: 0.948744  is greater than alpha=  0.05"
## *******************************************************************************************
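The p values printed above appear to be consistent with a one-sided, one-sample t-test of the recent sample against the career mean. This is my inference from the printed numbers, not a description of how checkBatsmanInForm() is implemented. The sketch below recomputes the p values from the rounded summary figures in the output, so small differences from the printed values are expected.

```r
# Lower-tail p value for H0: sample mean is not below the population mean
formPValue <- function(popMean, sampleMean, sampleSD, n) {
  tStat <- (sampleMean - popMean) / (sampleSD / sqrt(n))
  pt(tStat, df = n - 1)
}

# Summary figures copied from the output above
checks <- data.frame(
  batsman    = c("Sehwag", "Devilliers", "Gayle", "Maxwell"),
  popMean    = c(33.76, 43.5, 37.1, 25.25),
  sampleMean = c(37.44, 57.62, 17.25, 64.25),
  sampleSD   = c(55.15, 40.69, 20.25, 36.97),
  n          = c(16, 13, 16, 4)
)

checks$pValue <- with(checks, formPValue(popMean, sampleMean, sampleSD, n))
checks$status <- ifelse(checks$pValue > 0.05, "In-Form", "Out-of-Form")
print(checks[, c("batsman", "pValue", "status")])
```

A small p value means the recent average is significantly below the career average, which is why only Gayle is flagged as out-of-form. With only 4 innings in Maxwell's sample, his result carries little weight.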

Analysis of bowlers

  1. Mitchell Johnson (Aus) – Innings-150, Wickets – 239, Econ Rate : 4.83
  2. Lasith Malinga (SL)- Innings-182, Wickets – 287, Econ Rate : 5.26
  3. Dale Steyn (SA)- Innings-103, Wickets – 162, Econ Rate : 4.81
  4. Tim Southee (NZ)- Innings-96, Wickets – 135, Econ Rate : 5.33

Malinga has the highest number of innings and wickets followed closely by Mitchell Johnson. Steyn and Southee have relatively fewer innings.
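The career figures listed above can be put side by side with a few lines of base R. The numbers are copied from the list; the column names and the wickets-per-innings measure are my own additions for comparison.

```r
# Career figures copied from the list above
bowlers <- data.frame(
  bowler  = c("M Johnson", "Malinga", "Steyn", "Southee"),
  innings = c(150, 182, 103, 96),
  wickets = c(239, 287, 162, 135),
  econ    = c(4.83, 5.26, 4.81, 5.33)
)

# Wickets taken per innings bowled
bowlers$wktsPerInnings <- round(bowlers$wickets / bowlers$innings, 2)

# Most economical bowler first
print(bowlers[order(bowlers$econ), ])
```

On wickets per innings, Johnson, Malinga and Steyn are nearly level at roughly 1.6, while Southee is lower at about 1.4.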

To get the bowler’s data use

malinga <- getPlayerDataOD(49758,dir=".",file="malinga.csv",type="bowling")

Wicket Frequency percentage

This plot gives the frequency percentage of each wicket count (1, 2, 3, … etc.)

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./mitchell.csv","M Johnson")
bowlerWktsFreqPercent("./malinga.csv","Malinga")
bowlerWktsFreqPercent("./steyn.csv","Steyn")
bowlerWktsFreqPercent("./southee.csv","Southee")

relBowlFP-1

dev.off()
## null device 
##           1
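A wicket frequency percentage is simply a relative frequency table. Here is a minimal sketch with hypothetical wickets-per-innings data; bowlerWktsFreqPercent() reads the real figures from the CSV file and may treat wicketless innings differently.

```r
# Hypothetical wickets taken in 20 consecutive innings
wkts <- c(0, 1, 1, 2, 3, 1, 0, 2, 4, 1, 2, 0, 1, 3, 2, 1, 0, 5, 1, 2)

# Percentage of innings for each wicket count
freqPct <- round(100 * prop.table(table(wkts)), 1)
print(freqPct)
```

In this made-up sample, one-wicket innings are the most common at 35%.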

Wickets Runs plot

The plot below gives a boxplot of the runs ranges for each of the wickets taken by the bowlers. M Johnson and Steyn are more economical than Malinga and Southee, corroborating the figures above.

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))

bowlerWktsRunsPlot("./mitchell.csv","M Johnson")
bowlerWktsRunsPlot("./malinga.csv","Malinga")
bowlerWktsRunsPlot("./steyn.csv","Steyn")
bowlerWktsRunsPlot("./southee.csv","Southee")

wktsrun-1

dev.off()
## null device 
##           1

Average wickets in different grounds and opposition

A. Mitchell Johnson

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./mitchell.csv","M Johnson")
bowlerAvgWktsOpposition("./mitchell.csv","M Johnson")

gr-1-1

dev.off()
## null device 
##           1

B. Lasith Malinga

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./malinga.csv","Malinga")
bowlerAvgWktsOpposition("./malinga.csv","Malinga")

gr-2-1

dev.off()
## null device 
##           1

C. Dale Steyn

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./steyn.csv","Steyn")
bowlerAvgWktsOpposition("./steyn.csv","Steyn")

gr-3-1

dev.off()
## null device 
##           1

D. Tim Southee

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./southee.csv","Southee")
bowlerAvgWktsOpposition("./southee.csv","Southee")

avgrg-4-1

dev.off()
## null device 
##           1

Relative bowling performance

The plot below shows that Mitchell Johnson and Southee have more wickets in the 3-4 wicket range while Steyn and Malinga have more in the 1-2 wicket range.

frames <- list("./mitchell.csv","./malinga.csv","steyn.csv","southee.csv")
names <- list("M Johnson","Malinga","Steyn","Southee")
relativeBowlingPerf(frames,names)

relBowlPerf-1

Relative Economy Rate against wickets taken

Steyn has the best economy rate followed by M Johnson. Malinga and Southee have poorer economy rates.

frames <- list("./mitchell.csv","./malinga.csv","steyn.csv","southee.csv")
names <- list("M Johnson","Malinga","Steyn","Southee")
relativeBowlingERODTT(frames,names)

relBowlER-1

Moving average of wickets over career

The moving average of wickets for Johnson and Steyn is on the up-swing. Southee is maintaining a reasonable record while Malinga shows a decline in ODI performance.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./mitchell.csv","M Johnson")
bowlerMovingAverage("./malinga.csv","Malinga")
bowlerMovingAverage("./steyn.csv","Steyn")
bowlerMovingAverage("./southee.csv","Southee")

jmss-bowlma-1

dev.off()
## null device 
##           1

Wickets forecast

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerPerfForecast("./mitchell.csv","M Johnson")
bowlerPerfForecast("./malinga.csv","Malinga")
bowlerPerfForecast("./steyn.csv","Steyn")
bowlerPerfForecast("./southee.csv","Southee")

jsba-pfcst-1

dev.off()
## null device 
##           1

Check bowler in-form, out-of-form

All the bowlers except Southee are shown to be in-form. Southee is flagged as out-of-form (p value 0.044302, just below alpha = 0.05).

checkBowlerInForm("./mitchell.csv","J Mitchell")
## *******************************************************************************************
## 
## Population size: 135  Mean of population: 1.55 
## Sample size: 15  Mean of sample: 2 SD of sample: 1.07 
## 
## Null hypothesis H0 : J Mitchell 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : J Mitchell 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "J Mitchell 's Form Status: In-Form because the p value: 0.937917  is greater than alpha=  0.05"
## *******************************************************************************************
checkBowlerInForm("./malinga.csv","Malinga")
## *******************************************************************************************
## 
## Population size: 163  Mean of population: 1.58 
## Sample size: 19  Mean of sample: 1.58 SD of sample: 1.22 
## 
## Null hypothesis H0 : Malinga 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Malinga 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Malinga 's Form Status: In-Form because the p value: 0.5  is greater than alpha=  0.05"
## *******************************************************************************************
checkBowlerInForm("./steyn.csv","Steyn")
## *******************************************************************************************
## 
## Population size: 93  Mean of population: 1.59 
## Sample size: 11  Mean of sample: 1.45 SD of sample: 0.69 
## 
## Null hypothesis H0 : Steyn 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Steyn 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Steyn 's Form Status: In-Form because the p value: 0.257438  is greater than alpha=  0.05"
## *******************************************************************************************
checkBowlerInForm("./southee.csv","southee")
## *******************************************************************************************
## 
## Population size: 86  Mean of population: 1.48 
## Sample size: 10  Mean of sample: 0.8 SD of sample: 1.14 
## 
## Null hypothesis H0 : southee 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : southee 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "southee 's Form Status: Out-of-Form because the p value: 0.044302  is less than alpha=  0.05"
## *******************************************************************************************

***************

Key findings

Here are some key conclusions

ODI batsmen

  1. AB Devilliers has a high frequency of runs in the 60-120 range and the highest average
  2. Sehwag has a large number of innings (245) and a good strike rate
  3. Maxwell has the best strike rate but it should be kept in mind that he has 1/5 of the innings of Sehwag. We need to see how he progresses further
  4. Sehwag has the highest percentage of 4s in the runs scored, while Maxwell has the highest percentage of 6s
  5. For a hypothetical Balls Faced and Minutes at Crease Maxwell will score the most runs followed by Sehwag
  6. The moving average indicates that the best is yet to come for Devilliers and Maxwell. Sehwag has a few more years in him while Gayle shows a decline in ODI performance and is indicated to be out of form.

ODI bowlers

  1. Malinga has played the highest number of innings and also has the highest number of wickets, though he has a poor economy rate
  2. M Johnson is the most effective in the 3-4 wicket range followed by Southee
  3. M Johnson and Steyn have the best overall economy rates followed by Malinga and Southee
  4. M Johnson and Steyn’s careers are on the up-swing, Southee maintains a steady consistent performance, while Malinga shows a downward trend

Hasta la vista! I’ll be back!
Watch this space!

Also see my other posts in R

  1. Introducing cricketr! : An R package to analyze performances of cricketers
  2. cricketr digs the Ashes!
  3. A peek into literacy in India: Statistical Learning with R
  4. A crime map of India in R – Crimes against women
  5. Analyzing cricket’s batting legends – Through the mirage with R
  6. Mirror, mirror … the best batsman of them all?


Adventures in LogParser, HTA and charts

In my earlier post “Slicing and dicing with LogParser & VBA” I had mentioned that LogParser is a really slick Microsoft utility that can be used to obtain information on files, event logs, IIS logs etc. Continuing on the journey in LogParser I came to realize that you can also create cool charts with the output of LogParser, which can be a line graph, a pie chart, a 3D pie chart, a 3D bar chart etc. The options are many. So I started to play around with the utility.

To create a chart you can run the command from a LogParser prompt. Some samples are shown below

LogParser "SELECT TOP 10 TO_LOWERCASE(Name) AS NewName, Size INTO .\chart.gif FROM C:\*.* WHERE NOT Attributes LIKE '%D%' AND NOT Attributes LIKE '%H%' ORDER BY Size DESC" -chartType:Column3D -i:FS -chartTitle:"My chart"

This will create a gif file with a 3D bar chart with the top 10 files by size

Similarly you could also create a 3D pie chart as follows

LogParser "SELECT TOP 5 Name, Size INTO c:\Chart.gif FROM C:\*.* ORDER BY Size DESC" -chartType:PieExploded3D -i:FS

However I wanted to create these charts in a HTA application and display them dynamically along with the output of LogParser. Thankfully the procedure is very similar.

First you need to set up the environment as below. I have used VBScript.

Set objLogParser = CreateObject("MSUtil.LogQuery")
Set objInputFormat = CreateObject("MSUtil.LogQuery.FileSystemInputFormat")

Then you need to specify the chart options

Set objOutputChartFormat = CreateObject("MSUtil.LogQuery.ChartOutputFormat")
objOutputChartFormat.groupSize = "400x300"
objOutputChartFormat.fileType = "GIF"
objOutputChartFormat.chartType = "Column3D"
objOutputChartFormat.categories = "ON"
objOutputChartFormat.values = "ON"
objOutputChartFormat.legend = "ON"

Finally create a LogParser query and execute it as shown below, where “topN” and the directory path “files” are taken as input from the user

strQuery1 = "SELECT TOP " & topN & " Name, Size INTO c:\test\filesize.gif FROM " & files & " WHERE NOT Attributes LIKE '%D%' AND NOT Attributes LIKE '%H%' ORDER BY Size DESC"
objOutputChartFormat.config = "c:\test\FileSize.js"
Set objRecordSet1 = objLogParser.ExecuteBatch(strQuery1, objInputFormat, objOutputChartFormat)

To specify the chart title, the X axis & Y axis a javascript/VBscript file has to be created as  below (FileSize.js) which is specified in objOutputChartFormat.config

FileSize.js (contents)

// Set the title above the chart.
chart.HasTitle = true;
chart.Title.Caption = "Top N files by size";

// Set the border style for the chart.
chartSpace.Border.Color = "#000000";
chartSpace.Border.Weight = 2;

// Change the background color for the plot area.
chart.PlotArea.Interior.Color = "#f0f0f0";

// Set the font size for the chart values.
chart.SeriesCollection(0).DataLabelsCollection(0).Font.Size = 6;

// Set the caption below the chart.
chartSpace.HasChartSpaceTitle = true;
chartSpace.ChartSpaceTitle.Caption =
    "This chart shows the Top N files by file sizes in the specified directory ";
chartSpace.ChartSpaceTitle.Font.Size = 10;
chartSpace.ChartSpaceTitle.Position = chartSpace.Constants.chTitlePositionBottom;

// Set the style and caption for the Y axis.
chart.Axes(0).Font.Size = 8;
chart.Axes(0).HasTitle = true;
chart.Axes(0).Title.Caption = "File Name";
chart.Axes(0).Title.Font.Size = 9;

// Set the style and caption for the X axis.
chart.Axes(1).Font.Size = 7;
chart.Axes(1).HasTitle = true;
chart.Axes(1).Title.Caption = "Size in bytes";
chart.Axes(1).Title.Font.Size = 9;


Lastly to display the chart dynamically as it is created in the HTA file do the following

imagearea.innerHTML = “

where imagearea will be specified in the HTML portion as

where filesize.png is any  image prior to the creation of the chart through LogParser.

A sample output is shown below

LogParser charts are really cool and well worth the effort!

Also see
Brewing a potion with Bluemix, PostgreSQL, Node.js in the cloud
A Bluemix recipe with MongoDB and Node.js
A Cloud medley with IBM Bluemix, Cloudant DB and Node.js
Rock N’ Roll with Bluemix, Cloudant & NodeExpress
