Introducing cricpy: A Python package to analyze performances of cricketers


Full many a gem of purest ray serene,
The dark unfathomed caves of ocean bear;
Full many a flower is born to blush unseen,
And waste its sweetness on the desert air.

            Thomas Gray, An Elegy Written In A Country Churchyard
            

Introduction

It is finally here! cricpy, the Python avatar of my R package cricketr, is now ready to rock-n-roll! My R package cricketr had its genesis a little over 3 years ago and has gone through a couple of enhancements. During this time I have always thought about creating an equivalent Python package. Now I have finally done it.

So here it is. My python package ‘cricpy!!!’

This package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package supports only Test cricket.

You should be able to install the package using pip install cricpy and use the many functions available in the package. Please be mindful of the ESPN Cricinfo Terms of Use.

This post is also hosted on Rpubs at Introducing cricpy. You can also download the pdf version of this post at cricpy.pdf

Do check out my post on R package cricketr at Re-introducing cricketr! : An R package to analyze performances of cricketers

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of the performances of both batsmen and bowlers, besides evaluating team and match performances in Tests, ODIs, T20s and IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the Kindle versions. The books can be accessed at Cricket analytics with cricketr and Beaten by sheer pace-Cricket analytics with yorkr. A must read for any cricket lover! Check it out!!


Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricpy-template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. The functions can be executed in RStudio or in an IPython notebook.

The cricpy package

The cricpy package has several functions that perform different analyses on both batsmen and bowlers. The package has functions that plot the percentage frequency of runs or wickets, the runs likelihood for a batsman, the relative run/strike rates of batsmen and the relative performance/economy rate of bowlers.

Other interesting functions include batting performance moving average, forecasting, performance of a player against different oppositions, contribution to wins and losses etc.

The data for a particular player can be obtained with the getPlayerData() function. To do this you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Rahul Dravid, Virat Kohli, Alastair Cook etc. This will bring up a page which has the profile number for the player, e.g. for Rahul Dravid this would be http://www.espncricinfo.com/india/content/player/28114.html. Hence, Dravid’s profile is 28114. This can be used to get the data for Rahul Dravid as shown in section 3 below.
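If you prefer not to copy the profile number by hand, it can be pulled out of the player URL. The snippet below is only a minimal sketch: profile_from_url is a hypothetical helper (it is not part of cricpy), and it assumes the URL has the same ".../player/<number>.html" shape as the example above.

```python
import re

def profile_from_url(url):
    """Return the numeric profile ID embedded in a Cricinfo player URL.

    Hypothetical helper, not part of cricpy. Assumes the URL ends in
    "/player/<digits>.html" as in the example in this post.
    """
    match = re.search(r"/player/(\d+)\.html$", url)
    if match is None:
        raise ValueError("no profile number found in: " + url)
    return int(match.group(1))

dravid_profile = profile_from_url(
    "http://www.espncricinfo.com/india/content/player/28114.html")
print(dravid_profile)
```

The returned integer can then be passed as the profile argument of getPlayerData().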

The cricpy package is almost a clone of my R package cricketr. The signatures of all the Python functions are identical to those of its R avatar, namely ‘cricketr’, with only the necessary variations between Python and R. It may be useful to look at my post R vs Python: Different similarities and similar differences. In fact, if you are familiar with one of the languages you can look up the package in the other and you will notice the parallel constructs.

You can fork/clone the cricpy package at Github cricpy

The following 2 examples show the similarity between cricketr and cricpy packages

1a. Importing cricketr – R

Importing cricketr in R

#install.packages("cricketr")
library(cricketr)

1b. Importing cricpy – Python

# Install the package
# Do a pip install cricpy
# Import cricpy
import cricpy
# You could either do
#1.  
import cricpy.analytics as ca 
#ca.batsman4s("../dravid.csv","Rahul Dravid")
# Or
#2.
from cricpy.analytics import *
#batsman4s("../dravid.csv","Rahul Dravid")

I would recommend using option 1 namely ca.batsman4s() as I may add an advanced analytics module in the future to cricpy.

2 Invoking functions

You can see how the 2 calls are identical for both the R package cricketr and the Python package cricpy

2a. Invoking functions with R package ‘cricketr’

library(cricketr)
batsman4s("../dravid.csv","Rahul Dravid")

2b. Invoking functions with Python package ‘cricpy’

import cricpy.analytics as ca 
ca.batsman4s("../dravid.csv","Rahul Dravid")

 

3a. Getting help from cricketr – R

#help("getPlayerData")

3b. Getting help from cricpy – Python

help(ca.getPlayerData)
## Help on function getPlayerData in module cricpy.analytics:
## 
## getPlayerData(profile, opposition='', host='', dir='./data', file='player001.csv', type='batting', homeOrAway=[1, 2], result=[1, 2, 4], create=True)
##     Get the player data from ESPN Cricinfo based on specific inputs and store in a file in a given directory
##     
##     Description
##     
##     Get the player data given the profile of the batsman. The allowed inputs are home,away or both and won,lost or draw of matches. The data is stored in a <player>.csv file in a directory specified. This function also returns a data frame of the player
##     
##     Usage
##     
##     getPlayerData(profile,opposition="",host="",dir="./data",file="player001.csv",
##     type="batting", homeOrAway=c(1,2),result=c(1,2,4))
##     Arguments
##     
##     profile     
##     This is the profile number of the player to get data. This can be obtained from http://www.espncricinfo.com/ci/content/player/index.html. Type the name of the player and click search. This will display the details of the player. Make a note of the profile ID. For e.g For Sachin Tendulkar this turns out to be http://www.espncricinfo.com/india/content/player/35320.html. Hence the profile for Sachin is 35320
##     opposition  
##     The numerical value of the opposition country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,New Zealand:5,Pakistan:7,South Africa:3,Sri Lanka:8, West Indies:4, Zimbabwe:9
##     host        
##     The numerical value of the host country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,New Zealand:5,Pakistan:7,South Africa:3,Sri Lanka:8, West Indies:4, Zimbabwe:9
##     dir 
##     Name of the directory to store the player data into. If not specified the data is stored in a default directory "./data". Default="./data"
##     file        
##     Name of the file to store the data into for e.g. tendulkar.csv. This can be used for subsequent functions. Default="player001.csv"
##     type        
##     type of data required. This can be "batting" or "bowling"
##     homeOrAway  
##     This is a list with either 1,2 or both. 1 is for home 2 is for away
##     result      
##     This is a list that can take values 1,2,4. 1 - won match 2- lost match 4- draw
##     Details
##     
##     More details can be found in my short video tutorial in Youtube https://www.youtube.com/watch?v=q9uMPFVsXsI
##     
##     Value
##     
##     Returns the player's dataframe
##     
##     Note
##     
##     Maintainer: Tinniam V Ganesh <tvganesh.85@gmail.com>
##     
##     Author(s)
##     
##     Tinniam V Ganesh
##     
##     References
##     
##     http://www.espncricinfo.com/ci/content/stats/index.html
##     https://gigadom.wordpress.com/
##     
##     See Also
##     
##     getPlayerDataSp
##     
##     Examples
##     
##     ## Not run: 
##     # Both home and away. Result = won,lost and drawn
##     tendulkar = getPlayerData(35320,dir=".", file="tendulkar1.csv",
##     type="batting", homeOrAway=[1,2],result=[1,2,4])
##     
##     # Only away. Get data only for won and lost innings
##     tendulkar = getPlayerData(35320,dir=".", file="tendulkar2.csv",
##     type="batting",homeOrAway=[2],result=[1,2])
##     
##     # Get bowling data and store in file for future
##     kumble = getPlayerData(30176,dir=".",file="kumble1.csv",
##     type="bowling",homeOrAway=[1],result=[1,2])
##     
##     #Get the Tendulkar's Performance against Australia in Australia
##     tendulkar = getPlayerData(35320, opposition = 2,host=2,dir=".", 
##     file="tendulkarVsAusInAus.csv",type="batting")

The details below will introduce the different functions that are available in cricpy.

3. Get the player data for a player using the function getPlayerData()

Important Note: This needs to be done only once for a player. This function stores the player’s data in the specified CSV file (e.g. dravid.csv as above), which can then be reused for all other functions. Once we have the data for the players, many analyses can be done. This post will use the stored CSV files obtained with a prior getPlayerData() call for all subsequent analyses.

import cricpy.analytics as ca
#dravid =ca.getPlayerData(28114,dir="..",file="dravid.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])
#acook =ca.getPlayerData(11728,dir="..",file="acook.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])
import cricpy.analytics as ca
#lara =ca.getPlayerData(52337,dir="..",file="lara.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])
#kohli =ca.getPlayerData(253802,dir="..",file="kohli.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])
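Since the data only needs to be fetched once, one option is to guard the download with a check for the CSV file instead of commenting the calls in and out. This is just a sketch of that idea: load_or_fetch is a hypothetical helper of mine, not a cricpy function, and the getPlayerData() call is left commented out because it needs cricpy and network access.

```python
import os

def load_or_fetch(path, fetch):
    """Call fetch() only if the file at `path` does not exist yet.

    `fetch` is any zero-argument callable that creates the file, for
    example a lambda wrapping ca.getPlayerData(...).
    Returns True if fetch() was called, False if the file was reused.
    """
    if os.path.exists(path):
        return False
    fetch()
    return True

# Example wiring (commented out: needs cricpy and network access)
# import cricpy.analytics as ca
# load_or_fetch("../dravid.csv",
#               lambda: ca.getPlayerData(28114, dir="..", file="dravid.csv",
#                                        type="batting", homeOrAway=[1, 2],
#                                        result=[1, 2, 4]))
```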

4 Rahul Dravid’s performance – Basic Analyses

The 3 plots below provide the following for Rahul Dravid

  1. Frequency percentage of runs in each run range over the whole career
  2. Mean Strike Rate for runs scored in the given range
  3. A histogram of runs frequency percentages in runs ranges
import cricpy.analytics as ca
import matplotlib.pyplot as plt
ca.batsmanRunsFreqPerf("../dravid.csv","Rahul Dravid")

ca.batsmanMeanStrikeRate("../dravid.csv","Rahul Dravid")

ca.batsmanRunsRanges("../dravid.csv","Rahul Dravid") 
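To make the idea of "frequency percentage of runs in each run range" concrete, here is a small sketch that buckets scores into ranges of 10 runs and computes the share of innings in each. The scores are made up, and the function is my own illustration of the idea, not the code inside cricpy.

```python
from collections import Counter

def runs_range_percentages(runs, width=10):
    """Percentage of innings falling in each run range of `width` runs."""
    # Map each score to the lower bound of its bucket, e.g. 37 -> 30
    buckets = Counter((r // width) * width for r in runs)
    total = len(runs)
    return {low: 100.0 * count / total
            for low, count in sorted(buckets.items())}

# Made-up scores, purely for illustration
scores = [4, 12, 17, 35, 0, 52, 8, 110, 33, 15]
for low, pct in runs_range_percentages(scores).items():
    print(f"{low}-{low + 9}: {pct:.0f}%")
```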

5. More analyses

import cricpy.analytics as ca
ca.batsman4s("../dravid.csv","Rahul Dravid")

ca.batsman6s("../dravid.csv","Rahul Dravid") 

ca.batsmanDismissals("../dravid.csv","Rahul Dravid")

6. 3D scatter plot and prediction plane

The plots below show the 3D scatter plot of Dravid’s Runs versus Balls Faced and Minutes at crease. A linear regression plane is then fitted between Runs and Balls Faced + Minutes at crease

import cricpy.analytics as ca
ca.battingPerf3d("../dravid.csv","Rahul Dravid")

7. Average runs at different venues

The plot below gives the average runs scored by Dravid at different grounds. The plot also shows the number of innings at each ground as a label on the x-axis. It can be seen that Dravid did great at Rawalpindi, Leeds and Georgetown overseas, and at Mohali and Bangalore at home.

import cricpy.analytics as ca
ca.batsmanAvgRunsGround("../dravid.csv","Rahul Dravid")

 

8. Average runs against different opposing teams

This plot computes the average runs scored by Dravid against different countries. Dravid has an average of 50+ against England, New Zealand, West Indies and Zimbabwe.

import cricpy.analytics as ca
ca.batsmanAvgRunsOpposition("../dravid.csv","Rahul Dravid")

9. Highest Runs Likelihood

The plot below shows the Runs Likelihood for a batsman. For this, the performance of Dravid is plotted as a 3D scatter plot of Runs versus Balls Faced + Minutes at crease. K-Means is then used to compute the centroids of 3 clusters, which are plotted. These centroids indicate Dravid’s highest tendencies.

import cricpy.analytics as ca
ca.batsmanRunsLikelihood("../dravid.csv","Rahul Dravid")
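For readers who have not met K-Means before, the sketch below shows the basic idea on a handful of made-up (balls faced, minutes, runs) triples. It is a bare-bones version of Lloyd's algorithm written for illustration, with a simple deterministic start. I am not suggesting this is how cricpy does it internally; a library implementation would normally be used.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns the k centroids, sorted."""
    # Deterministic start: spread the initial centroids over the sorted data
    ordered = sorted(points)
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centroids = [ordered[round(i * step)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty)
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(axis) / len(cluster) for axis in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])
        centroids = new_centroids
    return sorted(centroids)

# Made-up (balls faced, minutes, runs) triples, purely for illustration
innings = [
    (10, 15, 5), (12, 20, 7), (14, 25, 9),
    (80, 120, 40), (90, 130, 45), (100, 140, 50),
    (250, 380, 140), (260, 400, 150), (270, 420, 160),
]
centroids = kmeans(innings, k=3)
for c in centroids:
    print(c)
```

Each centroid can be read as a "typical" innings for its cluster, with the number of innings in a cluster indicating how likely that kind of innings is.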

10. A look at the Top 4 batsmen – Rahul Dravid, Alastair Cook, Brian Lara and Virat Kohli

The following batsmen have been very prolific in Test cricket and will be used for the analyses

  1. Rahul Dravid: Average: 52.31, 100’s – 36, 50’s – 63
  2. Alastair Cook: Average: 45.35, 100’s – 33, 50’s – 57
  3. Brian Lara: Average: 52.88, 100’s – 34, 50’s – 48
  4. Virat Kohli: Average: 54.57, 100’s – 24, 50’s – 19

The following plots take a closer look at their performances. The box plots show the median and the 1st and 3rd quartiles of the runs
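As a reminder of what a box plot summarises, the sketch below computes the minimum, quartiles and maximum for a few made-up scores using Python's statistics module. The "inclusive" quantile method is my choice for the illustration; plotting libraries may use a slightly different convention.

```python
import statistics

def five_number_summary(runs):
    """Minimum, 1st quartile, median, 3rd quartile and maximum of `runs`."""
    q1, median, q3 = statistics.quantiles(runs, n=4, method="inclusive")
    return min(runs), q1, median, q3, max(runs)

# Made-up scores, purely for illustration
print(five_number_summary([0, 10, 20, 30, 40]))
```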

11. Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency

import cricpy.analytics as ca
ca.batsmanPerfBoxHist("../dravid.csv","Rahul Dravid")

ca.batsmanPerfBoxHist("../acook.csv","Alastair Cook")

ca.batsmanPerfBoxHist("../lara.csv","Brian Lara")


ca.batsmanPerfBoxHist("../kohli.csv","Virat Kohli")


12. Contribution to won and lost matches

The plots below show the contribution of Dravid, Cook, Lara and Kohli in matches won and lost. It can be seen that in matches India has won, Dravid and Kohli have scored more and must have been instrumental in the win

For the 2 functions below you will have to use the getPlayerDataSp() function as shown below. I have commented this as I already have these files

import cricpy.analytics as ca
#dravidsp = ca.getPlayerDataSp(28114,tdir=".",tfile="dravidsp.csv",ttype="batting")
#acooksp = ca.getPlayerDataSp(11728,tdir=".",tfile="acooksp.csv",ttype="batting")
#larasp = ca.getPlayerDataSp(52337,tdir=".",tfile="larasp.csv",ttype="batting")
#kohlisp = ca.getPlayerDataSp(253802,tdir=".",tfile="kohlisp.csv",ttype="batting")
import cricpy.analytics as ca
ca.batsmanContributionWonLost("../dravidsp.csv","Rahul Dravid")

ca.batsmanContributionWonLost("../acooksp.csv","Alastair Cook")

ca.batsmanContributionWonLost("../larasp.csv","Brian Lara")

ca.batsmanContributionWonLost("../kohlisp.csv","Virat Kohli")
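The idea behind this comparison is simply to split the innings by match result and summarise each group. The sketch below does this with a plain mean on made-up data. The result codes (1 = won, 2 = lost, 4 = draw) come from the getPlayerData() help text, but the (runs, result) pair layout is an assumption for the sketch and not the format of the cricpy CSV files. The mean also stands in for the box plot that cricpy draws.

```python
from statistics import mean

RESULT_LABELS = {1: "won", 2: "lost", 4: "draw"}  # codes from the help text

def mean_runs_by_result(innings):
    """Average runs per match result.

    `innings` is a list of (runs, result_code) pairs; this layout is an
    assumption for the sketch.
    """
    grouped = {}
    for runs, code in innings:
        grouped.setdefault(RESULT_LABELS[code], []).append(runs)
    return {label: mean(values) for label, values in grouped.items()}

# Made-up innings, purely for illustration
sample = [(80, 1), (20, 2), (60, 1), (10, 2), (45, 4)]
print(mean_runs_by_result(sample))
```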


13. Performance at home and overseas

From the plot below it can be seen

Dravid has a higher median overseas than at home. Cook, Lara and Kohli have a lower median of runs overseas than at home.

This function also requires the use of getPlayerDataSp() as shown above

import cricpy.analytics as ca
ca.batsmanPerfHomeAway("../dravidsp.csv","Rahul Dravid")

ca.batsmanPerfHomeAway("../acooksp.csv","Alastair Cook")

ca.batsmanPerfHomeAway("../larasp.csv","Brian Lara")

ca.batsmanPerfHomeAway("../kohlisp.csv","Virat Kohli")

14 Moving Average of runs in career

Take a look at the Moving Average across the career of the Top 4 (ignore the dip at the end of all plots; I need to check why this is so!). Lara’s performance seems to have been quite good before his retirement (wonder why he retired so early!). Kohli’s performance has been steadily improving over the years

import cricpy.analytics as ca
ca.batsmanMovingAverage("../dravid.csv","Rahul Dravid")

ca.batsmanMovingAverage("../acook.csv","Alastair Cook")

ca.batsmanMovingAverage("../lara.csv","Brian Lara")

ca.batsmanMovingAverage("../kohli.csv","Virat Kohli")
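For reference, a simple moving average over a fixed window can be computed as in the sketch below. This only illustrates the general idea on made-up scores; I am not claiming that cricpy uses this particular smoothing or window size.

```python
def moving_average(values, window):
    """Simple moving average; one output per full window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the new value, drop the oldest one
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages

# Made-up scores, purely for illustration
print(moving_average([10, 20, 30, 40, 50], window=3))
```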

15 Cumulative Average runs of batsman in career

This function provides the cumulative average runs of the batsman over the career. Dravid averages around 48, Cook around 44, Lara around 50 and Kohli shows a steady improvement in his cumulative average. Kohli seems to be getting better with time.

import cricpy.analytics as ca
ca.batsmanCumulativeAverageRuns("../dravid.csv","Rahul Dravid")

ca.batsmanCumulativeAverageRuns("../acook.csv","Alastair Cook")

ca.batsmanCumulativeAverageRuns("../lara.csv","Brian Lara")

ca.batsmanCumulativeAverageRuns("../kohli.csv","Virat Kohli")
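A cumulative average is just the running mean after each innings. The sketch below assumes the average is taken per innings, which differs from the conventional batting average (runs per dismissal), and I have not checked which of the two cricpy plots. The scores are made up.

```python
from itertools import accumulate

def cumulative_average(runs):
    """Running mean of runs after each innings (runs per innings)."""
    totals = accumulate(runs)
    return [total / n for n, total in enumerate(totals, start=1)]

# Made-up scores, purely for illustration
print(cumulative_average([10, 30, 50, 70]))
```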

16 Cumulative Average strike rate of batsman in career

Lara has a terrific strike rate of 52+. Cook has a better strike rate than Dravid. Kohli’s strike rate has improved over the years.

import cricpy.analytics as ca
ca.batsmanCumulativeStrikeRate("../dravid.csv","Rahul Dravid")

ca.batsmanCumulativeStrikeRate("../acook.csv","Alastair Cook")

ca.batsmanCumulativeStrikeRate("../lara.csv","Brian Lara")

ca.batsmanCumulativeStrikeRate("../kohli.csv","Virat Kohli")
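Strike rate is runs scored per 100 balls faced. One way to track it over a career is to divide the running total of runs by the running total of balls faced, as in the sketch below. This is an assumption about the definition: cricpy may instead average the per-innings strike rates. The numbers are made up.

```python
from itertools import accumulate

def cumulative_strike_rate(runs, balls_faced):
    """Career strike rate (runs per 100 balls) after each innings."""
    total_runs = accumulate(runs)
    total_balls = accumulate(balls_faced)
    # Guard against a career that starts with zero balls faced
    return [100.0 * r / b if b else 0.0
            for r, b in zip(total_runs, total_balls)]

# Made-up innings, purely for illustration
print(cumulative_strike_rate([20, 30, 50], [50, 50, 100]))
```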


17 Future Runs forecast

Here are plots that forecast how the batsman will perform in future. Currently ARIMA has been used for the forecast. (To do:  Perform Holt-Winters forecast!)

import cricpy.analytics as ca
ca.batsmanPerfForecast("../dravid.csv","Rahul Dravid")
##                              ARIMA Model Results                              
## ==============================================================================
## Dep. Variable:                 D.runs   No. Observations:                  284
## Model:                 ARIMA(5, 1, 0)   Log Likelihood               -1522.837
## Method:                       css-mle   S.D. of innovations             51.488
## Date:                Sun, 28 Oct 2018   AIC                           3059.673
## Time:                        09:47:39   BIC                           3085.216
## Sample:                    07-04-1996   HQIC                          3069.914
##                          - 01-24-2012                                         
## ================================================================================
##                    coef    std err          z      P>|z|      [0.025      0.975]
## --------------------------------------------------------------------------------
## const           -0.1336      0.884     -0.151      0.880      -1.867       1.599
## ar.L1.D.runs    -0.7729      0.058    -13.322      0.000      -0.887      -0.659
## ar.L2.D.runs    -0.6234      0.071     -8.753      0.000      -0.763      -0.484
## ar.L3.D.runs    -0.5199      0.074     -7.038      0.000      -0.665      -0.375
## ar.L4.D.runs    -0.3490      0.071     -4.927      0.000      -0.488      -0.210
## ar.L5.D.runs    -0.2116      0.058     -3.665      0.000      -0.325      -0.098
##                                     Roots                                    
## =============================================================================
##                  Real           Imaginary           Modulus         Frequency
## -----------------------------------------------------------------------------
## AR.1            0.5789           -1.1743j            1.3093           -0.1771
## AR.2            0.5789           +1.1743j            1.3093            0.1771
## AR.3           -1.3617           -0.0000j            1.3617           -0.5000
## AR.4           -0.7227           -1.2257j            1.4230           -0.3348
## AR.5           -0.7227           +1.2257j            1.4230            0.3348
## -----------------------------------------------------------------------------
##                 0
## count  284.000000
## mean    -0.306769
## std     51.632947
## min   -106.653589
## 25%    -33.835148
## 50%     -8.954253
## 75%     21.024763
## max    223.152901

18 Relative Batsman Cumulative Average Runs

The plot below compares the relative cumulative average runs of the batsmen and plots them. The plot indicates the following:

Range 30 – 100 innings – Lara leads, followed by Dravid
Range 100+ innings – Kohli races ahead of the rest

import cricpy.analytics as ca
frames = ["../dravid.csv","../acook.csv","../lara.csv","../kohli.csv"]
names = ["Dravid","A Cook","Brian Lara","V Kohli"]
ca.relativeBatsmanCumulativeAvgRuns(frames,names)

19. Relative Batsman Strike Rate

The plot below compares the relative cumulative strike rates of the batsmen. It shows that Brian Lara towers over Dravid, Cook and Kohli. However, you will notice that Kohli’s strike rate is going up

import cricpy.analytics as ca
frames = ["../dravid.csv","../acook.csv","../lara.csv","../kohli.csv"]
names = ["Dravid","A Cook","Brian Lara","V Kohli"]
ca.relativeBatsmanCumulativeStrikeRate(frames,names)

20. 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

import cricpy.analytics as ca
ca.battingPerf3d("../dravid.csv","Rahul Dravid")

ca.battingPerf3d("../acook.csv","Alastair Cook")

ca.battingPerf3d("../lara.csv","Brian Lara")

ca.battingPerf3d("../kohli.csv","Virat Kohli")

21. Predicting Runs given Balls Faced and Minutes at Crease

A multivariate regression plane is fitted between Runs and Balls Faced + Minutes at crease.

import cricpy.analytics as ca
import numpy as np
import pandas as pd
BF = np.linspace( 10, 400,15)
Mins = np.linspace( 30,600,15)
newDF= pd.DataFrame({'BF':BF,'Mins':Mins})
dravid = ca.batsmanRunsPredict("../dravid.csv",newDF,"Dravid")
print(dravid)
##             BF        Mins        Runs
## 0    10.000000   30.000000    0.519667
## 1    37.857143   70.714286   13.821794
## 2    65.714286  111.428571   27.123920
## 3    93.571429  152.142857   40.426046
## 4   121.428571  192.857143   53.728173
## 5   149.285714  233.571429   67.030299
## 6   177.142857  274.285714   80.332425
## 7   205.000000  315.000000   93.634552
## 8   232.857143  355.714286  106.936678
## 9   260.714286  396.428571  120.238805
## 10  288.571429  437.142857  133.540931
## 11  316.428571  477.857143  146.843057
## 12  344.285714  518.571429  160.145184
## 13  372.142857  559.285714  173.447310
## 14  400.000000  600.000000  186.749436

The fitted model is then used to predict the runs that the batsmen will score for a given Balls faced and Minutes at crease.
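To show what fitting such a plane involves, here is a dependency-free sketch that fits runs = a + b*BF + c*Mins by least squares using the normal equations, and then predicts from the fitted coefficients. The data is made up (generated from a known plane so that the answer can be checked), and fit_plane and predict are my own illustrative helpers, not cricpy functions. In practice a library routine would be the better choice.

```python
def fit_plane(bf, mins, runs):
    """Least-squares fit of runs = a + b*bf + c*mins via the normal equations."""
    rows = [(1.0, x, y) for x, y in zip(bf, mins)]
    # Build X^T X (3x3) and X^T r (3-vector)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, runs)) for i in range(3)]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(xtx)
    if abs(d) < 1e-12:
        raise ValueError("bf and mins are collinear; the plane is not unique")
    coeffs = []
    for k in range(3):
        # Cramer's rule: replace column k with X^T r
        m = [[xty[i] if j == k else xtx[i][j] for j in range(3)]
             for i in range(3)]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)

def predict(coeffs, bf, mins):
    """Predicted runs for a given balls faced and minutes at crease."""
    a, b, c = coeffs
    return a + b * bf + c * mins

# Made-up data generated from runs = 1 + 0.5*bf + 0.1*mins
bf = [10, 40, 60, 90, 120]
mins = [30, 50, 100, 110, 200]
runs = [1 + 0.5 * x + 0.1 * y for x, y in zip(bf, mins)]
coeffs = fit_plane(bf, mins, runs)
print(predict(coeffs, 100, 150))
```

The fit needs Balls Faced and Minutes that are not perfectly proportional to each other, otherwise the two slopes cannot be told apart and the sketch raises an error.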

22 Analysis of Top 3 wicket takers

The following 3 bowlers have had an excellent career and will be used for the analysis

  1. Glenn McGrath: Wickets: 563, Average: 21.64, Economy Rate: 2.49
  2. Kapil Dev: Wickets: 434, Average: 29.64, Economy Rate: 2.78
  3. James Anderson: Wickets: 564, Average: 28.64, Economy Rate: 2.88

How do Glenn McGrath, Kapil Dev and James Anderson compare with one another with respect to wickets taken and the Economy Rate? The next set of plots computes and plots precisely these analyses.

23. Get the bowler’s data

The data for the 3 bowlers is obtained with getPlayerData(), this time with type="bowling"

import cricpy.analytics as ca
#mcgrath =ca.getPlayerData(6565,dir=".",file="mcgrath.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])
#kapil =ca.getPlayerData(30028,dir=".",file="kapil.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])
#anderson =ca.getPlayerData(8608,dir=".",file="anderson.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])

24. Wicket Frequency Plot

The plots below compute the percentage frequency of the number of wickets taken (e.g. 1 wicket x%, 2 wickets y% etc.) for each of the bowlers and plot them as a continuous line

import cricpy.analytics as ca
ca.bowlerWktsFreqPercent("../mcgrath.csv","Glenn McGrath")

ca.bowlerWktsFreqPercent("../kapil.csv","Kapil Dev")

ca.bowlerWktsFreqPercent("../anderson.csv","James Anderson")

25. Wickets Runs plot

The plots below create a box plot showing the 1st and 3rd quartiles of runs conceded versus the number of wickets taken

import cricpy.analytics as ca
ca.bowlerWktsRunsPlot("../mcgrath.csv","Glenn McGrath")

ca.bowlerWktsRunsPlot("../kapil.csv","Kapil Dev")

ca.bowlerWktsRunsPlot("../anderson.csv","James Anderson")

26 Average wickets at different venues

The plots give the average wickets taken by the bowlers at different venues. McGrath’s best performances are at Centurion, Lord’s and Port of Spain, averaging about 4 wickets. Kapil Dev does well at Kingston and Wellington. Anderson averages 4 wickets at Dunedin and Nagpur

import cricpy.analytics as ca
ca.bowlerAvgWktsGround("../mcgrath.csv","Glenn McGrath")

ca.bowlerAvgWktsGround("../kapil.csv","Kapil Dev")

ca.bowlerAvgWktsGround("../anderson.csv","James Anderson")

27 Average wickets against different opposition

The plots give the average wickets taken by the bowlers against different countries. The x-axis also includes the number of innings against each team

import cricpy.analytics as ca
ca.bowlerAvgWktsOpposition("../mcgrath.csv","Glenn McGrath")

ca.bowlerAvgWktsOpposition("../kapil.csv","Kapil Dev")

ca.bowlerAvgWktsOpposition("../anderson.csv","James Anderson")

28 Wickets taken moving average

From the plot below it can be seen that James Anderson has had a solid performance over the years averaging about wickets

import cricpy.analytics as ca
ca.bowlerMovingAverage("../mcgrath.csv","Glenn McGrath")

ca.bowlerMovingAverage("../kapil.csv","Kapil Dev")

ca.bowlerMovingAverage("../anderson.csv","James Anderson")

29 Cumulative average wickets taken

The plots below give the cumulative average wickets taken by the bowlers. McGrath plateaus around 2.4 wickets, while Kapil Dev’s performance deteriorates over the years. Anderson holds rock steady at around 2 wickets

import cricpy.analytics as ca
ca.bowlerCumulativeAvgWickets("../mcgrath.csv","Glenn McGrath")

ca.bowlerCumulativeAvgWickets("../kapil.csv","Kapil Dev")

ca.bowlerCumulativeAvgWickets("../anderson.csv","James Anderson")

30 Cumulative average economy rate

The plots below give the cumulative average economy rate of the bowlers. McGrath was quite expensive early in his career, conceding about 2.8 runs per over, which drops to around 2.5 runs towards the end. Kapil Dev’s economy rate drops from 3.6 to 2.8. Anderson is probably more expensive than the other 2.

import cricpy.analytics as ca
ca.bowlerCumulativeAvgEconRate("../mcgrath.csv","Glenn McGrath")

ca.bowlerCumulativeAvgEconRate("../kapil.csv","Kapil Dev")

ca.bowlerCumulativeAvgEconRate("../anderson.csv","James Anderson")
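The economy rate is the number of runs conceded per over. One detail that is easy to trip over is that overs are conventionally written as "overs.balls", so 10.3 means 10 overs and 3 balls, not 10.3 overs. The sketch below handles that conversion. It assumes six-ball overs, and I have not checked how the cricpy CSV files store overs, so treat both helpers as illustrative only.

```python
def overs_to_balls(overs):
    """Convert cricket overs notation ("10.3" = 10 overs and 3 balls) to balls.

    Assumes six-ball overs.
    """
    whole, _, part = str(overs).partition(".")
    balls = int(part) if part else 0
    if balls > 5:
        raise ValueError("invalid overs value: " + str(overs))
    return int(whole) * 6 + balls

def economy_rate(runs_conceded, overs):
    """Runs conceded per six-ball over."""
    return 6.0 * runs_conceded / overs_to_balls(overs)

# Made-up spell, purely for illustration
print(overs_to_balls("10.3"))
print(economy_rate(42, "10.3"))
```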

31 Future Wickets forecast

import cricpy.analytics as ca
ca.bowlerPerfForecast("../mcgrath.csv","Glenn McGrath")
##                              ARIMA Model Results                              
## ==============================================================================
## Dep. Variable:              D.Wickets   No. Observations:                  236
## Model:                 ARIMA(5, 1, 0)   Log Likelihood                -480.815
## Method:                       css-mle   S.D. of innovations              1.851
## Date:                Sun, 28 Oct 2018   AIC                            975.630
## Time:                        09:28:32   BIC                            999.877
## Sample:                    11-12-1993   HQIC                           985.404
##                          - 01-02-2007                                         
## ===================================================================================
##                       coef    std err          z      P>|z|      [0.025      0.975]
## -----------------------------------------------------------------------------------
## const               0.0037      0.033      0.113      0.910      -0.061       0.068
## ar.L1.D.Wickets    -0.9432      0.064    -14.708      0.000      -1.069      -0.818
## ar.L2.D.Wickets    -0.7254      0.086     -8.469      0.000      -0.893      -0.558
## ar.L3.D.Wickets    -0.4827      0.093     -5.217      0.000      -0.664      -0.301
## ar.L4.D.Wickets    -0.3690      0.085     -4.324      0.000      -0.536      -0.202
## ar.L5.D.Wickets    -0.1709      0.064     -2.678      0.008      -0.296      -0.046
##                                     Roots                                    
## =============================================================================
##                  Real           Imaginary           Modulus         Frequency
## -----------------------------------------------------------------------------
## AR.1            0.5630           -1.2761j            1.3948           -0.1839
## AR.2            0.5630           +1.2761j            1.3948            0.1839
## AR.3           -0.8433           -1.0820j            1.3718           -0.3554
## AR.4           -0.8433           +1.0820j            1.3718            0.3554
## AR.5           -1.5981           -0.0000j            1.5981           -0.5000
## -----------------------------------------------------------------------------
##                 0
## count  236.000000
## mean    -0.005142
## std      1.856961
## min     -3.457002
## 25%     -1.433391
## 50%     -0.080237
## 75%      1.446149
## max      5.840050

32 Get player data special

As discussed above the next 2 charts require the use of getPlayerDataSp()

import cricpy.analytics as ca
#mcgrathsp =ca.getPlayerDataSp(6565,tdir=".",tfile="mcgrathsp.csv",ttype="bowling")
#kapilsp =ca.getPlayerDataSp(30028,tdir=".",tfile="kapilsp.csv",ttype="bowling")
#andersonsp =ca.getPlayerDataSp(8608,tdir=".",tfile="andersonsp.csv",ttype="bowling")

33 Contribution to matches won and lost

The plots below are extremely interesting. Glenn McGrath has been more instrumental in his team’s wins than Kapil and Anderson, as he seems to have taken more wickets when Australia won.

import cricpy.analytics as ca
ca.bowlerContributionWonLost("../mcgrathsp.csv","Glenn McGrath")

ca.bowlerContributionWonLost("../kapilsp.csv","Kapil Dev")

ca.bowlerContributionWonLost("../andersonsp.csv","James Anderson")

34 Performance home and overseas

McGrath and Kapil Dev have performed better overseas than at home. Anderson has performed about the same home and overseas

import cricpy.analytics as ca
ca.bowlerPerfHomeAway("../mcgrathsp.csv","Glenn McGrath")

ca.bowlerPerfHomeAway("../kapilsp.csv","Kapil Dev")

ca.bowlerPerfHomeAway("../andersonsp.csv","James Anderson")

35 Relative cumulative average economy rate of bowlers

The Relative cumulative economy rate shows that McGrath has the best economy rate followed by Kapil Dev and then Anderson.

import cricpy.analytics as ca
frames = ["../mcgrath.csv","../kapil.csv","../anderson.csv"]
names = ["Glenn McGrath","Kapil Dev","James Anderson"]
ca.relativeBowlerCumulativeAvgEconRate(frames,names)

36 Relative Economy Rate against wickets taken

McGrath has been economical regardless of the number of wickets taken. Kapil Dev has been slightly more expensive when he takes more wickets

import cricpy.analytics as ca
frames = ["../mcgrath.csv","../kapil.csv","../anderson.csv"]
names = ["Glenn McGrath","Kapil Dev","James Anderson"]
ca.relativeBowlingER(frames,names)

37 Relative cumulative average wickets of bowlers in career

The plot below shows that McGrath has the best overall cumulative average wickets. Kapil leads Anderson till about 150 innings, after which Anderson takes over

import cricpy.analytics as ca
frames = ["../mcgrath.csv","../kapil.csv","../anderson.csv"]
names = ["Glenn McGrath","Kapil Dev","James Anderson"]
ca.relativeBowlerCumulativeAvgWickets(frames,names)

Key Findings

The plots above capture some of the capabilities and features of my cricpy package. Feel free to install the package and try it out. Please do keep in mind ESPN Cricinfo’s Terms of Use.

Here are the main findings from the analysis above

Key insights

1. Brian Lara is head and shoulders above the rest in the overall strike rate.
2. Kohli’s performance has been steadily improving over the years and, with the way he is going, he will shatter all records.
3. Kohli and Dravid have scored more in matches where India has won than the other two.
4. Dravid has performed very well overseas.
5. The cumulative average runs has Kohli just edging out the other 3. Kohli is probably midway in his career but, considering that his moving average is improving strongly, we can expect great things of him with the way he is going.
6. McGrath has had some great performances overseas.
7. McGrath has the best economy rate and has contributed significantly to Australia’s wins.
8. In the cumulative average wickets race McGrath leads the pack. Kapil leads Anderson till about 150 innings, after which Anderson takes over.

The code for cricpy can be accessed at Github at cricpy

Do let me know if you run into issues.

Conclusion

I have long wanted to make a python equivalent of cricketr and I have been able to make it. cricpy is still a work in progress. I still have to add the necessary functions for ODI and Twenty20. Go ahead, give ‘cricpy’ a spin!!

Stay tuned!

My book ‘Practical Machine Learning in R and Python: Second edition’ on Amazon


Note: The 3rd edition of this book is now available My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon

The third edition of my book ‘Practical Machine Learning with R and Python – Machine Learning in stereo’ is now available in both paperback ($12.99) and kindle ($9.99/Rs449) versions. This edition includes more content, extensive comments and formatting for better readability.

In this book I implement some of the most common, but important Machine Learning algorithms in R and equivalent Python code.
1. Practical Machine Learning with R and Python: Third Edition – Machine Learning in Stereo (Paperback, $12.99)
2. Practical Machine Learning with R and Python: Third Edition – Machine Learning in Stereo (Kindle, $9.99/Rs449)

This book is ideal both for beginners and the experts in R and/or Python. Those starting their journey into datascience and ML will find the first 3 chapters useful, as they touch upon the most important programming constructs in R and Python and also deal with equivalent statements in R and Python. Those who are expert in either of the languages, R or Python, will find the equivalent code ideal for brushing up on the other language. And finally,those who are proficient in both languages, can use the R and Python implementations to internalize the ML algorithms better.

Here is a look at the topics covered

Table of Contents
Preface
Introduction
1. Essential R
2. Essential Python for Datascience
3. R vs Python
4. Regression of a continuous variable
5. Classification and Cross Validation
6. Regression techniques and regularization
7. SVMs, Decision Trees and Validation curves
8. Splines, GAMs, Random Forests and Boosting
9. PCA, K-Means and Hierarchical Clustering
References

Pick up your copy today!!
Hope you have a great time learning as I did while implementing these algorithms!

Presentation on ‘Machine Learning in plain English – Part 3’


This is the 3rd and final part of ‘Machine Learning in plain English’. In this presentation, I discuss the intuition behind SVMs, B-Splines, GAMs, Decision Trees, Random Forest and Gradient Boosting. I also touch upon Unsupervised Learning, specifically PCA and K-Means. As before, the presentation does not include any math or programming. The presentation can be seen below


The implementations of all the discussed algorithms are available in my book, which is available on Amazon: My book ‘Practical Machine Learning with R and Python’ on Amazon

You may also like
1. My TEDx talk on the “Internet of Things”
2. Deep Learning from first principles in Python, R and Octave – Part 2
3. De-blurring revisited with Wiener filter using OpenCV
4. Architecting a cloud based IP Multimedia System (IMS)
5. The 3rd paperback & kindle editions of my books on Cricket, now on Amazon

To see all posts click Index of posts

My book ‘Practical Machine Learning with R and Python’ on Amazon


Note: The 3rd edition of this book is now available My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon

My book ‘Practical Machine Learning with R and Python: Second Edition – Machine Learning in stereo’ is now available in both paperback ($10.99) and kindle ($7.99/Rs449) versions. In this book I implement some of the most common, but important Machine Learning algorithms in R and equivalent Python code. This is almost like listening to parallel channels of music in stereo!
1. Practical Machine Learning with R and Python: Third Edition – Machine Learning in Stereo (Paperback, $12.99)
2. Practical Machine Learning with R and Python: Third Edition – Machine Learning in Stereo (Kindle, $8.99/Rs449)
This book is ideal both for beginners and the experts in R and/or Python. Those starting their journey into datascience and ML will find the first 3 chapters useful, as they touch upon the most important programming constructs in R and Python and also deal with equivalent statements in R and Python. Those who are expert in either of the languages, R or Python, will find the equivalent code ideal for brushing up on the other language. And finally,those who are proficient in both languages, can use the R and Python implementations to internalize the ML algorithms better.

Here is a look at the topics covered

Table of Contents
Essential R
Essential Python for Datascience
R vs Python
Regression of a continuous variable
Classification and Cross Validation
Regression techniques and regularization
SVMs, Decision Trees and Validation curves
Splines, GAMs, Random Forests and Boosting
PCA, K-Means and Hierarchical Clustering

Pick up your copy today!!
Hope you have a great time learning as I did while implementing these algorithms!

Practical Machine Learning with R and Python – Part 6


Introduction

This is the final and concluding part of my series on ‘Practical Machine Learning with R and Python’. In this series I included the implementations of the most common Machine Learning algorithms in R and Python. The algorithms implemented were

1. Practical Machine Learning with R and Python – Part 1 In this initial post, I touch upon regression of a continuous target variable. Specifically I touch upon Univariate, Multivariate, Polynomial regression and KNN regression in both R and Python
2. Practical Machine Learning with R and Python – Part 2 In this post, I discuss Logistic Regression, KNN classification and Cross Validation error for both LOOCV and K-Fold in both R and Python
3. Practical Machine Learning with R and Python – Part 3 This 3rd part included feature selection in Machine Learning. Specifically I touch upon best fit, forward fit, backward fit, ridge (L2 regularization) & lasso (L1 regularization). The post includes equivalent code in R and Python.
4. Practical Machine Learning with R and Python – Part 4 In this part I discussed SVMs, Decision Trees, Validation, Precision-Recall, AUC and ROC curves
5. Practical Machine Learning with R and Python – Part 5 In this penultimate part, I touch upon B-splines, natural splines, smoothing splines, Generalized Additive Models (GAMs), Decision Trees, Random Forests and Gradient Boosted Trees.

In this last part I cover Unsupervised Learning. Specifically I cover the implementations of Principal Component Analysis (PCA), K-Means and Hierarchical Clustering. You can download this R Markdown file from Github at MachineLearning-RandPython-Part6

Note: Please watch my video presentations on Machine Learning on YouTube
1. Machine Learning in plain English-Part 1
2. Machine Learning in plain English-Part 2
3. Machine Learning in plain English-Part 3

Check out my compact and minimal book  “Practical Machine Learning with R and Python:Third edition- Machine Learning in stereo”  available in Amazon in paperback($12.99) and kindle($8.99) versions. My book includes implementations of key ML algorithms and associated measures and metrics. The book is ideal for anybody who is familiar with the concepts and would like a quick reference to the different ML algorithms that can be applied to problems and how to select the best model. Pick your copy today!!

 

1.1a Principal Component Analysis (PCA) – R code

Principal Component Analysis is used to reduce the dimensionality of the input. In the code below the 8 x 8 pixel images of handwritten digits (64 features each) are reduced to their principal components. A scatter plot of the first 2 principal components then gives a very good visual representation of the data

library(dplyr)
library(ggplot2)
# Note: This example is adapted from an example in the book Python Data Science Handbook by
# Jake VanderPlas (https://jakevdp.github.io/PythonDataScienceHandbook/05.09-principal-component-analysis.html)

# Read the digits data (From sklearn datasets)
digits= read.csv("digits.csv")
# Create a digits classes target variable
digitClasses <- factor(digits$X0.000000000000000000e.00.29)

# Invoke the Principal Component Analysis on columns 1-64
digitsPCA=prcomp(digits[,1:64])

# Create a dataframe of PCA
df <- data.frame(digitsPCA$x)
# Bind the digit classes
df1 <- cbind(df,digitClasses)
# Plot only the first 2 Principal components as a scatter plot. This plot uses only the
# first 2 principal components 
ggplot(df1,aes(x=PC1,y=PC2,col=digitClasses)) + geom_point() +
  ggtitle("Top 2 Principal Components")

1.1 b Variance explained vs no principal components – R code

In the code below the variance explained vs the number of principal components is plotted. It can be seen that with 20 Principal components almost 90% of the variance is explained by this reduced dimensional model.

# Read the digits data (from sklearn datasets)
digits= read.csv("digits.csv")
# Digits target
digitClasses <- factor(digits$X0.000000000000000000e.00.29)
digitsPCA=prcomp(digits[,1:64])


# Get the Standard Deviation
sd=digitsPCA$sdev
# Compute the variance
digitsVar=digitsPCA$sdev^2
#Compute the percent variance explained
percentVarExp=digitsVar/sum(digitsVar)

# Plot the cumulative percent variance explained as a function of the number of principal components
plot(cumsum(percentVarExp), xlab="Principal Component", 
     ylab="Cumulative Proportion of Variance Explained", 
     main="Principal Components vs % Variance explained",ylim=c(0,1),type='l',lwd=2,
     col="blue")

1.1c Principal Component Analysis (PCA) – Python code

import numpy as np
from sklearn.decomposition import PCA
from sklearn import decomposition
from sklearn import datasets
import matplotlib.pyplot as plt
  
from sklearn.datasets import load_digits
# Load the digits data
digits = load_digits()
# Select only the first 2 principal components
pca = PCA(2)  # project from 64 to 2 dimensions
#Compute the first 2 PCA
projected = pca.fit_transform(digits.data)

# Plot a scatter plot of the first 2 principal components
plt.scatter(projected[:, 0], projected[:, 1],
            c=digits.target, edgecolor='none', alpha=0.5,
            cmap=plt.get_cmap('nipy_spectral', 10))  # 'spectral' was renamed 'nipy_spectral' in newer matplotlib
plt.xlabel('PCA 1')
plt.ylabel('PCA 2')
plt.colorbar();
plt.title("Top 2 Principal Components")
plt.savefig('fig1.png', bbox_inches='tight')

1.1d Variance explained vs no principal components – Python code

import numpy as np
from sklearn.decomposition import PCA
from sklearn import decomposition
from sklearn import datasets
import matplotlib.pyplot as plt
  
from sklearn.datasets import load_digits
digits = load_digits()
# Select all 64 principal components
pca = PCA(64)  # keep all 64 components (no reduction in dimensions)
projected = pca.fit_transform(digits.data)

# Obtain the explained variance for each principal component
varianceExp= pca.explained_variance_ratio_
# Compute the cumulative percent variance explained
totVarExp=np.cumsum(np.round(pca.explained_variance_ratio_, decimals=4)*100)

# Plot the variance explained as a function of the number of principal components
plt.plot(totVarExp)
plt.xlabel('No of principal components')
plt.ylabel('% variance explained')
plt.title('No of Principal Components vs Total Variance explained')
plt.savefig('fig2.png', bbox_inches='tight')
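
To read a number such as "about 20 components for roughly 90% of the variance" off the curve programmatically, something like the following sketch could be used. The helper name `components_needed` and the sample ratios are hypothetical; the only assumption is that the ratios are sorted in decreasing order, as `pca.explained_variance_ratio_` is.

```python
from itertools import accumulate

def components_needed(explained_variance_ratio, threshold=0.90):
    """Return the smallest number of leading components whose
    cumulative explained variance reaches `threshold`."""
    for n, cumulative in enumerate(accumulate(explained_variance_ratio), start=1):
        if cumulative >= threshold:
            return n
    # Threshold never reached: all components are needed
    return len(explained_variance_ratio)

# Hypothetical ratios, already sorted in decreasing order
ratios = [0.5, 0.25, 0.125, 0.0625, 0.0625]
n_components = components_needed(ratios, threshold=0.90)
print(n_components)
```

With the scikit-learn code above, passing `pca.explained_variance_ratio_` in place of `ratios` should give the corresponding count for the digits data.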

1.2a K-Means – R code

In the code below, the first 2 Principal Components of the handwritten digits are first drawn as a scatter plot. The 10 centroids of the 10 clusters, corresponding to the 10 different digits, are then plotted over this scatter plot.

library(ggplot2)
# Read the digits data
digits= read.csv("digits.csv")
# Create digit classes target variable
digitClasses <- factor(digits$X0.000000000000000000e.00.29)

# Compute the Principal Components
digitsPCA=prcomp(digits[,1:64])

# Create a data frame of Principal components and the digit classes 
df <- data.frame(digitsPCA$x)
df1 <- cbind(df,digitClasses)

# Pick only the first 2 principal components
a<- df[,1:2]
# Compute K Means of 10 clusters and allow for 1000 iterations
k<-kmeans(a,10,1000)

# Create a dataframe of the centroids of the clusters
df2<-data.frame(k$centers)

#Plot the first 2 principal components with the K Means centroids
ggplot(df1,aes(x=PC1,y=PC2,col=digitClasses)) + geom_point() +
    geom_point(data=df2,aes(x=PC1,y=PC2),col="black",size = 4) + 
    ggtitle("Top 2 Principal Components with KMeans clustering") 

1.2b K-Means – Python code

The centroids of the 10 different handwritten digits are plotted over the scatter plot of the first 2 principal components.

import numpy as np
from sklearn.decomposition import PCA
from sklearn import decomposition
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.cluster import KMeans
digits = load_digits()

# Select only the 1st 2 principal components
pca = PCA(2)  # project from 64 to 2 dimensions
projected = pca.fit_transform(digits.data)

# Create 10 different clusters
kmeans = KMeans(n_clusters=10)

# Compute  the clusters
kmeans.fit(projected)
y_kmeans = kmeans.predict(projected)
# Get the cluster centroids
centers = kmeans.cluster_centers_
centers

#Create a scatter plot of the first 2 principal components
plt.scatter(projected[:, 0], projected[:, 1],
            c=digits.target, edgecolor='none', alpha=0.5,
            cmap=plt.get_cmap('nipy_spectral', 10))  # 'spectral' was renamed 'nipy_spectral' in newer matplotlib
plt.xlabel('PCA 1')
plt.ylabel('PCA 2')
plt.colorbar();
# Overlay the centroids on the scatter plot
plt.scatter(centers[:, 0], centers[:, 1], c='darkblue', s=100)
plt.savefig('fig3.png', bbox_inches='tight')
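
For intuition about what the K-Means calls above are doing, here is a minimal pure-Python sketch of Lloyd's algorithm (alternating an assignment step and a centroid update step). It is not how scikit-learn or R's kmeans is implemented: those use smarter initialisation and other refinements, whereas this sketch simply takes the starting centroids as given. All names and the sample points are hypothetical.

```python
import math

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Keep the centroid of an empty cluster where it is
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update until stable."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Two obvious groups of points, with hypothetical starting centroids
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 0.0)])
print(centers, labels)
```

The returned `centers` play the same role as `kmeans.cluster_centers_` in the Python code and `k$centers` in the R code.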

1.3a Hierarchical clusters – R code

Hierarchical clustering is another type of unsupervised learning. It successively joins the closest pair of objects (points or clusters) based on some ‘distance’ metric. In this type of clustering we do not have to choose the number of centroids. We can cut the resulting dendrogram at an appropriate height to get a desired and reasonable number of clusters. The following linkage methods define the ‘distance’ used while combining successive objects

  • Ward
  • Complete
  • Single
  • Average
  • Centroid

# Read the IRIS dataset
iris <- datasets::iris
iris2 <- iris[,-5]
species <- iris[,5]

#Compute the distance matrix
d_iris <- dist(iris2) 

# Use the 'average' method to form the clusters
hc_iris <- hclust(d_iris, method = "average")

# Plot the clusters
plot(hc_iris)

# Cut tree into 3 groups
sub_grp <- cutree(hc_iris, k = 3)

# Number of members in each cluster
table(sub_grp)
## sub_grp
##  1  2  3 
## 50 64 36
# Draw rectangles around the clusters
rect.hclust(hc_iris, k = 3, border = 2:5)

1.3b Hierarchical clusters – Python code

from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
# Load the IRIS data set
iris = load_iris()


# Generate the linkage matrix using the average method
Z = linkage(iris.data, 'average')

# Plot the dendrogram
dendrogram(Z)
plt.xlabel('Data')
plt.ylabel('Distance')
plt.suptitle('Samples clustering', fontweight='bold', fontsize=14)
plt.savefig('fig4.png', bbox_inches='tight')
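
The "successively join the closest pair" idea behind hierarchical clustering can be sketched in a few lines of pure Python. This is an illustrative, unoptimised version using average linkage, not the algorithm that hclust or scipy actually run; the function names and sample points are hypothetical. Stopping when a given number of clusters remains corresponds to cutting the dendrogram into that many groups.

```python
import math
from itertools import combinations

def average_linkage(cluster_a, cluster_b):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(math.dist(p, q) for p in cluster_a for q in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def agglomerate(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i].extend(clusters[j])  # i < j, so deleting j leaves i in place
        del clusters[j]
    return clusters

# Hypothetical points: two tight pairs and one outlier
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
groups = agglomerate(points, n_clusters=3)
print(groups)
```

Swapping `average_linkage` for the minimum or maximum pairwise distance would give single or complete linkage respectively.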

Conclusion

This is the last and concluding part of my series on Practical Machine Learning with R and Python. These parallel implementations of R and Python can be used as a quick reference while working on a large project. A person who is adept in one of the languages R or Python, can quickly absorb code in the other language.

Hope you find this series useful!

More interesting things to come. Watch this space!

References

  1. Statistical Learning, Prof Trevor Hastie & Prof Robert Tibshirani, Stanford Online
  2. Applied Machine Learning in Python, Prof Kevyn Collins-Thompson, University of Michigan, Coursera

Also see
1. The many faces of latency
2. Simulating a Web Join in Android
3. The Anamoly
4. yorkr pads up for the Twenty20s:Part 3:Overall team performance against all oppositions
5. Bend it like Bluemix, MongoDB using Auto-scale – Part 1!

To see all posts see ‘Index of posts’

Introducing cricket package yorkr: Part 3-Foxed by flight!


Introduction

He will win, who knows when to fight and when not to fight.

He will win, who knows how to handle both superior and inferior forces

If you know neither the enemy nor yourself, you will succumb in every battle.

Hence the skilful fighter puts himself in a position which makes defeat impossible, and does not miss the moment for defeating the enemy.

Hence that general is skillful in attack whose opponent does not know what to defend; and he is skilled in defense whose opponent does not know what to attack.

                                         The Art of War - Sun Tzu

This post is a continuation of my introduction to my latest cricket package yorkr. This is the 3rd part of the introduction, the 2 earlier ones were

  1. Introducing cricket package yorkr-Part1:Beaten by sheer pace!.
  2. Introducing cricket package yorkr: Part 2-Trapped leg before wicket!

This post deals with Class 3 functions, namely the performances of a team in all matches against all oppositions, e.g. India/Australia/South Africa against all oppositions in all matches. In other words it is the performance of the team against the rest of the world.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen, bowlers besides evaluating team & match performances in Tests , ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr  and Beaten by sheer pace-Cricket analytics with yorkr  A must read for any cricket lover! Check it out!!

This post has also been published at RPubs yorkr-Part3 and can also be downloaded as a PDF document from yorkr-Part3.pdf.

You can clone/fork the code for the package yorkr from Github at yorkr-package

Checkout my interactive Shiny apps GooglyPlus (plots & tables) and Googly (only plots) which can be used to analyze IPL players, teams and matches.

The list of functions in Class 3 are

  1. teamBattingScorecardAllOppnAllMatches()
  2. teamBatsmenPartnershipAllOppnAllMatches()
  3. teamBatsmenPartnershipAllOppnAllMatchesPlot()
  4. teamBatsmenVsBowlersAllOppnAllMatchesRept()
  5. teamBatsmenVsBowlersAllOppnAllMatchesPlot()
  6. teamBowlingScorecardAllOppnAllMatchesMain()
  7. teamBowlersVsBatsmenAllOppnAllMatchesRept()
  8. teamBowlersVsBatsmenAllOppnAllMatchesPlot()
  9. teamBowlingWicketKindAllOppnAllMatches()
  10. teamBowlingWicketRunsAllOppnAllMatches()

Note 1: The yorkr package in its current avatar supports ODI, T20 and IPL T20 matches. 

Note 2: As in the previous parts, the plot functions usually have the plot=TRUE/FALSE parameter. With plot=FALSE the user gets the desired dataframe as the return value. The user can then choose to plot this in any way he/she likes, e.g. in interactive charts using rcharts, ggvis, googleVis, plotly etc.

1. Install the package from CRAN

The yorkr package can be installed directly from CRAN now! Install the yorkr package.

if (!require("yorkr")) {
  install.packages("yorkr") 
  library("yorkr")
}
rm(list=ls())

2. Get data for all matches against all oppositions for a team

We can get all matches against all oppositions for a team/country using the function below. The dir parameter should point to the folder in which the RData files of the individual matches exist. This function creates a data frame of all the matches and also saves the resulting dataframe as RData

setwd("C:/software/cricket-package/york-test/yorkrData/ODI/ODI-team-allmatches-allOppositions")

# Get all matches against all oppositions for India and save as RData
matches <-getAllMatchesAllOpposition("India",dir=".",save=TRUE)
dim(matches)
## [1] 140655     25

3. Save data for all matches against all oppositions

This can be done locally using the function below. This function gets all the matches of the country/team against all other countries/teams, combines them into a single dataframe and saves it in the current folder. The current implementation expects that the RData files of individual matches are in the ../data folder. Since I have already converted this, I will not be running this again

#saveAllMatchesAllOpposition()

4. Load data directly for all matches against all oppositions

As in my earlier posts (yorkr-Part1 & yorkr-Part2) I have already saved the data for all matches of the individual countries against all oppositions. The data for these matches for the individual teams/countries can be downloaded directly from the Github folder at ODI-team-allmatches-allOppositions

Note: The dataframes for all the matches of a country against all oppositions can be loaded directly into your code. As can be seen in the calls below, the dataframes are ~100,000+ rows x 25 columns. While I have 10+ functions to process these dataframes for a particular team, feel free to download these data frames and perform your own analysis. The data frames include ball-by-ball details, details on non-striker, bowler, runs, extras, venue, date etc. Certainly these data frames are a gold-mine of interesting insights. So do go ahead and unleash your bagging/boosting algorithms, SVM classifiers or Random Forest algorithm on them.

I plan to try out some algorithms of statistical/machine learning in the months to come. If you do come up with interesting insights, I would appreciate it if you attribute the source to Cricsheet (http://cricsheet.org), my package yorkr and my blog Giga thoughts, besides dropping me a note.

As in my earlier post I will be directly loading the saved files. For the illustration of the functions, I will use India in all the functions (for obvious reasons) and will randomly use the data from the rest of the top 8 teams.

setwd("C:/software/cricket-package/york-test/yorkrData/ODI/ODI-team-allmatches-allOppositions")
load("allMatchesAllOpposition-India.RData")
ind_matches <- matches
dim(ind_matches)
## [1] 140655     25
load("allMatchesAllOpposition-Australia.RData")
aus_matches <- matches
dim(aus_matches)
## [1] 128148     25
load("allMatchesAllOpposition-New Zealand.RData")
nz_matches <- matches
dim(nz_matches)
## [1] 98573    25
load("allMatchesAllOpposition-Pakistan.RData")
pak_matches <- matches
dim(pak_matches)
## [1] 117947     25
load("allMatchesAllOpposition-England.RData")
eng_matches <- matches
dim(eng_matches)
## [1] 118859     25
load("allMatchesAllOpposition-Sri Lanka.RData")
sl_matches <- matches
dim(sl_matches)
## [1] 125893     25
load("allMatchesAllOpposition-West Indies.RData")
wi_matches <- matches
dim(wi_matches)
## [1] 92716    25
load("allMatchesAllOpposition-South Africa.RData")
sa_matches <- matches
dim(sa_matches)
## [1] 100916     25

5. Team Batting Scorecard (all matches with opposition)

The following function shows the batting scorecard for each country. It returns a dataframe with the top batsmen of each country

#Top ODI performers for India
m <-teamBattingScorecardAllOppnAllMatches(ind_matches,theTeam="India")
## Total= 58079
## Source: local data frame [68 x 5]
## 
##         batsman ballsPlayed fours sixes  runs
##          (fctr)       (int) (int) (int) (dbl)
## 1       V Kohli        7774   663    67  7039
## 2      MS Dhoni        7878   515   129  6885
## 3      SK Raina        5076   429   114  4964
## 4     G Gambhir        5138   472    15  4503
## 5     RG Sharma        5245   372    89  4385
## 6  SR Tendulkar        4708   504    43  4196
## 7  Yuvraj Singh        4472   403    96  3976
## 8      V Sehwag        3106   494    74  3681
## 9      S Dhawan        2956   314    37  2694
## 10    AM Rahane        2490   195    24  2009
## ..          ...         ...   ...   ...   ...
#Top ODI batsmen for Australia
m <-teamBattingScorecardAllOppnAllMatches(aus_matches,theTeam="Australia")
## Total= 54736
## Source: local data frame [70 x 5]
## 
##       batsman ballsPlayed fours sixes  runs
##        (fctr)       (int) (int) (int) (dbl)
## 1   MJ Clarke        7060   440    39  5485
## 2   SR Watson        5435   519   114  5035
## 3  RT Ponting        5301   447    43  4440
## 4  MEK Hussey        4990   286    60  4286
## 5   BJ Haddin        3308   266    69  2858
## 6   DA Warner        2701   264    43  2537
## 7   GJ Bailey        2805   176    43  2392
## 8   SPD Smith        2303   174    19  2082
## 9    CL White        2471   142    44  2018
## 10  ML Hayden        2276   219    37  2002
## ..        ...         ...   ...   ...   ...
#Top ODI batsmen for Pakistan
m <-teamBattingScorecardAllOppnAllMatches(pak_matches,theTeam="Pakistan")
## Total= NA
## Source: local data frame [74 x 5]
## 
##            batsman ballsPlayed fours sixes  runs
##             (fctr)       (int) (int) (int) (dbl)
## 1  Mohammad Hafeez        5714   471    71  4574
## 2      Younis Khan        4561   306    24  3465
## 3    Shahid Afridi        2316   264   132  3125
## 4     Shoaib Malik        3472   240    40  2897
## 5       Umar Akmal        3272   241    47  2843
## 6    Ahmed Shehzad        3386   259    18  2491
## 7  Mohammad Yousuf        2933   191    11  2241
## 8     Kamran Akmal        2533   247    25  2104
## 9      Salman Butt        2037   206     6  1653
## 10   Nasir Jamshed        1862   150    19  1418
## ..             ...         ...   ...   ...   ...
#Top ODI batsmen for New Zealand
m <-teamBattingScorecardAllOppnAllMatches(nz_matches,theTeam="New Zealand")
## Total= 39993
## Source: local data frame [68 x 5]
## 
##          batsman ballsPlayed fours sixes  runs
##           (fctr)       (int) (int) (int) (dbl)
## 1    LRPL Taylor        6153   418   103  5120
## 2    BB McCullum        4321   446   159  4489
## 3     MJ Guptill        5205   462   100  4460
## 4  KS Williamson        4044   325    25  3418
## 5      SB Styris        2324   167    23  1944
## 6     GD Elliott        2274   149    26  1889
## 7       JD Ryder        1232   139    33  1223
## 8       JDP Oram        1174    81    48  1195
## 9     DL Vettori        1238    97     8  1130
## 10      L Ronchi         927   108    32  1070
## ..           ...         ...   ...   ...   ...
#Top ODI batsmen for England
m <-teamBattingScorecardAllOppnAllMatches(eng_matches,theTeam="England")
## Total= 48152
## Source: local data frame [72 x 5]
## 
##           batsman ballsPlayed fours sixes  runs
##            (fctr)       (int) (int) (int) (dbl)
## 1         IR Bell        6401   488    31  5051
## 2      EJG Morgan        4249   323    98  3927
## 3    KP Pietersen        3828   315    44  3231
## 4         AN Cook        4052   360    10  3163
## 5  PD Collingwood        3693   213    48  2992
## 6       IJL Trott        3418   205     3  2653
## 7       RS Bopara        3326   202    32  2624
## 8      AJ Strauss        3062   276    20  2566
## 9         JE Root        2983   200    26  2543
## 10     JC Buttler        1467   155    54  1777
## ..            ...         ...   ...   ...   ...
#Top ODI batsmen for West Indies
m <-teamBattingScorecardAllOppnAllMatches(wi_matches,theTeam="West Indies")
## Total= 34622
## Source: local data frame [65 x 5]
## 
##          batsman ballsPlayed fours sixes  runs
##           (fctr)       (int) (int) (int) (dbl)
## 1       CH Gayle        3839   386   144  3635
## 2     MN Samuels        4057   294    72  3062
## 3  S Chanderpaul        3521   188    28  2469
## 4       DJ Bravo        2804   193    49  2390
## 5       DM Bravo        2916   174    41  2051
## 6      RR Sarwan        2682   172    20  1960
## 7     KA Pollard        2064   127    92  1947
## 8    LMP Simmons        2538   157    46  1863
## 9      DJG Sammy        1799   143    83  1835
## 10      D Ramdin        1817   115    23  1516
## ..           ...         ...   ...   ...   ...
#Top ODI batsmen for Sri Lanka
m <-teamBattingScorecardAllOppnAllMatches(sl_matches,theTeam="Sri Lanka")
## Total= NA
## Source: local data frame [60 x 5]
## 
##             batsman ballsPlayed fours sixes  runs
##              (fctr)       (int) (int) (int) (dbl)
## 1     KC Sangakkara       10449   852    64  8778
## 2        TM Dilshan        8838   914    45  7981
## 3  DPMD Jayawardene        7482   599    43  6260
## 4       WU Tharanga        5690   483    24  4232
## 5        AD Mathews        4383   288    59  3764
## 6     ST Jayasuriya        2266   297    61  2396
## 7   HDRL Thirimanne        3286   192    17  2371
## 8      LD Chandimal        3026   165    27  2308
## 9   KMDN Kulasekara        1406    83    37  1204
## 10      NLTC Perera        1007    90    42  1137
## ..              ...         ...   ...   ...   ...
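
To make it clearer how a scorecard like the ones above can be derived from ball-by-ball data, here is a small Python sketch of the aggregation. It is not yorkr's implementation (yorkr is an R package). The input format is a hypothetical list of (batsman, runs) pairs, and it simplifies by treating every delivery as a legal ball and every 4 or 6 off the bat as a boundary. The output field names mirror the columns shown above.

```python
from collections import defaultdict

def batting_scorecard(deliveries):
    """Aggregate ball-by-ball records into per-batsman totals,
    sorted by runs scored in descending order."""
    totals = defaultdict(lambda: {"ballsPlayed": 0, "fours": 0, "sixes": 0, "runs": 0})
    for batsman, runs in deliveries:
        row = totals[batsman]
        row["ballsPlayed"] += 1
        row["runs"] += runs
        if runs == 4:
            row["fours"] += 1
        elif runs == 6:
            row["sixes"] += 1
    return sorted(totals.items(), key=lambda item: item[1]["runs"], reverse=True)

# Hypothetical deliveries: (batsman, runs off the bat)
deliveries = [("A", 4), ("B", 1), ("A", 6), ("A", 0), ("B", 4), ("B", 2)]
scorecard = batting_scorecard(deliveries)
for batsman, row in scorecard:
    print(batsman, row)
```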

6. Team Batting Scorecard

The following calls show the best batsmen from the opposition ‘theTeam’ in the ‘matches’. For example, when matches=ind_matches and theTeam=“England” the returned dataframe shows the best English batsmen against India

#Top England batsmen against India
m <-teamBattingScorecardAllOppnAllMatches(matches=ind_matches,theTeam="England")
## Total= 7620
## Source: local data frame [43 x 5]
## 
##           batsman ballsPlayed fours sixes  runs
##            (fctr)       (int) (int) (int) (dbl)
## 1         IR Bell        1238   110     9  1085
## 2    KP Pietersen         990    89    10   847
## 3         AN Cook        1049   103     2   822
## 4       RS Bopara         632    42     8   534
## 5  PD Collingwood         450    39     6   397
## 6         OA Shah         394    40     7   385
## 7       IJL Trott         410    33     2   349
## 8         JE Root         408    32     4   336
## 9        SR Patel         336    25    10   329
## 10   C Kieswetter         309    34    13   313
## ..            ...         ...   ...   ...   ...
#Top Australian batsmen against India
m <-teamBattingScorecardAllOppnAllMatches(matches=ind_matches,theTeam="Australia")
## Total= 9995
## Source: local data frame [47 x 5]
## 
##       batsman ballsPlayed fours sixes  runs
##        (fctr)       (int) (int) (int) (dbl)
## 1  RT Ponting        1107    86     8   876
## 2  MEK Hussey         816    56     5   753
## 3   GJ Bailey         578    51    13   614
## 4   SR Watson         653    81    10   609
## 5   MJ Clarke         786    45     5   607
## 6   ML Hayden         660    72     8   573
## 7   A Symonds         543    43    15   536
## 8    AJ Finch         617    52     9   525
## 9   SPD Smith         431    44     7   467
## 10  DA Warner         385    40     6   391
## ..        ...         ...   ...   ...   ...
#Top New Zealand batsmen against Australia
m <-teamBattingScorecardAllOppnAllMatches(aus_matches,theTeam="New Zealand")
## Total= 6106
## Source: local data frame [44 x 5]
## 
##        batsman ballsPlayed fours sixes  runs
##         (fctr)       (int) (int) (int) (dbl)
## 1  LRPL Taylor        1012    71    13   804
## 2  BB McCullum         768    71    25   761
## 3   MJ Guptill         618    50    17   485
## 4    PG Fulton         526    35     9   425
## 5   GD Elliott         469    29     4   405
## 6    SB Styris         415    36     5   369
## 7   DL Vettori         334    24     2   291
## 8    L Vincent         338    27     5   272
## 9  CD McMillan         227    28    10   266
## 10    JDP Oram         181    13     7   193
## ..         ...         ...   ...   ...   ...
#Top Sri Lankan batsmen against West Indies
m <-teamBattingScorecardAllOppnAllMatches(wi_matches,theTeam="Sri Lanka")
## Total= 1851
## Source: local data frame [28 x 5]
## 
##             batsman ballsPlayed fours sixes  runs
##              (fctr)       (int) (int) (int) (dbl)
## 1  DPMD Jayawardene         330    26     2   288
## 2     KC Sangakkara         326    16     2   238
## 3        TM Dilshan         173    18     7   224
## 4       WU Tharanga         349    22    NA   220
## 5        AD Mathews         171    10     3   161
## 6     ST Jayasuriya         146    19     4   160
## 7       ML Udawatte         138     8     1    87
## 8   HDRL Thirimanne         144     6    NA    67
## 9       MDKJ Perera          63     4     2    64
## 10    CK Kapugedera          68     2    NA    57
## ..              ...         ...   ...   ...   ...

7. Team Batting Partnerships

This gives the top batting partnerships in each team in all its matches against all oppositions. The report can either be a ‘summary’ or a ‘detailed’ breakup of the batting partnerships.

# The function returns India's top batsmen ranked by total partnership runs. The default report parameter is "summary"
m <- teamBatsmenPartnershipAllOppnAllMatches(ind_matches,theTeam='India')
m
## Source: local data frame [68 x 2]
## 
##         batsman totalRuns
##          (fctr)     (dbl)
## 1       V Kohli      7039
## 2      MS Dhoni      6885
## 3      SK Raina      4964
## 4     G Gambhir      4503
## 5     RG Sharma      4385
## 6  SR Tendulkar      4196
## 7  Yuvraj Singh      3976
## 8      V Sehwag      3681
## 9      S Dhawan      2694
## 10    AM Rahane      2009
## ..          ...       ...
# When the report parameter is 'detailed' then the detailed break up of the partnership is returned as a data frame
m <- teamBatsmenPartnershipAllOppnAllMatches(ind_matches,theTeam='India',report="detailed")
head(m,30)
##     batsman      nonStriker partnershipRuns totalRuns
## 1   V Kohli        S Dhawan             661      7039
## 2   V Kohli       AM Rahane             502      7039
## 3   V Kohli       RG Sharma            1073      7039
## 4   V Kohli      KD Karthik             139      7039
## 5   V Kohli    SR Tendulkar             278      7039
## 6   V Kohli        R Dravid             132      7039
## 7   V Kohli        V Sehwag             255      7039
## 8   V Kohli    Yuvraj Singh             420      7039
## 9   V Kohli        SK Raina            1072      7039
## 10  V Kohli        MS Dhoni             534      7039
## 11  V Kohli Harbhajan Singh              13      7039
## 12  V Kohli       IK Pathan               1      7039
## 13  V Kohli       G Gambhir             962      7039
## 14  V Kohli      RV Uthappa              10      7039
## 15  V Kohli       RA Jadeja              91      7039
## 16  V Kohli        R Ashwin              71      7039
## 17  V Kohli       AT Rayudu             345      7039
## 18  V Kohli Gurkeerat Singh               1      7039
## 19  V Kohli       YK Pathan              68      7039
## 20  V Kohli       STR Binny               4      7039
## 21  V Kohli       MK Tiwary             111      7039
## 22  V Kohli        AR Patel              39      7039
## 23  V Kohli        PA Patel             180      7039
## 24  V Kohli         M Vijay              33      7039
## 25  V Kohli       KM Jadhav              10      7039
## 26  V Kohli        AM Nayar              25      7039
## 27  V Kohli     S Badrinath               9      7039
## 28 MS Dhoni        S Dhawan              49      6885
## 29 MS Dhoni       AM Rahane              50      6885
## 30 MS Dhoni       RG Sharma             300      6885
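
To make the difference between the 'summary' and 'detailed' reports concrete, here is a minimal Python sketch of the aggregation. It is not the package's implementation: the function name partnership_report, the tuple layout and the sample rows are all hypothetical, invented only to show how per-pair partnership runs roll up into each batsman's totalRuns.

```python
from collections import defaultdict

# Hypothetical ball-by-ball rows: (batting team, striker, non-striker, runs).
# The names and numbers are invented for illustration, not real match data.
deliveries = [
    ("India", "Batsman A", "Batsman B", 4),
    ("India", "Batsman A", "Batsman B", 1),
    ("India", "Batsman B", "Batsman A", 2),
    ("India", "Batsman A", "Batsman C", 6),
    ("England", "Batsman X", "Batsman Y", 3),
]

def partnership_report(deliveries, the_team, report="summary"):
    """Return partnership rows for the_team, best batsman first."""
    pair_runs = defaultdict(int)   # (batsman, non_striker) -> partnership runs
    total_runs = defaultdict(int)  # batsman -> total runs across all partners
    for team, batsman, non_striker, runs in deliveries:
        if team != the_team:
            continue
        pair_runs[(batsman, non_striker)] += runs
        total_runs[batsman] += runs

    # Rank batsmen by their total runs, highest first
    ranked = sorted(total_runs, key=lambda b: total_runs[b], reverse=True)
    if report == "summary":
        return [(b, total_runs[b]) for b in ranked]
    # Detailed: one row per (batsman, non-striker) pair, grouped by batsman rank
    return [
        (b, ns, r, total_runs[b])
        for b in ranked
        for (pb, ns), r in pair_runs.items()
        if pb == b
    ]

summary = partnership_report(deliveries, "India")
detailed = partnership_report(deliveries, "India", report="detailed")
```

Passing a different team name over the same rows, for example partnership_report(deliveries, "England"), returns that team's batsmen instead, which mirrors how choosing another country in theTeam changes the report.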

9. More Team Batting Partnerships

When we use the dataframe ind_matches (India's matches against all oppositions) and choose another country as theTeam, we get that country's top batting partnerships against India.

# Top England batting partnerships against India (report="summary")
m <- teamBatsmenPartnershipAllOppnAllMatches(ind_matches,theTeam='England')
m
## Source: local data frame [43 x 2]
## 
##           batsman totalRuns
##            (fctr)     (dbl)
## 1         IR Bell      1085
## 2    KP Pietersen       847
## 3         AN Cook       822
## 4       RS Bopara       534
## 5  PD Collingwood       397
## 6         OA Shah       385
## 7       IJL Trott       349
## 8         JE Root       336
## 9        SR Patel       329
## 10   C Kieswetter       313
## ..            ...       ...
# Top South Africa batting partnerships against India (report="detailed")
m <- teamBatsmenPartnershipAllOppnAllMatches(ind_matches,theTeam='South Africa', report="detailed")
m[1:30,]
##           batsman       nonStriker partnershipRuns totalRuns
## 1  AB de Villiers       MN van Wyk              30      1179
## 2  AB de Villiers        JH Kallis             207      1179
## 3  AB de Villiers         HH Gibbs              20      1179
## 4  AB de Villiers        JP Duminy             168      1179
## 5  AB de Villiers       MV Boucher              37      1179
## 6  AB de Villiers          JM Kemp               5      1179
## 7  AB de Villiers      AN Petersen               8      1179
## 8  AB de Villiers       WD Parnell              56      1179
## 9  AB de Villiers         DW Steyn               5      1179
## 10 AB de Villiers    CK Langeveldt              19      1179
## 11 AB de Villiers          HM Amla              26      1179
## 12 AB de Villiers         GC Smith             106      1179
## 13 AB de Villiers     F du Plessis             133      1179
## 14 AB de Villiers        Q de Kock             113      1179
## 15 AB de Villiers        DA Miller             103      1179
## 16 AB de Villiers      F Behardien              64      1179
## 17 AB de Villiers        CH Morris              32      1179
## 18 AB de Villiers      AM Phangiso              37      1179
## 19 AB de Villiers       SM Pollock              10      1179
## 20        HM Amla       MN van Wyk              66       704
## 21        HM Amla   AB de Villiers               9       704
## 22        HM Amla        JH Kallis              88       704
## 23        HM Amla         HH Gibbs              10       704
## 24        HM Amla        JP Duminy              79       704
## 25        HM Amla        LE Bosman              43       704
## 26        HM Amla RE van der Merwe              17       704
## 27        HM Amla         GC Smith              92       704
## 28        HM Amla     F du Plessis              45       704
## 29        HM Amla      RJ Peterson               2       704
## 30        HM Amla        Q de Kock             211       704

10. Team Batting partnerships of other countries

# Top Indian batting partnerships against England
m <- teamBatsmenPartnershipAllOppnAllMatches(eng_matches,theTeam='India',report="detailed")
head(m,30)
##     batsman    nonStriker partnershipRuns totalRuns
## 1  MS Dhoni     G Gambhir               6      1083
## 2  MS Dhoni      R Dravid              59      1083
## 3  MS Dhoni     PP Chawla               1      1083
## 4  MS Dhoni        Z Khan               4      1083
## 5  MS Dhoni      RP Singh              26      1083
## 6  MS Dhoni  Yuvraj Singh             157      1083
## 7  MS Dhoni      RR Powar              15      1083
## 8  MS Dhoni    RV Uthappa              29      1083
## 9  MS Dhoni     AM Rahane               1      1083
## 10 MS Dhoni       V Kohli              28      1083
## 11 MS Dhoni      SK Raina             372      1083
## 12 MS Dhoni       P Kumar              42      1083
## 13 MS Dhoni R Vinay Kumar              12      1083
## 14 MS Dhoni      R Ashwin              27      1083
## 15 MS Dhoni     RA Jadeja             238      1083
## 16 MS Dhoni     AT Rayudu              17      1083
## 17 MS Dhoni     STR Binny              41      1083
## 18 MS Dhoni     YK Pathan               8      1083
## 19 SK Raina     G Gambhir              23       918
## 20 SK Raina      R Dravid               1       918
## 21 SK Raina      MS Dhoni             450       918
## 22 SK Raina  Yuvraj Singh              56       918
## 23 SK Raina     AM Rahane              17       918
## 24 SK Raina       V Kohli             144       918
## 25 SK Raina     RG Sharma              58       918
## 26 SK Raina     MK Tiwary              28       918
## 27 SK Raina      R Ashwin              15       918
## 28 SK Raina     RA Jadeja              59       918
## 29 SK Raina     AT Rayudu              61       918
## 30 SK Raina      V Sehwag               6       918
#Top South Africa batting partnerships 
m <- teamBatsmenPartnershipAllOppnAllMatches(sa_matches,theTeam='South Africa', report="detailed")
head(m,30)
##           batsman       nonStriker partnershipRuns totalRuns
## 1  AB de Villiers         GC Smith             957      7693
## 2  AB de Villiers        JH Kallis             897      7693
## 3  AB de Villiers         HH Gibbs             295      7693
## 4  AB de Villiers       MV Boucher             143      7693
## 5  AB de Villiers          JM Kemp               8      7693
## 6  AB de Villiers       SM Pollock              16      7693
## 7  AB de Villiers    CK Langeveldt              19      7693
## 8  AB de Villiers          HM Amla            1437      7693
## 9  AB de Villiers        JP Duminy            1123      7693
## 10 AB de Villiers        JA Morkel             169      7693
## 11 AB de Villiers          J Botha              27      7693
## 12 AB de Villiers        Q de Kock             248      7693
## 13 AB de Villiers     F du Plessis             667      7693
## 14 AB de Villiers        DA Miller             571      7693
## 15 AB de Villiers        R McLaren             120      7693
## 16 AB de Villiers         DW Steyn              32      7693
## 17 AB de Villiers      AM Phangiso              37      7693
## 18 AB de Villiers         M Morkel              21      7693
## 19 AB de Villiers       WD Parnell              83      7693
## 20 AB de Villiers      F Behardien             223      7693
## 21 AB de Villiers     VD Philander              12      7693
## 22 AB de Villiers       RR Rossouw              90      7693
## 23 AB de Villiers      RJ Peterson               5      7693
## 24 AB de Villiers      AN Petersen             132      7693
## 25 AB de Villiers       MN van Wyk              89      7693
## 26 AB de Villiers        CH Morris              32      7693
## 27 AB de Villiers        KJ Abbott              21      7693
## 28 AB de Villiers          D Elgar              54      7693
## 29 AB de Villiers RE van der Merwe               1      7693
## 30 AB de Villiers        CA Ingram             138      7693
#Top Sri Lanka batting partnerships 
m <- teamBatsmenPartnershipAllOppnAllMatches(sl_matches,theTeam='Sri Lanka',report="summary")
m
## Source: local data frame [60 x 2]
## 
##             batsman totalRuns
##              (fctr)     (dbl)
## 1     KC Sangakkara      8778
## 2        TM Dilshan      7981
## 3  DPMD Jayawardene      6260
## 4       WU Tharanga      4232
## 5        AD Mathews      3764
## 6     ST Jayasuriya      2396
## 7   HDRL Thirimanne      2371
## 8      LD Chandimal      2308
## 9   KMDN Kulasekara      1204
## 10      NLTC Perera      1137
## ..              ...       ...
#Top England batting partnerships 
m <- teamBatsmenPartnershipAllOppnAllMatches(eng_matches,theTeam='England',report="summary")
m
## Source: local data frame [72 x 2]
## 
##           batsman totalRuns
##            (fctr)     (dbl)
## 1         IR Bell      5051
## 2      EJG Morgan      3927
## 3    KP Pietersen      3231
## 4         AN Cook      3163
## 5  PD Collingwood      2992
## 6       IJL Trott      2653
## 7       RS Bopara      2624
## 8      AJ Strauss      2566
## 9         JE Root      2543
## 10     JC Buttler      1777
## ..            ...       ...
#Top Australian batting partnerships in West Indian matches
m <- teamBatsmenPartnershipAllOppnAllMatches(wi_matches,theTeam='Australia',report="summary")
m
## Source: local data frame [39 x 2]
## 
##       batsman totalRuns
##        (fctr)     (dbl)
## 1   SR Watson       851
## 2  MEK Hussey       630
## 3  RT Ponting       503
## 4   MJ Clarke       435
## 5   GJ Bailey       341
## 6   A Symonds       252
## 7    SE Marsh       245
## 8   BJ Haddin       220
## 9   DJ Hussey       211
## 10   AC Voges       209
## ..        ...       ...
# Top England batting partnerships in New Zealand matches
m <- teamBatsmenPartnershipAllOppnAllMatches(nz_matches,theTeam='England',report="summary")
m
## Source: local data frame [47 x 2]
## 
##           batsman totalRuns
##            (fctr)     (dbl)
## 1         IR Bell       654
## 2         JE Root       612
## 3  PD Collingwood       514
## 4      EJG Morgan       479
## 5         AN Cook       464
## 6       IJL Trott       362
## 7    KP Pietersen       358
## 8      JC Buttler       287
## 9         OA Shah       274
## 10      RS Bopara       222
## ..            ...       ...

11. Team Batting Partnership plots

Graphical plot of batting partnerships for the countries

# Plot of batting partnerships of India (Virat Kohli and M S Dhoni have the best partnerships)
teamBatsmenPartnershipAllOppnAllMatchesPlot(ind_matches,"India",main="India")

batsmenPartnership1-1

# Plot of batting partnerships of Pakistan
teamBatsmenPartnershipAllOppnAllMatchesPlot(pak_matches,"Pakistan",main="Pakistan")

batsmenPartnership1-2

# Plot of batting partnerships of Australia
teamBatsmenPartnershipAllOppnAllMatchesPlot(aus_matches,"Australia",main="Australia")

batsmenPartnership1-3

12. Top opposition batting partnerships.

This gives the best batting partnerships of a team against a specified country, for example Indian partnerships against Australia or New Zealand partnerships against South Africa.

# Top India partnerships against West Indies
teamBatsmenPartnershipAllOppnAllMatchesPlot(ind_matches,"India",main="West Indies")

batsmenPartnership2-1

# Top Sri Lanka partnerships against India
teamBatsmenPartnershipAllOppnAllMatchesPlot(sl_matches,"Sri Lanka",main="India")

batsmenPartnership2-2

# Top New Zealand partnerships against South Africa
teamBatsmenPartnershipAllOppnAllMatchesPlot(nz_matches,"New Zealand",main="South Africa")

batsmenPartnership2-3

13. Batsmen vs Bowlers

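The function below gives the top performances of batsmen against opposition bowlers. With rank=0 it lists the team's top batsmen by runs scored; with rank=1, 2, ... it returns the runs scored by the batsman at that rank against each bowler.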

# Top batsmen against bowlers when rank=0
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(ind_matches,"India",rank=0)
m
## Source: local data frame [68 x 2]
## 
##         batsman runsScored
##          (fctr)      (dbl)
## 1       V Kohli       7039
## 2      MS Dhoni       6885
## 3      SK Raina       4964
## 4     G Gambhir       4503
## 5     RG Sharma       4385
## 6  SR Tendulkar       4196
## 7  Yuvraj Singh       3976
## 8      V Sehwag       3681
## 9      S Dhawan       2694
## 10    AM Rahane       2009
## ..          ...        ...
# Performance of India batsman with rank=1 against international bowlers and runs scored against bowlers. This is Virat Kohli for India
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(ind_matches,"India",rank=1,dispRows=30)
m
## Source: local data frame [30 x 3]
## Groups: batsman [1]
## 
##    batsman          bowler  runs
##     (fctr)          (fctr) (dbl)
## 1  V Kohli     NLTC Perera   242
## 2  V Kohli KMDN Kulasekara   196
## 3  V Kohli      SL Malinga   175
## 4  V Kohli      AD Mathews   155
## 5  V Kohli      BAW Mendis   132
## 6  V Kohli       R Rampaul   127
## 7  V Kohli     JW Dernbach   121
## 8  V Kohli     JP Faulkner   118
## 9  V Kohli       DJG Sammy   116
## 10 V Kohli    HMRKB Herath   113
## ..     ...             ...   ...
# Performance of India batsman with rank=2 against international bowlers and runs scored against these bowlers. This is M S Dhoni for India
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(ind_matches,"India",rank=2,dispRows=50)
m
## Source: local data frame [50 x 3]
## Groups: batsman [1]
## 
##     batsman         bowler  runs
##      (fctr)         (fctr) (dbl)
## 1  MS Dhoni M Muralitharan   195
## 2  MS Dhoni  ST Jayasuriya   183
## 3  MS Dhoni     SL Malinga   144
## 4  MS Dhoni      SR Watson   135
## 5  MS Dhoni        ST Finn   130
## 6  MS Dhoni     MG Johnson   128
## 7  MS Dhoni    JP Faulkner   125
## 8  MS Dhoni  Shahid Afridi   120
## 9  MS Dhoni     TT Bresnan   111
## 10 MS Dhoni     AD Mathews   111
## ..      ...            ...   ...
# Performance of the England batsman with rank=1 against Indian bowlers and the runs scored against each of them. This returns a data frame of theTeam's batsmen against the bowlers in the 'matches' dataframe passed in. This is IR Bell.
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(matches=ind_matches,theTeam="England",rank=1,dispRows=25)
m
## Source: local data frame [25 x 3]
## Groups: batsman [1]
## 
##    batsman       bowler  runs
##     (fctr)       (fctr) (dbl)
## 1  IR Bell       Z Khan   127
## 2  IR Bell    PP Chawla   111
## 3  IR Bell    RA Jadeja    94
## 4  IR Bell      B Kumar    78
## 5  IR Bell     MM Patel    77
## 6  IR Bell     R Ashwin    71
## 7  IR Bell   AB Agarkar    66
## 8  IR Bell     I Sharma    57
## 9  IR Bell     RP Singh    51
## 10 IR Bell Yuvraj Singh    51
## ..     ...          ...   ...
# All the best Australian batsmen against India in all of Indian matches
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(ind_matches,"Australia",rank=0)
m
## Source: local data frame [47 x 2]
## 
##       batsman runsScored
##        (fctr)      (dbl)
## 1  RT Ponting        876
## 2  MEK Hussey        753
## 3   GJ Bailey        614
## 4   SR Watson        609
## 5   MJ Clarke        607
## 6   ML Hayden        573
## 7   A Symonds        536
## 8    AJ Finch        525
## 9   SPD Smith        467
## 10  DA Warner        391
## ..        ...        ...
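
The rank parameter can be read as follows. This is only a Python sketch under stated assumptions: batsmen_vs_bowlers, its arguments and the sample rows are hypothetical and do not reproduce the package's code. It shows the two modes seen in the outputs above: rank=0 returns the ranked list of batsmen, while rank=N returns the Nth batsman's runs against each bowler, truncated to a number of display rows.

```python
from collections import defaultdict

# Hypothetical rows: (batting team, batsman, bowler, runs). Invented data.
balls = [
    ("India", "Batsman A", "Bowler P", 4),
    ("India", "Batsman A", "Bowler Q", 6),
    ("India", "Batsman A", "Bowler P", 1),
    ("India", "Batsman B", "Bowler P", 2),
    ("India", "Batsman B", "Bowler Q", 1),
    ("England", "Batsman X", "Bowler R", 4),
]

def batsmen_vs_bowlers(balls, the_team, rank=0, disp_rows=50):
    """rank=0: batsmen ranked by runs; rank=N: Nth batsman's runs per bowler."""
    totals = defaultdict(int)  # batsman -> runs scored
    versus = defaultdict(int)  # (batsman, bowler) -> runs scored
    for team, batsman, bowler, runs in balls:
        if team == the_team:
            totals[batsman] += runs
            versus[(batsman, bowler)] += runs

    ranked = sorted(totals, key=totals.get, reverse=True)
    if rank == 0:
        return [(b, totals[b]) for b in ranked]

    chosen = ranked[rank - 1]  # rank is 1-based
    rows = [(b, bowler, r) for (b, bowler), r in versus.items() if b == chosen]
    rows.sort(key=lambda row: row[2], reverse=True)  # most runs first
    return rows[:disp_rows]

top_batsmen = batsmen_vs_bowlers(balls, "India", rank=0)
best_vs_bowlers = batsmen_vs_bowlers(balls, "India", rank=1)
second_vs_bowlers = batsmen_vs_bowlers(balls, "India", rank=2, disp_rows=1)
```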

14. Batsmen vs Bowlers (continued)

# The best India batsman (rank=1) against England and his performance against England bowlers
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(eng_matches,"India",rank=1,dispRows=30)
m
## Source: local data frame [28 x 3]
## Groups: batsman [1]
## 
##     batsman      bowler  runs
##      (fctr)      (fctr) (dbl)
## 1  MS Dhoni     ST Finn   130
## 2  MS Dhoni  TT Bresnan   111
## 3  MS Dhoni    GP Swann   101
## 4  MS Dhoni JW Dernbach    95
## 5  MS Dhoni   SCJ Broad    92
## 6  MS Dhoni JM Anderson    89
## 7  MS Dhoni    SR Patel    83
## 8  MS Dhoni JC Tredwell    40
## 9  MS Dhoni   CR Woakes    38
## 10 MS Dhoni  MS Panesar    37
## ..      ...         ...   ...
# All the top Sri Lanka batsmen (rank=0) against Australia and performances against Australian bowlers
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(aus_matches,"Sri Lanka",rank=0)
m
## Source: local data frame [31 x 2]
## 
##             batsman runsScored
##              (fctr)      (dbl)
## 1     KC Sangakkara        888
## 2  DPMD Jayawardene        846
## 3        TM Dilshan        799
## 4       WU Tharanga        464
## 5      LD Chandimal        413
## 6        AD Mathews        404
## 7   HDRL Thirimanne        290
## 8   KMDN Kulasekara        232
## 9     ST Jayasuriya        117
## 10       SL Malinga         91
## ..              ...        ...
#All the top England batsmen (rank=0) and their performances against South African bowlers
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(sa_matches,"England",rank=0)
m
## Source: local data frame [39 x 2]
## 
##           batsman runsScored
##            (fctr)      (dbl)
## 1       IJL Trott        424
## 2         JE Root        372
## 3         IR Bell        362
## 4      EJG Morgan        335
## 5  PD Collingwood        319
## 6        AD Hales        271
## 7    KP Pietersen        192
## 8      A Flintoff        192
## 9         OA Shah        177
## 10     JC Buttler        154
## ..            ...        ...

15. Batsmen vs Bowlers Plot

The following functions plot the performances of the batsman based on the rank chosen against opposition bowlers. Note: The rank has to be >0

#The following plot displays the performance of the top India batsman (rank=1) against all opposition bowlers. This is Virat Kohli for India

d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(ind_matches,"India",rank=1,dispRows=50)
d
## Source: local data frame [50 x 3]
## Groups: batsman [1]
## 
##    batsman          bowler  runs
##     (fctr)          (fctr) (dbl)
## 1  V Kohli     NLTC Perera   242
## 2  V Kohli KMDN Kulasekara   196
## 3  V Kohli      SL Malinga   175
## 4  V Kohli      AD Mathews   155
## 5  V Kohli      BAW Mendis   132
## 6  V Kohli       R Rampaul   127
## 7  V Kohli     JW Dernbach   121
## 8  V Kohli     JP Faulkner   118
## 9  V Kohli       DJG Sammy   116
## 10 V Kohli    HMRKB Herath   113
## ..     ...             ...   ...
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

batsmenVsBowler1-1

e <- teamBatsmenVsBowlersAllOppnAllMatchesPlot(d,plot=FALSE)
e
## Source: local data frame [50 x 3]
## Groups: batsman [1]
## 
##    batsman          bowler  runs
##     (fctr)          (fctr) (dbl)
## 1  V Kohli     NLTC Perera   242
## 2  V Kohli KMDN Kulasekara   196
## 3  V Kohli      SL Malinga   175
## 4  V Kohli      AD Mathews   155
## 5  V Kohli      BAW Mendis   132
## 6  V Kohli       R Rampaul   127
## 7  V Kohli     JW Dernbach   121
## 8  V Kohli     JP Faulkner   118
## 9  V Kohli       DJG Sammy   116
## 10 V Kohli    HMRKB Herath   113
## ..     ...             ...   ...
# The following plot displays the performance of the batsman (rank=2) against all opposition bowlers. This is M S Dhoni for India
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(ind_matches,"India",rank=2,dispRows=50)
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

batsmenVsBowler1-2

# Best batsman of South Africa against Indian bowlers
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(ind_matches,"South Africa",rank=1,dispRows=30)
d
## Source: local data frame [30 x 3]
## Groups: batsman [1]
## 
##           batsman          bowler  runs
##            (fctr)          (fctr) (dbl)
## 1  AB de Villiers Harbhajan Singh   133
## 2  AB de Villiers         B Kumar    93
## 3  AB de Villiers       RA Jadeja    90
## 4  AB de Villiers        A Mishra    77
## 5  AB de Villiers       MM Sharma    68
## 6  AB de Villiers          Z Khan    65
## 7  AB de Villiers     S Sreesanth    61
## 8  AB de Villiers         A Nehra    58
## 9  AB de Villiers        R Ashwin    55
## 10 AB de Villiers       IK Pathan    45
## ..            ...             ...   ...
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

batsmenVsBowler1-3

# Best batsman of England (rank=1) against Indian bowlers (matches=ind_matches)
d <-teamBatsmenVsBowlersAllOppnAllMatchesRept(matches=ind_matches,"England",rank=1,dispRows=50)
d
## Source: local data frame [28 x 3]
## Groups: batsman [1]
## 
##    batsman       bowler  runs
##     (fctr)       (fctr) (dbl)
## 1  IR Bell       Z Khan   127
## 2  IR Bell    PP Chawla   111
## 3  IR Bell    RA Jadeja    94
## 4  IR Bell      B Kumar    78
## 5  IR Bell     MM Patel    77
## 6  IR Bell     R Ashwin    71
## 7  IR Bell   AB Agarkar    66
## 8  IR Bell     I Sharma    57
## 9  IR Bell     RP Singh    51
## 10 IR Bell Yuvraj Singh    51
## ..     ...          ...   ...
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

batsmenVsBowler1-4

15. Batsmen vs Bowlers Plot (continued)

# Top batsman of South Africa and performance against opposition bowlers of all countries
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(sa_matches,"South Africa",rank=1,dispRows=50)
d
## Source: local data frame [50 x 3]
## Groups: batsman [1]
## 
##           batsman          bowler  runs
##            (fctr)          (fctr) (dbl)
## 1  AB de Villiers   Shahid Afridi   227
## 2  AB de Villiers     Saeed Ajmal   174
## 3  AB de Villiers Mohammad Hafeez   151
## 4  AB de Villiers       JO Holder   138
## 5  AB de Villiers Harbhajan Singh   133
## 6  AB de Villiers      Wahab Riaz   130
## 7  AB de Villiers      MG Johnson   129
## 8  AB de Villiers        P Utseya   128
## 9  AB de Villiers       DJG Sammy   110
## 10 AB de Villiers        DJ Bravo   107
## ..            ...             ...   ...
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

batsmenVsBowler2-1

# Do not display plot but return dataframe
e <- teamBatsmenVsBowlersAllOppnAllMatchesPlot(d,plot=FALSE)
e
## Source: local data frame [50 x 3]
## Groups: batsman [1]
## 
##           batsman          bowler  runs
##            (fctr)          (fctr) (dbl)
## 1  AB de Villiers   Shahid Afridi   227
## 2  AB de Villiers     Saeed Ajmal   174
## 3  AB de Villiers Mohammad Hafeez   151
## 4  AB de Villiers       JO Holder   138
## 5  AB de Villiers Harbhajan Singh   133
## 6  AB de Villiers      Wahab Riaz   130
## 7  AB de Villiers      MG Johnson   129
## 8  AB de Villiers        P Utseya   128
## 9  AB de Villiers       DJG Sammy   110
## 10 AB de Villiers        DJ Bravo   107
## ..            ...             ...   ...
# Top batsman of Sri Lanka against bowlers of all countries
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(sl_matches,"Sri Lanka",rank=1,dispRows=50)
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

batsmenVsBowler2-2

# Best West Indian batsman against English bowlers
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(eng_matches,"West Indies",rank=1,dispRows=50)
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

batsmenVsBowler2-3

16. Team bowling scorecard against all opposition

This function lists the top bowlers of each country in ODI matches. It returns a dataframe of the country's own bowlers when ‘matches’ holds the matches of that country and ‘theTeam’ is the same country, as in the calls below.

teamBowlingScorecardAllOppnAllMatchesMain(matches=ind_matches,theTeam="India")
## Source: local data frame [57 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1        RA Jadeja    43       0  4749     153
## 2         R Ashwin    49       0  4225     146
## 3           Z Khan    47       0  3692     141
## 4  Harbhajan Singh    45       0  4040     123
## 5         I Sharma    51       0  3216     113
## 6         MM Patel    49       1  2400      92
## 7          P Kumar    50       2  2752      84
## 8         UT Yadav    51       0  2442      80
## 9   Mohammed Shami    43       0  1806      80
## 10    Yuvraj Singh    38       0  2588      77
## ..             ...   ...     ...   ...     ...
teamBowlingScorecardAllOppnAllMatchesMain(matches=aus_matches,theTeam="Australia")
## Source: local data frame [54 x 5]
## 
##          bowler overs maidens  runs wickets
##          (fctr) (int)   (int) (dbl)   (dbl)
## 1    MG Johnson    51       0  5635     245
## 2         B Lee    50       0  3400     147
## 3     SR Watson    45      NA    NA     136
## 4    NW Bracken    51       0  2763     114
## 5      CJ McKay    49      NA    NA     103
## 6      MA Starc    48       1  1769      97
## 7   JP Faulkner    44       0  2004      75
## 8      JR Hopes    43       0  2098      69
## 9       SW Tait    50       0  1461      66
## 10 DE Bollinger    51       0  1482      65
## ..          ...   ...     ...   ...     ...
teamBowlingScorecardAllOppnAllMatchesMain(eng_matches,"England")
## Source: local data frame [52 x 5]
## 
##            bowler overs maidens  runs wickets
##            (fctr) (int)   (int) (dbl)   (dbl)
## 1     JM Anderson    51       0  5688     202
## 2       SCJ Broad    51       0  5160     198
## 3      TT Bresnan    51       0  3730     117
## 4         ST Finn    49       0  2839     106
## 5        GP Swann    39       0  2760     106
## 6  PD Collingwood    40       1  2517      77
## 7      A Flintoff    45       0  1260      68
## 8     JC Tredwell    42       0  1614      62
## 9       CR Woakes    47       0  1859      57
## 10      RS Bopara    34       0  1508      42
## ..            ...   ...     ...   ...     ...
teamBowlingScorecardAllOppnAllMatchesMain(pak_matches,"Pakistan")
## Source: local data frame [55 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1    Shahid Afridi    45       0  6674     212
## 2      Saeed Ajmal    44       0  4089     184
## 3         Umar Gul    49       0  4127     151
## 4       Wahab Riaz    50       0  2954     111
## 5  Mohammad Hafeez    51       0  3502     109
## 6   Mohammad Irfan    49       0  2523      86
## 7    Sohail Tanvir    48       1  2534      75
## 8      Junaid Khan    48       1  2056      75
## 9   Iftikhar Anjum    49       2  1674      62
## 10    Shoaib Malik    41       1  2206      59
## ..             ...   ...     ...   ...     ...
teamBowlingScorecardAllOppnAllMatchesMain(sa_matches,"South Africa")
## Source: local data frame [41 x 5]
## 
##           bowler overs maidens  runs wickets
##           (fctr) (int)   (int) (dbl)   (dbl)
## 1       DW Steyn    51       0  4294     179
## 2       M Morkel    51       0  4012     172
## 3    LL Tsotsobe    42       0  2231     100
## 4    Imran Tahir    39       0  2124      93
## 5      R McLaren    41       1  1983      80
## 6      JH Kallis    44       0  2075      77
## 7     WD Parnell    44       0  1957      74
## 8        J Botha    44       0  2311      69
## 9    RJ Peterson    47       1  1872      68
## 10 CK Langeveldt    49       0  1829      65
## ..           ...   ...     ...   ...     ...
teamBowlingScorecardAllOppnAllMatchesMain(nz_matches,"New Zealand")
## Source: local data frame [51 x 5]
## 
##            bowler overs maidens  runs wickets
##            (fctr) (int)   (int) (dbl)   (dbl)
## 1        KD Mills    50       1  3918     160
## 2      DL Vettori    43       1  3767     147
## 3      TG Southee    51       0  3996     134
## 4  MJ McClenaghan    49       0  2252      85
## 5        JDP Oram    46       0  2064      78
## 6     NL McCullum    46       0  2840      67
## 7         SE Bond    37       1  1449      62
## 8        TA Boult    40       3  1324      58
## 9     CJ Anderson    41       0  1297      52
## 10       MJ Henry    41       0  1098      47
## ..            ...   ...     ...   ...     ...
teamBowlingScorecardAllOppnAllMatchesMain(sl_matches,"Sri Lanka")
## Source: local data frame [54 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1       SL Malinga    51       0  7214     281
## 2  KMDN Kulasekara    51       0  5481     179
## 3       BAW Mendis    47       0  2979     135
## 4      NLTC Perera    48       0  3624     129
## 5   M Muralitharan    45       0  2471     114
## 6       AD Mathews    51       0  3394     113
## 7       TM Dilshan    50       0  3049      73
## 8     CRD Fernando    51       1  2067      73
## 9     HMRKB Herath    41       0  2027      71
## 10     MF Maharoof    48       0  1860      70
## ..             ...   ...     ...   ...     ...
teamBowlingScorecardAllOppnAllMatchesMain(wi_matches,"West Indies")
## Source: local data frame [45 x 5]
## 
##        bowler overs maidens  runs wickets
##        (fctr) (int)   (int) (dbl)   (dbl)
## 1    DJ Bravo    51       0  4239     153
## 2   JE Taylor    50       0  2530     103
## 3   R Rampaul    46       1  2608     102
## 4   KAJ Roach    49       0  2500      98
## 5   SP Narine    47       0  1924      82
## 6   DJG Sammy    51       1  3584      79
## 7  AD Russell    48       0  1987      63
## 8    CH Gayle    38       0  1955      53
## 9   JO Holder    44       0  1542      50
## 10 MN Samuels    38       0  2209      48
## ..        ...   ...     ...   ...     ...

17. Team bowling scorecard against all opposition (continued)

The function lists the top bowlers of a country (‘matches’) against the opposition country

# Best Indian bowlers in matches against Australia
teamBowlingScorecardAllOppnAllMatches(ind_matches,'Australia')
## Source: local data frame [36 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1         I Sharma    44       1   739      26
## 2  Harbhajan Singh    40       0   926      25
## 3        IK Pathan    42       1   702      22
## 4         UT Yadav    37       2   606      18
## 5      S Sreesanth    34       0   454      18
## 6        RA Jadeja    39       0   867      16
## 7           Z Khan    33       1   500      15
## 8         R Ashwin    43       0   684      14
## 9          P Kumar    27       0   501      14
## 10   R Vinay Kumar    31       1   380      14
## ..             ...   ...     ...   ...     ...
# Best Australian bowlers in matches against India
teamBowlingScorecardAllOppnAllMatches(aus_matches,'India')
## Source: local data frame [39 x 5]
## 
##         bowler overs maidens  runs wickets
##         (fctr) (int)   (int) (dbl)   (dbl)
## 1   MG Johnson    47       0  1020      44
## 2        B Lee    41       3   671      28
## 3    SR Watson    36       1   532      18
## 4     CJ McKay    37       1   403      18
## 5      GB Hogg    33       0   427      17
## 6  JP Faulkner    26       0   598      16
## 7     JR Hopes    31       0   346      14
## 8   NW Bracken    35       1   429      13
## 9  JW Hastings    27       2   259      13
## 10    MA Starc    26       0   251      13
## ..         ...   ...     ...   ...     ...
# Best New Zealand bowlers in matches against England
teamBowlingScorecardAllOppnAllMatches(nz_matches,'England')
## Source: local data frame [33 x 5]
## 
##            bowler overs maidens  runs wickets
##            (fctr) (int)   (int) (dbl)   (dbl)
## 1      TG Southee    39       2   684      33
## 2      DL Vettori    27       1   561      28
## 3        KD Mills    27       0   742      24
## 4  MJ McClenaghan    25       1   515      20
## 5    JEC Franklin    23       0   418      12
## 6         SE Bond    16       0   205      12
## 7      GD Elliott    10       3   194      12
## 8       SB Styris     8       0   296       9
## 9     NL McCullum    24       0   425       7
## 10     MJ Santner    18       0   230       7
## ..            ...   ...     ...   ...     ...
# Best Sri Lankan bowlers in matches against West Indies
teamBowlingScorecardAllOppnAllMatches(sl_matches,"West Indies")
## Source: local data frame [24 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1       SL Malinga    28       1   280      14
## 2       BAW Mendis    15       0   267       9
## 3  KMDN Kulasekara    13       1   185       8
## 4       AD Mathews    14       0   191       7
## 5   M Muralitharan    20       1   157       6
## 6      MF Maharoof     9       2    14       6
## 7       WPUJC Vaas     7       2    82       5
## 8       RAS Lakmal     7       0    55       5
## 9     HMRKB Herath    10       1   124       4
## 10   ST Jayasuriya     1       0    38       4
## ..             ...   ...     ...   ...     ...

18. Team Bowlers versus Batsmen (against all oppositions)

The functions below give the performance of bowlers versus batsmen. They give the best bowlers, the total runs they conceded, and the batsmen against whom those runs were conceded.
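How the rank argument behaves can be sketched in base R. The data frame below is a small hypothetical example (its rows and column names are assumptions, not yorkr's internal format). Going by the comments in this post, rank=0 corresponds to a summary of total runs conceded per bowler, and rank=k to the per-batsman breakdown for the k-th bowler in that summary.

```r
# Hypothetical data: runs conceded by each bowler to each batsman
deliveries <- data.frame(
  bowler  = c("Bowler A", "Bowler A", "Bowler B", "Bowler B", "Bowler C"),
  batsman = c("Batsman X", "Batsman Y", "Batsman X", "Batsman Z", "Batsman Y"),
  runs    = c(40, 25, 30, 10, 20),
  stringsAsFactors = FALSE
)

# rank = 0: total runs conceded by each bowler, highest first
totals <- aggregate(runs ~ bowler, data = deliveries, FUN = sum)
totals <- totals[order(-totals$runs), ]

# rank = k (k > 0): runs conceded by the k-th bowler against each batsman
bowlerBreakdown <- function(k) {
  chosen <- totals$bowler[k]
  rows <- deliveries[deliveries$bowler == chosen, ]
  rows[order(-rows$runs), c("bowler", "batsman", "runs")]
}

print(totals)
print(bowlerBreakdown(1))
```

This is only an illustration of the idea; the yorkr functions work on the match data frames loaded earlier.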

# Best bowlers overall from India against all opposition (rank=0)
teamBowlersVsBatsmenAllOppnAllMatchesMain(ind_matches,theTeam="India",rank=0)
## Source: local data frame [10 x 2]
## 
##             bowler  runs
##             (fctr) (dbl)
## 1        RA Jadeja  4691
## 2         R Ashwin  4111
## 3  Harbhajan Singh  3858
## 4           Z Khan  3514
## 5         I Sharma  3100
## 6          P Kumar  2646
## 7     Yuvraj Singh  2542
## 8        IK Pathan  2359
## 9         UT Yadav  2343
## 10        MM Patel  2314
# Top ODI bowler of India and runs conceded against different opposition batsmen (rank=1)
m <-teamBowlersVsBatsmenAllOppnAllMatchesMain(ind_matches,theTeam="India",rank=1)
m
## Source: local data frame [207 x 3]
## Groups: bowler [1]
## 
##       bowler          batsman runsConceded
##       (fctr)           (fctr)        (dbl)
## 1  RA Jadeja    KC Sangakkara          172
## 2  RA Jadeja DPMD Jayawardene          117
## 3  RA Jadeja       TM Dilshan          108
## 4  RA Jadeja     LD Chandimal          103
## 5  RA Jadeja        GJ Bailey           99
## 6  RA Jadeja      LRPL Taylor           95
## 7  RA Jadeja          IR Bell           94
## 8  RA Jadeja    KS Williamson           92
## 9  RA Jadeja   AB de Villiers           90
## 10 RA Jadeja        SR Watson           85
## ..       ...              ...          ...
# Top ODI bowler of India and runs conceded against different opposition batsmen (rank=2)
m <-teamBowlersVsBatsmenAllOppnAllMatchesMain(ind_matches,theTeam="India",rank=2)
m
## Source: local data frame [177 x 3]
## Groups: bowler [1]
## 
##      bowler          batsman runsConceded
##      (fctr)           (fctr)        (dbl)
## 1  R Ashwin        GJ Bailey          132
## 2  R Ashwin    KC Sangakkara          117
## 3  R Ashwin          AN Cook          115
## 4  R Ashwin    KS Williamson          114
## 5  R Ashwin         DM Bravo          111
## 6  R Ashwin       AD Mathews          100
## 7  R Ashwin     LD Chandimal           98
## 8  R Ashwin      LRPL Taylor           93
## 9  R Ashwin DPMD Jayawardene           93
## 10 R Ashwin     KP Pietersen           81
## ..      ...              ...          ...

18. Team Bowlers versus Batsmen (against all oppositions continued)

# Top bowlers versus batsmen of South Africa(rank=0)
teamBowlersVsBatsmenAllOppnAllMatchesMain(sa_matches,theTeam="South Africa",rank=0)
## Source: local data frame [10 x 2]
## 
##         bowler  runs
##         (fctr) (dbl)
## 1     DW Steyn  4116
## 2     M Morkel  3808
## 3      J Botha  2244
## 4  LL Tsotsobe  2147
## 5    JP Duminy  2111
## 6  Imran Tahir  2087
## 7    JH Kallis  2014
## 8   WD Parnell  1864
## 9    R McLaren  1863
## 10 RJ Peterson  1842
# Top bowlers versus batsmen of Pakistan(rank=0)
teamBowlersVsBatsmenAllOppnAllMatchesMain(pak_matches,theTeam="Pakistan",rank=0)
## Source: local data frame [10 x 2]
## 
##             bowler  runs
##             (fctr) (dbl)
## 1    Shahid Afridi  6444
## 2      Saeed Ajmal  3956
## 3         Umar Gul  3901
## 4  Mohammad Hafeez  3434
## 5       Wahab Riaz  2755
## 6   Mohammad Irfan  2399
## 7    Sohail Tanvir  2337
## 8     Shoaib Malik  2105
## 9      Junaid Khan  1974
## 10  Iftikhar Anjum  1626
# Top bowler of Sri Lanka and runs conceded against different opposition batsmen (rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesMain(sl_matches,theTeam="Sri Lanka",rank=1)
## Source: local data frame [314 x 3]
## Groups: bowler [1]
## 
##        bowler         batsman runsConceded
##        (fctr)          (fctr)        (dbl)
## 1  SL Malinga Mohammad Hafeez          191
## 2  SL Malinga         V Kohli          175
## 3  SL Malinga       G Gambhir          170
## 4  SL Malinga        MS Dhoni          144
## 5  SL Malinga      Umar Akmal          142
## 6  SL Malinga        V Sehwag          140
## 7  SL Malinga         IR Bell          134
## 8  SL Malinga    SR Tendulkar          133
## 9  SL Malinga   Ahmed Shehzad          121
## 10 SL Malinga         AN Cook          120
## ..        ...             ...          ...
m <-teamBowlersVsBatsmenAllOppnAllMatchesMain(ind_matches,theTeam="India",rank=2)
m
## Source: local data frame [177 x 3]
## Groups: bowler [1]
## 
##      bowler          batsman runsConceded
##      (fctr)           (fctr)        (dbl)
## 1  R Ashwin        GJ Bailey          132
## 2  R Ashwin    KC Sangakkara          117
## 3  R Ashwin          AN Cook          115
## 4  R Ashwin    KS Williamson          114
## 5  R Ashwin         DM Bravo          111
## 6  R Ashwin       AD Mathews          100
## 7  R Ashwin     LD Chandimal           98
## 8  R Ashwin      LRPL Taylor           93
## 9  R Ashwin DPMD Jayawardene           93
## 10 R Ashwin     KP Pietersen           81
## ..      ...              ...          ...

19. Team bowlers versus batsmen report (all oppositions)

#Top bowlers of other countries against India
teamBowlersVsBatsmenAllOppnAllMatchesRept(matches=ind_matches,theTeam="India",rank=0)
## Source: local data frame [10 x 2]
## 
##             bowler  runs
##             (fctr) (dbl)
## 1  KMDN Kulasekara  1448
## 2       SL Malinga  1319
## 3      NLTC Perera   959
## 4      JM Anderson   954
## 5       MG Johnson   939
## 6        SCJ Broad   877
## 7       BAW Mendis   783
## 8       AD Mathews   776
## 9          ST Finn   751
## 10      TT Bresnan   741
# Best performer against India is KMDN Kulasekara of Sri Lanka in ODIs
a <- teamBowlersVsBatsmenAllOppnAllMatchesRept(ind_matches,theTeam="India",rank=1)
a
## Source: local data frame [31 x 3]
## Groups: bowler [1]
## 
##             bowler      batsman runsConceded
##             (fctr)       (fctr)        (dbl)
## 1  KMDN Kulasekara     V Sehwag          199
## 2  KMDN Kulasekara      V Kohli          196
## 3  KMDN Kulasekara    G Gambhir          157
## 4  KMDN Kulasekara SR Tendulkar          127
## 5  KMDN Kulasekara Yuvraj Singh          118
## 6  KMDN Kulasekara    RG Sharma          114
## 7  KMDN Kulasekara     SK Raina          104
## 8  KMDN Kulasekara     MS Dhoni           80
## 9  KMDN Kulasekara   KD Karthik           56
## 10 KMDN Kulasekara   SC Ganguly           51
## ..             ...          ...          ...

20. Team bowlers versus batsmen report (all oppositions continued)

#Top Indian bowlers against Sri Lanka 
teamBowlersVsBatsmenAllOppnAllMatchesRept(matches=ind_matches,theTeam="Sri Lanka",rank=0)
## Source: local data frame [10 x 2]
## 
##             bowler  runs
##             (fctr) (dbl)
## 1           Z Khan  1141
## 2        RA Jadeja   882
## 3         I Sharma   855
## 4  Harbhajan Singh   805
## 5          P Kumar   758
## 6         R Ashwin   740
## 7        IK Pathan   678
## 8          A Nehra   584
## 9         UT Yadav   544
## 10        MM Patel   488
#Top Indian bowlers against England
teamBowlersVsBatsmenAllOppnAllMatchesRept(ind_matches,"England",rank=0)
## Source: local data frame [10 x 2]
## 
##          bowler  runs
##          (fctr) (dbl)
## 1      R Ashwin   777
## 2     RA Jadeja   735
## 3        Z Khan   507
## 4      MM Patel   463
## 5      RP Singh   410
## 6      I Sharma   396
## 7     PP Chawla   375
## 8  Yuvraj Singh   370
## 9       B Kumar   353
## 10   AB Agarkar   336

21. Team bowlers versus batsmen report (all oppositions continued-1)

#Top ODI opposition bowlers against New Zealand
teamBowlersVsBatsmenAllOppnAllMatchesRept(nz_matches,theTeam="New Zealand",rank=0)
## Source: local data frame [10 x 2]
## 
##             bowler  runs
##             (fctr) (dbl)
## 1      JM Anderson   889
## 2       MG Johnson   828
## 3    Shahid Afridi   751
## 4  KMDN Kulasekara   728
## 5        SCJ Broad   638
## 6       NW Bracken   626
## 7       SL Malinga   601
## 8         DW Steyn   556
## 9          ST Finn   482
## 10       SR Watson   468
# Top ODI opposition bowlers against Australia
teamBowlersVsBatsmenAllOppnAllMatchesRept(aus_matches,"Australia",rank=0)
## Source: local data frame [10 x 2]
## 
##             bowler  runs
##             (fctr) (dbl)
## 1      JM Anderson  1211
## 2       TT Bresnan  1087
## 3       SL Malinga  1078
## 4        SCJ Broad   948
## 5  Harbhajan Singh   890
## 6       DL Vettori   883
## 7  KMDN Kulasekara   875
## 8         DW Steyn   872
## 9        RA Jadeja   853
## 10        DJ Bravo   830
# Top ODI bowlers against Sri Lanka
teamBowlersVsBatsmenAllOppnAllMatchesRept(sl_matches,"Sri Lanka",rank=0)
## Source: local data frame [10 x 2]
## 
##             bowler  runs
##             (fctr) (dbl)
## 1    Shahid Afridi  1177
## 2           Z Khan  1141
## 3        RA Jadeja   882
## 4         I Sharma   855
## 5      Saeed Ajmal   814
## 6  Harbhajan Singh   805
## 7  Mohammad Hafeez   774
## 8          P Kumar   758
## 9         R Ashwin   740
## 10        Umar Gul   718

22. Team bowlers versus batsmen report (all oppositions) plot

This function can only be used for rank>0 (rank=1,2,3..)

# Top ODI bowler against India (KMDN Kulasekara)
df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(ind_matches,theTeam="India",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"India","India")

bowlerVsbatsmen1-1

# Top ODI Indian bowler versus England (R Ashwin)
df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(ind_matches,theTeam="England",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"India","England")

bowlerVsbatsmen1-2

#Top ODI Indian bowler against West Indies (RA Jadeja)
df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(ind_matches,theTeam="West Indies",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"India","West Indies")

bowlerVsbatsmen1-3

23. Team bowlers versus batsmen plot (all oppositions)

#Top bowler against South Africa (Shahid Afridi)
df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(sa_matches,theTeam="South Africa",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"South Africa","South Africa")

bowlerVsbatsmen2-1

# Top  bowler versus Pakistan (SL Malinga)
df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(pak_matches,theTeam="Pakistan",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"Pakistan","Pakistan")

bowlerVsbatsmen2-2

24. Team Bowler Wicket Kind

# Top opposition bowlers against India and the kind of wickets
teamBowlingWicketKindAllOppnAllMatches(ind_matches,t1="India",t2="All")

bowlingWicketkind1-1

# Get the data frame. Do not plot
m <-teamBowlingWicketKindAllOppnAllMatches(ind_matches,t1="India",t2="All",plot=FALSE)
m
## Source: local data frame [34 x 3]
## Groups: bowler [?]
## 
##         bowler        wicketKind     m
##         (fctr)             (chr) (int)
## 1   MG Johnson            bowled     8
## 2   MG Johnson            caught    27
## 3   MG Johnson caught and bowled     1
## 4   MG Johnson               lbw     6
## 5   MG Johnson           run out     2
## 6  JM Anderson            bowled     4
## 7  JM Anderson            caught    25
## 8  JM Anderson               lbw     1
## 9  JM Anderson           run out     3
## 10     ST Finn            bowled    10
## ..         ...               ...   ...
# Best Indian bowlers against South Africa
teamBowlingWicketKindAllOppnAllMatches(ind_matches,t1="India",t2="South Africa")

bowlingWicketkind1-2

# Best Indian bowlers against Pakistan
teamBowlingWicketKindAllOppnAllMatches(ind_matches,t1="India",t2="Pakistan")

bowlingWicketkind1-3

25. Team Bowler Wicket Kind (continued)

# Best ODI opposition bowlers against  England
teamBowlingWicketKindAllOppnAllMatches(eng_matches,t1="England",t2="All")

bowlingWicketkind2-1

# Best ODI opposition bowlers against Australia
teamBowlingWicketKindAllOppnAllMatches(aus_matches,t1="Australia",t2="All")

bowlingWicketkind2-2

# Best bowlers against Sri Lanka
teamBowlingWicketKindAllOppnAllMatches(sl_matches,t1="Sri Lanka",t2="All")

bowlingWicketkind2-3

26. Team Bowler Wicket Runs

# Opposition bowlers against India and runs conceded
teamBowlingWicketRunsAllOppnAllMatches(ind_matches,t1="India",t2="All",plot=TRUE)

bowlingWicketRuns1-1

# Opposition bowlers against India and runs conceded returned as dataframe
m <-teamBowlingWicketRunsAllOppnAllMatches(ind_matches,t1="India",t2="All",plot=FALSE)
m
## Source: local data frame [10 x 3]
## 
##             bowler runsConceded wickets
##             (fctr)        (dbl)   (dbl)
## 1       MG Johnson         1020      44
## 2  KMDN Kulasekara         1492      40
## 3         DW Steyn          714      34
## 4       BAW Mendis          810      34
## 5      JM Anderson          991      33
## 6       SL Malinga         1402      33
## 7       AD Mathews          800      31
## 8          ST Finn          775      30
## 9      NLTC Perera          983      30
## 10       SCJ Broad          903      29
# Top Indian bowlers and runs conceded
teamBowlingWicketRunsAllOppnAllMatches(ind_matches,t1="India",t2="Australia",plot=TRUE)

bowlingWicketRuns1-2
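One simple measure that can be derived from the data frame returned with plot=FALSE is runs conceded per wicket. The sketch below re-enters the first five rows printed above by hand (the returned data frame m could be used instead) and sorts them by that ratio. The runsPerWicket column is an illustrative addition, not something the yorkr function returns.

```r
# Values copied from the data frame printed above (opposition bowlers against India)
bowling <- data.frame(
  bowler       = c("MG Johnson", "KMDN Kulasekara", "DW Steyn", "BAW Mendis", "JM Anderson"),
  runsConceded = c(1020, 1492, 714, 810, 991),
  wickets      = c(44, 40, 34, 34, 33),
  stringsAsFactors = FALSE
)

# Runs conceded per wicket taken (lower is better for the bowler)
bowling$runsPerWicket <- round(bowling$runsConceded / bowling$wickets, 2)

# Sort so that the most economical wicket-taker comes first
print(bowling[order(bowling$runsPerWicket), ])
```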

27. Team Bowler Wicket Runs (continued)

#Top opposition bowlers against Pakistan
teamBowlingWicketRunsAllOppnAllMatches(pak_matches,t1="Pakistan",t2="All",plot=TRUE)

bowlingWicketRuns2-1

#Top opposition bowlers against West Indies
teamBowlingWicketRunsAllOppnAllMatches(wi_matches,t1="West Indies",t2="All",plot=TRUE)

bowlingWicketRuns2-2

#Top opposition bowlers against Sri Lanka
teamBowlingWicketRunsAllOppnAllMatches(sl_matches,t1="Sri Lanka",t2="All",plot=TRUE)

bowlingWicketRuns2-3

#Top opposition bowlers against New Zealand
teamBowlingWicketRunsAllOppnAllMatches(nz_matches,t1="New Zealand",t2="All",plot=TRUE)

bowlingWicketRuns2-4

Conclusion

This post included all functions for a team in all matches against all oppositions. As before, the data frames are already available. You can load the data and begin to use them. If more insights from the data frames are possible, do go ahead. But please do attribute the source to Cricsheet (http://cricsheet.org), my package yorkr and my blog. Do give the functions a spin for yourself.

I will be coming up with the last part to my introduction to cricket package yorkr soon.

Watch this space!

You may also like

  1. Introducing cricketr! : An R package to analyze performances of cricketers
  2. Cricket analytics with cricketr
  3. Literacy in India: A deepR dive
  4. Simulating an Edge shape in Android
  5. Re-working the Lucy Richardson algorithm in OpenCV
  6. Design principles of scalable distributed systems
  7. TWS-4: Gossip protocol: Epidemics and rumors to the rescue

cricketr adapts to the Twenty20 International!


Introduction

This should be last in the series of posts based on my R package cricketr. That is, unless some bright idea comes trotting along and light bulbs go on around my head.

In this post cricketr adapts to the Twenty20 International format. Now cricketr can handle stats from all 3 formats of the game, namely Test matches, ODIs and Twenty20 Internationals from ESPN Cricinfo. You should be able to install the package from GitHub and use many of the functions available in the package.

Please be mindful of the ESPN Cricinfo Terms of Use


Important note: Do check out the python avatar of cricketr, ‘cricpy’, in my post ‘Introducing cricpy: A python package to analyze performances of cricketers’

You can also read this post at Rpubs as twenty20-cricketr. Download this report as a PDF file from twenty20-cricketr.pdf

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricketr template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. Only a familiarity with R and R Markdown is needed.

I have chosen the Top 4 batsmen and top 4 bowlers based on ICC rankings and/or number of matches played.

Batsmen

  1. Virat Kohli (Ind)
  2. Faf du Plessis (SA)
  3. A J Finch (Aus)
  4. Brendon McCullum (NZ)

Bowlers

  1. Samuel Badree (WI)
  2. Sunil Narine (WI)
  3. Ravichander Ashwin (Ind)
  4. Ajantha Mendis (SL)

I have explained the plots and added my own observations. Please feel free to draw your conclusions!

The package can be installed directly from CRAN

# Install from CRAN into the default library if the package is not already present
if (!require("cricketr")){ 
    install.packages("cricketr") 
} 
library(cricketr)

or from Github

library(devtools)
install_github("tvganesh/cricketr")
library(cricketr)

The data for a particular player can be obtained with the getPlayerDataTT() function. To do this you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Virat Kohli, Sunil Narine etc. This will bring up a page which has the profile number for the player, e.g. for Virat Kohli this would be http://www.espncricinfo.com/india/content/player/253802.html. Hence, Kohli's profile number is 253802. This can be used to get the Twenty20 data for Virat Kohli as shown below

kohli <- getPlayerDataTT(253802,dir="..",file="kohli.csv",type="batting")
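The profile number can also be pulled out of the player URL programmatically. The sketch below uses only base R and assumes the URL follows the pattern shown above (digits between "/player/" and ".html"); the variable names are illustrative.

```r
# Profile URL as given in the text
url <- "http://www.espncricinfo.com/india/content/player/253802.html"

# Capture the digits between "/player/" and ".html"
profileNo <- as.integer(sub(".*/player/([0-9]+)\\.html$", "\\1", url))
print(profileNo)
```

The resulting number is what is passed as the first argument to getPlayerDataTT().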

The analysis is included below

Analyses of Batsmen

The following plots give the analysis of the 4 Twenty20 batsmen

  1. Virat Kohli (Ind) – Innings-26, Runs-972, Average-46.28,Strike Rate-131.70
  2. Faf du Plessis (SA) – Innings-24, Runs-805, Average-42.36,Strike Rate-135.75
  3. A J Finch (Aus) – Innings-22, Runs-756, Average-39.78,Strike Rate-152.41
  4. Brendon McCullum (NZ) – Innings-70, Runs-2140, Average-35.66,Strike Rate-136.21

Plot of 4s, 6s and the scoring rate in Twenty20s

The 3 charts below give the number of

  1. 4s vs Runs scored
  2. 6s vs Runs scored
  3. Balls faced vs Runs scored

A regression line is fitted in each of these plots for each of the Twenty20 batsmen.

A. Virat Kohli
– The 1st plot shows that Kohli hits about five 4s on his way to a score in the 50s
– The 2nd plot, a box plot of the number of 6s against runs, shows the range of runs when Kohli hit 1, 2 or 4 6s. The dark line in the box shows the average runs when he hit that number of 6s. So when he hit one 6, the average runs he scored was 45
– The 3rd plot shows the number of runs scored against the balls faced. It can be seen that when Kohli faced 50 balls he had scored around 70 runs
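The numbers behind a box plot of runs against 6s can be sketched with a grouped summary in base R. The innings below are hypothetical, not Kohli's record. Note that base R's boxplot() draws its thick line at the median, so both the median and the mean are computed here for comparison.

```r
# Hypothetical innings: runs scored and number of 6s hit in each innings
innings <- data.frame(
  Runs  = c(12, 30, 45, 52, 38, 61, 70, 25, 78, 90),
  Sixes = c(0, 1, 1, 1, 0, 2, 2, 0, 4, 4)
)

# Median and mean runs for each count of 6s
medianRuns <- tapply(innings$Runs, innings$Sixes, median)
meanRuns   <- tapply(innings$Runs, innings$Sixes, mean)

print(rbind(median = medianRuns, mean = meanRuns))
```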

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./kohli.csv","Kohli")
batsman6s("./kohli.csv","Kohli")
batsmanScoringRateODTT("./kohli.csv","Kohli")

kohli-4s6sSR-1

dev.off()
## null device 
##           1

B. Faf du Plessis

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./plessis.csv","Du Plessis")
batsman6s("./plessis.csv","Du Plessis")
batsmanScoringRateODTT("./plessis.csv","Du Plessis")

plessis-4s6SR-1

dev.off()
## null device 
##           1

C. A J Finch

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./finch.csv","A J Finch")
batsman6s("./finch.csv","A J Finch")
batsmanScoringRateODTT("./finch.csv","A J Finch")

finch-4s6sSR-1

dev.off()
## null device 
##           1

D. Brendon McCullum

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./mccullum.csv","McCullum")
batsman6s("./mccullum.csv","McCullum")
batsmanScoringRateODTT("./mccullum.csv","McCullum")

mccullum-4s6sout-1

dev.off()
## null device 
##           1

Relative Mean Strike Rate

This plot shows the Mean Strike Rate of the batsmen in each run range. It can be seen that A J Finch has the best strike rate, followed by B McCullum.

par(mar=c(4,4,2,2))
frames <- list("./kohli.csv","./plessis.csv","finch.csv","mccullum.csv")
names <- list("Kohli","Du Plessis","Finch","McCullum")
relativeBatsmanSRODTT(frames,names)

plot-1-1

Relative Runs Frequency Percentage

The plot below provides the average runs scored in each run range 0-5, 5-10, 10-15 etc. Clearly Kohli has the most runs scored in most of the run ranges. This is also evident in the fact that Kohli has the highest average. He is followed by McCullum.

frames <- list("./kohli.csv","./plessis.csv","finch.csv","mccullum.csv")
names <- list("Kohli","Du Plessis","Finch","McCullum")
relativeRunsFreqPerfODTT(frames,names)

plot-2-1

Percent 4’s,6’s in total runs scored

The plot below shows the percentage of runs scored by way of 4s and 6s for each batsman. Kohli has the highest percentage of 4s (27.78%), and McCullum has the highest percentage of 6s (15.69%). Finch has the highest combined percentage of 4s & 6s – 25.37 + 15.64 = 41.01%

frames <- list("./kohli.csv","./plessis.csv","finch.csv","mccullum.csv")
names <- list("Kohli","Du Plessis","Finch","McCullum")
runs4s6s <-batsman4s6s(frames,names)

plot-46s-1

print(runs4s6s)
##                Kohli Du Plessis Finch McCullum
## Runs(1s,2s,3s) 64.29      64.55 58.99    61.45
## 4s             27.78      24.38 25.37    22.87
## 6s              7.94      11.07 15.64    15.69
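The leaders in each row of this table can be read off programmatically. The sketch below re-enters the printed percentages by hand as a matrix (when running the analysis, the runs4s6s object returned by batsman4s6s() could presumably be used instead; that it can be indexed this way is an assumption) and finds the batsman with the highest share of 4s, of 6s, and of both combined.

```r
# Percentages copied from the printed table; the matrix is filled column by column
runs4s6s <- matrix(
  c(64.29, 27.78,  7.94,
    64.55, 24.38, 11.07,
    58.99, 25.37, 15.64,
    61.45, 22.87, 15.69),
  nrow = 3,
  dimnames = list(c("Runs(1s,2s,3s)", "4s", "6s"),
                  c("Kohli", "Du Plessis", "Finch", "McCullum"))
)

# Batsman with the highest percentage in each boundary row
top4s <- names(which.max(runs4s6s["4s", ]))
top6s <- names(which.max(runs4s6s["6s", ]))

# Combined percentage of runs from 4s and 6s
boundaries <- colSums(runs4s6s[c("4s", "6s"), ])
topBoundaries <- names(which.max(boundaries))

print(c(top4s = top4s, top6s = top6s, topBoundaries = topBoundaries))
print(boundaries)
```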

3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is then fitted based on the Balls Faced and Minutes at Crease to give the runs scored

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./kohli.csv","Kohli")
battingPerf3d("./plessis.csv","Du Plessis")

plot-3-1

dev.off()
## null device 
##           1
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./finch.csv","A J Finch")
battingPerf3d("./mccullum.csv","McCullum")

plot-4-1

dev.off()
## null device 
##           1

Predicting Runs given Balls Faced and Minutes at Crease

A hypothetical Balls faced and Minutes at Crease is used to predict the runs scored by each batsman based on the computed prediction plane

BF <- seq( 5, 70,length=10)
Mins <- seq(5,70,length=10)
newDF <- data.frame(BF,Mins)

kohli <- batsmanRunsPredict("./kohli.csv","Kohli",newdataframe=newDF)
plessis <- batsmanRunsPredict("./plessis.csv","Du Plessis",newdataframe=newDF)
finch <- batsmanRunsPredict("./finch.csv","A J Finch",newdataframe=newDF)
mccullum <- batsmanRunsPredict("./mccullum.csv","McCullum",newdataframe=newDF)

The predicted runs are displayed below. As can be seen, Finch has the best overall strike rate, followed by McCullum.

batsmen <-cbind(round(kohli$Runs),round(plessis$Runs),round(finch$Runs),round(mccullum$Runs))
colnames(batsmen) <- c("Kohli","Du Plessis","Finch","McCullum")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Kohli Du Plessis Finch McCullum
## 1           5            5     2          1     5        3
## 2          12           12    12         10    22       16
## 3          19           19    22         19    40       28
## 4          27           27    31         28    57       41
## 5          34           34    41         37    74       54
## 6          41           41    51         47    91       66
## 7          48           48    60         56   108       79
## 8          56           56    70         65   125       91
## 9          63           63    79         74   142      104
## 10         70           70    89         84   159      117
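The idea of a prediction plane can be sketched in base R as a linear model with two predictors. This is an illustration of the general technique on hypothetical innings data, not a claim about how batsmanRunsPredict() is implemented inside cricketr.

```r
# Hypothetical innings data (not taken from any player's record)
innings <- data.frame(
  BF   = c(5, 12, 20, 28, 35, 47, 50, 61),
  Mins = c(8, 15, 22, 35, 40, 58, 63, 80),
  Runs = c(4, 15, 27, 38, 50, 66, 70, 85)
)

# Fit a plane: Runs as a linear function of balls faced and minutes at crease
fit <- lm(Runs ~ BF + Mins, data = innings)

# The same hypothetical grid of inputs as used in the post
newDF <- data.frame(BF   = seq(5, 70, length.out = 10),
                    Mins = seq(5, 70, length.out = 10))

# Predicted runs at each grid point, rounded to whole runs
predicted <- round(predict(fit, newdata = newDF))
print(cbind(round(newDF), Runs = predicted))
```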

Highest runs likelihood

The plots below show the runs likelihood of each batsman. This uses K-Means. Kohli has the highest likelihood of scoring runs: he is 34.62% likely to score 66 runs. Du Plessis has a 25% likelihood of scoring 53 runs.

A. Virat Kohli

batsmanRunsLikelihood("./kohli.csv","Kohli")

kohli-lh-1

## Summary of  Kohli 's runs scoring likelihood
## **************************************************
## 
## There is a 23.08 % likelihood that Kohli  will make  10 Runs in  10 balls over 13  Minutes 
## There is a 42.31 % likelihood that Kohli  will make  29 Runs in  23 balls over  30  Minutes 
## There is a 34.62 % likelihood that Kohli  will make  66 Runs in  47 balls over 63  Minutes

B. Faf Du Plessis

batsmanRunsLikelihood("./plessis.csv","Du Plessis")

plessis-l-1

## Summary of  Du Plessis 's runs scoring likelihood
## **************************************************
## 
## There is a 62.5 % likelihood that Du Plessis  will make  14 Runs in  11 balls over 19  Minutes 
## There is a 25 % likelihood that Du Plessis  will make  53 Runs in  40 balls over  50  Minutes 
## There is a 12.5 % likelihood that Du Plessis  will make  94 Runs in  61 balls over 90  Minutes

C. A J Finch

batsmanRunsLikelihood("./finch.csv","A J Finch")

finch-lh,cache-TRUE-1

## Summary of  A J Finch 's runs scoring likelihood
## **************************************************
## 
## There is a 20 % likelihood that A J Finch  will make  95 Runs in  54 balls over 70  Minutes 
## There is a 25 % likelihood that A J Finch  will make  42 Runs in  27 balls over  35  Minutes 
## There is a 55 % likelihood that A J Finch  will make  8 Runs in  8 balls over 12  Minutes

D. Brendon McCullum

batsmanRunsLikelihood("./mccullum.csv","McCullum")

mccullum-1

## Summary of  McCullum 's runs scoring likelihood
## **************************************************
## 
## There is a 50.72 % likelihood that McCullum  will make  11 Runs in  10 balls over 13  Minutes 
## There is a 28.99 % likelihood that McCullum  will make  36 Runs in  27 balls over  37  Minutes 
## There is a 20.29 % likelihood that McCullum  will make  74 Runs in  48 balls over 70  Minutes
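The percentages in these summaries are consistent with cluster sizes divided by the number of innings (for Kohli's 26 innings, 6/26, 11/26 and 9/26 give 23.08%, 42.31% and 34.62%). A minimal sketch of that idea with base R's kmeans() on hypothetical innings follows; whether cricketr computes it exactly this way is an assumption.

```r
set.seed(42)  # k-means starts from random centres, so fix the seed

# Hypothetical innings: runs, balls faced and minutes at crease
innings <- data.frame(
  Runs = c(5, 8, 12, 10, 28, 33, 30, 25, 60, 70, 66, 75),
  BF   = c(6, 9, 11, 10, 22, 26, 24, 20, 45, 50, 47, 52),
  Mins = c(9, 12, 15, 13, 30, 34, 31, 27, 60, 68, 63, 70)
)

# Group the innings into 3 clusters
fit <- kmeans(innings, centers = 3, nstart = 10)

# Share of innings in each cluster, read as a likelihood in percent
likelihood <- round(100 * fit$size / nrow(innings), 2)

# Cluster centres give the typical runs, balls and minutes for each group
summaryDF <- data.frame(round(fit$centers), likelihood = likelihood)
print(summaryDF)
```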

Moving Average of runs over career

The moving average for the 4 batsmen indicates the following. It must be noted that there is not sufficient data yet on Twenty20 Internationals. Kohli, Du Plessis and Finch have played only 22 to 26 innings each, while McCullum has close to 70. So the moving average, while an indication, will regress towards the mean over time.

  1. The moving average of Kohli and Du Plessis is on the way up.
  2. McCullum has a consistent performance while Finch had a brief burst in 2013-2014

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./kohli.csv","Kohli")
batsmanMovingAverage("./plessis.csv","Du Plessis")
batsmanMovingAverage("./finch.csv","A J Finch")
batsmanMovingAverage("./mccullum.csv","McCullum")

sdgm-ma-1

dev.off()
## null device 
##           1
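For readers unfamiliar with the term, a simple trailing moving average can be computed in base R as below. The runs are hypothetical, and the window of 3 innings is an arbitrary choice; batsmanMovingAverage() may well use a different smoother.

```r
# Hypothetical runs in successive innings
runs <- c(10, 20, 30, 40, 50, 15, 65)
window <- 3

# Trailing average of the current and previous (window - 1) innings;
# the first (window - 1) values are NA because the window is incomplete
movAvg <- as.numeric(stats::filter(runs, rep(1 / window, window), sides = 1))
print(movAvg)
```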

Analysis of bowlers

  1. Samuel Badree (WI) – Innings-22, Runs -464, Wickets – 31, Econ Rate : 5.39
  2. Sunil Narine (WI)- Innings-31,Runs-666, Wickets – 38 , Econ Rate : 5.70
  3. Ravichander Ashwin (Ind)- Innings-26, Runs- 732, Wickets – 25, Econ Rate : 7.32
  4. Ajantha Mendis (SL)- Innings-39, Runs – 952,Wickets – 66, Econ Rate : 6.45

The plot shows the frequency with which the bowlers have taken 1, 2, 3 etc. wickets. The most wickets taken in an innings is by Ajantha Mendis (6 wickets)

Wicket Frequency percentage

This plot gives the percentage of innings in which each number of wickets (1, 2, 3 etc.) was taken

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./badree.csv","Badree")
bowlerWktsFreqPercent("./mendis.csv","Mendis")
bowlerWktsFreqPercent("./narine.csv","Narine")
bowlerWktsFreqPercent("./ashwin.csv","Ashwin")

relBowlFP-1

dev.off()
## null device 
##           1
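The quantity being plotted can be sketched with a frequency table in base R. The wickets per innings below are hypothetical.

```r
# Hypothetical wickets taken in successive innings
wickets <- c(0, 1, 1, 2, 0, 3, 1, 2, 1, 0)

# Percentage of innings in which each number of wickets was taken
freqPercent <- round(100 * prop.table(table(wickets)), 1)
print(freqPercent)
```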

Wickets Runs plot

The plot below gives a boxplot of the runs ranges for each of the wickets taken by the bowlers. The ends of the box indicate the 25th and 75th percentiles of runs conceded for the wickets taken, and the dark black line is the average runs conceded.

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsRunsPlot("./badree.csv","Badree")
bowlerWktsRunsPlot("./mendis.csv","Mendis")
bowlerWktsRunsPlot("./narine.csv","Narine")
bowlerWktsRunsPlot("./ashwin.csv","Ashwin")

wktsrun-1

dev.off()
## null device 
##           1

The plot below shows the average number of deliveries needed by the bowler to take the wickets (1, 2, 3 etc.)

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktRateTT("./badree.csv","Badree")
bowlerWktRateTT("./mendis.csv","Mendis")

wktsrate1-1

dev.off()
## null device 
##           1
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktRateTT("./narine.csv","Narine")
bowlerWktRateTT("./ashwin.csv","Ashwin")

wktsrate2-1

dev.off()
## null device 
##           1

Relative bowling performance

The plot below shows that Narine has the most wickets in the 2-4 range, followed by Mendis

frames <- list("./badree.csv","./mendis.csv","narine.csv","ashwin.csv")
names <- list("Badree","Mendis","Narine","Ashwin")
relativeBowlingPerf(frames,names)

relBowlPerf-1

Relative Economy Rate against wickets taken

The economy rate can be read off the plot below as follows. Narine has a good economy rate around 1 & 4 wickets, Ashwin around 2 wickets and Badree around 3 wickets

frames <- list("./badree.csv","./mendis.csv","narine.csv","ashwin.csv")
names <- list("Badree","Mendis","Narine","Ashwin")
relativeBowlingERODTT(frames,names)

relBowlER-1

Relative Wicket Rate

The relative wicket rate plots the mean number of deliveries needed to take 1, 2, 3 and 4 wickets. For example, Narine needed an average of 22 deliveries to take 1 wicket and 22.5, 23.2 and 24 deliveries to take 2, 3 & 4 wickets respectively

frames <- list("./badree.csv","./mendis.csv","narine.csv","ashwin.csv")
names <- list("Badree","Mendis","Narine","Ashwin")
relativeWktRateTT(frames,names)

relBowlWktRate-1
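To make the idea of a wicket rate concrete, here is a minimal sketch in base R. The data frame bowling and its columns Balls and Wkts are made up for illustration; the way relativeWktRateTT() actually reads and aggregates the Cricinfo data may well differ.

```r
# Made-up innings records: deliveries bowled and wickets taken (not real figures)
bowling <- data.frame(
  Balls = c(24, 24, 18, 24, 24, 24, 12, 24),
  Wkts  = c(1, 2, 1, 3, 2, 1, 1, 4)
)

# Mean number of deliveries bowled in innings with 1, 2, 3, ... wickets
wktRate <- tapply(bowling$Balls, bowling$Wkts, mean)
print(wktRate)
```

Each entry of wktRate is the average number of deliveries in the innings where that many wickets fell, which is the quantity plotted against wickets above.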

Moving average of wickets over career

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./badree.csv","Badree")
bowlerMovingAverage("./mendis.csv","Mendis")
bowlerMovingAverage("./narine.csv","Narine")
bowlerMovingAverage("./ashwin.csv","Ashwin")
dev.off()
## null device 
##           1

jsba-bowlma-1

Key findings

Here are some key conclusions

Twenty 20 batsmen

  1. Kohli has a very consistent performance, scoring high runs in the different run ranges. Kohli also has a 34.2% likelihood to score 6 runs. He is followed by McCullum for consistent performance
  2. Finch has the best strike rate followed by McCullum.
  3. Du Plessis has the highest percentage of 4s and McCullum has the highest percentage of 6s. Finch is superior in the percentage of runs scored in 4s and 6s
  4. For a hypothetical balls faced and minutes at crease, Finch does best followed by McCullum
  5. Kohli’s & Du Plessis’ Twenty20 careers are on an upswing. Can they maintain the momentum? McCullum is consistent

Twenty20 bowlers

  1. Narine has the highest wickets percentage for different wickets taken followed by Mendis
  2. Mendis has taken 1,2,3,4,6 wickets in 24 deliveries
  3. Narine has the lowest economy rate for 1 & 4 wickets, Ashwin for 2 wickets and Badree for 3 wickets. Mendis is comparatively expensive
  4. Narine needed the least deliveries to get 1 (22.5) & 2 (23.2) wickets, Mendis needed 20.5 deliveries and Ashwin 19 deliveries for 4 wickets

Key takeaways

If all the above batsmen and bowlers were in the same team we would expect

  1. Finch would be most useful when the run rate has to be greatly accelerated followed by McCullum
  2. If the need is to consolidate, then Kohli is the best man for the job followed by McCullum
  3. Overall McCullum is the best bet for Twenty20
  4. When it comes to bowling Narine wins hands down as he has the most wickets, a good economy rate and a very good attack rate. So Narine is a great bet for providing a vital breakthrough.

Also see my other posts in R

  1. Introducing cricketr! : An R package to analyze performances of cricketers
  2. cricketr plays the ODIs!
  3. A peek into literacy in India: Statistical Learning with R
  4. A crime map of India in R – Crimes against women
  5. Analyzing cricket’s batting legends – Through the mirage with R
  6. Mirror, mirror . the best batsman of them all?

You may also like

  1. A closer look at “Robot Horse on a Trot” in Android
  2. What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
  3. Bend it like Bluemix, MongoDB with autoscaling – Part 2
  4. Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
  5. TWS-4: Gossip protocol: Epidemics and rumors to the rescue
  6. Deblurring with OpenCV:Weiner filter reloaded
  7. Architecting a cloud based IP Multimedia System (IMS)

cricketr plays the ODIs!


Published in R bloggers: cricketr plays the ODIs

Introduction

In this post my package ‘cricketr’ takes a swing at One Day Internationals (ODIs). Like Test batsmen who adapt to ODIs with some innovative strokes, the cricketr package has some additional functions and some modified functions to handle the high strike and economy rates in ODIs. As before I have chosen my top 4 ODI batsmen and top 4 ODI bowlers.




Important note: Do check out the python avatar of cricketr, ‘cricpy’, in my post ‘Introducing cricpy:A python package to analyze performances of cricketers’

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

You can also read this post at Rpubs as odi-cricketr. Download this report as a PDF file from odi-cricketr.pdf

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricketr template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. Only a familiarity with R and R Markdown is needed.

Batsmen

  1. Virender Sehwag (Ind)
  2. AB Devilliers (SA)
  3. Chris Gayle (WI)
  4. Glenn Maxwell (Aus)

Bowlers

  1. Mitchell Johnson (Aus)
  2. Lasith Malinga (SL)
  3. Dale Steyn (SA)
  4. Tim Southee (NZ)

I have sprinkled the plots with a few of my comments. Feel free to draw your conclusions! The analysis is included below

The profile for Virender Sehwag is 35263. This can be used to get the ODI data for Sehwag. For a batsman the type should be “batting” and for a bowler the type should be “bowling” and the function is getPlayerDataOD()

The package can be installed directly from CRAN

# Install cricketr into the default library if it is not already available
if (!require("cricketr")){ 
    install.packages("cricketr") 
} 
library(cricketr)

or from Github

library(devtools)
install_github("tvganesh/cricketr")
library(cricketr)

The One Day data for a particular player can be obtained with the getPlayerDataOD() function. To do this, go to ESPN CricInfo Player and type in the name of the player, e.g. Virender Sehwag. This will bring up a page which has the profile number for the player; for Virender Sehwag this would be http://www.espncricinfo.com/india/content/player/35263.html. Hence, Sehwag’s profile is 35263. This can be used to get the data for Virender Sehwag as shown below

sehwag <- getPlayerDataOD(35263,dir=".",file="sehwag.csv",type="batting")

Analyses of Batsmen

The following plots give the analysis of the 4 ODI batsmen

  1. Virender Sehwag (Ind) – Innings – 245, Runs – 8586, Average – 35.05, Strike Rate – 104.33
  2. AB Devilliers (SA) – Innings – 179, Runs – 7941, Average – 53.65, Strike Rate – 99.12
  3. Chris Gayle (WI) – Innings – 264, Runs – 9221, Average – 37.65, Strike Rate – 85.11
  4. Glenn Maxwell (Aus) – Innings – 45, Runs – 1367, Average – 35.02, Strike Rate – 126.69

Plot of 4s, 6s and the scoring rate in ODIs

The 3 charts below give the number of

  1. 4s vs Runs scored
  2. 6s vs Runs scored
  3. Balls faced vs Runs scored

A regression line is fitted in each of these plots for each of the ODI batsmen

A. Virender Sehwag

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./sehwag.csv","Sehwag")
batsman6s("./sehwag.csv","Sehwag")
batsmanScoringRateODTT("./sehwag.csv","Sehwag")

sehwag-4s6sSR-1

dev.off()
## null device 
##           1

B. AB Devilliers

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./devilliers.csv","Devillier")
batsman6s("./devilliers.csv","Devillier")
batsmanScoringRateODTT("./devilliers.csv","Devillier")

devillier-4s6SR-1

dev.off()
## null device 
##           1

C. Chris Gayle

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./gayle.csv","Gayle")
batsman6s("./gayle.csv","Gayle")
batsmanScoringRateODTT("./gayle.csv","Gayle")

gayle-4s6sSR-1

dev.off()
## null device 
##           1

D. Glenn Maxwell

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./maxwell.csv","Maxwell")
batsman6s("./maxwell.csv","Maxwell")
batsmanScoringRateODTT("./maxwell.csv","Maxwell")

maxwell-4s6sout-1

dev.off()
## null device 
##           1
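For readers curious about the regression lines in these charts, here is a minimal sketch on made-up data. The data frame innings is hypothetical, and a straight-line fit with lm() is my assumption for illustration; batsman4s() may fit a different curve internally.

```r
# Made-up innings data: runs scored and number of 4s hit (not real figures)
innings <- data.frame(
  Runs  = c(5, 12, 30, 45, 60, 78, 102, 120),
  Fours = c(0, 1, 4, 5, 8, 9, 13, 15)
)

# Fit a straight line through the points (an assumption; the package may fit a curve)
fit <- lm(Fours ~ Runs, data = innings)
print(coef(fit))

# Scatter plot with the fitted line overlaid
plot(innings$Runs, innings$Fours, xlab = "Runs", ylab = "4s",
     main = "4s vs Runs (illustrative)")
abline(fit, col = "blue", lwd = 2)
```

The slope of the fitted line gives a rough idea of how many extra 4s come with each additional run scored.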

Relative Mean Strike Rate

In this first plot I plot the Mean Strike Rate of the batsmen. It can be seen that Maxwell has an awesome strike rate in ODIs. However we need to keep in mind that Maxwell has played far fewer innings (only 45). He is followed by Sehwag (245 innings), who also has an excellent strike rate till 100 runs, and then we have Devilliers who roars ahead. This is also seen in the overall strike rates listed above

par(mar=c(4,4,2,2))
frames <- list("./sehwag.csv","./devilliers.csv","gayle.csv","maxwell.csv")
names <- list("Sehwag","Devilliers","Gayle","Maxwell")
relativeBatsmanSRODTT(frames,names)

plot-1-1
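The mean strike rate per runs range can be sketched as below. The data frame batting, the 20-run buckets and the column names are all assumptions made for illustration; relativeBatsmanSRODTT() may bucket the runs differently.

```r
# Made-up innings: runs scored and balls faced (not real figures)
batting <- data.frame(
  Runs = c(4, 15, 23, 37, 48, 52, 66, 71, 85, 104),
  BF   = c(6, 14, 20, 30, 45, 50, 60, 62, 70, 90)
)

# Strike rate of each innings = runs per 100 balls faced
batting$SR <- 100 * batting$Runs / batting$BF

# Group the innings into 20-run buckets (the bucket width is an arbitrary choice)
batting$Range <- cut(batting$Runs, breaks = seq(0, 120, by = 20))

# Mean strike rate within each bucket
meanSR <- tapply(batting$SR, batting$Range, mean)
print(round(meanSR, 2))
```

Plotting meanSR for several batsmen on the same axes gives a relative strike rate chart of the kind shown above.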

Relative Runs Frequency Percentage

Sehwag leads in the percentage of runs in the 10-run ranges up to 50 runs. Maxwell and Devilliers lead in 55-66 & 66-85 respectively.

frames <- list("./sehwag.csv","./devilliers.csv","gayle.csv","maxwell.csv")
names <- list("Sehwag","Devilliers","Gayle","Maxwell")
relativeRunsFreqPerfODTT(frames,names)

plot-2-1

Percentage of 4s,6s in the runs scored

The plot below shows the percentage of runs made by the batsmen by way of 1s, 2s, 3s, 4s and 6s. It can be seen that Sehwag has the highest percentage of 4s (33.36%) in his overall runs in ODIs. Maxwell has the highest percentage of 6s (13.36%) in his ODI career. If we take the overall 4s+6s then Sehwag leads with (33.36 + 5.95 = 39.31%), followed by Gayle (27.80 + 10.15 = 37.95%)


frames <- list("./sehwag.csv","./devilliers.csv","gayle.csv","maxwell.csv")
names <- list("Sehwag","Devilliers","Gayle","Maxwell")
runs4s6s <-batsman4s6s(frames,names)

plot-46s-1

print(runs4s6s)
##                Sehwag Devilliers Gayle Maxwell
## Runs(1s,2s,3s)  60.69      67.39 62.05   62.11
## 4s              33.36      24.28 27.80   24.53
## 6s               5.95       8.32 10.15   13.36
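The three rows of such a table add up to 100% for each batsman, since every run comes either from a 4, a 6 or from running between the wickets. A minimal sketch of the arithmetic, on made-up figures (the data frame innings and its columns are hypothetical, not the format batsman4s6s() reads):

```r
# Made-up innings: total runs, number of 4s and number of 6s (not real figures)
innings <- data.frame(
  Runs  = c(40, 75, 10, 110),
  Fours = c(5, 8, 1, 12),
  Sixes = c(1, 3, 0, 4)
)

totalRuns <- sum(innings$Runs)
runsIn4s  <- 4 * sum(innings$Fours)   # each 4 contributes 4 runs
runsIn6s  <- 6 * sum(innings$Sixes)   # each 6 contributes 6 runs

# Percentage of career runs from running, from 4s and from 6s
pct <- c(
  "Runs(1s,2s,3s)" = 100 * (totalRuns - runsIn4s - runsIn6s) / totalRuns,
  "4s"             = 100 * runsIn4s / totalRuns,
  "6s"             = 100 * runsIn6s / totalRuns
)
print(round(pct, 2))
```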
 

Runs forecast

The forecast for the batsman is shown below.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./sehwag.csv","Sehwag")
batsmanPerfForecast("./devilliers.csv","Devilliers")
batsmanPerfForecast("./gayle.csv","Gayle")
batsmanPerfForecast("./maxwell.csv","Maxwell")

swcr-perf-1

dev.off()
## null device 
##           1

3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./sehwag.csv","V Sehwag")
battingPerf3d("./devilliers.csv","AB Devilliers")

plot-3-1

dev.off()
## null device 
##           1
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./gayle.csv","C Gayle")
battingPerf3d("./maxwell.csv","G Maxwell")

plot-4-1

dev.off()
## null device 
##           1

Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

BF <- seq( 10, 200,length=10)
Mins <- seq(30,220,length=10)
newDF <- data.frame(BF,Mins)

sehwag <- batsmanRunsPredict("./sehwag.csv","Sehwag",newdataframe=newDF)
devilliers <- batsmanRunsPredict("./devilliers.csv","Devilliers",newdataframe=newDF)
gayle <- batsmanRunsPredict("./gayle.csv","Gayle",newdataframe=newDF)
maxwell <- batsmanRunsPredict("./maxwell.csv","Maxwell",newdataframe=newDF)

The fitted model is then used to predict the runs that the batsmen will score for a hypothetical Balls Faced and Minutes at crease. It can be seen that Maxwell sets a searing pace in the predicted runs for a given Balls Faced and Minutes at crease, followed by Sehwag. But we have to keep in mind that Maxwell has only around 1/5th of the innings of Sehwag (45 to Sehwag’s 245 innings). They are followed by Devilliers and then finally Gayle

batsmen <-cbind(round(sehwag$Runs),round(devilliers$Runs),round(gayle$Runs),round(maxwell$Runs))
colnames(batsmen) <- c("Sehwag","Devilliers","Gayle","Maxwell")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Sehwag Devilliers Gayle Maxwell
## 1          10           30     11         12    11      18
## 2          31           51     33         32    28      43
## 3          52           72     55         52    46      67
## 4          73           93     77         71    63      92
## 5          94          114    100         91    81     117
## 6         116          136    122        111    98     141
## 7         137          157    144        130   116     166
## 8         158          178    167        150   133     191
## 9         179          199    189        170   151     215
## 10        200          220    211        190   168     240
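The idea behind this kind of prediction can be sketched with lm() and predict() in base R. Everything below runs on simulated data; the data frame batting and the coefficients used to simulate it are made up, and batsmanRunsPredict() may differ in its details.

```r
set.seed(42)

# Simulated innings: balls faced, minutes at crease and runs scored (made-up data)
BF   <- round(runif(50, 5, 150))
Mins <- pmax(1, round(BF * 1.4 + rnorm(50, 0, 5)))
Runs <- pmax(0, round(0.9 * BF + 0.05 * Mins + rnorm(50, 0, 8)))
batting <- data.frame(Runs, BF, Mins)

# Fit the regression plane Runs ~ BF + Mins
fit <- lm(Runs ~ BF + Mins, data = batting)

# Hypothetical Balls Faced and Minutes at crease, as in the post
newDF <- data.frame(BF   = seq(10, 200, length.out = 10),
                    Mins = seq(30, 220, length.out = 10))

# Predicted runs for each hypothetical (BF, Mins) pair
predicted <- predict(fit, newdata = newDF)
print(cbind(round(newDF), Runs = round(predicted)))
```

Note that Balls Faced and Minutes at crease are strongly correlated, so the individual coefficients are not very meaningful on their own; it is the predictions along the plane that are of interest.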

Highest runs likelihood

The plots below show the runs likelihood of the batsmen. This uses K-Means. It can be seen that Devilliers has almost a 30% likelihood to make around 90 runs. Gayle and Sehwag have roughly a one-in-three likelihood to make 45+ runs.

A. Virender Sehwag

batsmanRunsLikelihood("./sehwag.csv","Sehwag")

smith-1

## Summary of  Sehwag 's runs scoring likelihood
## **************************************************
## 
## There is a 35.22 % likelihood that Sehwag  will make  46 Runs in  44 balls over 67  Minutes 
## There is a 9.43 % likelihood that Sehwag  will make  119 Runs in  106 balls over  158  Minutes 
## There is a 55.35 % likelihood that Sehwag  will make  12 Runs in  13 balls over 18  Minutes

B. AB Devilliers

batsmanRunsLikelihood("./devilliers.csv","Devilliers")

warner-1

## Summary of  Devilliers 's runs scoring likelihood
## **************************************************
## 
## There is a 30.65 % likelihood that Devilliers  will make  44 Runs in  43 balls over 60  Minutes 
## There is a 29.84 % likelihood that Devilliers  will make  91 Runs in  88 balls over  124  Minutes 
## There is a 39.52 % likelihood that Devilliers  will make  11 Runs in  15 balls over 21  Minutes

C. Chris Gayle

batsmanRunsLikelihood("./gayle.csv","Gayle")

cook,cache-TRUE-1

## Summary of  Gayle 's runs scoring likelihood
## **************************************************
## 
## There is a 32.69 % likelihood that Gayle  will make  47 Runs in  51 balls over 72  Minutes 
## There is a 54.49 % likelihood that Gayle  will make  10 Runs in  15 balls over  20  Minutes 
## There is a 12.82 % likelihood that Gayle  will make  109 Runs in  119 balls over 172  Minutes

D. Glenn Maxwell

batsmanRunsLikelihood("./maxwell.csv","Maxwell")

oot-1

## Summary of  Maxwell 's runs scoring likelihood
## **************************************************
## 
## There is a 34.38 % likelihood that Maxwell  will make  39 Runs in  29 balls over 35  Minutes 
## There is a 15.62 % likelihood that Maxwell  will make  89 Runs in  55 balls over  69  Minutes 
## There is a 50 % likelihood that Maxwell  will make  6 Runs in  7 balls over 9  Minutes
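A summary of this kind can be approximated with kmeans() from base R. The sketch below is on simulated data and assumes that the "likelihood" is simply the share of innings falling into each of 3 clusters; that is my reading of the output above, not a confirmed description of batsmanRunsLikelihood().

```r
set.seed(1)

# Simulated innings in three loose groups: low, medium and high scores (made-up data)
Runs <- c(rpois(30, 10), rpois(20, 45), rpois(10, 100))
BF   <- Runs + rpois(60, 3)
Mins <- round(BF * 1.4)
batting <- data.frame(Runs, BF, Mins)

# Cluster the innings into 3 groups on Runs, Balls Faced and Minutes
km <- kmeans(batting, centers = 3, nstart = 10)

# Share of innings in each cluster, reported as a percentage
likelihood <- round(100 * km$size / nrow(batting), 2)

# One row per cluster: typical Runs, BF, Mins and the share of innings
likelihoodSummary <- data.frame(round(km$centers), Likelihood = likelihood)
print(likelihoodSummary)
```

Since K-Means starts from random centres, the exact percentages can change a little from run to run unless a seed is set.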

Average runs at ground and against opposition

A. Virender Sehwag

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./sehwag.csv","Sehwag")
batsmanAvgRunsOpposition("./sehwag.csv","Sehwag")

avgrg-1-1

dev.off()
## null device 
##           1

B. AB Devilliers

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./devilliers.csv","Devilliers")
batsmanAvgRunsOpposition("./devilliers.csv","Devilliers")

avgrg-2-1

dev.off()
## null device 
##           1

C. Chris Gayle

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./gayle.csv","Gayle")
batsmanAvgRunsOpposition("./gayle.csv","Gayle")

avgrg-3-1

dev.off()
## null device 
##           1

D. Glenn Maxwell

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./maxwell.csv","Maxwell")
batsmanAvgRunsOpposition("./maxwell.csv","Maxwell")

avgrg-4-1

dev.off()
## null device 
##           1
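The averages behind these charts amount to a group-wise mean. Here is a minimal sketch with aggregate() on a made-up data frame; the column names Opposition and Ground are assumptions, not necessarily those used in the CSV files above.

```r
# Made-up innings with opposition and ground (not real figures)
batting <- data.frame(
  Runs = c(34, 102, 5, 67, 48, 12, 80),
  Opposition = c("v Australia", "v Pakistan", "v Australia", "v England",
                 "v Pakistan", "v England", "v Australia"),
  Ground = c("Mumbai", "Lahore", "Sydney", "Lord's",
             "Mumbai", "Mumbai", "Sydney")
)

# Average runs against each opposition and at each ground
avgOpp    <- aggregate(Runs ~ Opposition, data = batting, FUN = mean)
avgGround <- aggregate(Runs ~ Ground, data = batting, FUN = mean)
print(avgOpp)
print(avgGround)
```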

Moving Average of runs over career

The moving average for the 4 batsmen indicates the following

1. The moving average of Devilliers and Maxwell is on the way up.
2. Sehwag shows a slight downward trend from his 2nd peak in 2011
3. Gayle maintains a consistent 45 runs for the last few years

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./sehwag.csv","Sehwag")
batsmanMovingAverage("./devilliers.csv","Devilliers")
batsmanMovingAverage("./gayle.csv","Gayle")
batsmanMovingAverage("./maxwell.csv","Maxwell")

sdgm-ma-1

dev.off()
## null device 
##           1
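A moving average over a career can be sketched as below. This uses a plain trailing average over the last k innings on simulated scores; both the window k and the use of stats::filter() are my choices for illustration, and the smoothing in batsmanMovingAverage() may be done differently.

```r
set.seed(3)

# Simulated runs per innings over a career (made-up data)
runs <- round(rgamma(60, shape = 1.5, scale = 25))

# Trailing moving average over the last k innings (k is an arbitrary choice)
k  <- 10
ma <- stats::filter(runs, rep(1 / k, k), sides = 1)

# Raw scores in grey with the moving average overlaid in blue
plot(runs, type = "l", col = "grey", xlab = "Innings", ylab = "Runs",
     main = "Moving average of runs (illustrative)")
lines(as.numeric(ma), col = "blue", lwd = 2)
```

The first k-1 values of ma are NA because a full window of k innings is not yet available.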

Check batsmen in-form, out-of-form

  1. Maxwell, Devilliers, Sehwag are in-form. This is also evident from the moving average plot
  2. Gayle is out-of-form

checkBatsmanInForm("./sehwag.csv","Sehwag")
## *******************************************************************************************
## 
## Population size: 143  Mean of population: 33.76 
## Sample size: 16  Mean of sample: 37.44 SD of sample: 55.15 
## 
## Null hypothesis H0 : Sehwag 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Sehwag 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Sehwag 's Form Status: In-Form because the p value: 0.603525  is greater than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./devilliers.csv","Devilliers")
## *******************************************************************************************
## 
## Population size: 111  Mean of population: 43.5 
## Sample size: 13  Mean of sample: 57.62 SD of sample: 40.69 
## 
## Null hypothesis H0 : Devilliers 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Devilliers 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Devilliers 's Form Status: In-Form because the p value: 0.883541  is greater than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./gayle.csv","Gayle")
## *******************************************************************************************
## 
## Population size: 140  Mean of population: 37.1 
## Sample size: 16  Mean of sample: 17.25 SD of sample: 20.25 
## 
## Null hypothesis H0 : Gayle 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Gayle 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Gayle 's Form Status: Out-of-Form because the p value: 0.000609  is less than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./maxwell.csv","Maxwell")
## *******************************************************************************************
## 
## Population size: 28  Mean of population: 25.25 
## Sample size: 4  Mean of sample: 64.25 SD of sample: 36.97 
## 
## Null hypothesis H0 : Maxwell 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Maxwell 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Maxwell 's Form Status: In-Form because the p value: 0.948744  is greater than alpha=  0.05"
## *******************************************************************************************
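The p values printed above look consistent with a one-sample, lower-tail t-test of the sample mean against the population mean. Here is a minimal sketch using the summary figures printed for Sehwag and Gayle; the helper lowerTailP() is hypothetical and is my reconstruction, not the package's own code.

```r
# Hypothetical helper: lower-tail p value of a one-sample t-test from summary figures
lowerTailP <- function(popMean, sampleMean, sampleSD, n) {
  tStat <- (sampleMean - popMean) / (sampleSD / sqrt(n))
  pt(tStat, df = n - 1)
}

# Summary figures taken from the output printed above
pSehwag <- lowerTailP(popMean = 33.76, sampleMean = 37.44, sampleSD = 55.15, n = 16)
pGayle  <- lowerTailP(popMean = 37.10, sampleMean = 17.25, sampleSD = 20.25, n = 16)

# A batsman is flagged out-of-form when the p value falls below alpha
alpha  <- 0.05
status <- function(p) if (p < alpha) "Out-of-Form" else "In-Form"

cat("Sehwag:", round(pSehwag, 4), status(pSehwag), "\n")
cat("Gayle: ", round(pGayle, 4), status(pGayle), "\n")
```

A small p value means the recent sample mean sits well below the career mean, which is why Gayle is flagged out-of-form while Sehwag is not.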

Analysis of bowlers

  1. Mitchell Johnson (Aus) – Innings-150, Wickets – 239, Econ Rate : 4.83
  2. Lasith Malinga (SL)- Innings-182, Wickets – 287, Econ Rate : 5.26
  3. Dale Steyn (SA)- Innings-103, Wickets – 162, Econ Rate : 4.81
  4. Tim Southee (NZ)- Innings-96, Wickets – 135, Econ Rate : 5.33

Malinga has the highest number of innings and wickets followed closely by Mitchell Johnson. Steyn and Southee have relatively fewer innings.

To get the bowler’s data use

malinga <- getPlayerDataOD(49758,dir=".",file="malinga.csv",type="bowling")

Wicket Frequency percentage

This plot gives the percentage frequency of each wicket haul (1, 2, 3, etc.)

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./mitchell.csv","J Mitchell")
bowlerWktsFreqPercent("./malinga.csv","Malinga")
bowlerWktsFreqPercent("./steyn.csv","Steyn")
bowlerWktsFreqPercent("./southee.csv","southee")

relBowlFP-1

dev.off()
## null device 
##           1

Wickets Runs plot

The plot below gives a boxplot of the runs ranges for each of the wickets taken by the bowlers. M Johnson and Steyn are more economical than Malinga and Southee corroborating the figures above

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))

bowlerWktsRunsPlot("./mitchell.csv","J Mitchell")
bowlerWktsRunsPlot("./malinga.csv","Malinga")
bowlerWktsRunsPlot("./steyn.csv","Steyn")
bowlerWktsRunsPlot("./southee.csv","southee")

wktsrun-1

dev.off()
## null device 
##           1

Average wickets in different grounds and opposition

A. Mitchell Johnson

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./mitchell.csv","J Mitchell")
bowlerAvgWktsOpposition("./mitchell.csv","J Mitchell")

gr-1-1

dev.off()
## null device 
##           1

B. Lasith Malinga

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./malinga.csv","Malinga")
bowlerAvgWktsOpposition("./malinga.csv","Malinga")

gr-2-1

dev.off()
## null device 
##           1

C. Dale Steyn

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./steyn.csv","Steyn")
bowlerAvgWktsOpposition("./steyn.csv","Steyn")

gr-3-1

dev.off()
## null device 
##           1

D. Tim Southee

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./southee.csv","southee")
bowlerAvgWktsOpposition("./southee.csv","southee")

avgrg-4-1

dev.off()
## null device 
##           1

Relative bowling performance

The plot below shows that Mitchell Johnson and Southee have more wickets in 3-4 wickets range while Steyn and Malinga in 1-2 wicket range

frames <- list("./mitchell.csv","./malinga.csv","steyn.csv","southee.csv")
names <- list("M Johnson","Malinga","Steyn","Southee")
relativeBowlingPerf(frames,names)

relBowlPerf-1

Relative Economy Rate against wickets taken

Steyn had the best economy rate followed by M Johnson. Malinga and Southee have a poorer economy rate

frames <- list("./mitchell.csv","./malinga.csv","steyn.csv","southee.csv")
names <- list("M Johnson","Malinga","Steyn","Southee")
relativeBowlingERODTT(frames,names)

relBowlER-1

Moving average of wickets over career

Johnson’s and Steyn’s wickets-over-career graphs are on the up-swing. Southee is maintaining a reasonable record while Malinga shows a decline in ODI performance

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./mitchell.csv","M Johnson")
bowlerMovingAverage("./malinga.csv","Malinga")
bowlerMovingAverage("./steyn.csv","Steyn")
bowlerMovingAverage("./southee.csv","Southee")

jmss-bowlma-1

dev.off()
## null device 
##           1

Wickets forecast

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerPerfForecast("./mitchell.csv","M Johnson")
bowlerPerfForecast("./malinga.csv","Malinga")
bowlerPerfForecast("./steyn.csv","Steyn")
bowlerPerfForecast("./southee.csv","southee")

jsba-pfcst-1

dev.off()
## null device 
##           1

Check bowler in-form, out-of-form

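All the bowlers except Southee are shown to be still in-form. Southee is marked out-of-form, though only just (p value 0.044302 against alpha of 0.05)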

checkBowlerInForm("./mitchell.csv","J Mitchell")
## *******************************************************************************************
## 
## Population size: 135  Mean of population: 1.55 
## Sample size: 15  Mean of sample: 2 SD of sample: 1.07 
## 
## Null hypothesis H0 : J Mitchell 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : J Mitchell 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "J Mitchell 's Form Status: In-Form because the p value: 0.937917  is greater than alpha=  0.05"
## *******************************************************************************************
checkBowlerInForm("./malinga.csv","Malinga")
## *******************************************************************************************
## 
## Population size: 163  Mean of population: 1.58 
## Sample size: 19  Mean of sample: 1.58 SD of sample: 1.22 
## 
## Null hypothesis H0 : Malinga 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Malinga 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Malinga 's Form Status: In-Form because the p value: 0.5  is greater than alpha=  0.05"
## *******************************************************************************************
checkBowlerInForm("./steyn.csv","Steyn")
## *******************************************************************************************
## 
## Population size: 93  Mean of population: 1.59 
## Sample size: 11  Mean of sample: 1.45 SD of sample: 0.69 
## 
## Null hypothesis H0 : Steyn 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Steyn 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Steyn 's Form Status: In-Form because the p value: 0.257438  is greater than alpha=  0.05"
## *******************************************************************************************
checkBowlerInForm("./southee.csv","southee")
## *******************************************************************************************
## 
## Population size: 86  Mean of population: 1.48 
## Sample size: 10  Mean of sample: 0.8 SD of sample: 1.14 
## 
## Null hypothesis H0 : southee 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : southee 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "southee 's Form Status: Out-of-Form because the p value: 0.044302  is less than alpha=  0.05"
## *******************************************************************************************

***************

Key findings

Here are some key conclusions

ODI batsmen

  1. AB Devilliers has a high frequency of runs in the 60-120 range and the highest average
  2. Sehwag has a large number of innings (245) and a good strike rate
  3. Maxwell has the best strike rate but it should be kept in mind that he has 1/5 of the innings of Sehwag. We need to see how he progresses further
  4. Sehwag has the highest percentage of 4s in the runs scored, while Maxwell has the highest percentage of 6s
  5. For a hypothetical Balls Faced and Minutes at crease Maxwell will score the most runs followed by Sehwag
  6. The moving average indicates that the best is yet to come for Devilliers and Maxwell. Sehwag has a few more years in him while Gayle shows a decline in ODI performance and is indicated to be out of form.

ODI bowlers

  1. Malinga has played the most innings and also has the most wickets, though he has a poor economy rate
  2. M Johnson is the most effective in the 3-4 wicket range followed by Southee
  3. M Johnson and Steyn have the best overall economy rate followed by Malinga and Southee
  4. M Johnson and Steyn’s careers are on the up-swing, Southee maintains a steady consistent performance, while Malinga shows a downward trend

Hasta la vista! I’ll be back!
Watch this space!

Also see my other posts in R

  1. Introducing cricketr! : An R package to analyze performances of cricketers
  2. cricketr digs the Ashes!
  3. A peek into literacy in India: Statistical Learning with R
  4. A crime map of India in R – Crimes against women
  5. Analyzing cricket’s batting legends – Through the mirage with R
  6. Mirror, mirror . the best batsman of them all?

You may also like

  1. A closer look at “Robot Horse on a Trot” in Android
  2. What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
  3. Bend it like Bluemix, MongoDB with autoscaling – Part 2
  4. Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
  5. TWS-4: Gossip protocol: Epidemics and rumors to the rescue
  6. Deblurring with OpenCV:Weiner filter reloaded

Taking cricketr for a spin – Part 1


“Curiouser and curiouser!” cried Alice
“The time has come,” the walrus said, “to talk of many things: Of shoes and ships – and sealing wax – of cabbages and kings”
“Begin at the beginning,”the King said, very gravely,“and go on till you come to the end: then stop.”
“And what is the use of a book,” thought Alice, “without pictures or conversation?”

            Excerpts from Alice in Wonderland by Lewis Carroll

Introduction

This post is a continuation of my previous post “Introducing cricketr! A R package to analyze the performances of cricketers.” In this post I take my package cricketr for a spin. For this analysis I focus on the Indian batting legends

– Sachin Tendulkar (Master Blaster)
– Rahul Dravid (The Will)
– Sourav Ganguly ( The Dada Prince)
– Sunil Gavaskar (Little Master)

This post is also hosted on RPubs – cricketr-1



Important note: Do check out the python avatar of cricketr, ‘cricpy’, in my post ‘Introducing cricpy:A python package to analyze performances of cricketers’

(Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar)

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricketr template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. Only a familiarity with R and R Markdown is needed.

The package can be installed directly from CRAN

# Install cricketr into the default library if it is not already available
if (!require("cricketr")){ 
    install.packages("cricketr") 
} 
library(cricketr)

or from Github

library(devtools)
install_github("tvganesh/cricketr")
library(cricketr)

Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency. The plots below indicate that Tendulkar’s average is the highest. He is followed by Dravid, Gavaskar and then Ganguly

batsmanPerfBoxHist("./tendulkar.csv","Sachin Tendulkar")
tkps-boxhist-1
batsmanPerfBoxHist("./dravid.csv","Rahul Dravid")
tkps-boxhist-2
batsmanPerfBoxHist("./ganguly.csv","Sourav Ganguly")
tkps-boxhist-3
batsmanPerfBoxHist("./gavaskar.csv","Sunil Gavaskar")
tkps-boxhist-4

Relative Mean Strike Rate

In this first plot I plot the Mean Strike Rate of the batsmen. Tendulkar leads in the Mean Strike Rate for runs in the range 100-180. Ganguly has a very good Mean Strike Rate for the runs range 40-80

frames <- list("./tendulkar.csv","./dravid.csv","ganguly.csv","gavaskar.csv")
names <- list("Tendulkar","Dravid","Ganguly","Gavaskar")
relativeBatsmanSR(frames,names)

plot-1-1

Relative Runs Frequency Percentage

The plot below shows the percentage contribution in each 10-run bucket over the entire career. The percentage Runs Frequency is fairly close, but Gavaskar seems to lead most of the way.

frames <- list("./tendulkar.csv","./dravid.csv","ganguly.csv","gavaskar.csv")
names <- list("Tendulkar","Dravid","Ganguly","Gavaskar")
relativeRunsFreqPerf(frames,names)

plot-2-1

Moving Average of runs over career

The moving average for the 4 batsmen indicates the following:

  • Tendulkar and Ganguly’s careers show a downward trend, and their retirement didn’t come too soon
  • Dravid and Gavaskar’s careers definitely show an upswing. They probably had a year or two left.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./tendulkar.csv","Tendulkar")
batsmanMovingAverage("./dravid.csv","Dravid")
batsmanMovingAverage("./ganguly.csv","Ganguly")
batsmanMovingAverage("./gavaskar.csv","Gavaskar")

tdsg-ma-1

dev.off()
## null device 
##           1

Runs forecast

The forecast for each batsman is shown below. The plots indicate that only Tendulkar seemed to maintain consistency over the period, while the rest seem to score less than their forecasted runs in the last 10% of the career.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfForecast("./dravid.csv","Rahul Dravid")
batsmanPerfForecast("./ganguly.csv","Sourav Ganguly")
batsmanPerfForecast("./gavaskar.csv","Sunil Gavaskar")

tdsg-perf-1

dev.off()
## null device 
##           1

Check for batsman in-form/out-of-form

The following snippet checks whether the batsman is in-form or out-of-form during the last 10% of innings of the career. This is done by choosing the null hypothesis (H0) to indicate that the batsman is in-form. Ha is the alternative hypothesis that he is out-of-form. The population is based on the first 90% of career runs. The last 10% is taken as the sample, and a check is made on the lower tail to see whether the sample mean falls below the 95% confidence interval of the population mean. If the resulting p-value is less than 0.05, the batsman is considered out-of-form.

The computation shows that Tendulkar was out-of-form while the others weren’t. While Dravid and Gavaskar’s moving averages do show an upward trend, the surprise is Ganguly. This could be because Ganguly was able to keep his average in the last 10% within the 95% confidence interval. It has to be noted that Ganguly’s average was much lower than Tendulkar’s.

checkBatsmanInForm("./tendulkar.csv","Tendulkar")
## *******************************************************************************************
## 
## Population size: 294  Mean of population: 50.48 
## Sample size: 33  Mean of sample: 32.42 SD of sample: 29.8 
## 
## Null hypothesis H0 : Tendulkar 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Tendulkar 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713  is less than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./dravid.csv","Dravid")
## *******************************************************************************************
## 
## Population size: 256  Mean of population: 46.98 
## Sample size: 29  Mean of sample: 43.48 SD of sample: 40.89 
## 
## Null hypothesis H0 : Dravid 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Dravid 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Dravid 's Form Status: In-Form because the p value: 0.324138  is greater than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./ganguly.csv","Ganguly")
## *******************************************************************************************
## 
## Population size: 169  Mean of population: 38.94 
## Sample size: 19  Mean of sample: 33.21 SD of sample: 32.97 
## 
## Null hypothesis H0 : Ganguly 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Ganguly 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Ganguly 's Form Status: In-Form because the p value: 0.229006  is greater than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./gavaskar.csv","Gavaskar")
## *******************************************************************************************
## 
## Population size: 125  Mean of population: 44.67 
## Sample size: 14  Mean of sample: 57.86 SD of sample: 58.55 
## 
## Null hypothesis H0 : Gavaskar 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Gavaskar 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Gavaskar 's Form Status: In-Form because the p value: 0.793276  is greater than alpha=  0.05"
## *******************************************************************************************
dev.off()
## null device 
##           1

3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./tendulkar.csv","Tendulkar")
battingPerf3d("./dravid.csv","Dravid")

plot-3-1

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./ganguly.csv","Ganguly")
battingPerf3d("./gavaskar.csv","Gavaskar")

plot-4-1

dev.off()
## null device 
##           1

Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
tendulkar <- batsmanRunsPredict("./tendulkar.csv","Tendulkar",newdataframe=newDF)
dravid <- batsmanRunsPredict("./dravid.csv","Dravid",newdataframe=newDF)
ganguly <- batsmanRunsPredict("./ganguly.csv","Ganguly",newdataframe=newDF)
gavaskar <- batsmanRunsPredict("./gavaskar.csv","Gavaskar",newdataframe=newDF)
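
A rough sketch of what such a regression looks like in base R is given below. It assumes that the fit is an ordinary lm() of Runs on BF and Mins; that is my reading of the description here, not a copy of the package internals. The innings data frame is synthetic and made up purely for illustration.

```r
# Synthetic innings (illustrative only, not real Cricinfo data)
set.seed(42)
innings <- data.frame(BF = round(runif(100, 5, 300)))
innings$Mins <- round(innings$BF * 1.4 + rnorm(100, 0, 10))
innings$Runs <- round(0.5 * innings$BF + 0.05 * innings$Mins + rnorm(100, 0, 8))

# Fit a regression plane: Runs as a linear function of BF and Mins
fit <- lm(Runs ~ BF + Mins, data = innings)

# Predict runs for new combinations of Balls Faced and Minutes at crease
newDF <- data.frame(BF = seq(10, 400, length.out = 15),
                    Mins = seq(30, 600, length.out = 15))
predicted <- predict(fit, newdata = newDF)
result <- cbind(newDF, Runs = round(predicted))
print(result)
```

The table further down combines predictions of this kind, one column per batsman.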

The fitted model is then used to predict the runs that the batsmen will score for a given Balls faced and Minutes at crease. It can be seen that Tendulkar’s predicted runs are much higher than those of all the others.

Tendulkar is followed by Ganguly who, as we saw earlier, had a very good strike rate. However, it must be noted that Dravid and Gavaskar have a better average.

batsmen <-cbind(round(tendulkar$Runs),round(dravid$Runs),round(ganguly$Runs),round(gavaskar$Runs))
colnames(batsmen) <- c("Tendulkar","Dravid","Ganguly","Gavaskar")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Tendulkar Dravid Ganguly Gavaskar
## 1          10           30         7      1       7        4
## 2          38           71        23     14      21       17
## 3          66          111        39     27      35       30
## 4          94          152        54     40      50       43
## 5         121          193        70     54      64       56
## 6         149          234        86     67      78       69
## 7         177          274       102     80      93       82
## 8         205          315       118     94     107       95
## 9         233          356       134    107     121      108
## 10        261          396       150    120     136      121
## 11        289          437       165    134     150      134
## 12        316          478       181    147     165      147
## 13        344          519       197    160     179      160
## 14        372          559       213    173     193      173
## 15        400          600       229    187     208      186

Contribution to matches won and lost

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanContributionWonLost(35320,"Tendulkar")
batsmanContributionWonLost(28114,"Dravid")
batsmanContributionWonLost(28779,"Ganguly")
batsmanContributionWonLost(28794,"Gavaskar")

tdgg-1

Home and overseas performance

From the plot below, Tendulkar and Dravid have played a lot more matches both home and abroad, and their performance has been good both at home and overseas. Tendulkar has the best performance home and abroad and is consistent all across. Dravid is also consistent at all venues. Gavaskar played fewer matches than Tendulkar and Dravid. The range of his runs at home is higher than overseas; however, the average is consistent both at home and abroad. Finally we have Ganguly.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfHomeAway(35320,"Tendulkar")
batsmanPerfHomeAway(28114,"Dravid")
batsmanPerfHomeAway(28779,"Ganguly")
batsmanPerfHomeAway(28794,"Gavaskar")
tdgg-ha-1

Average runs at ground and against opposition

Tendulkar averages above 50 runs against Sri Lanka, Bangladesh, West Indies and Zimbabwe. His averages against Australia and England are very close to 50. Sydney, Port Elizabeth, Bloemfontein and Colombo are great hunting grounds for Tendulkar.

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./tendulkar.csv","Tendulkar")
batsmanAvgRunsOpposition("./tendulkar.csv","Tendulkar")
avgrg-1-1
dev.off()
## null device 
##           1

Dravid plundered runs at Adelaide, Georgetown, Oval, Hamilton etc. Dravid averages above 50 against England, Bangladesh, New Zealand, Pakistan, West Indies and Zimbabwe.

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./dravid.csv","Dravid")
batsmanAvgRunsOpposition("./dravid.csv","Dravid")
avgrg-2-1
dev.off()
## null device 
##           1

Ganguly has good performance at the Oval, Rawalpindi, Johannesburg and Kandy. Ganguly averages 50 runs against England and Bangladesh.

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./ganguly.csv","Ganguly")
batsmanAvgRunsOpposition("./ganguly.csv","Ganguly")
avgrg-3-1
dev.off()
## null device 
##           1

The Oval, Sydney, Perth, Melbourne, Brisbane and Manchester are happy hunting grounds for Gavaskar. Gavaskar averages around 50 runs against Australia, Pakistan, Sri Lanka and West Indies.

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./gavaskar.csv","Gavaskar")
batsmanAvgRunsOpposition("./gavaskar.csv","Gavaskar")
avgrg-4-1
dev.off()
## null device 
##           1

Key findings

Here are some key conclusions

  1. Tendulkar has the highest average among the 4. He is followed by Dravid, Gavaskar and Ganguly.
  2. Tendulkar’s predicted performance for a given number of Balls Faced and Minutes at Crease is superior to the rest
  3. Dravid averages above 50 against 6 countries
  4. West Indies and Australia are Gavaskar’s favorite batting grounds
  5. Ganguly has a very good Mean Strike Rate for the range 40-80 and Tendulkar from 100-180
  6. In home and overseas performance, Tendulkar is the best. Dravid and Gavaskar also have good performance overseas.
  7. Dravid and Gavaskar probably retired a year or two early, while Tendulkar and Ganguly’s time was clearly up

Final thoughts

Tendulkar is clearly the greatest batsman India has produced as he leads in almost all aspects of batting – number of centuries, strike rate, predicted runs and home and overseas performance. Dravid follows Tendulkar with 48 centuries, consistent performance home and overseas and a career that was still green. Gavaskar has fewer matches than the rest, but his performance overseas is very good for those helmetless times. Finally we have Ganguly.

Dravid and Gavaskar had a few more years of great batting left, while Tendulkar and Ganguly’s careers were on the decline.

Note: It is really not fair to include Gavaskar in the analysis as he played in a different era when helmets were not used, even against the fiery pace of Thomson, Lillee, Roberts, Holding etc. In addition, Gavaskar did not play against some of the newer countries like Bangladesh and Zimbabwe, where he could have amassed runs. Yet I wanted to include him, and his performance is clearly excellent.

Also see my other posts in R

  1. A peek into literacy in India: Statistical Learning with R
  2. A crime map of India in R – Crimes against women
  3. Analyzing cricket’s batting legends – Through the mirage with R
  4. Masters of Spin: Unraveling the web with R
  5. Mirror, mirror . the best batsman of them all?

You may also like

  1. A crime map of India in R: Crimes against women
  2. What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
  3. Bend it like Bluemix, MongoDB with autoscaling – Part 2
  4. Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
  5. Thinking Web Scale (TWS-3): Map-Reduce – Bring compute to data
  6. Deblurring with OpenCV:Weiner filter reloaded

Introducing cricketr! : An R package to analyze performances of cricketers


Yet all experience is an arch wherethro’
Gleams that untravell’d world whose margin fades
For ever and forever when I move.
How dull it is to pause, to make an end,
To rust unburnish’d, not to shine in use!

Ulysses by Alfred Tennyson

Introduction

This is an initial post in which I introduce a cricketing package ‘cricketr’ which I have created. This package was a natural culmination of my earlier posts on cricket and my finishing 10 modules of the Data Science Specialization from Johns Hopkins University at Coursera. The thought of creating this package struck me some time back, and I have finally been able to bring this to fruition.

So here it is. My R package ‘cricketr!!!’

This package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package can handle all formats of the game including Test, ODI and Twenty20 cricket.

You should be able to install the package from CRAN and use many of the functions available in the package. Please be mindful of the ESPN Cricinfo Terms of Use.

(Note: This page is also hosted as a GitHub page at cricketr and also at RPubs as cricketr: A R package for analyzing performances of cricketers)

You can download this analysis as a PDF file from Introducing cricketr

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricketr template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. Only a familiarity with R and R Markdown is needed.

You can clone the cricketr code from Github at cricketr

(Take a look at my short video tutorial on my R package cricketr on Youtube – R package cricketr – A short tutorial)

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

Please look at my recent post, which includes updates to this post and 8 new functions added to the cricketr package: “Re-introducing cricketr: An R package to analyze the performances of cricketers”

Important note: Do check out the python avatar of cricketr, ‘cricpy’, in my post ‘Introducing cricpy: A python package to analyze performances of cricketers’

 The cricketr package

The cricketr package has several functions that perform different analyses on both batsmen and bowlers. The package has functions that plot the percentage frequency of runs or wickets, the runs likelihood for a batsman, relative run/strike rates of batsmen and relative performance/economy rates for bowlers.

Other interesting functions include batting performance moving average, forecast and a function to check whether the batsman/bowler is in-form or out-of-form.

The data for a particular player can be obtained with the getPlayerData() function from the package. To do this you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Ricky Ponting, Sachin Tendulkar etc. This will bring up a page which has the profile number for the player, e.g. for Sachin Tendulkar this would be http://www.espncricinfo.com/india/content/player/35320.html. Hence, Sachin’s profile is 35320. This can be used to get the data for Tendulkar as shown below.

The cricketr package is now available from CRAN!!! You should be able to install it directly with

# Install cricketr into the default library if it is not already available
if (!require("cricketr")){ 
    install.packages("cricketr") 
} 
library(cricketr)
?getPlayerData
## 
## getPlayerData(profile, opposition='', host='', dir='./data', file='player001.csv', type='batting', homeOrAway=[1, 2], result=[1, 2, 4], create=True)
##     Get the player data from ESPN Cricinfo based on specific inputs and store in a file in a given directory
##     
##     Description
##     
##     Get the player data given the profile of the batsman. The allowed inputs are home,away or both and won,lost or draw of matches. The data is stored in a .csv file in a directory specified. This function also returns a data frame of the player
##     
##     Usage
##     
##     getPlayerData(profile,opposition="",host="",dir="./data",file="player001.csv",
##     type="batting", homeOrAway=c(1,2),result=c(1,2,4))
##     Arguments
##     
##     profile     
##     This is the profile number of the player to get data. This can be obtained from http://www.espncricinfo.com/ci/content/player/index.html. Type the name of the player and click search. This will display the details of the player. Make a note of the profile ID. For e.g For Sachin Tendulkar this turns out to be http://www.espncricinfo.com/india/content/player/35320.html. Hence the profile for Sachin is 35320
##     opposition  
##     The numerical value of the opposition country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,New Zealand:5,Pakistan:7,South Africa:3,Sri Lanka:8, West Indies:4, Zimbabwe:9
##     host        
##     The numerical value of the host country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,New Zealand:5,Pakistan:7,South Africa:3,Sri Lanka:8, West Indies:4, Zimbabwe:9
##     dir 
##     Name of the directory to store the player data into. If not specified the data is stored in a default directory "./data". Default="./data"
##     file        
##     Name of the file to store the data into for e.g. tendulkar.csv. This can be used for subsequent functions. Default="player001.csv"
##     type        
##     type of data required. This can be "batting" or "bowling"
##     homeOrAway  
##     This is a vector with either 1,2 or both. 1 is for home 2 is for away
##     result      
##     This is a vector that can take values 1,2,4. 1 - won match 2- lost match 4- draw
##     Details
##     
##     More details can be found in my short video tutorial in Youtube https://www.youtube.com/watch?v=q9uMPFVsXsI
##     
##     Value
##     
##     Returns the player's dataframe
##     
##     Note
##     
##     Maintainer: Tinniam V Ganesh <tvganesh.85@gmail.com>
##     
##     Author(s)
##     
##     Tinniam V Ganesh
##     
##     References
##     
##     http://www.espncricinfo.com/ci/content/stats/index.html
##     https://gigadom.wordpress.com/
##     
##     See Also
##     
##     getPlayerDataSp
##     
##     Examples
##     
##     ## Not run: 
##     # Both home and away. Result = won,lost and drawn
##     tendulkar = getPlayerData(35320,dir=".", file="tendulkar1.csv",
##     type="batting", homeOrAway=c(1,2),result=c(1,2,4))
##     
##     # Only away. Get data only for won and lost innings
##     tendulkar = getPlayerData(35320,dir=".", file="tendulkar2.csv",
##     type="batting",homeOrAway=c(2),result=c(1,2))
##     
##     # Get bowling data and store in file for future
##     kumble = getPlayerData(30176,dir=".",file="kumble1.csv",
##     type="bowling",homeOrAway=c(1),result=c(1,2))
##     
##     #Get the Tendulkar's Performance against Australia in Australia
##     tendulkar = getPlayerData(35320, opposition = 2,host=2,dir=".", 
##     file="tendulkarVsAusInAus.csv",type="batting")
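
The help text above explains where the profile number comes from and lists the numeric codes for opposition and host. As a small convenience, here is a sketch of two helpers for preparing those arguments. Both countryCodes and profileFromUrl are hypothetical names of my own and are not part of cricketr; the codes themselves are copied from the help text.

```r
# Hypothetical lookup table (not part of cricketr); codes taken from the help text
countryCodes <- c("Australia" = 2, "Bangladesh" = 25, "England" = 1, "India" = 6,
                  "New Zealand" = 5, "Pakistan" = 7, "South Africa" = 3,
                  "Sri Lanka" = 8, "West Indies" = 4, "Zimbabwe" = 9)

# Hypothetical helper: pull the profile number out of a player page URL
profileFromUrl <- function(url) {
  # Keep only the digits between the last "/" and ".html"
  as.integer(sub("^.*/([0-9]+)\\.html$", "\\1", url))
}

profile <- profileFromUrl("http://www.espncricinfo.com/india/content/player/35320.html")
opposition <- unname(countryCodes["Australia"])
print(c(profile = profile, opposition = opposition))
```

These values can then be passed as the profile, opposition and host arguments of getPlayerData().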

The cricketr package includes some pre-packaged sample (.csv) files. You can use these samples to test the functions as shown below

# Retrieve the file path of a data file installed with cricketr
pathToFile <- system.file("data", "tendulkar.csv", package = "cricketr")
batsman4s(pathToFile,"Sachin Tendulkar")

unnamed-chunk-2-1

Alternatively, the cricketr package can be installed from GitHub with

if (!require("cricketr")){ 
    library(devtools) 
    install_github("tvganesh/cricketr") 
}
library(cricketr)

The pre-packaged files can be accessed as shown above.
To get the data of any player use the function getPlayerData()

tendulkar <- getPlayerData(35320,dir="..",file="tendulkar.csv",type="batting",homeOrAway=c(1,2),
                           result=c(1,2,4))

Important note: This needs to be done only once for a player. This function stores the player’s data in a CSV file (e.g. tendulkar.csv as above) which can then be reused for all other functions. Once we have the data for the players, many analyses can be done. This post will use the stored CSV file obtained with a prior getPlayerData() call for all subsequent analyses.

Sachin Tendulkar’s performance – Basic Analyses

The 3 plots below provide the following for Tendulkar

  1. Frequency percentage of runs in each run range over the whole career
  2. Mean Strike Rate for runs scored in the given range
  3. A histogram of runs frequency percentages in runs ranges
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./tendulkar.csv","Sachin Tendulkar")
batsmanMeanStrikeRate("./tendulkar.csv","Sachin Tendulkar")
batsmanRunsRanges("./tendulkar.csv","Sachin Tendulkar")

tendulkar-batting-1

dev.off()
## null device 
##           1

More analyses

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./tendulkar.csv","Tendulkar")
batsman6s("./tendulkar.csv","Tendulkar")
batsmanDismissals("./tendulkar.csv","Tendulkar")

tendulkar-4s6sout-1

 

3D scatter plot and prediction plane

The plots below show the 3D scatter plot of Sachin’s Runs versus Balls Faced and Minutes at crease. A linear regression model is then fitted between Runs and Balls Faced + Minutes at crease

battingPerf3d("./tendulkar.csv","Sachin Tendulkar")

tendulkar-3d-1

Average runs at different venues

The plot below gives the average runs scored by Tendulkar at different grounds. The plot also displays the number of innings at each ground as a label on the x-axis. It can be seen that Tendulkar did great at Colombo (SSC) and Melbourne for matches overseas, and at Mumbai, Mohali and Bangalore at home.

batsmanAvgRunsGround("./tendulkar.csv","Sachin Tendulkar")
tendulkar-avggrd-1

Average runs against different opposing teams

This plot computes the average runs scored by Tendulkar against different countries. The x-axis also gives the number of innings against each team

batsmanAvgRunsOpposition("./tendulkar.csv","Tendulkar")
tendulkar-avgopn-1

Highest Runs Likelihood

The plot below shows the Runs Likelihood for a batsman. For this, the performance of Sachin is plotted as a 3D scatter plot of Runs versus Balls Faced + Minutes at crease. K-Means is then used to compute the centroids of 3 clusters, which are plotted. In this plot, Sachin Tendulkar’s highest tendencies are computed and plotted using K-Means.

batsmanRunsLikelihood("./tendulkar.csv","Sachin Tendulkar")

tendulkar-kmeans-1

## Summary of  Sachin Tendulkar 's runs scoring likelihood
## **************************************************
## 
## There is a 16.51 % likelihood that Sachin Tendulkar  will make  139 Runs in  251 balls over 353  Minutes 
## There is a 58.41 % likelihood that Sachin Tendulkar  will make  16 Runs in  31 balls over  44  Minutes 
## There is a 25.08 % likelihood that Sachin Tendulkar  will make  66 Runs in  122 balls over 167  Minutes
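
The three percentages above add up to 100, which suggests that each likelihood is simply the share of innings falling in that cluster. A minimal sketch of that idea with kmeans() from base R follows. The innings are synthetic, and treating the likelihood as cluster size divided by total innings is my assumption about how the summary is derived.

```r
# Three synthetic groups of innings: short, medium and long stays at the crease
set.seed(1)
short  <- data.frame(BF = rnorm(60, 30, 8),   Mins = rnorm(60, 45, 10),  Runs = rnorm(60, 15, 5))
medium <- data.frame(BF = rnorm(25, 120, 10), Mins = rnorm(25, 170, 12), Runs = rnorm(25, 65, 6))
long   <- data.frame(BF = rnorm(15, 250, 12), Mins = rnorm(15, 350, 15), Runs = rnorm(15, 140, 8))
innings <- rbind(short, medium, long)

# Cluster into 3 groups; nstart restarts the algorithm to avoid a poor local optimum
fit <- kmeans(innings, centers = 3, nstart = 20)

# Likelihood of each cluster = share of innings assigned to it
likelihood <- round(100 * fit$size / nrow(innings), 2)
summaryTable <- cbind(round(fit$centers), Likelihood = likelihood)
print(summaryTable)
```

Each row of summaryTable pairs a centroid (Balls Faced, Minutes, Runs) with the percentage of innings nearest to it.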

A look at the Top 4 batsman – Tendulkar, Kallis, Ponting and Sangakkara

The batsmen with the most hundreds in test cricket are

  1. Sachin Tendulkar: Average: 53.78, 100’s – 51, 50’s – 68
  2. Jacques Kallis: Average: 55.47, 100’s – 45, 50’s – 58
  3. Ricky Ponting: Average: 51.85, 100’s – 41, 50’s – 62
  4. Kumar Sangakkara: Average: 58.04, 100’s – 38, 50’s – 52

in that order.

The following plots take a closer look at their performances. The box plots show the mean (red line) and median (blue line). The two ends of the boxplot display the 25th and 75th percentile.

Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency. The calculated means differ from the stated means, possibly because of data cleaning. I am also not sure how the means were arrived at by ESPN Cricinfo, e.g. when considering not-outs.
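
One possible reason for the gap is the treatment of not-outs. A conventional batting average divides total runs by the number of dismissals, so not-out innings add runs without adding to the denominator, whereas a plain mean divides by all innings. The sketch below shows the difference on a made-up set of scores; it assumes not-outs are marked with a trailing "*", and I have not confirmed that this accounts for the whole difference seen in the plots.

```r
# Made-up scores; a trailing "*" marks a not-out innings (assumed format)
scores <- c("12", "45*", "100", "0", "23*", "60")

notOut <- grepl("\\*$", scores)               # TRUE for not-out innings
runs <- as.numeric(sub("\\*$", "", scores))   # strip the marker and convert

meanRuns <- mean(runs)                        # plain mean over all 6 innings
battingAverage <- sum(runs) / sum(!notOut)    # runs divided by the 4 dismissals
print(c(meanRuns = meanRuns, battingAverage = battingAverage))
```

With two not-outs in six innings, the plain mean is noticeably lower than the batting average.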

batsmanPerfBoxHist("./tendulkar.csv","Sachin Tendulkar")

tkps-boxhist-1

batsmanPerfBoxHist("./kallis.csv","Jacques Kallis")

tkps-boxhist-2

batsmanPerfBoxHist("./ponting.csv","Ricky Ponting")

tkps-boxhist-3

batsmanPerfBoxHist("./sangakkara.csv","K Sangakkara")

tkps-boxhist-4

Contribution to won and lost matches

The plot below shows the contribution of Tendulkar, Kallis, Ponting and Sangakkara in matches won and lost. The plots show the range of runs scored as a boxplot (25th & 75th percentile) and the mean scored. The total matches won and lost are also printed in the plot.

All the players have scored more in the matches they won than in the matches they lost. Ricky Ponting stands out as having more matches won to his credit than the others. This could also be because he was a member of a strong Australian team.

For the next 2 functions below you will have to use the getPlayerDataSp() function. I have commented this out as I already have these files.

# tendulkarsp <- getPlayerDataSp(35320,tdir=".",tfile="tendulkarsp.csv",ttype="batting")
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanContributionWonLost("tendulkarsp.csv","Tendulkar")
batsmanContributionWonLost("kallissp.csv","Kallis")
batsmanContributionWonLost("pontingsp.csv","Ponting")
batsmanContributionWonLost("sangakkarasp.csv","Sangakarra")

tkps-wonlost-1

dev.off()
## null device 
##           1

Performance at home and overseas

From the plot below it can be seen that Tendulkar has more matches overseas than at home and his performance is consistent at all venues, at home or abroad. Ponting has fewer innings than Tendulkar and has an equally good performance at home and overseas. Kallis and Sangakkara’s performance abroad is lower than their performance at home.

This function also requires the use of getPlayerDataSp() as shown above

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfHomeAway("tendulkarsp.csv","Tendulkar")
batsmanPerfHomeAway("kallissp.csv","Kallis")
batsmanPerfHomeAway("pontingsp.csv","Ponting")
batsmanPerfHomeAway("sangakkarasp.csv","Sangakarra")

tkps-homeaway-1

dev.off()
## null device 
##           1
 

Relative Mean Strike Rate plot

The plot below compares the Mean Strike Rate of the batsmen for each runs range of 10 and plots them. The plot indicates the following:

  • Range 0 – 50 Runs – Ponting leads, followed by Tendulkar
  • Range 50 – 100 Runs – Ponting, followed by Sangakkara
  • Range 100 – 150 Runs – Ponting and then Tendulkar

frames <- list("./tendulkar.csv","./kallis.csv","ponting.csv","sangakkara.csv")
names <- list("Tendulkar","Kallis","Ponting","Sangakkara")
relativeBatsmanSR(frames,names)

tkps-relSR-1
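
To make the idea of a mean strike rate per runs range concrete, here is a small sketch in base R. It assumes the strike rate of an innings is 100 * Runs / Balls Faced and that innings are grouped into buckets of 10 runs; the exact bucketing used inside relativeBatsmanSR may differ. The data are made up.

```r
# Made-up innings: runs scored and balls faced
innings <- data.frame(Runs = c(5, 8, 14, 17, 23, 26, 35),
                      BF   = c(10, 20, 28, 34, 46, 40, 50))

# Strike rate per innings = runs per 100 balls
innings$SR <- 100 * innings$Runs / innings$BF

# Group innings into 10-run buckets: [0,10), [10,20), ...
innings$bucket <- cut(innings$Runs, breaks = seq(0, 40, by = 10), right = FALSE)

# Mean strike rate within each bucket
meanSR <- tapply(innings$SR, innings$bucket, mean)
print(meanSR)
```

Plotting one such meanSR curve per batsman against the bucket midpoints gives a comparison of the kind shown above.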

Relative Runs Frequency plot

The plot below gives the relative Runs Frequency Percentages for each 10 run bucket. The plot below shows that Sangakkara leads, followed by Ponting.

frames <- list("./tendulkar.csv","./kallis.csv","ponting.csv","sangakkara.csv")
names <- list("Tendulkar","Kallis","Ponting","Sangakkara")
relativeRunsFreqPerf(frames,names)

tkps-relRunFreq-1
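
The runs frequency percentage can be sketched the same way: count the innings in each 10-run bucket and divide by the total number of innings. The scores below are made up, and putting everything from 100 upwards in one open-ended bucket is a simplification of mine.

```r
# Made-up innings scores
runs <- c(0, 4, 9, 12, 18, 25, 31, 37, 55, 104)

# 10-run buckets up to 100, then one open-ended bucket for centuries
bucket <- cut(runs, breaks = c(seq(0, 100, by = 10), Inf), right = FALSE)

# Percentage of innings that fall in each bucket
freqPct <- 100 * prop.table(table(bucket))
print(round(freqPct, 1))
```

Here 3 of the 10 innings fall in [0,10), so that bucket accounts for 30% of the innings.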

Moving Average of runs in career

Take a look at the Moving Average across the career of the Top 4. Clearly:

  • Kallis and Sangakkara have a few more years of great batting ahead. They seem to average around 50.
  • Tendulkar and Ponting definitely show a slump in the later years.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./tendulkar.csv","Sachin Tendulkar")
batsmanMovingAverage("./kallis.csv","Jacques Kallis")
batsmanMovingAverage("./ponting.csv","Ricky Ponting")
batsmanMovingAverage("./sangakkara.csv","K Sangakkara")

tkps-ma-1

dev.off()
## null device 
##           1

Future Runs forecast

Here are plots that forecast how the batsmen will perform in the future. In this case 90% of the career runs trend is used as the training set. The remaining 10% is the test set.

A Holt-Winters forecasting model is used to forecast future performance based on the 90% training set. The forecasted runs trend is plotted. The test set is also plotted to see how closely the forecast and the actual runs match.

Take a look at the runs forecasted for the batsmen below.

  • Tendulkar’s forecasted performance seems to tally with his actual performance with an average of 50
  • For Kallis, the forecasted runs are higher than the actual runs he scored
  • Ponting seems to have a good run in the future
  • Sangakkara has a decent run in the future averaging 50 runs
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfForecast("./kallis.csv","Jacques Kallis")
batsmanPerfForecast("./ponting.csv","Ricky Ponting")
batsmanPerfForecast("./sangakkara.csv","K Sangakkara")

tkps-perffcst-1

dev.off()
## null device 
##           1
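
As an illustration of the 90%/10% split described above, here is a minimal sketch using HoltWinters() from base R’s stats package on a synthetic runs series. Setting beta = FALSE and gamma = FALSE reduces the model to simple exponential smoothing; that is a simplification made to keep the example small, and the smoothing options that batsmanPerfForecast actually uses are not shown in this post.

```r
# Synthetic career of 100 innings (illustrative only)
set.seed(7)
runs <- round(pmax(0, rnorm(100, 45, 30)))

# First 90% is the training set, last 10% is the test set
nTrain <- floor(0.9 * length(runs))
train <- runs[1:nTrain]
test <- runs[(nTrain + 1):length(runs)]

# Non-seasonal, no-trend Holt-Winters (exponential smoothing) on the training set
fit <- HoltWinters(ts(train), gamma = FALSE, beta = FALSE)

# Forecast as many innings ahead as there are in the test set
forecastRuns <- predict(fit, n.ahead = length(test))

# Mean absolute error between forecast and the held-out innings
mae <- mean(abs(as.numeric(forecastRuns) - test))
print(mae)
```

Comparing the forecast with the held-out 10% is what the plots above do visually.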

Check Batsman In-Form or Out-of-Form

The computation below uses null hypothesis testing and the p-value to determine if the batsman is in-form or out-of-form. For this, 90% of the career runs is chosen as the population and its mean is computed. The last 10% is chosen to be the sample set, and the sample mean and the sample standard deviation are calculated.

The Null Hypothesis (H0) assumes that the batsman continues to stay in-form, where the sample mean is within the 95% confidence interval of the population mean. The Alternative (Ha) assumes that the batsman is out-of-form, where the sample mean is below the 95% confidence interval of the population mean.

A significance value of 0.05 is chosen and the p-value is computed:

  • If p-value >= 0.05 – Batsman In-Form
  • If p-value < 0.05 – Batsman Out-of-Form

Note: Ideally the p-value should be computed for a population that follows the Normal Distribution. But the runs population is usually right skewed, with many low scores and a long tail of big innings. So some correction may be needed. I will revisit this later.
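
The check can be sketched directly from summary statistics. The function below, lowerTailPValue, is a hypothetical helper of mine and not part of cricketr. It assumes a one-sided, one-sample t-test with n - 1 degrees of freedom, which gives p-values close to the ones printed by checkBatsmanInForm in this post; whether the package computes it in exactly this way is an assumption.

```r
# Hypothetical helper (not part of cricketr):
# lower-tail p-value for "sample mean is below the population mean"
lowerTailPValue <- function(popMean, sampleMean, sampleSD, n) {
  # t statistic of the sample mean against the population mean
  tStat <- (sampleMean - popMean) / (sampleSD / sqrt(n))
  # probability of a t statistic this low or lower under H0
  pt(tStat, df = n - 1)
}

# Summary statistics as printed in this post
pTendulkar <- lowerTailPValue(50.48, 32.42, 29.8, 33)
pKallis <- lowerTailPValue(47.5, 47.11, 59.19, 27)

alpha <- 0.05
print(c(Tendulkar = pTendulkar < alpha, Kallis = pKallis < alpha))  # TRUE means Out-of-Form
```

Because the inputs are the rounded values from the printed output, the p-values may differ slightly in the last digits from those reported by the package.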

This is done for the Top 4 batsmen

checkBatsmanInForm("./tendulkar.csv","Sachin Tendulkar")
## *******************************************************************************************
## 
## Population size: 294  Mean of population: 50.48 
## Sample size: 33  Mean of sample: 32.42 SD of sample: 29.8 
## 
## Null hypothesis H0 : Sachin Tendulkar 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Sachin Tendulkar 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Sachin Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713  is less than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./kallis.csv","Jacques Kallis")
## *******************************************************************************************
## 
## Population size: 240  Mean of population: 47.5 
## Sample size: 27  Mean of sample: 47.11 SD of sample: 59.19 
## 
## Null hypothesis H0 : Jacques Kallis 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Jacques Kallis 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Jacques Kallis 's Form Status: In-Form because the p value: 0.48647  is greater than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./ponting.csv","Ricky Ponting")
## *******************************************************************************************
## 
## Population size: 251  Mean of population: 47.5 
## Sample size: 28  Mean of sample: 36.25 SD of sample: 48.11 
## 
## Null hypothesis H0 : Ricky Ponting 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Ricky Ponting 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Ricky Ponting 's Form Status: In-Form because the p value: 0.113115  is greater than alpha=  0.05"
## *******************************************************************************************
checkBatsmanInForm("./sangakkara.csv","K Sangakkara")
## *******************************************************************************************
## 
## Population size: 193  Mean of population: 51.92 
## Sample size: 22  Mean of sample: 71.73 SD of sample: 82.87 
## 
## Null hypothesis H0 : K Sangakkara 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : K Sangakkara 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "K Sangakkara 's Form Status: In-Form because the p value: 0.862862  is greater than alpha=  0.05"
## *******************************************************************************************
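
The in-form check described above can be sketched in Python from the summary statistics alone. This is a minimal sketch, not the package's implementation: it assumes the check is a one-sided, one-sample t-test of the sample mean against the population mean, and the helper names (`student_t_cdf`, `form_p_value`, `form_status`) are hypothetical. The t distribution is integrated numerically so that only the standard library is needed.

```python
import math

def student_t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t with df degrees of freedom (Simpson's rule)."""
    # Log of the normalising constant of the t density.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Integrate the density from 0 to |t|, then use symmetry about 0.
    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def form_p_value(pop_mean, sample_mean, sample_sd, sample_size):
    """One-sided p-value: how likely is a sample mean this low under H0?"""
    t_stat = (sample_mean - pop_mean) / (sample_sd / math.sqrt(sample_size))
    return student_t_cdf(t_stat, sample_size - 1)

def form_status(p_value, alpha=0.05):
    return "In-Form" if p_value >= alpha else "Out-of-Form"

# Summary statistics copied from the Tendulkar and Kallis output above.
p_tendulkar = form_p_value(50.48, 32.42, 29.8, 33)
p_kallis = form_p_value(47.5, 47.11, 59.19, 27)
print("Tendulkar:", form_status(p_tendulkar), round(p_tendulkar, 6))
print("Kallis:", form_status(p_kallis), round(p_kallis, 6))
```

Because the inputs are the rounded figures printed in the output, the p-values from this sketch should land close to, but not exactly on, the reported ones.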

3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls Faced and Minutes at Crease. A prediction plane is fitted to the points.

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./tendulkar.csv","Tendulkar")
battingPerf3d("./kallis.csv","Kallis")
plot-3-1
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./ponting.csv","Ponting")
battingPerf3d("./sangakkara.csv","Sangakkara")
plot-4-1
dev.off()
## null device 
##           1

Predicting Runs given Balls Faced and Minutes at Crease

A multivariate regression plane is fitted between Runs and Balls Faced + Minutes at Crease. A sample sequence of Balls Faced (BF) and Minutes at Crease (Mins) is set up as shown below. The fitted model is used to predict the runs for these values.

BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
tendulkar <- batsmanRunsPredict("./tendulkar.csv","Tendulkar",newdataframe=newDF)
kallis <- batsmanRunsPredict("./kallis.csv","Kallis",newdataframe=newDF)
ponting <- batsmanRunsPredict("./ponting.csv","Ponting",newdataframe=newDF)
sangakkara <- batsmanRunsPredict("./sangakkara.csv","Sangakkara",newdataframe=newDF)
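
The fit-then-predict step above can be sketched in plain Python. This is a minimal sketch under stated assumptions: it fits Runs = b0 + b1*BF + b2*Mins by ordinary least squares through the normal equations, which may differ from what the package does internally. The function names and the sample innings are hypothetical, invented for illustration.

```python
def linspace(start, stop, n):
    """n evenly spaced values from start to stop, like R's seq(..., length=n)."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_plane(bf, mins, runs):
    """Least-squares coefficients [b0, b1, b2] for runs = b0 + b1*bf + b2*mins."""
    rows = [(1.0, x, y) for x, y in zip(bf, mins)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, runs)) for i in range(3)]
    return solve(xtx, xty)

def predict(coef, bf, mins):
    return [coef[0] + coef[1] * x + coef[2] * y for x, y in zip(bf, mins)]

# Hypothetical innings (balls faced, minutes at crease, runs).
bf = [12, 45, 80, 150, 210, 33, 95, 260]
mins = [20, 70, 100, 230, 300, 40, 160, 350]
runs = [5, 22, 47, 80, 120, 15, 50, 140]
coef = fit_plane(bf, mins, runs)

# Same grid as the R code above: 15 points for BF and for Mins.
new_bf = linspace(10, 400, 15)
new_mins = linspace(30, 600, 15)
for b, m, r in zip(new_bf, new_mins, predict(coef, new_bf, new_mins)):
    print(round(b), round(m), round(r))
```

The `linspace` helper reproduces the BallsFaced and MinsAtCrease columns of the table further down once the values are rounded.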

The fitted model is then used to predict the runs that the batsmen will score for a given Balls Faced and Minutes at Crease. It can be seen that Ponting is predicted to score the most runs for a given Balls Faced and Minutes at Crease.

Ponting is followed by Tendulkar, who has Sangakkara close on his heels, and finally we have Kallis. This is intuitive, as we have already seen that Ponting has the highest strike rate.

batsmen <-cbind(round(tendulkar$Runs),round(kallis$Runs),round(ponting$Runs),round(sangakkara$Runs))
colnames(batsmen) <- c("Tendulkar","Kallis","Ponting","Sangakkara")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Tendulkar Kallis Ponting Sangakkara
## 1          10           30         7      6       9          2
## 2          38           71        23     20      25         18
## 3          66          111        39     34      42         34
## 4          94          152        54     48      59         50
## 5         121          193        70     62      76         66
## 6         149          234        86     76      93         82
## 7         177          274       102     90     110         98
## 8         205          315       118    104     127        114
## 9         233          356       134    118     144        130
## 10        261          396       150    132     161        146
## 11        289          437       165    146     178        162
## 12        316          478       181    159     194        178
## 13        344          519       197    173     211        194
## 14        372          559       213    187     228        210
## 15        400          600       229    201     245        226

Check out my book ‘Deep Learning from first principles: Second Edition – In vectorized Python, R and Octave’. My book is available on Amazon as paperback ($18.99) and in kindle version ($9.99/Rs449).

You may also like my companion book “Practical Machine Learning with R and Python: Second Edition – Machine Learning in stereo”, available on Amazon in paperback ($12.99) and Kindle ($9.99/Rs449) versions.

Analysis of Top 3 wicket takers

The top 3 wicket takers in Test history are
1. M Muralitharan: Wickets: 800, Average: 22.72, Economy Rate: 2.47
2. Shane Warne: Wickets: 708, Average: 25.41, Economy Rate: 2.65
3. Anil Kumble: Wickets: 619, Average: 29.65, Economy Rate: 2.69

How do Anil Kumble, Shane Warne and M Muralitharan compare with one another with respect to wickets taken and Economy Rate? The next set of plots computes and plots precisely these analyses.

Wicket Frequency Plot

The plot below computes the percentage frequency of the number of wickets taken (e.g. 1 wicket x%, 2 wickets y%, etc.) and plots them as a continuous line.

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./kumble.csv","Anil Kumble")
bowlerWktsFreqPercent("./warne.csv","Shane Warne")
bowlerWktsFreqPercent("./murali.csv","M Muralitharan")

relBowlFP-1

dev.off()
## null device 
##           1
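
The percentage frequency behind this plot can be sketched in a few lines of Python. This is only an illustration: the function name and the sample data are hypothetical, and the package may bin or smooth the values differently.

```python
from collections import Counter

def wickets_freq_percent(wickets_per_innings):
    """Map each wicket count to the percentage of innings in which it occurred."""
    counts = Counter(wickets_per_innings)
    total = len(wickets_per_innings)
    return {w: 100.0 * counts[w] / total for w in sorted(counts)}

# Hypothetical wickets taken per innings, invented for illustration.
sample = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]
for wickets, pct in wickets_freq_percent(sample).items():
    print(f"{wickets} wickets: {pct:.1f}%")
```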

Wickets Runs plot

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerWktsRunsPlot("./kumble.csv","Kumble")
bowlerWktsRunsPlot("./warne.csv","Warne")
bowlerWktsRunsPlot("./murali.csv","Muralitharan")
wktsrun-1
dev.off()
## null device 
##           1

Average wickets at different venues

The plot gives the average wickets taken by Muralitharan at different venues. Muralitharan has taken an average of 8 and 6 wickets at the Oval & Wellington respectively in 2 different innings. His best performances are at Kandy and Colombo (SSC).

bowlerAvgWktsGround("./murali.csv","Muralitharan")
avgWktshrg-1

Average wickets against different opposition

The plot gives the average wickets taken by Muralitharan against different countries. The x-axis also includes the number of innings against each team

bowlerAvgWktsOpposition("./murali.csv","Muralitharan")
avgWktoppn-1

Relative Wickets Frequency Percentage

The Relative Wickets Percentage plot shows that M Muralitharan has a large percentage of wickets in the 3-8 wicket range

frames <- list("./kumble.csv","./murali.csv","./warne.csv")
names <- list("Anil Kumble","M Muralitharan","Shane Warne")
relativeBowlingPerf(frames,names)

relBowlPerf-1

Relative Economy Rate against wickets taken

Clearly from the plot below it can be seen that Muralitharan has the best Economy Rate among the three

frames <- list("./kumble.csv","./murali.csv","./warne.csv")
names <- list("Anil Kumble","M Muralitharan","Shane Warne")
relativeBowlingER(frames,names)

relBowlER-1

Wickets taken moving average

From the plot below it can be seen that
1. Shane Warne’s performance at the time of his retirement was still at a peak of 3 wickets
2. M Muralitharan seems to have become ineffective over time, with his peak years being 2004-2006
3. Anil Kumble also seems to slump and become less effective.

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./kumble.csv","Anil Kumble")
bowlerMovingAverage("./warne.csv","Shane Warne")
bowlerMovingAverage("./murali.csv","M Muralitharan")

tkps-bowlma-1

dev.off()
## null device 
##           1
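
As a rough illustration of what a moving average over a career looks like, here is a trailing-mean sketch in Python. It is an assumption that a plain rolling mean is a fair stand-in: the package may use a different smoother, and the function name, window size and data are hypothetical.

```python
def moving_average(values, window):
    """Trailing mean over the last `window` values (fewer at the start)."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out

# Hypothetical wickets per innings in career order.
career = [2, 3, 5, 4, 1, 0, 3, 6, 2, 2, 1, 3]
print([round(x, 2) for x in moving_average(career, window=4)])
```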

Future Wickets forecast

Here are plots that forecast how the bowler will perform in future. In this case 90% of the career wickets trend is used as the training set. The remaining 10% is the test set.

A Holt-Winters forecasting model is used to forecast future performance based on the 90% training set. The forecasted wickets trend is plotted. The test set is also plotted to see how closely the forecast and the actual values match.

Take a look at the wickets forecasted for the bowlers below.
– Shane Warne and Muralitharan have a fairly consistent forecast
– Kumble’s forecast shows a small dip

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerPerfForecast("./kumble.csv","Anil Kumble")
bowlerPerfForecast("./warne.csv","Shane Warne")
bowlerPerfForecast("./murali.csv","M Muralitharan")

kwm-perffcst-1

dev.off()
## null device 
##           1
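
The train/forecast/compare procedure described above can be sketched in Python. This is a simplified stand-in rather than the package's model: it uses Holt's linear-trend exponential smoothing, i.e. Holt-Winters without a seasonal component, with fixed smoothing parameters that are assumptions rather than fitted values. The function names and the data are hypothetical.

```python
def split_series(series, train_fraction=0.9):
    """Split a series into a leading training part and a trailing test part."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

def holt_forecast(train, horizon, alpha=0.5, beta=0.3):
    """Holt's linear-trend exponential smoothing (no seasonal term)."""
    level = train[0]
    trend = train[1] - train[0]
    for y in train[1:]:
        prev_level = level
        # Blend the new observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]

# Hypothetical wickets-per-innings trend, invented for illustration.
wickets = [3, 2, 4, 3, 5, 2, 3, 4, 3, 2, 4, 3, 3, 2, 4, 3, 2, 3, 3, 2]
train, test = split_series(wickets)
forecast = holt_forecast(train, len(test))

# Mean absolute error between the forecast and the held-out 10%.
mae = sum(abs(f - a) for f, a in zip(forecast, test)) / len(test)
print([round(f, 2) for f in forecast], test, round(mae, 2))
```

Comparing the forecast with the held-out test set, here through a mean absolute error, plays the same role as overlaying the test set on the forecast plot.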

Contribution to matches won and lost

The plot below is extremely interesting
1. Kumble’s wickets range from 2 to 4 in matches won, with a mean of 3
2. Warne’s wickets in won matches range from 1 to 4, with more matches won. Clearly there are other bowlers contributing to the wins, possibly the pacers
3. Muralitharan’s wickets range in winning matches is higher than that of the other 2, from 3 to 5, and he clearly had a hand (pun unintended) in Sri Lanka’s wins

As discussed above the next 2 charts require the use of getPlayerDataSp()

kumblesp 
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerContributionWonLost("kumblesp.csv","Kumble")
bowlerContributionWonLost("warnesp.csv","Warne")
bowlerContributionWonLost("muralisp.csv","Murali")

kwm-wl-1

dev.off()
## null device 
##           1

Performance home and overseas

From the plot below it can be seen that Kumble & Warne have played more matches overseas than Muralitharan. Both Kumble and Warne show an average of 2 wickets overseas. Murali on the other hand has an average of 2.5 wickets overseas, but from slightly fewer matches than Kumble & Warne.

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
bowlerPerfHomeAway("kumblesp.csv","Kumble")
bowlerPerfHomeAway("warnesp.csv","Warne")
bowlerPerfHomeAway("muralisp.csv","Murali")

kwm-ha-1
dev.off()
## null device 
##           1
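
The home versus overseas comparison boils down to a grouped count and mean. A minimal Python sketch follows, assuming each record carries a home/overseas label and the wickets taken. The function name and the records are hypothetical and do not reflect the actual layout of the files returned by getPlayerDataSp().

```python
from collections import defaultdict

def mean_wickets_by_venue(innings):
    """Return {venue_type: (innings_count, mean_wickets)} from (venue_type, wickets) pairs."""
    totals = defaultdict(lambda: [0, 0])  # venue_type -> [count, total wickets]
    for venue_type, wickets in innings:
        totals[venue_type][0] += 1
        totals[venue_type][1] += wickets
    return {k: (n, w / n) for k, (n, w) in totals.items()}

# Hypothetical records, invented for illustration.
records = [("home", 4), ("home", 3), ("overseas", 2), ("overseas", 3), ("home", 5)]
for venue, (count, mean) in mean_wickets_by_venue(records).items():
    print(f"{venue}: {count} innings, mean {mean:.2f} wickets")
```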
 

Check for bowler in-form/out-of-form

The computation below uses null hypothesis testing and a p-value to determine whether the bowler is in-form or out-of-form. For this, the first 90% of the career wickets is chosen as the population and its mean is computed. The last 10% is chosen as the sample set, and the sample mean and the sample standard deviation are calculated.

The Null Hypothesis (H0) assumes that the bowler continues to stay in-form, i.e. the sample mean is within the 95% confidence interval of the population mean. The Alternative Hypothesis (Ha) assumes that the bowler is out of form, i.e. the sample mean is below the 95% confidence interval of the population mean.

A significance level of 0.05 is chosen and the p-value is computed.
If p-value >= 0.05 – Bowler In-Form
If p-value < 0.05 – Bowler Out-of-Form

Note: Ideally the p-value should be computed for a population that follows the Normal distribution. But the wickets population is usually skewed, so some correction may be needed. I will revisit this later.

Note: The check for the form status of the bowlers indicates
1. That both Kumble and Muralitharan were out of form. This also shows in the moving average plot
2. Warne is still in great form and could have continued for a few more years. Too bad we didn’t see the magic later

checkBowlerInForm("./kumble.csv","Anil Kumble")
## *******************************************************************************************
## 
## Population size: 212  Mean of population: 2.69 
## Sample size: 24  Mean of sample: 2.04 SD of sample: 1.55 
## 
## Null hypothesis H0 : Anil Kumble 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Anil Kumble 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Anil Kumble 's Form Status: Out-of-Form because the p value: 0.02549  is less than alpha=  0.05"
## *******************************************************************************************
checkBowlerInForm("./warne.csv","Shane Warne")
## *******************************************************************************************
## 
## Population size: 240  Mean of population: 2.55 
## Sample size: 27  Mean of sample: 2.56 SD of sample: 1.8 
## 
## Null hypothesis H0 : Shane Warne 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : Shane Warne 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "Shane Warne 's Form Status: In-Form because the p value: 0.511409  is greater than alpha=  0.05"
## *******************************************************************************************
checkBowlerInForm("./murali.csv","M Muralitharan")
## *******************************************************************************************
## 
## Population size: 207  Mean of population: 3.55 
## Sample size: 23  Mean of sample: 2.87 SD of sample: 1.74 
## 
## Null hypothesis H0 : M Muralitharan 's sample average is within 95% confidence interval 
##         of population average
## Alternative hypothesis Ha : M Muralitharan 's sample average is below the 95% confidence
##         interval of population average
## 
## [1] "M Muralitharan 's Form Status: Out-of-Form because the p value: 0.036828  is less than alpha=  0.05"
## *******************************************************************************************
dev.off()
## null device 
##           1

Key Findings

The plots above capture some of the capabilities and features of my cricketr package. Feel free to install the package and try it out. Please do keep in mind ESPN Cricinfo’s Terms of Use.
Here are the main findings from the analysis above

Analysis of Top 4 batsmen

The analysis of the Top 4 Test batsmen Tendulkar, Kallis, Ponting and Sangakkara shows the following

  1. Sangakkara has the highest average, followed by Tendulkar, Kallis and then Ponting.
  2. Ponting has the highest strike rate, followed by Tendulkar, Sangakkara and then Kallis
  3. The predicted runs for a given Balls Faced and Minutes at Crease is highest for Ponting, followed by Tendulkar, Sangakkara and Kallis
  4. The moving average for Tendulkar and Ponting shows a downward trend while Kallis and Sangakkara retired too soon
  5. Tendulkar was out of form about the time of retirement while the rest were in-form. But this result has to be taken along with the moving average plot. Ponting was clearly on the way out.
  6. The home and overseas performance indicates that Tendulkar is the clear leader. He has the highest number of matches played overseas and his performance has been consistent. He is followed by Ponting, Kallis and finally Sangakkara

Analysis of Top 3 spinners

The analysis of Anil Kumble, Shane Warne and M Muralitharan shows the following

  1. Muralitharan has the most wickets and the best economy rate, followed by Warne and Kumble
  2. Muralitharan has a higher wickets frequency percentage between 3 and 8 wickets
  3. Muralitharan has the best Economy Rate for wickets between 2 and 7
  4. The moving average plot shows that the time was up for Kumble and Muralitharan but Warne had a few years ahead
  5. The check for form status shows that Muralitharan’s and Kumble’s time was over while Warne was still in great form
  6. Kumble has more matches abroad than the other 2, yet Kumble averages 3 wickets at home and 2 wickets overseas like Warne. Murali has played fewer matches but has an average of 4 wickets at home and 3 wickets overseas.

Final thoughts

Here are my final thoughts

Batting

Among the 4 batsmen Tendulkar, Kallis, Ponting and Sangakkara the clear leader is Tendulkar for the following reasons

  1. Tendulkar has the highest Test centuries and runs of all time. Tendulkar’s average is 2nd to Sangakkara’s, and Tendulkar’s predicted runs for a given Balls Faced and Minutes at Crease is 2nd, behind Ponting. Also Tendulkar’s performance at home and overseas is consistent throughout despite the fact that he has the highest number of overseas matches
  2. Ponting takes the 2nd spot with the 3rd highest number of centuries among the four, 1st in Strike Rate and 2nd in home and away performance.
  3. The 3rd spot goes to Sangakkara, with the highest average, the 4th highest number of centuries among the four, and a reasonable run frequency percentage in different run ranges. However he has fewer matches overseas and his performance overseas is significantly lower than at home
  4. Kallis has the 2nd highest number of centuries but his performance overseas and strike rate are behind the others
  5. Finally Kallis and Sangakkara had a few good years of batting still left in them (pity they retired!) while Tendulkar and Ponting’s time was up

Bowling

Muralitharan leads the way followed closely by Warne and finally Kumble. The reasons are

  1. Muralitharan has the highest number of Test wickets with the best Wickets percentage and the best Economy Rate. Murali on average has taken 4 wickets at home and 3 wickets overseas
  2. Warne follows Murali in the highest wickets taken, however Warne has fewer matches overseas than Murali and averages 3 wickets at home and 2 wickets overseas
  3. Kumble has the 3rd highest wickets, with 3 wickets on average at home and 2 wickets overseas. However Kumble has played more matches overseas than the other two. In that respect his performance is great. Also Kumble has played fewer matches at home, otherwise his numbers would have looked even better.
  4. Also while Kumble’s and Muralitharan’s careers were on the decline, Warne was going great and had a couple of years ahead.

You can download this analysis at Introducing cricketr

Hope you have as much fun using the cricketr package as I had in developing it. Do take a look at my follow up post Taking cricketr for a spin – Part 1

Do take a look at my 2nd package in “The making of cricket package yorkr – Part 1”

Also see
1. My book “Deep Learning from first principles” now on Amazon
2. My book ‘Practical Machine Learning with R and Python’ on Amazon
3. Taking cricketr for a spin – Part 1
4. cricketr plays the ODIs
5. cricketr adapts to the Twenty20 International
6. Analyzing cricket’s batting legends – Through the mirage with R
7. Masters of spin: Unraveling the web with R
8. Mirror,mirror …best batsman of them all

You may also like
1. A crime map of India in R: Crimes against women
2.  What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
3.  Bend it like Bluemix, MongoDB with autoscaling – Part 2
4. Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
5. Thinking Web Scale (TWS-3): Map-Reduce – Bring compute to data
6. Deblurring with OpenCV:Weiner filter reloaded
7. Fun simulation of a Chain in Android

http://www.r-bloggers.com/introducing-cricketr-an-r-package-to-analyze-performances-of-cricketers/