Pitching yorkpy … short of good length to IPL – Part 1


I fear not the man who has practiced 10,000 kicks once, but I fear the man who has practiced one kick 10,000 times.
Bruce Lee

I’ve missed more than 9000 shots in my career. I’ve lost almost 300 games. 26 times, I’ve been trusted to take the game winning shot and missed. I’ve failed over and over and over again in my life. And that is why I succeed.
Michael Jordan

Man, it doesn’t matter where you come in to bat, the score is still zero
Viv Richards

Introduction

“If cricketr is to cricpy, then yorkr is to _____?”. Yes, you guessed it right, it is yorkpy. In this post, I introduce my 2nd python package, yorkpy, which is a python clone of my R package yorkr. This package is based on data from Cricsheet. yorkpy currently handles IPL T20 matches.

When I created cricpy, the python avatar, of my R package cricketr, see Introducing cricpy:A python package to analyze performances of cricketers, I had decided that I should avoid doing a python avatar of my R package yorkr (see Introducing cricket package yorkr: Part 1- Beaten by sheer pace!) , as it was more involved, and required the parsing of match data available as yaml files.

Just out of curiosity, I tried the python package ‘yaml’ to read the match data, and lo and behold, I was sucked into the developing the package and so, yorkpy was born. Of course, it goes without saying that, usually when I am in the thick of developing something, I occasionally wonder, why I am doing it, for whom and for what purpose? Maybe it is the joy of ideation, the problem-solving,  the programmer’s high, for sharing my ideas etc. Anyway, whatever be the reason, I hope you enjoy this post and also find yorkpy useful.

You can clone/download the code at Github yorkpy
This post has been published to RPubs at yorkpy-Part1
You can download this post as PDF at IPLT20-yorkpy-part1

The IPL T20 functions in yorkpy are

2. Install the package using ‘pip install’

import pandas as pd
import yorkpy.analytics as yka
#pip install yorkpy

3. Load a yaml file from Cricsheet

There are 2 functions that can be to convert the IPL Twenty20 yaml files to pandas dataframeare

  1. convertYaml2PandasDataframeT20
  2. convertAllYaml2PandasDataframesT20

Note 1: While I have already converted the IPL T20 files, you will need to use these functions for future IPL matches

4. Convert and save IPL T20 yaml file to pandas dataframe

This function will convert a IPL T20 IPL yaml file, in the format as specified in Cricsheet to pandas dataframe. This will be saved as as CSV file in the target directory. The name of the file wil have the following format team1-team2-date.csv. The IPL T20 zip file can be downloaded from Indian Premier League matches.  An example of how a yaml file can be converted to a dataframe and saved is shown below.

import pandas as pd
import yorkpy.analytics as yka
#convertYaml2PandasDataframe(".\\1082593.yaml","..\ipl", ..\\data")

5. Convert and save all IPL T20 yaml files to dataframes

This function will convert all IPL T20 yaml files from a source directory to dataframes, and save it in the target directory, with the names as mentioned above. Since I have already done this, I will not be executing this again. You can download the zip of all the converted RData files from Github at yorkpyData

import pandas as pd
import yorkpy.analytics as yka
#convertAllYaml2PandasDataframes("..\\ipl", "..\\data")

You can download the the zip of the files and use it directly in the functions as follows.For the analysis below I chosen a set of random IPL matches

The randomly selected IPL T20 matches are

  • Chennai Super Kings vs Kings Xi Punjab, 2014-05-30
  • Deccan Chargers vs Delhi Daredevils, 2012-05-10
  • Gujarat Lions vs Mumbai Indians, 2017-04-29
  • Kolkata Knight Riders vs Rajasthan Royals, 2010-04-17
  • Rising Pune Supergiants vs Royal Challengers Bangalore, 2017-04-29

6. Team batting scorecard

The function below computes the batting score card of a team in an IPL match. The scorecard gives the balls faced, the runs scored, 4s, 6s and strike rate. The example below is based on the CSK KXIP match on 30 May 2014.

You can check against the actual scores in this match Chennai Super Kings-Kings XI Punjab-2014-05-30

import pandas as pd
import yorkpy.analytics as yka
csk_kxip=pd.read_csv(".\\Chennai Super Kings-Kings XI Punjab-2014-05-30.csv")
scorecard,extras=yka.teamBattingScorecardMatch(csk_kxip,"Chennai Super Kings")
print(scorecard)
##         batsman  runs  balls  4s  6s          SR
## 0      DR Smith     7     12   0   0   58.333333
## 1  F du Plessis     0      1   0   0    0.000000
## 2      SK Raina    87     26  12   0  334.615385
## 3   BB McCullum    11     16   0   0   68.750000
## 4     RA Jadeja    27     22   2   0  122.727273
## 5     DJ Hussey     1      3   0   0   33.333333
## 6      MS Dhoni    42     34   3   0  123.529412
## 7      R Ashwin    10     11   0   0   90.909091
## 8     MM Sharma     1      3   0   0   33.333333
print(extras)
##    total  wides  noballs  legbyes  byes  penalty  extras
## 0    428     14        3        5     5        0      27
print("\n\n")
scorecard1,extras1=yka.teamBattingScorecardMatch(csk_kxip,"Kings XI Punjab")
print(scorecard1)
##       batsman  runs  balls  4s  6s          SR
## 0    V Sehwag   122     62  12   0  196.774194
## 1     M Vohra    34     33   1   0  103.030303
## 2  GJ Maxwell    13      8   1   0  162.500000
## 3   DA Miller    38     19   5   0  200.000000
## 4   GJ Bailey     1      2   0   0   50.000000
## 5     WP Saha     6      4   0   0  150.000000
## 6  MG Johnson     1      1   0   0  100.000000
print(extras1)
##    total  wides  noballs  legbyes  byes  penalty  extras
## 0    428     14        3        5     5        0      27

Let’s take another random match between Gujarat Lions and Mumbai Indian on 29 Apr 2017 Gujarat Lions-Mumbai Indians-2017-04-29

import pandas as pd
gl_mi=pd.read_csv(".\\Gujarat Lions-Mumbai Indians-2017-04-29.csv")
import yorkpy.analytics as yka
scorecard,extras=yka.teamBattingScorecardMatch(gl_mi,"Gujarat Lions")
print(scorecard)
##          batsman  runs  balls  4s  6s          SR
## 0   Ishan Kishan    48     38   6   0  126.315789
## 1    BB McCullum     6      4   1   0  150.000000
## 2       SK Raina     1      3   0   0   33.333333
## 3       AJ Finch     0      3   0   0    0.000000
## 4     KD Karthik     2      9   0   0   22.222222
## 5      RA Jadeja    28     22   2   0  127.272727
## 6    JP Faulkner    21     29   2   0   72.413793
## 7      IK Pathan     2      3   0   0   66.666667
## 8         AJ Tye    25     12   2   0  208.333333
## 9   Basil Thampi     2      4   0   0   50.000000
## 10    Ankit Soni     7      2   0   0  350.000000
print(extras)
##    total  wides  noballs  legbyes  byes  penalty  extras
## 0    306      8        3        1     0        0      12
print("\n\n")
scorecard1,extras1=yka.teamBattingScorecardMatch(gl_mi,"Mumbai Indians")
print(scorecard1)
##             batsman  runs  balls  4s  6s          SR
## 0          PA Patel    70     45   9   0  155.555556
## 1        JC Buttler     9      7   2   0  128.571429
## 2            N Rana    19     16   1   0  118.750000
## 3         RG Sharma     5     13   0   0   38.461538
## 4        KA Pollard    15     11   2   0  136.363636
## 5         KH Pandya    29     20   2   0  145.000000
## 6         HH Pandya     4      5   0   0   80.000000
## 7   Harbhajan Singh     0      1   0   0    0.000000
## 8    MJ McClenaghan     1      1   0   0  100.000000
## 9         JJ Bumrah     0      1   0   0    0.000000
## 10       SL Malinga     0      1   0   0    0.000000
print(extras1)
##    total  wides  noballs  legbyes  byes  penalty  extras
## 0    306      8        3        1     0        0      12

7. Plot the team batting partnerships

The functions below plot the team batting partnership in the match. It shows what the partnership were in the mtach

Note: Many of the plots include an additional parameters plot which is either True or False. The default value is plot=True. When plot=True the plot will be displayed. When plot=False the data frame will be returned to the user. The user can use this to create an interactive chart using one of the packages like rcharts, ggvis,googleVis or plotly.

import pandas as pd
import yorkpy.analytics as yka
dc_dd=pd.read_csv(".\\Deccan Chargers-Delhi Daredevils-2012-05-10.csv")
yka.teamBatsmenPartnershipMatch(dc_dd,'Deccan Chargers','Delhi Daredevils')

yka.teamBatsmenPartnershipMatch(dc_dd,'Delhi Daredevils','Deccan Chargers',plot=True)
# Print partnerships as a dataframe

rps_rcb=pd.read_csv(".\\Rising Pune Supergiant-Royal Challengers Bangalore-2017-04-29.csv")
m=yka.teamBatsmenPartnershipMatch(rps_rcb,'Royal Challengers Bangalore','Rising Pune Supergiant',plot=False)
print(m)
##            batsman     non_striker  runs
## 0   AB de Villiers         V Kohli     3
## 1         AF Milne         V Kohli     5
## 2        KM Jadhav         V Kohli     7
## 3           P Negi         V Kohli     3
## 4        S Aravind         V Kohli     0
## 5        S Aravind       YS Chahal     8
## 6         S Badree         V Kohli     2
## 7        STR Binny         V Kohli     1
## 8      Sachin Baby         V Kohli     2
## 9          TM Head         V Kohli     2
## 10         V Kohli  AB de Villiers    17
## 11         V Kohli        AF Milne     5
## 12         V Kohli       KM Jadhav     4
## 13         V Kohli          P Negi     9
## 14         V Kohli       S Aravind     2
## 15         V Kohli        S Badree     8
## 16         V Kohli     Sachin Baby     1
## 17         V Kohli         TM Head     9
## 18       YS Chahal       S Aravind     4

8. Batsmen vs Bowler

The function below computes and plots the performances of the batsmen vs the bowlers. As before the plot parameter can be set to True or False. By default it is plot=True

import pandas as pd
import yorkpy.analytics as yka
gl_mi=pd.read_csv(".\\Gujarat Lions-Mumbai Indians-2017-04-29.csv")
yka.teamBatsmenVsBowlersMatch(gl_mi,"Gujarat Lions","Mumbai Indians", plot=True)
# Print 

csk_kxip=pd.read_csv(".\\Chennai Super Kings-Kings XI Punjab-2014-05-30.csv")
m=yka.teamBatsmenVsBowlersMatch(csk_kxip,'Chennai Super Kings','Kings XI Punjab',plot=False)
print(m)
##          batsman           bowler  runs
## 0    BB McCullum         AR Patel     4
## 1    BB McCullum       GJ Maxwell     1
## 2    BB McCullum  Karanveer Singh     6
## 3      DJ Hussey          P Awana     1
## 4       DR Smith       MG Johnson     7
## 5       DR Smith          P Awana     0
## 6       DR Smith   Sandeep Sharma     0
## 7   F du Plessis       MG Johnson     0
## 8      MM Sharma         AR Patel     0
## 9      MM Sharma       MG Johnson     0
## 10     MM Sharma          P Awana     1
## 11      MS Dhoni         AR Patel    12
## 12      MS Dhoni  Karanveer Singh     2
## 13      MS Dhoni       MG Johnson    11
## 14      MS Dhoni          P Awana    15
## 15      MS Dhoni   Sandeep Sharma     2
## 16      R Ashwin         AR Patel     1
## 17      R Ashwin  Karanveer Singh     4
## 18      R Ashwin       MG Johnson     1
## 19      R Ashwin          P Awana     1
## 20      R Ashwin   Sandeep Sharma     3
## 21     RA Jadeja         AR Patel     5
## 22     RA Jadeja       GJ Maxwell     3
## 23     RA Jadeja  Karanveer Singh    19
## 24     RA Jadeja          P Awana     0
## 25      SK Raina       MG Johnson    21
## 26      SK Raina          P Awana    40
## 27      SK Raina   Sandeep Sharma    26

9. Bowling Scorecard

This function provides the bowling performance, the number of overs bowled, maidens, runs conceded. wickets taken and economy rate for the IPL match

import pandas as pd
import yorkpy.analytics as yka
dc_dd=pd.read_csv(".\\Deccan Chargers-Delhi Daredevils-2012-05-10.csv")
a=yka.teamBowlingScorecardMatch(dc_dd,'Deccan Chargers')
print(a)
##        bowler  overs  runs  maidens  wicket  econrate
## 0  AD Russell      4    39        0       0      9.75
## 1   IK Pathan      4    46        0       1     11.50
## 2    M Morkel      4    32        0       1      8.00
## 3    S Nadeem      4    39        0       0      9.75
## 4    VR Aaron      4    30        0       2      7.50
rps_rcb=pd.read_csv(".\\Rising Pune Supergiant-Royal Challengers Bangalore-2017-04-29.csv")
b=yka.teamBowlingScorecardMatch(rps_rcb,'Royal Challengers Bangalore')
print(b)
##               bowler  overs  runs  maidens  wicket  econrate
## 0          DL Chahar      2    18        0       0      9.00
## 1       DT Christian      4    25        0       1      6.25
## 2        Imran Tahir      4    18        0       3      4.50
## 3         JD Unadkat      4    19        0       1      4.75
## 4        LH Ferguson      4     7        1       3      1.75
## 5  Washington Sundar      2     7        0       1      3.50

10. Wicket Kind

The plots below provide the kind of wicket taken by the bowler (caught, bowled, lbw etc.) for the IPL match

import pandas as pd
import yorkpy.analytics as yka
kkr_rr=pd.read_csv(".\\Kolkata Knight Riders-Rajasthan Royals-2010-04-17.csv")
yka.teamBowlingWicketKindMatch(kkr_rr,'Kolkata Knight Riders','Rajasthan Royals')

csk_kxip=pd.read_csv(".\\Chennai Super Kings-Kings XI Punjab-2014-05-30.csv")
m = yka.teamBowlingWicketKindMatch(csk_kxip,'Chennai Super Kings','Kings-Kings XI Punjab',plot=False)
print(m)
##             bowler     kind  player_out
## 0         AR Patel  run out           1
## 1         AR Patel  stumped           1
## 2  Karanveer Singh  run out           1
## 3       MG Johnson   caught           1
## 4          P Awana   caught           2
## 5   Sandeep Sharma   bowled           1

11. Wicket vs Runs conceded

The plots below provide the wickets taken and the runs conceded by the bowler in the IPL T20 match

import pandas as pd
import yorkpy.analytics as yka
dc_dd=pd.read_csv(".\\Deccan Chargers-Delhi Daredevils-2012-05-10.csv")
yka.teamBowlingWicketMatch(dc_dd,"Deccan Chargers", "Delhi Daredevils",plot=True)

print("\n\n")
rps_rcb=pd.read_csv(".\\Rising Pune Supergiant-Royal Challengers Bangalore-2017-04-29.csv")
a=yka.teamBowlingWicketMatch(rps_rcb,"Royal Challengers Bangalore", "Rising Pune Supergiant",plot=False)
print(a)
##               bowler      player_out  kind
## 0       DT Christian         V Kohli     1
## 1        Imran Tahir        AF Milne     1
## 2        Imran Tahir          P Negi     1
## 3        Imran Tahir        S Badree     1
## 4         JD Unadkat         TM Head     1
## 5        LH Ferguson  AB de Villiers     1
## 6        LH Ferguson       KM Jadhav     1
## 7        LH Ferguson       STR Binny     1
## 8  Washington Sundar     Sachin Baby     1

12. Bowler Vs Batsmen

The functions compute and display how the different bowlers of the IPL team performed against the batting opposition.

import pandas as pd
import yorkpy.analytics as yka
csk_kxip=pd.read_csv(".\\Chennai Super Kings-Kings XI Punjab-2014-05-30.csv")
yka.teamBowlersVsBatsmenMatch(csk_kxip,"Chennai Super Kings","Kings XI Punjab")

print("\n\n")
kkr_rr=pd.read_csv(".\\Kolkata Knight Riders-Rajasthan Royals-2010-04-17.csv")
m =yka.teamBowlersVsBatsmenMatch(kkr_rr,"Rajasthan Royals","Kolkata Knight Riders",plot=False)
print(m)
##        batsman      bowler  runs
## 0     AC Voges    AB Dinda     1
## 1     AC Voges  JD Unadkat     1
## 2     AC Voges   LR Shukla     1
## 3     AC Voges    M Kartik     5
## 4     AJ Finch    AB Dinda     3
## 5     AJ Finch  JD Unadkat     3
## 6     AJ Finch   LR Shukla    13
## 7     AJ Finch    M Kartik     2
## 8     AJ Finch     SE Bond     0
## 9      AS Raut    AB Dinda     1
## 10     AS Raut  JD Unadkat     1
## 11    FY Fazal    AB Dinda     1
## 12    FY Fazal   LR Shukla     3
## 13    FY Fazal    M Kartik     3
## 14    FY Fazal     SE Bond     6
## 15     NV Ojha    AB Dinda    10
## 16     NV Ojha  JD Unadkat     5
## 17     NV Ojha   LR Shukla     0
## 18     NV Ojha    M Kartik     1
## 19     NV Ojha     SE Bond     2
## 20     P Dogra  JD Unadkat     2
## 21     P Dogra   LR Shukla     5
## 22     P Dogra    M Kartik     1
## 23     P Dogra     SE Bond     0
## 24  SK Trivedi    AB Dinda     4
## 25    SK Warne    AB Dinda     2
## 26    SK Warne    M Kartik     1
## 27    SK Warne     SE Bond     0
## 28   SR Watson    AB Dinda     2
## 29   SR Watson  JD Unadkat    13
## 30   SR Watson   LR Shukla     1
## 31   SR Watson    M Kartik    18
## 32   SR Watson     SE Bond    10
## 33   YK Pathan  JD Unadkat     1
## 34   YK Pathan   LR Shukla     7

13. Match worm chart

The plots below provide the match worm graph for the IPL Twenty 20 matches

import pandas as pd
import yorkpy.analytics as yka
dc_dd=pd.read_csv(".\\Deccan Chargers-Delhi Daredevils-2012-05-10.csv")
yka.matchWormChart(dc_dd,"Deccan Chargers", "Delhi Daredevils")

gl_mi=pd.read_csv(".\\Gujarat Lions-Mumbai Indians-2017-04-29.csv")
yka.matchWormChart(gl_mi,"Mumbai Indians","Gujarat Lions")

Feel free to clone/download the code from Github yorkpy

Conclusion

This post included all functions between 2 IPL teams from the package yorkpy for IPL Twenty20 matches. As mentioned above the yaml match files have been already converted to dataframes and are available for download from Github at yorkpyData

After having used Python and R for analytics, Machine Learning and Deep Learning, I have now realized that neither language is superior or inferior. Both have, some good packages and some that are not so well suited.

To be continued. Watch this space!

You may also like
1.My book ‘Deep Learning from first principles:Second Edition’ now on Amazon
2.My book ‘Practical Machine Learning in R and Python: Second edition’ on Amazon
2. Cricpy takes a swing at the ODIs
3. Introducing cricket package yorkr: Part 1- Beaten by sheer pace!
4. Big Data-1: Move into the big league:Graduate from Python to Pyspark
5. Simulating an Edge Shape in Android

To see all posts click Index of posts

Analyzing batsmen and bowlers with cricpy template


Introduction

This post shows how you can analyze batsmen and bowlers of Test, ODI and T20s using cricpy templates, using data from ESPN Cricinfo.

The cricpy package

The data for a particular player can be obtained with the getPlayerData() function. To do you will need to go to ESPN CricInfo Player and type in the name of the player for e.g Rahul Dravid, Virat Kohli  etc. This will bring up a page which have the profile number for the player e.g. for Rahul Dravid this would be http://www.espncricinfo.com/india/content/player/28114.html. Hence, Dravid’s profile is 28114. This can be used to get the data for Rahul Dravid as shown below

1. For Test players use batting and bowling.
2. For ODI use batting and bowling
3. For T20 use T20 Batting T20 Bowling

Please mindful of the  ESPN Cricinfo Terms of Use

My posts on Cripy were
a. Introducing cricpy:A python package to analyze performances of cricketers
b. Cricpy takes a swing at the ODIs
c. Cricpy takes guard for the Twenty20s

You can clone/download this cricpy template for your own analysis of players. This can be done using RStudio or IPython notebooks from Github at cricpy-template. You can uncomment the functions and use them.

The cricpy package is now available with pip install cricpy!!!

1 Importing cricpy – Python

# Install the package
# Do a pip install cricpy
# Import cricpy
import cricpy.analytics as ca 
## C:\Users\Ganesh\ANACON~1\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
##   from pandas.core import datetools

2. Invoking functions with Python package cricpy

import cricpy.analytics as ca 
#ca.batsman4s("aplayer.csv","A Player")

3. Getting help from cricpy – Python

import cricpy.analytics as ca
#help(ca.getPlayerData)

The details below will introduce the different functions that are available in cricpy.

4. Get the player data for a player using the function getPlayerData()

Important Note This needs to be done only once for a player. This function stores the player’s data in the specified CSV file (for e.g. dravid.csv as above) which can then be reused for all other functions). Once we have the data for the players many analyses can be done. This post will use the stored CSV file obtained with a prior getPlayerData for all subsequent analyses

4a. For Test players

import cricpy.analytics as ca
#player1 =ca.getPlayerData(profileNo1,dir="..",file="player1.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])
#player1 =ca.getPlayerData(profileNo2,dir="..",file="player2.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])

4b. For ODI players

import cricpy.analytics as ca
#player1 =ca.getPlayerDataOD(profileNo1,dir="..",file="player1.csv",type="batting")
#player1 =ca.getPlayerDataOD(profileNo2,dir="..",file="player2.csv",type="batting"")

4c For T20 players

import cricpy.analytics as ca
#player1 =ca.getPlayerDataTT(profileNo1,dir="..",file="player1.csv",type="batting")
#player1 =ca.getPlayerDataTT(profileNo2,dir="..",file="player2.csv",type="batting"")

5 A Player’s performance – Basic Analyses

The 3 plots below provide the following for Rahul Dravid

  1. Frequency percentage of runs in each run range over the whole career
  2. Mean Strike Rate for runs scored in the given range
  3. A histogram of runs frequency percentages in runs ranges
import cricpy.analytics as ca
import matplotlib.pyplot as plt
#ca.batsmanRunsFreqPerf("aplayer.csv","A Player")
#ca.batsmanMeanStrikeRate("aplayer.csv","A Player")
#ca.batsmanRunsRanges("aplayer.csv","A Player") 

6. More analyses

This gives details on the batsmen’s 4s, 6s and dismissals

import cricpy.analytics as ca
#ca.batsman4s("aplayer.csv","A Player")
#ca.batsman6s("aplayer.csv","A Player") 
#ca.batsmanDismissals("aplayer.csv","A Player")
# The below function is for ODI and T20 only
#ca.batsmanScoringRateODTT("./kohli.csv","Virat Kohli")  

7. 3D scatter plot and prediction plane

The plots below show the 3D scatter plot of Runs versus Balls Faced and Minutes at crease. A linear regression plane is then fitted between Runs and Balls Faced + Minutes at crease

import cricpy.analytics as ca
#ca.battingPerf3d("aplayer.csv","A Player")

8. Average runs at different venues

The plot below gives the average runs scored at different grounds. The plot also the number of innings at each ground as a label at x-axis.

import cricpy.analytics as ca
#ca.batsmanAvgRunsGround("aplayer.csv","A Player")

9. Average runs against different opposing teams

This plot computes the average runs scored against different countries.

import cricpy.analytics as ca
#ca.batsmanAvgRunsOpposition("aplayer.csv","A Player")

10. Highest Runs Likelihood

The plot below shows the Runs Likelihood for a batsman.

import cricpy.analytics as ca
#ca.batsmanRunsLikelihood("aplayer.csv","A Player")

11. A look at the Top 4 batsman

Choose any number of players

1.Player1 2.Player2 3.Player3 …

The following plots take a closer at their performances. The box plots show the median the 1st and 3rd quartile of the runs

12. Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency

import cricpy.analytics as ca
#ca.batsmanPerfBoxHist("aplayer001.csv","A Player001")
#ca.batsmanPerfBoxHist("aplayer002.csv","A Player002")
#ca.batsmanPerfBoxHist("aplayer003.csv","A Player003")
#ca.batsmanPerfBoxHist("aplayer004.csv","A Player004")

13. Get Player Data special

import cricpy.analytics as ca
#player1sp = ca.getPlayerDataSp(profile1,tdir=".",tfile="player1sp.csv",ttype="batting")
#player2sp = ca.getPlayerDataSp(profile2,tdir=".",tfile="player2sp.csv",ttype="batting")
#player3sp = ca.getPlayerDataSp(profile3,tdir=".",tfile="player3sp.csv",ttype="batting")
#player4sp = ca.getPlayerDataSp(profile4,tdir=".",tfile="player4sp.csv",ttype="batting")

14. Contribution to won and lost matches

Note:This can only be used for Test matches

import cricpy.analytics as ca
#ca.batsmanContributionWonLost("player1sp.csv","A Player001")
#ca.batsmanContributionWonLost("player2sp.csv","A Player002")
#ca.batsmanContributionWonLost("player3sp.csv","A Player003")
#ca.batsmanContributionWonLost("player4sp.csv","A Player004")

15. Performance at home and overseas

Note:This can only be used for Test matches This function also requires the use of getPlayerDataSp() as shown above

import cricpy.analytics as ca
#ca.batsmanPerfHomeAway("player1sp.csv","A Player001")
#ca.batsmanPerfHomeAway("player2sp.csv","A Player002")
#ca.batsmanPerfHomeAway("player3sp.csv","A Player003")
#ca.batsmanPerfHomeAway("player4sp.csv","A Player004")

16 Moving Average of runs in career

import cricpy.analytics as ca
#ca.batsmanMovingAverage("aplayer001.csv","A Player001")
#ca.batsmanMovingAverage("aplayer002.csv","A Player002")
#ca.batsmanMovingAverage("aplayer003.csv","A Player003")
#ca.batsmanMovingAverage("aplayer004.csv","A Player004")

17 Cumulative Average runs of batsman in career

This function provides the cumulative average runs of the batsman over the career.

import cricpy.analytics as ca
#ca.batsmanCumulativeAverageRuns("aplayer001.csv","A Player001")
#ca.batsmanCumulativeAverageRuns("aplayer002.csv","A Player002")
#ca.batsmanCumulativeAverageRuns("aplayer003.csv","A Player003")
#ca.batsmanCumulativeAverageRuns("aplayer004.csv","A Player004")

18 Cumulative Average strike rate of batsman in career

.

import cricpy.analytics as ca
#ca.batsmanCumulativeStrikeRate("aplayer001.csv","A Player001")
#ca.batsmanCumulativeStrikeRate("aplayer002.csv","A Player002")
#ca.batsmanCumulativeStrikeRate("aplayer003.csv","A Player003")
#ca.batsmanCumulativeStrikeRate("aplayer004.csv","A Player004")

19 Future Runs forecast

import cricpy.analytics as ca
#ca.batsmanPerfForecast("aplayer001.csv","A Player001")

20 Relative Batsman Cumulative Average Runs

The plot below compares the Relative cumulative average runs of the batsman for each of the runs ranges of 10 and plots them.

import cricpy.analytics as ca
frames = ["aplayer1.csv","aplayer2.csv","aplayer3.csv","aplayer4.csv"]
names = ["A Player1","A Player2","A Player3","A Player4"]
#ca.relativeBatsmanCumulativeAvgRuns(frames,names)

21 Plot of 4s and 6s

import cricpy.analytics as ca
frames = ["aplayer1.csv","aplayer2.csv","aplayer3.csv","aplayer4.csv"]
names = ["A Player1","A Player2","A Player3","A Player4"]
#ca.batsman4s6s(frames,names)

22. Relative Batsman Strike Rate

The plot below gives the relative Runs Frequency Percetages for each 10 run bucket. The plot below show

import cricpy.analytics as ca
frames = ["aplayer1.csv","aplayer2.csv","aplayer3.csv","aplayer4.csv"]
names = ["A Player1","A Player2","A Player3","A Player4"]
#ca.relativeBatsmanCumulativeStrikeRate(frames,names)

23. 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

import cricpy.analytics as ca
#ca.battingPerf3d("aplayer001.csv","A Player001")
#ca.battingPerf3d("aplayer002.csv","A Player002")
#ca.battingPerf3d("aplayer003.csv","A Player003")
#ca.battingPerf3d("aplayer004.csv","A Player004")

24. Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

import cricpy.analytics as ca
import numpy as np
import pandas as pd
BF = np.linspace( 10, 400,15)
Mins = np.linspace( 30,600,15)
newDF= pd.DataFrame({'BF':BF,'Mins':Mins})
#aplayer = ca.batsmanRunsPredict("aplayer.csv",newDF,"A Player")
#print(aplayer)

The fitted model is then used to predict the runs that the batsmen will score for a given Balls faced and Minutes at crease.

25 Analysis of Top 3 wicket takers

Take any number of bowlers from either Test, ODI or T20

  1. Bowler1
  2. Bowler2
  3. Bowler3 …

26. Get the bowler’s data (Test)

This plot below computes the percentage frequency of number of wickets taken for e.g 1 wicket x%, 2 wickets y% etc and plots them as a continuous line

import cricpy.analytics as ca
#abowler1 =ca.getPlayerData(profileNo1,dir=".",file="abowler1.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])
#abowler2 =ca.getPlayerData(profileNo2,dir=".",file="abowler2.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])
#abowler3 =ca.getPlayerData(profile3,dir=".",file="abowler3.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])

26b For ODI bowlers

import cricpy.analytics as ca
#abowler1 =ca.getPlayerDataOD(profileNo1,dir=".",file="abowler1.csv",type="bowling")
#abowler2 =ca.getPlayerDataOD(profileNo2,dir=".",file="abowler2.csv",type="bowling")
#abowler3 =ca.getPlayerDataOD(profile3,dir=".",file="abowler3.csv",type="bowling")

26c For T20 bowlers

import cricpy.analytics as ca
#abowler1 =ca.getPlayerDataTT(profileNo1,dir=".",file="abowler1.csv",type="bowling")
#abowler2 =ca.getPlayerDataTT(profileNo2,dir=".",file="abowler2.csv",type="bowling")
#abowler3 =ca.getPlayerDataTT(profile3,dir=".",file="abowler3.csv",type="bowling")

27. Wicket Frequency Plot

This plot below plots the frequency of wickets taken for each of the bowlers

import cricpy.analytics as ca
#ca.bowlerWktsFreqPercent("abowler1.csv","A Bowler1")
#ca.bowlerWktsFreqPercent("abowler2.csv","A Bowler2")
#ca.bowlerWktsFreqPercent("abowler3.csv","A Bowler3")

28. Wickets Runs plot

The plot below create a box plot showing the 1st and 3rd quartile of runs conceded versus the number of wickets taken

import cricpy.analytics as ca
#ca.bowlerWktsRunsPlot("abowler1.csv","A Bowler1")
#ca.bowlerWktsRunsPlot("abowler2.csv","A Bowler2")
#ca.bowlerWktsRunsPlot("abowler3.csv","A Bowler3")

29 Average wickets at different venues

The plot gives the average wickets taken bat different venues.

import cricpy.analytics as ca
#ca.bowlerAvgWktsGround("abowler1.csv","A Bowler1")
#ca.bowlerAvgWktsGround("abowler2.csv","A Bowler2")
#ca.bowlerAvgWktsGround("abowler3.csv","A Bowler3")

30 Average wickets against different opposition

The plot gives the average wickets taken against different countries.

import cricpy.analytics as ca
#ca.bowlerAvgWktsOpposition("abowler1.csv","A Bowler1")
#ca.bowlerAvgWktsOpposition("abowler2.csv","A Bowler2")
#ca.bowlerAvgWktsOpposition("abowler3.csv","A Bowler3")

31 Wickets taken moving average

import cricpy.analytics as ca
#ca.bowlerMovingAverage("abowler1.csv","A Bowler1")
#ca.bowlerMovingAverage("abowler2.csv","A Bowler2")
#ca.bowlerMovingAverage("abowler3.csv","A Bowler3")

32 Cumulative average wickets taken

The plots below give the cumulative average wickets taken by the bowlers.

import cricpy.analytics as ca
#ca.bowlerCumulativeAvgWickets("abowler1.csv","A Bowler1")
#ca.bowlerCumulativeAvgWickets("abowler2.csv","A Bowler2")
#ca.bowlerCumulativeAvgWickets("abowler3.csv","A Bowler3")

33 Cumulative average economy rate

The plots below give the cumulative average economy rate of the bowlers.

import cricpy.analytics as ca
#ca.bowlerCumulativeAvgEconRate("abowler1.csv","A Bowler1")
#ca.bowlerCumulativeAvgEconRate("abowler2.csv","A Bowler2")
#ca.bowlerCumulativeAvgEconRate("abowler3.csv","A Bowler3")

34 Future Wickets forecast

import cricpy.analytics as ca
#ca.bowlerPerfForecast("abowler1.csv","A bowler1")

35 Get player data special

import cricpy.analytics as ca
#abowler1sp =ca.getPlayerDataSp(profile1,tdir=".",tfile="abowler1sp.csv",ttype="bowling")
#abowler2sp =ca.getPlayerDataSp(profile2,tdir=".",tfile="abowler2sp.csv",ttype="bowling")
#abowler3sp =ca.getPlayerDataSp(profile3,tdir=".",tfile="abowler3sp.csv",ttype="bowling")

36 Contribution to matches won and lost

Note:This can be done only for Test cricketers

import cricpy.analytics as ca
#ca.bowlerContributionWonLost("abowler1sp.csv","A Bowler1")
#ca.bowlerContributionWonLost("abowler2sp.csv","A Bowler2")
#ca.bowlerContributionWonLost("abowler3sp.csv","A Bowler3")

37 Performance home and overseas

Note:This can be done only for Test cricketers

import cricpy.analytics as ca
#ca.bowlerPerfHomeAway("abowler1sp.csv","A Bowler1")
#ca.bowlerPerfHomeAway("abowler2sp.csv","A Bowler2")
#ca.bowlerPerfHomeAway("abowler3sp.csv","A Bowler3")

38 Relative cumulative average economy rate of bowlers

import cricpy.analytics as ca
frames = ["abowler1.csv","abowler2.csv","abowler3.csv"]
names = ["A Bowler1","A Bowler2","A Bowler3"]
#ca.relativeBowlerCumulativeAvgEconRate(frames,names)

39 Relative Economy Rate against wickets taken

import cricpy.analytics as ca
frames = ["abowler1.csv","abowler2.csv","abowler3.csv"]
names = ["A Bowler1","A Bowler2","A Bowler3"]
#ca.relativeBowlingER(frames,names)

40 Relative cumulative average wickets of bowlers in career

import cricpy.analytics as ca
frames = ["abowler1.csv","abowler2.csv","abowler3.csv"]
names = ["A Bowler1","A Bowler2","A Bowler3"]
#ca.relativeBowlerCumulativeAvgWickets(frames,names)

Clone/download this cricpy template for your own analysis of players. This can be done using RStudio or IPython notebooks from Github at cricpy-template

Key Findings

Analysis of Top 4 batsman

Analysis of Top 3 bowlers

You may also like
1. My book ‘Deep Learning from first principles:Second Edition’ now on Amazon
2. Presentation on ‘Evolution to LTE’
3. Stacks of protocol stacks – A primer
4. Taking baby steps in Lisp
5. Introducing cricket package yorkr: Part 1- Beaten by sheer pace!

To see all posts click Index of posts

Cricpy takes guard for the Twenty20s


There are two ways to write error-free programs; only the third one works.”” Alan J. Perlis

Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to produce bigger and better idiots. So far, the universe is winning. ” Rick Cook

My software never has bugs. It just develops random features.” Anon

If you make an ass out of yourself, there will always be someone to ride you.” Bruce Lee

Introduction

This is the 3rd and final post on cricpy, and is a continuation to my 2 earlier posts

1. Introducing cricpy:A python package to analyze performances of cricketers
2.Cricpy takes a swing at the ODIs

Cricpy, is the python avatar of my R package ‘cricketr’. To know more about my R package cricketr see Re-introducing cricketr! : An R package to analyze performances of cricketers

With this post  cricpy, like cricketr, now becomes omnipotent, and is now capable of handling Test, ODI and T20 matches.

Cricpy uses the statistics info available in ESPN Cricinfo Statsguru.

You should be able to install the package using pip install cricpy and use the many functions available in the package. Please mindful of the ESPN Cricinfo Terms of Use

This post is also hosted on Rpubs at Cricpy takes guard for the Twenty 20s. You can also download the pdf version of this post at cricpy-TT.pdf

You can fork/clone the package at Github cricpy

Note: If you would like to do a similar analysis for a different set of batsman and bowlers, you can clone/download my skeleton cricpy-template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. The functions can be executed in RStudio or in a IPython notebook.

The cricpy package

The data for a particular player in Twenty20s can be obtained with the getPlayerDataTT() function. To do this you will need to go to T20 Batting and T20 Bowling and click the player you are interested in This will bring up a page which have the profile number for the player e.g. for Virat Kohli this would be http://www.espncricinfo.com/india/content/player/253802.html. Hence,this can be used to get the data for Virat Kohlias shown below

The cricpy package is a clone of my R package cricketr. The signature of all the python functions are identical with that of its clone ‘cricketr’, with only the necessary variations between Python and R. It may be useful to look at my post R vs Python: Different similarities and similar differences. In fact if you are familar with one of the languages you can look up the package in the other and you will notice the parallel constructs.

You can fork/clone the package at Github cricpy

Note: The charts are self-explanatory and I have not added much of my own interpretation to it. Do look at the plots closely and check out the performances for yourself.

1 Importing cricpy – Python

# Install the package
# Do a pip install cricpy
# Import cricpy
import cricpy.analytics as ca 

2. Invoking functions with Python package cricpy

import cricpy.analytics as ca 
ca.batsman4s("./kohli.csv","Virat Kohli")

3. Getting help from cricpy – Python

import cricpy.analytics as ca 
help(ca.getPlayerDataTT)
## Help on function getPlayerDataTT in module cricpy.analytics:
## 
## getPlayerDataTT(profile, opposition='', host='', dir='./data', file='player001.csv', type='batting', homeOrAway=[1, 2, 3], result=[1, 2, 3, 5], create=True)
##     Get the Twenty20 International player data from ESPN Cricinfo based on specific inputs and store in a file in a given directory~
##     
##     Description
##     
##     Get the Twenty20 player data given the profile of the batsman/bowler. The allowed inputs are home,away, neutralboth and won,lost,tied or no result of matches. The data is stored in a <player>.csv file in a directory specified. This function also returns a data frame of the player
##     
##     Usage
##     
##     getPlayerDataTT(profile, opposition="",host="",dir = "./data", file = "player001.csv", 
##     type = "batting", homeOrAway = c(1, 2, 3), result = c(1, 2, 3,5))
##     Arguments
##     
##     profile     
##     This is the profile number of the player to get data. This can be obtained from http://www.espncricinfo.com/ci/content/player/index.html. Type the name of the player and click search. This will display the details of the player. Make a note of the profile ID. For e.g For Virat Kohli this turns out to be 253802 http://www.espncricinfo.com/india/content/player/35263.html. Hence the profile for Sehwag is 35263
##     opposition  
##     The numerical value of the opposition country e.g.Australia,India, England etc. The values are Afghanistan:40,Australia:2,Bangladesh:25,England:1,Hong Kong:19,India:6,Ireland:29, New Zealand:5,Pakistan:7,Scotland:30,South Africa:3,Sri Lanka:8,United Arab Emirates:27, West Indies:4, Zimbabwe:9; Note: If no value is entered for opposition then all teams are considered
##     host        
##     The numerical value of the host country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,New Zealand:5, South Africa:3,Sri Lanka:8,United States of America:11,West Indies:4, Zimbabwe:9 Note: If no value is entered for host then all host countries are considered
##     dir 
##     Name of the directory to store the player data into. If not specified the data is stored in a default directory "./data". Default="./data"
##     file        
##     Name of the file to store the data into for e.g. kohli.csv. This can be used for subsequent functions. Default="player001.csv"
##     type        
##     type of data required. This can be "batting" or "bowling"
##     homeOrAway  
##     This is vector with either or all 1,2, 3. 1 is for home 2 is for away, 3 is for neutral venue
##     result      
##     This is a vector that can take values 1,2,3,5. 1 - won match 2- lost match 3-tied 5- no result
##     Details
##     
##     More details can be found in my short video tutorial in Youtube https://www.youtube.com/watch?v=q9uMPFVsXsI
##     
##     Value
##     
##     Returns the player's dataframe
##     
##     Note
##     
##     Maintainer: Tinniam V Ganesh <tvganesh.85@gmail.com>
##     
##     Author(s)
##     
##     Tinniam V Ganesh
##     
##     References
##     
##     http://www.espncricinfo.com/ci/content/stats/index.html
##     https://gigadom.wordpress.com/
##     
##     See Also
##     
##     bowlerWktRateTT getPlayerData
##     
##     Examples
##     
##     ## Not run: 
##     # Only away. Get data only for won and lost innings
##     kohli =getPlayerDataTT(253802,dir="../cricketr/data", file="kohli1.csv",
##     type="batting")
##     
##     # Get bowling data and store in file for future
##     ashwin = getPlayerDataTT(26421,dir="../cricketr/data",file="ashwin1.csv",
##     type="bowling")
##     
##     kohli =getPlayerDataTT(253802,opposition = 2,host=2,dir="../cricketr/data", 
##     file="kohli1.csv",type="batting")

The details below will introduce the different functions that are available in cricpy.

4. Get the Twenty20 player data for a player using the function getPlayerDataOD()

Important Note This needs to be done only once for a player. This function stores the player’s data in the specified CSV file (for e.g. kohli.csv as above) which can then be reused for all other functions). Once we have the data for the players many analyses can be done. This post will use the stored CSV file obtained with a prior getPlayerDataTT for all subsequent analyses

import cricpy.analytics as ca
#kohli=ca.getPlayerDataTT(253802,dir=".",file="kohli.csv",type="batting")
#guptill=ca.getPlayerDataTT(226492,dir=".",file="guptill.csv",type="batting")
#shahzad=ca.getPlayerDataTT(419873,dir=".",file="shahzad.csv",type="batting")
#mccullum=ca.getPlayerDataTT(37737,dir=".",file="mccullum.csv",type="batting")

Included below are some of the functions that can be used for ODI batsmen and bowlers. For this I have chosen, Virat Kohli, ‘the run machine’ who is on-track for breaking many of the Test, ODI and Twenty20 records

5 Virat Kohli’s performance – Basic Analyses

The 3 plots below provide the following for Virat Kohli in T20s

  1. Frequency percentage of runs in each run range over the whole career
  2. Mean Strike Rate for runs scored in the given range
  3. A histogram of runs frequency percentages in runs ranges
import cricpy.analytics as ca
import matplotlib.pyplot as plt
ca.batsmanRunsFreqPerf("./kohli.csv","Virat Kohli")

ca.batsmanMeanStrikeRate("./kohli.csv","Virat Kohli")

ca.batsmanRunsRanges("./kohli.csv","Virat Kohli")

6. More analyses

import cricpy.analytics as ca
ca.batsman4s("./kohli.csv","Virat Kohli")

ca.batsman6s("./kohli.csv","Virat Kohli")

ca.batsmanDismissals("./kohli.csv","Virat Kohli")

ca.batsmanScoringRateODTT("./kohli.csv","Virat Kohli")

7. 3D scatter plot and prediction plane

The plots below show the 3D scatter plot of Kohli’s Runs versus Balls Faced and Minutes at crease. A linear regression plane is then fitted between Runs and Balls Faced + Minutes at crease

import cricpy.analytics as ca
ca.battingPerf3d("./kohli.csv","Virat Kohli")

8. Average runs at different venues

The plot below gives the average runs scored by Kohli at different grounds. The plot also the number of innings at each ground as a label at x-axis.

import cricpy.analytics as ca
ca.batsmanAvgRunsGround("./kohli.csv","Virat Kohli")

9. Average runs against different opposing teams

This plot computes the average runs scored by Kohli against different countries.

import cricpy.analytics as ca
ca.batsmanAvgRunsOpposition("./kohli.csv","Virat Kohli")

10 . Highest Runs Likelihood

The plot below shows the Runs Likelihood for a batsman. For this the performance of Kohli is plotted as a 3D scatter plot with Runs versus Balls Faced + Minutes at crease. K-Means. The centroids of 3 clusters are computed and plotted. In this plot Kohli’s highest tendencies are computed and plotted using K-Means

import cricpy.analytics as ca
ca.batsmanRunsLikelihood("./kohli.csv","Virat Kohli")

11. A look at the Top 4 batsman – Kohli,  Guptill, Shahzad and McCullum

The following batsmen have been very prolific in Twenty20 cricket and will be used for the analyses

  1. Virat Kohli: Runs – 2167, Average:49.25 ,Strike rate-136.11
  2. MJ Guptill : Runs -2271, Average:34.4 ,Strike rate-132.88
  3. Mohammed Shahzad :Runs – 1936, Average:31.22 ,Strike rate-134.81
  4. BB McCullum : Runs – 2140, Average:35.66 ,Strike rate-136.21

The following plots take a closer at their performances. The box plots show the median the 1st and 3rd quartile of the runs

12. Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency

import cricpy.analytics as ca
ca.batsmanPerfBoxHist("./kohli.csv","Virat Kohli")

ca.batsmanPerfBoxHist("./guptill.csv","M J Guptill")

ca.batsmanPerfBoxHist("./shahzad.csv","M Shahzad")

ca.batsmanPerfBoxHist("./mccullum.csv","BB McCullum")

13 Moving Average of runs in career

Take a look at the Moving Average across the career of the Top 4 Twenty20 batsmen.

import cricpy.analytics as ca
ca.batsmanMovingAverage("./kohli.csv","Virat Kohli")

ca.batsmanMovingAverage("./guptill.csv","M J Guptill")
#ca.batsmanMovingAverage("./shahzad.csv","M Shahzad") # Gives error. Check!

ca.batsmanMovingAverage("./mccullum.csv","BB McCullum")

14 Cumulative Average runs of batsman in career

This function provides the cumulative average runs of the batsman over the career.Kohli’s average tops around 45 runs around 43 innings, though there is a dip downwards

import cricpy.analytics as ca
ca.batsmanCumulativeAverageRuns("./kohli.csv","Virat Kohli")

ca.batsmanCumulativeAverageRuns("./guptill.csv","M J Guptill")

ca.batsmanCumulativeAverageRuns("./shahzad.csv","M Shahzad")

ca.batsmanCumulativeAverageRuns("./mccullum.csv","BB McCullum")

15 Cumulative Average strike rate of batsman in career

Kohli, Guptill and McCullum average a strike rate of 125+

import cricpy.analytics as ca
ca.batsmanCumulativeStrikeRate("./kohli.csv","Virat Kohli")

ca.batsmanCumulativeStrikeRate("./guptill.csv","M J Guptill")

ca.batsmanCumulativeStrikeRate("./shahzad.csv","M Shahzad")

ca.batsmanCumulativeStrikeRate("./mccullum.csv","BB McCullum")

16 Relative Batsman Cumulative Average Runs

The plot below compares the Relative cumulative average runs of the batsman. Kohli is way above all the other 3 batsmen. Behind Kohli is McCullum and then Guptill

import cricpy.analytics as ca
frames = ["./kohli.csv","./guptill.csv","./shahzad.csv","./mccullum.csv"]
names = ["Kohli","Guptill","Shahzad","McCullumn"]
ca.relativeBatsmanCumulativeAvgRuns(frames,names)

17. Relative Batsman Strike Rate

The plot below gives the relative Runs Frequency Percetages for each 10 run bucket. The plot below show that Kohli tops the overall strike rate followed by McCullum and then Guptill

import cricpy.analytics as ca
frames = ["./kohli.csv","./guptill.csv","./shahzad.csv","./mccullum.csv"]
names = ["Kohli","Guptill","Shahzad","McCullum"]
ca.relativeBatsmanCumulativeStrikeRate(frames,names)

18. 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A 3D prediction plane is fitted

import cricpy.analytics as ca
ca.battingPerf3d("./kohli.csv","Virat Kohli")

ca.battingPerf3d("./guptill.csv","M J Guptill")

ca.battingPerf3d("./shahzad.csv","M Shahzad")

ca.battingPerf3d("./mccullum.csv","BB McCullum")

19. 3D plot of Runs vs Balls Faced and Minutes at Crease

Guptill and McCullum have a large percentage of sixes in comparison to the 4s. Kohli has a relative lower number of 6s

import cricpy.analytics as ca
frames = ["./kohli.csv","./guptill.csv","./shahzad.csv","./mccullum.csv"]
names = ["Kohli","Guptill","Shahzad","McCullum"]
ca.batsman4s6s(frames,names)

20. Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

import cricpy.analytics as ca
import numpy as np
import pandas as pd
BF = np.linspace( 10, 400,15)
Mins = np.linspace( 30,600,15)
newDF= pd.DataFrame({'BF':BF,'Mins':Mins})
kohli= ca.batsmanRunsPredict("./kohli.csv",newDF,"Kohli")
print(kohli)
##             BF        Mins        Runs
## 0    10.000000   30.000000   14.753153
## 1    37.857143   70.714286   55.963333
## 2    65.714286  111.428571   97.173513
## 3    93.571429  152.142857  138.383693
## 4   121.428571  192.857143  179.593873
## 5   149.285714  233.571429  220.804053
## 6   177.142857  274.285714  262.014233
## 7   205.000000  315.000000  303.224414
## 8   232.857143  355.714286  344.434594
## 9   260.714286  396.428571  385.644774
## 10  288.571429  437.142857  426.854954
## 11  316.428571  477.857143  468.065134
## 12  344.285714  518.571429  509.275314
## 13  372.142857  559.285714  550.485494
## 14  400.000000  600.000000  591.695674

21 Analysis of Top Bowlers

The following 4 bowlers have had an excellent career and will be used for the analysis

  1. Shakib Hasan:Wickets: 80, Average = 21.07, Economy Rate – 6.74
  2. Mohammed Nabi : Wickets: 67, Average = 24.25, Economy Rate – 7.13
  3. Rashid Khan: Wickets: 64, Average = 12.40, Economy Rate – 6.01
  4. Imran Tahir : Wickets:62, Average – 14.95, Economy Rate – 6.77

22. Get the bowler’s data

This plot below computes the percentage frequency of number of wickets taken for e.g 1 wicket x%, 2 wickets y% etc and plots them as a continuous line

import cricpy.analytics as ca
#shakib=ca.getPlayerDataTT(56143,dir=".",file="shakib.csv",type="bowling")
#nabi=ca.getPlayerDataOD(25913,dir=".",file="nabi.csv",type="bowling")
#rashid=ca.getPlayerDataOD(793463,dir=".",file="rashid.csv",type="bowling")
#tahir=ca.getPlayerDataOD(40618,dir=".",file="tahir.csv",type="bowling")

23. Wicket Frequency Plot

This plot below plots the frequency of wickets taken for each of the bowlers

import cricpy.analytics as ca
ca.bowlerWktsFreqPercent("./shakib.csv","Shakib Al Hasan")

ca.bowlerWktsFreqPercent("./nabi.csv","Mohammad Nabi")

ca.bowlerWktsFreqPercent("./rashid.csv","Rashid Khan")

ca.bowlerWktsFreqPercent("./tahir.csv","Imran Tahir")

24. Wickets Runs plot

The plot below create a box plot showing the 1st and 3rd quartile of runs conceded versus the number of wickets taken.

import cricpy.analytics as ca
ca.bowlerWktsRunsPlot("./shakib.csv","Shakib Al Hasan")

ca.bowlerWktsRunsPlot("./nabi.csv","Mohammad Nabi")

ca.bowlerWktsRunsPlot("./rashid.csv","Rashid Khan")

ca.bowlerWktsRunsPlot("./tahir.csv","Imran Tahir")

25 Average wickets at different venues

The plot gives the average wickets taken by Muralitharan at different venues.

import cricpy.analytics as ca
ca.bowlerAvgWktsGround("./shakib.csv","Shakib Al Hasan")

ca.bowlerAvgWktsGround("./nabi.csv","Mohammad Nabi")

ca.bowlerAvgWktsGround("./rashid.csv","Rashid Khan")

ca.bowlerAvgWktsGround("./tahir.csv","Imran Tahir")

26 Average wickets against different opposition

The plot gives the average wickets taken by Muralitharan against different countries. The x-axis also includes the number of innings against each team

import cricpy.analytics as ca
ca.bowlerAvgWktsOpposition("./shakib.csv","Shakib Al Hasan")

ca.bowlerAvgWktsOpposition("./nabi.csv","Mohammad Nabi")

ca.bowlerAvgWktsOpposition("./rashid.csv","Rashid Khan")

ca.bowlerAvgWktsOpposition("./tahir.csv","Imran Tahir")

27 Wickets taken moving average

From the plot below it can be see

import cricpy.analytics as ca
ca.bowlerMovingAverage("./shakib.csv","Shakib Al Hasan")

ca.bowlerMovingAverage("./nabi.csv","Mohammad Nabi")

ca.bowlerMovingAverage("./rashid.csv","Rashid Khan")

ca.bowlerMovingAverage("./tahir.csv","Imran Tahir")

28 Cumulative average wickets taken

The plots below give the cumulative average wickets taken by the bowlers. Rashid Khan has been the most effective with almost 2.28 wickets per match

import cricpy.analytics as ca
ca.bowlerCumulativeAvgWickets("./shakib.csv","Shakib Al Hasan")

ca.bowlerCumulativeAvgWickets("./nabi.csv","Mohammad Nabi")

ca.bowlerCumulativeAvgWickets("./rashid.csv","Rashid Khan")

ca.bowlerCumulativeAvgWickets("./tahir.csv","Imran Tahir")

29 Cumulative average economy rate

The plots below give the cumulative average economy rate of the bowlers. Rashid Khan has the nest economy rate followed by Mohammed Nabi

import cricpy.analytics as ca
ca.bowlerCumulativeAvgEconRate("./shakib.csv","Shakib Al Hasan")

ca.bowlerCumulativeAvgEconRate("./nabi.csv","Mohammad Nabi")

ca.bowlerCumulativeAvgEconRate("./rashid.csv","Rashid Khan")

ca.bowlerCumulativeAvgEconRate("./tahir.csv","Imran Tahir")

30 Relative cumulative average economy rate of bowlers

The Relative cumulative economy rate is given below. It can be seen that Rashid Khan has the best economy rate followed by Mohammed Nabi and then Imran Tahir

import cricpy.analytics as ca
frames = ["./shakib.csv","./nabi.csv","./rashid.csv","tahir.csv"]
names = ["Shakib Al Hasan","Mohammad Nabi","Rashid Khan", "Imran Tahir"]
ca.relativeBowlerCumulativeAvgEconRate(frames,names)

31 Relative Economy Rate against wickets taken

Rashid Khan has the best figures for wickets between 2-3.5 wickets. Mohammed Nabi pips Rashid Khan when takes a haul of 4 wickets.

import cricpy.analytics as ca
frames = ["./shakib.csv","./nabi.csv","./rashid.csv","tahir.csv"]
names = ["Shakib Al Hasan","Mohammad Nabi","Rashid Khan", "Imran Tahir"]
ca.relativeBowlingER(frames,names)

32 Relative cumulative average wickets of bowlers in career

Rashid has the best performance with cumulative average wickets. He is followed by Imran Tahir in the wicket haul, followed by Shakib Al Hasan

import cricpy.analytics as ca
frames = ["./shakib.csv","./nabi.csv","./rashid.csv","tahir.csv"]
names = ["Shakib Al Hasan","Mohammad Nabi","Rashid Khan", "Imran Tahir"]
ca.relativeBowlerCumulativeAvgWickets(frames,names)

33. Key Findings

The plots above capture some of the capabilities and features of my cricpy package. Feel free to install the package and try it out. Please do keep in mind ESPN Cricinfo’s Terms of Use.

Here are the main findings from the analysis above

Analysis of Top 4 batsman

The analysis of the Top 4 test batsman Kohli, Guptill, Shahzad and McCullum
1.Kohli has the best overall cumulative average runs and towers over everybody else
2. Kohli, Guptill and McCullum has a very good strike rate of around 125+
3. Guptill and McCullum have a larger percentage of sixes as compared to Kohli
4. Rashid Khan has the best cumulative average wickets, followed by Imran Tahir and then Shakib Al Hasan
5. Rashid Khan is the most economical bowler, followed by Mohammed Nabi

You can fork/clone the package at Github cricpy

Conclusion

Cricpy now has almost all the functions and functionalities of my R package cricketr. There are still a few more features that need to be added to cricpy. I intend to do this as and when I find time.

Go ahead, take cricpy for a spin! Hope you enjoy the ride!

Watch this space!!!

You may also like
1. A method for optimal bandwidth usage by auctioning available bandwidth using the OpenFlow protocol
2. Introducing QCSimulator: A 5-qubit quantum computing simulator in R
3. Dabbling with Wiener filter using OpenCV
4. Deep Learning from first principles in Python, R and Octave – Part 5
5. Latency, throughput implications for the Cloud
6. Bend it like Bluemix, MongoDB using Auto-scale – Part 1!
7. Sea shells on the seashore
8. Practical Machine Learning with R and Python – Part 4

To see all posts click Index of Posts

Cricpy takes a swing at the ODIs


No computer has ever been designed that is ever aware of what it’s doing; but most of the time, we aren’t either.” Marvin Minksy

“The competent programmer is fully aware of the limited size of his own skull. He therefore approaches his task with full humility, and avoids clever tricks like the plague” Edgser Djikstra

Introduction

In this post, cricpy, the Python avatar of my R package cricketr, learns some new tricks to be able to handle ODI matches. To know more about my R package cricketr see Re-introducing cricketr! : An R package to analyze performances of cricketers

Cricpy uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package supports only Test cricket

You should be able to install the package using pip install cricpy and use the many functions available in the package. Please mindful of the ESPN Cricinfo Terms of Use

To know how to use cricpy see Introducing cricpy:A python package to analyze performances of cricketers. To the original version of cricpy, I have added 3 new functions for ODI. The earlier functions work for Test and ODI.

This post is also hosted on Rpubs at Cricpy takes a swing at the ODIs. You can also down the pdf version of this post at cricpy-odi.pdf

You can fork/clone the package at Github cricpy

Note: If you would like to do a similar analysis for a different set of batsman and bowlers, you can clone/download my skeleton cricpy-template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. The functions can be executed in RStudio or in a IPython notebook.

The cricpy package

The data for a particular player in ODI can be obtained with the getPlayerDataOD() function. To do you will need to go to ESPN CricInfo Player and type in the name of the player for e.g Virat Kohli, Virendar Sehwag, Chris Gayle etc. This will bring up a page which have the profile number for the player e.g. for Virat Kohli this would be http://www.espncricinfo.com/india/content/player/253802.html. Hence, Kohli’s profile is 253802. This can be used to get the data for Virat Kohlis shown below

The cricpy package is a clone of my R package cricketr. The signature of all the python functions are identical with that of its clone ‘cricketr’, with only the necessary variations between Python and R. It may be useful to look at my post R vs Python: Different similarities and similar differences. In fact if you are familar with one of the lanuguages you can look up the package in the other and you will notice the parallel constructs.

You can fork/clone the package at Github cricpy

Note: The charts are self-explanatory and I have not added much of my owy interpretation to it. Do look at the plots closely and check out the performances for yourself.

1 Importing cricpy – Python

# Install the package
# Do a pip install cricpy
# Import cricpy
import cricpy.analytics as ca 

2. Invoking functions with Python package crlcpy

import cricpy.analytics as ca 
ca.batsman4s("./kohli.csv","Virat Kohli")

3. Getting help from cricpy – Python

import cricpy.analytics as ca 
help(ca.getPlayerDataOD)
## Help on function getPlayerDataOD in module cricpy.analytics:
## 
## getPlayerDataOD(profile, opposition='', host='', dir='./data', file='player001.csv', type='batting', homeOrAway=[1, 2, 3], result=[1, 2, 3, 5], create=True)
##     Get the One day player data from ESPN Cricinfo based on specific inputs and store in a file in a given directory
##     
##     Description
##     
##     Get the player data given the profile of the batsman. The allowed inputs are home,away or both and won,lost or draw of matches. The data is stored in a <player>.csv file in a directory specified. This function also returns a data frame of the player
##     
##     Usage
##     
##     getPlayerDataOD(profile, opposition="",host="",dir = "../", file = "player001.csv", 
##     type = "batting", homeOrAway = c(1, 2, 3), result = c(1, 2, 3,5))
##     Arguments
##     
##     profile     
##     This is the profile number of the player to get data. This can be obtained from http://www.espncricinfo.com/ci/content/player/index.html. Type the name of the player and click search. This will display the details of the player. Make a note of the profile ID. For e.g For Virender Sehwag this turns out to be http://www.espncricinfo.com/india/content/player/35263.html. Hence the profile for Sehwag is 35263
##     opposition      The numerical value of the opposition country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,Bermuda:12, England:1,Hong Kong:19,India:6,Ireland:29, Netherlands:15,New Zealand:5,Pakistan:7,Scotland:30,South Africa:3,Sri Lanka:8,United Arab Emirates:27, West Indies:4, Zimbabwe:9; Africa XI:405 Note: If no value is entered for opposition then all teams are considered
##     host            The numerical value of the host country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,Ireland:29,Malaysia:16,New Zealand:5,Pakistan:7, Scotland:30,South Africa:3,Sri Lanka:8,United Arab Emirates:27,West Indies:4, Zimbabwe:9 Note: If no value is entered for host then all host countries are considered
##     dir 
##     Name of the directory to store the player data into. If not specified the data is stored in a default directory "../data". Default="../data"
##     file        
##     Name of the file to store the data into for e.g. tendulkar.csv. This can be used for subsequent functions. Default="player001.csv"
##     type        
##     type of data required. This can be "batting" or "bowling"
##     homeOrAway  
##     This is vector with either or all 1,2, 3. 1 is for home 2 is for away, 3 is for neutral venue
##     result      
##     This is a vector that can take values 1,2,3,5. 1 - won match 2- lost match 3-tied 5- no result
##     Details
##     
##     More details can be found in my short video tutorial in Youtube https://www.youtube.com/watch?v=q9uMPFVsXsI
##     
##     Value
##     
##     Returns the player's dataframe
##     
##     Note
##     
##     Maintainer: Tinniam V Ganesh <tvganesh.85@gmail.com>
##     
##     Author(s)
##     
##     Tinniam V Ganesh
##     
##     References
##     
##     http://www.espncricinfo.com/ci/content/stats/index.html
##     https://gigadom.wordpress.com/
##     
##     See Also
##     
##     getPlayerDataSp getPlayerData
##     
##     Examples
##     
##     
##     ## Not run: 
##     # Both home and away. Result = won,lost and drawn
##     sehwag =getPlayerDataOD(35263,dir="../cricketr/data", file="sehwag1.csv",
##     type="batting", homeOrAway=[1,2],result=[1,2,3,4])
##     
##     # Only away. Get data only for won and lost innings
##     sehwag = getPlayerDataOD(35263,dir="../cricketr/data", file="sehwag2.csv",
##     type="batting",homeOrAway=[2],result=[1,2])
##     
##     # Get bowling data and store in file for future
##     malinga = getPlayerData(49758,dir="../cricketr/data",file="malinga1.csv",
##     type="bowling")
##     
##     # Get Dhoni's ODI record in Australia against Australua
##     dhoni = getPlayerDataOD(28081,opposition = 2,host=2,dir=".",
##     file="dhoniVsAusinAusOD",type="batting")
##     
##     ## End(Not run)

The details below will introduce the different functions that are available in cricpy.

4. Get the ODI player data for a player using the function getPlayerDataOD()

Important Note This needs to be done only once for a player. This function stores the player’s data in the specified CSV file (for e.g. kohli.csv as above) which can then be reused for all other functions). Once we have the data for the players many analyses can be done. This post will use the stored CSV file obtained with a prior getPlayerDataOD for all subsequent analyses

import cricpy.analytics as ca
#sehwag=ca.getPlayerDataOD(35263,dir=".",file="sehwag.csv",type="batting")
#kohli=ca.getPlayerDataOD(253802,dir=".",file="kohli.csv",type="batting")
#jayasuriya=ca.getPlayerDataOD(49209,dir=".",file="jayasuriya.csv",type="batting")
#gayle=ca.getPlayerDataOD(51880,dir=".",file="gayle.csv",type="batting")

Included below are some of the functions that can be used for ODI batsmen and bowlers. For this I have chosen, Virat Kohli, ‘the run machine’ who is on-track for breaking many of the Test & ODI records

5 Virat Kohli’s performance – Basic Analyses

The 3 plots below provide the following for Virat Kohli

  1. Frequency percentage of runs in each run range over the whole career
  2. Mean Strike Rate for runs scored in the given range
  3. A histogram of runs frequency percentages in runs ranges
import cricpy.analytics as ca
import matplotlib.pyplot as plt
ca.batsmanRunsFreqPerf("./kohli.csv","Virat Kohli")

ca.batsmanMeanStrikeRate("./kohli.csv","Virat Kohli")

ca.batsmanRunsRanges("./kohli.csv","Virat Kohli")

6. More analyses

import cricpy.analytics as ca
ca.batsman4s("./kohli.csv","Virat Kohli")

ca.batsman6s("./kohli.csv","Virat Kohli")

ca.batsmanDismissals("./kohli.csv","Virat Kohli")

ca.batsmanScoringRateODTT("./kohli.csv","Virat Kohli")


7. 3D scatter plot and prediction plane

The plots below show the 3D scatter plot of Kohli’s Runs versus Balls Faced and Minutes at crease. A linear regression plane is then fitted between Runs and Balls Faced + Minutes at crease

import cricpy.analytics as ca
ca.battingPerf3d("./kohli.csv","Virat Kohli")

Average runs at different venues

The plot below gives the average runs scored by Kohli at different grounds. The plot also the number of innings at each ground as a label at x-axis.

import cricpy.analytics as ca
ca.batsmanAvgRunsGround("./kohli.csv","Virat Kohli")

9. Average runs against different opposing teams

This plot computes the average runs scored by Kohli against different countries.

import cricpy.analytics as ca
ca.batsmanAvgRunsOpposition("./kohli.csv","Virat Kohli")

10 . Highest Runs Likelihood

The plot below shows the Runs Likelihood for a batsman. For this the performance of Kohli is plotted as a 3D scatter plot with Runs versus Balls Faced + Minutes at crease. K-Means. The centroids of 3 clusters are computed and plotted. In this plot Kohli’s highest tendencies are computed and plotted using K-Means

import cricpy.analytics as ca
ca.batsmanRunsLikelihood("./kohli.csv","Virat Kohli")

A look at the Top 4 batsman – Kohli, Jayasuriya, Sehwag and Gayle

The following batsmen have been very prolific in ODI cricket and will be used for the analyses

  1. Virat Kohli: Runs – 10232, Average:59.83 ,Strike rate-92.88
  2. Sanath Jayasuriya : Runs – 13430, Average:32.36 ,Strike rate-91.2
  3. Virendar Sehwag :Runs – 8273, Average:35.05 ,Strike rate-104.33
  4. Chris Gayle : Runs – 9727, Average:37.12 ,Strike rate-85.82

The following plots take a closer at their performances. The box plots show the median the 1st and 3rd quartile of the runs

12. Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency

import cricpy.analytics as ca
ca.batsmanPerfBoxHist("./kohli.csv","Virat Kohli")

ca.batsmanPerfBoxHist("./jayasuriya.csv","Sanath jayasuriya")

ca.batsmanPerfBoxHist("./gayle.csv","Chris Gayle")

ca.batsmanPerfBoxHist("./sehwag.csv","Virendar Sehwag")

13 Moving Average of runs in career

Take a look at the Moving Average across the career of the Top 4 (ignore the dip at the end of all plots. Need to check why this is so!). Kohli’s performance has been steadily improving over the years, so has Sehwag. Gayle seems to be on the way down

import cricpy.analytics as ca
ca.batsmanMovingAverage("./kohli.csv","Virat Kohli")

ca.batsmanMovingAverage("./jayasuriya.csv","Sanath jayasuriya")

ca.batsmanMovingAverage("./gayle.csv","Chris Gayle")

ca.batsmanMovingAverage("./sehwag.csv","Virendar Sehwag")

14 Cumulative Average runs of batsman in career

This function provides the cumulative average runs of the batsman over the career. Kohli seems to be getting better with time and reaches a cumulative average of 45+. Sehwag improves with time and reaches around 35+. Chris Gayle drops from 42 to 35

import cricpy.analytics as ca
ca.batsmanCumulativeAverageRuns("./kohli.csv","Virat Kohli")

ca.batsmanCumulativeAverageRuns("./jayasuriya.csv","Sanath jayasuriya")

ca.batsmanCumulativeAverageRuns("./gayle.csv","Chris Gayle")

ca.batsmanCumulativeAverageRuns("./sehwag.csv","Virendar Sehwag")

15 Cumulative Average strike rate of batsman in career

Sehwag has the best strike rate of almost 90. Kohli and Jayasuriya have a cumulative strike rate of 75.

import cricpy.analytics as ca
ca.batsmanCumulativeStrikeRate("./kohli.csv","Virat Kohli")

ca.batsmanCumulativeStrikeRate("./jayasuriya.csv","Sanath jayasuriya")

ca.batsmanCumulativeStrikeRate("./gayle.csv","Chris Gayle")

ca.batsmanCumulativeStrikeRate("./sehwag.csv","Virendar Sehwag")

16 Relative Batsman Cumulative Average Runs

The plot below compares the Relative cumulative average runs of the batsman . It can be seen that Virat Kohli towers above all others in the runs. He is followed by Chris Gayle and then Sehwag

import cricpy.analytics as ca
frames = ["./sehwag.csv","./gayle.csv","./jayasuriya.csv","./kohli.csv"]
names = ["Sehwag","Gayle","Jayasuriya","Kohli"]
ca.relativeBatsmanCumulativeAvgRuns(frames,names)

Relative Batsman Strike Rate

The plot below gives the relative Runs Frequency Percentages for each 10 run bucket. The plot below show Sehwag has the best strike rate, followed by Jayasuriya

import cricpy.analytics as ca
frames = ["./sehwag.csv","./gayle.csv","./jayasuriya.csv","./kohli.csv"]
names = ["Sehwag","Gayle","Jayasuriya","Kohli"]
ca.relativeBatsmanCumulativeStrikeRate(frames,names)

18. 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A 3D prediction plane is fitted

import cricpy.analytics as ca
ca.battingPerf3d("./kohli.csv","Virat Kohli")

ca.battingPerf3d("./jayasuriya.csv","Sanath jayasuriya")

ca.battingPerf3d("./gayle.csv","Chris Gayle")

ca.battingPerf3d("./sehwag.csv","Virendar Sehwag")

3D plot of Runs vs Balls Faced and Minutes at Crease

From the plot below it can be seen that Sehwag has more runs by way of 4s than 1’s,2’s or 3s. Gayle and Jayasuriya have large number of 6s

import cricpy.analytics as ca
frames = ["./sehwag.csv","./kohli.csv","./gayle.csv","./jayasuriya.csv"]
names = ["Sehwag","Kohli","Gayle","Jayasuriya"]
ca.batsman4s6s(frames,names)

20. Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

import cricpy.analytics as ca
import numpy as np
import pandas as pd
BF = np.linspace( 10, 400,15)
Mins = np.linspace( 30,600,15)
newDF= pd.DataFrame({'BF':BF,'Mins':Mins})
kohli= ca.batsmanRunsPredict("./kohli.csv",newDF,"Kohli")
print(kohli)
##             BF        Mins        Runs
## 0    10.000000   30.000000    6.807407
## 1    37.857143   70.714286   36.034833
## 2    65.714286  111.428571   65.262259
## 3    93.571429  152.142857   94.489686
## 4   121.428571  192.857143  123.717112
## 5   149.285714  233.571429  152.944538
## 6   177.142857  274.285714  182.171965
## 7   205.000000  315.000000  211.399391
## 8   232.857143  355.714286  240.626817
## 9   260.714286  396.428571  269.854244
## 10  288.571429  437.142857  299.081670
## 11  316.428571  477.857143  328.309096
## 12  344.285714  518.571429  357.536523
## 13  372.142857  559.285714  386.763949
## 14  400.000000  600.000000  415.991375

The fitted model is then used to predict the runs that the batsmen will score for a given Balls faced and Minutes at crease.

21 Analysis of Top Bowlers

The following 4 bowlers have had an excellent career and will be used for the analysis

  1. Muthiah Muralitharan:Wickets: 534, Average = 23.08, Economy Rate – 3.93
  2. Wasim Akram : Wickets: 502, Average = 23.52, Economy Rate – 3.89
  3. Shaun Pollock: Wickets: 393, Average = 24.50, Economy Rate – 3.67
  4. Javagal Srinath : Wickets:315, Average – 28.08, Economy Rate – 4.44

How do Muralitharan, Akram, Pollock and Srinath compare with one another with respect to wickets taken and the Economy Rate. The next set of plots compute and plot precisely these analyses.

22. Get the bowler’s data

This plot below computes the percentage frequency of number of wickets taken for e.g 1 wicket x%, 2 wickets y% etc and plots them as a continuous line

import cricpy.analytics as ca
#akram=ca.getPlayerDataOD(43547,dir=".",file="akram.csv",type="bowling")
#murali=ca.getPlayerDataOD(49636,dir=".",file="murali.csv",type="bowling")
#pollock=ca.getPlayerDataOD(46774,dir=".",file="pollock.csv",type="bowling")
#srinath=ca.getPlayerDataOD(34105,dir=".",file="srinath.csv",type="bowling")

23. Wicket Frequency Plot

This plot below plots the frequency of wickets taken for each of the bowlers

import cricpy.analytics as ca
ca.bowlerWktsFreqPercent("./murali.csv","M Muralitharan")

ca.bowlerWktsFreqPercent("./akram.csv","Wasim Akram")

ca.bowlerWktsFreqPercent("./pollock.csv","Shaun Pollock")

ca.bowlerWktsFreqPercent("./srinath.csv","J Srinath")

24. Wickets Runs plot

The plot below create a box plot showing the 1st and 3rd quartile of runs conceded versus the number of wickets taken. Murali’s median runs for wickets ia around 40 while Akram, Pollock and Srinath it is around 32+ runs. The spread around the median is larger for these 3 bowlers in comparison to Murali

import cricpy.analytics as ca
ca.bowlerWktsRunsPlot("./murali.csv","M Muralitharan")

ca.bowlerWktsRunsPlot("./akram.csv","Wasim Akram")

ca.bowlerWktsRunsPlot("./pollock.csv","Shaun Pollock")

ca.bowlerWktsRunsPlot("./srinath.csv","J Srinath")

25 Average wickets at different venues

The plot gives the average wickets taken by Muralitharan at different venues. McGrath best performances are at Centurion, Lord’s and Port of Spain averaging about 4 wickets. Kapil Dev’s does good at Kingston and Wellington. Anderson averages 4 wickets at Dunedin and Nagpur

import cricpy.analytics as ca
ca.bowlerAvgWktsGround("./murali.csv","M Muralitharan")

ca.bowlerAvgWktsGround("./akram.csv","Wasim Akram")

ca.bowlerAvgWktsGround("./pollock.csv","Shaun Pollock")

ca.bowlerAvgWktsGround("./srinath.csv","J Srinath")

26 Average wickets against different opposition

The plot gives the average wickets taken by Muralitharan against different countries. The x-axis also includes the number of innings against each team

import cricpy.analytics as ca
ca.bowlerAvgWktsOpposition("./murali.csv","M Muralitharan")

ca.bowlerAvgWktsOpposition("./akram.csv","Wasim Akram")

ca.bowlerAvgWktsOpposition("./pollock.csv","Shaun Pollock")

ca.bowlerAvgWktsOpposition("./srinath.csv","J Srinath")

27 Wickets taken moving average

From the plot below it can be see James Anderson has had a solid performance over the years averaging about wickets

import cricpy.analytics as ca
ca.bowlerMovingAverage("./murali.csv","M Muralitharan")

ca.bowlerMovingAverage("./akram.csv","Wasim Akram")

ca.bowlerMovingAverage("./pollock.csv","Shaun Pollock")

ca.bowlerMovingAverage("./srinath.csv","J Srinath")

28 Cumulative average wickets taken

The plots below give the cumulative average wickets taken by the bowlers. Muralitharan has consistently taken wickets at an average of 1.6 wickets per game. Shaun Pollock has an average of 1.5

import cricpy.analytics as ca
ca.bowlerCumulativeAvgWickets("./murali.csv","M Muralitharan")

ca.bowlerCumulativeAvgWickets("./akram.csv","Wasim Akram")

ca.bowlerCumulativeAvgWickets("./pollock.csv","Shaun Pollock")

ca.bowlerCumulativeAvgWickets("./srinath.csv","J Srinath")

29 Cumulative average economy rate

The plots below give the cumulative average economy rate of the bowlers. Pollock is the most economical, followed by Akram and then Murali

import cricpy.analytics as ca
ca.bowlerCumulativeAvgEconRate("./murali.csv","M Muralitharan")

ca.bowlerCumulativeAvgEconRate("./akram.csv","Wasim Akram")

ca.bowlerCumulativeAvgEconRate("./pollock.csv","Shaun Pollock")

ca.bowlerCumulativeAvgEconRate("./srinath.csv","J Srinath")

30 Relative cumulative average economy rate of bowlers

The Relative cumulative economy rate shows that Pollock is the most economical of the 4 bowlers. He is followed by Akram and then Murali

import cricpy.analytics as ca
frames = ["./srinath.csv","./akram.csv","./murali.csv","pollock.csv"]
names = ["J Srinath","Wasim Akram","M Muralitharan", "S Pollock"]
ca.relativeBowlerCumulativeAvgEconRate(frames,names)

31 Relative Economy Rate against wickets taken

Pollock is most economical vs number of wickets taken. Murali has the best figures for 4 wickets taken.

import cricpy.analytics as ca
frames = ["./srinath.csv","./akram.csv","./murali.csv","pollock.csv"]
names = ["J Srinath","Wasim Akram","M Muralitharan", "S Pollock"]
ca.relativeBowlingER(frames,names)

32 Relative cumulative average wickets of bowlers in career

The plot below shows that McGrath has the best overall cumulative average wickets. While the bowlers are neck to neck around 130 innings, you can see Muralitharan is most consistent and leads the pack after 150 innings in the number of wickets taken.

import cricpy.analytics as ca
frames = ["./srinath.csv","./akram.csv","./murali.csv","pollock.csv"]
names = ["J Srinath","Wasim Akram","M Muralitharan", "S Pollock"]
ca.relativeBowlerCumulativeAvgWickets(frames,names)

33. Key Findings

The plots above capture some of the capabilities and features of my cricpy package. Feel free to install the package and try it out. Please do keep in mind ESPN Cricinfo’s Terms of Use.

Here are the main findings from the analysis above

Analysis of Top 4 batsman

The analysis of the Top 4 test batsman Tendulkar, Kallis, Ponting and Sangakkara show the folliwing

  1. Kohli is a mean run machine and has been consistently piling on runs. Clearly records will lay shattered in days to come for Kohli
  2. Virendar Sehwag has the best strike rate of the 4, followed by Jayasuriya and then Kohli
  3. Shaun Pollock is the most economical of the bowlers followed by Wasim Akram
  4. Muralitharan is the most consistent wicket of the lot.

Also see
1. Architecting a cloud based IP Multimedia System (IMS)
2. Exploring Quantum Gate operations with QCSimulator
3. Dabbling with Wiener filter using OpenCV
4. Deep Learning from first principles in Python, R and Octave – Part 5
5. Big Data-2: Move into the big league:Graduate from R to SparkR
6. Singularity
7. Practical Machine Learning with R and Python – Part 4
8. Literacy in India – A deepR dive
9. Modeling a Car in Android

To see all posts click Index of Posts

Introducing cricpy:A python package to analyze performances of cricketers


Full many a gem of purest ray serene,
The dark unfathomed caves of ocean bear;
Full many a flower is born to blush unseen,
And waste its sweetness on the desert air.

            Thomas Gray, An Elegy Written In A Country Churchyard
            

Introduction

It is finally here! cricpy, the python avatar , of my R package cricketr is now ready to rock-n-roll! My R package cricketr had its genesis about 3 and some years ago and went through a couple of enhancements. During this time I have always thought about creating an equivalent python package like cricketr. Now I have finally done it.

So here it is. My python package ‘cricpy!!!’

This package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package supports only Test cricket

You should be able to install the package using pip install cricpy and use the many functions available in the package. Please mindful of the ESPN Cricinfo Terms of Use

This post is also hosted on Rpubs at Introducing cricpy. You can also download the pdf version of this post at cricpy.pdf

Do check out my post on R package cricketr at Re-introducing cricketr! : An R package to analyze performances of cricketers

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen, bowlers besides evaluating team & match performances in Tests , ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr  and Beaten by sheer pace-Cricket analytics with yorkr  A must read for any cricket lover! Check it out!!

1

 

This package uses the statistics info available in ESPN Cricinfo Statsguru.

Note: If you would like to do a similar analysis for a different set of batsman and bowlers, you can clone/download my skeleton cricpy-template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. The functions can be executed in RStudio or in a IPython notebook.

The cricpy package

The cricpy package has several functions that perform several different analyses on both batsman and bowlers. The package has functions that plot percentage frequency runs or wickets, runs likelihood for a batsman, relative run/strike rates of batsman and relative performance/economy rate for bowlers are available.

Other interesting functions include batting performance moving average, forecasting, performance of a player against different oppositions, contribution to wins and losses etc.

The data for a particular player can be obtained with the getPlayerData() function. To do this you will need to go to ESPN CricInfo Player and type in the name of the player for e.g Rahul Dravid, Virat Kohli, Alastair Cook etc. This will bring up a page which have the profile number for the player e.g. for Rahul Dravid this would be http://www.espncricinfo.com/india/content/player/28114.html. Hence, Dravid’s profile is 28114. This can be used to get the data for Rahul Dravid as shown below

The cricpy package is almost a clone of my R package cricketr. The signature of all the python functions are identical with that of its R avatar namely  ‘cricketr’, with only the necessary variations between Python and R. It may be useful to look at my post R vs Python: Different similarities and similar differences. In fact if you are familiar with one of the languages you can look up the package in the other and you will notice the parallel constructs.

You can fork/clone the cricpy package at Github cricpy

The following 2 examples show the similarity between cricketr and cricpy packages

1a.Importing cricketr – R

Importing cricketr in R

#install.packages("cricketr")
library(cricketr)

2a. Importing cricpy – Python

# Install the package
# Do a pip install cricpy
# Import cricpy
import cricpy
# You could either do
#1.  
import cricpy.analytics as ca 
#ca.batsman4s("../dravid.csv","Rahul Dravid")
# Or
#2.
from cricpy.analytics import *
#batsman4s("../dravid.csv","Rahul Dravid")

I would recommend using option 1 namely ca.batsman4s() as I may add an advanced analytics module in the future to cricpy.

2 Invoking functions

You can seen how the 2 calls are identical for both the R package cricketr and the Python package cricpy

2a. Invoking functions with R package ‘cricketr’

library(cricketr)
batsman4s("../dravid.csv","Rahul Dravid")

2b. Invoking functions with Python package ‘cricpy’

import cricpy.analytics as ca 
ca.batsman4s("../dravid.csv","Rahul Dravid")

 

3a. Getting help from cricketr – R

#help("getPlayerData")

3b. Getting help from cricpy – Python

help(ca.getPlayerData)
## Help on function getPlayerData in module cricpy.analytics:
## 
## getPlayerData(profile, opposition='', host='', dir='./data', file='player001.csv', type='batting', homeOrAway=[1, 2], result=[1, 2, 4], create=True)
##     Get the player data from ESPN Cricinfo based on specific inputs and store in a file in a given directory
##     
##     Description
##     
##     Get the player data given the profile of the batsman. The allowed inputs are home,away or both and won,lost or draw of matches. The data is stored in a <player>.csv file in a directory specified. This function also returns a data frame of the player
##     
##     Usage
##     
##     getPlayerData(profile,opposition="",host="",dir="./data",file="player001.csv",
##     type="batting", homeOrAway=c(1,2),result=c(1,2,4))
##     Arguments
##     
##     profile     
##     This is the profile number of the player to get data. This can be obtained from http://www.espncricinfo.com/ci/content/player/index.html. Type the name of the player and click search. This will display the details of the player. Make a note of the profile ID. For e.g For Sachin Tendulkar this turns out to be http://www.espncricinfo.com/india/content/player/35320.html. Hence the profile for Sachin is 35320
##     opposition  
##     The numerical value of the opposition country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,New Zealand:5,Pakistan:7,South Africa:3,Sri Lanka:8, West Indies:4, Zimbabwe:9
##     host        
##     The numerical value of the host country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,New Zealand:5,Pakistan:7,South Africa:3,Sri Lanka:8, West Indies:4, Zimbabwe:9
##     dir 
##     Name of the directory to store the player data into. If not specified the data is stored in a default directory "./data". Default="./data"
##     file        
##     Name of the file to store the data into for e.g. tendulkar.csv. This can be used for subsequent functions. Default="player001.csv"
##     type        
##     type of data required. This can be "batting" or "bowling"
##     homeOrAway  
##     This is a list with either 1,2 or both. 1 is for home 2 is for away
##     result      
##     This is a list that can take values 1,2,4. 1 - won match 2- lost match 4- draw
##     Details
##     
##     More details can be found in my short video tutorial in Youtube https://www.youtube.com/watch?v=q9uMPFVsXsI
##     
##     Value
##     
##     Returns the player's dataframe
##     
##     Note
##     
##     Maintainer: Tinniam V Ganesh <tvganesh.85@gmail.com>
##     
##     Author(s)
##     
##     Tinniam V Ganesh
##     
##     References
##     
##     http://www.espncricinfo.com/ci/content/stats/index.html
##     https://gigadom.wordpress.com/
##     
##     See Also
##     
##     getPlayerDataSp
##     
##     Examples
##     
##     ## Not run: 
##     # Both home and away. Result = won,lost and drawn
##     tendulkar = getPlayerData(35320,dir=".", file="tendulkar1.csv",
##     type="batting", homeOrAway=[1,2],result=[1,2,4])
##     
##     # Only away. Get data only for won and lost innings
##     tendulkar = getPlayerData(35320,dir=".", file="tendulkar2.csv",
##     type="batting",homeOrAway=[2],result=[1,2])
##     
##     # Get bowling data and store in file for future
##     kumble = getPlayerData(30176,dir=".",file="kumble1.csv",
##     type="bowling",homeOrAway=[1],result=[1,2])
##     
##     #Get the Tendulkar's Performance against Australia in Australia
##     tendulkar = getPlayerData(35320, opposition = 2,host=2,dir=".", 
##     file="tendulkarVsAusInAus.csv",type="batting")

The details below will introduce the different functions that are available in cricpy.

3. Get the player data for a player using the function getPlayerData()

Important Note This needs to be done only once for a player. This function stores the player’s data in the specified CSV file (for e.g. dravid.csv as above) which can then be reused for all other functions). Once we have the data for the players many analyses can be done. This post will use the stored CSV file obtained with a prior getPlayerData for all subsequent analyses

import cricpy.analytics as ca
#dravid =ca.getPlayerData(28114,dir="..",file="dravid.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])
#acook =ca.getPlayerData(11728,dir="..",file="acook.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])
import cricpy.analytics as ca
#lara =ca.getPlayerData(52337,dir="..",file="lara.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])253802
#kohli =ca.getPlayerData(253802,dir="..",file="kohli.csv",type="batting",homeOrAway=[1,2], result=[1,2,4])

4 Rahul Dravid’s performance – Basic Analyses

The 3 plots below provide the following for Rahul Dravid

  1. Frequency percentage of runs in each run range over the whole career
  2. Mean Strike Rate for runs scored in the given range
  3. A histogram of runs frequency percentages in runs ranges
import cricpy.analytics as ca
import matplotlib.pyplot as plt
ca.batsmanRunsFreqPerf("../dravid.csv","Rahul Dravid")

ca.batsmanMeanStrikeRate("../dravid.csv","Rahul Dravid")

ca.batsmanRunsRanges("../dravid.csv","Rahul Dravid") 

5. More analyses

import cricpy.analytics as ca
ca.batsman4s("../dravid.csv","Rahul Dravid")

ca.batsman6s("../dravid.csv","Rahul Dravid") 

ca.batsmanDismissals("../dravid.csv","Rahul Dravid")

6. 3D scatter plot and prediction plane

The plots below show the 3D scatter plot of Dravid Runs versus Balls Faced and Minutes at crease. A linear regression plane is then fitted between Runs and Balls Faced + Minutes at crease

import cricpy.analytics as ca
ca.battingPerf3d("../dravid.csv","Rahul Dravid")

7. Average runs at different venues

The plot below gives the average runs scored by Dravid at different grounds. The plot also the number of innings at each ground as a label at x-axis. It can be seen Dravid did great in Rawalpindi, Leeds, Georgetown overseas and , Mohali and Bangalore at home

import cricpy.analytics as ca
ca.batsmanAvgRunsGround("../dravid.csv","Rahul Dravid")

 

8. Average runs against different opposing teams

This plot computes the average runs scored by Dravid against different countries. Dravid has an average of 50+ in England, New Zealand, West Indies and Zimbabwe.

import cricpy.analytics as ca
ca.batsmanAvgRunsOpposition("../dravid.csv","Rahul Dravid")

9 . Highest Runs Likelihood

The plot below shows the Runs Likelihood for a batsman. For this the performance of Sachin is plotted as a 3D scatter plot with Runs versus Balls Faced + Minutes at crease. K-Means. The centroids of 3 clusters are computed and plotted. In this plot Dravid’s  highest tendencies are computed and plotted using K-Means

import cricpy.analytics as ca
ca.batsmanRunsLikelihood("../dravid.csv","Rahul Dravid")

10. A look at the Top 4 batsman – Rahul Dravid, Alastair Cook, Brian Lara and Virat Kohli

The following batsmen have been very prolific in test cricket and will be used for teh analyses

  1. Rahul Dravid :Average:52.31,100’s – 36, 50’s – 63
  2. Alastair Cook : Average: 45.35, 100’s – 33, 50’s – 57
  3. Brian Lara : Average: 52.88, 100’s – 34 , 50’s – 48
  4. Virat Kohli: Average: 54.57 ,100’s – 24 , 50’s – 19

The following plots take a closer at their performances. The box plots show the median the 1st and 3rd quartile of the runs

11. Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency

import cricpy.analytics as ca
ca.batsmanPerfBoxHist("../dravid.csv","Rahul Dravid")

ca.batsmanPerfBoxHist("../acook.csv","Alastair Cook")

ca.batsmanPerfBoxHist("../lara.csv","Brian Lara")


ca.batsmanPerfBoxHist("../kohli.csv","Virat Kohli")


12. Contribution to won and lost matches

The plot below shows the contribution of Dravid, Cook, Lara and Kohli in matches won and lost. It can be seen that in matches where India has won Dravid and Kohli have scored more and must have been instrumental in the win

For the 2 functions below you will have to use the getPlayerDataSp() function as shown below. I have commented this as I already have these files

import cricpy.analytics as ca
#dravidsp = ca.getPlayerDataSp(28114,tdir=".",tfile="dravidsp.csv",ttype="batting")
#acooksp = ca.getPlayerDataSp(11728,tdir=".",tfile="acooksp.csv",ttype="batting")
#larasp = ca.getPlayerDataSp(52337,tdir=".",tfile="larasp.csv",ttype="batting")
#kohlisp = ca.getPlayerDataSp(253802,tdir=".",tfile="kohlisp.csv",ttype="batting")
import cricpy.analytics as ca
ca.batsmanContributionWonLost("../dravidsp.csv","Rahul Dravid")

ca.batsmanContributionWonLost("../acooksp.csv","Alastair Cook")

ca.batsmanContributionWonLost("../larasp.csv","Brian Lara")

ca.batsmanContributionWonLost("../kohlisp.csv","Virat Kohli")


13. Performance at home and overseas

From the plot below it can be seen

Dravid has a higher median overseas than at home.Cook, Lara and Kohli have a lower median of runs overseas than at home.

This function also requires the use of getPlayerDataSp() as shown above

import cricpy.analytics as ca
ca.batsmanPerfHomeAway("../dravidsp.csv","Rahul Dravid")

ca.batsmanPerfHomeAway("../acooksp.csv","Alastair Cook")

ca.batsmanPerfHomeAway("../larasp.csv","Brian Lara")

ca.batsmanPerfHomeAway("../kohlisp.csv","Virat Kohli")

14 Moving Average of runs in career

Take a look at the Moving Average across the career of the Top 4 (ignore the dip at the end of all plots. Need to check why this is so!). Lara’s performance seems to have been quite good before his retirement(wonder why retired so early!). Kohli’s performance has been steadily improving over the years

import cricpy.analytics as ca
ca.batsmanMovingAverage("../dravid.csv","Rahul Dravid")

ca.batsmanMovingAverage("../acook.csv","Alastair Cook")

ca.batsmanMovingAverage("../lara.csv","Brian Lara")

ca.batsmanMovingAverage("../kohli.csv","Virat Kohli")

15 Cumulative Average runs of batsman in career

This function provides the cumulative average runs of the batsman over the career. Dravid averages around 48, Cook around 44, Lara around 50 and Kohli shows a steady improvement in his cumulative average. Kohli seems to be getting better with time.

import cricpy.analytics as ca
ca.batsmanCumulativeAverageRuns("../dravid.csv","Rahul Dravid")

ca.batsmanCumulativeAverageRuns("../acook.csv","Alastair Cook")

ca.batsmanCumulativeAverageRuns("../lara.csv","Brian Lara")

ca.batsmanCumulativeAverageRuns("../kohli.csv","Virat Kohli")

16 Cumulative Average strike rate of batsman in career

Lara has a terrific strike rate of 52+. Cook has a better strike rate over Dravid. Kohli’s strike rate has improved over the years.

import cricpy.analytics as ca
ca.batsmanCumulativeStrikeRate("../dravid.csv","Rahul Dravid")

ca.batsmanCumulativeStrikeRate("../acook.csv","Alastair Cook")

ca.batsmanCumulativeStrikeRate("../lara.csv","Brian Lara")

ca.batsmanCumulativeStrikeRate("../kohli.csv","Virat Kohli")


17 Future Runs forecast

Here are plots that forecast how the batsman will perform in future. Currently ARIMA has been used for the forecast. (To do:  Perform Holt-Winters forecast!)

import cricpy.analytics as ca
ca.batsmanPerfForecast("../dravid.csv","Rahul Dravid")
##                              ARIMA Model Results                              
## ==============================================================================
## Dep. Variable:                 D.runs   No. Observations:                  284
## Model:                 ARIMA(5, 1, 0)   Log Likelihood               -1522.837
## Method:                       css-mle   S.D. of innovations             51.488
## Date:                Sun, 28 Oct 2018   AIC                           3059.673
## Time:                        09:47:39   BIC                           3085.216
## Sample:                    07-04-1996   HQIC                          3069.914
##                          - 01-24-2012                                         
## ================================================================================
##                    coef    std err          z      P>|z|      [0.025      0.975]
## --------------------------------------------------------------------------------
## const           -0.1336      0.884     -0.151      0.880      -1.867       1.599
## ar.L1.D.runs    -0.7729      0.058    -13.322      0.000      -0.887      -0.659
## ar.L2.D.runs    -0.6234      0.071     -8.753      0.000      -0.763      -0.484
## ar.L3.D.runs    -0.5199      0.074     -7.038      0.000      -0.665      -0.375
## ar.L4.D.runs    -0.3490      0.071     -4.927      0.000      -0.488      -0.210
## ar.L5.D.runs    -0.2116      0.058     -3.665      0.000      -0.325      -0.098
##                                     Roots                                    
## =============================================================================
##                  Real           Imaginary           Modulus         Frequency
## -----------------------------------------------------------------------------
## AR.1            0.5789           -1.1743j            1.3093           -0.1771
## AR.2            0.5789           +1.1743j            1.3093            0.1771
## AR.3           -1.3617           -0.0000j            1.3617           -0.5000
## AR.4           -0.7227           -1.2257j            1.4230           -0.3348
## AR.5           -0.7227           +1.2257j            1.4230            0.3348
## -----------------------------------------------------------------------------
##                 0
## count  284.000000
## mean    -0.306769
## std     51.632947
## min   -106.653589
## 25%    -33.835148
## 50%     -8.954253
## 75%     21.024763
## max    223.152901
## 
## C:\Users\Ganesh\ANACON~1\lib\site-packages\statsmodels\tsa\kalmanf\kalmanfilter.py:646: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
##   if issubdtype(paramsdtype, float):
## C:\Users\Ganesh\ANACON~1\lib\site-packages\statsmodels\tsa\kalmanf\kalmanfilter.py:650: FutureWarning: Conversion of the second argument of issubdtype from `complex` to `np.complexfloating` is deprecated. In future, it will be treated as `np.complex128 == np.dtype(complex).type`.
##   elif issubdtype(paramsdtype, complex):
## C:\Users\Ganesh\ANACON~1\lib\site-packages\statsmodels\tsa\kalmanf\kalmanfilter.py:577: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
##   if issubdtype(paramsdtype, float):

18 Relative Batsman Cumulative Average Runs

The plot below compares the Relative cumulative average runs of the batsman for each of the runs ranges of 10 and plots them. The plot indicate the following Range 30 – 100 innings – Lara leads followed by Dravid Range 100+ innings – Kohli races ahead of the rest

import cricpy.analytics as ca
frames = ["../dravid.csv","../acook.csv","../lara.csv","../kohli.csv"]
names = ["Dravid","A Cook","Brian Lara","V Kohli"]
ca.relativeBatsmanCumulativeAvgRuns(frames,names)

19. Relative Batsman Strike Rate

The plot below gives the relative Runs Frequency Percetages for each 10 run bucket. The plot below show

Brian Lara towers over the Dravid, Cook and Kohli. However you will notice that Kohli’s strike rate is going up

import cricpy.analytics as ca
frames = ["../dravid.csv","../acook.csv","../lara.csv","../kohli.csv"]
names = ["Dravid","A Cook","Brian Lara","V Kohli"]
ca.relativeBatsmanCumulativeStrikeRate(frames,names)

20. 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

import cricpy.analytics as ca
ca.battingPerf3d("../dravid.csv","Rahul Dravid")

ca.battingPerf3d("../acook.csv","Alastair Cook")

ca.battingPerf3d("../lara.csv","Brian Lara")

ca.battingPerf3d("../kohli.csv","Virat Kohli")

21. Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

import cricpy.analytics as ca
import numpy as np
import pandas as pd
BF = np.linspace( 10, 400,15)
Mins = np.linspace( 30,600,15)
newDF= pd.DataFrame({'BF':BF,'Mins':Mins})
dravid = ca.batsmanRunsPredict("../dravid.csv",newDF,"Dravid")
print(dravid)
##             BF        Mins        Runs
## 0    10.000000   30.000000    0.519667
## 1    37.857143   70.714286   13.821794
## 2    65.714286  111.428571   27.123920
## 3    93.571429  152.142857   40.426046
## 4   121.428571  192.857143   53.728173
## 5   149.285714  233.571429   67.030299
## 6   177.142857  274.285714   80.332425
## 7   205.000000  315.000000   93.634552
## 8   232.857143  355.714286  106.936678
## 9   260.714286  396.428571  120.238805
## 10  288.571429  437.142857  133.540931
## 11  316.428571  477.857143  146.843057
## 12  344.285714  518.571429  160.145184
## 13  372.142857  559.285714  173.447310
## 14  400.000000  600.000000  186.749436

The fitted model is then used to predict the runs that the batsmen will score for a given Balls faced and Minutes at crease.

22 Analysis of Top 3 wicket takers

The following 3 bowlers have had an excellent career and will be used for the analysis

  1. Glenn McGrath:Wickets: 563, Average = 21.64, Economy Rate – 2.49
  2. Kapil Dev : Wickets: 434, Average = 29.64, Economy Rate – 2.78
  3. James Anderson: Wickets: 564, Average = 28.64, Economy Rate – 2.88

How do Glenn McGrath, Kapil Dev and James Anderson compare with one another with respect to wickets taken and the Economy Rate. The next set of plots compute and plot precisely these analyses.

23. Get the bowler’s data

This plot below computes the percentage frequency of number of wickets taken for e.g 1 wicket x%, 2 wickets y% etc and plots them as a continuous line

import cricpy.analytics as ca
#mcgrath =ca.getPlayerData(6565,dir=".",file="mcgrath.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])
#kapil =ca.getPlayerData(30028,dir=".",file="kapil.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])
#anderson =ca.getPlayerData(8608,dir=".",file="anderson.csv",type="bowling",homeOrAway=[1,2], result=[1,2,4])

24. Wicket Frequency Plot

This plot below plots the frequency of wickets taken for each of the bowlers

import cricpy.analytics as ca
ca.bowlerWktsFreqPercent("../mcgrath.csv","Glenn McGrath")

ca.bowlerWktsFreqPercent("../kapil.csv","Kapil Dev")

ca.bowlerWktsFreqPercent("../anderson.csv","James Anderson")

25. Wickets Runs plot

The plot below create a box plot showing the 1st and 3rd quartile of runs conceded versus the number of wickets taken

import cricpy.analytics as ca
ca.bowlerWktsRunsPlot("../mcgrath.csv","Glenn McGrath")

ca.bowlerWktsRunsPlot("../kapil.csv","Kapil Dev")

ca.bowlerWktsRunsPlot("../anderson.csv","James Anderson")

26 Average wickets at different venues

The plot gives the average wickets taken by Muralitharan at different venues. McGrath best performances are at Centurion, Lord’s and Port of Spain averaging about 4 wickets. Kapil Dev’s does good at Kingston and Wellington. Anderson averages 4 wickets at Dunedin and Nagpur

import cricpy.analytics as ca
ca.bowlerAvgWktsGround("../mcgrath.csv","Glenn McGrath")

ca.bowlerAvgWktsGround("../kapil.csv","Kapil Dev")

ca.bowlerAvgWktsGround("../anderson.csv","James Anderson")

27 Average wickets against different opposition

The plot gives the average wickets taken by Muralitharan against different countries. The x-axis also includes the number of innings against each team

import cricpy.analytics as ca
ca.bowlerAvgWktsOpposition("../mcgrath.csv","Glenn McGrath")

ca.bowlerAvgWktsOpposition("../kapil.csv","Kapil Dev")

ca.bowlerAvgWktsOpposition("../anderson.csv","James Anderson")

28 Wickets taken moving average

From the plot below it can be see James Anderson has had a solid performance over the years averaging about wickets

import cricpy.analytics as ca
ca.bowlerMovingAverage("../mcgrath.csv","Glenn McGrath")

ca.bowlerMovingAverage("../kapil.csv","Kapil Dev")

ca.bowlerMovingAverage("../anderson.csv","James Anderson")

29 Cumulative average wickets taken

The plots below give the cumulative average wickets taken by the bowlers. mcGrath plateaus around 2.4 wickets, Kapil Dev’s performance deteriorates over the years. Anderson holds on rock steady around 2 wickets

import cricpy.analytics as ca
ca.bowlerCumulativeAvgWickets("../mcgrath.csv","Glenn McGrath")

ca.bowlerCumulativeAvgWickets("../kapil.csv","Kapil Dev")

ca.bowlerCumulativeAvgWickets("../anderson.csv","James Anderson")

30 Cumulative average economy rate

The plots below give the cumulative average economy rate of the bowlers. McGrath’s was very expensive early in his career conceding about 2.8 runs per over which drops to around 2.5 runs towards the end. Kapil Dev’s economy rate drops from 3.6 to 2.8. Anderson is probably more expensive than the other 2.

import cricpy.analytics as ca
ca.bowlerCumulativeAvgEconRate("../mcgrath.csv","Glenn McGrath")

ca.bowlerCumulativeAvgEconRate("../kapil.csv","Kapil Dev")

ca.bowlerCumulativeAvgEconRate("../anderson.csv","James Anderson")

31 Future Wickets forecast

import cricpy.analytics as ca
ca.bowlerPerfForecast("../mcgrath.csv","Glenn McGrath")
##                              ARIMA Model Results                              
## ==============================================================================
## Dep. Variable:              D.Wickets   No. Observations:                  236
## Model:                 ARIMA(5, 1, 0)   Log Likelihood                -480.815
## Method:                       css-mle   S.D. of innovations              1.851
## Date:                Sun, 28 Oct 2018   AIC                            975.630
## Time:                        09:28:32   BIC                            999.877
## Sample:                    11-12-1993   HQIC                           985.404
##                          - 01-02-2007                                         
## ===================================================================================
##                       coef    std err          z      P>|z|      [0.025      0.975]
## -----------------------------------------------------------------------------------
## const               0.0037      0.033      0.113      0.910      -0.061       0.068
## ar.L1.D.Wickets    -0.9432      0.064    -14.708      0.000      -1.069      -0.818
## ar.L2.D.Wickets    -0.7254      0.086     -8.469      0.000      -0.893      -0.558
## ar.L3.D.Wickets    -0.4827      0.093     -5.217      0.000      -0.664      -0.301
## ar.L4.D.Wickets    -0.3690      0.085     -4.324      0.000      -0.536      -0.202
## ar.L5.D.Wickets    -0.1709      0.064     -2.678      0.008      -0.296      -0.046
##                                     Roots                                    
## =============================================================================
##                  Real           Imaginary           Modulus         Frequency
## -----------------------------------------------------------------------------
## AR.1            0.5630           -1.2761j            1.3948           -0.1839
## AR.2            0.5630           +1.2761j            1.3948            0.1839
## AR.3           -0.8433           -1.0820j            1.3718           -0.3554
## AR.4           -0.8433           +1.0820j            1.3718            0.3554
## AR.5           -1.5981           -0.0000j            1.5981           -0.5000
## -----------------------------------------------------------------------------
##                 0
## count  236.000000
## mean    -0.005142
## std      1.856961
## min     -3.457002
## 25%     -1.433391
## 50%     -0.080237
## 75%      1.446149
## max      5.840050

32 Get player data special

As discussed above the next 2 charts require the use of getPlayerDataSp()

import cricpy.analytics as ca
#mcgrathsp =ca.getPlayerDataSp(6565,tdir=".",tfile="mcgrathsp.csv",ttype="bowling")
#kapilsp =ca.getPlayerDataSp(30028,tdir=".",tfile="kapilsp.csv",ttype="bowling")
#andersonsp =ca.getPlayerDataSp(8608,tdir=".",tfile="andersonsp.csv",ttype="bowling")

33 Contribution to matches won and lost

The plot below is extremely interesting Glenn McGrath has been more instrumental in Australia winning than Kapil and Anderson as seems to have taken more wickets when Australia won.

import cricpy.analytics as ca
ca.bowlerContributionWonLost("../mcgrathsp.csv","Glenn McGrath")

ca.bowlerContributionWonLost("../kapilsp.csv","Kapil Dev")

ca.bowlerContributionWonLost("../andersonsp.csv","James Anderson")

34 Performance home and overseas

McGrath and Kapil Dev have performed better overseas than at home. Anderson has performed about the same home and overseas

import cricpy.analytics as ca
ca.bowlerPerfHomeAway("../mcgrathsp.csv","Glenn McGrath")

ca.bowlerPerfHomeAway("../kapilsp.csv","Kapil Dev")

ca.bowlerPerfHomeAway("../andersonsp.csv","James Anderson")

35 Relative cumulative average economy rate of bowlers

The Relative cumulative economy rate shows that McGrath has the best economy rate followed by Kapil Dev and then Anderson.

import cricpy.analytics as ca
frames = ["../mcgrath.csv","../kapil.csv","../anderson.csv"]
names = ["Glenn McGrath","Kapil Dev","James Anderson"]
ca.relativeBowlerCumulativeAvgEconRate(frames,names)

36 Relative Economy Rate against wickets taken

McGrath has been economical regardless of the number of wickets taken. Kapil Dev has been slightly more expensive when he takes more wickets

import cricpy.analytics as ca
frames = ["../mcgrath.csv","../kapil.csv","../anderson.csv"]
names = ["Glenn McGrath","Kapil Dev","James Anderson"]
ca.relativeBowlingER(frames,names)

37 Relative cumulative average wickets of bowlers in career

The plot below shows that McGrath has the best overall cumulative average wickets. Kapil’s leads Anderson till about 150 innings after which Anderson takes over

import cricpy.analytics as ca
frames = ["../mcgrath.csv","../kapil.csv","../anderson.csv"]
names = ["Glenn McGrath","Kapil Dev","James Anderson"]
ca.relativeBowlerCumulativeAvgWickets(frames,names)

Key Findings

The plots above capture some of the capabilities and features of my cricpy package. Feel free to install the package and try it out. Please do keep in mind ESPN Cricinfo’s Terms of Use.

Here are the main findings from the analysis above

Key insights

1. Brian Lara is head and shoulders above the rest in the overall strike rate
2. Kohli performance has been steadily improving over the years and with the way he is going he will shatter all records.
3. Kohli and Dravid have scored more in matches where India has won than the other two.
4. Dravid has performed very well overseas
5. The cumulative average runs has Kohli just edging out the other 3. Kohli is probably midway in his career but considering that his moving average is improving strongly, we can expect great things of him with the way he is going.
6. McGrath has had some great performances overseas
7. Mcgrath has the best economy rate and has contributed significantly to Australia’s wins.
8.In the cumulative average wickets race McGrath leads the pack. Kapil leads Anderson till about 150 matches after which Anderson takes over.

The code for cricpy can be accessed at Github at cricpy

Do let me know if you run into issues.

Conclusion

I have long wanted to make a python equivalent of cricketr and I have been able to make it. cricpy is still work in progress. I have add the necessary functions for ODI and Twenty20.  Go ahead give ‘cricpy’ a spin!!

Stay tuned!

The 3rd paperback & kindle editions of my books on Cricket, now on Amazon


The 3rd  paperback & kindle edition of both my books on cricket is now available on Amazon

a) Cricket analytics with cricketr, Third Edition. The paperback edition is $12.99 and the kindle edition is $4.99/Rs320.  This book is based on my R package ‘cricketr‘, available on CRAN and uses ESPN Cricinfo Statsguru

b) Beaten by sheer pace! Cricket analytics with yorkr, 3rd edition . The paperback is $12.99 and the kindle version is $6.99/Rs448. This is based on my R package ‘yorkr‘ on CRAN and uses data from Cricsheet
Pick up your copies today!!

Note: In the 3rd edition of  the paperback book, the charts will be in black and white. If you would like the charts to be in color, please check out the 2nd edition of these books see More book, more cricket! 2nd edition of my books now on Amazon

You may also like
1. My book ‘Practical Machine Learning with R and Python’ on Amazon
2. A crime map of India in R: Crimes against women
3.  What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
4.  Bend it like Bluemix, MongoDB with autoscaling – Part 2
5. Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
6. Thinking Web Scale (TWS-3): Map-Reduce – Bring compute to data

To see all posts see Index of posts

cricketr flexes new muscles: The final analysis


Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

       Jabberwocky by Lewis Carroll
                   

No analysis of cricket is complete, without determining how players would perform in the host country. Playing Test cricket on foreign pitches, in the host country, is a ‘real test’ for both batsmen and bowlers. Players, who can perform consistently both on domestic and foreign pitches are the genuinely ‘class’ players. Player performance on foreign pitches lets us differentiate the paper tigers, and home ground bullies among batsmen. Similarly, spinners who perform well, only on rank turners in home ground or pace bowlers who can only swing and generate bounce on specially prepared pitches are neither  genuine spinners nor  real pace bowlers.

So this post, helps in identifying those with real strengths, and those who play good only when the conditions are in favor, in home grounds. This post brings a certain level of finality to the analysis of players with my R package ‘cricketr’

Besides, I also meant ‘final analysis’ in the literal sense, as I intend to take a long break from cricket analysis/analytics and focus on some other domains like Neural Networks, Deep Learning and Spark.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen, bowlers besides evaluating team & match performances in Tests , ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr  and Beaten by sheer pace-Cricket analytics with yorkr  A must read for any cricket lover! Check it out!!

1

 

As already mentioned, my R package ‘cricketr’ uses the statistics info available in ESPN Cricinfo Statsguru. You should be able to install the package from CRAN and use many of the functions available in the package. Please be mindful of ESPN Cricinfo Terms of Use

(Note: This page is also hosted at RPubs as cricketrFinalAnalysis. You can download the PDF file at cricketrFinalAnalysis.

For getting data of a player against a particular country for the match played in the host country, I just had to add 2 extra parameters to the getPlayerData() function. The cricketr package has been updated with the changed functions for getPlayerData() – Tests, getPlayerDataOD() – ODI and getPlayerDataTT() for the Twenty20s. The updated functions will be available in cricketr Version -0.0.14

The data for the following players have already been obtained with the new, changed getPlayerData() function and have been saved as *.csv files. I will be re-using these files, instead of getting them all over again. Hence the getPlayerData() lines have been commented below

library(cricketr)

1. Performance of a batsman against a host ountry in the host country

For e.g We can the get the data for Sachin Tendulkar for matches played against Australia and in Australia Here opposition=2 and host =2 indicate that the opposition is Australia and the host country is also Australia

#tendulkarAus=getPlayerData(35320,opposition=2,host=2,file="tendulkarVsAusInAus.csv",type="batting")

All cricketr functions can be used with this data frame, as before. All the charts show the performance of Tendulkar in Australia against Australia.

par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
batsman4s("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsman6s("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsmanRunsRanges("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsmanDismissals("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsmanAvgRunsGround("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsmanMovingAverage("./data/tendulkarVsAusInAus.csv","Tendulkar")

dev.off()
## null device 
##           1

2. Relative performances of international batsmen against England in England

While we can analyze the performance of a player against an opposition in some host country, I wanted to compare the relative performances of players, to see how players from different nations play in a host country which is not their home ground.

The following lines gets player’s data of matches played in England and against England.The Oval, Lord’s are famous for generating some dangerous swing and bounce. I chose the following players

  1. Sir Don Bradman (Australia)
  2. Steve Waugh (Australia)
  3. Rahul Dravid (India)
  4. Vivian Richards (West Indies)
  5. Sachin Tendulkar (India)
#tendulkarEng=getPlayerData(35320,opposition=1,host=1,file="tendulkarVsEngInEng.csv",type="batting")
#bradmanEng=getPlayerData(4188,opposition=1,host=1,file="bradmanVsEngInEng.csv",type="batting")
#srwaughEng=getPlayerData(8192,opposition=1,host=1,file="srwaughVsEngInEng.csv",type="batting")
#dravidEng=getPlayerData(28114,opposition=1,host=1,file="dravidVsEngInEng.csv",type="batting")
#vrichardEng=getPlayerData(52812,opposition=1,host=1,file="vrichardsEngInEng.csv",type="batting")
frames <- list("./data/tendulkarVsEngInEng.csv","./data/bradmanVsEngInEng.csv","./data/srwaughVsEngInEng.csv",
               "./data/dravidVsEngInEng.csv","./data/vrichardsEngInEng.csv")
names <- list("S Tendulkar","D Bradman","SR Waugh","R Dravid","Viv Richards")

The Lords and the Oval in England are some of the best pitches in the world. Scoring on these pitches and weather conditions, where there is both swing and bounce really requires excellent batting skills. It can be easily seen that Don Bradman stands heads and shoulders over everybody else, averaging close a cumulative average of 100+. He is followed by Viv Richards, who averages around ~60. Interestingly in English conditions, Rahul Dravid edges out Sachin Tendulkar.

relativeBatsmanCumulativeAvgRuns(frames,names)

# The other 2 plots on relative strike rate and cumulative average strike rate,
shows Viv Richards really  blasts the bowling. Viv Richards has a strike rate 
of 70, while Bradman 62+, followed by Tendulkar.
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

3. Relative performances of international batsmen against Australia in Australia

The following players from these countries were chosen

  1. Sachin Tendulkar (India)
  2. Viv Richard (West Indies)
  3. David Gower (England)
  4. Jacques Kallis (South Africa)
  5. Alastair Cook (Emgland)
frames <- list("./data/tendulkarVsAusInAus.csv","./data/vrichardsVAusInAus.csv","./data/dgowerVsAusInAus.csv",
               "./data/kallisVsAusInAus.csv","./data/ancookVsWIInWI.csv")
names <- list("S Tendulkar","Viv Richards","David Gower","J Kallis","AN Cook")

Alastair Cook of England has fantastic cumulative average of 55+ on the pitches of Australia. There is a dip towards the end, but we cannot predict whether it would have continued. AN Cook is followed by Tendulkar who has a steady average of 50+ runs, after which there is Viv Richards.

relativeBatsmanCumulativeAvgRuns(frames,names)

#With respect to cumulative or relative strike rate Viv Richards is a class apart.He seems to really
#tear into bowlers. David Gower has an excellent strike rate and is followed by Tendulkar
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

4. Relative performances of international batsmen against India in India

While England & Australia are famous for bouncy tracks with swing, Indian pitches are renowed for being extraordinary turners. Also India has always thrown up world class spinners, from the spin quartet of BS Chandraskehar, Bishen Singh Bedi, EAS Prasanna, S Venkatraghavan, to the times of dangerous Anil Kumble, and now to the more recent Ravichander Ashwon and Harbhajan Singh.

A batsmen who can score runs in India against Indian spinners has to be really adept in handling all kinds of spin.

While Clive Lloyd & Alvin Kallicharan had the best performance against India, they have not been included as ESPN Cricinfo had many of the columns missing.

So I chose the following international players for the analysis against India

  1. Hashim Amla (South Africa)
  2. Alastair Cook (England)
  3. Matthew Hayden (Australia)
  4. Viv Richards (West Indies)
frames <- list("./data/amlaVsIndInInd.csv","./data/ancookVsIndInInd.csv","./data/mhaydenVsIndInInd.csv",
               "./data/vrichardsVsIndInInd.csv")
names <- list("H Amla","AN Cook","M Hayden","Viv Riachards")

Excluding Clive Lloyd & Alvin Kallicharan the next best performer against India is Hashim Amla,followed by Alastair Cook, Viv Richards.

relativeBatsmanCumulativeAvgRuns(frames,names)

#With respect to strike rate, there is no contest when Viv Richards is around. He is clearly the best 
#striker of the ball regardless of whether it is the pacy wickets of 
#Australia/England or the spinning tracks of the subcontinent. After 
#Viv Richards, Hayden and Alastair Cook have good cumulative strike rates
#in India
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

5. All time greats of Indian batting

I couldn’t resist checking out how the top Indian batsmen perform when playing in host countries So here is a look at how the top Indian batsmen perform against different host countries

6. Top Indian batsmen against Australia in Australia

The following Indian batsmen were chosen

  1. Sunil Gavaskar
  2. Sachin Tendulkar
  3. Virat Kohli
  4. Virendar Sehwag
  5. VVS Laxman
frames <- list("./data/tendulkarVsAusInAus.csv","./data/gavaskarVsAusInAus.csv","./data/kohliVsAusInAus.csv",
               "./data/sehwagVsAusInAus.csv","./data/vvslaxmanVsAusInAus.csv")
names <- list("S Tendulkar","S Gavaskar","V Kohli","V Sehwag","VVS Laxman")

Virat Kohli has the best overall performance against Australia, with a current cumulative average of 60+ runs for the total number of innings played by him (15). With 15 matches the 2nd best is Virendar Sehwag, followed by VVS Laxman. Tendulkar maintains a cumulative average of 48+ runs for an excess of 30+ innings.

relativeBatsmanCumulativeAvgRuns(frames,names)

# Sehwag leads the strike rate against host Australia, followed by 
# Tendulkar in Australia and then Kohli
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

7. Top Indian batsmen against England in England

The top Indian batmen’s performances against England are shown below

  1. Rahul Dravid
  2. Dilip Vengsarkar
  3. Rahul Dravid
  4. Sourav Ganguly
  5. Virat Kohli
frames <- list("./data/tendulkarVsEngInEng.csv","./data/dravidVsEngInEng.csv","./data/vengsarkarVsEngInEng.csv",
               "./data/gangulyVsEngInEng.csv","./data/gavaskarVsEngInEng.csv","./data/kohliVsEngInEng.csv")
names <- list("S Tendulkar","R Dravid","D Vengsarkar","S Ganguly","S Gavaskar","V Kohli")

Rahul Dravid has the best performance against England and edges out Tendulkar. He is followed by Tendulkar and then Sourav Ganguly. Note:Incidentally Virat Kohli’s performance against England in England so far has been extremely poor and he averages around 13-15 runs per innings. However he has a long way to go and I hope he catches up. In any case it will be an uphill climb for Kohli in England.

relativeBatsmanCumulativeAvgRuns(frames,names)

#Tendulkar, Ganguly and Dravid have the best strike rate and in that order.
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

8. Top Indian batsmen against West Indies in West Indies

frames <- list("./data/tendulkarVsWInWI.csv","./data/dravidVsWInWI.csv","./data/vvslaxmanVsWIInWI.csv",
               "./data/gavaskarVsWIInWI.csv")
names <- list("S Tendulkar","R Dravid","VVS Laxman","S Gavaskar")

Against the West Indies Sunil Gavaskar is heads and shoulders above the rest. Gavaskar has a very impressive cumulative average against West Indies

relativeBatsmanCumulativeAvgRuns(frames,names)

# VVS Laxman followed by  Tendulkar & then Dravid have a very 
# good strike rate against the West Indies
relativeBatsmanCumulativeStrikeRate(frames,names)

9. World’s best spinners on tracks suited for pace & bounce

In this part I compare the performances of the top 3 spinners in recent years and check out how they perform on surfaces that are known for pace, and bounce. I have taken the following 3 spinners

  1. Anil Kumble (India)
  2. M Muralitharan (Sri Lanka)
  3. Shane Warne (Australia)
#kumbleEng=getPlayerData(30176  ,opposition=3,host=3,file="kumbleVsEngInEng.csv",type="bowling")
#muraliEng=getPlayerData(49636  ,opposition=3,host=3,file="muraliVsEngInEng.csv",type="bowling")
#warneEng=getPlayerData(8166  ,opposition=3,host=3,file="warneVsEngInEng.csv",type="bowling")

10. Top international spinners against England in England

frames <- list("./data/kumbleVsEngInEng.csv","./data/muraliVsEngInEng.csv","./data/warneVsEngInEng.csv")
names <- list("Anil KUmble","M Muralitharan","Shane Warne")

Against England and in England, Muralitharan shines with a cumulative average of nearly 5 wickets per match with a peak of almost 8 wickets. Shane Warne has a steady average at 5 wickets and then Anil Kumble.

relativeBowlerCumulativeAvgWickets(frames,names)

# The order relative cumulative Economy rate, Warne has the best figures,followed by Anil Kumble. Muralitharan
# is much more expensive.
relativeBowlerCumulativeAvgEconRate(frames,names)

11. Top international spinners against South Africa in South Africa

frames <- list("./data/kumbleVsSAInSA.csv","./data/muraliVsSAInSA.csv","./data/warneVsSAInSA.csv")
names <- list("Anil Kumble","M Muralitharan","Shane Warne")

In South Africa too, Muralitharan has the best wicket taking performance averaging about 4 wickets. Warne averages around 3 wickets and Kumble around 2 wickets

relativeBowlerCumulativeAvgWickets(frames,names)

# Muralitharan is expensive in South Africa too, while Kumble and Warne go neck-to-neck in the economy rate.
# Kumble edges out Warne and has a better cumulative average economy rate
relativeBowlerCumulativeAvgEconRate(frames,names)

11. Top international pacers against India in India

As a final analysis I check how the world’s pacers perform in India against India. India pitches are supposed to be flat devoid of bounce, while being terrific turners. Hence Indian pitches are more suited to spin bowling than pace bowling. This is changing these days.

The best performers against India in India are mostly the deadly pacemen of yesteryears

For this I have chosen the following bowlers

  1. Courtney Walsh (West Indies)
  2. Andy Roberts (West Indies)
  3. Malcolm Marshall
  4. Glenn McGrath
#cawalshInd=getPlayerData(53216  ,opposition=6,host=6,file="cawalshVsIndInInd.csv",type="bowling")
#arobertsInd=getPlayerData(52817  ,opposition=6,host=6,file="arobertsIndInInd.csv",type="bowling")
#mmarshallInd=getPlayerData(52419  ,opposition=6,host=6,file="mmarshallVsIndInInd.csv",type="bowling")
#gmccgrathInd=getPlayerData(6565  ,opposition=6,host=6,file="mccgrathVsIndInInd.csv",type="bowling")
frames <- list("./data/cawalshVsIndInInd.csv","./data/arobertsIndInInd.csv","./data/mmarshallVsIndInInd.csv",
               "./data/mccgrathVsIndInInd.csv")
names <- list("C Walsh","A Roberts","M Marshall","G McGrath")

Courtney Walsh has the best performance, followed by Andy Roberts followed by Andy Roberts and then Malcom Marshall who tips ahead of Glenn McGrath

relativeBowlerCumulativeAvgWickets(frames,names)

#On the other hand McGrath has the best economy rate, followed by A Roberts and then Courtney Walsh
relativeBowlerCumulativeAvgEconRate(frames,names)

12. ODI performance of a player against a specific country in the host country

This gets the data for MS Dhoni in ODI matches against Australia and in Australia

#dhoniAusODI=getPlayerDataOD(28081,opposition=2,host=2,file="dhoniVsAusInAusODI.csv",type="batting")

13. Twenty 20 performance of a player against a specific country in the host country

#dhoniAusTT=getPlayerDataOD(28081,opposition=2,host=2,file="dhoniVsAusInAusTT.csv",type="batting")

All the ODI and Twenty20 functions of cricketr can be used on the above dataframes of MS Dhoni.

Some key observations

Here are some key observations

  1. At the top of the batting spectrum is Don Bradman with a very impressive average 100-120 in matches played in England and Australia. Unfortunately there weren’t matches he played in other countries and different pitches. 2.Viv Richard has the best cumulative strike rate overall.
  2. Muralitharan strikes more often than Kumble or Warne even in pitches at ENgland, South Africa and West Indies. However Muralitharan is also the most expensive
  3. Warne and Kumble have a much better economy rate than Muralitharan.
  4. Sunil Gavaskar has an extremely impressive performance in West Indies.
  5. Rahul Dravid performs much better than Tendulkar in both England and West Indies.
  6. Virat Kohli has the best performance against Australia so far and hope he maintains his stellar performance followed by Sehwag. However Kohli’s performance in England has been very poor
  7. West Indies batsmen and bowlers seem to thrive on Indian pitches, with Clive Lloyd and Alvin Kalicharan at the top of the list.

You may like my Shiny apps on cricket

  1. Inswinger- Analyzing International. T20s
  2. GooglyPlus – An app for analyzing IPL
  3. Sixer – App based on R package cricketr

Also see

  1. Exploring Quantum Gate operations with QCSimulator
  2. Neural Networks: The mechanics of backpropagation
  3. Re-introducing cricketr! : An R package to analyze performances of cricketers
  4. yorkr crashes the IPL party ! – Part 1
  5. cricketr and yorkr books – Paperback now in Amazon
  6.  Hand detection through Haartraining: A hands-on approach

To see all my posts see Index of posts

Analysis of IPL T20 matches with yorkr templates


Introduction

In this post I create RMarkdown templates for end-to-end analysis of IPL T20 matches, that are available on Cricsheet based on my R package yorkr.  With these templates you can convert all IPL data which is in yaml format to R dataframes. Further I create data and the necessary templates for analyzing IPL matches, teams and players. All of these can be accessed at yorkrIPLTemplate.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen, bowlers besides evaluating team & match performances in Tests , ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr  and Beaten by sheer pace-Cricket analytics with yorkr  A must read for any cricket lover! Check it out!!

1

9/Rs 320 and $6.99/Rs448 respectively

The templates are

  1. Template for conversion and setup – IPLT20Template.Rmd
  2. Any IPL match – IPLMatchtemplate.Rmd
  3. IPL matches between 2 nations – IPLMatches2TeamTemplate.Rmd
  4. A IPL nations performance against all other IPL nations – IPLAllMatchesAllOppnTemplate.Rmd
  5. Analysis of IPL batsmen and bowlers of all IPL nations – IPLBatsmanBowlerTemplate.Rmd

Besides the templates the repository also includes the converted data for all IPL matches I downloaded from Cricsheet in Dec 2016. So this data is complete till the 2016 IPL season. You can recreate the files as more matches are added to Cricsheet site in IPL 2017 and future seasons. This post contains all the steps needed for detailed analysis of IPL matches, teams and IPL player. This will also be my reference in future if I decide to analyze IPL in future!

See my earlier posts where I analyze IPL T20
1. yorkr crashes the IPL party ! – Part 1
2. yorkr crashes the IPL party! – Part 2
3. yorkr crashes the IPL party! – Part 3!
4. yorkr crashes the IPL party! – Part 4

There will be 5 folders at the root

  1. IPLdata – Match files as yaml from Cricsheet
  2. IPLMatches – Yaml match files converted to dataframes
  3. IPLMatchesBetween2Teams – All Matches between any 2 IPL teams
  4. allMatchesAllOpposition – An IPL teams’s performance against all other teams
  5. BattingBowlingDetails – Batting and bowling details of all IPL teams
library(yorkr)
library(dplyr)

The first few steps take care of the data setup. This needs to be done before any of the analysis of IPL batsmen, bowlers, any IPL match, matches between any 2 IPL countries or analysis of a teams performance against all other countries

There will be 5 folders at the root

  1. data
  2. IPLMatches
  3. IPLMatchesBetween2Teams
  4. allMatchesAllOpposition
  5. BattingBowlingDetails

The source YAML files will be in IPLData folder

1.Create directory of IPLMatches

Some files may give conversions errors. You could try to debug the problem or just remove it from the IPLdata folder. At most 2-4 file will have conversion problems and I usally remove then from the files to be converted.

Also take a look at my GooglyPlus shiny app which was created after performing the same conversion on the Dec 16 data .

convertAllYaml2RDataframesT20("data","IPLMatches")

2.Save all matches between all combinations of IPL nations

This function will create the set of all matches between each IPL team against every other IPL team. This uses the data that was created in IPLMatches, with the convertAllYaml2RDataframesIPL() function.

setwd("./IPLMatchesBetween2Teams")
saveAllMatchesBetween2IPLTeams("../IPLMatches")

3.Save all matches against all opposition

This will create a consolidated dataframe of all matches played by every IPL playing nation against all other nattions. This also uses the data that was created in IPLMatches, with the convertAllYaml2RDataframesIPL() function.

setwd("../allMatchesAllOpposition")
saveAllMatchesAllOppositionIPLT20("../IPLMatches")

4. Create batting and bowling details for each IPL team

These are the current IPL playing teams. You can add to this vector as newer IPL teams start playing IPL. You will get to know all IPL teams by also look at the directory created above namely allMatchesAllOpposition. This also uses the data that was created in IPLMatches, with the convertAllYaml2RDataframesIPL() function.

setwd("../BattingBowlingDetails")
ipl_teams <- list("Chennai Super Kings","Deccan Chargers", "Delhi Daredevils","Kings XI Punjab", 
              "Kochi Tuskers Kerala","Kolkata Knight Riders","Mumbai Indians","Pune Warriors",
              "Rajasthan Royals","Royal Challengers Bangalore","Sunrisers Hyderabad","Gujarat Lions",
                 "Rising Pune Supergiants")

for(i in seq_along(ipl_teams)){
    print(ipl_teams[i])
    val <- paste(ipl_teams[i],"-details",sep="")
    val <- getTeamBattingDetails(ipl_teams[i],dir="../IPLMatches", save=TRUE)

}

for(i in seq_along(ipl_teams)){
    print(ipl_teams[i])
    val <- paste(ipl_teams[i],"-details",sep="")
    val <- getTeamBowlingDetails(ipl_teams[i],dir="../IPLMatches", save=TRUE)

}

5. Get the list of batsmen for a particular IPL team

The following code is needed for analyzing individual IPL batsmen. In IPL a player could have played in multiple IPL teams.

getBatsmen <- function(df){
    bmen <- df %>% distinct(batsman) 
    bmen <- as.character(bmen$batsman)
    batsmen <- sort(bmen)
}
load("Chennai Super Kings-BattingDetails.RData")
csk_details <- battingDetails
load("Deccan Chargers-BattingDetails.RData")
dc_details <- battingDetails
load("Delhi Daredevils-BattingDetails.RData")
dd_details <- battingDetails
load("Kings XI Punjab-BattingDetails.RData")
kxip_details <- battingDetails
load("Kochi Tuskers Kerala-BattingDetails.RData")
ktk_details <- battingDetails
load("Kolkata Knight Riders-BattingDetails.RData")
kkr_details <- battingDetails
load("Mumbai Indians-BattingDetails.RData")
mi_details <- battingDetails
load("Pune Warriors-BattingDetails.RData")
pw_details <- battingDetails
load("Rajasthan Royals-BattingDetails.RData")
rr_details <- battingDetails
load("Royal Challengers Bangalore-BattingDetails.RData")
rcb_details <- battingDetails
load("Sunrisers Hyderabad-BattingDetails.RData")
sh_details <- battingDetails
load("Gujarat Lions-BattingDetails.RData")
gl_details <- battingDetails
load("Rising Pune Supergiants-BattingDetails.RData")
rps_details <- battingDetails

#Get the batsmen for each IPL team
csk_batsmen <- getBatsmen(csk_details)
dc_batsmen <- getBatsmen(dc_details)
dd_batsmen <- getBatsmen(dd_details)
kxip_batsmen <- getBatsmen(kxip_details)
ktk_batsmen <- getBatsmen(ktk_details)
kkr_batsmen <- getBatsmen(kkr_details)
mi_batsmen <- getBatsmen(mi_details)
pw_batsmen <- getBatsmen(pw_details)
rr_batsmen <- getBatsmen(rr_details)
rcb_batsmen <- getBatsmen(rcb_details)
sh_batsmen <- getBatsmen(sh_details)
gl_batsmen <- getBatsmen(gl_details)
rps_batsmen <- getBatsmen(rps_details)

# Save the dataframes
save(csk_batsmen,file="csk.RData")
save(dc_batsmen, file="dc.RData")
save(dd_batsmen, file="dd.RData")
save(kxip_batsmen, file="kxip.RData")
save(ktk_batsmen, file="ktk.RData")
save(kkr_batsmen, file="kkr.RData")
save(mi_batsmen , file="mi.RData")
save(pw_batsmen, file="pw.RData")
save(rr_batsmen, file="rr.RData")
save(rcb_batsmen, file="rcb.RData")
save(sh_batsmen, file="sh.RData")
save(gl_batsmen, file="gl.RData")
save(rps_batsmen, file="rps.RData")

6. Get the list of bowlers for a particular IPL team

The method below can get the list of bowler names for any IPL team.The following code is needed for analyzing individual IPL bowlers. In IPL a player could have played in multiple IPL teams.

getBowlers <- function(df){
    bwlr <- df %>% distinct(bowler) 
    bwlr <- as.character(bwlr$bowler)
    bowler <- sort(bwlr)
}

load("Chennai Super Kings-BowlingDetails.RData")
csk_details <- bowlingDetails
load("Deccan Chargers-BowlingDetails.RData")
dc_details <- bowlingDetails
load("Delhi Daredevils-BowlingDetails.RData")
dd_details <- bowlingDetails
load("Kings XI Punjab-BowlingDetails.RData")
kxip_details <- bowlingDetails
load("Kochi Tuskers Kerala-BowlingDetails.RData")
ktk_details <- bowlingDetails
load("Kolkata Knight Riders-BowlingDetails.RData")
kkr_details <- bowlingDetails
load("Mumbai Indians-BowlingDetails.RData")
mi_details <- bowlingDetails
load("Pune Warriors-BowlingDetails.RData")
pw_details <- bowlingDetails
load("Rajasthan Royals-BowlingDetails.RData")
rr_details <- bowlingDetails
load("Royal Challengers Bangalore-BowlingDetails.RData")
rcb_details <- bowlingDetails
load("Sunrisers Hyderabad-BowlingDetails.RData")
sh_details <- bowlingDetails
load("Gujarat Lions-BowlingDetails.RData")
gl_details <- bowlingDetails
load("Rising Pune Supergiants-BowlingDetails.RData")
rps_details <- bowlingDetails

# Get the bowlers for each team
csk_bowlers <- getBowlers(csk_details)
dc_bowlers <- getBowlers(dc_details)
dd_bowlers <- getBowlers(dd_details)
kxip_bowlers <- getBowlers(kxip_details)
ktk_bowlers <- getBowlers(ktk_details)
kkr_bowlers <- getBowlers(kkr_details)
mi_bowlers <- getBowlers(mi_details)
pw_bowlers <- getBowlers(pw_details)
rr_bowlers <- getBowlers(rr_details)
rcb_bowlers <- getBowlers(rcb_details)
sh_bowlers <- getBowlers(sh_details)
gl_bowlers <- getBowlers(gl_details)
rps_bowlers <- getBowlers(rps_details)

#Save the dataframes
save(csk_bowlers,file="csk1.RData")
save(dc_bowlers, file="dc1.RData")
save(dd_bowlers, file="dd1.RData")
save(kxip_bowlers, file="kxip1.RData")
save(ktk_bowlers, file="ktk1.RData")
save(kkr_bowlers, file="kkr1.RData")
save(mi_bowlers , file="mi1.RData")
save(pw_bowlers, file="pw1.RData")
save(rr_bowlers, file="rr1.RData")
save(rcb_bowlers, file="rcb1.RData")
save(sh_bowlers, file="sh1.RData")
save(gl_bowlers, file="gl1.RData")
save(rps_bowlers, file="rps1.RData")

Now we are all set

A)  IPL T20 Match Analysis

1 IPL Match Analysis

Load any match data from the ./IPLMatches folder for e.g. Chennai Super Kings-Deccan Chargers-2008-05-06.RData

setwd("./IPLMatches")
load("Chennai Super Kings-Deccan Chargers-2008-05-06.RData")
csk_dc<- overs
#The steps are
load("IPLTeam1-IPLTeam2-Date.Rdata")
IPLTeam1_IPLTeam2 <- overs

All analysis for this match can be done now

2. Scorecard

teamBattingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam1")
teamBattingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam2")

3.Batting Partnerships

teamBatsmenPartnershipMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
teamBatsmenPartnershipMatch(IPLTeam1_IPLTeam2,"IPLTeam2","IPLTeam1")

4. Batsmen vs Bowler Plot

teamBatsmenVsBowlersMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=TRUE)
teamBatsmenVsBowlersMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)

5. Team bowling scorecard

teamBowlingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam1")
teamBowlingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam2")

6. Team bowling Wicket kind match

teamBowlingWicketKindMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <-teamBowlingWicketKindMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

7. Team Bowling Wicket Runs Match

teamBowlingWicketRunsMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <-teamBowlingWicketRunsMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

8. Team Bowling Wicket Match

m <-teamBowlingWicketMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m
teamBowlingWicketMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")

9. Team Bowler vs Batsmen

teamBowlersVsBatsmenMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <- teamBowlersVsBatsmenMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

10. Match Worm chart

matchWormGraph(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")

B)  IPL  Matches between 2  IPL teams

1 IPL Match Analysis

Load any match data from the ./IPLMatches folder for e.g. Chennai Super Kings-Deccan Chargers-2008-05-06.RData

setwd("./IPLMatches")
load("Chennai Super Kings-Deccan Chargers-2008-05-06.RData")
csk_dc<- overs
#The steps are
load("IPLTeam1-IPLTeam2-Date.Rdata")
IPLTeam1_IPLTeam2 <- overs

All analysis for this match can be done now

2. Scorecard

teamBattingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam1")
teamBattingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam2")

3.Batting Partnerships

teamBatsmenPartnershipMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
teamBatsmenPartnershipMatch(IPLTeam1_IPLTeam2,"IPLTeam2","IPLTeam1")

4. Batsmen vs Bowler Plot

teamBatsmenVsBowlersMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=TRUE)
teamBatsmenVsBowlersMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)

5. Team bowling scorecard

teamBowlingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam1")
teamBowlingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam2")

6. Team bowling Wicket kind match

teamBowlingWicketKindMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <-teamBowlingWicketKindMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

7. Team Bowling Wicket Runs Match

teamBowlingWicketRunsMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <-teamBowlingWicketRunsMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

8. Team Bowling Wicket Match

m <-teamBowlingWicketMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m
teamBowlingWicketMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")

9. Team Bowler vs Batsmen

teamBowlersVsBatsmenMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <- teamBowlersVsBatsmenMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

10. Match Worm chart

matchWormGraph(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")

C)  IPL Matches for a team against all other teams

1. IPL Matches for a team against all other teams

Load the data between for a IPL team against all other countries ./allMatchesAllOpposition for e.g all matches of Kolkata Knight Riders

load("allMatchesAllOpposition-Kolkata Knight Riders.RData")
kkr_matches <- matches
IPLTeam="IPLTeam1"
allMatches <- paste("allMatchesAllOposition-",IPLTeam,".RData",sep="")
load(allMatches)
IPLTeam1AllMatches <- matches

2. Team’s batting scorecard all Matches

m <-teamBattingScorecardAllOppnAllMatches(IPLTeam1AllMatches,theTeam="IPLTeam1")
m

3. Batting scorecard of opposing team

m <-teamBattingScorecardAllOppnAllMatches(matches=IPLTeam1AllMatches,theTeam="IPLTeam2")

4. Team batting partnerships

m <- teamBatsmenPartnershipAllOppnAllMatches(IPLTeam1AllMatches,theTeam="IPLTeam1")
m
m <- teamBatsmenPartnershipAllOppnAllMatches(IPLTeam1AllMatches,theTeam='IPLTeam1',report="detailed")
head(m,30)
m <- teamBatsmenPartnershipAllOppnAllMatches(IPLTeam1AllMatches,theTeam='IPLTeam1',report="summary")
m

5. Team batting partnerships plot

teamBatsmenPartnershipAllOppnAllMatchesPlot(IPLTeam1AllMatches,"IPLTeam1",main="IPLTeam1")
teamBatsmenPartnershipAllOppnAllMatchesPlot(IPLTeam1AllMatches,"IPLTeam1",main="IPLTeam2")

6, Team batsmen vs bowlers report

m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(IPLTeam1AllMatches,"IPLTeam1",rank=0)
m
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(IPLTeam1AllMatches,"IPLTeam1",rank=1,dispRows=30)
m
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(matches=IPLTeam1AllMatches,theTeam="IPLTeam2",rank=1,dispRows=25)
m

7. Team batsmen vs bowler plot

d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(IPLTeam1AllMatches,"IPLTeam1",rank=1,dispRows=50)
d
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(IPLTeam1AllMatches,"IPLTeam1",rank=2,dispRows=50)
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

8. Team bowling scorecard

teamBowlingScorecardAllOppnAllMatchesMain(matches=IPLTeam1AllMatches,theTeam="IPLTeam1")
teamBowlingScorecardAllOppnAllMatches(IPLTeam1AllMatches,'IPLTeam2')

9. Team bowler vs batsmen

teamBowlersVsBatsmenAllOppnAllMatchesMain(IPLTeam1AllMatches,theTeam="IPLTeam1",rank=0)
teamBowlersVsBatsmenAllOppnAllMatchesMain(IPLTeam1AllMatches,theTeam="IPLTeam1",rank=2)
teamBowlersVsBatsmenAllOppnAllMatchesRept(matches=IPLTeam1AllMatches,theTeam="IPLTeam1",rank=0)

10. Team Bowler vs bastmen

df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(IPLTeam1AllMatches,theTeam="IPLTeam1",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"IPLTeam1","IPLTeam1")

11. Team bowler wicket kind

teamBowlingWicketKindAllOppnAllMatches(IPLTeam1AllMatches,t1="IPLTeam1",t2="All")
teamBowlingWicketKindAllOppnAllMatches(IPLTeam1AllMatches,t1="IPLTeam1",t2="IPLTeam2")

12.

teamBowlingWicketRunsAllOppnAllMatches(IPLTeam1AllMatches,t1="IPLTeam1",t2="All",plot=TRUE)
teamBowlingWicketRunsAllOppnAllMatches(IPLTeam1AllMatches,t1="IPLTeam1",t2="IPLTeam2",plot=TRUE)

1 IPL Batsman setup functions

Get the batsman’s details for a batsman

setwd("../BattingBowlingDetails")
# IPL Team names
IPLTeamNames <- list("Chennai Super Kings","Deccan Chargers", "Delhi Daredevils","Kings Xi Punjab", 
                  "Kochi Tuskers Kerala","Kolkata Knight Riders","Mumbai Indians","Pune Warriors",
                  "Rajasthan Royals","Royal Challengers Bangalore","Sunrisers Hyderabad","Gujarat Lions",
                  "Rising Pune Supergiants")           


# Check and get the team indices of IPL teams in which the batsman has played
getTeamIndex <- function(batsman){
    setwd("./BattingBowlingDetails")
    load("csk.RData")
    load("dc.RData")
    load("dd.RData")
    load("kxip.RData")
    load("ktk.RData")
    load("kkr.RData")
    load("mi.RData")
    load("pw.RData")
    load("rr.RData")
    load("rcb.RData")
    load("sh.RData")
    load("gl.RData")
    load("rps.RData")
    setwd("..")
    getwd()
    print(ls())
    teams_batsmen = list(csk_batsmen,dc_batsmen,dd_batsmen,kxip_batsmen,ktk_batsmen,kkr_batsmen,mi_batsmen,
                         pw_batsmen,rr_batsmen,rcb_batsmen,sh_batsmen,gl_batsmen,rps_batsmen)
    b <- NULL
    for (i in 1:length(teams_batsmen)){
        a <- which(teams_batsmen[[i]] == batsman)

        if(length(a) != 0)
            b <- c(b,i)
    }
    b
}

# Get the list of the IPL team names from the indices passed
getTeams <- function(x){

    l <- NULL
    # Get the teams passed in as indexes
    for (i in seq_along(x)){

        l <- c(l, IPLTeamNames[[x[i]]]) 

    }
    l
}

# Create a consolidated data frame with all teams the IPL batsman has played for
getIPLBatsmanDF <- function(teamNames){
    batsmanDF <- NULL
   # Create a consolidated Data frame of batsman for all IPL teams played
    for (i in seq_along(teamNames)){
       df <- getBatsmanDetails(team=teamNames[i],name=IPLBatsman,dir="./BattingBowlingDetails")
       batsmanDF <- rbind(batsmanDF,df) 

    }
    batsmanDF
}

2. Create a consolidated IPL batsman data frame

# Since an IPL batsman coculd have played in multiple teams we need to determine these teams and
# create a consolidated data frame for the analysis
# For example to check MS Dhoni we need to do the following

IPLBatsman = "MS Dhoni"
#Check and get the team indices of IPL teams in which the batsman has played
i <- getTeamIndex(IPLBatsman)

# Get the team names in which the IPL batsman has played
teamNames <- getTeams(i)
    # Check if file exists in the directory. This check is necessary when moving between matchType


############## Create a consolidated IPL batsman dataframe for analysis
batsmanDF <- getIPLBatsmanDF(teamNames)

3. Runs vs deliveries

# For e.g. batsmanName="MS Dhoni""
#batsmanRunsVsDeliveries(batsmanDF, "MS Dhoni")
batsmanRunsVsDeliveries(batsmanDF,"batsmanName")

4. Batsman 4s & 6s

batsman46 <- select(batsmanDF,batsman,ballsPlayed,fours,sixes,runs)
p1 <- batsmanFoursSixes(batsman46,"batsmanName")

5. Batsman dismissals

batsmanDismissals(batsmanDF,"batsmanName")

6. Runs vs Strike rate

batsmanRunsVsStrikeRate(batsmanDF,"batsmanName")

7. Batsman Moving Average

batsmanMovingAverage(batsmanDF,"batsmanName")

8. Batsman cumulative average

batsmanCumulativeAverageRuns(batsmanDF,"batsmanName")

9. Batsman cumulative strike rate

batsmanCumulativeStrikeRate(batsmanDF,"batsmanName")

10. Batsman runs against oppositions

batsmanRunsAgainstOpposition(batsmanDF,"batsmanName")

11. Batsman runs vs venue

batsmanRunsVenue(batsmanDF,"batsmanName")

12. Batsman runs predict

batsmanRunsPredict(batsmanDF,"batsmanName")

13.Bowler set up functions

setwd("../BattingBowlingDetails")
# IPL Team names
IPLTeamNames <- list("Chennai Super Kings","Deccan Chargers", "Delhi Daredevils","Kings Xi Punjab", 
                  "Kochi Tuskers Kerala","Kolkata Knight Riders","Mumbai Indians","Pune Warriors",
                  "Rajasthan Royals","Royal Challengers Bangalore","Sunrisers Hyderabad","Gujarat Lions",
                  "Rising Pune Supergiants")    



# Get the team indices of IPL teams for which the bowler as played
getTeamIndex_bowler <- function(bowler){
    # Load IPL Bowlers
    setwd("./data")
    load("csk1.RData")
    load("dc1.RData")
    load("dd1.RData")
    load("kxip1.RData")
    load("ktk1.RData")
    load("kkr1.RData")
    load("mi1.RData")
    load("pw1.RData")
    load("rr1.RData")
    load("rcb1.RData")
    load("sh1.RData")
    load("gl1.RData")
    load("rps1.RData")
    setwd("..")
    teams_bowlers = list(csk_bowlers,dc_bowlers,dd_bowlers,kxip_bowlers,ktk_bowlers,kkr_bowlers,mi_bowlers,
                         pw_bowlers,rr_bowlers,rcb_bowlers,sh_bowlers,gl_bowlers,rps_bowlers)
    b <- NULL
    for (i in 1:length(teams_bowlers)){
        a <- which(teams_bowlers[[i]] == bowler)
        if(length(a) != 0){
            b <- c(b,i)
        }
    }
    b
}


# Get the list of the IPL team names from the indices passed
getTeams <- function(x){

    l <- NULL
    # Get the teams passed in as indexes
    for (i in seq_along(x)){

        l <- c(l, IPLTeamNames[[x[i]]]) 

    }
    l
}

# Get the team names
teamNames <- getTeams(i)

getIPLBowlerDF <- function(teamNames){
    bowlerDF <- NULL

    # Create a consolidated Data frame of batsman for all IPL teams played
    for (i in seq_along(teamNames)){
          df <- getBowlerWicketDetails(team=teamNames[i],name=IPLBowler,dir="./BattingBowlingDetails")
          bowlerDF <- rbind(bowlerDF,df) 

    }
    bowlerDF
}

14. Get the consolidated data frame for an IPL bowler

# Since an IPL bowler could have played in multiple teams we need to determine these teams and
# create a consolidated data frame for the analysis
# For example to check R Ashwin we need to do the following

IPLBowler = "R Ashwin"
#Check and get the team indices of IPL teams in which the batsman has played
i <- getTeamIndex(IPLBowler)

# Get the team names in which the IPL batsman has played
teamNames <- getTeams(i)
    # Check if file exists in the directory. This check is necessary when moving between matchType


############## Create a consolidated IPL batsman dataframe for analysis
bowlerDF <- getIPLBowlerDF(teamNames)

15. Bowler Mean Economy rate

# For e.g. to get the details of R Ashwin do
#bowlerMeanEconomyRate(bowlerDF,"R Ashwin")
bowlerMeanEconomyRate(bowlerDF,"bowlerName")

16. Bowler mean runs conceded

bowlerMeanRunsConceded(bowlerDF,"bowlerName")

17. Bowler Moving Average

bowlerMovingAverage(bowlerDF,"bowlerName")

18. Bowler cumulative average wickets

bowlerCumulativeAvgWickets(bowlerDF,"bowlerName")

19. Bowler cumulative Economy Rate (ER)

bowlerCumulativeAvgEconRate(bowlerDF,"bowlerName")

20. Bowler wicket plot

bowlerWicketPlot(bowlerDF,"bowlerName")

21. Bowler wicket against opposition

bowlerWicketsAgainstOpposition(bowlerDF,"bowlerName")

22. Bowler wicket at cricket grounds

bowlerWicketsVenue(bowlerDF,"bowlerName")

23. Predict number of deliveries to wickets

setwd("./IPLMatches")
bowlerDF1 <- getDeliveryWickets(team="IPLTeam1",dir=".",name="bowlerName",save=FALSE)
bowlerWktsPredict(bowlerDF1,"bowlerName")

You may like
1. Using Linear Programming (LP) for optimizing bowling change or batting lineup in T20 cricket
2. Neural Networks: The mechanics of backpropagation
3. More book, more cricket! 2nd edition of my books now on Amazon
4. Spicing up a IBM Bluemix cloud app with MongoDB and NodeExpress
5. Introducing cricket package yorkr:Part 4-In the block hole!

Analysis of International T20 matches with yorkr templates


Introduction

In this post I create yorkr templates for International T20 matches that are available on Cricsheet. With these templates you can convert all T20 data which is in yaml format to R dataframes. Further I create data and the necessary templates for analyzing. All of these templates can be accessed from Github at yorkrT20Template. The templates are

  1. Template for conversion and setup – T20Template.Rmd
  2. Any T20 match – T20Matchtemplate.Rmd
  3. T20 matches between 2 nations – T20Matches2TeamTemplate.Rmd
  4. A T20 nations performance against all other T20 nations – T20AllMatchesAllOppnTemplate.Rmd
  5. Analysis of T20 batsmen and bowlers of all T20 nations – T20BatsmanBowlerTemplate.Rmd

Besides the templates the repository also includes the converted data for all T20 matches I downloaded from Cricsheet in Dec 2016, You can recreate the files as more matches are added to Cricsheet site. This post contains all the steps needed for T20 analysis, as more matches are played around the World and more data is added to Cricsheet. This will also be my reference in future if I decide to analyze T20 in future!

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen, bowlers besides evaluating team & match performances in Tests , ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr  and Beaten by sheer pace-Cricket analytics with yorkr  A must read for any cricket lover! Check it out!!

1

 

Feel free to download/clone these templates  from Github yorkrT20Template and perform your own analysis

There will be 5 folders at the root

  1. T20data – Match files as yaml from Cricsheet
  2. T20Matches – Yaml match files converted to dataframes
  3. T20MatchesBetween2Teams – All Matches between any 2 T20 teams
  4. allMatchesAllOpposition – A T20 countries match data against all other teams
  5. BattingBowlingDetails – Batting and bowling details of all countries
library(yorkr)
library(dplyr)

The first few steps take care of the data setup. This needs to be done before any of the analysis of T20 batsmen, bowlers, any T20 match, matches between any 2 T20 countries or analysis of a teams performance against all other countries

There will be 5 folders at the root

  1. T20data
  2. T20Matches
  3. T20MatchesBetween2Teams
  4. allMatchesAllOpposition
  5. BattingBowlingDetails

The source YAML files will be in T20Data folder

1.Create directory T20Matches

Some files may give conversions errors. You could try to debug the problem or just remove it from the T20data folder. At most 2-4 file will have conversion problems and I usally remove then from the files to be converted.

Also take a look at my Inswinger shiny app which was created after performing the same conversion on the Dec 16 data .

convertAllYaml2RDataframesT20("T20Data","T20Matches")

2.Save all matches between all combinations of T20 nations

This function will create the set of all matches between every T20 country against every other T20 country. This uses the data that was created in T20Matches, with the convertAllYaml2RDataframesT20() function.

setwd("./T20MatchesBetween2Teams")
saveAllMatchesBetweenTeams("../T20Matches")

3.Save all matches against all opposition

This will create a consolidated dataframe of all matches played by every T20 playing nation against all other nattions. This also uses the data that was created in T20Matches, with the convertAllYaml2RDataframesT20() function.

setwd("../allMatchesAllOpposition")
saveAllMatchesAllOpposition("../T20Matches")

4. Create batting and bowling details for each T20 country

These are the current T20 playing nations. You can add to this vector as more countries start playing T20. You will get to know all T20 nations by also look at the directory created above namely allMatchesAllOpposition. his also uses the data that was created in T20Matches, with the convertAllYaml2RDataframesT20() function.

setwd("../BattingBowlingDetails")
teams <-c("Australia","India","Pakistan","West Indies", 'Sri Lanka',
          "England", "Bangladesh","Netherlands","Scotland", "Afghanistan",
          "Zimbabwe","Ireland","New Zealand","South Africa","Canada",
          "Bermuda","Kenya","Hong Kong","Nepal","Oman","Papua New Guinea",
          "United Arab Emirates")

for(i in seq_along(teams)){
    print(teams[i])
    val <- paste(teams[i],"-details",sep="")
    val <- getTeamBattingDetails(teams[i],dir="../T20Matches", save=TRUE)

}

for(i in seq_along(teams)){
    print(teams[i])
    val <- paste(teams[i],"-details",sep="")
    val <- getTeamBowlingDetails(teams[i],dir="../T20Matches", save=TRUE)

}

5. Get the list of batsmen for a particular country

For e.g. if you wanted to get the batsmen of Canada you would do the following. By replacing Canada for any other country you can get the batsmen of that country. These batsmen names can then be used in the batsmen analysis

country="Canada"
teamData <- paste(country,"-BattingDetails.RData",sep="")
load(teamData)
countryDF <- battingDetails
bmen <- countryDF %>% distinct(batsman) 
bmen <- as.character(bmen$batsman)
batsmen <- sort(bmen)
batsmen

6. Get the list of bowlers for a particular country

The method below can get the list of bowler names for any T20 nation. These names can then be used in the bowler analysis below

country="Netherlands"
teamData <- paste(country,"-BowlingDetails.RData",sep="")
load(teamData)
countryDF <- bowlingDetails
bwlr <- countryDF %>% distinct(bowler) 
bwlr <- as.character(bwlr$bowler)
bowler <- sort(bwlr)
bowler

Now we are all set

A)  International T20 Match Analysis

Load any match data from the ./T20Matches folder for e.g. Afganistan-England-2016-03-23.RData

setwd("./T20Matches")
load("Afghanistan-England-2016-03-23.RData")
afg_eng<- overs
#The steps are
load("Country1-Country2-Date.Rdata")
country1_country2 <- overs

All analysis for this match can be done now

2. Scorecard

teamBattingScorecardMatch(country1_country2,"Country1")
teamBattingScorecardMatch(country1_country2,"Country2")

3.Batting Partnerships

teamBatsmenPartnershipMatch(country1_country2,"Country1","Country2")
teamBatsmenPartnershipMatch(country1_country2,"Country2","Country1")

4. Batsmen vs Bowler Plot

teamBatsmenVsBowlersMatch(country1_country2,"Country1","Country2",plot=TRUE)
teamBatsmenVsBowlersMatch(country1_country2,"Country1","Country2",plot=FALSE)

5. Team bowling scorecard

teamBowlingScorecardMatch(country1_country2,"Country1")
teamBowlingScorecardMatch(country1_country2,"Country2")

6. Team bowling Wicket kind match

teamBowlingWicketKindMatch(country1_country2,"Country1","Country2")
m <-teamBowlingWicketKindMatch(country1_country2,"Country1","Country2",plot=FALSE)
m

7. Team Bowling Wicket Runs Match

teamBowlingWicketRunsMatch(country1_country2,"Country1","Country2")
m <-teamBowlingWicketRunsMatch(country1_country2,"Country1","Country2",plot=FALSE)
m

8. Team Bowling Wicket Match

m <-teamBowlingWicketMatch(country1_country2,"Country1","Country2",plot=FALSE)
m
teamBowlingWicketMatch(country1_country2,"Country1","Country2")

9. Team Bowler vs Batsmen

teamBowlersVsBatsmenMatch(country1_country2,"Country1","Country2")
m <- teamBowlersVsBatsmenMatch(country1_country2,"Country1","Country2",plot=FALSE)
m

10. Match Worm chart

matchWormGraph(country1_country2,"Country1","Country2")

B)  International T20 Matches between 2 teams

Load match data between any 2 teams from ./T20MatchesBetween2Teams for e.g.Australia-India-allMatches

setwd("./T20MatchesBetween2Teams")
load("Australia-India-allMatches.RData")
aus_ind_matches <- matches
#Replace below with your own countries
country1<-"England"
country2 <- "South Africa"
country1VsCountry2 <- paste(country1,"-",country2,"-allMatches.RData",sep="")
load(country1VsCountry2)
country1_country2_matches <- matches

2.Batsmen partnerships

m<- teamBatsmenPartnershiOppnAllMatches(country1_country2_matches,"country1",report="summary")
m
m<- teamBatsmenPartnershiOppnAllMatches(country1_country2_matches,"country2",report="summary")
m
m<- teamBatsmenPartnershiOppnAllMatches(country1_country2_matches,"country1",report="detailed")
m
teamBatsmenPartnershipOppnAllMatchesChart(country1_country2_matches,"country1","country2")

3. Team batsmen vs bowlers

teamBatsmenVsBowlersOppnAllMatches(country1_country2_matches,"country1","country2")

4. Bowling scorecard

a <-teamBattingScorecardOppnAllMatches(country1_country2_matches,main="country1",opposition="country2")
a

5. Team bowling performance

teamBowlingPerfOppnAllMatches(country1_country2_matches,main="country1",opposition="country2")

6. Team bowler wickets

teamBowlersWicketsOppnAllMatches(country1_country2_matches,main="country1",opposition="country2")
m <-teamBowlersWicketsOppnAllMatches(country1_country2_matches,main="country1",opposition="country2",plot=FALSE)
teamBowlersWicketsOppnAllMatches(country1_country2_matches,"country1","country2",top=3)
m

7. Team bowler vs batsmen

teamBowlersVsBatsmenOppnAllMatches(country1_country2_matches,"country1","country2",top=5)

8. Team bowler wicket kind

teamBowlersWicketKindOppnAllMatches(country1_country2_matches,"country1","country2",plot=TRUE)
m <- teamBowlersWicketKindOppnAllMatches(country1_country2_matches,"country1","country2",plot=FALSE)
m[1:30,]

9. Team bowler wicket runs

teamBowlersWicketRunsOppnAllMatches(country1_country2_matches,"country1","country2")

10. Plot wins and losses

setwd("./T20Matches")
plotWinLossBetweenTeams("country1","country2")

C)  International T20 Matches for a team against all other teams

Load the data between for a T20 team against all other countries ./allMatchesAllOpposition for e.g all matches of India

load("allMatchesAllOpposition-India.RData")
india_matches <- matches
country="country1"
allMatches <- paste("allMatchesAllOposition-",country,".RData",sep="")
load(allMatches)
country1AllMatches <- matches

2. Team’s batting scorecard all Matches

m <-teamBattingScorecardAllOppnAllMatches(country1AllMatches,theTeam="country1")
m

3. Batting scorecard of opposing team

m <-teamBattingScorecardAllOppnAllMatches(matches=country1AllMatches,theTeam="country2")

4. Team batting partnerships

m <- teamBatsmenPartnershipAllOppnAllMatches(country1AllMatches,theTeam="country1")
m
m <- teamBatsmenPartnershipAllOppnAllMatches(country1AllMatches,theTeam='country1',report="detailed")
head(m,30)
m <- teamBatsmenPartnershipAllOppnAllMatches(country1AllMatches,theTeam='country1',report="summary")
m

5. Team batting partnerships plot

teamBatsmenPartnershipAllOppnAllMatchesPlot(country1AllMatches,"country1",main="country1")
teamBatsmenPartnershipAllOppnAllMatchesPlot(country1AllMatches,"country1",main="country2")

6, Team batsmen vs bowlers report

m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(country1AllMatches,"country1",rank=0)
m
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(country1AllMatches,"country1",rank=1,dispRows=30)
m
m <-teamBatsmenVsBowlersAllOppnAllMatchesRept(matches=country1AllMatches,theTeam="country2",rank=1,dispRows=25)
m

7. Team batsmen vs bowler plot

d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(country1AllMatches,"country1",rank=1,dispRows=50)
d
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(country1AllMatches,"country1",rank=2,dispRows=50)
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)

8. Team bowling scorecard

teamBowlingScorecardAllOppnAllMatchesMain(matches=country1AllMatches,theTeam="country1")
teamBowlingScorecardAllOppnAllMatches(country1AllMatches,'country2')

9. Team bowler vs batsmen

teamBowlersVsBatsmenAllOppnAllMatchesMain(country1AllMatches,theTeam="country1",rank=0)
teamBowlersVsBatsmenAllOppnAllMatchesMain(country1AllMatches,theTeam="country1",rank=2)
teamBowlersVsBatsmenAllOppnAllMatchesRept(matches=country1AllMatches,theTeam="country1",rank=0)

10. Team Bowler vs bastmen

df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(country1AllMatches,theTeam="country1",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"country1","country1")

11. Team bowler wicket kind

teamBowlingWicketKindAllOppnAllMatches(country1AllMatches,t1="country1",t2="All")
teamBowlingWicketKindAllOppnAllMatches(country1AllMatches,t1="country1",t2="country2")

12.

teamBowlingWicketRunsAllOppnAllMatches(country1AllMatches,t1="country1",t2="All",plot=TRUE)
teamBowlingWicketRunsAllOppnAllMatches(country1AllMatches,t1="country1",t2="country2",plot=TRUE)

D) Batsman functions

Get the batsman’s details for a batsman

setwd("../BattingBowlingDetails")
kohli <- getBatsmanDetails(team="India",name="Kohli",dir=".")
batsmanDF <- getBatsmanDetails(team="country1",name="batsmanName",dir=".")

2. Runs vs deliveries

batsmanRunsVsDeliveries(batsmanDF,"batsmanName")

3. Batsman 4s & 6s

batsman46 <- select(batsmanDF,batsman,ballsPlayed,fours,sixes,runs)
p1 <- batsmanFoursSixes(batsman46,"batsmanName")

4. Batsman dismissals

batsmanDismissals(batsmanDF,"batsmanName")

5. Runs vs Strike rate

batsmanRunsVsStrikeRate(batsmanDF,"batsmanName")

6. Batsman Moving Average

batsmanMovingAverage(batsmanDF,"batsmanName")

7. Batsman cumulative average

batsmanCumulativeAverageRuns(batsmanDF,"batsmanName")

8. Batsman cumulative strike rate

batsmanCumulativeStrikeRate(batsmanDF,"batsmanName")

9. Batsman runs against oppositions

batsmanRunsAgainstOpposition(batsmanDF,"batsmanName")

10. Batsman runs vs venue

batsmanRunsVenue(batsmanDF,"batsmanName")

11. Batsman runs predict

batsmanRunsPredict(batsmanDF,"batsmanName")

12. Bowler functions

For example to get Ravicahnder Ashwin’s bowling details

setwd("../BattingBowlingDetails")
ashwin <- getBowlerWicketDetails(team="India",name="Ashwin",dir=".")
bowlerDF <- getBatsmanDetails(team="country1",name="bowlerName",dir=".")

13. Bowler Mean Economy rate

bowlerMeanEconomyRate(bowlerDF,"bowlerName")

14. Bowler mean runs conceded

bowlerMeanRunsConceded(bowlerDF,"bowlerName")

15. Bowler Moving Average

bowlerMovingAverage(bowlerDF,"bowlerName")

16. Bowler cumulative average wickets

bowlerCumulativeAvgWickets(bowlerDF,"bowlerName")

17. Bowler cumulative Economy Rate (ER)

bowlerCumulativeAvgEconRate(bowlerDF,"bowlerName")

18. Bowler wicket plot

bowlerWicketPlot(bowlerDF,"bowlerName")

19. Bowler wicket against opposition

bowlerWicketsAgainstOpposition(bowlerDF,"bowlerName")

20. Bowler wicket at cricket grounds

bowlerWicketsVenue(bowlerDF,"bowlerName")

21. Predict number of deliveries to wickets

setwd("./T20Matches")
bowlerDF1 <- getDeliveryWickets(team="country1",dir=".",name="bowlerName",save=FALSE)
bowlerWktsPredict(bowlerDF1,"bowlerName")

cricketr and yorkr books – Paperback now in Amazon


My books
– Cricket Analytics with cricketr
– Beaten by sheer pace!: Cricket analytics with yorkr
are now available on Amazon in both Paperback and Kindle versions

The cricketr and yorkr packages are written in R, and both are available in CRAN. The books contain details on how to use these R packages to analyze performance of cricketers.

cricketr is based on data from ESPN Cricinfo Statsguru, and can analyze Test, ODI and T20 batsmen & bowlers. yorkr is based on data from Cricsheet, and can analyze ODI, T20 and IPL. yorkr can analyze batsmen, bowlers, matches and teams.

Cricket Analytics with cricketr
You can access the paperback at Cricket analytics with cricketr
untitled1

Beaten by sheer pace! Cricket Analytics with yorkr
You can buy the paperback from Amazon at Beaten by sheer pace: Cricket analytics with yorkr
untitled

Order your copy today! Hope you have a great time reading!