“There are two ways to write error-free programs; only the third one works.” Alan J. Perlis
“Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to produce bigger and better idiots. So far, the universe is winning.” Rick Cook
“My software never has bugs. It just develops random features.” Anon
“If you make an ass out of yourself, there will always be someone to ride you.” Bruce Lee
Introduction
This is the 3rd and final post on cricpy, and is a continuation of my 2 earlier posts
1. Introducing cricpy: A python package to analyze performances of cricketers
2. Cricpy takes a swing at the ODIs
Cricpy is the python avatar of my R package ‘cricketr’. To know more about my R package cricketr, see Re-introducing cricketr! : An R package to analyze performances of cricketers
With this post cricpy, like cricketr, now becomes omnipotent, and is now capable of handling Test, ODI and T20 matches.
Cricpy uses the statistics info available in ESPN Cricinfo Statsguru.
You should be able to install the package using pip install cricpy and use the many functions available in the package. Please be mindful of the ESPN Cricinfo Terms of Use.
Cricpy can now analyze performances of teams in Test, ODI and T20 cricket; see Cricpy adds team analytics to its arsenal!!
This post is also hosted on Rpubs at Cricpy takes guard for the Twenty 20s. You can also download the pdf version of this post at cricpy-TT.pdf
You can fork/clone the package at Github cricpy
Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricpy-template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. The functions can be executed in RStudio or in an IPython notebook.
If you are passionate about cricket, and love analyzing cricket performances, then check out my racy book on cricket ‘Cricket analytics with cricketr and cricpy – Analytics harmony with R & Python’! This book discusses and shows how to use my R package ‘cricketr’ and my Python package ‘cricpy’ to analyze batsmen and bowlers in all formats of the game (Test, ODI and T20). The paperback is available on Amazon at $21.99 and the kindle version at $9.99/Rs 449/-. A must read for any cricket lover! Check it out!!
The cricpy package
The data for a particular player in Twenty20s can be obtained with the getPlayerDataTT() function. To do this, go to T20 Batting and T20 Bowling and click the player you are interested in. This will bring up a page which has the profile number for the player, e.g. for Virat Kohli this would be http://www.espncricinfo.com/india/content/player/253802.html. Hence, 253802 can be used to get the data for Virat Kohli, as shown in the sections below.
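If you collect several player pages, the profile number can be pulled out of the URL rather than copied by hand. The helper below is a small sketch that is not part of cricpy; the name profile_id_from_url is made up for illustration, and it assumes the URL ends in /player/<number>.html as in the Kohli example above.

```python
import re

def profile_id_from_url(url):
    """Return the numeric profile ID from a player URL, or None if absent.

    Hypothetical helper, not a cricpy function. Assumes the URL contains
    '/player/<digits>.html' as in the example from this post.
    """
    match = re.search(r"/player/(\d+)\.html", url)
    return int(match.group(1)) if match else None

kohli_id = profile_id_from_url(
    "http://www.espncricinfo.com/india/content/player/253802.html")
print(kohli_id)
```

The returned number is what getPlayerDataTT() expects as its profile argument.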
The cricpy package is a clone of my R package cricketr. The signatures of all the Python functions are identical to those of their counterparts in ‘cricketr’, with only the necessary variations between Python and R. It may be useful to look at my post R vs Python: Different similarities and similar differences. In fact, if you are familiar with one of the languages, you can look up the package in the other and you will notice the parallel constructs.
Note: The charts are self-explanatory and I have not added much of my own interpretation to them. Do look at the plots closely and check out the performances for yourself.
1. Importing cricpy – Python
# Install the package
# Do a pip install cricpy
# Import cricpy
import cricpy.analytics as ca
2. Invoking functions with Python package cricpy
import cricpy.analytics as ca
ca.batsman4s("./kohli.csv","Virat Kohli")
3. Getting help from cricpy – Python
import cricpy.analytics as ca
help(ca.getPlayerDataTT)
## Help on function getPlayerDataTT in module cricpy.analytics:
##
## getPlayerDataTT(profile, opposition='', host='', dir='./data', file='player001.csv', type='batting', homeOrAway=[1, 2, 3], result=[1, 2, 3, 5], create=True)
## Get the Twenty20 International player data from ESPN Cricinfo based on specific inputs and store in a file in a given directory~
##
## Description
##
## Get the Twenty20 player data given the profile of the batsman/bowler. The allowed inputs are home,away, neutralboth and won,lost,tied or no result of matches. The data is stored in a <player>.csv file in a directory specified. This function also returns a data frame of the player
##
## Usage
##
## getPlayerDataTT(profile, opposition="",host="",dir = "./data", file = "player001.csv",
## type = "batting", homeOrAway = c(1, 2, 3), result = c(1, 2, 3,5))
## Arguments
##
## profile
## This is the profile number of the player to get data. This can be obtained from http://www.espncricinfo.com/ci/content/player/index.html. Type the name of the player and click search. This will display the details of the player. Make a note of the profile ID. For e.g For Virat Kohli this turns out to be 253802 http://www.espncricinfo.com/india/content/player/35263.html. Hence the profile for Sehwag is 35263
## opposition
## The numerical value of the opposition country e.g.Australia,India, England etc. The values are Afghanistan:40,Australia:2,Bangladesh:25,England:1,Hong Kong:19,India:6,Ireland:29, New Zealand:5,Pakistan:7,Scotland:30,South Africa:3,Sri Lanka:8,United Arab Emirates:27, West Indies:4, Zimbabwe:9; Note: If no value is entered for opposition then all teams are considered
## host
## The numerical value of the host country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,New Zealand:5, South Africa:3,Sri Lanka:8,United States of America:11,West Indies:4, Zimbabwe:9 Note: If no value is entered for host then all host countries are considered
## dir
## Name of the directory to store the player data into. If not specified the data is stored in a default directory "./data". Default="./data"
## file
## Name of the file to store the data into for e.g. kohli.csv. This can be used for subsequent functions. Default="player001.csv"
## type
## type of data required. This can be "batting" or "bowling"
## homeOrAway
## This is vector with either or all 1,2, 3. 1 is for home 2 is for away, 3 is for neutral venue
## result
## This is a vector that can take values 1,2,3,5. 1 - won match 2- lost match 3-tied 5- no result
## Details
##
## More details can be found in my short video tutorial in Youtube https://www.youtube.com/watch?v=q9uMPFVsXsI
##
## Value
##
## Returns the player's dataframe
##
## Note
##
## Maintainer: Tinniam V Ganesh <tvganesh.85@gmail.com>
##
## Author(s)
##
## Tinniam V Ganesh
##
## References
##
## http://www.espncricinfo.com/ci/content/stats/index.html
## https://gigadom.wordpress.com/
##
## See Also
##
## bowlerWktRateTT getPlayerData
##
## Examples
##
## ## Not run:
## # Only away. Get data only for won and lost innings
## kohli =getPlayerDataTT(253802,dir="../cricketr/data", file="kohli1.csv",
## type="batting")
##
## # Get bowling data and store in file for future
## ashwin = getPlayerDataTT(26421,dir="../cricketr/data",file="ashwin1.csv",
## type="bowling")
##
## kohli =getPlayerDataTT(253802,opposition = 2,host=2,dir="../cricketr/data",
## file="kohli1.csv",type="batting")
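The numeric codes in the help text are easy to mix up, so it can help to build the arguments from readable names. The sketch below copies the opposition, homeOrAway and result codes from the help output above; the dictionaries and the build_query helper are my own illustration and do not exist in cricpy.

```python
# Codes copied from the getPlayerDataTT help text above.
OPPOSITION = {
    "Afghanistan": 40, "Australia": 2, "Bangladesh": 25, "England": 1,
    "Hong Kong": 19, "India": 6, "Ireland": 29, "New Zealand": 5,
    "Pakistan": 7, "Scotland": 30, "South Africa": 3, "Sri Lanka": 8,
    "United Arab Emirates": 27, "West Indies": 4, "Zimbabwe": 9,
}
HOME_OR_AWAY = {"home": 1, "away": 2, "neutral": 3}
RESULT = {"won": 1, "lost": 2, "tied": 3, "no result": 5}

def build_query(profile, opposition=None, venues=("home", "away", "neutral"),
                results=("won", "lost", "tied", "no result"), **extra):
    """Build keyword arguments for getPlayerDataTT (hypothetical helper)."""
    kwargs = {
        "profile": profile,
        # An empty string means "all teams", as in the function's default.
        "opposition": OPPOSITION[opposition] if opposition else "",
        "homeOrAway": [HOME_OR_AWAY[v] for v in venues],
        "result": [RESULT[r] for r in results],
    }
    kwargs.update(extra)  # e.g. dir, file, type
    return kwargs

query = build_query(253802, opposition="Australia", venues=["away"],
                    results=["won", "lost"], dir=".",
                    file="kohli_away_aus.csv", type="batting")
print(query)
# With cricpy installed, this could be passed on as:
# kohli = ca.getPlayerDataTT(**query)
```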
The details below will introduce the different functions that are available in cricpy.
4. Get the Twenty20 player data for a player using the function getPlayerDataTT()
Important Note: This needs to be done only once for a player. This function stores the player’s data in the specified CSV file (for e.g. kohli.csv as above), which can then be reused for all other functions. Once we have the data for the players, many analyses can be done. This post will use the stored CSV files obtained with a prior getPlayerDataTT() call for all subsequent analyses.
import cricpy.analytics as ca
#kohli=ca.getPlayerDataTT(253802,dir=".",file="kohli.csv",type="batting")
#guptill=ca.getPlayerDataTT(226492,dir=".",file="guptill.csv",type="batting")
#shahzad=ca.getPlayerDataTT(419873,dir=".",file="shahzad.csv",type="batting")
#mccullum=ca.getPlayerDataTT(37737,dir=".",file="mccullum.csv",type="batting")
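Instead of commenting the download calls in and out, the "fetch only once" rule can be wrapped in a small guard. This is a sketch of one way to do it, not something cricpy provides; load_or_fetch is a made-up name, and the demo uses a stand-in fetcher that writes a placeholder file so that it runs without network access.

```python
import os
import tempfile

def load_or_fetch(csv_path, fetch):
    """Call fetch() only when csv_path does not exist yet.

    Returns True if fetch() was called, False if the file was reused.
    Hypothetical helper, not part of cricpy.
    """
    if os.path.exists(csv_path):
        return False
    fetch()
    return True

# Demo with a stand-in fetcher (no download involved).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kohli.csv")

    def fake_fetch():
        with open(path, "w") as fh:
            fh.write("placeholder\n")

    first = load_or_fetch(path, fake_fetch)   # file missing, fetch runs
    second = load_or_fetch(path, fake_fetch)  # file present, fetch skipped

print(first, second)
```

With cricpy installed, the fetcher could be something like lambda: ca.getPlayerDataTT(253802, dir=".", file="kohli.csv", type="batting").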
Included below are some of the functions that can be used for Twenty20 batsmen and bowlers. For this I have chosen Virat Kohli, ‘the run machine’, who is on track to break many of the Test, ODI and Twenty20 records.
5. Virat Kohli’s performance – Basic Analyses
The 3 plots below provide the following for Virat Kohli in T20s
- Frequency percentage of runs in each run range over the whole career
- Mean Strike Rate for runs scored in the given range
- A histogram of runs frequency percentages in runs ranges
import cricpy.analytics as ca
import matplotlib.pyplot as plt
ca.batsmanRunsFreqPerf("./kohli.csv","Virat Kohli")
ca.batsmanMeanStrikeRate("./kohli.csv","Virat Kohli")
ca.batsmanRunsRanges("./kohli.csv","Virat Kohli")
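To make the idea of "frequency percentage of runs in each run range" concrete, here is a minimal sketch of the bucketing step. It is not cricpy's implementation; the 10-run bucket width and the sample scores are assumptions chosen for illustration.

```python
from collections import Counter

def runs_frequency_percent(runs, width=10):
    """Percentage of innings that fall in each `width`-run range."""
    buckets = Counter((r // width) * width for r in runs)
    total = len(runs)
    return {f"{lo}-{lo + width - 1}": 100 * n / total
            for lo, n in sorted(buckets.items())}

# Made-up scores, not Kohli's actual innings.
sample_runs = [4, 12, 18, 25, 27, 33, 41, 8, 55, 29]
freq = runs_frequency_percent(sample_runs)
for run_range, pct in freq.items():
    print(run_range, pct)
```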
6. More analyses
import cricpy.analytics as ca
ca.batsman4s("./kohli.csv","Virat Kohli")
ca.batsman6s("./kohli.csv","Virat Kohli")
ca.batsmanDismissals("./kohli.csv","Virat Kohli")
ca.batsmanScoringRateODTT("./kohli.csv","Virat Kohli")
7. 3D scatter plot and prediction plane
The plots below show the 3D scatter plot of Kohli’s Runs versus Balls Faced and Minutes at crease. A linear regression plane is then fitted between Runs and Balls Faced + Minutes at crease
import cricpy.analytics as ca
ca.battingPerf3d("./kohli.csv","Virat Kohli")
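For readers curious about what fitting a regression plane involves, the sketch below solves the least-squares normal equations for runs = a + b*BF + c*Mins in plain Python. It is an illustration under my own assumptions, not the code cricpy uses, and the data is made up and noiseless so that the fit recovers the coefficients exactly.

```python
def fit_plane(bf, mins, runs):
    """Least-squares fit of runs = a + b*bf + c*mins via normal equations."""
    rows = [(1.0, x, y) for x, y in zip(bf, mins)]
    # Augmented 3x4 matrix [X^T X | X^T y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           + [sum(r[i] * z for r, z in zip(rows, runs))]
           for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(3):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [u - factor * v for u, v in zip(aug[k], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]

# Made-up innings lying exactly on the plane runs = 2 + 1.2*BF + 0.1*Mins.
bf = [10, 20, 30, 40, 25]
mins = [15, 22, 50, 45, 60]
runs = [2 + 1.2 * x + 0.1 * y for x, y in zip(bf, mins)]
a, b, c = fit_plane(bf, mins, runs)
print(round(a, 3), round(b, 3), round(c, 3))
```

If Balls Faced and Minutes are perfectly proportional to each other the system becomes singular and this sketch fails with a division by zero; a library routine would handle that case more gracefully.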
8. Average runs at different venues
The plot below gives the average runs scored by Kohli at different grounds. The plot also shows the number of innings at each ground as a label on the x-axis.
import cricpy.analytics as ca
ca.batsmanAvgRunsGround("./kohli.csv","Virat Kohli")
9. Average runs against different opposing teams
This plot computes the average runs scored by Kohli against different countries.
import cricpy.analytics as ca
ca.batsmanAvgRunsOpposition("./kohli.csv","Virat Kohli")
10. Highest Runs Likelihood
The plot below shows the Runs Likelihood for a batsman. For this, the performance of Kohli is plotted as a 3D scatter plot with Runs versus Balls Faced + Minutes at crease. K-Means is then used to compute the centroids of 3 clusters, which are plotted to show Kohli’s highest tendencies.
import cricpy.analytics as ca
ca.batsmanRunsLikelihood("./kohli.csv","Virat Kohli")
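The clustering step can be sketched with a bare-bones K-Means (Lloyd's algorithm) over (Balls Faced, Minutes, Runs) points. This is only an illustration: the seeding from the first k points is a simplification of my own, the innings are made up, and cricpy's own clustering may well differ.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Made-up (BF, Mins, Runs) innings forming three well-separated groups.
innings = [(5, 8, 4), (20, 28, 27), (45, 60, 70),
           (7, 10, 6), (22, 30, 29), (47, 62, 72),
           (6, 9, 5), (24, 32, 31), (49, 64, 74)]
centroids, clusters = kmeans(innings, k=3)
for centre, members in zip(centroids, clusters):
    share = 100 * len(members) / len(innings)
    print(centre, f"{share:.1f}% of innings")
```

The share of innings in each cluster is one simple way to read a centroid as a "likelihood"; with real data a sturdier seeding such as k-means++ is advisable.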
11. A look at the Top 4 batsmen – Kohli, Guptill, Shahzad and McCullum
The following batsmen have been very prolific in Twenty20 cricket and will be used for the analyses
- Virat Kohli: Runs: 2167, Average: 49.25, Strike rate: 136.11
- MJ Guptill: Runs: 2271, Average: 34.4, Strike rate: 132.88
- Mohammed Shahzad: Runs: 1936, Average: 31.22, Strike rate: 134.81
- BB McCullum: Runs: 2140, Average: 35.66, Strike rate: 136.21
The following plots take a closer look at their performances. The box plots show the median and the 1st and 3rd quartiles of the runs.
12. Box Histogram Plot
This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency
import cricpy.analytics as ca
ca.batsmanPerfBoxHist("./kohli.csv","Virat Kohli")
ca.batsmanPerfBoxHist("./guptill.csv","M J Guptill")
ca.batsmanPerfBoxHist("./shahzad.csv","M Shahzad")
ca.batsmanPerfBoxHist("./mccullum.csv","BB McCullum")
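The numbers behind a box plot are just the quartiles of the runs. The sketch below computes them with Python's statistics module; the "inclusive" method is my choice and may not match the convention the plotting library uses, and the scores are made up.

```python
import statistics

def box_stats(runs):
    """Quartiles and interquartile range of a list of scores."""
    q1, median, q3 = statistics.quantiles(runs, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}

# Made-up scores, not taken from any of the four batsmen.
sample_runs = [27, 8, 72, 15, 2, 41, 21, 56, 34]
stats = box_stats(sample_runs)
print(stats)
```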
13. Moving Average of runs in career
Take a look at the Moving Average across the career of the Top 4 Twenty20 batsmen.
import cricpy.analytics as ca
ca.batsmanMovingAverage("./kohli.csv","Virat Kohli")
ca.batsmanMovingAverage("./guptill.csv","M J Guptill")
#ca.batsmanMovingAverage("./shahzad.csv","M Shahzad") # Gives error. Check!
ca.batsmanMovingAverage("./mccullum.csv","BB McCullum")
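As a rough picture of what a moving average over a career means, here is a simple trailing-window version. The window size and the sample scores are assumptions; the smoothing that cricpy actually applies may be different.

```python
def moving_average(values, window=5):
    """Trailing moving average; the first window-1 entries give no value."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the score leaving the window
        if i >= window - 1:
            averages.append(running / window)
    return averages

# Made-up scores in chronological order.
smoothed = moving_average([10, 20, 30, 40, 50], window=3)
print(smoothed)
```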
14. Cumulative Average runs of batsman in career
This function provides the cumulative average runs of the batsman over the career. Kohli’s cumulative average tops out at around 45 runs at about 43 innings, though there is also a downward dip.
import cricpy.analytics as ca
ca.batsmanCumulativeAverageRuns("./kohli.csv","Virat Kohli")
ca.batsmanCumulativeAverageRuns("./guptill.csv","M J Guptill")
ca.batsmanCumulativeAverageRuns("./shahzad.csv","M Shahzad")
ca.batsmanCumulativeAverageRuns("./mccullum.csv","BB McCullum")
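A cumulative average is the running total divided by the number of innings so far. The sketch below shows that calculation on made-up scores. Note that it divides by innings, whereas a conventional batting average divides by dismissals; which of the two cricpy uses is not something this sketch claims to know.

```python
from itertools import accumulate

def cumulative_average(runs):
    """Average runs per innings after each innings, in order."""
    return [total / n for n, total in enumerate(accumulate(runs), start=1)]

# Made-up scores in chronological order.
averages = cumulative_average([10, 30, 20, 60])
print(averages)
```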
15. Cumulative Average strike rate of batsman in career
Kohli, Guptill and McCullum average a strike rate of 125+
import cricpy.analytics as ca
ca.batsmanCumulativeStrikeRate("./kohli.csv","Virat Kohli")
ca.batsmanCumulativeStrikeRate("./guptill.csv","M J Guptill")
ca.batsmanCumulativeStrikeRate("./shahzad.csv","M Shahzad")
ca.batsmanCumulativeStrikeRate("./mccullum.csv","BB McCullum")
16. Relative Batsman Cumulative Average Runs
The plot below compares the relative cumulative average runs of the batsmen. Kohli is way above the other 3 batsmen. Behind Kohli are McCullum and then Guptill.
import cricpy.analytics as ca
frames = ["./kohli.csv","./guptill.csv","./shahzad.csv","./mccullum.csv"]
names = ["Kohli","Guptill","Shahzad","McCullum"]
ca.relativeBatsmanCumulativeAvgRuns(frames,names)
17. Relative Batsman Strike Rate
The plot below compares the relative cumulative strike rates of the batsmen. It shows that Kohli tops the overall strike rate, followed by McCullum and then Guptill.
import cricpy.analytics as ca
frames = ["./kohli.csv","./guptill.csv","./shahzad.csv","./mccullum.csv"]
names = ["Kohli","Guptill","Shahzad","McCullum"]
ca.relativeBatsmanCumulativeStrikeRate(frames,names)
18. 3D plot of Runs vs Balls Faced and Minutes at Crease
The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A 3D prediction plane is fitted
import cricpy.analytics as ca
ca.battingPerf3d("./kohli.csv","Virat Kohli")
ca.battingPerf3d("./guptill.csv","M J Guptill")
ca.battingPerf3d("./shahzad.csv","M Shahzad")
ca.battingPerf3d("./mccullum.csv","BB McCullum")
19. Percentage of 4s and 6s
Guptill and McCullum have a large percentage of sixes in comparison to the 4s. Kohli has a relatively lower number of 6s.
import cricpy.analytics as ca
frames = ["./kohli.csv","./guptill.csv","./shahzad.csv","./mccullum.csv"]
names = ["Kohli","Guptill","Shahzad","McCullum"]
ca.batsman4s6s(frames,names)
20. Predicting Runs given Balls Faced and Minutes at Crease
A multivariate regression plane is fitted between Runs and Balls Faced + Minutes at crease.
import cricpy.analytics as ca
import numpy as np
import pandas as pd
BF = np.linspace( 10, 400,15)
Mins = np.linspace( 30,600,15)
newDF= pd.DataFrame({'BF':BF,'Mins':Mins})
kohli= ca.batsmanRunsPredict("./kohli.csv",newDF,"Kohli")
print(kohli)
## BF Mins Runs
## 0 10.000000 30.000000 14.753153
## 1 37.857143 70.714286 55.963333
## 2 65.714286 111.428571 97.173513
## 3 93.571429 152.142857 138.383693
## 4 121.428571 192.857143 179.593873
## 5 149.285714 233.571429 220.804053
## 6 177.142857 274.285714 262.014233
## 7 205.000000 315.000000 303.224414
## 8 232.857143 355.714286 344.434594
## 9 260.714286 396.428571 385.644774
## 10 288.571429 437.142857 426.854954
## 11 316.428571 477.857143 468.065134
## 12 344.285714 518.571429 509.275314
## 13 372.142857 559.285714 550.485494
## 14 400.000000 600.000000 591.695674
21. Analysis of Top Bowlers
The following 4 bowlers have had an excellent career and will be used for the analysis
- Shakib Al Hasan: Wickets: 80, Average: 21.07, Economy Rate: 6.74
- Mohammad Nabi: Wickets: 67, Average: 24.25, Economy Rate: 7.13
- Rashid Khan: Wickets: 64, Average: 12.40, Economy Rate: 6.01
- Imran Tahir: Wickets: 62, Average: 14.95, Economy Rate: 6.77
22. Get the bowler’s data
The data for each bowler is fetched once with getPlayerDataTT() using type="bowling" and stored in a CSV file, which is then used for the analyses that follow.
import cricpy.analytics as ca
#shakib=ca.getPlayerDataTT(56143,dir=".",file="shakib.csv",type="bowling")
#nabi=ca.getPlayerDataTT(25913,dir=".",file="nabi.csv",type="bowling")
#rashid=ca.getPlayerDataTT(793463,dir=".",file="rashid.csv",type="bowling")
#tahir=ca.getPlayerDataTT(40618,dir=".",file="tahir.csv",type="bowling")
23. Wicket Frequency Plot
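The plots below compute the percentage frequency of the number of wickets taken (for e.g. 1 wicket x%, 2 wickets y% etc.) for each of the bowlers and plot it as a continuous line.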
import cricpy.analytics as ca
ca.bowlerWktsFreqPercent("./shakib.csv","Shakib Al Hasan")
ca.bowlerWktsFreqPercent("./nabi.csv","Mohammad Nabi")
ca.bowlerWktsFreqPercent("./rashid.csv","Rashid Khan")
ca.bowlerWktsFreqPercent("./tahir.csv","Imran Tahir")
24. Wickets Runs plot
The plots below are box plots showing the 1st and 3rd quartiles of runs conceded versus the number of wickets taken.
import cricpy.analytics as ca
ca.bowlerWktsRunsPlot("./shakib.csv","Shakib Al Hasan")
ca.bowlerWktsRunsPlot("./nabi.csv","Mohammad Nabi")
ca.bowlerWktsRunsPlot("./rashid.csv","Rashid Khan")
ca.bowlerWktsRunsPlot("./tahir.csv","Imran Tahir")
25. Average wickets at different venues
The plots give the average wickets taken by each of the bowlers at different venues.
import cricpy.analytics as ca
ca.bowlerAvgWktsGround("./shakib.csv","Shakib Al Hasan")
ca.bowlerAvgWktsGround("./nabi.csv","Mohammad Nabi")
ca.bowlerAvgWktsGround("./rashid.csv","Rashid Khan")
ca.bowlerAvgWktsGround("./tahir.csv","Imran Tahir")
26. Average wickets against different opposition
The plots give the average wickets taken by each of the bowlers against different countries. The x-axis also includes the number of innings against each team.
import cricpy.analytics as ca
ca.bowlerAvgWktsOpposition("./shakib.csv","Shakib Al Hasan")
ca.bowlerAvgWktsOpposition("./nabi.csv","Mohammad Nabi")
ca.bowlerAvgWktsOpposition("./rashid.csv","Rashid Khan")
ca.bowlerAvgWktsOpposition("./tahir.csv","Imran Tahir")
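The grouping behind "average wickets against each opposition, with the number of innings" can be sketched as follows. The records are made up and the function name is hypothetical; the sketch only shows the aggregation and does not read the cricpy CSV format.

```python
from collections import defaultdict

def average_wickets_by(records):
    """Map each opposition to (innings, mean wickets per innings).

    records: iterable of (opposition, wickets) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # opposition -> [innings, wickets]
    for opposition, wickets in records:
        totals[opposition][0] += 1
        totals[opposition][1] += wickets
    return {team: (n, total / n) for team, (n, total) in totals.items()}

# Made-up bowling figures.
spells = [("Australia", 2), ("India", 1), ("Australia", 0),
          ("India", 3), ("Ireland", 4)]
summary = average_wickets_by(spells)
for team, (innings, mean) in sorted(summary.items()):
    print(f"{team} ({innings} innings): {mean:.2f}")
```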
27. Wickets taken moving average
The plots below show the moving average of wickets taken by each of the bowlers over their careers.
import cricpy.analytics as ca
ca.bowlerMovingAverage("./shakib.csv","Shakib Al Hasan")
ca.bowlerMovingAverage("./nabi.csv","Mohammad Nabi")
ca.bowlerMovingAverage("./rashid.csv","Rashid Khan")
ca.bowlerMovingAverage("./tahir.csv","Imran Tahir")
28. Cumulative average wickets taken
The plots below give the cumulative average wickets taken by the bowlers. Rashid Khan has been the most effective with almost 2.28 wickets per match
import cricpy.analytics as ca
ca.bowlerCumulativeAvgWickets("./shakib.csv","Shakib Al Hasan")
ca.bowlerCumulativeAvgWickets("./nabi.csv","Mohammad Nabi")
ca.bowlerCumulativeAvgWickets("./rashid.csv","Rashid Khan")
ca.bowlerCumulativeAvgWickets("./tahir.csv","Imran Tahir")
29. Cumulative average economy rate
The plots below give the cumulative average economy rate of the bowlers. Rashid Khan has the best economy rate, followed by Mohammad Nabi.
import cricpy.analytics as ca
ca.bowlerCumulativeAvgEconRate("./shakib.csv","Shakib Al Hasan")
ca.bowlerCumulativeAvgEconRate("./nabi.csv","Mohammad Nabi")
ca.bowlerCumulativeAvgEconRate("./rashid.csv","Rashid Khan")
ca.bowlerCumulativeAvgEconRate("./tahir.csv","Imran Tahir")
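Economy rate is runs conceded per over, and overs are written in cricket notation where 3.4 means 3 overs and 4 balls. The sketch below converts overs to balls and then computes a running economy rate by pooling all runs and balls bowled so far. The pooling is an assumption of mine (cricpy may instead average the per-innings rates), and the spells are made up.

```python
def overs_to_balls(overs):
    """Convert overs notation (e.g. '3.4' = 3 overs, 4 balls) to balls."""
    whole, _, part = str(overs).partition(".")
    return int(whole) * 6 + int(part or 0)

def cumulative_economy(spells):
    """Runs per over after each spell; spells are (overs, runs_conceded)."""
    rates, balls, runs = [], 0, 0
    for overs, conceded in spells:
        balls += overs_to_balls(overs)
        runs += conceded
        rates.append(6 * runs / balls)
    return rates

# Made-up spells in chronological order.
rates = cumulative_economy([("4", 28), ("3.3", 17), ("2", 21)])
print([round(r, 2) for r in rates])
```

A first spell of zero balls would cause a division by zero here; real data would need that case handled.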
30. Relative cumulative average economy rate of bowlers
The Relative cumulative economy rate is given below. It can be seen that Rashid Khan has the best economy rate followed by Mohammed Nabi and then Imran Tahir
import cricpy.analytics as ca
frames = ["./shakib.csv","./nabi.csv","./rashid.csv","./tahir.csv"]
names = ["Shakib Al Hasan","Mohammad Nabi","Rashid Khan", "Imran Tahir"]
ca.relativeBowlerCumulativeAvgEconRate(frames,names)
31. Relative Economy Rate against wickets taken
Rashid Khan has the best figures for hauls between 2 and 3.5 wickets. Mohammad Nabi pips Rashid Khan when he takes a haul of 4 wickets.
import cricpy.analytics as ca
frames = ["./shakib.csv","./nabi.csv","./rashid.csv","./tahir.csv"]
names = ["Shakib Al Hasan","Mohammad Nabi","Rashid Khan", "Imran Tahir"]
ca.relativeBowlingER(frames,names)
32. Relative cumulative average wickets of bowlers in career
Rashid has the best performance with cumulative average wickets. He is followed by Imran Tahir in the wicket haul, followed by Shakib Al Hasan
import cricpy.analytics as ca
frames = ["./shakib.csv","./nabi.csv","./rashid.csv","./tahir.csv"]
names = ["Shakib Al Hasan","Mohammad Nabi","Rashid Khan", "Imran Tahir"]
ca.relativeBowlerCumulativeAvgWickets(frames,names)
33. Key Findings
The plots above capture some of the capabilities and features of my cricpy package. Feel free to install the package and try it out. Please do keep in mind ESPN Cricinfo’s Terms of Use.
Here are the main findings from the analysis above
Analysis of Top 4 batsmen
The analysis of the Top 4 Twenty20 batsmen Kohli, Guptill, Shahzad and McCullum
1. Kohli has the best overall cumulative average runs and towers over everybody else
2. Kohli, Guptill and McCullum have a very good strike rate of around 125+
3. Guptill and McCullum have a larger percentage of sixes as compared to Kohli
Analysis of Top 4 bowlers
The analysis of the Top 4 Twenty20 bowlers Shakib Al Hasan, Mohammad Nabi, Rashid Khan and Imran Tahir
1. Rashid Khan has the best cumulative average wickets, followed by Imran Tahir and then Shakib Al Hasan
2. Rashid Khan is the most economical bowler, followed by Mohammad Nabi
Conclusion
Cricpy now has almost all the functions and functionalities of my R package cricketr. There are still a few more features that need to be added to cricpy. I intend to do this as and when I find time.
Go ahead, take cricpy for a spin! Hope you enjoy the ride!
Watch this space!!!
Important note: Do check out my other posts using cricpy at cricpy-posts