“He felt that his whole life was some kind of dream and he sometimes wondered whose it was and whether they were enjoying it.”
“The ships hung in the sky in much the same way that bricks don’t.”
“We demand rigidly defined areas of doubt and uncertainty!”
“For a moment, nothing happened. Then, after a second or so, nothing continued to happen.”
“‘The Answer to the Great Question… Of Life, the Universe and Everything… Is… Forty-two,’ said Deep Thought, with infinite majesty and calm.”
The Hitchhiker's Guide to the Galaxy - Douglas Adams
Introduction
In this post, I introduce 2 new functions in my R package ‘cricketr’ (cricketr v0.22; see ‘Re-introducing cricketr! : An R package to analyze performances of cricketers’) which enable granular analysis of batsmen and bowlers. They are:
- Step 1: getPlayerDataHA – This function is a wrapper around getPlayerData(), getPlayerDataOD() and getPlayerDataTT(), and adds an extra column ‘homeOrAway’ which says whether the match was played at a home, away or neutral venue. A CSV file is created with this new column.
- Step 2: getPlayerDataOppnHA – This function allows you to slice & dice the data for batsmen and bowlers against specific oppositions, at home/away/neutral venues and within a given period. This reduced subset of data can be used to perform analyses. A CSV file is created as output based on the parameters of opposition, home or away and the interval of time.
Note 1: All the existing cricketr functions can be used on this smaller, fine-grained data set for a closer analysis of players.
Note 2: You have to call the above functions only once. You can reuse the CSV files in other functions.
Important note: Don’t go too fine-grained by choosing just one opposition, in one of home/away/neutral and for too short a period. Too small a dataset may defeat the purpose of the analysis!
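To make the idea of slicing by opposition, venue and period concrete, here is a minimal base R sketch on a toy data frame. It is not the implementation inside cricketr: the helper subset_innings(), its arguments and the column names (Opposition, homeOrAway, StartDate, Runs) are assumptions made for illustration, and the numbers are made up.

```r
# Toy innings table; the column names are assumptions for illustration,
# not necessarily the headers that cricketr writes to its CSV files.
innings <- data.frame(
  Opposition = c("Australia", "England", "Australia", "South Africa", "England"),
  homeOrAway = c("home", "away", "away", "neutral", "home"),
  StartDate  = as.Date(c("2001-03-11", "2002-08-08", "2004-01-02",
                         "2005-11-25", "2012-11-15")),
  Runs       = c(126, 92, 241, 40, 13),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep innings matching opposition, venue and period.
# Passing "all" switches the corresponding filter off.
subset_innings <- function(df, opposition = "all", venue = "all",
                           start_date = "2001-01-01", end_date = "2019-01-01") {
  keep <- df$StartDate >= as.Date(start_date) & df$StartDate <= as.Date(end_date)
  if (!identical(opposition, "all")) keep <- keep & df$Opposition %in% opposition
  if (!identical(venue, "all"))      keep <- keep & df$homeOrAway %in% venue
  out <- df[keep, , drop = FALSE]
  # Flag very small subsets, in the spirit of the note above.
  if (nrow(out) < 3) {
    message("Only ", nrow(out), " innings selected; the subset may be too small")
  }
  out
}

# Innings against Australia played away from home
aus_away <- subset_innings(innings, opposition = "Australia", venue = "away")
```

The message printed for aus_away shows how quickly a dataset shrinks once several filters are combined.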
This post has been published in Rpubs and can be accessed at Cricketr learns new tricks
You can download a PDF version of this post at Cricketr learns new tricks
If you are passionate about cricket, and love analyzing cricket performances, then check out my racy book on cricket ‘Cricket analytics with cricketr and cricpy – Analytics harmony with R & Python’! This book discusses and shows how to use my R package ‘cricketr’ and my Python package ‘cricpy’ to analyze batsmen and bowlers in all formats of the game (Test, ODI and T20). The paperback is available on Amazon at $21.99 and the kindle version at $9.99/Rs 449/-. A must read for any cricket lover! Check it out!!

1. Analyzing Tendulkar at 3 different stages of his career
The following functions analyze Sachin Tendulkar during 3 different periods of his illustrious career: a) 1st Jan 2001 to 1st Jan 2002, b) 1st Jan 2005 to 1st Jan 2006 and c) 1st Jan 2012 to 1st Jan 2013.
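# Note: The call to getPlayerDataHA() is not shown, as I already have the
# CSV file tendulkarHA.csv
df1=getPlayerDataOppnHA(infile="tendulkarHA.csv",outfile="tendulkarTest2001.csv",
                        startDate="2001-01-01",endDate="2002-01-01")
df2=getPlayerDataOppnHA(infile="tendulkarHA.csv",outfile="tendulkarTest2005.csv",
                        startDate="2005-01-01",endDate="2006-01-01")
# Third period, written to the 2012 file used in the plots that follow
df3=getPlayerDataOppnHA(infile="tendulkarHA.csv",outfile="tendulkarTest2012.csv",
                        startDate="2012-01-01",endDate="2013-01-01")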
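When there are several periods, the subsets could also be generated in a loop. The sketch below uses only the arguments that appear in this post (infile, outfile, startDate, endDate). The periods table is a convenience introduced here, and the guard around the call is an assumption so that the snippet does nothing when cricketr or the input file is missing.

```r
# The three periods from the text; the output names follow the files used
# in the later sections of this post.
periods <- data.frame(
  label     = c("2001", "2005", "2012"),
  startDate = c("2001-01-01", "2005-01-01", "2012-01-01"),
  endDate   = c("2002-01-01", "2006-01-01", "2013-01-01"),
  stringsAsFactors = FALSE
)
periods$outfile <- paste0("tendulkarTest", periods$label, ".csv")

infile <- "tendulkarHA.csv"

# Only call the package when it is installed and the input file is present.
if (requireNamespace("cricketr", quietly = TRUE) && file.exists(infile)) {
  for (i in seq_len(nrow(periods))) {
    cricketr::getPlayerDataOppnHA(infile    = infile,
                                  outfile   = periods$outfile[i],
                                  startDate = periods$startDate[i],
                                  endDate   = periods$endDate[i])
  }
}
```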
1a. Mean strike rate of Tendulkar in 2001, 2005, 2012
Note: Any of the cricketr R functions can be used on the fine-grained subset of data as below. Tendulkar’s mean strike rate is 60+ in 2001, decreases to about 50 in 2005 and later to around 45 in 2012.
batsmanMeanStrikeRate ("./tendulkarTest2001.csv","Tendulkar-2001")

batsmanMeanStrikeRate ("./tendulkarTest2005.csv","Tendulkar-2005")

batsmanMeanStrikeRate ("./tendulkarTest2012.csv","Tendulkar-2012")
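For readers unfamiliar with the metric, strike rate is runs scored per 100 balls faced. The sketch below computes it per innings on made-up numbers and then averages it within ranges of runs scored. The run ranges and the column names Runs and BF are assumptions for illustration; this is not necessarily how batsmanMeanStrikeRate() groups its data.

```r
# Toy innings: runs scored and balls faced (BF). The values are made up.
innings <- data.frame(
  Runs = c(12, 45, 103, 8, 67, 155, 30),
  BF   = c(30, 80, 160, 25, 110, 250, 70)
)

# Strike rate = runs per 100 balls faced
innings$SR <- 100 * innings$Runs / innings$BF

# Group the innings into run ranges (left-closed intervals) and
# average the strike rate within each range.
innings$RunRange <- cut(innings$Runs, breaks = c(0, 50, 100, 200),
                        right = FALSE)
mean_sr <- tapply(innings$SR, innings$RunRange, mean)
print(round(mean_sr, 1))
```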

1d. Plot the relative cumulative average and relative strike rate of Tendulkar in 2001, 2005, 2012
The plots below compare Tendulkar’s cumulative average runs and cumulative strike rate during 3 different stages of his career
- Tendulkar’s cumulative average runs is 60+ in 2001, drops to ~50 in 2005 and later plummets to around 25 in 2012
- Tendulkar’s strike rate in 2001 is an amazing 60+
frames=list("tendulkarTest2001.csv","tendulkarTest2005.csv","tendulkarTest2012.csv")
names=list("Tendulkar-2001","Tendulkar-2005","Tendulkar-2012")
relativeBatsmanCumulativeAvgRuns(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)
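As a rough illustration of what a cumulative average and a cumulative strike rate track, the sketch below computes both on made-up innings. It assumes a plain running mean of runs per innings and a running ratio of total runs to total balls; cricketr may define these differently (for example in how not-outs are treated), so read it as one plausible definition rather than the package's own.

```r
# Made-up innings in chronological order
runs  <- c(20, 60, 10, 110, 50)
balls <- c(40, 100, 30, 200, 130)

# Running mean of runs after each innings
cum_avg <- cumsum(runs) / seq_along(runs)

# Running strike rate: total runs per 100 balls faced so far
cum_sr <- 100 * cumsum(runs) / cumsum(balls)

print(cum_avg)
print(round(cum_sr, 1))
```

A single big score (the 110 here) lifts the running mean noticeably when there are few innings, which is another reason to avoid very short periods.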

2a. Kohli’s cumulative average runs in 2014 & 2018
Kohli’s cumulative average runs in 2014 is around 15, while in 2018 it is 70+. Kohli stamps his class once again and erases the bad memories of 2014.
batsmanCumulativeAverageRuns("kohliTestEng2014.csv", "Kohli-Eng-2014")

batsmanCumulativeAverageRuns("kohliTestEng2018.csv", "Kohli-Eng-2018")

3b. Plot the relative cumulative average runs and relative cumulative strike rate
Plot the relative cumulative average runs and relative cumulative strike rate of Ganguly, Dravid and Laxman
- Dravid towers over Laxman and Ganguly with respect to cumulative average runs.
- Ganguly has a superior strike rate, followed by Laxman and then Dravid.
frames=list("gangulyTestAES2002-08.csv","dravidTestAES2002-08.csv","laxmanTestAES2002-08.csv")
names=list("GangulyAusEngSA2002-08","DravidAusEngSA2002-08","LaxmanAusEngSA2002-08")
relativeBatsmanCumulativeAvgRuns(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)
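The idea behind a relative plot is simply to draw several players on the same axes. The base R sketch below does this for three hypothetical players with made-up scores and different numbers of innings, using a running mean of runs as the cumulative average. It only illustrates the overlay and does not reproduce the relativeBatsman* functions.

```r
# Made-up run sequences for three hypothetical players
players <- list(
  A = c(30, 70, 20, 100),
  B = c(10, 50, 90),
  C = c(80, 40, 60, 20, 50)
)

# Running mean of runs for each player
cum_avgs <- lapply(players, function(r) cumsum(r) / seq_along(r))

# Pad to a common length with NA so that the series fit in one matrix
n <- max(lengths(cum_avgs))
mat <- sapply(cum_avgs, function(v) c(v, rep(NA, n - length(v))))

# Draw all players on the same axes (uses the active graphics device)
matplot(mat, type = "l", lty = 1, col = 1:3,
        xlab = "Innings", ylab = "Cumulative average runs")
legend("topright", legend = colnames(mat), lty = 1, col = 1:3)
```

Padding with NA lets players with fewer innings stop early on the plot instead of being dropped.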

4a. Compare cumulative strike rates and cumulative average runs of Rohit, Root and Williamson
The relative cumulative strike rates of all 3 are comparable
frames=list("rohitODIAusWISA.csv","joerootODIAusWISA.csv","williamsonODIAusWiSA.csv")
names=list("Rohit-ODI-AusWISA","Joe Root-ODI-AusWISA","Williamson-ODI-AusWISA")
relativeBatsmanCumulativeAvgRuns(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

Conclusion
By getting the homeOrAway data for players using the profileNo, you can slice and dice the data by your choice of opposition and by whether the matches were played at home/away/neutral venues. Finally, by specifying the period for which the data has to be subsetted, you can perform fine-grained analysis.
Hope you have a great time with cricketr!!!
Also see
1. My book ‘Deep Learning from first principles:Second Edition’ now on Amazon
2. Cricpy takes a swing at the ODIs
3. My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon
4. Googly: An interactive app for analyzing IPL players, matches and teams using R package yorkr
5. Big Data-2: Move into the big league:Graduate from R to SparkR
6. Rock N’ Roll with Bluemix, Cloudant & NodeExpress
7. A method to crowd source pothole marking on (Indian) roads
8. De-blurring revisited with Wiener filter using OpenCV
To see all posts click Index of posts