Cricketr adds team analytics to its repertoire!!!

And she’s got brains enough for two, which is the exact quantity the girl who marries you will need.

“I’m not absolutely certain of the facts, but I rather fancy it’s Shakespeare who says that it’s always just when a fellow is feeling particularly braced with things in general that Fate sneaks up behind him with the bit of lead piping.”

“A melancholy-looking man, he had the appearance of one who has searched for the leak in life’s gas-pipe with a lighted candle.”

“It isn’t often that Aunt Dahlia lets her angry passions rise, but when she does, strong men climb trees and pull them up after them.”

“Some minds are like soup in a poor restaurant – better left unstirred.”

                                      P.G. Wodehouse

Introduction

My R package cricketr had its genesis about 4 years ago, sometime around June 2015. There were some minor updates afterwards and the package performed analytics on cricketers (Test, ODI and T20) based on data from ESPN Cricinfo see Re-introducing cricketr! : An R package to analyze performances of cricketers. Now, in the latest release of cricketr, I have included 8 functions which can perform Team analytics. Team analysis can be done for Test, ODI and T20 teams.

This package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package can handle all formats of the game including Test, ODI and Twenty20 cricket for players (batsmen & bowlers) and also teams (Test, ODI and T20)

You should be able to install the package directly from CRAN. Please be mindful of ESPN Cricinfo Terms of Use

A total of 8 new functions which deal with team analytics has been included in the latest release.

There are 5 functions which are used internally 1) getTeamData b) getTeamNumber c) getMatchType d) getTeamDataHomeAway e) cleanTeamData

and the external functions which are
a) teamWinLossStatusVsOpposition
b) teamWinLossStatusAtGrounds
c) plotTimelineofWinsLosses

All the above functions are common to Test, ODI and T20 teams

The data for a particular Team can be obtained with the getTeamDataHomeAway() function from the package. This will return a dataframe of the team’s win/loss status at home and away venues over a period of time. This can be saved as a CSV file. Once this is done, you can use this CSV file for all subsequent analysis

As before you can get the help for any of the cricketr functions as below

#help(teamWinLossStatusVsOpposition)
Compute the wins/losses/draw/tied etc for a Team in Test, ODI or T20 against opposition
Description
This function computes the won,lost,draw,tied or no result for a team against other teams in home/away or neutral venues and either returns a dataframe or plots it against opposition
Usage
teamWinLossStatusVsOpposition(file,teamName,opposition=c("all"),homeOrAway=c("all"),
      matchType="Test",plot=FALSE)
Arguments
file	
The CSV file for which the plot is required
teamName	
The name of the team for which plot is required
opposition	
Opposition is a vector namely c("all") or c("Australia", "India", "England")
homeOrAway	
This parameter is a vector which is either c("all") or a vector of venues c("home","away","neutral")
matchType	
Match type - Test, ODI or T20
plot	
If plot=FALSE then a data frame is returned, If plot=TRUE then a plot is generated
Value
None
Note
Maintainer: Tinniam V Ganesh tvganesh.85@gmail.com
Author(s)
Tinniam V Ganesh
References

http://www.espncricinfo.com/ci/content/stats/index.html
https://gigadom.in/
See Also
teamWinLossStatusVsOpposition teamWinLossStatusAtGrounds plotTimelineofWinsLosses
Examples
## Not run: 
#Get the team data for India for Tests
df <- getTeamDataHomeAway(teamName="India",file="indiaOD.csv",matchType="ODI")
teamWinLossStatusAtGrounds("india.csv",teamName="India",opposition=c("Australia","England","India"),
                          homeOrAway=c("home","away"),plot=TRUE)
## End(Not run)

This post has been published at RPubs and is available at TeamAnalyticsWithCricketr

You can download PDF version of this post at TeamAnalyticsWithCricketr

1. Get team data

1a. Test

The teams in Test cricket are included below

  1. Afghanistan 2.Bangladesh 3. England 4. World 5. India 6. Ireland 7. New Zealand 8. Pakistan 9. South Africa 10.Sri Lanka 11. West Indies 12.Zimbabwe

You can use this for the teamName paramater. This will return a dataframe and also save the file as a CSV , if save=TRUE

Note: – Since I have already got the data as CSV files I am not executing the lines below

# Get the data for the teams. Save as CSV
#indiaTest <-getTeamDataHomeAway(dir=".",teamView="bat",matchType="Test",file="indiaTest.csv",save=TRUE,teamName="India")
#australiaTest <- getTeamDataHomeAway(matchType="Test",file="australiaTest.csv",save=TRUE,teamName="Australia")
#pakistanTest <- getTeamDataHomeAway(matchType="Test",file="pakistanTest.csv",save=TRUE,teamName="Pakistan")
#newzealandTest <- getTeamDataHomeAway(matchType="Test",file="newzealandTest.csv",save=TRUE,teamName="New Zealand")

1b. ODI

The ODI teams in the world are below. The data for these teams can be got by names as shown below

  1. Afghanistan 2. Africa XI 3. Asia XI 4.Australia 5.Bangladesh
  2. Bermuda 7. England 8. ICC World X1 9. India 11.Ireland 12. New Zealand
  3. Pakistan 14. South Africa 15. Sri Lanka 17. West Indies 18. Zimbabwe
  4. Canada 21. East Africa 22. Hong Kong 23.Ireland 24. Kenya 25. Namibia
  5. Nepal 27.Netherlands 28. Oman 29.Papua New Guinea 30. Scotland
  6. United Arab Emirates 32. United States of America
#indiaODI <- getTeamDataHomeAway(matchType="ODI",file="indiaODI.csv",save=TRUE,teamName="India")
#englandODI <- getTeamDataHomeAway(matchType="ODI",file="englandODI.csv",save=TRUE,teamName="England")
#westindiesODI <- getTeamDataHomeAway(matchType="ODI",file="westindiesODI.csv",save=TRUE,teamName="West Indies")
#irelandODI <- getTeamDataHomeAway(matchType="ODI",file="irelandODI.csv",save=TRUE,teamName="Ireland")

1c T20

The T20 teams in the world are
1.Afghanistan 2. Australia 3. Bahrain 4. Bangladesh 5. Belgium 6. Belize
2.Bermuda 8.Botswana 9. Canada 11. Costa Rica 12. Germany 13. Ghana
14.Guernsey 15. Hong Kong 16. ICC World X1 17.India 18. Ireland 19.Italy
20.Jersey 21. Kenya 22.Kuwait 23.Maldives 24.Malta 25.Mexico 26.Namibia
27.Nepal 28.Netherlands 29. New Zealand 30.Nigeria 31.Oman 32. Pakistan
33.Panama 34.Papua New Guinea 35. Philippines 36.Qatar 37.Saudi Arabia
38.Scotland 39.South Africa 40.Spain 41.Sri Lanka 42.Uganda
43.United Arab Emirates United States of America 44.Vanuatu 45.West Indies

#southafricaT20 <- getTeamDataHomeAway(matchType="T20",file="southafricaT20.csv",save=TRUE,teamName="South Africa")
#srilankaT20 <- getTeamDataHomeAway(matchType="T20",file="srilankaT20.csv",save=TRUE,teamName="Sri Lanka")
#canadaT20 <- getTeamDataHomeAway(matchType="T20",file="canadaT20.csv",save=TRUE,teamName="Canada")
#afghanistanT20 <- getTeamDataHomeAway(matchType="T20",file="afghanistanT20.csv",save=TRUE,teamName="Afghanistan")

2 Analysis of Test matches

The functions below perform analysis of Test teams

2a. Wins vs Loss against opposition

This function performs analysis of Test teams against other teams at home/away or neutral venue. Note:- The opposition can be a vector of opposition teams. Similarly homeOrAway can also be a vector of home/away/neutral venues.

# Get the performance of Indian test team against all teams at all venues as a dataframe
df <- teamWinLossStatusVsOpposition("india.csv",teamName="India",opposition=c("all"),homeOrAway=c("all"),matchType="Test",plot=FALSE)
head(df)
## # A tibble: 6 x 4
## # Groups:   Opposition, Result [4]
##   Opposition  Result ha    count
##   <chr>       <chr>  <chr> <int>
## 1 Afghanistan won    home      1
## 2 Australia   draw   away     20
## 3 Australia   draw   home     23
## 4 Australia   lost   away     58
## 5 Australia   lost   home     26
## 6 Australia   tied   home      2
# Plot the performance of Indian Test team  against all teams at all venues
teamWinLossStatusVsOpposition("indiaTest.csv",teamName="India",opposition=c("all"),homeOrAway=c("all"),matchType="Test",plot=TRUE)

# Get the performance of Australia against India, England and New Zealand at all venues in Tests
df <-teamWinLossStatusVsOpposition("australiaTest.csv",teamName="Australia",opposition=c("India","England","New Zealand"),homeOrAway=c("all"),matchType="Test",plot=FALSE)

#Plot the performance of Australia against England, India and New Zealand only at home (Australia) 
teamWinLossStatusVsOpposition("australiaTest.csv",teamName="Australia",opposition=c("India","England","New Zealand"),homeOrAway=c("home"),matchType="Test",plot=TRUE)

If you are passionate about cricket, and love analyzing cricket performances, then check out my racy book on cricket ‘Cricket analytics with cricketr and cricpy – Analytics harmony with R & Python’! This book discusses and shows how to use my R package ‘cricketr’ and my Python package ‘cricpy’ to analyze batsmen and bowlers in all formats of the game (Test, ODI and T20). The paperback is available on Amazon at $21.99 and  the kindle version at $9.99/Rs 449/-. A must read for any cricket lover! Check it out!!

Untitled

 

2b Wins vs losses of Test teams against opposition at different venues

# Get the  performance of Pakistan against India, West Indies, South Africa at all venues in Tests and show performances at the venues
df <- teamWinLossStatusAtGrounds("pakistanTest.csv",teamName="Pakistan",opposition=c("India","West Indies","South Africa"),homeOrAway=c("all"),matchType="Test",plot=FALSE)
head(df)
## # A tibble: 6 x 4
## # Groups:   Ground, Result [6]
##   Ground     Result ha      count
##   <chr>      <chr>  <chr>   <int>
## 1 Abu Dhabi  draw   neutral     2
## 2 Abu Dhabi  won    neutral     4
## 3 Ahmedabad  draw   away        2
## 4 Bahawalpur draw   home        1
## 5 Basseterre won    away        2
## 6 Bengaluru  draw   away        5
# Plot the performance of New Zealand Test team against England, Sri Lanka and Bangladesh at all grounds playes 
teamWinLossStatusAtGrounds("newzealandTest.csv",teamName="New Zealand",opposition=c("England","Sri Lanka","Bangladesh"),homeOrAway=c("all"),matchType="Test",plot=TRUE)

2c. Plot the time line of wins vs losses of Test teams against opposition at different venues during an interval

# Plot the time line of wins/losses of India against Australia, West Indies, South Africa in away/neutral venues
#from 2000-01-01 to 2017-01-01
plotTimelineofWinsLosses("indiaTest.csv",team="India",opposition=c("Australia","West Indies","South Africa"),
                         homeOrAway=c("away","neutral"), startDate="2000-01-01",endDate="2017-01-01")

#Plot the time line of wins/losses of Indian Test team from 1970 onwards
plotTimelineofWinsLosses("indiaTest.csv",team="India",startDate="1970-01-01",endDate="2017-01-01")

3 ODI

The functions below perform analysis of ODI teams listed above

3a. Wins vs Loss against opposition ODI teams

This function performs analysis of ODI teams against other teams at home/away or neutral venue. Note:- The opposition can be a vector of opposition teams. Similarly homeOrAway can also be a vector of home/away/neutral venues.

# Get the performance of West Indies in ODIs against all other ODI teams at all venues and retirn as a dataframe
df <- teamWinLossStatusVsOpposition("westindiesODI.csv",teamName="West Indies",opposition=c("all"),homeOrAway=c("all"),matchType="ODI",plot=FALSE)
head(df)
## # A tibble: 6 x 4
## # Groups:   Opposition, Result [3]
##   Opposition  Result ha      count
##   <chr>       <chr>  <chr>   <int>
## 1 Afghanistan lost   home        1
## 2 Afghanistan lost   neutral     2
## 3 Afghanistan won    home        1
## 4 Australia   lost   away       41
## 5 Australia   lost   home       25
## 6 Australia   lost   neutral     8
# Plot the performance of West Indies in ODIs against Sri Lanka, India at all venues
teamWinLossStatusVsOpposition("westindiesODI.csv",teamName="West Indies",opposition=c("Sri Lanka", "India"),homeOrAway=c("all"),matchType="ODI",plot=TRUE)

 

#Plot the performance of Ireland in ODIs against Zimbabwe, Kenya, bermuda, UAE, Oman and Scotland at all venues
teamWinLossStatusVsOpposition("irelandODI.csv",teamName="Ireland",opposition=c("Zimbabwe","Kenya","Bermuda","U.A.E.","Oman","Scotland"),homeOrAway=c("all"),matchType="ODI",plot=TRUE)

3b Wins vs losses of ODI teams against opposition at different venues

# Plot the performance of England ODI team against Bangladesh, West Indies and Australia at all venues
teamWinLossStatusAtGrounds("englandODI.csv",teamName="England",opposition=c("Bangladesh","West Indies","Australia"),homeOrAway=c("all"),matchType="ODI",plot=TRUE)

#Plot the performance of India against South Africa, West Indies and Australia at 'home' venues
teamWinLossStatusAtGrounds("indiaODI.csv",teamName="India",opposition=c("South Africa","West Indies","Australia"),homeOrAway=c("home"),matchType="ODI",plot=TRUE)

3c. Plot the time line of wins vs losses of ODI teams against opposition at different venues during an interval

#Plot the time line of wins/losses of Bangladesh ODI team between 2015 and 2019 against all other teams and at
# all venues
plotTimelineofWinsLosses("bangladeshOD.csv",team="Bangladesh",startDate="2015-01-01",endDate="2019-01-01",matchType="ODI")

#Plot the time line of wins/losses of India ODI against Sri Lanka, Bangladesh from 2016 to 2019
plotTimelineofWinsLosses("indiaODI.csv",team="India",opposition=c("Sri Lanka","Bangladesh"),startDate="2016-01-01",endDate="2019-01-01",matchType="ODI")

4 Twenty 20

The functions below perform analysis of Twenty 20 teams listed above

4a. Wins vs Loss against opposition ODI teams

This function performs analysis of T20 teams against other T20 teams at home/away or neutral venue. Note:- The opposition can be a vector of opposition teams. Similarly homeOrAway can also be a vector of home/away/neutral venues.

# Get the performance of South Africa T20 team against England, India and Sri Lanka at home grounds at England
df <- teamWinLossStatusVsOpposition("southafricaT20.csv",teamName="South Africa",opposition=c("England","India","Sri Lanka"),homeOrAway=c("home"),matchType="T20",plot=FALSE)

#Plot the performance of South Africa T20 against England, India and Sri Lanka at all venues
teamWinLossStatusVsOpposition("southafricaT20.csv",teamName="South Africa",opposition=c("England","India","Sri Lanka"),homeOrAway=c("all"),matchType="T20",plot=TRUE)

#Plot the performance of Afghanistan T20 teams against all oppositions
teamWinLossStatusVsOpposition("afghanistanT20.csv",teamName="Afghanistan",opposition=c("all"),homeOrAway=c("all"),matchType="T20",plot=TRUE)

 

 

4b Wins vs losses of T20 teams against opposition at different venues

# Compute the performance of Canada against all opposition at all venues and show by grounds. Return as dataframe
df <-teamWinLossStatusAtGrounds("canadaT20.csv",teamName="Canada",opposition=c("all"),homeOrAway=c("all"),matchType="T20",plot=FALSE)
head(df)
## # A tibble: 6 x 4
## # Groups:   Ground, Result [6]
##   Ground        Result ha      count
##   <chr>         <chr>  <chr>   <int>
## 1 Abu Dhabi     lost   neutral     1
## 2 Belfast       lost   neutral     1
## 3 Belfast       won    neutral     2
## 4 Colombo (SSC) lost   neutral     1
## 5 Colombo (SSC) won    neutral     1
## 6 Dubai (DSC)   lost   neutral     5
# Plot the performance of Sri Lanka T20 team against India and Bangladesh in different venues at home/away and neutral
teamWinLossStatusAtGrounds("srilankaT20.csv",teamName="Sri Lanka",opposition=c("India", "Bangladesh"), homeOrAway=c("all"), matchType="T20", plot=TRUE)

4c. Plot the time line of wins vs losses of T20 teams against opposition at different venues during an interval

#Plot the time line of Sri Lanka T20 team agaibst all opposition
plotTimelineofWinsLosses("srilankaT20.csv",team="Sri Lanka",opposition=c("Australia", "Pakistan"), startDate="2013-01-01", endDate="2019-01-01",  matchType="T20")

# Plot the time line of South Africa T20 between 2010 and 2015 against West Indies and Pakistan
plotTimelineofWinsLosses("southafricaT20.csv",team="South Africa",opposition=c("West Indies", "Pakistan"), startDate="2010-01-01", endDate="2015-01-01",  matchType="T20")

My book ‘Cricket analytics with cricketr and cricpy’ is now on Amazon

‘Cricket analytics with cricketr and cricpy – Analytics harmony with R and Python’ is now available on Amazon in both paperback ($21.99) and kindle ($9.99/Rs 449) versions. The book includes analysis of cricketers using both my R package ‘cricketr’ and my python package ‘cricpy’ for all formats of the game namely Test, ODI and T20. Both packages use data from ESPN Cricinfo Statsguru. The paperback is available on Amazon for $21.99 and the kindle version is available for $9.99/Rs 449

Pick up your copy today!

The book includes the following chapters

CONTENTS

Introduction 7
1. Cricket analytics with cricketr 9
1.1. Introducing cricketr! : An R package to analyze performances of cricketers 10
1.2. Taking cricketr for a spin – Part 1 48
1.2. cricketr digs the Ashes! 69
1.3. cricketr plays the ODIs! 97
1.4. cricketr adapts to the Twenty20 International! 139
1.5. Sixer – R package cricketr’s new Shiny avatar 168
1.6. Re-introducing cricketr! : An R package to analyze performances of cricketers 178
1.7. cricketr sizes up legendary All-rounders of yesteryear 233
1.8. cricketr flexes new muscles: The final analysis 277
1.9. The Clash of the Titans in Test and ODI cricket 300
1.10. Analyzing performances of cricketers using cricketr template 338
2. Cricket analytics with cricpy 352
2.1 Introducing cricpy:A python package to analyze performances of cricketers 353
2.2 Cricpy takes a swing at the ODIs 405
Analysis of Top 4 batsman 448
2.3 Cricpy takes guard for the Twenty20s 449
2.4 Analyzing batsmen and bowlers with cricpy template 490
9. Average runs against different opposing teams 493
3. Other cricket posts in R 500
3.1 Analyzing cricket’s batting legends – Through the mirage with R 500
3.2 Mirror, mirror … the best batsman of them all? 527
4. Appendix 541
Cricket analysis with Machine Learning using Octave 541
4.1 Informed choices through Machine Learning – Analyzing Kohli, Tendulkar and Dravid 542
4.2 Informed choices through Machine Learning-2 Pitting together Kumble, Kapil, Chandra 555
Further reading 569
Important Links 570

Also see
1. My book “Deep Learning from first principles” now on Amazon
2. Practical Machine Learning with R and Python – Part 1
3. Revisiting World Bank data analysis with WDI and gVisMotionChart
4. Natural language processing: What would Shakespeare say?
5. Optimal Cloud Computing
6. Pitching yorkpy … short of good length to IPL – Part 1
7. Computer Vision: Ramblings on derivatives, histograms and contours

To see all posts click Index of posts

Analyzing performances of cricketers using cricketr template

This post includes a template which you can use for analyzing the performances of cricketers, both batsmen and bowlers in Test, ODI and Twenty 20 cricket using my R package cricketr. To see actual usage of functions in the R package cricketr see Introducing cricketr! : An R package to analyze performances of cricketers.

This template can be downloaded from Github at cricketer-template

The ‘cricketr’ package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package supports all formats of the game including Test, ODI and Twenty20 versions.

You should be able to install the package from GitHub and use the many functions available in the package. Please mindful of the ESPN Cricinfo Terms of Use

Take a look at my short video tutorial on my R package cricketr on Youtube – R package cricketr – A short tutorial

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

Important note 1: The latest release of ‘cricketr’ now includes the ability to analyze performances of teams now!!  See Cricketr adds team analytics to its repertoire!!!

Important note 2 : Cricketr can now do a more fine-grained analysis of players, see Cricketr learns new tricks : Performs fine-grained analysis of players

Important note 3: Do check out the python avatar of cricketr, ‘cricpy’ in my post ‘Introducing cricpy:A python package to analyze performances of cricketers

The cricketr package

The cricketr package has several functions that perform several different analyses on both batsman and bowlers. The package has function that plot percentage frequency runs or wickets, runs likelihood for a batsman, relative run/strike rates of batsman and relative performance/economy rate for bowlers are available.

Other interesting functions include batting performance moving average, forecast and a function to check whether the batsmans in in-form or out-of-form.

The data for a particular player can be obtained with the getPlayerData() function. To do you will need to go to ESPN CricInfo Player and type in the name of the player for e.g Ricky Ponting, Sachin Tendulkar etc. This will bring up a page which have the profile number for the player e.g. for Sachin Tendulkar this would be http://www.espncricinfo.com/india/content/player/35320.html. Hence, Sachin’s profile is 35320. This can be used to get the data for Tendulkar as shown below

The cricketr package is now available from CRAN!!! You should be able to install directly with

1. Install the cricketr package

if (!require("cricketr")){
    install.packages("cricketr",lib = "c:/test")
}
library(cricketr)

The cricketr package includes some pre-packaged sample (.csv) files. You can use these sample to test functions as shown below

# Retrieve the file path of a data file installed with cricketr
#pathToFile <- system.file("data", "tendulkar.csv", package = "cricketr")
#batsman4s(pathToFile, "Sachin Tendulkar")

# The general format is pkg-function(pathToFile,par1,...)
#batsman4s(<path-To-File>,"Sachin Tendulkar")

“` The pre-packaged files can be accessed as shown above. To get the data of any player use the function in Test, ODI and Twenty20 use the following

2. For Test cricket

#tendulkar <- getPlayerData(35320,dir="..",file="tendulkar.csv",type="batting",homeOrAway=c(1,2), result=c(1,2,4))

2a. For ODI cricket

#tendulkarOD <- getPlayerDataOD(35320,dir="..",file="tendulkarOD.csv",type="batting")

2b For Twenty 20 cricket

#tendulkarT20 <- getPlayerDataTT(35320,dir="..",file="tendulkarT20.csv",type="batting")

Analysis of batsmen

Important Note This needs to be done only once for a player. This function stores the player’s data in a CSV file (for e.g. tendulkar.csv as above) which can then be reused for all other functions. Once we have the data for the players many analyses can be done. This post will use the stored CSV file obtained with a prior getPlayerData for all subsequent analyses

Sachin Tendulkar’s performance – Basic Analyses

The 3 plots below provide the following for Tendulkar

  1. Frequency percentage of runs in each run range over the whole career
  2. Mean Strike Rate for runs scored in the given range
  3. A histogram of runs frequency percentages in runs ranges For example

3. Basic analyses

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#batsmanRunsFreqPerf("./tendulkar.csv","Tendulkar")
#batsmanMeanStrikeRate("./tendulkar.csv","Tendulkar")
#batsmanRunsRanges("./tendulkar.csv","Tendulkar")
dev.off()
## null device 
##           1
  1. Player 1
  2. Player 2
  3. Player 3
  4. Player 4

4. More analyses

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#batsman4s("./player1.csv","Player1")
#batsman6s("./player1.csv","Player1")
#batsmanMeanStrikeRate("./player1.csv","Player1")

# For ODI and T20
#batsmanScoringRateODTT("./player1.csv","Player1")
dev.off()
## null device 
##           1
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#batsman4s("./player2.csv","Player2")
#batsman6s("./player2.csv","Player2")
#batsmanMeanStrikeRate("./player2.csv","Player2")
# For ODI and T20
#batsmanScoringRateODTT("./player1.csv","Player1")
dev.off()
## null device 
##           1
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#batsman4s("./player3.csv","Player3")
#batsman6s("./player3.csv","Player3")
#batsmanMeanStrikeRate("./player3.csv","Player3")
# For ODI and T20
#batsmanScoringRateODTT("./player1.csv","Player1")

dev.off()
## null device 
##           1
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#batsman4s("./player4.csv","Player4")
#batsman6s("./player4.csv","Player4")
#batsmanMeanStrikeRate("./player4.csv","Player4")
# For ODI and T20
#batsmanScoringRateODTT("./player1.csv","Player1")
dev.off()
## null device 
##           1

Note: For mean strike rate in ODI and Twenty20 use the function batsmanScoringRateODTT()

5.Boxplot histogram plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency

#batsmanPerfBoxHist("./player1.csv","Player1")
#batsmanPerfBoxHist("./player2.csv","Player2")
#batsmanPerfBoxHist("./player3.csv","Player3")
#batsmanPerfBoxHist("./player4.csv","Player4")

6. Contribution to won and lost matches

For the 2 functions below you will have to use the getPlayerDataSp() function. I have commented this as I already have these files. This function can only be used for Test matches

#player1sp <- getPlayerDataSp(xxxx,tdir=".",tfile="player1sp.csv",ttype="batting")
#player2sp <- getPlayerDataSp(xxxx,tdir=".",tfile="player2sp.csv",ttype="batting")
#player3sp <- getPlayerDataSp(xxxx,tdir=".",tfile="player3sp.csv",ttype="batting")
#player4sp <- getPlayerDataSp(xxxx,tdir=".",tfile="player4sp.csv",ttype="batting")
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanContributionWonLost("player1sp.csv","Player1")
#batsmanContributionWonLost("player2sp.csv","Player2")
#batsmanContributionWonLost("player3sp.csv","Player3")
#batsmanContributionWonLost("player4sp.csv","Player4")
dev.off()
## null device 
##           1

7, Performance at home and overseas

This function also requires the use of getPlayerDataSp() as shown above. This can only be used for Test matches

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanPerfHomeAway("player1sp.csv","Player1")
#batsmanPerfHomeAway("player2sp.csv","Player2")
#batsmanPerfHomeAway("player3sp.csv","Player3")
#batsmanPerfHomeAway("player4sp.csv","Player4")
dev.off()
## null device 
##           1

8. Batsman average at different venues

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanAvgRunsGround("./player1.csv","Player1")
#batsmanAvgRunsGround("./player2.csv","Player2")
#batsmanAvgRunsGround("./player3.csv","Ponting")
#batsmanAvgRunsGround("./player4.csv","Player4")
dev.off()
## null device 
##           1

9. Batsman average against different opposition

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanAvgRunsOpposition("./player1.csv","Player1")
#batsmanAvgRunsOpposition("./player2.csv","Player2")
#batsmanAvgRunsOpposition("./player3.csv","Ponting")
#batsmanAvgRunsOpposition("./player4.csv","Player4")
dev.off()
## null device 
##           1

10. Runs Likelihood of batsman

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanRunsLikelihood("./player1.csv","Player1")
#batsmanRunsLikelihood("./player2.csv","Player2")
#batsmanRunsLikelihood("./player3.csv","Ponting")
#batsmanRunsLikelihood("./player4.csv","Player4")
dev.off()
## null device 
##           1

11. Moving Average of runs in career

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanMovingAverage("./player1.csv","Player1")
#batsmanMovingAverage("./player2.csv","Player2")
#batsmanMovingAverage("./player3.csv","Ponting")
#batsmanMovingAverage("./player4.csv","Player4")
dev.off()
## null device 
##           1

12. Cumulative Average runs of batsman in career

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanCumulativeAverageRuns("./player1.csv","Player1")
#batsmanCumulativeAverageRuns("./player2.csv","Player2")
#batsmanCumulativeAverageRuns("./player3.csv","Ponting")
#batsmanCumulativeAverageRuns("./player4.csv","Player4")
dev.off()
## null device 
##           1

13. Cumulative Average strike rate of batsman in career

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanCumulativeStrikeRate("./player1.csv","Player1")
#batsmanCumulativeStrikeRate("./player2.csv","Player2")
#batsmanCumulativeStrikeRate("./player3.csv","Ponting")
#batsmanCumulativeStrikeRate("./player4.csv","Player4")
dev.off()
## null device 
##           1

14. Future Runs forecast

Here are plots that forecast how the batsman will perform in future. In this case 90% of the career runs trend is uses as the training set. the remaining 10% is the test set.

A Holt-Winters forecating model is used to forecast future performance based on the 90% training set. The forecated runs trend is plotted. The test set is also plotted to see how close the forecast and the actual matches

Take a look at the runs forecasted for the batsman below.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
#batsmanPerfForecast("./player1.csv","Player1")
#batsmanPerfForecast("./player2.csv","Player2")
#batsmanPerfForecast("./player3.csv","Player3")
#batsmanPerfForecast("./player4.csv","Player4")
dev.off()
## null device 
##           1

15. Relative Mean Strike Rate plot

The plot below compares the Mean Strike Rate of the batsman for each of the runs ranges of 10 and plots them. The plot indicate the following

frames <- list("./player1.csv","./player2.csv","player3.csv","player4.csv")
names <- list("Player1","Player2","Player3","Player4")
#relativeBatsmanSR(frames,names)

16. Relative Runs Frequency plot

The plot below gives the relative Runs Frequency Percetages for each 10 run bucket. The plot below show

frames <- list("./player1.csv","./player2.csv","player3.csv","player4.csv")
names <- list("Player1","Player2","Player3","Player4")
#relativeRunsFreqPerf(frames,names)

17. Relative cumulative average runs in career

frames <- list("./player1.csv","./player2.csv","player3.csv","player4.csv")
names <- list("Player1","Player2","Player3","Player4")
#relativeBatsmanCumulativeAvgRuns(frames,names)

18. Relative cumulative average strike rate in career

frames <- list("./player1.csv","./player2.csv","player3.csv","player4.csv")
names <- list("Player1","Player2","Player3","player4")
#relativeBatsmanCumulativeStrikeRate(frames,names)

19. Check Batsman In-Form or Out-of-Form

The below computation uses Null Hypothesis testing and p-value to determine if the batsman is in-form or out-of-form. For this 90% of the career runs is chosen as the population and the mean computed. The last 10% is chosen to be the sample set and the sample Mean and the sample Standard Deviation are caculated.

The Null Hypothesis (H0) assumes that the batsman continues to stay in-form where the sample mean is within 95% confidence interval of population mean The Alternative (Ha) assumes that the batsman is out of form the sample mean is beyond the 95% confidence interval of the population mean.

A significance value of 0.05 is chosen and p-value us computed If p-value >= .05 – Batsman In-Form If p-value < 0.05 – Batsman Out-of-Form

Note Ideally the p-value should be done for a population that follows the Normal Distribution. But the runs population is usually left skewed. So some correction may be needed. I will revisit this later

This is done for the Top 4 batsman

#checkBatsmanInForm("./player1.csv","Player1")
#checkBatsmanInForm("./player2.csv","Player2")
#checkBatsmanInForm("./player3.csv","Player3")
#checkBatsmanInForm("./player4.csv","Player4")

20. 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
#battingPerf3d("./player1.csv","Player1")
#battingPerf3d("./player2.csv","Player2")
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
#battingPerf3d("./player3.csv","Player3")
#battingPerf3d("./player4.csv","player4")
dev.off()
## null device 
##           1

21. Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.

BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
#Player1 <- batsmanRunsPredict("./player1.csv","Player1",newdataframe=newDF)
#Player2 <- batsmanRunsPredict("./player2.csv","Player2",newdataframe=newDF)
#ponting <- batsmanRunsPredict("./player3.csv","Player3",newdataframe=newDF)
#sangakkara <- batsmanRunsPredict("./player4.csv","Player4",newdataframe=newDF)
#batsmen <-cbind(round(Player1$Runs),round(Player2$Runs),round(Player3$Runs),round(Player4$Runs))
#colnames(batsmen) <- c("Player1","Player2","Player3","Player4")
#newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
#colnames(newDF) <- c("BallsFaced","MinsAtCrease")
#predictedRuns <- cbind(newDF,batsmen)
#predictedRuns

Analysis of bowlers

  1. Bowler1
  2. Bowler2
  3. Bowler3
  4. Bowler4

player1 <- getPlayerData(xxxx,dir=“..”,file=“player1.csv”,type=“bowling”) Note For One day you will have to use getPlayerDataOD() and for Twenty20 it is getPlayerDataTT()

21. Wicket Frequency Plot

This plot below computes the percentage frequency of number of wickets taken for e.g 1 wicket x%, 2 wickets y% etc and plots them as a continuous line

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerWktsFreqPercent("./bowler1.csv","Bowler1")
#bowlerWktsFreqPercent("./bowler2.csv","Bowler2")
#bowlerWktsFreqPercent("./bowler3.csv","Bowler3")
dev.off()
## null device 
##           1

22. Wickets Runs plot

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerWktsRunsPlot("./bowler1.csv","Bowler1")
#bowlerWktsRunsPlot("./bowler2.csv","Bowler2")
#bowlerWktsRunsPlot("./bowler3.csv","Bowler3")
dev.off()
## null device 
##           1

23. Average wickets at different venues

#bowlerAvgWktsGround("./bowler3.csv","Bowler3")

24. Average wickets against different opposition

#bowlerAvgWktsOpposition("./bowler3.csv","Bowler3")

25. Wickets taken moving average

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerMovingAverage("./bowler1.csv","Bowler1")
#bowlerMovingAverage("./bowler2.csv","Bowler2")
#bowlerMovingAverage("./bowler3.csv","Bowler3")

dev.off()
## null device 
##           1

26. Cumulative Wickets taken

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerCumulativeAvgWickets("./bowler1.csv","Bowler1")
#bowlerCumulativeAvgWickets("./bowler2.csv","Bowler2")
#bowlerCumulativeAvgWickets("./bowler3.csv","Bowler3")
dev.off()
## null device 
##           1

27. Cumulative Economy rate

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerCumulativeAvgEconRate("./bowler1.csv","Bowler1")
#bowlerCumulativeAvgEconRate("./bowler2.csv","Bowler2")
#bowlerCumulativeAvgEconRate("./bowler3.csv","Bowler3")
dev.off()
## null device 
##           1

28. Future Wickets forecast

Here are plots that forecast how the bowler will perform in future. In this case 90% of the career wickets trend is used as the training set. the remaining 10% is the test set.

A Holt-Winters forecating model is used to forecast future performance based on the 90% training set. The forecated wickets trend is plotted. The test set is also plotted to see how close the forecast and the actual matches

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerPerfForecast("./bowler1.csv","Bowler1")
#bowlerPerfForecast("./bowler2.csv","Bowler2")
#bowlerPerfForecast("./bowler3.csv","Bowler3")
dev.off()
## null device 
##           1

29. Contribution to matches won and lost

As discussed above the next 2 charts require the use of getPlayerDataSp(). This can only be done for Test matches

#bowler1sp <- getPlayerDataSp(xxxx,tdir=".",tfile="bowler1sp.csv",ttype="bowling")
#bowler2sp <- getPlayerDataSp(xxxx,tdir=".",tfile="bowler2sp.csv",ttype="bowling")
#bowler3sp <- getPlayerDataSp(xxxx,tdir=".",tfile="bowler3sp.csv",ttype="bowling")
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerContributionWonLost("bowler1sp","Bowler1")
#bowlerContributionWonLost("bowler2sp","Bowler2")
#bowlerContributionWonLost("bowler3sp","Bowler3")
dev.off()
## null device 
##           1

30. Performance home and overseas.

This can only be done for Test matches

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
#bowlerPerfHomeAway("bowler1sp","Bowler1")
#bowlerPerfHomeAway("bowler2sp","Bowler2")
#bowlerPerfHomeAway("bowler3sp","Bowler3")
dev.off()
## null device 
##           1

31 Relative Wickets Frequency Percentage

frames <- list("./bowler1.csv","./bowler3.csv","bowler2.csv")
names <- list("Bowler1","Bowler3","Bowler2")
#relativeBowlingPerf(frames,names)

32 Relative Economy Rate against wickets taken

frames <- list("./bowler1.csv","./bowler3.csv","bowler2.csv")
names <- list("Bowler1","Bowler3","Bowler2")
#relativeBowlingER(frames,names)

33 Relative cumulative average wickets of bowlers in career

frames <- list("./bowler1.csv","./bowler3.csv","bowler2.csv")
names <- list("Bowler1","Bowler3","Bowler2")
#relativeBowlerCumulativeAvgWickets(frames,names)

34 Relative cumulative average economy rate of bowlers

frames <- list("./bowler1.csv","./bowler3.csv","bowler2.csv")
names <- list("Bowler1","Bowler3","Bowler2")
#relativeBowlerCumulativeAvgEconRate(frames,names)

35 Check for bowler in-form/out-of-form

The below computation uses Null Hypothesis testing and p-value to determine if the bowler is in-form or out-of-form. For this 90% of the career wickets is chosen as the population and the mean computed. The last 10% is chosen to be the sample set and the sample Mean and the sample Standard Deviation are caculated.

The Null Hypothesis (H0) assumes that the bowler continues to stay in-form where the sample mean is within 95% confidence interval of population mean The Alternative (Ha) assumes that the bowler is out of form the sample mean is beyond the 95% confidence interval of the population mean.

A significance value of 0.05 is chosen and p-value us computed If p-value >= .05 – Batsman In-Form If p-value < 0.05 – Batsman Out-of-Form

Note Ideally the p-value should be done for a population that follows the Normal Distribution. But the runs population is usually left skewed. So some correction may be needed. I will revisit this later

Note: The check for the form status of the bowlers indicate

#checkBowlerInForm("./bowler1.csv","Bowler1")
#checkBowlerInForm("./bowler2.csv","Bowler2")
#checkBowlerInForm("./bowler3.csv","Bowler3")
dev.off()
## null device 
##           1

The Clash of the Titans in Test and ODI cricket

Who looks outside, dreams; who looks inside, awakes.
Show me a sane man and I will cure him for you.

            Carl Jung 

 

We’re made of star stuff. We are a way for the cosmos to know itself.
If you want to make an apple pie from scratch, you must first create the universe.

            Carl Sagan

Introduction

The biggest nag in the collective psyche of cricketing fraternity these days, is whether Virat Kohli has surpassed Sachin Tendulkar. This question has been troubling cricket lovers the world over and particularly in India, for quite a while. This nagging question has only grown stronger with Kohli’s 41st ODI century and with Michael Vaughan bestowing the GOAT title to Virat Kohli for ODI cricket. Hence, I decided to do my bit in addressing this, by doing analysis of Kohli’s and Tendulkar’s performance in ODI cricket. I also wanted to address the the best among the cricketing idols of India in Test cricket, namely Sunil Gavaskar, Sachin Tendulkar and Virat Kohli. Hence this post has 2 parts

  1. Analysis of Tendulkar, Gavaskar and Kohli in Test cricket
  2. Analysis of Tendulkar and Kohli in ODIs

In this post, I analyze the performances of these titans in Test and ODI cricket using my R package cricketr. While some may feel that comparisons are not possible as these batsmen are from different eras. To some extent this is true. I would give some leeway to Gavaskar as he had to bat in a pre-helmet era. But with Tendulkar and Kohli a fair and objective comparison is possible. There were pre-eminient bowlers in the times of Tendulkar as there are now.

From the analysis below, it can be seen that Tendulkar is ahead  of everybody else in Test cricket. However it must be noted that Tendulkar’s performance deteriorated towards the end of his career. Such was not the case with Gavaskar. Kohli has some catching up to do and he still has a lot of Test cricket in him.

In ODI Kohli can be seen to pulling ahead of Tendulkar in several aspects.

My R package cricketr can be installed directly from CRAN and you can use it analyze cricketers.

This package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package supports all formats of the game including Test, ODI and Twenty20 versions.

You should be able to install the package from GitHub and use the many functions available in the package. Please mindful of the ESPN Cricinfo Terms of Use

Important note 1: The latest release of ‘cricketr’ now includes the ability to analyze performances of teams now!!  See Cricketr adds team analytics to its repertoire!!!

Important note 2 : Cricketr can now do a more fine-grained analysis of players, see Cricketr learns new tricks : Performs fine-grained analysis of players

Important note 3: Do check out the python avatar of cricketr, ‘cricpy’ in my post ‘Introducing cricpy:A python package to analyze performances of cricketers

Take a look at my short video tutorial on my R package cricketr on Youtube – R package cricketr – A short tutorial

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

Note 1: If you would like to do a similar analysis for a different set of batsman and bowlers, you can clone/download my skeleton cricketr templatefrom Github (which is the R Markdown file I have used for the analysis below).

Note 2: I sprinkle the charts with my observations. Feel free to look at them more closely and come to your conclusions.

If you are passionate about cricket, and love analyzing cricket performances, then check out my racy book on cricket ‘Cricket analytics with cricketr and cricpy – Analytics harmony with R & Python’! This book discusses and shows how to use my R package ‘cricketr’ and my Python package ‘cricpy’ to analyze batsmen and bowlers in all formats of the game (Test, ODI and T20). The paperback is available on Amazon at $21.99 and  the kindle version at $9.99/Rs 449/-. A must read for any cricket lover! Check it out!!

Untitled

Important note: Do check out the python avatar of cricketr, ‘cricpy’ in my post Introducing cricpy:A python package to analyze performances of cricketers

1 Load the cricketr package

if (!require("cricketr")){
    install.packages("cricketr",lib = "c:/test")
}
library(cricketr)

A Test cricket  – Analysis of Gavaskar, Tendulkar and Kohli

2. Get player data

tendulkar <- getPlayerData(35320,dir=".",file="tendulkar.csv",type="batting")
kohli <- getPlayerData(253802,dir=".",file="kohli.csv",type="batting")
gavaskar <- getPlayerData(28794,dir=".",file="gavaskar.csv",type="batting")

3a. Basic analyses for Tendulkar

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./tendulkar.csv","Tendulkar")
batsmanMeanStrikeRate("./tendulkar.csv","Tendulkar")
batsmanRunsRanges("./tendulkar.csv","Tendulkar")
dev.off()

3b Basic analyses for Kohli

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./kohli.csv","Kohli")
batsmanMeanStrikeRate("./kohli.csv","Kohli")
batsmanRunsRanges("./kohli.csv","Kohli")
dev.off()

3c Basic analyses for Gavaskar

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./gavaskar.csv","Gavaskar")
batsmanMeanStrikeRate("./gavaskar.csv","Gavaskar")
batsmanRunsRanges("./gavaskar.csv","Gavaskar")
dev.off()

4a.More analyses for Tendulkar

It can be seen that Tendulkar and Gavaskar has been bowled more often than Kohli. Also Kohli does not have as many sixes in Test cricket as Tendulkar and Gavaskar

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./tendulkar.csv","Tendulkar")
batsman6s("./tendulkar.csv","Tendulkar")
batsmanDismissals("./tendulkar.csv","Tendulkar")
dev.off()

4b. More analyses for Kohli

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./kohli.csv","Kohli")
batsman6s("./kohli.csv","Kohli")
batsmanDismissals("./kohli.csv","Kohli")
dev.off()

4c More analyses for Gavaskar

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./gavaskar.csv","Gavaskar")
batsman6s("./gavaskar.csv","Gavaskar")
batsmanDismissals("./gavaskar.csv","Gavaskar")
dev.off()

5 Performance of batsmen on different grounds

par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./tendulkar.csv","Tendulkar")
batsmanAvgRunsGround("./kohli.csv","Kohli")
batsmanAvgRunsGround("./gavaskar.csv","Gavaskar")

a

#dev.off()

6. Performance if batsmen against different Opposition

  1. Tendulkar averages 50 against the following countries – Australia, Bangladesh, England, Sri Lanka, West Indies and Zimbabwe
  2. Kohli average almost 50 against all the nations he has played – Australia, Bangladesh, England, New Zealand, Sri Lanka and West Indies
  3. Gavaskar averages 50 against Australia, Pakistan, West Indies, Sri Lanka
par(mar=c(4,4,2,2))
batsmanAvgRunsOpposition("./tendulkar.csv","Tendulkar")
batsmanAvgRunsOpposition("./kohli.csv","Kohli")
batsmanAvgRunsOpposition("./gavaskar.csv","Gavaskar")

7. Get player data special

This is required for the next 2 function calls

tendulkarsp <- getPlayerDataSp(35320,tdir=".",tfile="tendulkarsp.csv",ttype="batting")
kohlisp <- getPlayerDataSp(253802,tdir=".",tfile="kohlisp.csv",ttype="batting")
gavaskarsp <- getPlayerDataSp(28794,tdir=".",tfile="gavaskarsp.csv",ttype="batting")

#dev.off()

8 Get contribution of batsmen in matches won and lost

Kohli contribution has had an equal contribution in won and lost matches. Tendulkar’s runs seem to have not helped in winning as much as only 50% of matches he has played have been won

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))

batsmanContributionWonLost("tendulkarsp.csv","Tendulkar")
batsmanContributionWonLost("./kohlisp.csv","Kohli")
batsmanContributionWonLost("./gavaskarsp.csv","Gavaskar")
  

a

9 Performance of batsmen at home and overseas

The boxplots show that Kohli performs better overseas than at home. The 3rd quartile is higher, though the median seems to lower overseas. For Tendulkar the performance is similar both ways. Gavaskar’s median runs scored overseas is higher.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))


batsmanPerfHomeAway("tendulkarsp.csv","Tendulkar")
batsmanPerfHomeAway("./kohlisp.csv","Kohli")
batsmanPerfHomeAway("./gavaskarsp.csv","Gavaskar")

10. Moving average of runs

Gavaskar’s moving average was very good at the time of his retirement. Kohli seems to be going very strong. Tendulkar’s performance shows signs of deterioration around the time of his retirement.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))

batsmanMovingAverage("./tendulkar.csv","Tendulkar")
batsmanMovingAverage("./kohli.csv","Kohli")
batsmanMovingAverage("./gavaskar.csv","Gavaskar")

#dev.off()

11 Boxplot and histogram of runs

Kohli has a marginally higher average (50.69) than Tendulkar (48.65) while Gavaskar 46. The median runs are same for Tendulkar and Kohli at 32

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfBoxHist("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfBoxHist("./kohli.csv","Kohli")
batsmanPerfBoxHist("./gavaskar.csv","Gavaskar")

12 Cumulative average Runs for batsmen

Looking at the cumulative average runs we can see a gradual drop in the cumulative average for Tendulkar while Kohli and Gavaskar’s performance seems to be getting better

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("./tendulkar.csv","Tendulkar")
batsmanCumulativeAverageRuns("./kohli.csv","Kohli")
batsmanCumulativeAverageRuns("./gavaskar.csv","Gavaskar")

13. Cumulative average strike rate of batsmen

Tendulkar’s strike rate is better than Kohli and Gavaskar

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("./tendulkar.csv","Tendulkar")
batsmanCumulativeStrikeRate("./kohli.csv","Kohli")
batsmanCumulativeStrikeRate("./gavaskar.csv","Gavaskar")

14 Performance forecast of batsmen

The forecasted performance for Kohli and Gavaskar is higher than that of Tendulkar

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfForecast("./kohli.csv","Kohli")
batsmanPerfForecast("./gavaskar.csv","Gavaskar")

#dev.off()

15. Relative strike rate of batsmen

par(mar=c(4,4,2,2))

frames <- list("./tendulkar.csv","./kohli.csv","gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeBatsmanSR(frames,names)
#dev.off()

16. Relative Runs frequency of batsmen

par(mar=c(4,4,2,2))
frames <- list("./tendulkar.csv","./kohli.csv","gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeRunsFreqPerf(frames,names)
#dev.off()

17. Relative cumulative average runs of batsmen

Tendulkar leads the way here, but it can be seem Kohli catching up.

par(mar=c(4,4,2,2))
frames <- list("./tendulkar.csv","./kohli.csv","gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeBatsmanCumulativeAvgRuns(frames,names)
#dev.off()

18. Relative cumulative average strike rate

Tendulkar has better strike rate than the other two.

par(mar=c(4,4,2,2))
frames <- list("./tendulkar.csv","./kohli.csv","gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeBatsmanCumulativeStrikeRate(frames,names)
#dev.off()

19. Check batsman in form

As in the moving average and performance forecast and cumulative average runs, Kohli and Gavaskar are in-form while Tendulkar was out-of-form towards the end.

checkBatsmanInForm("./tendulkar.csv","Sachin Tendulkar")
## [1] "**************************** Form status of Sachin Tendulkar ****************************
\n\n Population size: 294  Mean of population: 50.48 \n Sample size: 33  Mean of sample: 32.42 SD of 
sample: 29.8 \n\n Null hypothesis H0 : Sachin Tendulkar 's sample average is within 95% confidence interval 
of population average\n Alternative hypothesis Ha : Sachin Tendulkar 's sample average is below 
the 95% confidence interval of population average\n\n 
Sachin Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713  is less than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("./kohli.csv","Kohli")
## [1] "**************************** Form status of Kohli ****************************\n\n Population size: 117
  Mean of population: 50.35 \n Sample size: 13  Mean of sample: 53.77 SD of sample: 46.15 \n\n Null 
hypothesis H0 : Kohli 's sample average is within 95% confidence interval of population average\n 
Alternative hypothesis Ha : Kohli 's sample average is below the 95% confidence interval of population
 average\n\n Kohli 's Form Status: In-Form because the p value: 0.603244  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("./gavaskar.csv","Gavaskar")
## [1] "**************************** Form status of Gavaskar ****************************\n\n 
Population size: 125  Mean of population: 44.67 \n Sample size: 14  Mean of sample: 57.86 SD of sample:
 58.55 \n\n Null hypothesis H0 : Gavaskar 's sample average is within 95% confidence interval of population
 average\n Alternative hypothesis Ha : Gavaskar 's sample average is below the 95% confidence interval of 
population average\n\n Gavaskar 's Form Status: In-Form because the p value: 0.793276  is greater 
than alpha=  0.05 \n *******************************************************************************************\n\n"
#dev.off()

20. Performance 3D

A 3D regression plane is fitted between the the Balls faced, Minutes at crease and Runs scored

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
battingPerf3d("./tendulkar.csv","Sachin Tendulkar")
battingPerf3d("./kohli.csv","Kohli")
battingPerf3d("./gavaskar.csv","Gavaskar")
#dev.off()

20. Runs likelihood

This functions computes the K-Means and determines the runs the batsmen are likely to score.

par(mar=c(4,4,2,2))
batsmanRunsLikelihood("./tendulkar.csv","Tendulkar")
## Summary of  Tendulkar 's runs scoring likelihood
## **************************************************
## 
## There is a 16.51 % likelihood that Tendulkar  will make  139 Runs in  251 balls over 353  Minutes 
## There is a 25.08 % likelihood that Tendulkar  will make  66 Runs in  122 balls over  167  Minutes 
## There is a 58.41 % likelihood that Tendulkar  will make  16 Runs in  31 balls over 44  Minutes
batsmanRunsLikelihood("./kohli.csv","Kohli")
## Summary of  Kohli 's runs scoring likelihood
## **************************************************
## 
## There is a 20 % likelihood that Kohli  will make  143 Runs in  232 balls over 330  Minutes 
## There is a 33.85 % likelihood that Kohli  will make  51 Runs in  92 balls over  127  Minutes 
## There is a 46.15 % likelihood that Kohli  will make  11 Runs in  24 balls over 31  Minutes
batsmanRunsLikelihood("./gavaskar.csv","Gavaskar")
## Summary of  Gavaskar 's runs scoring likelihood
## **************************************************
## 
## There is a 33.81 % likelihood that Gavaskar  will make  69 Runs in  159 balls over 214  Minutes 
## There is a 8.63 % likelihood that Gavaskar  will make  172 Runs in  364 balls over  506  Minutes 
## There is a 57.55 % likelihood that Gavaskar  will make  13 Runs in  35 balls over 48  Minutes

21. Predict runs for a random combination of Balls faced and runs scored

BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
tendulkar <- batsmanRunsPredict("./tendulkar.csv","Tendulkar",newdataframe=newDF)
kohli <- batsmanRunsPredict("./kohli.csv","Kohli",newdataframe=newDF)
gavaskar <- batsmanRunsPredict("./gavaskar.csv","Gavaskar",newdataframe=newDF)
batsmen <-cbind(round(tendulkar$Runs),round(kohli$Runs),round(gavaskar$Runs))
colnames(batsmen) <- c("Tendulkar","Kohli","Gavaskar")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Tendulkar Kohli Gavaskar
## 1          10           30         7     6        4
## 2          38           71        23    24       17
## 3          66          111        39    42       30
## 4          94          152        54    60       43
## 5         121          193        70    78       56
## 6         149          234        86    96       69
## 7         177          274       102   114       82
## 8         205          315       118   132       95
## 9         233          356       134   150      108
## 10        261          396       150   168      121
## 11        289          437       165   186      134
## 12        316          478       181   204      147
## 13        344          519       197   222      160
## 14        372          559       213   240      173
## 15        400          600       229   258      186
#dev.off()

Key findings

  1. Kohli has a marginally higher average than Tendulkar
  2. Tendulkar has the best strike rate of all the 3.
  3. The cumulative average runs and the performance forecast for Kohli and Gavaskar show an improving trend, while Tendulkar’s numbers deteriorate towards the end of his career
  4. Kohli is fast catching up Tendulkar on cumulative average runs vs innings in career.

B ODI Cricket – Analysis of Tendulkar and Kohli

The functions below get the ODI data for Tendulkar and Kohli as CSV files so that the analyses can be done

22 Get player data for ODIs

tendulkarOD <- getPlayerDataOD(35320,dir=".",file="tendulkarOD.csv",type="batting")
kohliOD <- getPlayerDataOD(253802,dir=".",file="kohliOD.csv",type="batting")

#dev.off()

23a Basic performance of Tendulkar in ODI

par(mfrow=c(3,2))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./tendulkarOD.csv","Tendulkar")
batsmanRunsRanges("./tendulkarOD.csv","Tendulkar")
batsman4s("./tendulkarOD.csv","Tendulkar")
batsman6s("./tendulkarOD.csv","Tendulkar")
batsmanScoringRateODTT("./tendulkarOD.csv","Tendulkar")
#dev.off()

23b. Basic performance of Kohli in ODI

par(mfrow=c(3,2))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./kohliOD.csv","Kohli")
batsmanRunsRanges("./kohliOD.csv","Kohli")
batsman4s("./kohliOD.csv","Kohli")
batsman6s("./kohliOD.csv","Kohli")
batsmanScoringRateODTT("./kohliOD.csv","Kohli")
#dev.off()

24. Performance forecast in ODIs

Kohli’s forecasted runs are much higher than Tendulkar’s in ODIs

par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkarOD.csv","Tendulkar")
batsmanPerfForecast("./kohliOD.csv","Kohli")

25. Batting performance

A 3D regression plane is fitted between Balls faced, Minutes at crease and Runs scored.

par(mar=c(4,4,2,2))
battingPerf3d("./tendulkarOD.csv","Tendulkar")
battingPerf3d("./kohliOD.csv","Kohli")

26. Predicting runs scored for the ODI batsmen

Kohli will score runs than Tendulkar for the same minutes at crease and balls faced.

BF <- seq( 10, 200,length=10)
Mins <- seq(30,220,length=10)
newDF <- data.frame(BF,Mins)
tendulkarDF <- batsmanRunsPredict("./tendulkarOD.csv","Tendulkar",newdataframe=newDF)
kohliDF <- batsmanRunsPredict("./kohliOD.csv","Kohli",newdataframe=newDF)
batsmen <-cbind(round(tendulkarDF$Runs),round(kohliDF$Runs))
colnames(batsmen) <- c("Tendulkar","Kohli")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Tendulkar Kohli
## 1          10           30         7     8
## 2          31           51        26    28
## 3          52           72        45    48
## 4          73           93        64    68
## 5          94          114        83    88
## 6         116          136       102   108
## 7         137          157       121   128
## 8         158          178       140   149
## 9         179          199       159   169
## 10        200          220       178   189

27. Runs likelihood for the ODI batsmen

Tendulkar has clusters around 13, 53 and 111 runs while Kohli has clusters around 13, 63,116. So it more likely that Kohli will tend to score higher

par(mar=c(4,4,2,2))
batsmanRunsLikelihood("./tendulkarOD.csv","Tendulkar")
## Summary of  Tendulkar 's runs scoring likelihood
## **************************************************
## 
## There is a 18.09 % likelihood that Tendulkar  will make  111 Runs in  118 balls over 172  Minutes 
## There is a 28.39 % likelihood that Tendulkar  will make  53 Runs in  63 balls over  95  Minutes 
## There is a 53.52 % likelihood that Tendulkar  will make  13 Runs in  18 balls over 27  Minutes
batsmanRunsLikelihood("./kohliOD.csv","Kohli")
## Summary of  Kohli 's runs scoring likelihood
## **************************************************
## 
## There is a 31.41 % likelihood that Kohli  will make  63 Runs in  69 balls over 97  Minutes 
## There is a 49.74 % likelihood that Kohli  will make  13 Runs in  18 balls over  24  Minutes 
## There is a 18.85 % likelihood that Kohli  will make  116 Runs in  113 balls over 163  Minutes

28. Runs in different venues for the ODI batsmen

par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./tendulkarOD.csv","Tendulkar")
batsmanAvgRunsGround("./kohliOD.csv","Kohli")

28. Runs against different opposition for the ODI batsmen

Tendulkar’s has 50+ average against Bermuda, Kenya and Namibia. While Kohli has a 50+ average against New Zealand, West Indies, South Africa, Zimbabwe and Bangladesh

par(mar=c(4,4,2,2))
batsmanAvgRunsOpposition("./tendulkarOD.csv","Tendulkar")
batsmanAvgRunsOpposition("./kohliOD.csv","Kohli")

29. Moving average of runs for the ODI batsmen

Tendulkar’s moving average shows an improvement (50+) towards the end of his career, but Kohli shows a marked increase 60+ currently

par(mar=c(4,4,2,2))
batsmanMovingAverage("./tendulkarOD.csv","Tendulkar")
batsmanMovingAverage("./kohliOD.csv","Kohli")

30. Cumulative average runs of ODI batsmen

Tendulkar plateaus at 40+ while Kohli’s cumulative average runs goes up and up!!!

par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("./tendulkarOD.csv","Tendulkar")
batsmanCumulativeAverageRuns("./kohliOD.csv","Kohli")

31 Cumulative strike rate of ODI batsmen

par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("./tendulkarOD.csv","Tendulkar")
batsmanCumulativeStrikeRate("./kohliOD.csv","Kohli")

32. Relative batsmen strike rate

par(mar=c(4,4,2,2))

frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeBatsmanSRODTT(frames,names)
#dev.off()

33. Relative Run Frequency percentages

par(mar=c(4,4,2,2))

frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeRunsFreqPerfODTT(frames,names)
#dev.off()

34. Relative cumulative average runs of ODI batsmen

Kohli breaks away from Tendulkar in cumulative average runs after 100 innings

par(mar=c(4,4,2,2))

frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeBatsmanCumulativeAvgRuns(frames,names)
#dev.off()

35. Relative cumulative strike rate of ODI batsmen

This seems to be tussle with Kohli having an edge till about 40 innings and then from 40+ to 180 innings Tendulkar leads. Kohli just seems to be edging forward.

par(mar=c(4,4,2,2))

frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeBatsmanCumulativeStrikeRate(frames,names)
#dev.off()

36. Batsmen 4s and 6s

par(mar=c(4,4,2,2))

frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
batsman4s6s(frames,names)
##                Tendulkar Kohli
## Runs(1s,2s,3s)     66.29 69.67
## 4s                 29.65 25.90
## 6s                  4.06  4.43
#dev.off()

37. Check ODI batsmen form

par(mar=c(4,4,2,2))

checkBatsmanInForm("./tendulkar.csv","Tendulkar")
## [1] "**************************** Form status of Tendulkar ********
********************\n\n Population size: 294  Mean of population: 50.48 \n
 Sample size: 33  Mean of sample: 32.42 SD of sample: 29.8 \n\n 
Null hypothesis H0 : Tendulkar 's sample average is within 95% confidence
 interval of population average\n Alternative hypothesis 
Ha : Tendulkar 's sample average is below the 95% confidence interval 
of population average\n\n Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713  is less than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("./kohli.csv","Kohli")
## [1] "**************************** Form status of Kohli ***********
*****************\n\n Population size: 117  Mean of population: 50.35 \n
 Sample size: 13  Mean of sample: 53.77 SD of sample: 46.15 \n\n 
Null hypothesis H0 : Kohli 's sample average is within 95% confidence 
interval of population average\n Alternative hypothesis 
Ha : Kohli 's sample average is below the 95% confidence interval 
of population average\n\n Kohli 's Form Status: In-Form because 
the p value: 0.603244  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
#dev.off()

Key Findings

  1. Kohli has a better performance against oppositions like West Indies, South Africa and New Zealand
  2. Kohli breaks away from Tendulkar in cumulative average runs
  3. Tendulkar has been leading the strike rate rate but Kohli in recent times seems to be breaking loose.

Check out some other players with my R package cricketr

Important note: Do check out my other posts using cricketr at cricketr-posts

Also see

  1. My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon
  2. A primer on Qubits, Quantum gates and Quantum Operations
  3. De-blurring revisited with Wiener filter using OpenCV
  4. Deep Learning from first principles in Python, R and Octave – Part 4
  5. The Many Faces of Latency
  6. Fun simulation of a Chain in Android
  7. Presentation on Wireless Technologies – Part 1
  8. yorkr crashes the IPL party ! – Part 1

To see all posts click Index of posts

My book ‘Deep Learning from first principles:Second Edition’ now on Amazon

The second edition of my book ‘Deep Learning from first principles:Second Edition- In vectorized Python, R and Octave’, is now available on Amazon, in both paperback ($18.99)  and kindle ($9.99/Rs449/-)  versions. Since this book is almost 70% code, all functions, and code snippets have been formatted to use the fixed-width font ‘Lucida Console’. In addition line numbers have been added to all code snippets. This makes the code more organized and much more readable. I have also fixed typos in the book

Untitled

 

The book includes the following chapters

Table of Contents
Preface 4
Introduction 6
1. Logistic Regression as a Neural Network 8
2. Implementing a simple Neural Network 23
3. Building a L- Layer Deep Learning Network 48
4. Deep Learning network with the Softmax 85
5. MNIST classification with Softmax 103
6. Initialization, regularization in Deep Learning 121
7. Gradient Descent Optimization techniques 167
8. Gradient Check in Deep Learning 197
1. Appendix A 214
2. Appendix 1 – Logistic Regression as a Neural Network 220
3. Appendix 2 - Implementing a simple Neural Network 227
4. Appendix 3 - Building a L- Layer Deep Learning Network 240
5. Appendix 4 - Deep Learning network with the Softmax 259
6. Appendix 5 - MNIST classification with Softmax 269
7. Appendix 6 - Initialization, regularization in Deep Learning 302
8. Appendix 7 - Gradient Descent Optimization techniques 344
9. Appendix 8 – Gradient Check 405
References 475

Also see
1. My book ‘Practical Machine Learning in R and Python: Second edition’ on Amazon
2. The 3rd paperback & kindle editions of my books on Cricket, now on Amazon
3. De-blurring revisited with Wiener filter using OpenCV
4. TWS-4: Gossip protocol: Epidemics and rumors to the rescue
5. A Cloud medley with IBM Bluemix, Cloudant DB and Node.js
6. Practical Machine Learning with R and Python – Part 6
7. GooglyPlus: yorkr analyzes IPL players, teams, matches with plots and tables
8. Fun simulation of a Chain in Android

To see posts click Index of Posts

My book ‘Practical Machine Learning in R and Python: Second edition’ on Amazon

Note: The 3rd edition of this book is now available My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon

The third edition of my book ‘Practical Machine Learning with R and Python – Machine Learning in stereo’ is now available in both paperback ($12.99) and kindle ($9.99/Rs449) versions.  This second edition includes more content,  extensive comments and formatting for better readability.

In this book I implement some of the most common, but important Machine Learning algorithms in R and equivalent Python code.
1. Practical machine with R and Python: Third Edition – Machine Learning in Stereo(Paperback-$12.99)
2. Practical machine with R and Third Edition – Machine Learning in Stereo(Kindle- $9.99/Rs449)

This book is ideal both for beginners and the experts in R and/or Python. Those starting their journey into datascience and ML will find the first 3 chapters useful, as they touch upon the most important programming constructs in R and Python and also deal with equivalent statements in R and Python. Those who are expert in either of the languages, R or Python, will find the equivalent code ideal for brushing up on the other language. And finally,those who are proficient in both languages, can use the R and Python implementations to internalize the ML algorithms better.

Here is a look at the topics covered

Table of Contents
Preface …………………………………………………………………………….4
Introduction ………………………………………………………………………6
1. Essential R ………………………………………………………………… 8
2. Essential Python for Datascience ……………………………………………57
3. R vs Python …………………………………………………………………81
4. Regression of a continuous variable ……………………………………….101
5. Classification and Cross Validation ………………………………………..121
6. Regression techniques and regularization ………………………………….146
7. SVMs, Decision Trees and Validation curves ………………………………191
8. Splines, GAMs, Random Forests and Boosting ……………………………222
9. PCA, K-Means and Hierarchical Clustering ………………………………258
References ……………………………………………………………………..269

Pick up your copy today!!
Hope you have a great time learning as I did while implementing these algorithms!

The 3rd paperback & kindle editions of my books on Cricket, now on Amazon

The 3rd  paperback & kindle edition of both my books on cricket is now available on Amazon

a) Cricket analytics with cricketr, Third Edition. The paperback edition is $12.99 and the kindle edition is $4.99/Rs320.  This book is based on my R package ‘cricketr‘, available on CRAN and uses ESPN Cricinfo Statsguru

b) Beaten by sheer pace! Cricket analytics with yorkr, 3rd edition . The paperback is $12.99 and the kindle version is $6.99/Rs448. This is based on my R package ‘yorkr‘ on CRAN and uses data from Cricsheet
Pick up your copies today!!

Note: In the 3rd edition of  the paperback book, the charts will be in black and white. If you would like the charts to be in color, please check out the 2nd edition of these books see More book, more cricket! 2nd edition of my books now on Amazon

You may also like
1. My book ‘Practical Machine Learning with R and Python’ on Amazon
2. A crime map of India in R: Crimes against women
3.  What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
4.  Bend it like Bluemix, MongoDB with autoscaling – Part 2
5. Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
6. Thinking Web Scale (TWS-3): Map-Reduce – Bring compute to data

To see all posts see Index of posts