# Cricpy takes guard for the Twenty20s

“There are two ways to write error-free programs; only the third one works.” Alan J. Perlis

“Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to produce bigger and better idiots. So far, the universe is winning.” Rick Cook

“My software never has bugs. It just develops random features.” Anon

“If you make an ass out of yourself, there will always be someone to ride you.” Bruce Lee

# Introduction

This is the 3rd and final post on cricpy, and is a continuation of my 2 earlier posts.

Cricpy is the Python avatar of my R package ‘cricketr’. To know more about my R package cricketr, see Re-introducing cricketr! : An R package to analyze performances of cricketers.

With this post, cricpy, like cricketr, becomes omnipotent and can now handle Test, ODI and T20 matches.

Cricpy uses the statistics info available in ESPN Cricinfo Statsguru.

You should be able to install the package using pip install cricpy and use the many functions available in the package. Please be mindful of the ESPN Cricinfo Terms of Use.

This post is also hosted on Rpubs at Cricpy takes guard for the Twenty 20s. You can also download the pdf version of this post at cricpy-TT.pdf.

You can fork/clone the package at Github cricpy

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricpy-template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. The functions can be executed in RStudio or in an IPython notebook.

# The cricpy package

The data for a particular player in Twenty20s can be obtained with the getPlayerDataTT() function. To do this, you will need to go to the T20 Batting or T20 Bowling page and click the player you are interested in. This will bring up a page which has the profile number for the player, e.g. for Virat Kohli this would be http://www.espncricinfo.com/india/content/player/253802.html. Hence Kohli’s profile number is 253802, and this can be used to get the data for Virat Kohli as shown below.
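The profile number is just the numeric part of the player's page URL. As a minimal sketch, assuming the URL keeps the `.../player/<id>.html` form shown above, a hypothetical helper `profile_from_url` (not part of cricpy) can pull it out with a regular expression:

```python
import re


def profile_from_url(url: str) -> int:
    """Return the numeric profile ID from an ESPN Cricinfo player URL.

    Hypothetical helper, not part of cricpy. Assumes the URL ends in
    /player/<digits>.html, as in the Kohli example above.
    """
    match = re.search(r"/player/(\d+)\.html$", url)
    if match is None:
        raise ValueError(f"No player profile ID found in {url!r}")
    return int(match.group(1))


# Kohli's page from the text above
kohli_profile = profile_from_url(
    "http://www.espncricinfo.com/india/content/player/253802.html")
print(kohli_profile)  # 253802
```

The returned integer is what goes into the `profile` argument of `getPlayerDataTT()`.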

The cricpy package is a clone of my R package cricketr. The signatures of all the Python functions are identical to those of its clone ‘cricketr’, with only the necessary variations between Python and R. It may be useful to look at my post R vs Python: Different similarities and similar differences. In fact, if you are familiar with one of the languages, you can look up the package in the other and you will notice the parallel constructs.

Note: The charts are self-explanatory and I have not added much of my own interpretation to them. Do look at the plots closely and check out the performances for yourself.

## 1 Importing cricpy – Python

# Install the package
# Do a pip install cricpy
# Import cricpy
import cricpy.analytics as ca 

## 2. Invoking functions with Python package cricpy

import cricpy.analytics as ca
ca.batsman4s("./kohli.csv","Virat Kohli")

## 3. Getting help from cricpy – Python

import cricpy.analytics as ca
help(ca.getPlayerDataTT)
## Help on function getPlayerDataTT in module cricpy.analytics:
##
## getPlayerDataTT(profile, opposition='', host='', dir='./data', file='player001.csv', type='batting', homeOrAway=[1, 2, 3], result=[1, 2, 3, 5], create=True)
##     Get the Twenty20 International player data from ESPN Cricinfo based on specific inputs and store in a file in a given directory~
##
##     Description
##
##     Get the Twenty20 player data given the profile of the batsman/bowler. The allowed inputs are home,away, neutralboth and won,lost,tied or no result of matches. The data is stored in a <player>.csv file in a directory specified. This function also returns a data frame of the player
##
##     Usage
##
##     getPlayerDataTT(profile, opposition="",host="",dir = "./data", file = "player001.csv",
##     type = "batting", homeOrAway = c(1, 2, 3), result = c(1, 2, 3,5))
##     Arguments
##
##     profile
##     This is the profile number of the player to get data. This can be obtained from http://www.espncricinfo.com/ci/content/player/index.html. Type the name of the player and click search. This will display the details of the player. Make a note of the profile ID. For e.g For Virat Kohli this turns out to be 253802 http://www.espncricinfo.com/india/content/player/35263.html. Hence the profile for Sehwag is 35263
##     opposition
##     The numerical value of the opposition country e.g.Australia,India, England etc. The values are Afghanistan:40,Australia:2,Bangladesh:25,England:1,Hong Kong:19,India:6,Ireland:29, New Zealand:5,Pakistan:7,Scotland:30,South Africa:3,Sri Lanka:8,United Arab Emirates:27, West Indies:4, Zimbabwe:9; Note: If no value is entered for opposition then all teams are considered
##     host
##     The numerical value of the host country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,New Zealand:5, South Africa:3,Sri Lanka:8,United States of America:11,West Indies:4, Zimbabwe:9 Note: If no value is entered for host then all host countries are considered
##     dir
##     Name of the directory to store the player data into. If not specified the data is stored in a default directory "./data". Default="./data"
##     file
##     Name of the file to store the data into for e.g. kohli.csv. This can be used for subsequent functions. Default="player001.csv"
##     type
##     type of data required. This can be "batting" or "bowling"
##     homeOrAway
##     This is vector with either or all 1,2, 3. 1 is for home 2 is for away, 3 is for neutral venue
##     result
##     This is a vector that can take values 1,2,3,5. 1 - won match 2- lost match 3-tied 5- no result
##     Details
##
##     More details can be found in my short video tutorial in Youtube https://www.youtube.com/watch?v=q9uMPFVsXsI
##
##     Value
##
##     Returns the player's dataframe
##
##     Note
##
##     Maintainer: Tinniam V Ganesh <tvganesh.85@gmail.com>
##
##     Author(s)
##
##     Tinniam V Ganesh
##
##     References
##
##     http://www.espncricinfo.com/ci/content/stats/index.html
##
##
##     bowlerWktRateTT getPlayerData
##
##     Examples
##
##     ## Not run:
##     # Only away. Get data only for won and lost innings
##     kohli =getPlayerDataTT(253802,dir="../cricketr/data", file="kohli1.csv",
##     type="batting")
##
##     # Get bowling data and store in file for future
##     ashwin = getPlayerDataTT(26421,dir="../cricketr/data",file="ashwin1.csv",
##     type="bowling")
##
##     kohli =getPlayerDataTT(253802,opposition = 2,host=2,dir="../cricketr/data",
##     file="kohli1.csv",type="batting")

The details below will introduce the different functions that are available in cricpy.

## 4. Get the Twenty20 player data for a player using the function getPlayerDataTT()

Important note: This needs to be done only once for a player. This function stores the player’s data in the specified CSV file (e.g. kohli.csv as above), which can then be reused by all the other functions. Once we have the data for the players, many analyses can be done. This post uses the CSV files obtained with prior calls to getPlayerDataTT for all subsequent analyses.

import cricpy.analytics as ca
#kohli=ca.getPlayerDataTT(253802,dir=".",file="kohli.csv",type="batting")
#guptill=ca.getPlayerDataTT(226492,dir=".",file="guptill.csv",type="batting")
#mccullum=ca.getPlayerDataTT(37737,dir=".",file="mccullum.csv",type="batting")
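Since the data only needs to be fetched once, one way to make a notebook safe to re-run is to check for the CSV before calling the fetch function. The sketch below uses a hypothetical helper, `load_or_fetch` (not part of cricpy), and a stand-in fetch function so that it runs without any network access:

```python
import os
import tempfile

import pandas as pd


def load_or_fetch(path, fetch):
    """Read a cached player CSV, calling fetch() only if the file is missing.

    Hypothetical helper, not part of cricpy. `fetch` is any zero-argument
    callable that writes the CSV at `path`.
    """
    if not os.path.exists(path):
        fetch()  # only reaches out for data on the first run
    return pd.read_csv(path)


# Demo with a stand-in fetch so that no network call is made
calls = []


def fake_fetch(path):
    calls.append(path)
    pd.DataFrame({"Runs": [12, 45, 3]}).to_csv(path, index=False)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kohli.csv")
    first = load_or_fetch(path, lambda: fake_fetch(path))
    second = load_or_fetch(path, lambda: fake_fetch(path))

print(len(calls))  # 1: the second call reused the cached file
```

With cricpy, `fetch` could be something like `lambda: ca.getPlayerDataTT(253802, dir=".", file="kohli.csv", type="batting")`, which, per the help text above, stores the data in the given file.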

Included below are some of the functions that can be used for T20 batsmen and bowlers. For this I have chosen Virat Kohli, ‘the run machine’, who is on track to break many of the Test, ODI and Twenty20 records.

## 5 Virat Kohli’s performance – Basic Analyses

The 3 plots below provide the following for Virat Kohli in T20s:

1. Frequency percentage of runs in each run range over the whole career
2. Mean Strike Rate for runs scored in the given range
3. A histogram of runs frequency percentages in run ranges

import cricpy.analytics as ca
import matplotlib.pyplot as plt
ca.batsmanRunsFreqPerf("./kohli.csv","Virat Kohli")

ca.batsmanMeanStrikeRate("./kohli.csv","Virat Kohli")

ca.batsmanRunsRanges("./kohli.csv","Virat Kohli")
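To make the first plot concrete, here is a minimal pandas sketch of a runs-frequency percentage over 10-run ranges. It is an illustration rather than cricpy's code: the scores are made up and the 10-run bucket width is an assumption.

```python
import pandas as pd

# Made-up innings scores, for illustration only
runs = pd.Series([0, 4, 12, 15, 23, 31, 38, 49, 55, 72, 90, 9])

# 10-run buckets [0, 10), [10, 20), ... [90, 100); right=False keeps 10 in [10, 20)
bins = list(range(0, 110, 10))
ranges = pd.cut(runs, bins=bins, right=False)

# Percentage of innings falling in each run range, in bucket order
freq_pct = ranges.value_counts(normalize=True).sort_index() * 100
print(freq_pct.round(2))
```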

## 6. More analyses

import cricpy.analytics as ca
ca.batsman4s("./kohli.csv","Virat Kohli")

ca.batsman6s("./kohli.csv","Virat Kohli")

ca.batsmanDismissals("./kohli.csv","Virat Kohli")

ca.batsmanScoringRateODTT("./kohli.csv","Virat Kohli")

## 7. 3D scatter plot and prediction plane

The plots below show the 3D scatter plot of Kohli’s Runs versus Balls Faced and Minutes at crease. A linear regression plane is then fitted to Runs as a function of Balls Faced and Minutes at crease.

import cricpy.analytics as ca
ca.battingPerf3d("./kohli.csv","Virat Kohli")
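The regression plane is a least-squares fit of the form Runs = b0 + b1*BF + b2*Mins. A minimal NumPy sketch of such a fit is shown below. It uses synthetic innings generated from known coefficients, so the recovered b1 and b2 can be checked; cricpy's own implementation may use a different library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic innings: balls faced, minutes at crease, runs (with noise)
bf = rng.uniform(5, 120, size=200)
mins = rng.uniform(10, 180, size=200)
runs = 2.0 + 0.9 * bf + 0.2 * mins + rng.normal(0, 3, size=200)

# Design matrix with an intercept column: Runs ~ b0 + b1*BF + b2*Mins
X = np.column_stack([np.ones_like(bf), bf, mins])
coef, *_ = np.linalg.lstsq(X, runs, rcond=None)
b0, b1, b2 = coef
print(f"Runs = {b0:.2f} + {b1:.2f}*BF + {b2:.2f}*Mins")

# A point on the fitted plane, e.g. 50 balls faced in 70 minutes
print(b0 + b1 * 50 + b2 * 70)
```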

## 8. Average runs at different venues

The plot below gives the average runs scored by Kohli at different grounds. The plot also shows the number of innings at each ground as a label on the x-axis.

import cricpy.analytics as ca
ca.batsmanAvgRunsGround("./kohli.csv","Virat Kohli")
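The per-ground averages and innings counts come from a group-by. A pandas sketch, with made-up innings and an assumed `Ground`/`Runs` column layout (not necessarily cricpy's), shows one way to build both the averages and the x-axis labels:

```python
import pandas as pd

# Made-up innings, for illustration only
innings = pd.DataFrame({
    "Ground": ["Mumbai", "Delhi", "Mumbai", "Sydney", "Delhi", "Mumbai"],
    "Runs":   [50, 20, 70, 35, 40, 30],
})

# Average runs and number of innings per ground
summary = innings.groupby("Ground")["Runs"].agg(["mean", "count"])

# x-axis labels of the form "Ground-innings", e.g. "Mumbai-3"
summary["label"] = summary.index.to_series() + "-" + summary["count"].astype(str)
print(summary)
```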

## 9. Average runs against different opposing teams

This plot computes the average runs scored by Kohli against different countries.

import cricpy.analytics as ca
ca.batsmanAvgRunsOpposition("./kohli.csv","Virat Kohli")

## 10. Highest Runs Likelihood

The plot below shows the Runs Likelihood for a batsman. For this, Kohli’s performance is plotted as a 3D scatter plot of Runs versus Balls Faced + Minutes at crease, and K-Means is used to compute the centroids of 3 clusters, which are then plotted. These centroids show Kohli’s most likely run-scoring tendencies.

import cricpy.analytics as ca
ca.batsmanRunsLikelihood("./kohli.csv","Virat Kohli")
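For readers curious about the clustering step, the sketch below is a self-contained NumPy version of K-Means (Lloyd's algorithm, started from mutually distant points) applied to synthetic (BF, Mins, Runs) innings. It is not cricpy's implementation, which may well rely on a library such as scikit-learn; reading each cluster's share of innings as the "likelihood" of that scoring pattern is also an assumption made here for illustration.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """K-Means via Lloyd's algorithm with a farthest-point start.

    Returns (centroids, labels). A teaching sketch, not cricpy's code.
    """
    # Farthest-point start: begin at the first point, then repeatedly add
    # the point farthest from every centroid chosen so far
    centroids = [points[0]]
    for _ in range(1, k):
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[nearest.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels


# Synthetic (BF, Mins, Runs) innings around three centres, for illustration only
rng = np.random.default_rng(1)
centres = [(10, 15, 8), (40, 55, 45), (80, 110, 95)]
points = np.vstack([rng.normal(c, 3, size=(30, 3)) for c in centres])

centroids, labels = kmeans(points, k=3)
share = np.bincount(labels, minlength=3) / len(points) * 100
for (bf, mins, runs), pct in zip(centroids, share):
    print(f"{pct:.1f}% of innings: about {runs:.0f} runs off {bf:.0f} balls in {mins:.0f} minutes")
```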

## 11. A look at the Top 4 batsmen – Kohli, Guptill, Shahzad and McCullum

The following batsmen have been very prolific in Twenty20 cricket and will be used for the analyses

1. Virat Kohli: Runs – 2167, Average – 49.25, Strike rate – 136.11
2. MJ Guptill: Runs – 2271, Average – 34.4, Strike rate – 132.88
3. Mohammad Shahzad: Runs – 1936, Average – 31.22, Strike rate – 134.81
4. BB McCullum: Runs – 2140, Average – 35.66, Strike rate – 136.21

The following plots take a closer look at their performances. The box plots show the median and the 1st and 3rd quartiles of the runs.

## 12. Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency

import cricpy.analytics as ca
ca.batsmanPerfBoxHist("./kohli.csv","Virat Kohli")

ca.batsmanPerfBoxHist("./guptill.csv","M J Guptill")

ca.batsmanPerfBoxHist("./shahzad.csv","M Shahzad")

ca.batsmanPerfBoxHist("./mccullum.csv","BB McCullum")
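The box part of these plots rests on three numbers: the 1st quartile, the median and the 3rd quartile. A small NumPy sketch on made-up scores shows how they, and the interquartile range (IQR), are computed. NumPy's default linear interpolation is assumed, and whisker rules vary between plotting libraries (matplotlib's default reaches up to 1.5 times the IQR beyond the box).

```python
import numpy as np

# Made-up innings scores, for illustration only
runs = np.array([3, 8, 12, 15, 22, 27, 31, 40, 56, 72, 90])

# Quartiles with NumPy's default (linear) interpolation
q1, median, q3 = np.percentile(runs, [25, 50, 75])
iqr = q3 - q1

# Upper limit for a 1.5 * IQR whisker rule
upper_fence = q3 + 1.5 * iqr
print(q1, median, q3, iqr, upper_fence)
```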

## 13 Moving Average of runs in career

Take a look at the Moving Average across the career of the Top 4 Twenty20 batsmen.

import cricpy.analytics as ca
ca.batsmanMovingAverage("./kohli.csv","Virat Kohli")

ca.batsmanMovingAverage("./guptill.csv","M J Guptill")
#ca.batsmanMovingAverage("./shahzad.csv","M Shahzad") # Gives error. Check!

ca.batsmanMovingAverage("./mccullum.csv","BB McCullum")
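A moving average replaces each innings with the mean of a window of recent innings, which smooths out the noise of individual scores. The pandas sketch below uses made-up scores and a window of 4 innings; both the window size and the simple (unweighted) average are assumptions, and cricpy's smoother may differ.

```python
import pandas as pd

# Made-up innings scores in career order, for illustration only
runs = pd.Series([10, 30, 50, 20, 40, 60, 0, 80])

# Mean of each innings and the 3 before it; the first 3 values are NaN
moving_avg = runs.rolling(window=4).mean()
print(moving_avg.tolist())
```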

## 14 Cumulative Average runs of batsman in career

This function provides the cumulative average runs of the batsman over the career. Kohli’s average tops out at around 45 runs at about 43 innings, though there is a downward dip.

import cricpy.analytics as ca
ca.batsmanCumulativeAverageRuns("./kohli.csv","Virat Kohli")

ca.batsmanCumulativeAverageRuns("./guptill.csv","M J Guptill")

ca.batsmanCumulativeAverageRuns("./shahzad.csv","M Shahzad")

ca.batsmanCumulativeAverageRuns("./mccullum.csv","BB McCullum")
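A cumulative average after innings n is simply the mean of the first n scores. The pandas sketch below assumes a per-innings mean, which is an assumption about how the plot is computed: the official batting average divides by dismissals instead, so not-outs would push that figure higher.

```python
import pandas as pd

# Made-up innings scores in career order, for illustration only
runs = pd.Series([10, 30, 50, 20, 40])

# Mean of all innings so far (same as runs.cumsum() divided by 1, 2, 3, ...)
cum_avg = runs.expanding().mean()
print(cum_avg.tolist())
```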

## 15 Cumulative Average strike rate of batsman in career

Kohli, Guptill and McCullum have a cumulative average strike rate of 125+.

import cricpy.analytics as ca
ca.batsmanCumulativeStrikeRate("./kohli.csv","Virat Kohli")

ca.batsmanCumulativeStrikeRate("./guptill.csv","M J Guptill")

ca.batsmanCumulativeStrikeRate("./shahzad.csv","M Shahzad")

ca.batsmanCumulativeStrikeRate("./mccullum.csv","BB McCullum")
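There are two plausible ways to compute a cumulative strike rate, and they can give quite different curves: the career ratio 100 * total runs / total balls, or a running mean of the per-innings strike rates. The sketch below computes both on made-up innings; which of the two cricpy plots is not stated here.

```python
import pandas as pd

# Made-up innings, for illustration only
innings = pd.DataFrame({"Runs": [20, 5, 60, 35], "BF": [16, 10, 40, 20]})

# Career strike rate after each innings: 100 * total runs / total balls faced
innings["CumSR"] = 100 * innings["Runs"].cumsum() / innings["BF"].cumsum()

# Alternative: running mean of the per-innings strike rates
innings["SR"] = 100 * innings["Runs"] / innings["BF"]
innings["CumMeanSR"] = innings["SR"].expanding().mean()
print(innings)
```

The two columns diverge because the running mean gives a short, quick innings the same weight as a long one.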

## 16 Relative Batsman Cumulative Average Runs

The plot below compares the relative cumulative average runs of the batsmen. Kohli is way above the other 3 batsmen, followed by McCullum and then Guptill.

import cricpy.analytics as ca
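frames = ["./kohli.csv","./guptill.csv","./shahzad.csv","./mccullum.csv"]
names = ["Kohli","Guptill","Shahzad","McCullum"]
ca.relativeBatsmanCumulativeAvgRuns(frames,names)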

## 17. Relative Batsman Strike Rate

The plot below compares the relative cumulative strike rates of the batsmen. It shows that Kohli tops the overall strike rate, followed by McCullum and then Guptill.

import cricpy.analytics as ca
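frames = ["./kohli.csv","./guptill.csv","./shahzad.csv","./mccullum.csv"]
names = ["Kohli","Guptill","Shahzad","McCullum"]
ca.relativeBatsmanCumulativeStrikeRate(frames,names)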

## 18. 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A 3D prediction plane is fitted

import cricpy.analytics as ca
ca.battingPerf3d("./kohli.csv","Virat Kohli")

ca.battingPerf3d("./guptill.csv","M J Guptill")

ca.battingPerf3d("./shahzad.csv","M Shahzad")

ca.battingPerf3d("./mccullum.csv","BB McCullum")

## 19. Percentage of 4s and 6s in total runs

Guptill and McCullum have a large percentage of sixes in comparison to fours. Kohli has a relatively lower percentage of 6s.

import cricpy.analytics as ca
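frames = ["./kohli.csv","./guptill.csv","./shahzad.csv","./mccullum.csv"]
names = ["Kohli","Guptill","Shahzad","McCullum"]
ca.batsman4s6s(frames,names)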
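The share of runs coming from boundaries can be computed directly from the 4s and 6s counts. The sketch below assumes columns named `Runs`, `4s` and `6s` and uses made-up innings; it illustrates the arithmetic rather than reproducing cricpy's function.

```python
import pandas as pd

# Made-up innings with boundary counts, for illustration only
innings = pd.DataFrame({
    "Runs": [45, 80, 12],
    "4s":   [4, 6, 1],
    "6s":   [2, 5, 0],
})

total = innings["Runs"].sum()
# Each four is worth 4 runs and each six 6 runs
pct_4s = 100 * 4 * innings["4s"].sum() / total
pct_6s = 100 * 6 * innings["6s"].sum() / total
print(round(pct_4s, 2), round(pct_6s, 2))
```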

## 20. Predicting Runs given Balls Faced and Minutes at Crease

A multivariate linear regression of Runs on Balls Faced and Minutes at crease is fitted, and is then used to predict the runs for the Balls Faced and Minutes values in newDF.

import cricpy.analytics as ca
import numpy as np
import pandas as pd
BF = np.linspace( 10, 400,15)
Mins = np.linspace( 30,600,15)
newDF= pd.DataFrame({'BF':BF,'Mins':Mins})
kohli= ca.batsmanRunsPredict("./kohli.csv",newDF,"Kohli")
print(kohli)
##             BF        Mins        Runs
## 0    10.000000   30.000000   14.753153
## 1    37.857143   70.714286   55.963333
## 2    65.714286  111.428571   97.173513
## 3    93.571429  152.142857  138.383693
## 4   121.428571  192.857143  179.593873
## 5   149.285714  233.571429  220.804053
## 6   177.142857  274.285714  262.014233
## 7   205.000000  315.000000  303.224414
## 8   232.857143  355.714286  344.434594
## 9   260.714286  396.428571  385.644774
## 10  288.571429  437.142857  426.854954
## 11  316.428571  477.857143  468.065134
## 12  344.285714  518.571429  509.275314
## 13  372.142857  559.285714  550.485494
## 14  400.000000  600.000000  591.695674
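Because np.linspace makes BF and Mins advance by constant amounts (390/14 ≈ 27.86 balls and 570/14 ≈ 40.71 minutes per row), a model that is linear in BF and Mins must add the same number of runs at every row. The short check below, which uses only the printed predictions, confirms that the step is a constant 41.21 runs; it is a sanity check on the output, not a re-fit of the model.

```python
import numpy as np

# Predicted Runs copied from the printed output above
pred_runs = np.array([
    14.753153, 55.963333, 97.173513, 138.383693, 179.593873,
    220.804053, 262.014233, 303.224414, 344.434594, 385.644774,
    426.854954, 468.065134, 509.275314, 550.485494, 591.695674,
])

# Row-to-row change in predicted runs
steps = np.diff(pred_runs)
print(steps.min(), steps.max())  # both about 41.21

# Per-row increments of the two inputs in newDF
bf_step = (400 - 10) / 14
mins_step = (600 - 30) / 14
print(bf_step, mins_step)
```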

## 21 Analysis of Top Bowlers

The following 4 bowlers have had an excellent career and will be used for the analysis

1. Shakib Al Hasan: Wickets – 80, Average – 21.07, Economy Rate – 6.74
2. Mohammad Nabi: Wickets – 67, Average – 24.25, Economy Rate – 7.13
3. Rashid Khan: Wickets – 64, Average – 12.40, Economy Rate – 6.01
4. Imran Tahir: Wickets – 62, Average – 14.95, Economy Rate – 6.77

## 22. Get the bowler’s data

The bowlers’ T20 data is obtained with getPlayerDataTT(), in the same way as for the batsmen above. As before, this needs to be done only once per player, after which the stored CSV files can be reused.

import cricpy.analytics as ca
#shakib=ca.getPlayerDataTT(56143,dir=".",file="shakib.csv",type="bowling")
#nabi=ca.getPlayerDataTT(25913,dir=".",file="nabi.csv",type="bowling")
#rashid=ca.getPlayerDataTT(793463,dir=".",file="rashid.csv",type="bowling")
#tahir=ca.getPlayerDataTT(40618,dir=".",file="tahir.csv",type="bowling")

## 23. Wicket Frequency Plot

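The plots below compute the percentage frequency of the number of wickets taken by each of the bowlers (e.g. 1 wicket x%, 2 wickets y%, etc.) and plot them as a continuous line.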

import cricpy.analytics as ca
ca.bowlerWktsFreqPercent("./shakib.csv","Shakib Al Hasan")

ca.bowlerWktsFreqPercent("./nabi.csv","Mohammad Nabi")

ca.bowlerWktsFreqPercent("./rashid.csv","Rashid Khan")

ca.bowlerWktsFreqPercent("./tahir.csv","Imran Tahir")
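The wicket frequency percentage is a normalised count of innings by number of wickets taken. A pandas sketch with made-up wicket counts (not cricpy's code):

```python
import pandas as pd

# Made-up wickets per innings, for illustration only
wickets = pd.Series([0, 1, 2, 1, 3, 2, 1, 0, 4, 2])

# Percentage of innings with 0, 1, 2, ... wickets, in wicket order
freq_pct = wickets.value_counts(normalize=True).sort_index() * 100
print(freq_pct)
```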

## 24. Wickets Runs plot

The plots below show box plots of the runs conceded versus the number of wickets taken, with each box spanning the 1st to the 3rd quartile.

import cricpy.analytics as ca
ca.bowlerWktsRunsPlot("./shakib.csv","Shakib Al Hasan")

ca.bowlerWktsRunsPlot("./nabi.csv","Mohammad Nabi")

ca.bowlerWktsRunsPlot("./rashid.csv","Rashid Khan")

ca.bowlerWktsRunsPlot("./tahir.csv","Imran Tahir")

## 25 Average wickets at different venues

The plots give the average wickets taken by each bowler at different venues.

import cricpy.analytics as ca
ca.bowlerAvgWktsGround("./shakib.csv","Shakib Al Hasan")

ca.bowlerAvgWktsGround("./nabi.csv","Mohammad Nabi")

ca.bowlerAvgWktsGround("./rashid.csv","Rashid Khan")

ca.bowlerAvgWktsGround("./tahir.csv","Imran Tahir")

## 26 Average wickets against different opposition

The plots give the average wickets taken by each bowler against different countries. The x-axis also includes the number of innings against each team.

import cricpy.analytics as ca
ca.bowlerAvgWktsOpposition("./shakib.csv","Shakib Al Hasan")

ca.bowlerAvgWktsOpposition("./nabi.csv","Mohammad Nabi")

ca.bowlerAvgWktsOpposition("./rashid.csv","Rashid Khan")

ca.bowlerAvgWktsOpposition("./tahir.csv","Imran Tahir")

## 27 Wickets taken moving average

The plots below show the moving average of the wickets taken by each bowler over the career.

import cricpy.analytics as ca
ca.bowlerMovingAverage("./shakib.csv","Shakib Al Hasan")

ca.bowlerMovingAverage("./nabi.csv","Mohammad Nabi")

ca.bowlerMovingAverage("./rashid.csv","Rashid Khan")

ca.bowlerMovingAverage("./tahir.csv","Imran Tahir")

## 28 Cumulative average wickets taken

The plots below give the cumulative average wickets taken by the bowlers. Rashid Khan has been the most effective, at about 2.28 wickets per match.

import cricpy.analytics as ca
ca.bowlerCumulativeAvgWickets("./shakib.csv","Shakib Al Hasan")

ca.bowlerCumulativeAvgWickets("./nabi.csv","Mohammad Nabi")

ca.bowlerCumulativeAvgWickets("./rashid.csv","Rashid Khan")

ca.bowlerCumulativeAvgWickets("./tahir.csv","Imran Tahir")

## 29 Cumulative average economy rate

The plots below give the cumulative average economy rate of the bowlers. Rashid Khan has the best economy rate, followed by Mohammad Nabi.

import cricpy.analytics as ca
ca.bowlerCumulativeAvgEconRate("./shakib.csv","Shakib Al Hasan")

ca.bowlerCumulativeAvgEconRate("./nabi.csv","Mohammad Nabi")

ca.bowlerCumulativeAvgEconRate("./rashid.csv","Rashid Khan")

ca.bowlerCumulativeAvgEconRate("./tahir.csv","Imran Tahir")
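One detail worth checking when computing economy rates by hand is the overs notation: in cricket, 3.4 overs means 3 overs and 4 balls (22 balls), not 3.4 overs. Dividing 30 runs by 3.4 would give about 8.82, whereas the correct figure is 30 / (22/6) ≈ 8.18. The sketch below, with made-up spells and an assumed `Overs`/`Runs` column layout, converts overs to balls before computing a cumulative economy rate:

```python
import pandas as pd


def overs_to_balls(overs):
    """Convert cricket overs notation (e.g. 3.4 = 3 overs and 4 balls) to balls."""
    whole = int(overs)
    extra_balls = int(round((overs - whole) * 10))
    return 6 * whole + extra_balls


# Made-up bowling innings, for illustration only
spells = pd.DataFrame({"Overs": [4.0, 3.4, 4.0], "Runs": [24, 30, 18]})
spells["Balls"] = spells["Overs"].map(overs_to_balls)

# Economy rate after each innings: total runs conceded per 6 balls bowled so far
spells["CumEcon"] = 6 * spells["Runs"].cumsum() / spells["Balls"].cumsum()
print(spells)
```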

## 30 Relative cumulative average economy rate of bowlers

The relative cumulative economy rate is given below. It can be seen that Rashid Khan has the best economy rate, followed by Mohammad Nabi and then Imran Tahir.

import cricpy.analytics as ca
frames = ["./shakib.csv","./nabi.csv","./rashid.csv","./tahir.csv"]
names = ["Shakib Al Hasan","Mohammad Nabi","Rashid Khan", "Imran Tahir"]
ca.relativeBowlerCumulativeAvgEconRate(frames,names)

## 31 Relative Economy Rate against wickets taken

Rashid Khan has the best figures for hauls of between 2 and 3.5 wickets. Mohammad Nabi pips Rashid Khan when he takes a haul of 4 wickets.

import cricpy.analytics as ca
frames = ["./shakib.csv","./nabi.csv","./rashid.csv","./tahir.csv"]
names = ["Shakib Al Hasan","Mohammad Nabi","Rashid Khan", "Imran Tahir"]
ca.relativeBowlingER(frames,names)

## 32 Relative cumulative average wickets of bowlers in career

Rashid Khan has the best cumulative average wickets. He is followed by Imran Tahir and then Shakib Al Hasan.

import cricpy.analytics as ca
frames = ["./shakib.csv","./nabi.csv","./rashid.csv","./tahir.csv"]
names = ["Shakib Al Hasan","Mohammad Nabi","Rashid Khan", "Imran Tahir"]
ca.relativeBowlerCumulativeAvgWickets(frames,names)

## 33. Key Findings

The plots above capture some of the capabilities and features of my cricpy package. Feel free to install the package and try it out. Please do keep in mind ESPN Cricinfo’s Terms of Use.

Here are the main findings from the analysis above:

## Analysis of Top 4 batsmen

The analysis of the Top 4 T20 batsmen, Kohli, Guptill, Shahzad and McCullum, shows that:

1. Kohli has the best overall cumulative average runs and towers over everybody else
2. Kohli, Guptill and McCullum have a very good strike rate of around 125+
3. Guptill and McCullum have a larger percentage of sixes as compared to Kohli

## Analysis of Top 4 bowlers

1. Rashid Khan has the best cumulative average wickets, followed by Imran Tahir and then Shakib Al Hasan
2. Rashid Khan is the most economical bowler, followed by Mohammad Nabi

You can fork/clone the package at Github cricpy

## Conclusion

Cricpy now has almost all the functions and functionalities of my R package cricketr. There are still a few more features that need to be added to cricpy. I intend to do this as and when I find time.

Go ahead, take cricpy for a spin! Hope you enjoy the ride!

Watch this space!!!

To see all posts click Index of Posts

# Cricpy takes a swing at the ODIs

“No computer has ever been designed that is ever aware of what it’s doing; but most of the time, we aren’t either.” Marvin Minsky

“The competent programmer is fully aware of the limited size of his own skull. He therefore approaches his task with full humility, and avoids clever tricks like the plague” Edsger Dijkstra

# Introduction

In this post, cricpy, the Python avatar of my R package cricketr, learns some new tricks to be able to handle ODI matches. To know more about my R package cricketr see Re-introducing cricketr! : An R package to analyze performances of cricketers

Cricpy uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package supports Test and ODI cricket.

You should be able to install the package using pip install cricpy and use the many functions available in the package. Please be mindful of the ESPN Cricinfo Terms of Use.

To know how to use cricpy, see Introducing cricpy:A python package to analyze performances of cricketers. I have added 3 new functions for ODIs to the original version of cricpy. The earlier functions work for both Test and ODI matches.

This post is also hosted on Rpubs at Cricpy takes a swing at the ODIs. You can also download the pdf version of this post at cricpy-odi.pdf.

You can fork/clone the package at Github cricpy

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricpy-template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. The functions can be executed in RStudio or in an IPython notebook.

# The cricpy package

The data for a particular player in ODIs can be obtained with the getPlayerDataOD() function. To do this, you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Virat Kohli, Virender Sehwag, Chris Gayle etc. This will bring up a page which has the profile number for the player, e.g. for Virat Kohli this would be http://www.espncricinfo.com/india/content/player/253802.html. Hence, Kohli’s profile is 253802. This can be used to get the data for Virat Kohli as shown below.

The cricpy package is a clone of my R package cricketr. The signatures of all the Python functions are identical to those of its clone ‘cricketr’, with only the necessary variations between Python and R. It may be useful to look at my post R vs Python: Different similarities and similar differences. In fact, if you are familiar with one of the languages, you can look up the package in the other and you will notice the parallel constructs.

Note: The charts are self-explanatory and I have not added much of my own interpretation to them. Do look at the plots closely and check out the performances for yourself.

## 1 Importing cricpy – Python

# Install the package
# Do a pip install cricpy
# Import cricpy
import cricpy.analytics as ca 

## 2. Invoking functions with Python package cricpy

import cricpy.analytics as ca
ca.batsman4s("./kohli.csv","Virat Kohli")


## 3. Getting help from cricpy – Python

import cricpy.analytics as ca
help(ca.getPlayerDataOD)
## Help on function getPlayerDataOD in module cricpy.analytics:
##
## getPlayerDataOD(profile, opposition='', host='', dir='./data', file='player001.csv', type='batting', homeOrAway=[1, 2, 3], result=[1, 2, 3, 5], create=True)
##     Get the One day player data from ESPN Cricinfo based on specific inputs and store in a file in a given directory
##
##     Description
##
##     Get the player data given the profile of the batsman. The allowed inputs are home,away or both and won,lost or draw of matches. The data is stored in a <player>.csv file in a directory specified. This function also returns a data frame of the player
##
##     Usage
##
##     getPlayerDataOD(profile, opposition="",host="",dir = "../", file = "player001.csv",
##     type = "batting", homeOrAway = c(1, 2, 3), result = c(1, 2, 3,5))
##     Arguments
##
##     profile
##     This is the profile number of the player to get data. This can be obtained from http://www.espncricinfo.com/ci/content/player/index.html. Type the name of the player and click search. This will display the details of the player. Make a note of the profile ID. For e.g For Virender Sehwag this turns out to be http://www.espncricinfo.com/india/content/player/35263.html. Hence the profile for Sehwag is 35263
##     opposition      The numerical value of the opposition country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,Bermuda:12, England:1,Hong Kong:19,India:6,Ireland:29, Netherlands:15,New Zealand:5,Pakistan:7,Scotland:30,South Africa:3,Sri Lanka:8,United Arab Emirates:27, West Indies:4, Zimbabwe:9; Africa XI:405 Note: If no value is entered for opposition then all teams are considered
##     host            The numerical value of the host country e.g.Australia,India, England etc. The values are Australia:2,Bangladesh:25,England:1,India:6,Ireland:29,Malaysia:16,New Zealand:5,Pakistan:7, Scotland:30,South Africa:3,Sri Lanka:8,United Arab Emirates:27,West Indies:4, Zimbabwe:9 Note: If no value is entered for host then all host countries are considered
##     dir
##     Name of the directory to store the player data into. If not specified the data is stored in a default directory "../data". Default="../data"
##     file
##     Name of the file to store the data into for e.g. tendulkar.csv. This can be used for subsequent functions. Default="player001.csv"
##     type
##     type of data required. This can be "batting" or "bowling"
##     homeOrAway
##     This is vector with either or all 1,2, 3. 1 is for home 2 is for away, 3 is for neutral venue
##     result
##     This is a vector that can take values 1,2,3,5. 1 - won match 2- lost match 3-tied 5- no result
##     Details
##
##     More details can be found in my short video tutorial in Youtube https://www.youtube.com/watch?v=q9uMPFVsXsI
##
##     Value
##
##     Returns the player's dataframe
##
##     Note
##
##     Maintainer: Tinniam V Ganesh <tvganesh.85@gmail.com>
##
##     Author(s)
##
##     Tinniam V Ganesh
##
##     References
##
##     http://www.espncricinfo.com/ci/content/stats/index.html
##
##
##     getPlayerDataSp getPlayerData
##
##     Examples
##
##
##     ## Not run:
##     # Both home and away. Result = won,lost and drawn
##     sehwag =getPlayerDataOD(35263,dir="../cricketr/data", file="sehwag1.csv",
##     type="batting", homeOrAway=[1,2],result=[1,2,3,4])
##
##     # Only away. Get data only for won and lost innings
##     sehwag = getPlayerDataOD(35263,dir="../cricketr/data", file="sehwag2.csv",
##     type="batting",homeOrAway=[2],result=[1,2])
##
##     # Get bowling data and store in file for future
##     malinga = getPlayerData(49758,dir="../cricketr/data",file="malinga1.csv",
##     type="bowling")
##
##     # Get Dhoni's ODI record in Australia against Australua
##     dhoni = getPlayerDataOD(28081,opposition = 2,host=2,dir=".",
##     file="dhoniVsAusinAusOD",type="batting")
##
##     ## End(Not run)

The details below will introduce the different functions that are available in cricpy.

## 4. Get the ODI player data for a player using the function getPlayerDataOD()

Important note: This needs to be done only once for a player. This function stores the player’s data in the specified CSV file (e.g. kohli.csv as above), which can then be reused by all the other functions. Once we have the data for the players, many analyses can be done. This post uses the CSV files obtained with prior calls to getPlayerDataOD for all subsequent analyses.

import cricpy.analytics as ca
#sehwag=ca.getPlayerDataOD(35263,dir=".",file="sehwag.csv",type="batting")
#kohli=ca.getPlayerDataOD(253802,dir=".",file="kohli.csv",type="batting")
#jayasuriya=ca.getPlayerDataOD(49209,dir=".",file="jayasuriya.csv",type="batting")
#gayle=ca.getPlayerDataOD(51880,dir=".",file="gayle.csv",type="batting")

Included below are some of the functions that can be used for ODI batsmen and bowlers. For this I have chosen Virat Kohli, ‘the run machine’, who is on track to break many of the Test & ODI records.

## 5 Virat Kohli’s performance – Basic Analyses

The 3 plots below provide the following for Virat Kohli:

1. Frequency percentage of runs in each run range over the whole career
2. Mean Strike Rate for runs scored in the given range
3. A histogram of runs frequency percentages in run ranges

import cricpy.analytics as ca
import matplotlib.pyplot as plt
ca.batsmanRunsFreqPerf("./kohli.csv","Virat Kohli")

ca.batsmanMeanStrikeRate("./kohli.csv","Virat Kohli")

ca.batsmanRunsRanges("./kohli.csv","Virat Kohli")

## 6. More analyses

import cricpy.analytics as ca
ca.batsman4s("./kohli.csv","Virat Kohli")

ca.batsman6s("./kohli.csv","Virat Kohli")

ca.batsmanDismissals("./kohli.csv","Virat Kohli")

ca.batsmanScoringRateODTT("./kohli.csv","Virat Kohli")

## 7. 3D scatter plot and prediction plane

The plots below show the 3D scatter plot of Kohli’s Runs versus Balls Faced and Minutes at crease. A linear regression plane is then fitted to Runs as a function of Balls Faced and Minutes at crease.

import cricpy.analytics as ca
ca.battingPerf3d("./kohli.csv","Virat Kohli")

## 8. Average runs at different venues

The plot below gives the average runs scored by Kohli at different grounds. The plot also shows the number of innings at each ground as a label on the x-axis.

import cricpy.analytics as ca
ca.batsmanAvgRunsGround("./kohli.csv","Virat Kohli")

## 9. Average runs against different opposing teams

This plot computes the average runs scored by Kohli against different countries.

import cricpy.analytics as ca
ca.batsmanAvgRunsOpposition("./kohli.csv","Virat Kohli")

## 10. Highest Runs Likelihood

The plot below shows the Runs Likelihood for a batsman. For this, Kohli’s performance is plotted as a 3D scatter plot of Runs versus Balls Faced + Minutes at crease, and K-Means is used to compute the centroids of 3 clusters, which are then plotted. These centroids show Kohli’s most likely run-scoring tendencies.

import cricpy.analytics as ca
ca.batsmanRunsLikelihood("./kohli.csv","Virat Kohli")

## 11. A look at the Top 4 batsmen – Kohli, Jayasuriya, Sehwag and Gayle

The following batsmen have been very prolific in ODI cricket and will be used for the analyses

1. Virat Kohli: Runs – 10232, Average – 59.83, Strike rate – 92.88
2. Sanath Jayasuriya: Runs – 13430, Average – 32.36, Strike rate – 91.2
3. Virender Sehwag: Runs – 8273, Average – 35.05, Strike rate – 104.33
4. Chris Gayle: Runs – 9727, Average – 37.12, Strike rate – 85.82

The following plots take a closer look at their performances. The box plots show the median and the 1st and 3rd quartiles of the runs.

## 12. Box Histogram Plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency

import cricpy.analytics as ca
ca.batsmanPerfBoxHist("./kohli.csv","Virat Kohli")

ca.batsmanPerfBoxHist("./jayasuriya.csv","Sanath jayasuriya")

ca.batsmanPerfBoxHist("./gayle.csv","Chris Gayle")

ca.batsmanPerfBoxHist("./sehwag.csv","Virendar Sehwag")

## 13 Moving Average of runs in career

Take a look at the moving average across the career of the Top 4 (ignore the dip at the end of all plots; I need to check why this is so!). Kohli’s performance has been steadily improving over the years, as has Sehwag’s. Gayle seems to be on the way down.

import cricpy.analytics as ca
ca.batsmanMovingAverage("./kohli.csv","Virat Kohli")

ca.batsmanMovingAverage("./jayasuriya.csv","Sanath jayasuriya")

ca.batsmanMovingAverage("./gayle.csv","Chris Gayle")

ca.batsmanMovingAverage("./sehwag.csv","Virendar Sehwag")
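The exact smoother behind these plots is not shown in this post. As an assumption for illustration only, a plain trailing moving average over a fixed window of innings looks like this:

```python
import numpy as np

def moving_average(runs, window=10):
    """Trailing simple moving average of runs over `window` consecutive innings."""
    runs = np.asarray(runs, dtype=float)
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the whole window fits,
    # so the output has len(runs) - window + 1 values.
    return np.convolve(runs, kernel, mode="valid")

print(moving_average([10, 20, 30, 40, 50, 60], window=3))  # [20. 30. 40. 50.]
```

The window size of 10 innings is a hypothetical default; a larger window smooths more but reacts more slowly to a change in form.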

## 14. Cumulative Average runs of batsman in career

This function plots the cumulative average runs of the batsman over the career. Kohli seems to be getting better with time and reaches a cumulative average of 45+. Sehwag improves with time and reaches around 35+. Chris Gayle drops from 42 to 35.

import cricpy.analytics as ca
ca.batsmanCumulativeAverageRuns("./kohli.csv","Virat Kohli")

ca.batsmanCumulativeAverageRuns("./jayasuriya.csv","Sanath jayasuriya")

ca.batsmanCumulativeAverageRuns("./gayle.csv","Chris Gayle")

ca.batsmanCumulativeAverageRuns("./sehwag.csv","Virendar Sehwag")
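A cumulative average is the running total of runs divided by the number of innings so far. Below is a minimal NumPy sketch, assuming one runs value per innings. Note that it divides by innings rather than dismissals, so not-outs are not treated specially; cricpy's exact convention is not shown in this post.

```python
import numpy as np

def cumulative_average(runs):
    """Average of the first n innings, for n = 1, 2, ..., len(runs)."""
    runs = np.asarray(runs, dtype=float)
    return np.cumsum(runs) / np.arange(1, len(runs) + 1)

print(cumulative_average([40, 0, 80, 20]))  # [40. 20. 40. 35.]
```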

## 15. Cumulative Average strike rate of batsman in career

Sehwag has the best cumulative strike rate, of almost 90. Kohli and Jayasuriya have a cumulative strike rate of 75.

import cricpy.analytics as ca
ca.batsmanCumulativeStrikeRate("./kohli.csv","Virat Kohli")

ca.batsmanCumulativeStrikeRate("./jayasuriya.csv","Sanath jayasuriya")

ca.batsmanCumulativeStrikeRate("./gayle.csv","Chris Gayle")

ca.batsmanCumulativeStrikeRate("./sehwag.csv","Virendar Sehwag")
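One way to compute a cumulative strike rate is total runs over total balls faced so far, times 100. This is a sketch under that assumption. Averaging the per-innings strike rates instead would weight a 10-ball cameo the same as a 60-ball innings, and the post does not say which variant cricpy plots.

```python
import numpy as np

def cumulative_strike_rate(runs, balls):
    """Career strike rate after each innings: 100 * total runs / total balls so far."""
    total_runs = np.cumsum(np.asarray(runs, dtype=float))
    total_balls = np.cumsum(np.asarray(balls, dtype=float))
    return 100 * total_runs / total_balls

# Hypothetical innings: runs scored and balls faced.
print(cumulative_strike_rate([30, 10, 60], [40, 10, 50]))  # [ 75.  80. 100.]
```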

## 16. Relative Batsman Cumulative Average Runs

The plot below compares the relative cumulative average runs of the batsmen. Virat Kohli towers above all the others, followed by Chris Gayle and then Sehwag.

import cricpy.analytics as ca
frames = ["./sehwag.csv","./gayle.csv","./jayasuriya.csv","./kohli.csv"]
names = ["Sehwag","Gayle","Jayasuriya","Kohli"]
ca.relativeBatsmanCumulativeAvgRuns(frames,names)

## 17. Relative Batsman Strike Rate

The plot below compares the relative cumulative strike rates of the batsmen. It shows that Sehwag has the best strike rate, followed by Jayasuriya.

import cricpy.analytics as ca
frames = ["./sehwag.csv","./gayle.csv","./jayasuriya.csv","./kohli.csv"]
names = ["Sehwag","Gayle","Jayasuriya","Kohli"]
ca.relativeBatsmanCumulativeStrikeRate(frames,names)

## 18. 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls Faced and Minutes at Crease, with a fitted 3D prediction plane.

import cricpy.analytics as ca
ca.battingPerf3d("./kohli.csv","Virat Kohli")

ca.battingPerf3d("./jayasuriya.csv","Sanath jayasuriya")

ca.battingPerf3d("./gayle.csv","Chris Gayle")

ca.battingPerf3d("./sehwag.csv","Virendar Sehwag")

## 19. Runs scored by way of 1s, 2s, 3s, 4s and 6s

From the plot below it can be seen that Sehwag has scored more runs by way of 4s than by 1s, 2s or 3s. Gayle and Jayasuriya have a large number of 6s.

import cricpy.analytics as ca
frames = ["./sehwag.csv","./kohli.csv","./gayle.csv","./jayasuriya.csv"]
names = ["Sehwag","Kohli","Gayle","Jayasuriya"]
ca.batsman4s6s(frames,names)
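Given totals of runs, 4s and 6s (the inputs below are hypothetical, and whether cricpy works per innings or per career is not shown here), the split between boundary runs and the rest is simple arithmetic. A minimal sketch:

```python
def run_breakdown(total_runs, fours, sixes):
    """Split runs into runs from 4s, runs from 6s and everything else."""
    runs_4s = 4 * fours
    runs_6s = 6 * sixes
    # Whatever is not counted as a 4 or a 6 (1s, 2s, 3s and the like).
    others = total_runs - runs_4s - runs_6s
    return {"others": others, "4s": runs_4s, "6s": runs_6s}

print(run_breakdown(total_runs=1000, fours=120, sixes=30))
# {'others': 340, '4s': 480, '6s': 180}
```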

## 20. Predicting Runs given Balls Faced and Minutes at Crease

A multivariate regression plane is fitted between Runs and Balls Faced + Minutes at Crease.

import cricpy.analytics as ca
import numpy as np
import pandas as pd
BF = np.linspace( 10, 400,15)
Mins = np.linspace( 30,600,15)
newDF= pd.DataFrame({'BF':BF,'Mins':Mins})
kohli= ca.batsmanRunsPredict("./kohli.csv",newDF,"Kohli")
print(kohli)
##             BF        Mins        Runs
## 0    10.000000   30.000000    6.807407
## 1    37.857143   70.714286   36.034833
## 2    65.714286  111.428571   65.262259
## 3    93.571429  152.142857   94.489686
## 4   121.428571  192.857143  123.717112
## 5   149.285714  233.571429  152.944538
## 6   177.142857  274.285714  182.171965
## 7   205.000000  315.000000  211.399391
## 8   232.857143  355.714286  240.626817
## 9   260.714286  396.428571  269.854244
## 10  288.571429  437.142857  299.081670
## 11  316.428571  477.857143  328.309096
## 12  344.285714  518.571429  357.536523
## 13  372.142857  559.285714  386.763949
## 14  400.000000  600.000000  415.991375

The fitted model is then used to predict the runs that the batsman will score for given values of Balls Faced and Minutes at Crease.
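The fitted plane is ordinary least squares with two predictors. Here is a NumPy sketch of the fit-then-predict idea, assuming the model Runs = b0 + b1*BF + b2*Mins. cricpy may use a library such as scikit-learn or statsmodels instead, and the innings below are synthetic points generated from a known plane, so the recovered coefficients can be checked.

```python
import numpy as np

def fit_plane(bf, mins, runs):
    """Least-squares fit of runs = b0 + b1*BF + b2*Mins; returns [b0, b1, b2]."""
    X = np.column_stack([np.ones(len(bf)), bf, mins])
    coef, *_ = np.linalg.lstsq(X, runs, rcond=None)
    return coef

def predict(coef, bf, mins):
    """Evaluate the fitted plane at new (BF, Mins) points."""
    return coef[0] + coef[1] * np.asarray(bf) + coef[2] * np.asarray(mins)

# Synthetic innings lying exactly on runs = 2 + 0.8*BF + 0.1*Mins (no noise).
bf = np.array([10, 30, 50, 70, 90, 25])
mins = np.array([15, 40, 80, 95, 130, 60])
runs = 2 + 0.8 * bf + 0.1 * mins

coef = fit_plane(bf, mins, runs)
print(np.round(coef, 3))                  # recovers 2, 0.8, 0.1
print(predict(coef, [100], [150]))        # about 97
```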

## 21. Analysis of Top Bowlers

The following 4 bowlers have had excellent careers and will be used for the analysis:

1. Muthiah Muralitharan: Wickets: 534, Average: 23.08, Economy Rate: 3.93
2. Wasim Akram: Wickets: 502, Average: 23.52, Economy Rate: 3.89
3. Shaun Pollock: Wickets: 393, Average: 24.50, Economy Rate: 3.67
4. Javagal Srinath: Wickets: 315, Average: 28.08, Economy Rate: 4.44

How do Muralitharan, Akram, Pollock and Srinath compare with one another with respect to wickets taken and Economy Rate? The next set of plots computes precisely these.

## 22. Get the bowler’s data

The data for each bowler can be obtained with getPlayerDataOD() using type="bowling", as in the calls below (commented out here).

import cricpy.analytics as ca
#akram=ca.getPlayerDataOD(43547,dir=".",file="akram.csv",type="bowling")
#murali=ca.getPlayerDataOD(49636,dir=".",file="murali.csv",type="bowling")
#pollock=ca.getPlayerDataOD(46774,dir=".",file="pollock.csv",type="bowling")
#srinath=ca.getPlayerDataOD(34105,dir=".",file="srinath.csv",type="bowling")

## 23. Wicket Frequency Plot

The plots below show the percentage frequency of the number of wickets taken by each bowler, e.g. 1 wicket x%, 2 wickets y% etc., plotted as a continuous line.

import cricpy.analytics as ca
ca.bowlerWktsFreqPercent("./murali.csv","M Muralitharan")

ca.bowlerWktsFreqPercent("./akram.csv","Wasim Akram")

ca.bowlerWktsFreqPercent("./pollock.csv","Shaun Pollock")

ca.bowlerWktsFreqPercent("./srinath.csv","J Srinath")
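The percentage frequency itself is a normalised count. Here is a small pure-Python sketch, assuming one wickets value per innings and hypothetical numbers; including the 0-wicket innings is also an assumption.

```python
from collections import Counter

def wicket_freq_percent(wickets):
    """Percentage of innings in which a bowler took 0, 1, 2, ... wickets."""
    counts = Counter(wickets)
    n = len(wickets)
    # Counter returns 0 for wicket counts that never occurred.
    return {w: 100 * counts[w] / n for w in range(max(wickets) + 1)}

# Hypothetical wickets per innings.
print(wicket_freq_percent([0, 1, 1, 2, 2, 2, 3, 1, 0, 5]))
# {0: 20.0, 1: 30.0, 2: 30.0, 3: 10.0, 4: 0.0, 5: 10.0}
```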

## 24. Wickets Runs plot

The plots below create a box plot showing the 1st and 3rd quartiles of runs conceded versus the number of wickets taken. Murali’s median runs conceded is around 40, while for Akram, Pollock and Srinath it is around 32+ runs. The spread around the median is larger for these 3 bowlers than for Murali.

import cricpy.analytics as ca
ca.bowlerWktsRunsPlot("./murali.csv","M Muralitharan")

ca.bowlerWktsRunsPlot("./akram.csv","Wasim Akram")

ca.bowlerWktsRunsPlot("./pollock.csv","Shaun Pollock")

ca.bowlerWktsRunsPlot("./srinath.csv","J Srinath")
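Each box summarises the runs conceded in the innings with a given number of wickets. A pandas sketch of those numbers, with hypothetical column names (`Wickets`, `Runs`) and made-up data:

```python
import pandas as pd

# Hypothetical bowling innings: wickets taken and runs conceded.
spells = pd.DataFrame({
    "Wickets": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "Runs":    [30, 45, 60, 20, 35, 50, 25, 40, 55],
})

# One row per wicket count: 1st quartile, median and 3rd quartile of runs conceded.
box_stats = spells.groupby("Wickets")["Runs"].quantile([0.25, 0.5, 0.75]).unstack()
print(box_stats)
```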

## 25. Average wickets at different venues

The plots give the average wickets taken by Muralitharan, Akram, Pollock and Srinath at different venues.

import cricpy.analytics as ca
ca.bowlerAvgWktsGround("./murali.csv","M Muralitharan")

ca.bowlerAvgWktsGround("./akram.csv","Wasim Akram")

ca.bowlerAvgWktsGround("./pollock.csv","Shaun Pollock")

ca.bowlerAvgWktsGround("./srinath.csv","J Srinath")

## 26. Average wickets against different opposition

The plots give the average wickets taken by each bowler, starting with Muralitharan, against different countries. The x-axis labels also include the number of innings against each team.

import cricpy.analytics as ca
ca.bowlerAvgWktsOpposition("./murali.csv","M Muralitharan")

ca.bowlerAvgWktsOpposition("./akram.csv","Wasim Akram")

ca.bowlerAvgWktsOpposition("./pollock.csv","Shaun Pollock")

ca.bowlerAvgWktsOpposition("./srinath.csv","J Srinath")

## 27. Wickets taken moving average

The plots below show the moving average of the wickets taken by each of the 4 bowlers over their careers.

import cricpy.analytics as ca
ca.bowlerMovingAverage("./murali.csv","M Muralitharan")

ca.bowlerMovingAverage("./akram.csv","Wasim Akram")

ca.bowlerMovingAverage("./pollock.csv","Shaun Pollock")

ca.bowlerMovingAverage("./srinath.csv","J Srinath")

## 28. Cumulative average wickets taken

The plots below give the cumulative average wickets taken by the bowlers. Muralitharan has consistently taken wickets at an average of 1.6 wickets per game. Shaun Pollock has an average of 1.5.

import cricpy.analytics as ca
ca.bowlerCumulativeAvgWickets("./murali.csv","M Muralitharan")

ca.bowlerCumulativeAvgWickets("./akram.csv","Wasim Akram")

ca.bowlerCumulativeAvgWickets("./pollock.csv","Shaun Pollock")

ca.bowlerCumulativeAvgWickets("./srinath.csv","J Srinath")

## 29. Cumulative average economy rate

The plots below give the cumulative average economy rate of the bowlers. Pollock is the most economical, followed by Akram and then Murali.

import cricpy.analytics as ca
ca.bowlerCumulativeAvgEconRate("./murali.csv","M Muralitharan")

ca.bowlerCumulativeAvgEconRate("./akram.csv","Wasim Akram")

ca.bowlerCumulativeAvgEconRate("./pollock.csv","Shaun Pollock")

ca.bowlerCumulativeAvgEconRate("./srinath.csv","J Srinath")
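One detail worth handling when computing an economy rate from Cricinfo-style figures is that overs are written as `overs.balls`, so 9.3 means 9 overs and 3 balls (57 balls), not 9.3 overs. The sketch below converts to balls before dividing. It assumes 6-ball overs and hypothetical inputs, and is not taken from cricpy.

```python
import numpy as np

def overs_to_balls(overs):
    """Convert overs notation (e.g. 9.3 = 9 overs and 3 balls) to balls."""
    whole = int(overs)
    # The digit after the point counts balls, not tenths of an over.
    extra = round((overs - whole) * 10)
    return 6 * whole + extra

def cumulative_economy(runs_conceded, overs):
    """Economy rate after each innings: total runs per (6-ball) over so far."""
    balls = np.cumsum([overs_to_balls(o) for o in overs])
    runs = np.cumsum(runs_conceded)
    return 6 * runs / balls

print(overs_to_balls(9.3))                      # 57
print(cumulative_economy([40, 18], [10, 4.3]))  # [4. 4.]
```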

## 30. Relative cumulative average economy rate of bowlers

The relative cumulative economy rate shows that Pollock is the most economical of the 4 bowlers, followed by Akram and then Murali.

import cricpy.analytics as ca
frames = ["./srinath.csv","./akram.csv","./murali.csv","pollock.csv"]
names = ["J Srinath","Wasim Akram","M Muralitharan", "S Pollock"]
ca.relativeBowlerCumulativeAvgEconRate(frames,names)

## 31. Relative Economy Rate against wickets taken

Pollock is the most economical across the number of wickets taken. Murali has the best figures for 4 wickets taken.

import cricpy.analytics as ca
frames = ["./srinath.csv","./akram.csv","./murali.csv","pollock.csv"]
names = ["J Srinath","Wasim Akram","M Muralitharan", "S Pollock"]
ca.relativeBowlingER(frames,names)

## 32. Relative cumulative average wickets of bowlers in career

The plot below compares the relative cumulative average wickets of the 4 bowlers. While the bowlers are neck and neck around 130 innings, Muralitharan is the most consistent and leads the pack after 150 innings in the number of wickets taken.

import cricpy.analytics as ca
frames = ["./srinath.csv","./akram.csv","./murali.csv","pollock.csv"]
names = ["J Srinath","Wasim Akram","M Muralitharan", "S Pollock"]
ca.relativeBowlerCumulativeAvgWickets(frames,names)

# 33. Key Findings

The plots above capture some of the capabilities and features of my cricpy package. Feel free to install the package and try it out. Please do keep in mind ESPN Cricinfo’s Terms of Use.

Here are the main findings from the analysis above:

## Analysis of Top 4 batsmen and bowlers

The analysis of the Top 4 batsmen (Kohli, Jayasuriya, Sehwag and Gayle) and the Top 4 bowlers (Muralitharan, Akram, Pollock and Srinath) shows the following:

1. Kohli is a mean run machine and has been consistently piling on runs. Clearly, records will lie shattered in the days to come for Kohli.
2. Virender Sehwag has the best strike rate of the 4, followed by Jayasuriya and then Kohli.
3. Shaun Pollock is the most economical of the bowlers, followed by Wasim Akram.
4. Muralitharan is the most consistent wicket-taker of the lot.

To see all posts click Index of Posts

# Using Linear Programming (LP) for optimizing bowling change or batting lineup in T20 cricket

In my recent post, My travels through the realms of Data Science, Machine Learning, Deep Learning and (AI), I had recounted my journey in the domains of Data Science, Machine Learning (ML), and more recently Deep Learning (DL), all of which are useful while analyzing data. Of late, I have come to the realization that there are many facets to data, and that to glean insights from data, Data Science, ML and DL alone are not sufficient; one also needs a good handle on linear programming and optimization. My colleague at IBM Research also concurred with this view and told me he had arrived at this conclusion several years ago.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen and bowlers, besides evaluating team & match performances in Tests, ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr and Beaten by sheer pace-Cricket analytics with yorkr. A must read for any cricket lover! Check it out!!

While ML & DL are very useful and interesting for making inferences and predicting outputs from input variables, optimization computes the choice of inputs that maximizes or minimizes the output. So I made a small course correction and started on a course from India’s own NPTEL, Introduction to Linear Programming by Prof G. Srinivasan of IIT Madras (highly recommended!). The lectures are delivered with remarkable clarity by the Prof, and I was just about halfway through the course (each lecture is 50-55 min long) when I decided that I needed to try to formulate and solve some real-world Linear Programming problem. As usual, I turned towards cricket for some appropriate situations, and sure enough it was there in the open. For this LP formulation I take International T20 and IPL, though International ODI will also work equally well. You can download the associated code and data for this from Github at LP-cricket-analysis

In T20 matches the captain has to choose how to rotate bowlers with the aim of restricting the batting side. Conversely, the batsmen need to take advantage of the bowling strength to maximize the runs scored.

Note:

a) A simple and obvious strategy would be: if the ith bowler’s economy rate is less than the economy rate of the jth bowler, i.e. $er_{i} < er_{j}$, then have bowler ‘i’ bowl more overs, as his/her economy rate is better.

b) A better strategy would be to consider the economy rate of each bowler against each batsman. How often have we witnessed bowlers with a great bowling average get thrashed time and again by the same batsman, or a bowler who is generally very poor being very effective against a particular batsman? i.e. $er_{ij} < er_{ik}$, where the jth bowler is more effective than the kth bowler against the ith batsman. This now becomes a linear optimization problem, as we can have several combinations of number of overs × economy rate for different bowlers, and we have to solve this algorithmically to determine the lowest score for the bowling side or the highest score for the batting side.

This post uses the latter approach to optimize bowling change and batting lineup.

Let us take a hypothetical situation. Assume there are 3 bowlers, $bwlr_{1},bwlr_{2},bwlr_{3}$, and 4 batsmen, $bman_{1},bman_{2},bman_{3},bman_{4}$. Let $er_{ij}$ be the Economy Rate of the jth bowler to the ith batsman. Also, if the remaining overs for the bowlers are $o_{1},o_{2},o_{3}$ and the total number of overs left to be bowled is $o_{1}+o_{2}+o_{3} = N$, then the questions are:

a) Given the economy rate of each bowler per batsman, how many overs should each bowler bowl so that the total runs scored by all the batsmen is minimum?

b) Alternatively, if we know the individual strike rate of each batsman against the individual bowlers, how many overs should each batsman face with each bowler so that the total runs scored is maximized?

## 1. LP Formulation for bowling order

Let the economy rate $er_{ij}$ be the Economy Rate of the jth bowler to the ith batsman, and let $o_{ij}$ be the number of overs bowled by the jth bowler to the ith batsman, for m batsmen and n bowlers.

Objective function: Minimize

$er_{11}\cdot o_{11} + er_{12}\cdot o_{12} + \cdots + er_{1n}\cdot o_{1n} + er_{21}\cdot o_{21} + er_{22}\cdot o_{22} + \cdots + er_{2n}\cdot o_{2n} + \cdots + er_{m1}\cdot o_{m1} + \cdots + er_{mn}\cdot o_{mn}$

i.e. $\sum_{i=1}^{m}\sum_{j=1}^{n} er_{ij}\cdot o_{ij}$

Constraints: if $o_{j}$ is the number of overs remaining for the jth bowler, the overs he/she bowls to the m batsmen cannot exceed it

$o_{1j} + o_{2j} + \cdots + o_{mj} \le o_{j}$

and if the total number of overs remaining to be bowled is N then

$\sum_{i=1}^{m}\sum_{j=1}^{n} o_{ij} = N$

The overs that any bowler can bowl to any batsman satisfy $o_{ij} \ge 0$.

## 2. LP Formulation for batting lineup

Let the strike rate $sr_{ij}$ be the Strike Rate of the ith batsman to the jth bowler.

Objective function: Maximize

$sr_{11}\cdot o_{11} + sr_{12}\cdot o_{12} + \cdots + sr_{1n}\cdot o_{1n} + sr_{21}\cdot o_{21} + sr_{22}\cdot o_{22} + \cdots + sr_{2n}\cdot o_{2n} + \cdots + sr_{m1}\cdot o_{m1} + \cdots + sr_{mn}\cdot o_{mn}$

i.e. $\sum_{i=1}^{m}\sum_{j=1}^{n} sr_{ij}\cdot o_{ij}$ (with m = 4 batsmen and n = 3 bowlers in the hypothetical above)

Constraints: the same as for the bowling order, i.e. $\sum_{i=1}^{m} o_{ij} \le o_{j}$ for each bowler j, $\sum_{i=1}^{m}\sum_{j=1}^{n} o_{ij} = N$, and $o_{ij} \ge 0$.

lpSolveAPI: For this maximization and minimization problem I used lpSolveAPI. Below I take 2 simple examples (Examples 1 & 2) to ensure that my LP formulation and solution are correct before applying them to real T20 cricket data (Intl. T20 and IPL).

## 3. LP formulation (Example 1)

Initially I created a test example to ensure that I get the LP formulation and solution correct. Here er1=4 and er2=3, and o1 & o2 are the overs bowled by bowlers 1 & 2.
Also o1+o2=4. With each bowler bowling at least 1 over, the possible allocations are:

| o1 | o2 | Obj Fun (=4o1+3o2) |
|----|----|--------------------|
| 1  | 3  | 13 |
| 2  | 2  | 14 |
| 3  | 1  | 15 |

library(lpSolveAPI)
library(dplyr)
library(knitr)
lprec <- make.lp(0, 2)
a <-lp.control(lprec, sense="min")
set.objfn(lprec, c(4, 3))  # Economy Rate of 4 and 3 for er1 and er2
add.constraint(lprec, c(1, 1), "=",4)   # o1 + o2 = 4
add.constraint(lprec, c(1, 0), ">=",1)  # o1 >= 1
add.constraint(lprec, c(0, 1), ">=",1)  # o2 >= 1
lprec
## Model name: 
##             C1    C2         
## Minimize     4     3         
## R1           1     1   =    4
## R2           1     0  >=    1
## R3           0     1  >=    1
## Kind       Std   Std         
## Type      Real  Real         
## Upper      Inf   Inf         
## Lower        0     0         
b <-solve(lprec)
get.objective(lprec) # 13
## [1] 13
get.variables(lprec) # 1 3
## [1] 1 3

Note 1: In the above example 13 runs is the minimum that can be scored, and this requires

LP solution: Minimum runs=13
- o1=1
- o2=3

## 4. LP formulation (Example 2)

In this formulation there are 2 bowlers and 2 batsmen. o11, o12 are the overs bowled by bowler 1 to batsmen 1 & 2, and o21, o22 are the overs bowled by bowler 2 to batsmen 1 & 2, with

er11=4, er12=2, er21=2, er22=5

o11+o12+o21+o22=5

The solution for this, computed manually, is:

| o11 | o12 | o21 | o22 | Runs=(4*o11+2*o12+2*o21+5*o22) |
|-----|-----|-----|-----|------|
| 1 | 1 | 1 | 2 | 18 |
| 1 | 2 | 1 | 1 | 15 |
| 2 | 1 | 1 | 1 | 17 |
| 1 | 1 | 2 | 1 | 15 |

lprec <- make.lp(0, 4)
a <-lp.control(lprec, sense="min")
set.objfn(lprec, c(4, 2,2,5))
add.constraint(lprec, c(1, 1,0,0), "<=",8)
add.constraint(lprec, c(0, 0,1,1), "<=",7)
add.constraint(lprec, c(1, 1,1,1), "=",5)
add.constraint(lprec, c(1, 0,0,0), ">=",1)
add.constraint(lprec, c(0, 1,0,0), ">=",1)
add.constraint(lprec, c(0, 0,1,0), ">=",1)
add.constraint(lprec, c(0, 0,0,1), ">=",1)
lprec
## Model name: 
##             C1    C2    C3    C4         
## Minimize     4     2     2     5         
## R1           1     1     0     0  <=    8
## R2           0     0     1     1  <=    7
## R3           1     1     1     1   =    5
## R4           1     0     0     0  >=    1
## R5           0     1     0     0  >=    1
## R6           0     0     1     0  >=    1
## R7           0     0     0     1  >=    1
## Kind       Std   Std   Std   Std         
## Type      Real  Real  Real  Real         
## Upper      Inf   Inf   Inf   Inf         
## Lower        0     0     0     0         
b<-solve(lprec)
get.objective(lprec)
## [1] 15
get.variables(lprec)
## [1] 1 2 1 1

Note: In the above example 15 runs is the minimum that can be scored, and this requires

LP Solution: Minimum runs=15
- o11=1
- o12=2
- o21=1
- o22=1

It is also possible to set the minimum overs to other values and solve again.

## 5. LP formulation for International T20 India vs Australia (Batting lineup)

To analyze batting and bowling lineups in the cricket world I needed to get the ball-by-ball details of runs scored by each batsman against each of the bowlers. Fortunately I had already created this with my R package yorkr, which processes yaml data from Cricsheet. So I copied the data of all matches between Australia and India in International T20s. You can download my processed data for International T20 at Inswinger

load("Australia-India-allMatches.RData")
dim(matches)
## [1] 3541   25

The following functions compute the ‘Strike Rate’ of a batsman as $SR=\frac{1}{overs}\sum runsScored$, i.e. runs scored per over. Also, the Economy Rate is computed as $ER=\frac{1}{overs}\sum runsConceded$. Incidentally SR=ER here, since the runs a batsman scores off a bowler are exactly the runs that bowler concedes to him.

# Compute the Strike Rate of the batsman (runs per over)
computeSR <- function(batsman1,bowler1){
    a <- matches %>% filter(batsman==batsman1 & bowler==bowler1)
    a1 <- a %>% summarize(totalRuns=sum(runs),count=n()) %>% mutate(SR=(totalRuns/count)*6)
    a1
}

# Compute the Economy Rate of the bowler (runs per over)
computeER <- function(batsman1,bowler1){
    a <- matches %>% filter(batsman==batsman1 & bowler==bowler1)
    a1 <- a %>% summarize(totalRuns=sum(runs),count=n()) %>% mutate(ER=(totalRuns/count)*6)
    a1
}

Here I compute the Strike Rate of Virat Kohli, Yuvraj Singh and MS Dhoni against Shane Watson, Brett Lee and MA Starc

# Kohli
kohliWatson<- computeSR("V Kohli","SR Watson")
kohliWatson
##   totalRuns count       SR
## 1        45    37 7.297297
kohliLee <- computeSR("V Kohli","B Lee")
kohliLee
##   totalRuns count       SR
## 1        10     7 8.571429
kohliStarc <- computeSR("V Kohli","MA Starc")
kohliStarc
##   totalRuns count       SR
## 1        11     9 7.333333
# Yuvraj
yuvrajWatson<- computeSR("Yuvraj Singh","SR Watson")
yuvrajWatson
##   totalRuns count       SR
## 1        24    22 6.545455
yuvrajLee <- computeSR("Yuvraj Singh","B Lee")
yuvrajLee
##   totalRuns count       SR
## 1        12     7 10.28571
yuvrajStarc <- computeSR("Yuvraj Singh","MA Starc")
yuvrajStarc
##   totalRuns count SR
## 1        12     8  9
# MS Dhoni
dhoniWatson<- computeSR("MS Dhoni","SR Watson")
dhoniWatson
##   totalRuns count       SR
## 1        33    28 7.071429
dhoniLee <- computeSR("MS Dhoni","B Lee")
dhoniLee
##   totalRuns count  SR
## 1        26    20 7.8
dhoniStarc <- computeSR("MS Dhoni","MA Starc")
dhoniStarc
##   totalRuns count   SR
## 1        11     8 8.25

When we consider the batting lineup, the problem is one of maximization. In the LP formulation below, V Kohli has an SR of 7.29, 8.57, 7.33 against Watson, Lee & Starc; Yuvraj has an SR of 6.5, 10.28, 9 against Watson, Lee & Starc; and Dhoni has an SR of 7.07, 7.8, 8.25 against Watson, Lee and Starc.

The constraints are that Watson, Lee and Starc have 3, 4 & 3 overs remaining respectively, and the total number of overs remaining to be bowled is 9. The other constraints could be that a bowler bowls at least 1 over, etc.

Formulating and solving

# 3 batsman x 3 bowlers
lprec <- make.lp(0, 9)
# Maximization
a<-lp.control(lprec, sense="max")
# Set the objective function
set.objfn(lprec, c(kohliWatson$SR, kohliLee$SR,kohliStarc$SR,
yuvrajWatson$SR,yuvrajLee$SR,yuvrajStarc$SR, dhoniWatson$SR,dhoniLee$SR,dhoniStarc$SR))

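#Assume the bowlers have 3,4,3 overs left respectively
add.constraint(lprec, c(1,1,1,0,0,0,0,0,0), "<=",3)
add.constraint(lprec, c(0,0,0,1,1,1,0,0,0), "<=",4)
add.constraint(lprec, c(0,0,0,0,0,0,1,1,1), "<=",3)
#o11+o12+o13+o21+o22+o23+o31+o32+o33=9 (overs remaining)
add.constraint(lprec, c(1,1,1,1,1,1,1,1,1), "=",9)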

add.constraint(lprec, c(1,0,0,0,0,0,0,0,0), ">=",1) #o11 >=1
add.constraint(lprec, c(0,1,0,0,0,0,0,0,0), ">=",0) #o12 >=0
add.constraint(lprec, c(0,0,1,0,0,0,0,0,0), ">=",0) #o13 >=0
add.constraint(lprec, c(0,0,0,1,0,0,0,0,0), ">=",1) #o21 >=1
add.constraint(lprec, c(0,0,0,0,1,0,0,0,0), ">=",1) #o22 >=1
add.constraint(lprec, c(0,0,0,0,0,1,0,0,0), ">=",0) #o23 >=0
add.constraint(lprec, c(0,0,0,0,0,0,1,0,0), ">=",1) #o31 >=1
add.constraint(lprec, c(0,0,0,0,0,0,0,1,0), ">=",0) #o32 >=0
add.constraint(lprec, c(0,0,0,0,0,0,0,0,1), ">=",0) #o33 >=0

lprec
## Model name:
##   a linear program with 9 decision variables and 13 constraints
b <-solve(lprec)
get.objective(lprec) #  
## [1] 77.16418
get.variables(lprec) # 
## [1] 1 2 0 1 3 0 1 0 1

This shows that the maximum runs that can be scored at the current strike rates is 77.16 runs in 9 overs. The breakup is as follows:

The same breakup, arranged as overs per batsman and bowler:

e <- as.data.frame(rbind(c(1,2,0,3),c(1,3,0,4),c(1,0,1,2)))
names(e) <- c("S Watson","B Lee","MA Starc","Overs")
rownames(e) <- c("Kohli","Yuvraj","Dhoni")
e

LP Solution:
Maximum runs that can be scored by India against Australia is 77.164 if the 9 overs faced by the batsmen are distributed as below

##        S Watson B Lee MA Starc Overs
## Kohli         1     2        0     3
## Yuvraj        1     3        0     4
## Dhoni         1     0        1     2
#Total overs=9

Note: This assumes that the batsmen perform at their current Strike Rate. However, anything can happen in a real game; nevertheless, this is a fairly reasonable estimate of the performance.

Note 2: The numbers in the columns represent the number of overs that need to be bowled by a bowler to the corresponding batsman.

Note 3: You could try other combinations of overs for the above SRs. For the above constraints, 77.16 is the highest score for the given number of overs.
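For a problem this small, trying other combinations can be automated. The sketch below is a brute-force check in Python, not the lpSolveAPI method used above: it enumerates whole-over allocations only (the LP also allows fractional overs), uses the totalRuns and count figures printed above, and assumes the constraints are caps of 3, 4 and 3 overs on the three groups of variables, the lower bounds from the add.constraint() calls, and 9 overs in total.

```python
from itertools import product

# (totalRuns, balls) for each batsman against (Watson, Lee, Starc),
# copied from the computeSR() outputs above; SR here means runs per over.
RUNS_BALLS = {
    "Kohli":  [(45, 37), (10, 7), (11, 9)],
    "Yuvraj": [(24, 22), (12, 7), (12, 8)],
    "Dhoni":  [(33, 28), (26, 20), (11, 8)],
}
SR = {b: [r / n * 6 for r, n in pairs] for b, pairs in RUNS_BALLS.items()}

# Assumed constraints: a cap of 3, 4 and 3 overs on each group of three
# variables, per-variable lower bounds, and 9 overs in total.
CAPS = {"Kohli": 3, "Yuvraj": 4, "Dhoni": 3}
LOWER = {"Kohli": (1, 0, 0), "Yuvraj": (1, 1, 0), "Dhoni": (1, 0, 0)}
TOTAL_OVERS = 9


def group_options(batsman):
    """All whole-over triples for one batsman that respect its lower bounds and cap."""
    cap, low = CAPS[batsman], LOWER[batsman]
    return [t for t in product(range(cap + 1), repeat=3)
            if sum(t) <= cap and all(x >= lo for x, lo in zip(t, low))]


def best_allocation():
    """Enumerate every feasible whole-over allocation and keep the highest-scoring one."""
    batsmen = list(RUNS_BALLS)
    best_runs, best = float("-inf"), None
    for groups in product(*(group_options(b) for b in batsmen)):
        if sum(sum(g) for g in groups) != TOTAL_OVERS:
            continue
        total = sum(o * sr for b, g in zip(batsmen, groups) for o, sr in zip(g, SR[b]))
        if total > best_runs:
            best_runs, best = total, groups
    return best_runs, best


runs, allocation = best_allocation()
print(round(runs, 2), allocation)  # 77.16 ((1, 2, 0), (1, 3, 0), (1, 0, 1))
```

Here the enumeration lands on the same allocation as get.variables(), which is consistent because the solver's optimum for these constraints happens to be whole-numbered. For larger problems the number of combinations grows quickly, which is exactly why an LP solver is the right tool.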

## 6. LP formulation for International T20 India vs Australia (Bowling lineup)

For this I compute how the bowling should be rotated between R Ashwin, RA Jadeja and JJ Bumrah when taking into account their performance against batsmen like Shane Watson, AJ Finch and David Warner. For the bowling performance I take the Economy Rate of the bowlers. The data is the same as above.

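# Compute the Economy Rate of the bowler (runs per over)
computeER <- function(batsman1,bowler1){
a <- matches %>% filter(batsman==batsman1 & bowler==bowler1)
a1 <- a %>% summarize(totalRuns=sum(runs),count=n()) %>% mutate(ER=(totalRuns/count)*6)
a1
}
# RA Jadeja
jadejaWatson<- computeER("SR Watson","RA Jadeja")
jadejaWatson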
##   totalRuns count       ER
## 1        60    29 12.41379
jadejaFinch <- computeER("AJ Finch","RA Jadeja")
jadejaFinch
##   totalRuns count       ER
## 1        36    33 6.545455
jadejaWarner <- computeER("DA Warner","RA Jadeja")
jadejaWarner
##   totalRuns count       ER
## 1        23    11 12.54545
# Ashwin
ashwinWatson<- computeER("SR Watson","R Ashwin")
ashwinWatson
##   totalRuns count       ER
## 1        41    26 9.461538
ashwinFinch <- computeER("AJ Finch","R Ashwin")
ashwinFinch
##   totalRuns count   ER
## 1        63    36 10.5
ashwinWarner <- computeER("DA Warner","R Ashwin")
ashwinWarner
##   totalRuns count       ER
## 1        38    28 8.142857
# JJ Bumrah
bumrahWatson<- computeER("SR Watson","JJ Bumrah")
bumrahWatson
##   totalRuns count  ER
## 1        22    20 6.6
bumrahFinch <- computeER("AJ Finch","JJ Bumrah")
bumrahFinch
##   totalRuns count       ER
## 1        25    19 7.894737
bumrahWarner <- computeER("DA Warner","JJ Bumrah")
bumrahWarner
##   totalRuns count ER
## 1         2     4  3

As can be seen from above, RA Jadeja has an ER of 12.4, 6.54, 12.54 against Watson, AJ Finch and Warner; Ashwin has an ER of 9.46, 10.5, 8.14 against Watson, Finch and Warner; and Bumrah has an ER of 6.6, 7.89, 3 against Watson, Finch and Warner.
The constraints are that Jadeja, Ashwin and Bumrah have 4, 3 & 4 overs remaining, and the total overs remaining to be bowled is 10.

Formulating and solving the bowling lineup is shown below.

lprec <- make.lp(0, 9)
a <-lp.control(lprec, sense="min")

# Set the objective function
set.objfn(lprec, c(jadejaWatson$ER, jadejaFinch$ER,jadejaWarner$ER, ashwinWatson$ER,ashwinFinch$ER,ashwinWarner$ER,
bumrahWatson$ER,bumrahFinch$ER,bumrahWarner$ER))

add.constraint(lprec, c(1,1,1,0,0,0,0,0,0), "<=",4) # Jadeja has 4 overs left
add.constraint(lprec, c(0,0,0,1,1,1,0,0,0), "<=",3) # Ashwin has 3 overs left
add.constraint(lprec, c(0,0,0,0,0,0,1,1,1), "<=",4) # Bumrah has 4 overs left
add.constraint(lprec, c(1,1,1,1,1,1,1,1,1), "=",10) # Total overs = 10
add.constraint(lprec, c(1,0,0,0,0,0,0,0,0), ">=",1)
add.constraint(lprec, c(0,1,0,0,0,0,0,0,0), ">=",0)
add.constraint(lprec, c(0,0,1,0,0,0,0,0,0), ">=",1)
add.constraint(lprec, c(0,0,0,1,0,0,0,0,0), ">=",0)
add.constraint(lprec, c(0,0,0,0,1,0,0,0,0), ">=",1)
add.constraint(lprec, c(0,0,0,0,0,1,0,0,0), ">=",0)
add.constraint(lprec, c(0,0,0,0,0,0,1,0,0), ">=",0)
add.constraint(lprec, c(0,0,0,0,0,0,0,1,0), ">=",1)
add.constraint(lprec, c(0,0,0,0,0,0,0,0,1), ">=",0)
lprec
## Model name:
##   a linear program with 9 decision variables and 13 constraints
b <-solve(lprec)
get.objective(lprec)
## [1] 73.58775
get.variables(lprec)
## [1] 1 2 1 0 1 1 0 1 3

The minimum runs that will be conceded by these 3 bowlers in 10 overs is 73.58, assuming the bowling is rotated as follows:

e <- as.data.frame(rbind(c(1,0,0),c(2,1,1),c(1,1,3),c(4,2,4)))
names(e) <- c("RA Jadeja","R Ashwin","JJ Bumrah")
rownames(e) <- c("S Watson","AJ Finch","DA Warner","Overs")
e

LP Solution:
Minimum runs that will be conceded by India against Australia is 73.58 in 10 overs if the overs bowled are as follows

##           RA Jadeja R Ashwin JJ Bumrah
## S Watson          1        0         0
## AJ Finch          2        1         1
## DA Warner         1        1         3
## Overs             4        2         4
#Total overs=10

## 7. LP formulation for IPL (Mumbai Indians – Kolkata Knight Riders – Bowling lineup)

As in the case of International T20s, I also have processed IPL data derived from my R package yorkr, which processes yaml data from Cricsheet.
The processed data for all IPL matches can be downloaded from GooglyPlus

load("Mumbai Indians-Kolkata Knight Riders-allMatches.RData")
dim(matches)
## [1] 4237   25

# Compute the Economy Rate of the Mumbai Indians bowlers against Kolkata Knight Riders
# Gambhir
gambhirMalinga <- computeER("G Gambhir","SL Malinga")
gambhirHarbhajan <- computeER("G Gambhir","Harbhajan Singh")
gambhirPollard <- computeER("G Gambhir","KA Pollard")
#Yusuf Pathan
yusufMalinga <- computeER("YK Pathan","SL Malinga")
yusufHarbhajan <- computeER("YK Pathan","Harbhajan Singh")
yusufPollard <- computeER("YK Pathan","KA Pollard")
#JH Kallis
kallisMalinga <- computeER("JH Kallis","SL Malinga")
kallisHarbhajan <- computeER("JH Kallis","Harbhajan Singh")
kallisPollard <- computeER("JH Kallis","KA Pollard")
#RV Uthappa
uthappaMalinga <- computeER("RV Uthappa","SL Malinga")
uthappaHarbhajan <- computeER("RV Uthappa","Harbhajan Singh")
uthappaPollard <- computeER("RV Uthappa","KA Pollard")

Here gambhirMalinga, yusufMalinga, kallisMalinga and uthappaMalinga are the ERs of Malinga against Gambhir, Yusuf Pathan, Kallis and Uthappa; gambhirHarbhajan, yusufHarbhajan, kallisHarbhajan and uthappaHarbhajan are the ERs of Harbhajan against Gambhir, Yusuf Pathan, Kallis and Uthappa; and gambhirPollard, yusufPollard, kallisPollard and uthappaPollard are the ERs of Kieron Pollard against Gambhir, Yusuf Pathan, Kallis and Uthappa.

The constraints are that Malinga, Harbhajan and Pollard have 4 overs each, and the remaining overs to be bowled is 10.
Formulating and solving this for the bowling lineup of Mumbai Indians against Kolkata Knight Riders

library("lpSolveAPI")
lprec <- make.lp(0, 12)
a=lp.control(lprec, sense="min")
set.objfn(lprec, c(gambhirMalinga$ER, yusufMalinga$ER,kallisMalinga$ER,uthappaMalinga$ER,
                   gambhirHarbhajan$ER,yusufHarbhajan$ER,kallisHarbhajan$ER,uthappaHarbhajan$ER,
                   gambhirPollard$ER,yusufPollard$ER,kallisPollard$ER,uthappaPollard$ER))
add.constraint(lprec, c(1,1,1,1,0,0,0,0,0,0,0,0), "<=",4)
add.constraint(lprec, c(0,0,0,0,1,1,1,1,0,0,0,0), "<=",4)
add.constraint(lprec, c(0,0,0,0,0,0,0,0,1,1,1,1), "<=",4)
add.constraint(lprec, c(1,1,1,1,1,1,1,1,1,1,1,1), "=",10)
add.constraint(lprec, c(1,0,0,0,0,0,0,0,0,0,0,0), ">=",0)
add.constraint(lprec, c(0,1,0,0,0,0,0,0,0,0,0,0), ">=",1)
add.constraint(lprec, c(0,0,1,0,0,0,0,0,0,0,0,0), ">=",0)
add.constraint(lprec, c(0,0,0,1,0,0,0,0,0,0,0,0), ">=",0)
add.constraint(lprec, c(0,0,0,0,1,0,0,0,0,0,0,0), ">=",0)
add.constraint(lprec, c(0,0,0,0,0,1,0,0,0,0,0,0), ">=",1)
add.constraint(lprec, c(0,0,0,0,0,0,1,0,0,0,0,0), ">=",0)
add.constraint(lprec, c(0,0,0,0,0,0,0,1,0,0,0,0), ">=",1)
add.constraint(lprec, c(0,0,0,0,0,0,0,0,1,0,0,0), ">=",0)
add.constraint(lprec, c(0,0,0,0,0,0,0,0,0,1,0,0), ">=",1)
add.constraint(lprec, c(0,0,0,0,0,0,0,0,0,0,1,0), ">=",0)
add.constraint(lprec, c(0,0,0,0,0,0,0,0,0,0,0,1), ">=",0)
lprec
## Model name:
##   a linear program with 12 decision variables and 16 constraints
b=solve(lprec)
get.objective(lprec)
## [1] 55.57887
get.variables(lprec)
## [1] 3 1 0 0 0 1 0 1 3 1 0 0
e <- as.data.frame(rbind(c(3,1,0,0,4),c(0,1,0,1,2),c(3,1,0,0,4)))
names(e) <- c("Gambhir","Yusuf","Kallis","Uthappa","Overs")
rownames(e) <- c("Malinga","Harbhajan","Pollard")
e

LP Solution: Mumbai Indians can restrict Kolkata Knight Riders to 55.58 runs in 10 overs if the overs are bowled as below

##           Gambhir Yusuf Kallis Uthappa Overs
## Malinga         3     1      0       0     4
## Harbhajan       0     1      0       1     2
## Pollard         3     1      0       0     4
#Total overs=10

## 8. LP formulation for IPL (Mumbai Indians – Kolkata Knight Riders – Batting lineup)

As I mentioned, it is possible to perform a maximization with the same formulation, since computeSR<==>computeER. This just flips the problem around and computes the maximum runs that can be scored for the batsman’s Strike Rate (which is the same as the bowler’s Economy Rate), i.e. gambhirMalinga, yusufMalinga, kallisMalinga and uthappaMalinga are the SRs of Gambhir, Yusuf Pathan, Kallis and Uthappa against Malinga; gambhirHarbhajan, yusufHarbhajan, kallisHarbhajan and uthappaHarbhajan are the SRs of Gambhir, Yusuf Pathan, Kallis and Uthappa against Harbhajan; and gambhirPollard, yusufPollard, kallisPollard and uthappaPollard are the SRs of Gambhir, Yusuf Pathan, Kallis and Uthappa against Kieron Pollard.

The constraints are that Malinga, Harbhajan and Pollard have 4 overs each, and the remaining overs to be bowled is 11.

library("lpSolveAPI")
lprec <- make.lp(0, 12)
a=lp.control(lprec, sense="max")
a <-set.objfn(lprec, c(gambhirMalinga$ER, yusufMalinga$ER,kallisMalinga$ER,uthappaMalinga$ER,
                       gambhirHarbhajan$ER,yusufHarbhajan$ER,kallisHarbhajan$ER,uthappaHarbhajan$ER,
                       gambhirPollard$ER,yusufPollard$ER,kallisPollard$ER,uthappaPollard$ER))
add.constraint(lprec, c(1,1,1,1,0,0,0,0,0,0,0,0), "<=",4)
add.constraint(lprec, c(0,0,0,0,1,1,1,1,0,0,0,0), "<=",4)
add.constraint(lprec, c(0,0,0,0,0,0,0,0,1,1,1,1), "<=",4)
add.constraint(lprec, c(1,1,1,1,1,1,1,1,1,1,1,1), "=",11)
add.constraint(lprec, c(1,0,0,0,0,0,0,0,0,0,0,0), ">=",0)
add.constraint(lprec, c(0,1,0,0,0,0,0,0,0,0,0,0), ">=",1)
add.constraint(lprec, c(0,0,1,0,0,0,0,0,0,0,0,0), ">=",0)
add.constraint(lprec, c(0,0,0,1,0,0,0,0,0,0,0,0), ">=",0)
add.constraint(lprec, c(0,0,0,0,1,0,0,0,0,0,0,0), ">=",0)
add.constraint(lprec, c(0,0,0,0,0,1,0,0,0,0,0,0), ">=",1)
add.constraint(lprec, c(0,0,0,0,0,0,1,0,0,0,0,0), ">=",0)
add.constraint(lprec, c(0,0,0,0,0,0,0,1,0,0,0,0), ">=",1)
add.constraint(lprec, c(0,0,0,0,0,0,0,0,1,0,0,0), ">=",0)
add.constraint(lprec, c(0,0,0,0,0,0,0,0,0,1,0,0), ">=",1)
add.constraint(lprec, c(0,0,0,0,0,0,0,0,0,0,1,0), ">=",0)
add.constraint(lprec, c(0,0,0,0,0,0,0,0,0,0,0,1), ">=",0)
lprec
## Model name:
##   a linear program with 12 decision variables and 16 constraints
b=solve(lprec)
get.objective(lprec)
## [1] 94.22649
get.variables(lprec)
## [1] 0 3 0 0 0 1 0 3 0 1 3 0
e <- as.data.frame(rbind(c(0,3,0,0,3),c(0,1,0,3,4),c(0,1,3,0,4)))
names(e) <- c("Gambhir","Yusuf","Kallis","Uthappa","Overs")
rownames(e) <- c("Malinga","Harbhajan","Pollard")
e

LP Solution: Kolkata Knight Riders can score a maximum of 94.22 in 11 overs against Mumbai Indians if the number of overs KKR face is as below

##           Gambhir Yusuf Kallis Uthappa Overs
## Malinga         0     3      0       0     3
## Harbhajan       0     1      0       3     4
## Pollard         0     1      3       0     4
#Total overs=11

Conclusion: It is thus possible to determine the optimum number of overs to give to a specific bowler based on his/her Economy Rate against a particular batsman. Similarly, one can determine the maximum runs that can be scored by the batsmen based on their strike rates against the bowlers. Cricket, like many other games, is a game of strategy, skill, talent and some amount of luck. So while the LP formulation can provide some direction, one must be aware that anything could happen in a game of cricket!

Thoughts, comments, suggestions welcome!

# cricketr flexes new muscles: The final analysis

’Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

Jabberwocky by Lewis Carroll

No analysis of cricket is complete without determining how players would perform in the host country. Playing Test cricket on foreign pitches, in the host country, is a ‘real test’ for both batsmen and bowlers. Players who can perform consistently on both domestic and foreign pitches are the genuinely ‘class’ players.
Player performance on foreign pitches lets us differentiate the paper tigers and home-ground bullies among batsmen. Similarly, spinners who perform well only on rank turners at home, or pace bowlers who can only swing the ball and generate bounce on specially prepared pitches, are neither genuine spinners nor real pace bowlers. So this post helps in identifying those with real strengths, and those who play well only when the conditions at home are in their favor. This post brings a certain level of finality to the analysis of players with my R package ‘cricketr’. Besides, I also meant ‘final analysis’ in the literal sense, as I intend to take a long break from cricket analysis/analytics and focus on some other domains like Neural Networks, Deep Learning and Spark.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen and bowlers, besides evaluating team & match performances in Tests, ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr and Beaten by sheer pace-Cricket analytics with yorkr. A must read for any cricket lover! Check it out!!

As already mentioned, my R package ‘cricketr’ uses the statistics info available in ESPN Cricinfo Statsguru. You should be able to install the package from CRAN and use many of the functions available in the package. Please be mindful of the ESPN Cricinfo Terms of Use.

(Note: This page is also hosted at RPubs as cricketrFinalAnalysis. You can download the PDF file at cricketrFinalAnalysis.)

To get the data of a player against a particular country, for matches played in a particular host country, I just had to add 2 extra parameters, opposition and host, to the getPlayerData() function. The cricketr package has been updated with the changed functions getPlayerData() for Tests, getPlayerDataOD() for ODIs and getPlayerDataTT() for Twenty20s. The updated functions will be available in cricketr version 0.0.14.

The data for the following players has already been obtained with the new getPlayerData() function and saved as *.csv files. I will be reusing these files instead of getting them all over again; hence the getPlayerData() lines below have been commented out.

library(cricketr)

### 1. Performance of a batsman against a host country in the host country

For example, we can get the data for Sachin Tendulkar for matches played against Australia in Australia. Here opposition=2 and host=2 indicate that the opposition is Australia and the host country is also Australia.

#tendulkarAus=getPlayerData(35320,opposition=2,host=2,file="tendulkarVsAusInAus.csv",type="batting")
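The opposition and host arguments take Statsguru's numeric team codes. The calls in this post use 1 for England, 2 for Australia and 6 for India. A small named lookup table can make such calls easier to read; teamCodes and teamCode() below are hypothetical helpers, not part of cricketr, and only the three codes that appear in this post are included:

```r
# Statsguru team codes as used in the getPlayerData() calls in this post
teamCodes <- c(England = 1, Australia = 2, India = 6)

# Hypothetical helper: return the numeric code for a single country name
teamCode <- function(country) {
  code <- unname(teamCodes[country])
  if (is.na(code)) stop("No team code recorded for: ", country)
  code
}

# Example (commented out, as the data has already been saved):
# tendulkarAus <- getPlayerData(35320, opposition = teamCode("Australia"),
#                               host = teamCode("Australia"),
#                               file = "tendulkarVsAusInAus.csv", type = "batting")
teamCode("Australia")
```

Failing loudly on an unknown name is a deliberate choice: a silently wrong code would fetch data for the wrong country.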

All cricketr functions can be used with this data frame, as before. All the charts show the performance of Tendulkar in Australia against Australia.

par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
batsman4s("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsman6s("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsmanRunsRanges("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsmanDismissals("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsmanAvgRunsGround("./data/tendulkarVsAusInAus.csv","Tendulkar")
batsmanMovingAverage("./data/tendulkarVsAusInAus.csv","Tendulkar")

dev.off()
## null device
##           1

### 2. Relative performances of international batsmen against England in England

While we can analyze the performance of a player against an opposition in a given host country, I also wanted to compare the relative performances of players from different nations in a host country that is not their home ground.

The following lines get the players’ data for matches played against England in England. The Oval and Lord’s are famous for generating dangerous swing and bounce. I chose the following players:

1. Sir Don Bradman (Australia)
2. Steve Waugh (Australia)
3. Rahul Dravid (India)
4. Vivian Richards (West Indies)
5. Sachin Tendulkar (India)
#tendulkarEng=getPlayerData(35320,opposition=1,host=1,file="tendulkarVsEngInEng.csv",type="batting")
#srwaughEng=getPlayerData(8192,opposition=1,host=1,file="srwaughVsEngInEng.csv",type="batting")
#dravidEng=getPlayerData(28114,opposition=1,host=1,file="dravidVsEngInEng.csv",type="batting")
#vrichardEng=getPlayerData(52812,opposition=1,host=1,file="vrichardsEngInEng.csv",type="batting")
frames <- list("./data/tendulkarVsEngInEng.csv","./data/bradmanVsEngInEng.csv","./data/srwaughVsEngInEng.csv",
"./data/dravidVsEngInEng.csv","./data/vrichardsEngInEng.csv")
names <- list("S Tendulkar","D Bradman","SR Waugh","R Dravid","Viv Richards")

Lord’s and the Oval in England are some of the best pitches in the world. Scoring on these pitches, in weather conditions where there is both swing and bounce, really requires excellent batting skills. It can easily be seen that Don Bradman stands head and shoulders above everybody else, with a cumulative average of close to 100+. He is followed by Viv Richards, who averages around 60. Interestingly, in English conditions, Rahul Dravid edges out Sachin Tendulkar.

relativeBatsmanCumulativeAvgRuns(frames,names)
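The cumulative average runs plotted here can be read as a running mean: after each innings, the total runs so far divided by the innings played so far. A minimal base-R sketch of that idea on made-up numbers (an illustration, not cricketr's internal code). Note that the conventional batting average divides by dismissals rather than innings, so not-out innings make the two figures differ.

```r
# Illustrative runs per innings (made-up values, not real player data)
runs <- c(12, 45, 103, 8, 67)

# Running mean of runs per innings: cumulative runs / innings played so far
cumulativeAvg <- function(runs) {
  cumsum(runs) / seq_along(runs)
}

cumulativeAvg(runs)
```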

# The other 2 plots, on relative strike rate and cumulative average strike rate,
# show that Viv Richards really blasts the bowling. Viv Richards has a strike rate
# of 70, while Bradman's is 62+, followed by Tendulkar.
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)
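Strike rate is runs per 100 balls faced. A cumulative strike rate can be computed in two ways that give different numbers: as a running mean of the per-innings strike rates, or from running totals of runs and balls. The sketch below, on made-up numbers with hypothetical variable names, shows both; which of the two cricketr's plots use is not stated in this post.

```r
# Illustrative runs and balls faced per innings (made-up values)
runs  <- c(30, 5, 80)
balls <- c(40, 20, 100)

# Strike rate: runs per 100 balls
strikeRate <- function(runs, balls) 100 * runs / balls

# Running mean of the per-innings strike rates
cumMeanSR <- cumsum(strikeRate(runs, balls)) / seq_along(runs)

# Strike rate from the running totals of runs and balls
cumTotalSR <- strikeRate(cumsum(runs), cumsum(balls))

rbind(cumMeanSR, cumTotalSR)
```

The running mean weights a quick 5 off 20 balls the same as a long innings, which is why the two rows diverge after the second innings.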

### 3. Relative performances of international batsmen against Australia in Australia

The following players were chosen:

1. Sachin Tendulkar (India)
2. Viv Richards (West Indies)
3. David Gower (England)
4. Jacques Kallis (South Africa)
5. Alastair Cook (England)
frames <- list("./data/tendulkarVsAusInAus.csv","./data/vrichardsVAusInAus.csv","./data/dgowerVsAusInAus.csv",
"./data/kallisVsAusInAus.csv","./data/ancookVsAusInAus.csv")
names <- list("S Tendulkar","Viv Richards","David Gower","J Kallis","AN Cook")

Alastair Cook of England has a fantastic cumulative average of 55+ on Australian pitches. There is a dip towards the end, but we cannot predict whether it would have continued. Cook is followed by Tendulkar, who has a steady average of 50+ runs, and then by Viv Richards.

relativeBatsmanCumulativeAvgRuns(frames,names)

#With respect to cumulative or relative strike rate, Viv Richards is a class apart. He seems to really
#tear into bowlers. David Gower has an excellent strike rate and is followed by Tendulkar
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

### 4. Relative performances of international batsmen against India in India

While England & Australia are famous for bouncy tracks with swing, Indian pitches are renowned for being extraordinary turners. India has also always produced world-class spinners, from the spin quartet of BS Chandrasekhar, Bishen Singh Bedi, EAS Prasanna and S Venkataraghavan, to the dangerous Anil Kumble, and now the more recent Ravichandran Ashwin and Harbhajan Singh.

A batsman who can score runs in India against Indian spinners has to be really adept at handling all kinds of spin.

While Clive Lloyd & Alvin Kallicharran had the best performances against India, they have not been included, as many of the columns in their ESPN Cricinfo data were missing.

So I chose the following international players for the analysis against India

1. Hashim Amla (South Africa)
2. Alastair Cook (England)
3. Matthew Hayden (Australia)
4. Viv Richards (West Indies)
frames <- list("./data/amlaVsIndInInd.csv","./data/ancookVsIndInInd.csv","./data/mhaydenVsIndInInd.csv",
"./data/vrichardsVsIndInInd.csv")
names <- list("H Amla","AN Cook","M Hayden","Viv Richards")

Excluding Clive Lloyd & Alvin Kallicharran, the next best performer against India is Hashim Amla, followed by Alastair Cook and Viv Richards.

relativeBatsmanCumulativeAvgRuns(frames,names)

#With respect to strike rate, there is no contest when Viv Richards is around. He is clearly the best
#striker of the ball regardless of whether it is the pacy wickets of
#Australia/England or the spinning tracks of the subcontinent. After
#Viv Richards, Hayden and Alastair Cook have good cumulative strike rates
#in India
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

### 5. All time greats of Indian batting

I couldn’t resist checking how the top Indian batsmen perform when playing away, so here is a look at how they perform against different host countries in those countries.

### 6. Top Indian batsmen against Australia in Australia

The following Indian batsmen were chosen

1. Sunil Gavaskar
2. Sachin Tendulkar
3. Virat Kohli
4. Virender Sehwag
5. VVS Laxman
frames <- list("./data/tendulkarVsAusInAus.csv","./data/gavaskarVsAusInAus.csv","./data/kohliVsAusInAus.csv",
"./data/sehwagVsAusInAus.csv","./data/vvslaxmanVsAusInAus.csv")
names <- list("S Tendulkar","S Gavaskar","V Kohli","V Sehwag","VVS Laxman")

Virat Kohli has the best overall performance against Australia, with a current cumulative average of 60+ runs over the 15 innings he has played. At 15 innings, the second best is Virender Sehwag, followed by VVS Laxman. Tendulkar maintains a cumulative average of 48+ runs over more than 30 innings.

relativeBatsmanCumulativeAvgRuns(frames,names)

# Sehwag leads the strike rate against host Australia, followed by
# Tendulkar in Australia and then Kohli
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

### 7. Top Indian batsmen against England in England

The top Indian batsmen’s performances against England are shown below:

1. Sachin Tendulkar
2. Rahul Dravid
3. Dilip Vengsarkar
4. Sourav Ganguly
5. Sunil Gavaskar
6. Virat Kohli
frames <- list("./data/tendulkarVsEngInEng.csv","./data/dravidVsEngInEng.csv","./data/vengsarkarVsEngInEng.csv",
names <- list("S Tendulkar","R Dravid","D Vengsarkar","S Ganguly","S Gavaskar","V Kohli")

Rahul Dravid has the best performance against England, edging out Tendulkar, who is followed by Sourav Ganguly. Note: incidentally, Virat Kohli’s performance against England in England has so far been extremely poor, and he averages around 13-15 runs per innings. However, he has a long way to go and I hope he catches up. In any case, it will be an uphill climb for Kohli in England.

relativeBatsmanCumulativeAvgRuns(frames,names)

#Tendulkar, Ganguly and Dravid have the best strike rates, in that order.
relativeBatsmanSR(frames,names)

relativeBatsmanCumulativeStrikeRate(frames,names)

### 8. Top Indian batsmen against West Indies in West Indies

frames <- list("./data/tendulkarVsWInWI.csv","./data/dravidVsWInWI.csv","./data/vvslaxmanVsWIInWI.csv",
names <- list("S Tendulkar","R Dravid","VVS Laxman","S Gavaskar")

Against the West Indies, Sunil Gavaskar is head and shoulders above the rest, with a very impressive cumulative average.

relativeBatsmanCumulativeAvgRuns(frames,names)

# VVS Laxman followed by  Tendulkar & then Dravid have a very
# good strike rate against the West Indies
relativeBatsmanCumulativeStrikeRate(frames,names)

### 9. World’s best spinners on tracks suited for pace & bounce

In this part, I compare the performances of the top 3 spinners of recent years and check how they perform on surfaces known for pace and bounce. I have taken the following 3 spinners:

1. Anil Kumble (India)
2. M Muralitharan (Sri Lanka)
3. Shane Warne (Australia)
#kumbleEng=getPlayerData(30176,opposition=1,host=1,file="kumbleVsEngInEng.csv",type="bowling")
#muraliEng=getPlayerData(49636,opposition=1,host=1,file="muraliVsEngInEng.csv",type="bowling")
#warneEng=getPlayerData(8166,opposition=1,host=1,file="warneVsEngInEng.csv",type="bowling")

### 10. Top international spinners against England in England

frames <- list("./data/kumbleVsEngInEng.csv","./data/muraliVsEngInEng.csv","./data/warneVsEngInEng.csv")
names <- list("Anil Kumble","M Muralitharan","Shane Warne")

Against England and in England, Muralitharan shines with a cumulative average of nearly 5 wickets per match, with a peak of almost 8 wickets. Shane Warne has a steady average of about 5 wickets, followed by Anil Kumble.

relativeBowlerCumulativeAvgWickets(frames,names)

# In terms of relative cumulative economy rate, Warne has the best figures, followed by Anil Kumble. Muralitharan
# is much more expensive.
relativeBowlerCumulativeAvgEconRate(frames,names)
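Economy rate is runs conceded per over. Scorecards conventionally write overs as, for example, 10.3, meaning 10 overs and 3 balls rather than 10.3 overs, so the figure has to be converted to balls before dividing. A minimal base-R sketch; oversToBalls() and economyRate() are hypothetical helpers, not cricketr functions:

```r
# Convert overs in scorecard notation (10.3 = 10 overs and 3 balls) to balls
oversToBalls <- function(overs) {
  completed <- floor(overs)
  # The digit after the decimal point counts balls, not tenths of an over
  extraBalls <- round((overs - completed) * 10)
  completed * 6 + extraBalls
}

# Economy rate: runs conceded per 6-ball over
economyRate <- function(runs, overs) {
  runs / (oversToBalls(overs) / 6)
}

economyRate(45, 10.3)  # 45 runs off 63 balls
```

Dividing 45 by 10.3 directly would give about 4.37 instead of about 4.29, a small but systematic overstatement of how expensive a bowler is.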

### 11. Top international spinners against South Africa in South Africa

frames <- list("./data/kumbleVsSAInSA.csv","./data/muraliVsSAInSA.csv","./data/warneVsSAInSA.csv")
names <- list("Anil Kumble","M Muralitharan","Shane Warne")

In South Africa too, Muralitharan has the best wicket-taking performance, averaging about 4 wickets. Warne averages around 3 wickets and Kumble around 2 wickets.

relativeBowlerCumulativeAvgWickets(frames,names)

# Muralitharan is expensive in South Africa too, while Kumble and Warne are neck and neck on economy rate.
# Kumble edges out Warne and has a better cumulative average economy rate
relativeBowlerCumulativeAvgEconRate(frames,names)

### 11. Top international pacers against India in India

As a final analysis, I check how the world’s pacers perform in India against India. Indian pitches are generally considered flat and devoid of bounce, while being terrific turners, so they are more suited to spin bowling than pace bowling, though this is changing these days.

The best performers against India in India are mostly the deadly pacemen of yesteryear.

For this I have chosen the following bowlers:

1. Courtney Walsh (West Indies)
2. Andy Roberts (West Indies)
3. Malcolm Marshall (West Indies)
4. Glenn McGrath (Australia)
#cawalshInd=getPlayerData(53216  ,opposition=6,host=6,file="cawalshVsIndInInd.csv",type="bowling")
#arobertsInd=getPlayerData(52817  ,opposition=6,host=6,file="arobertsIndInInd.csv",type="bowling")
#mmarshallInd=getPlayerData(52419  ,opposition=6,host=6,file="mmarshallVsIndInInd.csv",type="bowling")
#gmccgrathInd=getPlayerData(6565  ,opposition=6,host=6,file="mccgrathVsIndInInd.csv",type="bowling")
frames <- list("./data/cawalshVsIndInInd.csv","./data/arobertsIndInInd.csv","./data/mmarshallVsIndInInd.csv",
"./data/mccgrathVsIndInInd.csv")
names <- list("C Walsh","A Roberts","M Marshall","G McGrath")

Courtney Walsh has the best performance, followed by Andy Roberts and then Malcolm Marshall, who edges ahead of Glenn McGrath.

relativeBowlerCumulativeAvgWickets(frames,names)

#On the other hand McGrath has the best economy rate, followed by A Roberts and then Courtney Walsh
relativeBowlerCumulativeAvgEconRate(frames,names)

### 12. ODI performance of a player against a specific country in the host country

This gets the data for MS Dhoni in ODI matches against Australia and in Australia

#dhoniAusODI=getPlayerDataOD(28081,opposition=2,host=2,file="dhoniVsAusInAusODI.csv",type="batting")

### 13. Twenty 20 performance of a player against a specific country in the host country

#dhoniAusTT=getPlayerDataTT(28081,opposition=2,host=2,file="dhoniVsAusInAusTT.csv",type="batting")

All the ODI and Twenty20 functions of cricketr can be used on the above dataframes of MS Dhoni.

### Some key observations

Here are some key observations

1. At the top of the batting spectrum is Don Bradman, with a very impressive average of 100-120 in matches played in England and Australia. Unfortunately, he did not play in other countries or on different pitches.
2. Viv Richards has the best cumulative strike rate overall.
3. Muralitharan strikes more often than Kumble or Warne, even on pitches in England, South Africa and West Indies. However, Muralitharan is also the most expensive.
4. Warne and Kumble have a much better economy rate than Muralitharan.
5. Sunil Gavaskar has an extremely impressive performance in West Indies.
6. Rahul Dravid performs much better than Tendulkar in both England and West Indies.
7. Virat Kohli has the best performance against Australia so far, followed by Sehwag, and I hope he maintains his stellar performance. However, Kohli’s performance in England has been very poor.
8. West Indies batsmen and bowlers seem to thrive on Indian pitches, with Clive Lloyd and Alvin Kallicharran at the top of the list.

You may like my Shiny apps on cricket


To see all my posts see Index of posts

# Analysis of IPL T20 matches with yorkr templates

## Introduction

In this post, I create RMarkdown templates, based on my R package yorkr, for end-to-end analysis of the IPL T20 matches that are available on Cricsheet. With these templates you can convert all IPL data, which is in yaml format, to R dataframes. Further, I create data and the necessary templates for analyzing IPL matches, teams and players. All of these can be accessed at yorkrIPLTemplate.


The templates are

1. Template for conversion and setup – IPLT20Template.Rmd
2. Any IPL match – IPLMatchtemplate.Rmd
3. IPL matches between 2 teams – IPLMatches2TeamTemplate.Rmd
4. An IPL team’s performance against all other IPL teams – IPLAllMatchesAllOppnTemplate.Rmd
5. Analysis of IPL batsmen and bowlers of all IPL teams – IPLBatsmanBowlerTemplate.Rmd

Besides the templates, the repository also includes the converted data for all IPL matches that I downloaded from Cricsheet in Dec 2016, so this data is complete up to the 2016 IPL season. You can recreate the files as more matches are added to the Cricsheet site in IPL 2017 and future seasons. This post contains all the steps needed for detailed analysis of IPL matches, teams and IPL players. It will also be my reference if I decide to analyze the IPL again in the future!

See my earlier posts where I analyze IPL T20
1. yorkr crashes the IPL party! – Part 1
2. yorkr crashes the IPL party! – Part 2
3. yorkr crashes the IPL party! – Part 3!
4. yorkr crashes the IPL party! – Part 4

There will be 5 folders at the root

1. IPLdata – Match files as yaml from Cricsheet
2. IPLMatches – Yaml match files converted to dataframes
3. IPLMatchesBetween2Teams – All Matches between any 2 IPL teams
4. allMatchesAllOpposition – An IPL teams’s performance against all other teams
5. BattingBowlingDetails – Batting and bowling details of all IPL teams
library(yorkr)
library(dplyr)

The first few steps take care of the data setup. This needs to be done before any analysis of IPL batsmen or bowlers, of any IPL match, of matches between any 2 IPL teams, or of a team’s performance against all other teams.

### 1. Create directory of IPLMatches

Some files may give conversion errors. You could try to debug the problem or just remove the file from the IPLdata folder. At most 2-4 files will have conversion problems, and I usually remove them from the files to be converted.

Also take a look at my GooglyPlus shiny app, which was created after performing the same conversion on the Dec 2016 data.

convertAllYaml2RDataframesT20("data","IPLMatches")
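Instead of removing problem files by hand, one option is to convert the files one at a time and record the ones that fail. The sketch below is a generic base-R wrapper built on tryCatch(); convertSkippingErrors() is a hypothetical helper, and the per-file converter is passed in as a function. Wiring it to a yorkr single-file conversion function is an assumption to check against the yorkr documentation; the example uses a stand-in converter.

```r
# Apply a per-file conversion function to each file, skipping (and
# recording) files whose conversion raises an error.
convertSkippingErrors <- function(files, convert) {
  failed <- character(0)
  for (f in files) {
    ok <- tryCatch({
      convert(f)
      TRUE
    }, error = function(e) {
      message("Skipping ", f, ": ", conditionMessage(e))
      FALSE
    })
    if (!ok) failed <- c(failed, f)
  }
  failed
}

# Example with a stand-in converter that fails on one file
dummyConvert <- function(f) if (grepl("bad", f)) stop("could not parse")
convertSkippingErrors(c("match1.yaml", "bad_match.yaml"), dummyConvert)
```

With real data, the file list could come from list.files("data", pattern = "\\.yaml$"), and the returned names are the files to inspect or move out of the folder.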

### 2. Save all matches between all combinations of IPL nations

This function will create the set of all matches between each IPL team and every other IPL team. This uses the data that was created in IPLMatches with the convertAllYaml2RDataframesT20() function.

setwd("./IPLMatchesBetween2Teams")
saveAllMatchesBetween2IPLTeams("../IPLMatches")

### 3. Save all matches against all opposition

This will create a consolidated dataframe of all matches played by every IPL team against all other teams. This also uses the data that was created in IPLMatches with the convertAllYaml2RDataframesT20() function.

setwd("../allMatchesAllOpposition")
saveAllMatchesAllOppositionIPLT20("../IPLMatches")

### 4. Create batting and bowling details for each IPL team

These are the current IPL teams. You can add to this list as newer teams start playing in the IPL. You can also find all the IPL teams by looking at the directory created above, namely allMatchesAllOpposition. This also uses the data that was created in IPLMatches with the convertAllYaml2RDataframesT20() function.
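Since the files in allMatchesAllOpposition are named like allMatchesAllOpposition-Kolkata Knight Riders.RData (the form loaded later in this post), the team list can also be derived from the directory listing rather than typed by hand. A small sketch, assuming that naming pattern holds for every file; teamsFromFiles() is a hypothetical helper:

```r
# Extract team names from file names of the form
# "allMatchesAllOpposition-<Team>.RData"; other files are ignored
teamsFromFiles <- function(files) {
  pattern <- "^allMatchesAllOpposition-(.*)\\.RData$"
  matched <- files[grepl(pattern, files)]
  sub(pattern, "\\1", matched)
}

# From inside BattingBowlingDetails this would be, e.g.:
# ipl_teams <- as.list(teamsFromFiles(list.files("../allMatchesAllOpposition")))
teamsFromFiles(c("allMatchesAllOpposition-Kolkata Knight Riders.RData",
                 "allMatchesAllOpposition-Mumbai Indians.RData",
                 "notes.txt"))
```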

setwd("../BattingBowlingDetails")
ipl_teams <- list("Chennai Super Kings","Deccan Chargers", "Delhi Daredevils","Kings XI Punjab",
"Kochi Tuskers Kerala","Kolkata Knight Riders","Mumbai Indians","Pune Warriors",
"Rajasthan Royals","Royal Challengers Bangalore","Sunrisers Hyderabad","Gujarat Lions",
"Rising Pune Supergiants")

# Compute and save the batting details for each IPL team
for(i in seq_along(ipl_teams)){
print(ipl_teams[[i]])
val <- getTeamBattingDetails(ipl_teams[[i]],dir="../IPLMatches", save=TRUE)
}

# Compute and save the bowling details for each IPL team
for(i in seq_along(ipl_teams)){
print(ipl_teams[[i]])
val <- getTeamBowlingDetails(ipl_teams[[i]],dir="../IPLMatches", save=TRUE)
}

### 5. Get the list of batsmen for a particular IPL team

The following code is needed for analyzing individual IPL batsmen. In IPL a player could have played in multiple IPL teams.

getBatsmen <- function(df){
bmen <- df %>% distinct(batsman)
bmen <- as.character(bmen$batsman)
batsmen <- sort(bmen)
}

load("Chennai Super Kings-BattingDetails.RData")
csk_details <- battingDetails
load("Deccan Chargers-BattingDetails.RData")
dc_details <- battingDetails
load("Delhi Daredevils-BattingDetails.RData")
dd_details <- battingDetails
load("Kings XI Punjab-BattingDetails.RData")
kxip_details <- battingDetails
load("Kochi Tuskers Kerala-BattingDetails.RData")
ktk_details <- battingDetails
load("Kolkata Knight Riders-BattingDetails.RData")
kkr_details <- battingDetails
load("Mumbai Indians-BattingDetails.RData")
mi_details <- battingDetails
load("Pune Warriors-BattingDetails.RData")
pw_details <- battingDetails
load("Rajasthan Royals-BattingDetails.RData")
rr_details <- battingDetails
load("Royal Challengers Bangalore-BattingDetails.RData")
rcb_details <- battingDetails
load("Sunrisers Hyderabad-BattingDetails.RData")
sh_details <- battingDetails
load("Gujarat Lions-BattingDetails.RData")
gl_details <- battingDetails
load("Rising Pune Supergiants-BattingDetails.RData")
rps_details <- battingDetails

#Get the batsmen for each IPL team
csk_batsmen <- getBatsmen(csk_details)
dc_batsmen <- getBatsmen(dc_details)
dd_batsmen <- getBatsmen(dd_details)
kxip_batsmen <- getBatsmen(kxip_details)
ktk_batsmen <- getBatsmen(ktk_details)
kkr_batsmen <- getBatsmen(kkr_details)
mi_batsmen <- getBatsmen(mi_details)
pw_batsmen <- getBatsmen(pw_details)
rr_batsmen <- getBatsmen(rr_details)
rcb_batsmen <- getBatsmen(rcb_details)
sh_batsmen <- getBatsmen(sh_details)
gl_batsmen <- getBatsmen(gl_details)
rps_batsmen <- getBatsmen(rps_details)

# Save the lists of batsmen
save(csk_batsmen,file="csk.RData")
save(dc_batsmen, file="dc.RData")
save(dd_batsmen, file="dd.RData")
save(kxip_batsmen, file="kxip.RData")
save(ktk_batsmen, file="ktk.RData")
save(kkr_batsmen, file="kkr.RData")
save(mi_batsmen , file="mi.RData")
save(pw_batsmen, file="pw.RData")
save(rr_batsmen, file="rr.RData")
save(rcb_batsmen, file="rcb.RData")
save(sh_batsmen, file="sh.RData")
save(gl_batsmen, file="gl.RData")
save(rps_batsmen, file="rps.RData")

### 6. Get the list of bowlers for a particular IPL team

The method below can get the list of bowler names for any IPL team. The following code is needed for analyzing individual IPL bowlers. In IPL a player could have played in multiple IPL teams.

getBowlers <- function(df){
bwlr <- df %>% distinct(bowler)
bwlr <- as.character(bwlr$bowler)
bowler <- sort(bwlr)
}

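load("Chennai Super Kings-BowlingDetails.RData")
csk_details <- bowlingDetails
load("Deccan Chargers-BowlingDetails.RData")
dc_details <- bowlingDetails
load("Delhi Daredevils-BowlingDetails.RData")
dd_details <- bowlingDetails
load("Kings XI Punjab-BowlingDetails.RData")
kxip_details <- bowlingDetails
load("Kochi Tuskers Kerala-BowlingDetails.RData")
ktk_details <- bowlingDetails
load("Kolkata Knight Riders-BowlingDetails.RData")
kkr_details <- bowlingDetails
load("Mumbai Indians-BowlingDetails.RData")
mi_details <- bowlingDetails
load("Pune Warriors-BowlingDetails.RData")
pw_details <- bowlingDetails
load("Rajasthan Royals-BowlingDetails.RData")
rr_details <- bowlingDetails
load("Royal Challengers Bangalore-BowlingDetails.RData")
rcb_details <- bowlingDetails
load("Sunrisers Hyderabad-BowlingDetails.RData")
sh_details <- bowlingDetails
load("Gujarat Lions-BowlingDetails.RData")
gl_details <- bowlingDetails
load("Rising Pune Supergiants-BowlingDetails.RData")
rps_details <- bowlingDetails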

# Get the bowlers for each team
csk_bowlers <- getBowlers(csk_details)
dc_bowlers <- getBowlers(dc_details)
dd_bowlers <- getBowlers(dd_details)
kxip_bowlers <- getBowlers(kxip_details)
ktk_bowlers <- getBowlers(ktk_details)
kkr_bowlers <- getBowlers(kkr_details)
mi_bowlers <- getBowlers(mi_details)
pw_bowlers <- getBowlers(pw_details)
rr_bowlers <- getBowlers(rr_details)
rcb_bowlers <- getBowlers(rcb_details)
sh_bowlers <- getBowlers(sh_details)
gl_bowlers <- getBowlers(gl_details)
rps_bowlers <- getBowlers(rps_details)

#Save the dataframes
save(csk_bowlers,file="csk1.RData")
save(dc_bowlers, file="dc1.RData")
save(dd_bowlers, file="dd1.RData")
save(kxip_bowlers, file="kxip1.RData")
save(ktk_bowlers, file="ktk1.RData")
save(kkr_bowlers, file="kkr1.RData")
save(mi_bowlers , file="mi1.RData")
save(pw_bowlers, file="pw1.RData")
save(rr_bowlers, file="rr1.RData")
save(rcb_bowlers, file="rcb1.RData")
save(sh_bowlers, file="sh1.RData")
save(gl_bowlers, file="gl1.RData")
save(rps_bowlers, file="rps1.RData")
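The batting and bowling steps above repeat the same load/extract/save pattern 13 times each. A loop over a named vector of team abbreviations can do the same work. The sketch below is a hypothetical refactoring in base R, not the template's code: uniqueSortedNames() returns the same sorted unique names as getBatsmen()/getBowlers() without needing dplyr, and saveTeamBatsmen() assumes each "<Team>-BattingDetails.RData" file contains an object called battingDetails, as the batting loads above do. A bowling version would swap in "-BowlingDetails.RData", bowlingDetails, the bowler column and the "<abbr>1.RData" file names.

```r
# Base-R equivalent of getBatsmen()/getBowlers(): sorted unique names
uniqueSortedNames <- function(df, column) {
  sort(unique(as.character(df[[column]])))
}

# Hypothetical helper: for each team, load "<Team>-BattingDetails.RData",
# extract the batsmen and save them as "<abbr>_batsmen" in "<abbr>.RData"
saveTeamBatsmen <- function(teams, dir = ".") {
  for (abbr in names(teams)) {
    env <- new.env()
    load(file.path(dir, paste0(teams[[abbr]], "-BattingDetails.RData")), envir = env)
    objName <- paste0(abbr, "_batsmen")
    assign(objName, uniqueSortedNames(env$battingDetails, "batsman"), envir = env)
    save(list = objName, envir = env, file = file.path(dir, paste0(abbr, ".RData")))
  }
  invisible(NULL)
}

# Usage in the BattingBowlingDetails folder (commented out):
# iplAbbr <- c(csk = "Chennai Super Kings", dc = "Deccan Chargers",
#              mi = "Mumbai Indians")  # ...and so on for the other teams
# saveTeamBatsmen(iplAbbr)
```

Loading into a fresh environment per team avoids the repeated overwriting of a single global battingDetails object.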

### 1. IPL Match Analysis

Load any match data from the ./IPLMatches folder, for example Chennai Super Kings-Deccan Chargers-2008-05-06.RData.

setwd("./IPLMatches")
load("Chennai Super Kings-Deccan Chargers-2008-05-06.RData")
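csk_dc<- overs
# The templates refer to the loaded match data frame as IPLTeam1_IPLTeam2
IPLTeam1_IPLTeam2 <- overs

All analysis for this match can now be done. In the calls below, "IPLTeam1" and "IPLTeam2" are placeholders to be replaced with the names of the two teams (for this match, "Chennai Super Kings" and "Deccan Chargers").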

### 2. Scorecard

teamBattingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam1")
teamBattingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam2")

### 3. Batting Partnerships

teamBatsmenPartnershipMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
teamBatsmenPartnershipMatch(IPLTeam1_IPLTeam2,"IPLTeam2","IPLTeam1")

### 4. Batsmen vs Bowler Plot

teamBatsmenVsBowlersMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=TRUE)
teamBatsmenVsBowlersMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)

### 5. Team bowling scorecard

teamBowlingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam1")
teamBowlingScorecardMatch(IPLTeam1_IPLTeam2,"IPLTeam2")

### 6. Team bowling Wicket kind match

teamBowlingWicketKindMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <-teamBowlingWicketKindMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

### 7. Team Bowling Wicket Runs Match

teamBowlingWicketRunsMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <-teamBowlingWicketRunsMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

### 8. Team Bowling Wicket Match

m <-teamBowlingWicketMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m
teamBowlingWicketMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")

### 9. Team Bowler vs Batsmen

teamBowlersVsBatsmenMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
m <- teamBowlersVsBatsmenMatch(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2",plot=FALSE)
m

### 10. Match Worm chart

matchWormGraph(IPLTeam1_IPLTeam2,"IPLTeam1","IPLTeam2")
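A worm chart plots each side's cumulative runs against overs. The sketch below shows the underlying calculation on made-up per-over runs; the variable names are hypothetical, and it illustrates the idea rather than how matchWormGraph() is implemented.

```r
# Illustrative runs scored in each over (made-up values)
team1Overs <- c(8, 6, 12, 4, 9)
team2Overs <- c(5, 10, 7, 11, 3)

# A worm chart plots cumulative runs against overs for both innings
wormData <- data.frame(
  over  = seq_along(team1Overs),
  team1 = cumsum(team1Overs),
  team2 = cumsum(team2Overs)
)
wormData

# To draw it with base graphics:
# matplot(wormData$over, wormData[, c("team1", "team2")], type = "l",
#         xlab = "Over", ylab = "Cumulative runs")
```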

### 1. IPL Matches for a team against all other teams

Load the data for an IPL team against all other teams from ./allMatchesAllOpposition, for example all matches of Kolkata Knight Riders:

load("allMatchesAllOpposition-Kolkata Knight Riders.RData")
kkr_matches <- matches
IPLTeam="IPLTeam1"
allMatches <- paste("allMatchesAllOpposition-",IPLTeam,".RData",sep="")
IPLTeam1AllMatches <- matches


### 2. Team’s batting scorecard all Matches

m <-teamBattingScorecardAllOppnAllMatches(IPLTeam1AllMatches,theTeam="IPLTeam1")
m

### 3. Batting scorecard of opposing team

m <-teamBattingScorecardAllOppnAllMatches(matches=IPLTeam1AllMatches,theTeam="IPLTeam2")

### 4. Team batting partnerships

m <- teamBatsmenPartnershipAllOppnAllMatches(IPLTeam1AllMatches,theTeam="IPLTeam1")
m
m <- teamBatsmenPartnershipAllOppnAllMatches(IPLTeam1AllMatches,theTeam='IPLTeam1',report="detailed")
m <- teamBatsmenPartnershipAllOppnAllMatches(IPLTeam1AllMatches,theTeam='IPLTeam1',report="summary")
m
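The partnership reports above group a team's runs by batsman and batting partner. The following minimal Python sketch uses only the standard library and assumes hypothetical ball-by-ball (batsman, non_striker, runs) tuples. It treats the "detailed" view as runs per (batsman, partner) pair and the "summary" view as those runs totalled per batsman. yorkr's own columns and calculation may differ.

```python
from collections import defaultdict


def partnerships(balls):
    """Return (detailed, summary) partnership totals.

    balls: iterable of (batsman, non_striker, runs) tuples, one per delivery faced.
    detailed maps (batsman, partner) -> runs; summary maps batsman -> total runs.
    """
    detailed = defaultdict(int)
    summary = defaultdict(int)
    for batsman, partner, runs in balls:
        detailed[(batsman, partner)] += runs
        summary[batsman] += runs
    return dict(detailed), dict(summary)


balls = [("A", "B", 4), ("B", "A", 1), ("A", "B", 6), ("A", "C", 2)]
detailed, summary = partnerships(balls)
print(detailed)  # {('A', 'B'): 10, ('B', 'A'): 1, ('A', 'C'): 2}
print(summary)   # {'A': 12, 'B': 1}
```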

### 5. Team batting partnerships plot

```r
teamBatsmenPartnershipAllOppnAllMatchesPlot(IPLTeam1AllMatches,"IPLTeam1",main="IPLTeam1")
teamBatsmenPartnershipAllOppnAllMatchesPlot(IPLTeam1AllMatches,"IPLTeam1",main="IPLTeam2")
```

### 6. Team batsmen vs bowlers report

```r
m <- teamBatsmenVsBowlersAllOppnAllMatchesRept(IPLTeam1AllMatches,"IPLTeam1",rank=0)
m
m <- teamBatsmenVsBowlersAllOppnAllMatchesRept(IPLTeam1AllMatches,"IPLTeam1",rank=1,dispRows=30)
m
m <- teamBatsmenVsBowlersAllOppnAllMatchesRept(matches=IPLTeam1AllMatches,theTeam="IPLTeam2",rank=1,dispRows=25)
m
```

### 7. Team batsmen vs bowler plot

```r
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(IPLTeam1AllMatches,"IPLTeam1",rank=1,dispRows=50)
d
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(IPLTeam1AllMatches,"IPLTeam1",rank=2,dispRows=50)
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)
```

### 8. Team bowling scorecard

```r
teamBowlingScorecardAllOppnAllMatchesMain(matches=IPLTeam1AllMatches,theTeam="IPLTeam1")
teamBowlingScorecardAllOppnAllMatches(IPLTeam1AllMatches,'IPLTeam2')
```

### 9. Team bowler vs batsmen

```r
teamBowlersVsBatsmenAllOppnAllMatchesMain(IPLTeam1AllMatches,theTeam="IPLTeam1",rank=0)
teamBowlersVsBatsmenAllOppnAllMatchesMain(IPLTeam1AllMatches,theTeam="IPLTeam1",rank=2)
teamBowlersVsBatsmenAllOppnAllMatchesRept(matches=IPLTeam1AllMatches,theTeam="IPLTeam1",rank=0)
```

### 10. Team bowler vs batsmen plot

```r
df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(IPLTeam1AllMatches,theTeam="IPLTeam1",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"IPLTeam1","IPLTeam1")
```

### 11. Team bowler wicket kind

```r
teamBowlingWicketKindAllOppnAllMatches(IPLTeam1AllMatches,t1="IPLTeam1",t2="All")
teamBowlingWicketKindAllOppnAllMatches(IPLTeam1AllMatches,t1="IPLTeam1",t2="IPLTeam2")
```

### 12. Team bowling wicket runs

```r
teamBowlingWicketRunsAllOppnAllMatches(IPLTeam1AllMatches,t1="IPLTeam1",t2="All",plot=TRUE)
teamBowlingWicketRunsAllOppnAllMatches(IPLTeam1AllMatches,t1="IPLTeam1",t2="IPLTeam2",plot=TRUE)
```
### 1. IPL batsman setup functions

Get the details of a batsman across all the IPL teams he has played for

```r
setwd("../BattingBowlingDetails")
# IPL team names
IPLTeamNames <- list("Chennai Super Kings","Deccan Chargers","Delhi Daredevils","Kings XI Punjab",
                     "Kochi Tuskers Kerala","Kolkata Knight Riders","Mumbai Indians","Pune Warriors",
                     "Rajasthan Royals","Royal Challengers Bangalore","Sunrisers Hyderabad","Gujarat Lions",
                     "Rising Pune Supergiants")

# Check and get the indices of the IPL teams in which the batsman has played.
# The per-team batsmen objects (csk_batsmen, dc_batsmen, ...) are assumed to be
# loaded already, in the same order as IPLTeamNames.
getTeamIndex <- function(batsman){
    teams_batsmen <- list(csk_batsmen,dc_batsmen,dd_batsmen,kxip_batsmen,ktk_batsmen,kkr_batsmen,mi_batsmen,
                          pw_batsmen,rr_batsmen,rcb_batsmen,sh_batsmen,gl_batsmen,rps_batsmen)
    b <- NULL
    for (i in seq_along(teams_batsmen)){
        a <- which(teams_batsmen[[i]] == batsman)
        if (length(a) != 0)
            b <- c(b,i)
    }
    b
}

# Get the list of the IPL team names from the indices passed
getTeams <- function(x){
    l <- NULL
    for (i in seq_along(x)){
        l <- c(l, IPLTeamNames[[x[i]]])
    }
    l
}

# Create a consolidated data frame with all teams the IPL batsman has played for.
# batsman defaults to the global IPLBatsman, which is set in the next step.
getIPLBatsmanDF <- function(teamNames, batsman=IPLBatsman){
    batsmanDF <- NULL
    for (i in seq_along(teamNames)){
        df <- getBatsmanDetails(team=teamNames[i],name=batsman,dir="./BattingBowlingDetails")
        batsmanDF <- rbind(batsmanDF,df)
    }
    batsmanDF
}
```

### 2. Create a consolidated IPL batsman data frame

```r
# Since an IPL batsman could have played in multiple teams, we need to determine these teams and
# create a consolidated data frame for the analysis.
# For example, to check MS Dhoni we need to do the following
IPLBatsman <- "MS Dhoni"
# Check and get the team indices of IPL teams in which the batsman has played
i <- getTeamIndex(IPLBatsman)

# Get the team names in which the IPL batsman has played
teamNames <- getTeams(i)

# Create a consolidated IPL batsman data frame for analysis
batsmanDF <- getIPLBatsmanDF(teamNames)
```
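The consolidation step above finds every team a batsman appears in and then rbinds his per-team data. The same idea can be sketched in standard-library Python, but only as an illustration. Here `team_rosters` stands in for the csk_batsmen, dc_batsmen, ... objects and `per_team_rows` stands in for getBatsmanDetails(). All team names, player names and values are placeholders.

```python
def teams_for_player(player, team_rosters):
    """Return, in roster order, the names of the teams whose roster contains the player."""
    return [team for team, roster in team_rosters.items() if player in roster]


def consolidate(player, team_rosters, per_team_rows):
    """Concatenate the player's rows from every team he played for (like repeated rbind)."""
    rows = []
    for team in teams_for_player(player, team_rosters):
        rows.extend(per_team_rows(team, player))
    return rows


# Placeholder rosters and per-team data, purely for illustration
rosters = {
    "Team 1": ["Player A", "Player B"],
    "Team 2": ["Player A", "Player C"],
    "Team 3": ["Player D"],
}
per_team = {
    ("Team 1", "Player A"): [("Team 1", "Player A", 30)],
    ("Team 2", "Player A"): [("Team 2", "Player A", 12)],
}
rows = consolidate("Player A", rosters, lambda team, player: per_team.get((team, player), []))
print(teams_for_player("Player A", rosters))  # ['Team 1', 'Team 2']
print(rows)  # [('Team 1', 'Player A', 30), ('Team 2', 'Player A', 12)]
```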


### 3. Runs vs deliveries

```r
# For e.g. batsmanName="MS Dhoni"
#batsmanRunsVsDeliveries(batsmanDF, "MS Dhoni")
batsmanRunsVsDeliveries(batsmanDF,"batsmanName")
```

### 4. Batsman 4s & 6s

```r
library(dplyr)  # for select()
batsman46 <- select(batsmanDF,batsman,ballsPlayed,fours,sixes,runs)
p1 <- batsmanFoursSixes(batsman46,"batsmanName")
```
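batsmanFoursSixes() compares a batsman's total runs with the runs that came from boundaries, using the runs, fours and sixes columns selected above. The following hypothetical Python sketch (standard library only) shows the arithmetic, with each innings as a (runs, fours, sixes) tuple. The function names are illustrative and are not part of yorkr or cricpy.

```python
def boundary_runs(innings):
    """For each (runs, fours, sixes) innings, return (runs, runs from 4s, runs from 6s)."""
    return [(runs, 4 * fours, 6 * sixes) for runs, fours, sixes in innings]


def boundary_share(innings):
    """Fraction of all runs that were scored in fours and sixes (0.0 if no runs)."""
    total = sum(runs for runs, _, _ in innings)
    from_boundaries = sum(4 * fours + 6 * sixes for _, fours, sixes in innings)
    return from_boundaries / total if total else 0.0


innings = [(40, 3, 2), (10, 1, 0), (0, 0, 0)]
print(boundary_runs(innings))   # [(40, 12, 12), (10, 4, 0), (0, 0, 0)]
print(boundary_share(innings))  # 0.56
```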

### 5. Batsman dismissals

```r
batsmanDismissals(batsmanDF,"batsmanName")
```
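batsmanDismissals() summarises how a batsman got out. The following minimal Python sketch (standard library only) shows that tally. It assumes a hypothetical list with one dismissal kind per innings, where None marks a not-out innings.

```python
from collections import Counter


def dismissal_counts(dismissals):
    """Count each mode of dismissal, most common first (ties keep first-seen order).

    Not-out innings are assumed to be recorded as None and are skipped.
    """
    return Counter(d for d in dismissals if d is not None).most_common()


print(dismissal_counts(["caught", "bowled", None, "caught", "lbw"]))
# [('caught', 2), ('bowled', 1), ('lbw', 1)]
```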

### 6. Runs vs Strike rate

```r
batsmanRunsVsStrikeRate(batsmanDF,"batsmanName")
```
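Strike rate is runs scored per 100 balls faced, that is, 100 * runs / balls. The following minimal Python sketch (standard library only) computes it over hypothetical (runs, balls) tuples, one per innings.

```python
def strike_rates(innings):
    """Return the strike rate (runs per 100 balls) of each (runs, balls) innings.

    Innings with zero balls faced are skipped to avoid division by zero.
    """
    return [100.0 * runs / balls for runs, balls in innings if balls > 0]


print(strike_rates([(30, 20), (12, 16), (0, 0)]))  # [150.0, 75.0]
```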

### 7. Batsman Moving Average

```r
batsmanMovingAverage(batsmanDF,"batsmanName")
```
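A moving average smooths a batsman's innings-by-innings scores so that trends in form are easier to see. The following sketch uses a plain sliding-window mean in standard-library Python over hypothetical scores. batsmanMovingAverage() may smooth the series differently (for example with a fitted curve), so treat this only as an illustration of the idea.

```python
def moving_average(values, window):
    """Sliding-window mean; returns len(values) - window + 1 values (empty if window is invalid)."""
    if window <= 0 or window > len(values):
        return []
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the new score, drop the oldest one
        total += values[i] - values[i - window]
        out.append(total / window)
    return out


print(moving_average([10, 20, 30, 40, 50], 3))  # [20.0, 30.0, 40.0]
```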

### 8. Batsman cumulative average

```r
batsmanCumulativeAverageRuns(batsmanDF,"batsmanName")
```
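The cumulative average after the k-th innings is the mean of the first k scores (an expanding mean), which shows how a batsman's average settles over a career. The following minimal Python sketch (standard library only) uses made-up scores. It divides by the number of innings rather than by dismissals. We assume the plot does the same, but check the package source if the distinction matters.

```python
from itertools import accumulate


def cumulative_average(values):
    """Mean of the first k values for k = 1..n (an expanding mean)."""
    return [total / k for k, total in enumerate(accumulate(values), start=1)]


print(cumulative_average([10, 30, 20]))  # [10.0, 20.0, 20.0]
```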

### 9. Batsman cumulative strike rate

```r
batsmanCumulativeStrikeRate(batsmanDF,"batsmanName")
```
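Cumulative strike rate can be defined in more than one way. The following Python sketch (standard library only) uses total runs over total balls to date, 100 * sum(runs) / sum(balls). batsmanCumulativeStrikeRate() may instead average the per-innings strike rates, which gives short innings more weight. The (runs, balls) tuples are hypothetical.

```python
from itertools import accumulate


def cumulative_strike_rate(innings):
    """Strike rate to date after each (runs, balls) innings: 100 * sum(runs) / sum(balls)."""
    runs = accumulate(r for r, _ in innings)
    balls = accumulate(b for _, b in innings)
    return [100.0 * r / b if b else 0.0 for r, b in zip(runs, balls)]


print(cumulative_strike_rate([(30, 20), (10, 20), (20, 10)]))  # [150.0, 100.0, 120.0]
```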

### 10. Batsman runs against opposition

```r
batsmanRunsAgainstOpposition(batsmanDF,"batsmanName")
```
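batsmanRunsAgainstOpposition() groups a batsman's innings by opposition and summarises the runs, and batsmanRunsVenue() does the same for grounds. The following minimal Python sketch (standard library only) assumes the summary is average runs per innings. The data is a list of hypothetical (key, runs) pairs, where the key is an opposition or a venue.

```python
from collections import defaultdict


def mean_runs_by(innings):
    """Average runs per innings, grouped by key (e.g. opposition or venue)."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for key, runs in innings:
        totals[key] += runs
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


print(mean_runs_by([("Team X", 40), ("Team Y", 10), ("Team X", 20)]))
# {'Team X': 30.0, 'Team Y': 10.0}
```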

### 11. Batsman runs vs venue

```r
batsmanRunsVenue(batsmanDF,"batsmanName")
```

### 12. Batsman runs predict

```r
batsmanRunsPredict(batsmanDF,"batsmanName")
```

### 13. Bowler setup functions

```r
setwd("../BattingBowlingDetails")
# IPL team names
IPLTeamNames <- list("Chennai Super Kings","Deccan Chargers","Delhi Daredevils","Kings XI Punjab",
                     "Kochi Tuskers Kerala","Kolkata Knight Riders","Mumbai Indians","Pune Warriors",
                     "Rajasthan Royals","Royal Challengers Bangalore","Sunrisers Hyderabad","Gujarat Lions",
                     "Rising Pune Supergiants")

# Get the indices of the IPL teams for which the bowler has played.
# The per-team bowler objects (csk_bowlers, dc_bowlers, ...) are assumed to be
# loaded already, in the same order as IPLTeamNames.
getTeamIndex_bowler <- function(bowler){
    teams_bowlers <- list(csk_bowlers,dc_bowlers,dd_bowlers,kxip_bowlers,ktk_bowlers,kkr_bowlers,mi_bowlers,
                          pw_bowlers,rr_bowlers,rcb_bowlers,sh_bowlers,gl_bowlers,rps_bowlers)
    b <- NULL
    for (i in seq_along(teams_bowlers)){
        a <- which(teams_bowlers[[i]] == bowler)
        if (length(a) != 0){
            b <- c(b,i)
        }
    }
    b
}

# Get the list of the IPL team names from the indices passed
getTeams <- function(x){
    l <- NULL
    for (i in seq_along(x)){
        l <- c(l, IPLTeamNames[[x[i]]])
    }
    l
}

# Create a consolidated data frame of the bowler for all IPL teams played.
# bowler defaults to the global IPLBowler, which is set in the next step.
getIPLBowlerDF <- function(teamNames, bowler=IPLBowler){
    bowlerDF <- NULL
    for (i in seq_along(teamNames)){
        df <- getBowlerWicketDetails(team=teamNames[i],name=bowler,dir="./BattingBowlingDetails")
        bowlerDF <- rbind(bowlerDF,df)
    }
    bowlerDF
}
```

### 14. Get the consolidated data frame for an IPL bowler

```r
# Since an IPL bowler could have played in multiple teams, we need to determine these teams and
# create a consolidated data frame for the analysis.
# For example, to check R Ashwin we need to do the following
IPLBowler <- "R Ashwin"
# Check and get the team indices of IPL teams in which the bowler has played
i <- getTeamIndex_bowler(IPLBowler)

# Get the team names in which the IPL bowler has played
teamNames <- getTeams(i)

# Create a consolidated IPL bowler data frame for analysis
bowlerDF <- getIPLBowlerDF(teamNames)
```

### 15. Bowler Mean Economy rate

```r
# For e.g. to get the details of R Ashwin do
#bowlerMeanEconomyRate(bowlerDF,"R Ashwin")
bowlerMeanEconomyRate(bowlerDF,"bowlerName")
```
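Economy rate is runs conceded per six-ball over. Overs notation often trips up hand calculations: 3.4 overs means 3 overs and 4 balls (22 balls), not 3.4 * 6 balls. The following minimal Python sketch (standard library only) handles that conversion. The helper names are illustrative and are not part of yorkr or cricpy, and balls are assumed to be legal deliveries.

```python
def balls_from_overs(overs):
    """Convert overs notation (e.g. "3.4" = 3 overs and 4 balls) to legal deliveries."""
    whole, _, part = str(overs).partition(".")
    return int(whole) * 6 + (int(part) if part else 0)


def economy_rate(runs_conceded, overs):
    """Runs conceded per six-ball over (0.0 if no balls were bowled)."""
    balls = balls_from_overs(overs)
    return 6.0 * runs_conceded / balls if balls else 0.0


print(balls_from_overs("3.4"))  # 22
print(economy_rate(30, "4"))    # 7.5
print(economy_rate(22, "3.4"))  # 6.0
```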

### 16. Bowler mean runs conceded

```r
bowlerMeanRunsConceded(bowlerDF,"bowlerName")
```

### 17. Bowler Moving Average

```r
bowlerMovingAverage(bowlerDF,"bowlerName")
```

### 18. Bowler cumulative average wickets

```r
bowlerCumulativeAvgWickets(bowlerDF,"bowlerName")
```

### 19. Bowler cumulative Economy Rate (ER)

```r
bowlerCumulativeAvgEconRate(bowlerDF,"bowlerName")
```

### 20. Bowler wicket plot

```r
bowlerWicketPlot(bowlerDF,"bowlerName")
```

### 21. Bowler wicket against opposition

```r
bowlerWicketsAgainstOpposition(bowlerDF,"bowlerName")
```

### 22. Bowler wicket at cricket grounds

```r
bowlerWicketsVenue(bowlerDF,"bowlerName")
```

### 23. Predict number of deliveries to wickets

```r
setwd("./IPLMatches")
bowlerDF1 <- getDeliveryWickets(team="IPLTeam1",dir=".",name="bowlerName",save=FALSE)
bowlerWktsPredict(bowlerDF1,"bowlerName")
```

# Analysis of International T20 matches with yorkr templates

## Introduction

In this post I create yorkr templates for the International T20 matches that are available on Cricsheet. With these templates you can convert all T20 data, which is in yaml format, to R dataframes. I also create the data and the templates needed for the analysis. All of these templates can be accessed from Github at yorkrT20Template. The templates are

1. Template for conversion and setup – T20Template.Rmd
2. Any T20 match – T20Matchtemplate.Rmd
3. T20 matches between 2 nations – T20Matches2TeamTemplate.Rmd
4. A T20 nation's performance against all other T20 nations – T20AllMatchesAllOppnTemplate.Rmd
5. Analysis of T20 batsmen and bowlers of all T20 nations – T20BatsmanBowlerTemplate.Rmd

Besides the templates, the repository also includes the converted data for all T20 matches I downloaded from Cricsheet in Dec 2016. You can recreate the files as more matches are added to the Cricsheet site. This post contains all the steps needed for T20 analysis as more matches are played around the world and more data is added to Cricsheet. This will also be my reference if I decide to analyze T20 matches in the future!

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen and bowlers, besides evaluating team & match performances in Tests, ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the Kindle versions. The books can be accessed at Cricket analytics with cricketr and Beaten by sheer pace-Cricket analytics with yorkr. A must read for any cricket lover! Check it out!!

Feel free to download/clone these templates from Github yorkrT20Template and perform your own analysis.

There will be 5 folders at the root

1. T20Data – Match files as yaml from Cricsheet
2. T20Matches – Yaml match files converted to dataframes
3. T20MatchesBetween2Teams – All Matches between any 2 T20 teams
4. allMatchesAllOpposition – A T20 country's match data against all other teams
5. BattingBowlingDetails – Batting and bowling details of all countries

```r
library(yorkr)
library(dplyr)
```

The first few steps take care of the data setup. This needs to be done before any analysis of T20 batsmen, bowlers, any T20 match, matches between any 2 T20 countries, or a team's performance against all other countries. The source YAML files will be in the T20Data folder.

### 1. Create directory T20Matches

Some files may give conversion errors. You could try to debug the problem or just remove the file from the T20Data folder. At most 2-4 files will have conversion problems, and I usually remove them from the files to be converted. Also take a look at my Inswinger Shiny app, which was created after performing the same conversion on the Dec 2016 data.
```r
convertAllYaml2RDataframesT20("T20Data","T20Matches")
```

### 2. Save all matches between all combinations of T20 nations

This function will create the set of all matches between every T20 country and every other T20 country. This uses the data that was created in T20Matches with the convertAllYaml2RDataframesT20() function.

```r
setwd("./T20MatchesBetween2Teams")
saveAllMatchesBetweenTeams("../T20Matches")
```

### 3. Save all matches against all opposition

This will create a consolidated dataframe of all matches played by every T20 playing nation against all other nations. This also uses the data that was created in T20Matches with the convertAllYaml2RDataframesT20() function.

```r
setwd("../allMatchesAllOpposition")
saveAllMatchesAllOpposition("../T20Matches")
```

### 4. Create batting and bowling details for each T20 country

These are the current T20 playing nations. You can add to this vector as more countries start playing T20. You can also find all T20 nations by looking at the directory created above, namely allMatchesAllOpposition. This also uses the data that was created in T20Matches with the convertAllYaml2RDataframesT20() function.

```r
setwd("../BattingBowlingDetails")
teams <- c("Australia","India","Pakistan","West Indies","Sri Lanka","England",
           "Bangladesh","Netherlands","Scotland","Afghanistan","Zimbabwe","Ireland",
           "New Zealand","South Africa","Canada","Bermuda","Kenya","Hong Kong",
           "Nepal","Oman","Papua New Guinea","United Arab Emirates")

# Save the batting details of each country
for(i in seq_along(teams)){
    print(teams[i])
    getTeamBattingDetails(teams[i],dir="../T20Matches",save=TRUE)
}

# Save the bowling details of each country
for(i in seq_along(teams)){
    print(teams[i])
    getTeamBowlingDetails(teams[i],dir="../T20Matches",save=TRUE)
}
```

### 5. Get the list of batsmen for a particular country

For example, if you wanted to get the batsmen of Canada you would do the following. By replacing Canada with any other country you can get the batsmen of that country. These batsmen names can then be used in the batsmen analysis.

```r
country <- "Canada"
teamData <- paste(country,"-BattingDetails.RData",sep="")
load(teamData)
countryDF <- battingDetails
bmen <- countryDF %>% distinct(batsman)
bmen <- as.character(bmen$batsman)
batsmen <- sort(bmen)
batsmen
```

### 6. Get the list of bowlers for a particular country

The method below can get the list of bowler names for any T20 nation. These names can then be used in the bowler analysis below.

```r
country <- "Netherlands"
teamData <- paste(country,"-BowlingDetails.RData",sep="")
load(teamData)
countryDF <- bowlingDetails
bwlr <- countryDF %>% distinct(bowler)
bwlr <- as.character(bwlr$bowler)
bowler <- sort(bwlr)
bowler
```

### Now we are all set

### A) International T20 Match Analysis

Load any match data from the ./T20Matches folder, e.g. Afghanistan-England-2016-03-23.RData

```r
setwd("./T20Matches")
load("Afghanistan-England-2016-03-23.RData")
afg_eng <- overs
# The steps are
load("Country1-Country2-Date.RData")
country1_country2 <- overs
```

All analysis for this match can be done now

### 2. Scorecard

```r
teamBattingScorecardMatch(country1_country2,"Country1")
teamBattingScorecardMatch(country1_country2,"Country2")
```

### 3. Batting Partnerships

```r
teamBatsmenPartnershipMatch(country1_country2,"Country1","Country2")
teamBatsmenPartnershipMatch(country1_country2,"Country2","Country1")
```

### 4. Batsmen vs Bowler Plot

```r
teamBatsmenVsBowlersMatch(country1_country2,"Country1","Country2",plot=TRUE)
teamBatsmenVsBowlersMatch(country1_country2,"Country1","Country2",plot=FALSE)
```

### 5. Team bowling scorecard

```r
teamBowlingScorecardMatch(country1_country2,"Country1")
teamBowlingScorecardMatch(country1_country2,"Country2")
```

### 6. Team bowling Wicket kind match

```r
teamBowlingWicketKindMatch(country1_country2,"Country1","Country2")
m <- teamBowlingWicketKindMatch(country1_country2,"Country1","Country2",plot=FALSE)
m
```

### 7. Team Bowling Wicket Runs Match

```r
teamBowlingWicketRunsMatch(country1_country2,"Country1","Country2")
m <- teamBowlingWicketRunsMatch(country1_country2,"Country1","Country2",plot=FALSE)
m
```

### 8. Team Bowling Wicket Match

```r
m <- teamBowlingWicketMatch(country1_country2,"Country1","Country2",plot=FALSE)
m
teamBowlingWicketMatch(country1_country2,"Country1","Country2")
```

### 9. Team Bowler vs Batsmen

```r
teamBowlersVsBatsmenMatch(country1_country2,"Country1","Country2")
m <- teamBowlersVsBatsmenMatch(country1_country2,"Country1","Country2",plot=FALSE)
m
```

### 10. Match Worm chart

```r
matchWormGraph(country1_country2,"Country1","Country2")
```

### B) International T20 Matches between 2 teams

Load match data between any 2 teams from ./T20MatchesBetween2Teams, e.g. Australia-India-allMatches

```r
setwd("./T20MatchesBetween2Teams")
load("Australia-India-allMatches.RData")
aus_ind_matches <- matches

# Replace below with your own countries
country1 <- "England"
country2 <- "South Africa"
country1VsCountry2 <- paste(country1,"-",country2,"-allMatches.RData",sep="")
load(country1VsCountry2)
country1_country2_matches <- matches
```

### 2. Batsmen partnerships

```r
m <- teamBatsmenPartnershiOppnAllMatches(country1_country2_matches,"country1",report="summary")
m
m <- teamBatsmenPartnershiOppnAllMatches(country1_country2_matches,"country2",report="summary")
m
m <- teamBatsmenPartnershiOppnAllMatches(country1_country2_matches,"country1",report="detailed")
m
teamBatsmenPartnershipOppnAllMatchesChart(country1_country2_matches,"country1","country2")
```

### 3. Team batsmen vs bowlers

```r
teamBatsmenVsBowlersOppnAllMatches(country1_country2_matches,"country1","country2")
```

### 4. Batting scorecard

```r
a <- teamBattingScorecardOppnAllMatches(country1_country2_matches,main="country1",opposition="country2")
a
```

### 5. Team bowling performance

```r
teamBowlingPerfOppnAllMatches(country1_country2_matches,main="country1",opposition="country2")
```

### 6. Team bowler wickets

```r
teamBowlersWicketsOppnAllMatches(country1_country2_matches,main="country1",opposition="country2")
m <- teamBowlersWicketsOppnAllMatches(country1_country2_matches,main="country1",opposition="country2",plot=FALSE)
teamBowlersWicketsOppnAllMatches(country1_country2_matches,"country1","country2",top=3)
m
```

### 7. Team bowler vs batsmen

```r
teamBowlersVsBatsmenOppnAllMatches(country1_country2_matches,"country1","country2",top=5)
```

### 8. Team bowler wicket kind

```r
teamBowlersWicketKindOppnAllMatches(country1_country2_matches,"country1","country2",plot=TRUE)
m <- teamBowlersWicketKindOppnAllMatches(country1_country2_matches,"country1","country2",plot=FALSE)
m[1:30,]
```

### 9. Team bowler wicket runs

```r
teamBowlersWicketRunsOppnAllMatches(country1_country2_matches,"country1","country2")
```

### 10. Plot wins and losses

```r
setwd("./T20Matches")
plotWinLossBetweenTeams("country1","country2")
```

### C) International T20 Matches for a team against all other teams

Load the data for a T20 team against all other countries from ./allMatchesAllOpposition, e.g. all matches of India

```r
load("allMatchesAllOpposition-India.RData")
india_matches <- matches

country <- "country1"
allMatches <- paste("allMatchesAllOpposition-",country,".RData",sep="")
load(allMatches)
country1AllMatches <- matches
```

### 2. Team’s batting scorecard all Matches

```r
m <- teamBattingScorecardAllOppnAllMatches(country1AllMatches,theTeam="country1")
m
```

### 3. Batting scorecard of opposing team

```r
m <- teamBattingScorecardAllOppnAllMatches(matches=country1AllMatches,theTeam="country2")
```

### 4. Team batting partnerships

```r
m <- teamBatsmenPartnershipAllOppnAllMatches(country1AllMatches,theTeam="country1")
m
m <- teamBatsmenPartnershipAllOppnAllMatches(country1AllMatches,theTeam='country1',report="detailed")
head(m,30)
m <- teamBatsmenPartnershipAllOppnAllMatches(country1AllMatches,theTeam='country1',report="summary")
m
```

### 5. Team batting partnerships plot

```r
teamBatsmenPartnershipAllOppnAllMatchesPlot(country1AllMatches,"country1",main="country1")
teamBatsmenPartnershipAllOppnAllMatchesPlot(country1AllMatches,"country1",main="country2")
```

### 6. Team batsmen vs bowlers report

```r
m <- teamBatsmenVsBowlersAllOppnAllMatchesRept(country1AllMatches,"country1",rank=0)
m
m <- teamBatsmenVsBowlersAllOppnAllMatchesRept(country1AllMatches,"country1",rank=1,dispRows=30)
m
m <- teamBatsmenVsBowlersAllOppnAllMatchesRept(matches=country1AllMatches,theTeam="country2",rank=1,dispRows=25)
m
```

### 7. Team batsmen vs bowler plot

```r
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(country1AllMatches,"country1",rank=1,dispRows=50)
d
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)
d <- teamBatsmenVsBowlersAllOppnAllMatchesRept(country1AllMatches,"country1",rank=2,dispRows=50)
teamBatsmenVsBowlersAllOppnAllMatchesPlot(d)
```

### 8. Team bowling scorecard

```r
teamBowlingScorecardAllOppnAllMatchesMain(matches=country1AllMatches,theTeam="country1")
teamBowlingScorecardAllOppnAllMatches(country1AllMatches,'country2')
```

### 9. Team bowler vs batsmen

```r
teamBowlersVsBatsmenAllOppnAllMatchesMain(country1AllMatches,theTeam="country1",rank=0)
teamBowlersVsBatsmenAllOppnAllMatchesMain(country1AllMatches,theTeam="country1",rank=2)
teamBowlersVsBatsmenAllOppnAllMatchesRept(matches=country1AllMatches,theTeam="country1",rank=0)
```

### 10. Team bowler vs batsmen plot

```r
df <- teamBowlersVsBatsmenAllOppnAllMatchesRept(country1AllMatches,theTeam="country1",rank=1)
teamBowlersVsBatsmenAllOppnAllMatchesPlot(df,"country1","country1")
```

### 11. Team bowler wicket kind

```r
teamBowlingWicketKindAllOppnAllMatches(country1AllMatches,t1="country1",t2="All")
teamBowlingWicketKindAllOppnAllMatches(country1AllMatches,t1="country1",t2="country2")
```

### 12. Team bowling wicket runs

```r
teamBowlingWicketRunsAllOppnAllMatches(country1AllMatches,t1="country1",t2="All",plot=TRUE)
teamBowlingWicketRunsAllOppnAllMatches(country1AllMatches,t1="country1",t2="country2",plot=TRUE)
```

### D) Batsman functions

Get the details for a batsman

```r
setwd("../BattingBowlingDetails")
kohli <- getBatsmanDetails(team="India",name="Kohli",dir=".")
batsmanDF <- getBatsmanDetails(team="country1",name="batsmanName",dir=".")
```

### 2. Runs vs deliveries

```r
batsmanRunsVsDeliveries(batsmanDF,"batsmanName")
```

### 3. Batsman 4s & 6s

```r
batsman46 <- select(batsmanDF,batsman,ballsPlayed,fours,sixes,runs)
p1 <- batsmanFoursSixes(batsman46,"batsmanName")
```

### 4. Batsman dismissals

```r
batsmanDismissals(batsmanDF,"batsmanName")
```

### 5. Runs vs Strike rate

```r
batsmanRunsVsStrikeRate(batsmanDF,"batsmanName")
```

### 6. Batsman Moving Average

```r
batsmanMovingAverage(batsmanDF,"batsmanName")
```

### 7. Batsman cumulative average

```r
batsmanCumulativeAverageRuns(batsmanDF,"batsmanName")
```

### 8. Batsman cumulative strike rate

```r
batsmanCumulativeStrikeRate(batsmanDF,"batsmanName")
```

### 9. Batsman runs against opposition

```r
batsmanRunsAgainstOpposition(batsmanDF,"batsmanName")
```

### 10. Batsman runs vs venue

```r
batsmanRunsVenue(batsmanDF,"batsmanName")
```

### 11. Batsman runs predict

```r
batsmanRunsPredict(batsmanDF,"batsmanName")
```

### 12. Bowler functions

For example, to get Ravichandran Ashwin’s bowling details

```r
setwd("../BattingBowlingDetails")
ashwin <- getBowlerWicketDetails(team="India",name="Ashwin",dir=".")
bowlerDF <- getBowlerWicketDetails(team="country1",name="bowlerName",dir=".")
```

### 13. Bowler Mean Economy rate

```r
bowlerMeanEconomyRate(bowlerDF,"bowlerName")
```

### 14. Bowler mean runs conceded

```r
bowlerMeanRunsConceded(bowlerDF,"bowlerName")
```

### 15. Bowler Moving Average

```r
bowlerMovingAverage(bowlerDF,"bowlerName")
```

### 16. Bowler cumulative average wickets

```r
bowlerCumulativeAvgWickets(bowlerDF,"bowlerName")
```

### 17. Bowler cumulative Economy Rate (ER)

```r
bowlerCumulativeAvgEconRate(bowlerDF,"bowlerName")
```

### 18. Bowler wicket plot

```r
bowlerWicketPlot(bowlerDF,"bowlerName")
```

### 19. Bowler wicket against opposition

```r
bowlerWicketsAgainstOpposition(bowlerDF,"bowlerName")
```

### 20. Bowler wicket at cricket grounds

```r
bowlerWicketsVenue(bowlerDF,"bowlerName")
```

### 21. Predict number of deliveries to wickets

```r
setwd("./T20Matches")
bowlerDF1 <- getDeliveryWickets(team="country1",dir=".",name="bowlerName",save=FALSE)
bowlerWktsPredict(bowlerDF1,"bowlerName")
```

# GooglyPlus: yorkr analyzes IPL players, teams, matches with plots and tables

In this post I introduce my new Shiny app, “GooglyPlus”, which is a more evolved version of my earlier Shiny app “Googly”. My R package ‘yorkr’, on which both these Shiny apps are based, can output either a dataframe or a plot, depending on the parameter plot=TRUE or FALSE.
My initial version of the app only included plots, and did not exercise the yorkr package fully. Moreover, I am certain that some cricket aficionados would prefer numbers to charts. Hence I have created this enhanced version of the Googly app and appropriately renamed it GooglyPlus. GooglyPlus is based on the yorkr package, which uses data from Cricsheet. The app is based on data from all IPL matches from 2008 up to 2016. Feel free to clone/fork or download the code from Github at GooglyPlus.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen and bowlers, besides evaluating team & match performances in Tests, ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the Kindle versions. The books can be accessed at Cricket analytics with cricketr and Beaten by sheer pace-Cricket analytics with yorkr. A must read for any cricket lover! Check it out!!

Click GooglyPlus to access the Shiny app!

The changes in GooglyPlus over the earlier Googly app are only in the following 3 tab panels

• IPL match
• Head to head
• Overall Performance

The IPL batsman and IPL bowler tabs are unchanged. These charts are as they were before.

The changes are only in the tabs i) IPL match ii) Head to head and iii) Overall Performance. New functionality has been added, and existing functions now have the dual option of displaying either a plot or a table.

The changes are

A) IPL Match
The following additions/enhancements have been done

- Match Batting Scorecard – Table
- Batting Partnerships – Plot, Table (New)
- Batsmen vs Bowlers – Plot, Table (New)
- Match Bowling Scorecard – Table (New)
- Bowling Wicket Kind – Plot, Table (New)
- Bowling Wicket Runs – Plot, Table (New)
- Bowling Wicket Match – Plot, Table (New)
- Bowler vs Batsmen – Plot, Table (New)
- Match Worm Graph – Plot

B) Head to head
The following functions have been added/enhanced

- Team Batsmen Batting Partnerships All Matches – Plot, Table {Summary (New) and Detailed (New)}
- Team Batting Scorecard All Matches – Table (New)
- Team Batsmen vs Bowlers all Matches – Plot, Table (New)
- Team Wickets Opposition All Matches – Plot, Table (New)
- Team Bowling Scorecard All Matches – Table (New)
- Team Bowler vs Batsmen All Matches – Plot, Table (New)
- Team Bowlers Wicket Kind All Matches – Plot, Table (New)
- Team Bowler Wicket Runs All Matches – Plot, Table (New)
- Win Loss All Matches – Plot

C) Overall Performance
The following additions/enhancements have been done in this tab

- Team Batsmen Partnerships Overall – Plot, Table {Summary (New) and Detailed (New)}
- Team Batting Scorecard Overall – Table (New)
- Team Batsmen vs Bowlers Overall – Plot, Table (New)
- Team Bowler vs Batsmen Overall – Plot, Table (New)
- Team Bowling Scorecard Overall – Table (New)
- Team Bowler Wicket Kind Overall – Plot, Table (New)

Included below are some random charts and tables. Feel free to explore the Shiny app further.

1) IPL Match
a) Match Batting Scorecard (Table only)
This is the batting scorecard for the Chennai Super Kings & Deccan Chargers match on 2011-05-11

b) Match batting partnerships (Plot)
Delhi Daredevils vs Kings XI Punjab – 2011-04-23

c) Match batting partnerships (Table)
The same batting partnerships for Delhi Daredevils vs Kings XI Punjab – 2011-04-23, as a table

d) Batsmen vs Bowlers (Plot)
Kolkata Knight Riders vs Mumbai Indians 2010-04-19

e) Match Bowling Scorecard (Table only)

2) Head to head
a) Team Batsmen Partnership (Plot)
Deccan Chargers vs Kolkata Knight Riders all matches

b) Team Batsmen Partnership (Summary – Table)
The following tables show that MS Dhoni has performed better than SK Raina in CSK vs DD matches, whereas SK Raina performs better than Dhoni in CSK vs KKR matches

i) Chennai Super Kings vs Delhi Daredevils (Summary – Table)

ii) Chennai Super Kings vs Kolkata Knight Riders (Summary – Table)

iii) Rising Pune Supergiants vs Gujarat Lions (Detailed – Table)
This table provides the detailed partnerships for RPS vs GL in all matches

c) Team Bowling Scorecard (Table only)
This table gives the bowling scorecard of Pune Warriors vs Deccan Chargers in all matches

3) Overall performances
a) Batting Scorecard All Matches (Table only)

This is the batting scorecard of Royal Challengers Bangalore. The top 3 batsmen are V Kohli, C Gayle and AB de Villiers, in that order

b) Batsman vs Bowlers all Matches (Plot)
This gives the performance of the Mumbai Indians batsman of Rank=1, who is Rohit Sharma, against bowlers of all other teams

c) Batsman vs Bowlers all Matches (Table)
The above plot as a table. It can be seen that Rohit Sharma has scored the most runs against M Morkel, then Shakib Al Hasan and then UT Yadav.

d) Bowling scorecard (Table only)
The table below gives the bowling scorecard of CSK. R Ashwin leads with a tally of 98 wickets, followed by DJ Bravo with 88 wickets and then JA Morkel with 83 wickets, in all matches against all teams

This is just a random selection of functions. Do play around with the app and check out how the different IPL batsmen, bowlers and teams stack up against each other. Do read my earlier post Googly: An interactive app for analyzing IPL players, matches and teams using R package yorkr for more details about the app and the other functions available.

Click GooglyPlus to access the Shiny app!

You can clone/fork/download the code from Github at GooglyPlus

Hope you have fun playing around with the Shiny app!

Note: In the tabs, not all controls are required for some of the functions. It is possible to enable the controls selectively, but this has not been done in the current version. I may make the changes some time in the future.

Take a look at my other Shiny apps
a. Revisiting crimes against women in India
b. Natural language processing: What would Shakespeare say?

To see all posts click Index of Posts

# Googly: An interactive app for analyzing IPL players, matches and teams using R package yorkr

Presenting ‘Googly’, a cool Shiny app that I developed over the last couple of days. This interactive Shiny app was on my mind for quite some time, and I finally got down to implementing it. The Googly Shiny app is based on my R package ‘yorkr’, which is now available on CRAN. The R package, and hence this Shiny app, is based on data from Cricsheet.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen and bowlers, besides evaluating team & match performances in Tests, ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr and Beaten by sheer pace-Cricket analytics with yorkr. A must read for any cricket lover! Check it out!!

Googly is based on R package yorkr, and uses the data of all IPL matches from 2008 up to 2016, available on Cricsheet. Googly can do detailed analyses of

a) Individual IPL batsman
b) Individual IPL bowler
c) Any IPL match
d) Head to head confrontation between 2 IPL teams
e) All matches of an IPL team against all other teams

With respect to the individual IPL batsman and bowler performance, I was in a bit of a ‘bind’ literally (pun unintended), as any IPL player could have played in more than 1 IPL team. Fortunately ‘rbind’ came to my rescue. I just get the batsman’s/bowler’s performance in each IPL team, and then consolidate it into a single large dataframe on which to do the analyses.

The Shiny app can be accessed at Googly. The code for Googly is available at Github. Feel free to clone/download/fork the code from Googly.

Check out my 2 books on cricket, a) Cricket analytics with cricketr b) Beaten by sheer pace – Cricket analytics with yorkr, now available in both paperback & kindle versions on Amazon!!! Pick up your copies today!

Based on the 5 detailed analysis domains there are 5 tabs.

IPL Batsman: This tab can be used to perform analysis of all IPL batsmen. If a batsman has played in more than 1 team, then the overall performance is considered. There are 10 functions for the IPL Batsman. They are shown below

1. Batsman Runs vs. Deliveries
2. Batsman’s Fours & Sixes
3. Dismissals of batsman
4. Batsman’s Runs vs Strike Rate
5. Batsman’s Moving Average
6. Batsman’s Cumulative Average Run
7. Batsman’s Cumulative Strike Rate
8. Batsman’s Runs against Opposition
9. Batsman’s Runs at Venue
10. Predict Runs of batsman

IPL Bowler: This tab can be used to analyze individual IPL bowlers. The functions handle IPL bowlers who have played in more than 1 IPL team.

1. Mean Economy Rate of bowler
2. Mean runs conceded by bowler
3. Bowler’s Moving Average
4. Bowler’s Cumulative Avg. Wickets
5. Bowler’s Cumulative Avg. Economy Rate
6. Bowler’s Wicket Plot
7. Bowler’s Wickets against opposition
8. Bowler’s Wickets at Venues
9. Bowler’s wickets prediction

IPL match: This tab can be used for analyzing individual IPL matches. The available functions are

1. Batting Partnerships
2. Batsmen vs Bowlers
3. Bowling Wicket Kind
4. Bowling Wicket Runs
5. Bowling Wicket Match
6. Bowler vs Batsmen
7. Match Worm Graph

Head to head: This tab can be used for analyzing head-to-head confrontations between any 2 IPL teams, for e.g. all matches between Chennai Super Kings vs. Deccan Chargers or Kolkata Knight Riders vs. Delhi Daredevils. The available functions are

1. Team Batsmen Batting Partnerships All Matches
2. Team Batsmen vs Bowlers all Matches
3. Team Wickets Opposition All Matches
4. Team Bowler vs Batsmen All Matches
5. Team Bowlers Wicket Kind All Matches
6. Team Bowler Wicket Runs All Matches
7. Win Loss All Matches

Overall performance: This tab can be used to analyze the overall performance of any IPL team. For this analysis all matches played by this team are considered. The available functions are

1. Team Batsmen Partnerships Overall
2. Team Batsmen vs Bowlers Overall
3. Team Bowler vs Batsmen Overall
4. Team Bowler Wicket Kind Overall

Below I include a random set of charts that are generated in each of the 5 tabs

A. IPL Batsman
a. A Symonds : Runs vs Deliveries
b. AB Devilliers – Cumulative Strike Rate
c. Gautam Gambhir – Runs at venues
d. CH Gayle – Predict runs

B. IPL Bowler
a. Ashish Nehra – Cumulative Average Wickets
b. DJ Bravo – Moving Average of wickets
c. R Ashwin – Mean Economy rate vs Overs

C. IPL Match
a. Chennai Super Kings vs Deccan Chargers (2008-05-06) – Batsmen Partnerships
Note: You can choose either team in the match from the drop down ‘Choose team’
b. Kolkata Knight Riders vs Delhi Daredevils (2013-04-02) – Bowling wicket runs
c. Mumbai Indians vs Kings XI Punjab (2010-03-30) – Match worm graph

D. Head to head confrontation
a. Rising Pune Supergiants vs Mumbai Indians in all matches – Team batsmen partnerships
Note: You can choose the partnership of either team in the drop down ‘Choose team’
b. Gujarat Lions – Royal Challengers Bangalore all matches – Bowlers performance against batsmen

E. Overall Performance
a. Royal Challengers Bangalore overall performance – Batsman Partnership (Rank=1)
This is Virat Kohli for RCB. Try out other ranks
b. Rajasthan Royals overall Performance – Bowler vs batsman (Rank=2)
This is Vinay Kumar.

The Shiny app Googly can be accessed at Googly. Feel free to clone/fork the code from Github at Googly.

For details on my R package yorkr, please see my blog Giga thoughts. There are more than 15 posts detailing the functions and their usage.

Do bowl a Googly!!!

You may like my other Shiny apps

Also see my other posts

For more posts see Index of posts

# cricketr sizes up legendary All-rounders of yesteryear

# Introduction

This is a post I have been wanting to write for several months, but had to put it off for one reason or another. In this post I use my R package cricketr to analyze the performance of All-rounder greats namely Kapil Dev, Ian Botham, Imran Khan and Richard Hadlee. All these players had talent that was natural and raw. They were good strikers of the ball and extremely lethal with their bowling. The ODI data for these players have been taken from ESPN Cricinfo.
Please be mindful of the ESPN Cricinfo Terms of Use.

You can also read this post at Rpubs as cricketr-AR. Download this report as a PDF file from cricketr-AR.

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricketr template from Github (which is the R Markdown file I have used for the analysis below). You will only need to make appropriate changes for the players you are interested in. Only a familiarity with R and R Markdown is needed.

All Rounders

1. Kapil Dev (Ind)
2. Ian Botham (Eng)
3. Imran Khan (Pak)
4. Richard Hadlee (NZ)

I have sprinkled the plots with a few of my comments. Feel free to draw your conclusions! The analysis is included below

if (!require("cricketr")){
    install.packages("cricketr")
}
library(cricketr)

The data for any particular ODI player can be obtained with the getPlayerDataOD() function. To do this you will need to go to ESPN CricInfo Player and type in the name of the player, for e.g. Kapil Dev, etc. This will bring up a page which has the profile number for the player e.g. for Kapil Dev this would be http://www.espncricinfo.com/india/content/player/30028.html. Hence, Kapil’s profile is 30028. This can be used to get Kapil Dev’s data as shown below. I have already executed the below 4 commands and I will use the files to run further commands

#kapil1
#botham1
#imran1
#hadlee1

## Analyses of batting performances of the All Rounders

The following plots give the analysis of the 4 ODI batsmen

1. Kapil Dev (Ind) – Innings = 225, Runs = 3783, Average = 23.79, Strike Rate = 95.07
2. Ian Botham (Eng) – Innings = 116, Runs = 2113, Average = 23.21, Strike Rate = 79.10
3. Imran Khan (Pak) – Innings = 175, Runs = 3709, Average = 33.41, Strike Rate = 72.65
4. Richard Hadlee (NZ) – Innings = 115, Runs = 1751, Average = 21.61, Strike Rate = 75.50

## Plot of 4s, 6s and the scoring rate in ODIs

The 3 charts below give the number of

1. 4s vs Runs scored
2. 6s vs Runs scored
3. Balls faced vs Runs scored

A regression line is fitted in each of these plots for each of the ODI batsmen

A. Kapil Dev

It can be seen that Kapil scores four 4’s when he scores 50. Also, after facing 50 deliveries he scores around 43.

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./kapil1.csv","Kapil")
batsman6s("./kapil1.csv","Kapil")
batsmanScoringRateODTT("./kapil1.csv","Kapil")

dev.off()
## null device
##           1

B. Ian Botham

Botham scores around 39 runs after 50 deliveries

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./botham1.csv","Botham")
batsman6s("./botham1.csv","Botham")
batsmanScoringRateODTT("./botham1.csv","Botham")

dev.off()
## null device
##           1

C. Imran Khan

Imran scores around 36 runs for 50 deliveries

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./imran1.csv","Imran")
batsman6s("./imran1.csv","Imran")
batsmanScoringRateODTT("./imran1.csv","Imran")

dev.off()
## null device
##           1

D. Richard Hadlee

Hadlee scores around 30 runs facing 50 deliveries

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./hadlee1.csv","Hadlee")
batsman6s("./hadlee1.csv","Hadlee")
batsmanScoringRateODTT("./hadlee1.csv","Hadlee")

dev.off()
## null device
##           1

## Cumulative Average runs of batsman in career

Kapil’s cumulative average runs drops over the last 15 innings, whereas Botham had a good run towards the end of his career. Imran’s performance as a batsman really peaks towards the end with a cumulative average of almost 25 runs. Hadlee has a steady performance.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("./kapil1.csv","Kapil")
batsmanCumulativeAverageRuns("./botham1.csv","Botham")
batsmanCumulativeAverageRuns("./imran1.csv","Imran")
batsmanCumulativeAverageRuns("./hadlee1.csv","Hadlee")

dev.off()
## null device
##           1

## Cumulative Average strike rate of batsman in career

Kapil’s strike rate is superlative, touching the 90’s steadily. Botham’s strike rate drops dramatically towards the latter part of his career.
Imran averages a steady 75 and Hadlee averages around 85.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("./kapil1.csv","Kapil")
batsmanCumulativeStrikeRate("./botham1.csv","Botham")
batsmanCumulativeStrikeRate("./imran1.csv","Imran")
batsmanCumulativeStrikeRate("./hadlee1.csv","Hadlee")

dev.off()
## null device
##           1

## Relative Mean Strike Rate

Kapil tops the strike rate among all the all-rounders. This is really a revelation to me. This can also be seen in the original data, where Kapil’s strike rate is a whopping 95.07 in comparison to Botham, Imran and Hadlee, who are at 79.10, 72.65 and 75.50 respectively

par(mar=c(4,4,2,2))
frames <- list("./kapil1.csv","./botham1.csv","./imran1.csv","./hadlee1.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBatsmanSRODTT(frames,names)

## Relative Runs Frequency Percentage

This plot shows that Imran has a much better average runs scored than the other all-rounders, followed by Kapil

frames <- list("./kapil1.csv","./botham1.csv","./imran1.csv","./hadlee1.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeRunsFreqPerfODTT(frames,names)

## Relative cumulative average runs in career

It can be seen clearly that Imran Khan leads the pack in cumulative average runs followed by Kapil Dev and then Botham

frames <- list("./kapil1.csv","./botham1.csv","./imran1.csv","./hadlee1.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBatsmanCumulativeAvgRuns(frames,names)

## Relative cumulative average strike rate in career

In the cumulative strike rate Hadlee and Kapil run a close race.

frames <- list("./kapil1.csv","./botham1.csv","./imran1.csv","./hadlee1.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBatsmanCumulativeStrikeRate(frames,names)

## Percent 4’s,6’s in total runs scored

The plot below shows the contribution of 4s and 6s to the total runs scored

frames <- list("./kapil1.csv","./botham1.csv","./imran1.csv","./hadlee1.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
runs4s6s <- batsman4s6s(frames,names)
print(runs4s6s)
##                Kapil Botham Imran Hadlee
## Runs(1s,2s,3s) 72.08  66.53 77.53  73.27
## 4s             21.98  25.78 17.61  21.08
## 6s              5.94   7.68  4.86   5.65

## Runs forecast

The forecast for the batsmen is shown below.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./kapil1.csv","Kapil")
batsmanPerfForecast("./botham1.csv","Botham")
batsmanPerfForecast("./imran1.csv","Imran")
batsmanPerfForecast("./hadlee1.csv","Hadlee")

dev.off()
## null device
##           1

## 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./kapil1.csv","Kapil")
battingPerf3d("./botham1.csv","Botham")

dev.off()
## null device
##           1

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./imran1.csv","Imran")
battingPerf3d("./hadlee1.csv","Hadlee")

dev.off()
## null device
##           1

## Predicting Runs given Balls Faced and Minutes at Crease

A multi-variate regression plane is fitted between Runs and Balls faced + Minutes at crease.

BF <- seq(10, 200, length=10)
Mins <- seq(30, 220, length=10)
newDF <- data.frame(BF,Mins)
kapil <- batsmanRunsPredict("./kapil1.csv","Kapil",newdataframe=newDF)
botham <- batsmanRunsPredict("./botham1.csv","Botham",newdataframe=newDF)
imran <- batsmanRunsPredict("./imran1.csv","Imran",newdataframe=newDF)
hadlee <- batsmanRunsPredict("./hadlee1.csv","Hadlee",newdataframe=newDF)

The fitted model is then used to predict the runs that the batsmen will score for a hypothetical Balls faced and Minutes at crease.
It can be seen that Kapil is the best bet for a given balls faced and minutes at crease, followed by Botham for the longer innings (Hadlee is ahead of Botham for the shorter ones).

batsmen <- cbind(round(kapil$Runs),round(botham$Runs),round(imran$Runs),round(hadlee$Runs))
colnames(batsmen) <- c("Kapil","Botham","Imran","Hadlee")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Kapil Botham Imran Hadlee
## 1          10           30    16      6    10     15
## 2          31           51    33     22    22     28
## 3          52           72    49     38    33     42
## 4          73           93    65     54    45     56
## 5          94          114    81     70    56     70
## 6         116          136    97     86    67     84
## 7         137          157   113    102    79     97
## 8         158          178   130    117    90    111
## 9         179          199   146    133   102    125
## 10        200          220   162    149   113    139
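The table above comes from cricketr's batsmanRunsPredict, which fits a multi-variate regression plane. To make the technique concrete, here is a minimal Python sketch (Python being the language of cricpy) of ordinary least squares with two predictors, runs = b0 + b1 * balls + b2 * minutes, solved through the normal equations. It assumes a plain linear model with an intercept and is not the package's own code; the function names and toy data are hypothetical.

```python
from typing import List, Sequence


def fit_runs_model(balls: Sequence[float], mins: Sequence[float],
                   runs: Sequence[float]) -> List[float]:
    """Hypothetical helper: fit runs = b0 + b1*balls + b2*mins by least squares.

    Solves the normal equations (X^T X) b = X^T y with Gaussian elimination.
    """
    rows = [[1.0, b, m] for b, m in zip(balls, mins)]
    n = 3
    # Augmented matrix [X^T X | X^T y]
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * y for r, y in zip(rows, runs))]
           for i in range(n)]
    # Forward elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    coef = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (aug[i][n] - known) / aug[i][i]
    return coef


def predict_runs(coef: Sequence[float], balls: Sequence[float],
                 mins: Sequence[float]) -> List[float]:
    """Apply the fitted plane to new (balls faced, minutes at crease) pairs."""
    b0, b1, b2 = coef
    return [b0 + b1 * b + b2 * m for b, m in zip(balls, mins)]
```

Note that the fit needs innings in which balls faced and minutes at the crease are not perfectly collinear; otherwise X^T X is singular.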

## Highest runs likelihood

The plots below show the runs likelihood of each batsman. This uses K-Means.

A. Kapil Dev

batsmanRunsLikelihood("./kapil1.csv","Kapil")

## Summary of  Kapil 's runs scoring likelihood
## **************************************************
##
## There is a 34.57 % likelihood that Kapil  will make  22 Runs in  24 balls over 34  Minutes
## There is a 17.28 % likelihood that Kapil  will make  46 Runs in  46 balls over  65  Minutes
## There is a 48.15 % likelihood that Kapil  will make  5 Runs in  7 balls over 9  Minutes

B. Ian Botham

batsmanRunsLikelihood("./botham1.csv","Botham")

## Summary of  Botham 's runs scoring likelihood
## **************************************************
##
## There is a 47.95 % likelihood that Botham  will make  9 Runs in  12 balls over 15  Minutes
## There is a 39.73 % likelihood that Botham  will make  23 Runs in  32 balls over  44  Minutes
## There is a 12.33 % likelihood that Botham  will make  59 Runs in  74 balls over 101  Minutes

C. Imran Khan

batsmanRunsLikelihood("./imran1.csv","Imran")

## Summary of  Imran 's runs scoring likelihood
## **************************************************
##
## There is a 23.33 % likelihood that Imran  will make  36 Runs in  54 balls over 74  Minutes
## There is a 60 % likelihood that Imran  will make  14 Runs in  18 balls over  23  Minutes
## There is a 16.67 % likelihood that Imran  will make  53 Runs in  90 balls over 115  Minutes

D. Richard Hadlee

batsmanRunsLikelihood("./hadlee1.csv","Hadlee")

## Summary of  Hadlee 's runs scoring likelihood
## **************************************************
##
## There is a 6.1 % likelihood that Hadlee  will make  64 Runs in  79 balls over 90  Minutes
## There is a 42.68 % likelihood that Hadlee  will make  25 Runs in  33 balls over  44  Minutes
## There is a 51.22 % likelihood that Hadlee  will make  9 Runs in  11 balls over 15  Minutes
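Each line of the summaries above appears to be a K-Means cluster of innings (runs, balls faced, minutes at the crease): the cluster's share of all innings plus its centroid. A minimal Python sketch of that idea follows, using plain Lloyd's algorithm with k = 3. The deterministic initialisation (evenly spaced innings after sorting by runs) is my assumption for reproducibility; R's kmeans() starts from random centres, so real results may differ. The helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (runs, balls faced, minutes at crease)


def kmeans(points: Sequence[Point], k: int = 3, iters: int = 100):
    """Lloyd's algorithm with an assumed deterministic initialisation."""
    ordered = sorted(points)
    step = (len(ordered) - 1) / max(k - 1, 1)
    centroids = [ordered[round(i * step)] for i in range(k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centre
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


def runs_likelihood(points: Sequence[Point], k: int = 3):
    """Hypothetical summary: (% of innings, runs, balls, minutes) per cluster."""
    centroids, labels = kmeans(points, k)
    summary = []
    for c, (runs, balls, mins) in enumerate(centroids):
        share = 100.0 * labels.count(c) / len(points)
        summary.append((round(share, 2), round(runs), round(balls), round(mins)))
    return summary
```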

## Average runs at ground and against opposition

A. Kapil Dev

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./kapil1.csv","Kapil")
batsmanAvgRunsOpposition("./kapil1.csv","Kapil")

dev.off()
## null device
##           1

B. Ian Botham

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./botham1.csv","Botham")
batsmanAvgRunsOpposition("./botham1.csv","Botham")

dev.off()
## null device
##           1

C. Imran Khan

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./imran1.csv","Imran")
batsmanAvgRunsOpposition("./imran1.csv","Imran")

dev.off()
## null device
##           1

D. Richard Hadlee

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsOpposition("./hadlee1.csv","Hadlee")

dev.off()
## null device
##           1
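Average runs at a ground or against an opposition is essentially a group-by mean. A minimal Python sketch is below. The row layout (dicts with "Runs", "Ground" and "Opposition" keys) is a hypothetical stand-in for the player's CSV, and the plain mean of runs per innings is an assumption: cricketr's own functions may treat not-outs or did-not-bat rows differently.

```python
from collections import defaultdict
from typing import Dict, List


def average_runs_by(innings: List[dict], key: str) -> Dict[str, float]:
    """Mean runs per group, e.g. key="Ground" or key="Opposition".

    Column names are illustrative assumptions about the CSV layout.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in innings:
        totals[row[key]] += row["Runs"]
        counts[row[key]] += 1
    return {group: totals[group] / counts[group] for group in totals}
```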

## Moving Average of runs over career

The moving average for the 4 batsmen indicates the following:

Kapil’s performance drops significantly, while there is a slump in Botham’s performance. On the other hand, Imran’s and Hadlee’s performances were on the upswing.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./kapil1.csv","Kapil")
batsmanMovingAverage("./botham1.csv","Botham")
batsmanMovingAverage("./imran1.csv","Imran")
batsmanMovingAverage("./hadlee1.csv","Hadlee")

dev.off()
## null device
##           1
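A moving average smooths the innings-by-innings runs so that the trend shows through. The sketch below is a simple trailing moving average in Python; the window of 5 innings is an arbitrary assumption, and cricketr's batsmanMovingAverage may use a different smoother, so treat this as an illustration of the idea rather than the package's method.

```python
from typing import List, Sequence


def moving_average(runs: Sequence[float], window: int = 5) -> List[float]:
    """Trailing moving average: the mean of each run of `window` consecutive innings."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running = 0.0
    for i, r in enumerate(runs):
        running += r
        if i >= window:
            running -= runs[i - window]  # drop the innings leaving the window
        if i >= window - 1:
            averages.append(running / window)
    return averages
```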

## Check batsmen in-form, out-of-form

**************************** Form status of Kapil ****************************

Population size: 72
Mean of population: 19.38
Sample size: 9 Mean of sample: 6.78 SD of sample: 6.14

Null hypothesis H0 : Kapil’s sample average is within 95% confidence interval of population average
Alternative hypothesis Ha : Kapil’s sample average is below the 95% confidence interval of population average

Kapil’s Form Status: Out-of-Form because the p value: 8.4e-05 is less than alpha= 0.05

**************************** Form status of Botham ****************************

Population size: 65
Mean of population: 21.29
Sample size: 8 Mean of sample: 15.38 SD of sample: 13.19

Null hypothesis H0 : Botham’s sample average is within 95% confidence interval of population average
Alternative hypothesis Ha : Botham’s sample average is below the 95% confidence interval of population average

Botham’s Form Status: In-Form because the p value: 0.120342 is greater than alpha= 0.05

**************************** Form status of Imran ****************************

Population size: 54
Mean of population: 24.94
Sample size: 6 Mean of sample: 30.83 SD of sample: 25.4

Null hypothesis H0 : Imran’s sample average is within 95% confidence interval of population average
Alternative hypothesis Ha : Imran’s sample average is below the 95% confidence interval of population average

Imran’s Form Status: In-Form because the p value: 0.704683 is greater than alpha= 0.05

**************************** Form status of Hadlee ****************************

Population size: 73
Mean of population: 18
Sample size: 9 Mean of sample: 27 SD of sample: 24.27

Null hypothesis H0 : Hadlee’s sample average is within 95% confidence interval of population average
Alternative hypothesis Ha : Hadlee’s sample average is below the 95% confidence interval of population average

Hadlee’s Form Status: In-Form because the p value: 0.85262 is greater than alpha= 0.05
*******************************************************************************************
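The form check appears to compare the mean of a player's most recent innings (the sample) with the mean of the earlier innings (the population), declaring the player out of form when a one-sided p value falls below alpha = 0.05. A one-sample, one-sided Student's t test is a natural way to do this, and the Python sketch below implements it under that assumption. Because the standard library has no t distribution, the CDF is obtained by integrating the density with Simpson's rule. The function names are hypothetical, and the exact test cricketr runs may differ, so the p values need not match the output above.

```python
import math
import statistics
from typing import Sequence, Tuple


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): 0.5 plus the signed integral of the density from 0 to x.

    Uses Simpson's rule, so `steps` must be even.
    """
    if x == 0:
        return 0.5
    h = x / steps  # negative when x < 0, giving a negative (signed) area
    area = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + area * h / 3


def check_in_form(population_mean: float, sample: Sequence[float],
                  alpha: float = 0.05) -> Tuple[float, float, str]:
    """Hypothetical one-sided t test; Ha: the sample mean is below population_mean."""
    n = len(sample)
    sd = statistics.stdev(sample)  # sample SD with an n - 1 denominator
    t_stat = (statistics.mean(sample) - population_mean) / (sd / math.sqrt(n))
    p_value = t_cdf(t_stat, n - 1)  # lower-tail probability
    status = "Out-of-Form" if p_value < alpha else "In-Form"
    return t_stat, p_value, status
```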

## Analyses of bowling performances of the All Rounders

The following plots give the analysis of the 4 ODI bowlers

1. Kapil Dev (Ind) – Innings = 225, Wickets = 253, Average = 27.45, Economy Rate = 3.71
2. Ian Botham (Eng) – Innings = 116, Wickets = 145, Average = 28.54, Economy Rate = 3.96
3. Imran Khan (Pak) – Innings = 175, Wickets = 182, Average = 26.61, Economy Rate = 3.89
4. Richard Hadlee (NZ) – Innings = 115, Wickets = 158, Average = 21.56, Economy Rate = 3.30

Kapil Dev has the highest number of innings and wickets, followed by Imran Khan. Botham and Hadlee have relatively fewer innings.
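The bowling figures above follow the standard definitions: bowling average is runs conceded per wicket, and economy rate is runs conceded per six-ball over. A minimal Python sketch of both is below, with overs in the usual cricket notation where 9.3 means 9 overs and 3 balls. The (overs, runs, wickets) tuple layout and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def overs_to_balls(overs: str) -> int:
    """Convert cricket notation such as '9.3' (9 overs, 3 balls) to balls."""
    whole, _, part = overs.partition(".")
    return int(whole) * 6 + (int(part) if part else 0)


def bowling_summary(spells: Sequence[Tuple[str, int, int]]) -> Tuple[float, float]:
    """Return (bowling average, economy rate) for (overs, runs, wickets) spells."""
    balls = sum(overs_to_balls(o) for o, _, _ in spells)
    runs = sum(r for _, r, _ in spells)
    wickets = sum(w for _, _, w in spells)
    average = runs / wickets if wickets else float("inf")  # runs per wicket
    economy = runs / (balls / 6)  # runs per six-ball over
    return round(average, 2), round(economy, 2)
```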

To get the bowlers’ data use

#kapil2
#botham2
#imran2
#hadlee2 


## Wicket Frequency percentage

This plot gives the percentage of innings in which the bowler took each number of wickets (1, 2, 3, etc.).

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./kapil2.csv","Kapil")
bowlerWktsFreqPercent("./botham2.csv","Botham")
bowlerWktsFreqPercent("./imran2.csv","Imran")
bowlerWktsFreqPercent("./hadlee2.csv","Hadlee")

dev.off()
## null device
##           1
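The wicket frequency percentage is a frequency table of wickets per innings scaled to percentages. A minimal Python sketch with collections.Counter follows. The input (a list of wickets taken in each innings) and the helper name are hypothetical, and whether wicketless innings are counted is an assumption, since the text above lists the bins as 1, 2, 3 and so on; the `include_zero` flag makes that choice explicit.

```python
from collections import Counter
from typing import Dict, Sequence


def wicket_frequency_percent(wickets_per_innings: Sequence[int],
                             include_zero: bool = True) -> Dict[int, float]:
    """Percentage of innings in which each number of wickets was taken.

    include_zero=False drops wicketless innings before computing percentages
    (an assumption about how the original plot is built).
    """
    data = [w for w in wickets_per_innings if include_zero or w > 0]
    counts = Counter(data)
    total = len(data)
    return {w: round(100.0 * counts[w] / total, 2) for w in sorted(counts)}
```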

## Wickets Runs plot

The plot below gives a boxplot of the range of runs conceded for each number of wickets taken by the bowlers.

par(mfrow=c(1,4))
par(mar=c(4,4,2,2))

bowlerWktsRunsPlot("./kapil2.csv","Kapil")
bowlerWktsRunsPlot("./botham2.csv","Botham")
bowlerWktsRunsPlot("./imran2.csv","Imran")
bowlerWktsRunsPlot("./hadlee2.csv","Hadlee")

dev.off()
## null device
##           1
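Each box in the plot above summarises the runs a bowler conceded in innings with a given wicket haul. The Python sketch below computes the same kind of five-number summary (minimum, lower quartile, median, upper quartile, maximum) per wicket count with statistics.quantiles. The (runs, wickets) input layout and the helper name are hypothetical, and R's boxplot() draws Tukey's hinges, which can differ slightly from these quartiles.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Summary = Tuple[float, float, float, float, float]


def runs_by_wickets(spells: Sequence[Tuple[int, int]]) -> Dict[int, Summary]:
    """Five-number summary of runs conceded, grouped by wickets taken.

    `spells` holds hypothetical (runs_conceded, wickets) pairs, one per innings.
    """
    groups: Dict[int, List[int]] = defaultdict(list)
    for runs, wickets in spells:
        groups[wickets].append(runs)
    summary = {}
    for wickets in sorted(groups):
        values = sorted(groups[wickets])
        if len(values) >= 2:
            # "inclusive" interpolates between min and max, like R's default quantile
            q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        else:
            q1 = median = q3 = values[0]
        summary[wickets] = (values[0], q1, median, q3, values[-1])
    return summary
```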

## Cumulative average wicket plot

Botham has the best cumulative average wickets, touching almost 1.6 wickets per innings, followed by Hadlee.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerCumulativeAvgWickets("./kapil2.csv","Kapil")

bowlerCumulativeAvgWickets("./botham2.csv","Botham")

bowlerCumulativeAvgWickets("./imran2.csv","Imran")

bowlerCumulativeAvgWickets("./hadlee2.csv","Hadlee")

dev.off()
## null device
##           1
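The cumulative average wickets curve is a running mean: after innings i it is the total wickets so far divided by i. A short Python sketch with itertools.accumulate follows; the helper name is hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_average(values: Sequence[float]) -> List[float]:
    """Running mean: element i is the mean of values[0..i] (hypothetical helper)."""
    return [total / count for count, total in enumerate(accumulate(values), start=1)]
```

For example, wickets of 2, 0, 1 and 3 in four innings give a running mean of 2.0, 1.0, 1.0 and 1.5.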
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerCumulativeAvgEconRate("./kapil2.csv","Kapil")

bowlerCumulativeAvgEconRate("./botham2.csv","Botham")

bowlerCumulativeAvgEconRate("./imran2.csv","Imran")

bowlerCumulativeAvgEconRate("./hadlee2.csv","Hadlee")

dev.off()
## null device
##           1

## Average wickets in different grounds and opposition

A. Kapil Dev

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./kapil2.csv","Kapil")
bowlerAvgWktsOpposition("./kapil2.csv","Kapil")

dev.off()
## null device
##           1

B. Ian Botham

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./botham2.csv","Botham")
bowlerAvgWktsOpposition("./botham2.csv","Botham")

dev.off()
## null device
##           1

C. Imran Khan

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("./imran2.csv","Imran")
bowlerAvgWktsOpposition("./imran2.csv","Imran")

dev.off()
## null device
##           1

D. Richard Hadlee

par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerAvgWktsOpposition("./hadlee2.csv","Hadlee")

dev.off()
## null device
##           1

## Relative bowling performance

It can be seen that Botham is the most effective wicket taker of the lot

frames <- list("./kapil2.csv","./botham2.csv","./imran2.csv","./hadlee2.csv")
names <- list("Kapil","Botham","Imran","Hadlee")
relativeBowlingPerf(frames,names)

## Relative Economy Rate against wickets taken

Hadlee has the best overall economy rate followed by Kapil Dev

frames <- list("./kapil2.csv","./botham2.csv","./imran2.csv","./hadlee2.csv")
relativeBowlingERODTT(frames,names)

## Relative cumulative average wickets of bowlers in career

This plot confirms the wicket taking ability of Botham followed by Hadlee

frames <- list("./kapil2.csv","./botham2.csv","./imran2.csv","./hadlee2.csv")
relativeBowlerCumulativeAvgWickets(frames,names)

## Relative cumulative average economy rate of bowlers

This plot shows that Hadlee has the best economy rate followed by Kapil

frames <- list("./kapil2.csv","./botham2.csv","./imran2.csv","./hadlee2.csv")
relativeBowlerCumulativeAvgEconRate(frames,names)

## Moving average of wickets over career

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./kapil2.csv","Kapil")
bowlerMovingAverage("./botham2.csv","Botham")
bowlerMovingAverage("./imran2.csv","Imran")
bowlerMovingAverage("./hadlee2.csv","Hadlee")

dev.off()
## null device
##           1

## Wickets forecast

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerPerfForecast("./kapil2.csv","Kapil")
bowlerPerfForecast("./botham2.csv","Botham")
bowlerPerfForecast("./imran2.csv","Imran")
bowlerPerfForecast("./hadlee2.csv","Hadlee")

dev.off()
## null device
##           1

## Check bowler in-form, out-of-form

**************************** Form status of Kapil ****************************

Population size: 198
Mean of population: 1.2
Sample size: 23 Mean of sample: 0.65 SD of sample: 0.83

Null hypothesis H0 : Kapil’s sample average is within 95% confidence interval of population average
Alternative hypothesis Ha : Kapil’s sample average is below the 95% confidence interval of population average

Kapil’s Form Status: Out-of-Form because the p value: 0.002097 is less than alpha= 0.05

**************************** Form status of Botham ****************************

Population size: 166
Mean of population: 1.58
Sample size: 19 Mean of sample: 1.47 SD of sample: 1.12

Null hypothesis H0 : Botham’s sample average is within 95% confidence interval of population average
Alternative hypothesis Ha : Botham’s sample average is below the 95% confidence interval of population average

Botham’s Form Status: In-Form because the p value: 0.336694 is greater than alpha= 0.05

**************************** Form status of Imran ****************************

Population size: 137
Mean of population: 1.23
Sample size: 16 Mean of sample: 0.81 SD of sample: 0.91

Null hypothesis H0 : Imran’s sample average is within 95% confidence interval of population average
Alternative hypothesis Ha : Imran’s sample average is below the 95% confidence interval of population average

Imran’s Form Status: Out-of-Form because the p value: 0.041727 is less than alpha= 0.05

**************************** Form status of Hadlee ****************************

Population size: 100
Mean of population: 1.38
Sample size: 12 Mean of sample: 1.67 SD of sample: 1.37

Null hypothesis H0 : Hadlee’s sample average is within 95% confidence interval of population average
Alternative hypothesis Ha : Hadlee’s sample average is below the 95% confidence interval of population average

Hadlee’s Form Status: In-Form because the p value: 0.761265 is greater than alpha= 0.05
*******************************************************************************************

# Key findings

Here are some key conclusions about the 4 ODI all-rounders

1. Kapil Dev’s strike rate stands high above the other 3
2. Imran Khan has the best cumulative average runs followed by Kapil
3. Botham is the most effective wicket taker followed by Hadlee
4. Hadlee is the most economical bowler and is followed by Kapil Dev
5. For a hypothetical Balls Faced and Minutes at crease, Kapil will score the most runs, followed by Botham
6. The moving average indicates that the best is yet to come for Imran and Hadlee, while Kapil and Botham were on the decline

Also see my other posts in R

For a full list of posts see Index of posts

# IBM Data Science Experience:  First steps with yorkr

Fresh, and slightly dizzy, from my foray into Quantum Computing with IBM’s Quantum Experience, I now turn my attention to IBM’s Data Science Experience (DSE).

I am on the verge of completing a really great 3-module ‘Data Science and Engineering with Spark XSeries’ from the University of California, Berkeley, and I have been thinking of trying out some form of integrated delivery platform for performing analytics for quite some time. Coincidentally, IBM came out with its Data Science Experience a month back. There are a couple of other collaborative platforms available for playing around with Apache Spark or Data Analytics, namely Jupyter notebooks, Databricks and Data.world.

I decided to go ahead with IBM’s Data Science Experience as the GUI is a lot cooler, includes shared data sets and integrates with Object Storage, Cloudant DB etc., which seemed a lot closer to the cloud, literally! IBM’s DSE is an interactive, collaborative, cloud-based environment for performing data analysis with Apache Spark. DSE is hosted on IBM’s PaaS environment, Bluemix. It should be possible to access the plethora of cloud services available on Bluemix from within DSE. IBM’s DSE uses Jupyter notebooks for creating and analyzing data; these can be easily shared and have access to a few hundred publicly available datasets.

Disclaimer: This article represents the author’s viewpoint only and doesn’t necessarily represent IBM’s positions, strategies or opinions

In this post, I use IBM’s DSE and my R package yorkr, for analyzing the performance of 1 ODI match (Aus-Ind, 2 Feb 2012)  and the batting performance of Virat Kohli in IPL matches. These are my ‘first’ steps in DSE so, I use plain old “R language” for analysis together with my R package ‘yorkr’. I intend to  do more interesting stuff on Machine learning with SparkR, Sparklyr and PySpark in the weeks and months to come.

You can check out the Jupyter notebooks created with IBM’s DSE at Github – “Using R package yorkr – A quick overview” and on NBviewer at “Using R package yorkr – A quick overview”.

Working with Jupyter notebooks is fairly straightforward, and they can handle code in R, Python and Scala. Each cell can contain code (R, Python or Scala), Markdown text, NBConvert or a Heading. The code is written into the cells and can be executed sequentially. Here is a screen shot of the notebook.

The ‘File’ menu can be used for ‘saving and checkpointing’ or ‘reverting’ to a checkpoint. The ‘kernel’ menu can be used to start, interrupt, restart and run all cells etc. The Data Sources icon can be used to load data sources to your code. The data is uploaded to Object Storage with appropriate credentials, and you will have to import this data from Object Storage using those credentials. In my notebook with yorkr I directly load the data from Github. You can use the ‘Sharing’ icon to share the notebook. The shared notebook has an extension ‘ipynb’. You can import this notebook directly into your environment and get started with the code available in the notebook.

You can import existing R, Python or Scala notebooks as shown below. My notebook ‘Using R package yorkr – A quick overview’ can be downloaded using the link ‘yorkrWithDSE’ and clicking the green download icon on top right corner.

I have also uploaded the file to Github and you can download from here too ‘yorkrWithDSE’. This notebook can be imported into your DSE as shown below

Jupyter notebooks have been integrated with Github and are rendered directly from Github. You can view my Jupyter notebook here – “Using R package yorkr – A quick overview”. You can also view it on NBviewer at “Using R package yorkr – A quick overview”.

So there it is. You can download my notebook, import it into IBM’s Data Science Experience and then use data from ‘yorkrData’ as shown. As already mentioned, yorkrData contains converted data for ODIs, T20 and IPL. For details on how to use my R package yorkr please see my posts on yorkr at Index of posts

Hope you have fun playing with IBM’s Data Science Experience and my package yorkr.

I will be exploring IBM’s DSE in the weeks and months to come in the areas of Machine Learning with SparkR, Sparklyr or PySpark.

Watch this space!!!


Also see

To see all my posts check
Index of posts