GooglyPlusPlus2022 is the new avatar of last year’s GooglyPlusPlus2021. About 5 years back I had written a post on Using linear programming to optimize T20 batting and bowling line up. That post has been at the back of my mind for a long time, and I decided to revisit it. The optimization requires computing the performance of individual batsmen against individual bowlers and vice versa. So in this latest incarnation, there are 4 new functions
batsmanVsBowlerPerf – Performance of batsmen against chosen bowlers
bowlerVsBatsmanPerf – Performance of bowlers versus specific batsmen
battingOptimization – Optimizing batting line up based on strike rates and remaining overs
bowlingOptimization – Optimizing bowling line up based on economy rates and remaining overs
These 4 functions have been incorporated in all 9 supported T20 formats, namely a. IPL b. Intl. T20 (men) c. Intl. T20 (women) d. BBL e. NTB f. PSL g. WBB h. CPL i. SSM
You can clone/fork the code for GooglyPlusPlus2022 from GitHub at gpp2022-1
With this latest update you can do a myriad of analyses of batsmen, bowlers, teams, matches. This is just-in-time for the IPL Mega-auction!! Do check out these other posts of GooglyPlusPlus for other detailed analysis
A) Batsman Vs Bowlers – This option computes the performance of individual batsman against individual bowlers
a) IPL Batsmen vs Bowlers
Included below are the performances of Dhoni, Raina and Kohli against Malinga, Ashwin and Bumrah. Note: The last 2 text box inputs are not required for this.
b) Intl. T20 (men) Batsmen vs Bowlers
Note: You can type the name and choose from the drop down list
B) Bowler vs Batsmen – You can check the performance of specific bowlers against specific batsmen
a) Intl. T20 (women) India vs Australia
b) PSL Bowlers vs Batsmen
C) Strategy for optimizing batting and bowling line up
From the above 2 tabs it is clear that different bowlers have different economy rates (ER) and wicket rates against different batsmen. In other words, the effectiveness of a bowler varies by batsman. Conversely, batsmen are more comfortable against certain bowlers than others, and this shows up as different strike rates (SR).
Hence during the death overs, when the bowling side is trying to restrict the batsmen to a certain score, or on the flip side when the batting side needs to reach a target within a certain number of overs, we can take advantage of the relative effectiveness of bowlers against batsmen to optimise the bowling, and of the aggressiveness of batsmen against bowlers to reach the target quickly.
This is the approach that is used for bowling and batting optimisation. For optimising bowling, we formulate a minimisation problem based on economy rates, and for optimising batting, a maximisation problem based on strike rates.
This latest version solves both for the last set of overs using “integer programming”, based on the R package lpSolve.
Here are the 2 formulations
Assume there are 3 bowlers – $bwlr_{1}, bwlr_{2}, bwlr_{3}$ and there are 3 batsmen – $bat_{1}, bat_{2}, bat_{3}$
I) LP Formulation for bowling order
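Let $er_{ij}$ be the Economy Rate of the jth bowler to the ith batsman, and let $o_{ij}$ be the number of overs bowled in that same bowler-batsman pairing. Also, let the remaining overs for the bowlers be $o_{1}, o_{2}, o_{3}$ and the total number of overs left to be bowled be $N$.
Objective function : Minimize –
$er_{11} o_{11} + er_{12} o_{12} + \dots + er_{1n} o_{1n} + er_{21} o_{21} + \dots + er_{kn} o_{kn}$
i.e. $\sum_{i=1}^{k}\sum_{j=1}^{n} er_{ij} o_{ij}$
Constraints
Where $o_{j}$ is the number of overs remaining for the jth bowler against ‘k’ batsmen
$o_{1j} + o_{2j} + \dots + o_{kj} \le o_{j}$
and if the total number of overs remaining to be bowled is N then
$o_{11} + o_{12} + \dots + o_{kn} = N$ or $\sum_{i=1}^{k}\sum_{j=1}^{n} o_{ij} = N$
The overs that any bowler can bowl is $o_{ij} \ge 0$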
II) LP Formulation for batting lineup
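Let $sr_{ij}$ be the Strike Rate of the ith batsman to the jth bowler, and let $o_{ij}$ be the number of overs bowled in that same batsman-bowler pairing.
Objective function : Maximize –
$sr_{11} o_{11} + sr_{12} o_{12} + \dots + sr_{1n} o_{1n} + sr_{21} o_{21} + \dots + sr_{kn} o_{kn}$
i.e. $\sum_{i=1}^{k}\sum_{j=1}^{n} sr_{ij} o_{ij}$
Constraints
Where $o_{j}$ is the number of overs remaining for the jth bowler against ‘k’ batsmen
$o_{1j} + o_{2j} + \dots + o_{kj} \le o_{j}$
and the total number of overs remaining to be bowled is N then
$o_{11} + o_{12} + \dots + o_{kn} = N$ or $\sum_{i=1}^{k}\sum_{j=1}^{n} o_{ij} = N$
The overs that any bowler can bowl is $o_{ij} \ge 0$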
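The two formulations above differ only in the direction of the objective, so both can be sketched with one helper. The following is a minimal sketch using lpSolveAPI, not the code inside GooglyPlusPlus2022: the function name optimizeOvers, the rate matrix, the remaining overs and N are all made-up values for illustration, and the same matrix is reused for both directions only to keep the example short.

```r
library(lpSolveAPI)

# Hypothetical runs-per-over rates: rows = batsmen (i), columns = bowlers (j)
rates <- matrix(c(8,  6, 9,
                  7, 10, 5,
                  6,  8, 7),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("bat1", "bat2", "bat3"),
                                c("bwl1", "bwl2", "bwl3")))
oversLeft <- c(2, 2, 1)  # o_j: overs each bowler may still bowl (assumed)
N <- 4                   # total overs still to be bowled (assumed)

# Hypothetical helper: sense = "min" for bowling (ER), "max" for batting (SR)
optimizeOvers <- function(rates, oversLeft, N, sense = c("min", "max")) {
  sense <- match.arg(sense)
  m <- nrow(rates)
  n <- ncol(rates)
  lprec <- make.lp(0, m * n)             # one variable o_ij per pairing
  invisible(lp.control(lprec, sense = sense))
  set.objfn(lprec, as.vector(rates))     # column-major: variable (j-1)*m + i is o_ij
  set.type(lprec, 1:(m * n), "integer")  # whole overs only
  for (j in 1:n) {
    idx <- ((j - 1) * m + 1):(j * m)     # all variables belonging to bowler j
    add.constraint(lprec, rep(1, m), "<=", oversLeft[j], indices = idx)
  }
  add.constraint(lprec, rep(1, m * n), "=", N)  # all N overs must be bowled
  status <- solve(lprec)
  stopifnot(status == 0)                 # 0 means an optimal solution was found
  list(total = get.objective(lprec),
       overs = matrix(get.variables(lprec), nrow = m,
                      dimnames = dimnames(rates)))
}

resMin <- optimizeOvers(rates, oversLeft, N, "min")
resMax <- optimizeOvers(rates, oversLeft, N, "max")
resMin$total  # 23: fewest runs conceded over the 4 overs
resMax$total  # 37: most runs scored over the 4 overs
```

Note that with only these constraints the solver is free to give all of a bowler's overs to the batsman he is cheapest against; a fuller model would probably also limit the overs each batsman can face.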
D) Optimized bowling lineup
a) IPL – Optimizing bowling line up
Note: For computing the Optimal bowling lineup, the total number of overs remaining and the number of overs for each bowler have to be entered.
b) PSL – Optimizing bowling line up
E) Optimized batting lineup
a) Intl. T20 (men) India vs England
b) Caribbean Premier League – Optimizing batting line up
In my recent post My travels through the realms of Data Science, Machine Learning, Deep Learning and (AI), I had recounted my journey in the domains of Data Science, Machine Learning (ML), and more recently Deep Learning (DL), all of which are useful while analyzing data. Of late, I have come to the realization that there are many facets to data, and that to glean insights from it, Data Science, ML and DL alone are not sufficient; one also needs a good handle on linear programming and optimization. My colleague at IBM Research concurred with this view and told me he had arrived at this conclusion several years ago. While ML & DL are useful and interesting for making inferences and predicting outputs from input variables, optimization computes the choice of inputs which results in a maximum or minimum. So I made a small course correction and started on a course from India’s own NPTEL, Introduction to Linear Programming by Prof G. Srinivasan of IIT Madras. The lectures are delivered with remarkable clarity by the Prof, and I was just about halfway through the course (each lecture is of 50-55 min duration) when I decided that I needed to try to formulate and solve some real world Linear Programming problem.
As usual, I turned towards cricket for some appropriate situations, and sure enough it was there in the open. For this LP formulation I take International T20 and IPL, though International ODI will also work equally well. You can download the associated code and data for this from Github at LP-cricket-analysis
In T20 matches the captain has to choose how to rotate the bowlers with the aim of restricting the batting side. Conversely, the batsmen need to take advantage of the bowling strength to maximize the runs scored.
Note: a) A simple and obvious strategy would be – if the ith bowler’s economy rate is less than the economy rate of the jth bowler, i.e. $er_{i} < er_{j}$, then have bowler ‘i’ bowl more overs as his/her economy rate is better
b) A better strategy would be to consider the economy rate of each bowler against each batsman. How often have we seen bowlers who have a great bowling average get punished by some batsman, or a bowler who is generally very poor be very effective against a particular batsman, i.e. $er_{ij} < er_{ik}$, where the jth bowler is more effective than the kth bowler against the ith batsman. This now becomes a linear optimization problem as we can have several combinations of number of overs x economy rate for different bowlers, and we will have to solve this algorithmically to determine the lowest score for bowling performance or highest score for batting order.
This post uses the latter approach to optimize bowling change and batting lineup.
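To make the difference between the two strategies concrete, here is a small base R sketch. The economy rates are invented for illustration, and the overall economy rate is approximated by an unweighted mean across batsmen, which is a simplifying assumption (a real overall ER would weight by balls bowled).

```r
# Hypothetical economy rates: rows = batsmen, columns = bowlers
er <- rbind(bat1 = c(bwl1 = 6, bwl2 = 10),
            bat2 = c(bwl1 = 9, bwl2 = 4))

# Strategy (a): pick the bowler with the best overall economy rate
overall <- colMeans(er)                    # bwl1 = 7.5, bwl2 = 7.0
bestOverall <- names(which.min(overall))   # "bwl2"

# Strategy (b): pick the best bowler separately for each batsman
bestPerBatsman <- colnames(er)[apply(er, 1, which.min)]  # "bwl1" "bwl2"
```

Here strategy (a) would keep bwl2 on against bat1 and concede 10 an over, while strategy (b) switches to bwl1, who concedes only 6 against that batsman.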
Let us take a hypothetical situation. Assume there are 3 bowlers – $bwlr_{1}, bwlr_{2}, bwlr_{3}$ and there are 4 batsmen – $bat_{1}, bat_{2}, bat_{3}, bat_{4}$
Let $er_{ij}$ be the Economy Rate of the jth bowler to the ith batsman. Also, if the remaining overs for the bowlers are $o_{1}, o_{2}, o_{3}$ and the total number of overs left to be bowled is $N$, then the question is
a) Given the economy rate of each bowler per batsman, how many overs should each bowler bowl, so that the total runs scored by all the batsmen are minimum?
b) Alternatively, if we know the individual strike rate of a batsman against the individual bowlers, how many overs should each batsman face from each bowler so that the total runs scored is maximized?
1. LP Formulation for bowling order
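Let $er_{ij}$ be the Economy Rate of the jth bowler to the ith batsman, and let $o_{ij}$ be the number of overs bowled in that same bowler-batsman pairing, for m batsmen and n bowlers.
Objective function : Minimize –
$er_{11} o_{11} + er_{12} o_{12} + \dots + er_{1n} o_{1n} + er_{21} o_{21} + \dots + er_{mn} o_{mn}$
i.e. $\sum_{i=1}^{m}\sum_{j=1}^{n} er_{ij} o_{ij}$
Constraints
Where $o_{j}$ is the number of overs remaining for the jth bowler and the total number of overs remaining to be bowled is N then –
$o_{1j} + o_{2j} + \dots + o_{mj} \le o_{j}$
$\sum_{i=1}^{m}\sum_{j=1}^{n} o_{ij} = N$
Also, the overs that any bowler can bowl can be >= 0, i.e. $o_{ij} \ge 0$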
2. LP Formulation for batting lineup
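Let $sr_{ij}$ be the Strike Rate of the ith batsman against the jth bowler, and let $o_{ij}$ be the number of overs bowled in that same batsman-bowler pairing.
Objective function : Maximize –
$\sum_{i=1}^{m}\sum_{j=1}^{n} sr_{ij} o_{ij}$
Constraints
Where $o_{j}$ is the number of overs remaining for the jth bowler and the total number of overs remaining to be bowled is N then –
$o_{1j} + o_{2j} + \dots + o_{mj} \le o_{j}$
$\sum_{i=1}^{m}\sum_{j=1}^{n} o_{ij} = N$
Also, the overs that any bowler can bowl can be >= 0 or any number that the bowler has already bowled.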
For this maximization and minimization problem I used lpSolveAPI.
3. LP formulation (Example 1)
Initially I created a test example to ensure that I get the LP formulation and solution correct. Here er1=4 and er2=3 are the economy rates, and o1 & o2 are the overs bowled by bowlers 1 & 2, with o1+o2=4. The possible combinations in this example are as below
o1    o2    Obj Fun (= 4o1 + 3o2)
1     3     13
2     2     14
3     1     15
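library(lpSolveAPI)
library(dplyr)
library(knitr)
# Create an LP model with 0 constraints and 2 decision variables (o1, o2)
lprec <- make.lp(0, 2)
a <- lp.control(lprec, sense="min") # minimise the objective
set.objfn(lprec, c(4, 3)) # Economy Rate of 4 and 3 for er1 and er2
add.constraint(lprec, c(1, 1), "=", 4) # o1 + o2 = 4
add.constraint(lprec, c(1, 0), ">=", 1) # o1 >= 1
add.constraint(lprec, c(0, 1), ">=", 1) # o2 >= 1
lprec
b <- solve(lprec) # 0 indicates that an optimal solution was found
get.objective(lprec) # minimum runs conceded
get.variables(lprec) # overs for o1 and o2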
Note 1: In the above example 13 runs is the minimum that can be scored and this requires
o1=1
o2=3
Note 2: The numbers in the columns represent the number of overs that need to be bowled by a bowler to the corresponding batsman.
4. LP formulation (Example 2)
In this formulation there are 2 bowlers and 2 batsmen. o11, o12 are the overs bowled by bowler 1 to batsmen 1 & 2, and o21, o22 are the overs bowled by bowler 2 to batsmen 1 & 2. The corresponding economy rates are er11=4, er12=2, er21=2, er22=5, and the total overs are o11+o12+o21+o22=5. Each of o11, o12, o21, o22 is taken to be at least 1, as in the manual solution below.
The solution for this, manually computed, is
B1 (o11)    B2 (o12)    B1 (o21)    B2 (o22)    Runs
1           1           1           2           18
1           2           1           1           15
2           1           1           1           17
1           1           2           1           15
Note: In the above example 15 runs is the minimum that can be scored, and one assignment that achieves it is
o11=1
o12=2
o21=1
o22=1
It is also possible to set the minimum overs to other values and solve again.
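Example 2 can be checked with lpSolveAPI along the same lines as Example 1. This is a minimal sketch rather than the original code: the name lprec2 is illustrative, and the lower bound of 1 over for every bowler-batsman pairing is an assumption inferred from the manual table above, where every entry is at least 1.

```r
library(lpSolveAPI)

lprec2 <- make.lp(0, 4)                        # variables: o11, o12, o21, o22
invisible(lp.control(lprec2, sense = "min"))
set.objfn(lprec2, c(4, 2, 2, 5))               # er11, er12, er21, er22
add.constraint(lprec2, c(1, 1, 1, 1), "=", 5)  # o11 + o12 + o21 + o22 = 5
set.bounds(lprec2, lower = rep(1, 4))          # assumed: each pairing gets >= 1 over
set.type(lprec2, 1:4, "integer")               # whole overs only
status <- solve(lprec2)                        # 0 means an optimal solution was found
minRuns <- get.objective(lprec2)               # 15, matching the manual table
overs <- get.variables(lprec2)                 # one of the optimal assignments
```

The manual table has two rows with 15 runs, so the solver may return either of them; changing the vector passed to set.bounds is one way to try other minimum values.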
5. LP formulation for International T20 India vs Australia (Batting lineup)
To analyze batting and bowling lineups in the cricket world I needed to get the ball-by-ball details of runs scored by each batsman against each of the bowlers. Fortunately I had already created this with my R package yorkr. yorkr processes yaml data from Cricsheet. So I copied the data of all matches between Australia and India in International T20s. You can download my processed data for International T20 at Inswinger
e <- as.data.frame(rbind(c(1,1,1),c(0,3,0),c(2,0,0),c(3,4,1)))
names(e) <- c("S Watson","B Lee","MA Starc")
rownames(e) <- c("Kohli","Yuvraj","Dhoni","Overs")
e
## S Watson B Lee MA Starc
## Kohli 1 1 1
## Yuvraj 0 3 0
## Dhoni 2 0 0
## Overs 3 4 1
Note: This assumes that the batsmen perform at their current Strike Rate. Anything can happen in a real game, but this is nevertheless a fairly reasonable estimate of the performance
Note 2: The numbers in the columns represent the number of overs that need to be bowled by a bowler to the corresponding batsman.
6. LP formulation for International T20 India vs Australia (Bowling lineup)
For this I compute how the bowling should be rotated between R Ashwin, RA Jadeja and JJ Bumrah when taking into account their performance against batsmen like Shane Watson, AJ Finch and David Warner. For the bowling performance I take the Economy rate of the bowlers. The data is the same as above
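# Economy rate (runs per over) conceded by a bowler to a specific batsman
computeER <- function(batsman1,bowler1){
a <- matches %>% filter(batsman==batsman1 & bowler==bowler1) # balls bowled by bowler1 to batsman1
a1 <- a %>% summarize(totalRuns=sum(runs),count=n()) %>% mutate(ER=(totalRuns/count)*6) # runs per ball x 6
a1
}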
# RA Jadeja
jadejaWatson<- computeER("SR Watson","RA Jadeja")
jadejaWatson
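The per-pair lookup above can be extended to build the whole economy rate matrix in one pass. The sketch below assumes a ball-by-ball data frame with batsman, bowler and runs columns, as used in the function above; the records here are invented for illustration and are not Cricsheet data, and erTable and erMatrix are illustrative names.

```r
library(dplyr)

# Hypothetical ball-by-ball records; real data would come from yorkr
matches <- data.frame(
  batsman = c(rep("SR Watson", 6), rep("AJ Finch", 6)),
  bowler  = c(rep("RA Jadeja", 3), rep("R Ashwin", 3),
              rep("RA Jadeja", 3), rep("R Ashwin", 3)),
  runs    = c(1, 0, 4,  2, 1, 0,  0, 0, 1,  6, 1, 2)
)

# Economy rate (runs per over) for every batsman-bowler pair
erTable <- matches %>%
  group_by(batsman, bowler) %>%
  summarize(totalRuns = sum(runs), balls = n(), .groups = "drop") %>%
  mutate(ER = totalRuns / balls * 6)

# Batsmen in rows, bowlers in columns: the layout used for the LP objective
erMatrix <- xtabs(ER ~ batsman + bowler, data = erTable)
erMatrix["SR Watson", "RA Jadeja"]  # 5 runs off 3 balls = 10 runs per over
```

Pairs with very few balls give noisy rates, so in practice it may be worth filtering on a minimum number of balls before optimizing.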
As in the case of International T20s, I also have processed IPL data derived from my R package yorkr. yorkr processes yaml data from Cricsheet. The processed data for all IPL matches can be downloaded from GooglyPlus
As I mentioned, it is possible to perform a maximization with the same formulation, since computeSR and computeER are computed in the same way (runs per ball x 6)
This just flips the problem around and computes the maximum runs that can be scored for the batsman’s Strike Rate (this is the same as the bowler’s Economy Rate)
Conclusion: It is thus possible to determine the optimum number of overs to give to a specific bowler based on his/her Economy Rate against a particular batsman. Similarly, one can determine the maximum runs that can be scored by batsmen based on their strike rates against bowlers. However, while this may provide some indication, cricket, like any other game, depends on a fair amount of chance.