GooglyPlusPlus now with Win Probability Analysis for all T20 matches

In my two earlier posts, Computing Win-Probability of T20 matches and Boosting Win Probability accuracy with player embeddings, I discussed approaches to computing the ball-by-ball Win Probability of a T20 match. My best ML models were:

glmnet – Logistic Regression (LR) with lasso regularization and penalty – Accuracy – 0.73
Random Forest (RF) – Accuracy – 0.92

Incidentally, both of these models can be used on live streaming ball-by-ball data, if available.
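As a rough illustration of what scoring a live ball-by-ball feed with a logistic model looks like, here is a minimal Python sketch. The feature names and coefficients below are hypothetical placeholders chosen for illustration. They are not the features or fitted values of the glmnet model in GooglyPlusPlus.

```python
import math
from typing import Dict, Iterable, Iterator

# Hypothetical coefficients, for illustration only.
# These are NOT the fitted glmnet coefficients used in GooglyPlusPlus.
INTERCEPT = 0.0
COEFFICIENTS: Dict[str, float] = {
    "run_rate_diff": 0.35,    # current run rate minus required run rate
    "wickets_in_hand": 0.20,  # wickets the batting side has left
    "balls_remaining": 0.01,  # legal balls left in the innings
}


def win_probability(features: Dict[str, float]) -> float:
    """Logistic (sigmoid) of the linear score for one ball's features."""
    z = INTERCEPT + sum(coef * features[name] for name, coef in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def score_stream(balls: Iterable[Dict[str, float]]) -> Iterator[float]:
    """Yield a win probability for each ball as it arrives from the feed."""
    for ball in balls:
        yield win_probability(ball)


if __name__ == "__main__":
    # Two made-up deliveries standing in for a live feed.
    feed = [
        {"run_rate_diff": 0.5, "wickets_in_hand": 8, "balls_remaining": 60},
        {"run_rate_diff": -2.0, "wickets_in_hand": 3, "balls_remaining": 12},
    ]
    for p in score_stream(feed):
        print(round(p, 3))
```

Because each ball is scored independently with a handful of multiplications, a model of this form is cheap enough to evaluate on every delivery, which fits the few-seconds compute time reported for the LR model.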

I have now integrated the trained Logistic Regression model with penalty into my Shiny app GooglyPlusPlus. Unfortunately, the Random Forest model, besides being computationally intensive, is also heavyweight (1.29 GB) compared to the LR model, which is just 91.2 MB. So I was not able to upload the Random Forest model to Shiny, as the memory required exceeded that allowed in my paid subscription.

However, I will demonstrate the performance of both models: LR (in my Web app) and RF (on my local machine). Incidentally, the Random Forest model takes a long time to load and even longer (~90 secs) to compute the Win Probability of a T20 match, while the LR model computes it in a few seconds. Interestingly, I find the LR model’s Win Probability more intuitive and explainable than the Random Forest’s. Possibly, the RF model overfits. I need to explore this more. Anyway, take a look at some interesting Win Probability charts (fortune swings of teams!!!) over the course of a T20 match.

You can try out this latest version here at GooglyPlusPlus!!

Some major upsets in the ICC T20 World Cup, 2022

A) Netherlands vs South Africa – 2022-11-06

B) Zimbabwe vs Pakistan – 2022-10-27

1a) Netherlands vs South Africa – ICC 2022-11-06 (Worm-wicket chart)

Netherlands shocked South Africa and ended South Africa’s hopes for a place in the semi-finals. The worm-wicket chart for this match is shown below.

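The 2 circled areas are where South Africa lost the plot: around the 8th over (~120+48=168) and the 15th over (~120+90=210) of their innings. The ball count continues from the 120 balls of the first innings.

Around balls 205-215, counting across both innings, South Africa started to lose.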
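The arithmetic behind those ball numbers can be sketched in a few lines of Python. This assumes 6 legal balls per over and a full 20-over first innings with no extras, and the function name is a hypothetical helper, not part of GooglyPlusPlus.

```python
BALLS_PER_OVER = 6
OVERS_PER_INNINGS = 20


def cumulative_ball(innings: int, overs_completed: int, ball_in_over: int = 0) -> int:
    """Ball number counted across both innings of a T20 match.

    Assumes only legal deliveries (no wides or no-balls) and that the
    first innings lasted its full 20 overs.
    """
    if innings not in (1, 2):
        raise ValueError("innings must be 1 or 2")
    if not 0 <= overs_completed <= OVERS_PER_INNINGS:
        raise ValueError("overs_completed must be between 0 and 20")
    if not 0 <= ball_in_over <= BALLS_PER_OVER:
        raise ValueError("ball_in_over must be between 0 and 6")
    # The second innings starts after the 120 balls of the first.
    offset = (innings - 1) * OVERS_PER_INNINGS * BALLS_PER_OVER
    return offset + overs_completed * BALLS_PER_OVER + ball_in_over


if __name__ == "__main__":
    print(cumulative_ball(2, 8))   # 8 overs into the chase: 120 + 48
    print(cumulative_ball(2, 15))  # 15 overs into the chase: 120 + 90
```

In a real match, extras add deliveries, so the ball numbers on the chart are approximate, which is why the figures are given with a "~".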

1b) Netherlands vs South Africa – ICC 2022-11-06 – Logistic Regression with regularisation (Shiny)

1c) Netherlands vs South Africa – ICC 2022-11-06 – Random Forest (not in Web app, local)

If you notice, for some reason, the Random Forest model decided that Netherlands was on the winning side right from the start. Why would this happen? Possibly overfitting, I presume…

2a) Zimbabwe vs Pakistan – ICC 2022-10-27 Worm-wicket chart

Pakistan seemed to be cruising along, finally needing 11 runs in the last over, but for some reason they panicked and lost.

2b) Zimbabwe vs Pakistan – ICC 2022-10-27 – Logistic Regression with regularisation (Shiny)

It can be seen that Pakistan did seem to have the upper hand, save for the last over.

2c) Zimbabwe vs Pakistan – ICC 2022-10-27 – Random Forest (not in Web app, local)

Again, the Random Forest model implies that Zimbabwe was on a winning footing except in brief stretches, e.g. around ball 248.

So while the accuracy of the Random Forest model is higher by about 19 percentage points (0.92 vs 0.73), I feel that Logistic Regression with penalty has generalised better and is more intuitive. Meanwhile, I will see if I can improve LR or try another model that provides better accuracy while also generalising well.

Henceforth, I will only be using the LR model that is in the Shiny app.

3a) England vs New Zealand T20 Women – 2021-09-04

Another close match till the 15th over. After that, England seems to have had a slower strike rate and lost.