# Player Performance Estimation using AI Collaborative Filtering

## 1. Introduction

Before crucial matches, or in general, we would often like to know how a batsman is likely to perform against a particular bowler, or vice versa, but we may not have the data. Typically the data covers different batsmen facing different sets of bowlers, with performance figures such as ballsFaced, totalRuns, fours, sixes, strike rate and timesOut. Similarly, different bowlers have performance figures (deliveries, runsConceded, economyRate and wicketsTaken) against different sets of batsmen. We will never have data for all batsmen against all bowlers. However, it would be good to estimate the performance of a batsman against a bowler even when no performance data exists for that pair. This can be done using collaborative filtering, which bases its estimates on the similarity between batsmen (in how they fare against bowlers) and between bowlers (in how they fare against batsmen).

This post shows an approach whereby we can estimate a batsman’s performance against bowlers even though the batsman may not have faced those bowlers, based on his/her performance against other bowlers. It also estimates the performance of bowlers against batsmen using the same approach. This is based on the recommender algorithm which is used to recommend products to customers based on their rating on other products.

This idea came to me while generating the performance of batsmen vs bowlers and vice versa for 2 IPL teams in IPL 2022, using the optimization tab of my Shiny app GooglyPlusPlus. I found that there were some batsmen for whom there was no data against certain bowlers, probably because they were playing for their team for the first time or because they were new (see picture below).

In the picture above there is no data for Dewald Brevis against Jasprit Bumrah and YS Chahal. Wouldn’t it be great to estimate the performance of Brevis against Bumrah, or vice versa? Can we estimate this performance?

While pondering this problem, I realized that its formulation is similar to that of the famous Netflix movie recommendation problem, in which users’ ratings for certain movies are known and, based on these ratings, the recommender engine generates ratings for movies not yet seen.

This post estimates a player’s (batsman/bowler) performance using a recommender engine. It is based on the R package recommenderlab:

Michael Hahsler (2021). recommenderlab: Lab for Developing and Testing Recommender Algorithms. R package version 0.2-7. https://github.com/mhahsler/recommenderlab

Note 1: The data for this analysis is taken from Cricsheet after being processed by my R package yorkr.

You can also read this post in RPubs at Player Performance Estimation using AI Collaborative Filtering

A PDF copy of this post is available at Player Performance Estimation using AI Collaborative Filtering.pdf

You can download this R Markdown file and the associated data and perform the analysis yourself using any other recommender engine from Github at playerPerformanceEstimation

## Problem statement

In the table below we see a set of bowlers vs a set of batsmen and the number of times the bowlers got these batsmen out.
By knowing the performance of the bowlers against some of the batsmen we can use collaborative filtering to determine the missing values. This is done using the recommender engine.

The Recommender Engine works as follows. Let us say that there are feature vectors $x^1$, $x^2$ and $x^3$ for the 3 bowlers which identify the characteristics of these bowlers (“fast”, “lateral drift through the air”, “movement off the pitch”). Let each batsman be identified by parameter vectors $\theta^1$, $\theta^2$ and so on.

For e.g. consider the following table

Then, by assuming an initial estimate for the parameter vector $\theta$ and the feature vector $x$, we can formulate this as an optimization problem which tries to minimize the error of $\theta^T x$ against the known values. This can work very well, as the algorithm can learn features that are not captured explicitly in the data. For example, a particular bowler may have very impressive figures because of some aspect of the bowling that the data does not record, say the bowler is most effective when using the ‘scrambled seam’ with a slightly different arc to the flight. Although the algorithm cannot identify the feature as we know it, it should pick up such intricacies through the learned feature values.

Hence the algorithm can be quite effective.
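As a small worked illustration of the prediction $\theta^T x$, assume purely hypothetical values for one bowler's three features and one batsman's three parameters (these numbers are made up and are not taken from the data):

$$x = \begin{pmatrix} 0.9 \\ 0.2 \\ 0.5 \end{pmatrix}, \qquad \theta = \begin{pmatrix} 2.0 \\ 1.0 \\ 0.5 \end{pmatrix}, \qquad \theta^T x = 2.0 \cdot 0.9 + 1.0 \cdot 0.2 + 0.5 \cdot 0.5 = 2.25$$

Under these assumed values the estimate for this batsman-bowler pair would be 2.25. In practice neither vector is known in advance; both are learned from the known cells of the table.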

Note: The performance of the recommenderlab algorithms on this data is not very good and the Mean Squared Error is quite high. Also, the ROC curves show that the algorithm does not cleanly separate the true positives (TPR) from the false positives (FPR) in all cases.

Note: This is similar to the recommendation problem

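The collaborative filtering optimization objective can be considered as a minimization over both the parameters $\theta$ and the features $x$, and can be written as

$$J(x^{(1)},\dots,x^{(n_m)},\theta^{(1)},\dots,\theta^{(n_u)}) = \frac{1}{2}\sum_{(i,j):\,y^{(i,j)}\,\text{known}} \left((\theta^{(j)})^{T}x^{(i)} - y^{(i,j)}\right)^{2} + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k} (x_{k}^{(i)})^{2} + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k} (\theta_{k}^{(j)})^{2}$$

where $i$ indexes the $n_m$ bowlers (items), $j$ indexes the $n_u$ batsmen (users), and the first sum runs only over the pairs for which $y^{(i,j)}$ is known.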

The collaborative filtering algorithm can be summarized as follows

1. Initialize $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(n_u)}$ and the set of features $x^{(1)}, x^{(2)}, \dots, x^{(n_m)}$ to small random values.
2. Minimize $J(x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)})$ using gradient descent. For every $j = 1, 2, \dots, n_u$ and $i = 1, 2, \dots, n_m$:
3. $$x_{k}^{(i)} := x_{k}^{(i)} - \alpha\left(\sum_{j:\,y^{(i,j)}\,\text{known}} \left((\theta^{(j)})^{T}x^{(i)} - y^{(i,j)}\right)\theta_{k}^{(j)} + \lambda x_{k}^{(i)}\right)$$

   and

   $$\theta_{k}^{(j)} := \theta_{k}^{(j)} - \alpha\left(\sum_{i:\,y^{(i,j)}\,\text{known}} \left((\theta^{(j)})^{T}x^{(i)} - y^{(i,j)}\right)x_{k}^{(i)} + \lambda \theta_{k}^{(j)}\right)$$
4. Hence, for a batsman with parameters $\theta$ and a bowler with (learned) features $x$, predict the “times out” for the player where the value is not known using $\theta^T x$.

The above derivation for the recommender problem is taken from Machine Learning by Prof Andrew Ng at Coursera from the lecture Collaborative filtering
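The steps above can be sketched in a few lines of base R. This is only an illustrative sketch on a made-up 4 x 4 matrix, not what recommenderlab does internally; the number of latent features `k`, the regularization `lambda`, the learning rate `alpha` and the iteration count are all assumed values.

```r
# Toy "times out" matrix: rows are batsmen, columns are bowlers, NA = no data.
# All values here are made up for illustration.
Y <- matrix(c(4, 3, NA, 1,
              NA, 1, 2, NA,
              5, NA, 3, 1,
              1, 2, NA, 4),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("batsman", 1:4), paste0("bowler", 1:4)))
R  <- !is.na(Y)           # TRUE where a value is known
Y0 <- ifelse(R, Y, 0)     # zero-filled copy; unknown cells are masked out by R below

set.seed(42)
k      <- 2               # number of latent features (assumed)
lambda <- 0.1             # regularization strength (assumed)
alpha  <- 0.01            # learning rate (assumed)
Theta  <- matrix(rnorm(nrow(Y) * k, sd = 0.1), nrow(Y), k)  # one row per batsman
X      <- matrix(rnorm(ncol(Y) * k, sd = 0.1), ncol(Y), k)  # one row per bowler

# Regularized squared-error cost over the known cells only
cost <- function(Theta, X) {
  E <- (Theta %*% t(X) - Y0) * R
  0.5 * sum(E^2) + (lambda / 2) * (sum(X^2) + sum(Theta^2))
}

cost_start <- cost(Theta, X)
for (iter in 1:5000) {
  E          <- (Theta %*% t(X) - Y0) * R    # errors on known cells only
  grad_X     <- t(E) %*% Theta + lambda * X
  grad_Theta <- E %*% X + lambda * Theta
  X          <- X - alpha * grad_X           # simultaneous update of both factors
  Theta      <- Theta - alpha * grad_Theta
}
cost_end <- cost(Theta, X)

pred <- Theta %*% t(X)                       # estimates for every batsman-bowler pair
dimnames(pred) <- dimnames(Y)
round(pred, 2)
```

The cells that were `NA` in `Y` now hold estimates in `pred`, and comparing `cost_start` with `cost_end` shows whether the descent actually reduced the objective.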

There are 2 main types of Collaborative Filtering (CF) approaches:

1. User-based Collaborative Filtering: User-based CF is a memory-based algorithm which tries to mimic word-of-mouth by analyzing rating data from many individuals. The assumption is that users with similar preferences will rate items similarly.
2. Item-based Collaborative Filtering: Item-based CF is a model-based approach which produces recommendations based on the relationship between items inferred from the rating matrix. The assumption behind this approach is that users will prefer items that are similar to other items they like.
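To make the user-based idea concrete, here is a minimal base R sketch on a made-up matrix in which batsmen play the role of users and bowlers the role of items. The helper names `cosine_sim` and `predict_ubcf` are hypothetical, and the sketch skips the normalization and nearest-neighbour selection that a real implementation such as recommenderlab's UBCF applies.

```r
# Hypothetical toy ratings: rows are batsmen ("users"), columns are bowlers ("items").
ratings <- matrix(c(4, 3, NA, 1,
                    4, 3, 2, 1,
                    1, NA, 5, 4),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("A", "B", "C"), paste0("bowler", 1:4)))

# Cosine similarity between two rows, using only the columns both have values for.
cosine_sim <- function(u, v) {
  both <- !is.na(u) & !is.na(v)
  if (!any(both)) return(NA_real_)
  sum(u[both] * v[both]) / sqrt(sum(u[both]^2) * sum(v[both]^2))
}

# Estimate a missing value as the similarity-weighted mean of the other users' values.
predict_ubcf <- function(ratings, user, item) {
  others <- setdiff(rownames(ratings), user)
  others <- others[!is.na(ratings[others, item])]
  sims   <- sapply(others, function(o) cosine_sim(ratings[user, ], ratings[o, ]))
  sum(sims * ratings[others, item]) / sum(sims)
}

est <- predict_ubcf(ratings, "A", "bowler3")
est
```

Batsman A has exactly the same known values as B and is much less similar to C, so the estimate for A against bowler3 lands closer to B's value (2) than to C's (5).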

## 1a. A note on ROC and Precision-Recall curves

A small note on interpreting ROC & Precision-Recall curves in the post below

ROC Curve: The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR). Ideally the TPR should increase faster than the FPR and the AUC (area under the curve) should be close to 1.

Precision-Recall: The precision-recall curve shows the tradeoff between precision and recall for different thresholds. A high area under the curve represents both high recall and high precision, where high precision relates to a low false positive rate, and high recall relates to a low false negative rate.
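The quantities behind both curves can be computed by hand for a single operating point. The sketch below uses made-up actual and predicted values and a cut-off `good` that plays the role of `goodRating`; it is a simplification, since recommenderlab traces its curves by varying the length of the recommendation list rather than by thresholding predicted ratings.

```r
# Hypothetical actual and predicted values for eight batsman-bowler pairs.
actual    <- c(5, 1, 4, 2, 6, 3, 1, 4)
predicted <- c(4.2, 2.5, 3.1, 3.6, 5.0, 2.0, 1.5, 2.8)
good      <- 3          # values >= 3 count as "positive" (assumed cut-off)

is_pos   <- actual >= good
pred_pos <- predicted >= good

tp <- sum(pred_pos & is_pos)     # correctly flagged positives
fp <- sum(pred_pos & !is_pos)    # negatives wrongly flagged as positive
fn <- sum(!pred_pos & is_pos)    # positives that were missed
tn <- sum(!pred_pos & !is_pos)   # correctly ignored negatives

tpr       <- tp / (tp + fn)      # true positive rate, also called recall
fpr       <- fp / (fp + tn)      # false positive rate
precision <- tp / (tp + fp)
c(TPR = tpr, FPR = fpr, precision = precision)
```

Each such (FPR, TPR) pair is one point on a ROC curve and each (recall, precision) pair is one point on a precision-recall curve.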

library(reshape2)
library(dplyr)
library(ggplot2)
library(recommenderlab)
library(tidyr)


## 2. Define recommender lab helper functions

Helper functions for the RMarkdown notebook are created

• eval – Gives the RMSE, MSE and MAE of the chosen recommender algorithm and returns the estimated ratings
• evalRecomMethods – Evaluates different recommender methods and plots the ROC and Precision-Recall curves
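# This function returns the error for the chosen algorithm and also predicts the estimates
# for the given data
eval <- function(data, train1, k1, given1, goodRating1, recomType1="UBCF"){
  set.seed(2022)
  # Split the data into training and test sets
  e <- evaluationScheme(data,
                        method = "split",
                        train = train1,
                        k = k1,
                        given = given1,
                        goodRating = goodRating1)

  # Train the recommender on the training data
  r1 <- Recommender(getData(e, "train"), recomType1)
  print(r1)

  # Predict ratings for the test users from their known ratings
  p1 <- predict(r1, getData(e, "known"), type="ratings")
  print(p1)

  # Compare the predictions with the held-out ratings (RMSE, MSE, MAE)
  error = calcPredictionAccuracy(p1, getData(e, "unknown"))
  print(error)

  # Return the full rating matrix (known and estimated values) for the given data
  p2 <- predict(r1, data, type="ratingMatrix")
  p2
}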
# This function will evaluate the different recommender algorithms and plot the ROC and
# Precision-Recall curves
evalRecomMethods <- function(data, k1, given1, goodRating1){
  set.seed(2022)
  # k-fold cross-validation scheme
  e <- evaluationScheme(data,
                        method = "cross",
                        k = k1,
                        given = given1,
                        goodRating = goodRating1)

  # List names contain spaces, so they must be quoted
  models_to_evaluate <- list(
    "IBCF Cosine" = list(name = "IBCF",
                         param = list(method = "cosine")),
    "IBCF Pearson" = list(name = "IBCF",
                          param = list(method = "pearson")),
    "UBCF Cosine" = list(name = "UBCF",
                         param = list(method = "cosine")),
    "UBCF Pearson" = list(name = "UBCF",
                          param = list(method = "pearson")),
    "Random" = list(name = "RANDOM", param = NULL)
  )

  # Lengths of the recommendation lists to evaluate
  n_recommendations <- c(1, 5, seq(10, 100, 10))
  list_results <- evaluate(x = e,
                           method = models_to_evaluate,
                           n = n_recommendations)
  plot(list_results, annotate=c(1,3), legend="bottomright")
  plot(list_results, "prec/rec", annotate=3, legend="topleft")
}


## 3. Batsman performance estimation

The section below estimates the performance of batsmen from incomplete data for the different fields in the data frame, namely balls faced, total runs, fours, sixes, strike rate and times out. The recommenderlab package allows one to test several different algorithms at once, namely

1. User based – Cosine similarity method, Pearson similarity
2. Item based – Cosine similarity method, Pearson similarity
3. Popular
4. Random
5. SVD and a few others

## 3a. Batting dataframe

head(df)

##   batsman1         bowler1 ballsFaced totalRuns fours sixes  SR timesOut
## 1 A Badoni        A Mishra          0         0     0     0 NaN        0
## 2 A Badoni        A Nortje          0         0     0     0 NaN        0
## 3 A Badoni         A Zampa          0         0     0     0 NaN        0
## 4 A Badoni     Abdul Samad          0         0     0     0 NaN        0
## 5 A Badoni Abhishek Sharma          0         0     0     0 NaN        0
## 6 A Badoni      AD Russell          0         0     0     0 NaN        0


## 3b Data set and data preparation

For this analysis the data from Cricsheet has been processed using my R package yorkr to obtain the following 2 data sets:

• batsmenVsBowler – This data set contains the performance of the batsmen against the bowlers and captures a) ballsFaced b) totalRuns c) Fours d) Sixes e) SR f) timesOut
• bowlerVsBatsmen – This data set contains the performance of the bowlers against the different batsmen and includes a) deliveries b) runsConceded c) EconomyRate d) wicketsTaken

Obviously many rows/columns will be empty.

This is a large data set and hence I have filtered for the period after Jan 2020 and before Dec 2022, which gives 2 datasets a) batsmanVsBowler20_22.rdata b) bowlerVsBatsman20_22.rdata

I also have 2 other datasets of all batsmen and bowlers in these 2 datasets in the files c) all-batsmen20_22.rds d) all-bowlers20_22.rds

You can download the data and this RMarkdown notebook from Github at PlayerPerformanceEstimation

Feel free to download and analyze the data and use any recommendation engine you choose

## 3c. Exploratory analysis

Initially an exploratory analysis is done on the data

df3 <- select(df, batsman1,bowler1,timesOut)
df6 <- xtabs(timesOut ~ ., df3)
df7 <- as.data.frame.matrix(df6)
df8 <- data.matrix(df7)
df8[df8 == 0] <- NA
print(df8[1:10,1:10])

##                 A Mishra A Nortje A Zampa Abdul Samad Abhishek Sharma
## A Badoni              NA       NA      NA          NA              NA
## A Manohar             NA       NA      NA          NA              NA
## A Nortje              NA       NA      NA          NA              NA
## AB de Villiers        NA        4       3          NA              NA
## Abdul Samad           NA       NA      NA          NA              NA
## Abhishek Sharma       NA       NA      NA          NA              NA
## AD Russell             1       NA      NA          NA              NA
## AF Milne              NA       NA      NA          NA              NA
## AJ Finch              NA       NA      NA          NA               3
## AJ Tye                NA       NA      NA          NA              NA
##                 AD Russell AF Milne AJ Tye AK Markram Akash Deep
## A Badoni                NA       NA     NA         NA         NA
## A Manohar               NA       NA     NA         NA         NA
## A Nortje                NA       NA     NA         NA         NA
## AB de Villiers           3       NA      3         NA         NA
## Abdul Samad             NA       NA     NA         NA         NA
## Abhishek Sharma         NA       NA     NA         NA         NA
## AD Russell              NA       NA      6         NA         NA
## AF Milne                NA       NA     NA         NA         NA
## AJ Finch                NA       NA     NA         NA         NA
## AJ Tye                  NA       NA     NA         NA         NA


The dots below represent batsman-bowler pairs for which there is no performance data. These are the cells that the algorithm needs to estimate.

set.seed(2022)
r <- as(df8,"realRatingMatrix")
getRatingMatrix(r)[1:15,1:15]

## 15 x 15 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 15 column names 'A Mishra', 'A Nortje', 'A Zampa' ... ]]

##
## A Badoni         . . . . . . . . . . . . . . .
## A Manohar        . . . . . . . . . . . . . . .
## A Nortje         . . . . . . . . . . . . . . .
## AB de Villiers   . 4 3 . . 3 . 3 . . . 4 3 . .
## Abdul Samad      . . . . . . . . . . . . . . .
## Abhishek Sharma  . . . . . . . . . . . 1 . . .
## AD Russell       1 . . . . . . 6 . . . 3 3 3 .
## AF Milne         . . . . . . . . . . . . . . .
## AJ Finch         . . . . 3 . . . . . . 1 . . .
## AJ Tye           . . . . . . . . . . . 1 . . .
## AK Markram       . . . 3 . . . . . . . . . . .
## AM Rahane        9 . . . . 3 . 3 . . . 3 3 . .
## Anmolpreet Singh . . . . . . . . . . . . . . .
## Anuj Rawat       . . . . . . . . . . . . . . .
## AR Patel         . . . . . . . 1 . . . . . . .

r0=r[(rowCounts(r) > 10),]
getRatingMatrix(r0)[1:15,1:15]

## 15 x 15 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 15 column names 'A Mishra', 'A Nortje', 'A Zampa' ... ]]

##
## AB de Villiers  . 4 3 . . 3 . 3 . . . 4 3 . .
## Abdul Samad     . . . . . . . . . . . . . . .
## Abhishek Sharma . . . . . . . . . . . 1 . . .
## AD Russell      1 . . . . . . 6 . . . 3 3 3 .
## AJ Finch        . . . . 3 . . . . . . 1 . . .
## AM Rahane       9 . . . . 3 . 3 . . . 3 3 . .
## AR Patel        . . . . . . . 1 . . . . . . .
## AT Rayudu       2 . . . . . 1 . . . . 3 . . .
## B Kumar         3 . 3 . . . . . . . . . . 3 .
## BA Stokes       . . . . . . 3 4 . . . 3 . . .
## CA Lynn         . . . . . . . 9 . . . 3 . . .
## CH Gayle        . . . . . 6 . 3 . . . 6 . . .
## CH Morris       . 3 . . . . . . . . . 3 . . .
## D Padikkal      . 4 . . . 3 . . . . . . 3 . .
## DA Miller       . . . . . 3 . . . . . 3 . . .

# Get the summary of the data
summary(getRatings(r0))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   3.000   3.000   3.463   4.000  21.000

# Normalize the data
r0_m <- normalize(r0)
getRatingMatrix(r0_m)[1:15,1:15]

## 15 x 15 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 15 column names 'A Mishra', 'A Nortje', 'A Zampa' ... ]]

##
## AB de Villiers   .         -0.7857143 -1.7857143 .  .       -1.7857143
## Abdul Samad      .          .          .         .  .        .
## Abhishek Sharma  .          .          .         .  .        .
## AD Russell      -2.6562500  .          .         .  .        .
## AJ Finch         .          .          .         . -0.03125  .
## AM Rahane        4.6041667  .          .         .  .       -1.3958333
## AR Patel         .          .          .         .  .        .
## AT Rayudu       -2.1363636  .          .         .  .        .
## B Kumar          0.3636364  .          0.3636364 .  .        .
## BA Stokes        .          .          .         .  .        .
## CA Lynn          .          .          .         .  .        .
## CH Gayle         .          .          .         .  .        1.5476190
## CH Morris        .          0.3500000  .         .  .        .
## D Padikkal       .          0.6250000  .         .  .       -0.3750000
## DA Miller        .          .          .         .  .       -0.7037037
##
## AB de Villiers   .         -1.7857143 . . . -0.7857143 -1.785714  .         .
## Abdul Samad      .          .         . . .  .          .         .         .
## Abhishek Sharma  .          .         . . . -1.6000000  .         .         .
## AD Russell       .          2.3437500 . . . -0.6562500 -0.656250 -0.6562500 .
## AJ Finch         .          .         . . . -2.0312500  .         .         .
## AM Rahane        .         -1.3958333 . . . -1.3958333 -1.395833  .         .
## AR Patel         .         -2.3333333 . . .  .          .         .         .
## AT Rayudu       -3.1363636  .         . . . -1.1363636  .         .         .
## B Kumar          .          .         . . .  .          .         0.3636364 .
## BA Stokes       -0.6086957  0.3913043 . . . -0.6086957  .         .         .
## CA Lynn          .          5.3200000 . . . -0.6800000  .         .         .
## CH Gayle         .         -1.4523810 . . .  1.5476190  .         .         .
## CH Morris        .          .         . . .  0.3500000  .         .         .
## D Padikkal       .          .         . . .  .         -0.375000  .         .
## DA Miller        .          .         . . . -0.7037037  .         .         .


## 4. Create a visual representation of the rating data before and after the normalization

The histograms show the bias in the data is removed after normalization
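The normalized values shown earlier appear consistent with row centering, that is, subtracting each batsman's mean known value from his row (for example, within a row the raw values 4 and 3 stay exactly 1 apart after normalization). Assuming that is what `normalize()` does by default, the same operation can be sketched in base R on a made-up matrix:

```r
# Hypothetical ratings: rows are batsmen, columns are bowlers, NA = no data.
m <- matrix(c(4, 3, NA, 5,
              NA, 1, 2, NA),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("bat1", "bat2"), paste0("bowl", 1:4)))

row_means <- rowMeans(m, na.rm = TRUE)   # mean over the known values only
centered  <- sweep(m, 1, row_means)      # subtract each row's mean; NA stays NA
centered
```

After centering, every row has mean zero over its known values, which removes the bias of batsmen whose figures are high or low across the board.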

# Keep only the rows (batsmen) with more than 10 known values
r0=r[(rowCounts(r) > 10),]
getRatingMatrix(r0)[1:15,1:10]

## 15 x 10 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 10 column names 'A Mishra', 'A Nortje', 'A Zampa' ... ]]

##
## AB de Villiers  . 4 3 . . 3 . 3 . .
## Abdul Samad     . . . . . . . . . .
## Abhishek Sharma . . . . . . . . . .
## AD Russell      1 . . . . . . 6 . .
## AJ Finch        . . . . 3 . . . . .
## AM Rahane       9 . . . . 3 . 3 . .
## AR Patel        . . . . . . . 1 . .
## AT Rayudu       2 . . . . . 1 . . .
## B Kumar         3 . 3 . . . . . . .
## BA Stokes       . . . . . . 3 4 . .
## CA Lynn         . . . . . . . 9 . .
## CH Gayle        . . . . . 6 . 3 . .
## CH Morris       . 3 . . . . . . . .
## D Padikkal      . 4 . . . 3 . . . .
## DA Miller       . . . . . 3 . . . .

#Plot ratings
image(r0, main = "Raw Ratings")

#Plot normalized ratings
r0_m <- normalize(r0)
getRatingMatrix(r0_m)[1:15,1:15]

## 15 x 15 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 15 column names 'A Mishra', 'A Nortje', 'A Zampa' ... ]]

##
## AB de Villiers   .         -0.7857143 -1.7857143 .  .       -1.7857143
## Abdul Samad      .          .          .         .  .        .
## Abhishek Sharma  .          .          .         .  .        .
## AD Russell      -2.6562500  .          .         .  .        .
## AJ Finch         .          .          .         . -0.03125  .
## AM Rahane        4.6041667  .          .         .  .       -1.3958333
## AR Patel         .          .          .         .  .        .
## AT Rayudu       -2.1363636  .          .         .  .        .
## B Kumar          0.3636364  .          0.3636364 .  .        .
## BA Stokes        .          .          .         .  .        .
## CA Lynn          .          .          .         .  .        .
## CH Gayle         .          .          .         .  .        1.5476190
## CH Morris        .          0.3500000  .         .  .        .
## D Padikkal       .          0.6250000  .         .  .       -0.3750000
## DA Miller        .          .          .         .  .       -0.7037037
##
## AB de Villiers   .         -1.7857143 . . . -0.7857143 -1.785714  .         .
## Abdul Samad      .          .         . . .  .          .         .         .
## Abhishek Sharma  .          .         . . . -1.6000000  .         .         .
## AD Russell       .          2.3437500 . . . -0.6562500 -0.656250 -0.6562500 .
## AJ Finch         .          .         . . . -2.0312500  .         .         .
## AM Rahane        .         -1.3958333 . . . -1.3958333 -1.395833  .         .
## AR Patel         .         -2.3333333 . . .  .          .         .         .
## AT Rayudu       -3.1363636  .         . . . -1.1363636  .         .         .
## B Kumar          .          .         . . .  .          .         0.3636364 .
## BA Stokes       -0.6086957  0.3913043 . . . -0.6086957  .         .         .
## CA Lynn          .          5.3200000 . . . -0.6800000  .         .         .
## CH Gayle         .         -1.4523810 . . .  1.5476190  .         .         .
## CH Morris        .          .         . . .  0.3500000  .         .         .
## D Padikkal       .          .         . . .  .         -0.375000  .         .
## DA Miller        .          .         . . . -0.7037037  .         .         .

image(r0_m, main = "Normalized Ratings")

set.seed(1234)
hist(getRatings(r0), breaks=25)

hist(getRatings(r0_m), breaks=25)


## 4a. Data for analysis

The data for batsmen vs bowlers for the period 2020-2022 is read as a dataframe. To remove rows with a very low number of ratings (timesOut, SR, Fours, Sixes etc.), the rows are filtered so that each row has more than 10 values. For the player estimation the dataframe is converted into wide format as an (m x n) matrix of batsman x bowler, one matrix for each of the columns of the dataframe, i.e. timesOut, SR, fours or sixes. Each of these matrices can be considered a rating matrix for estimation.

A similar approach is taken for estimating bowler performance. Here a wide-format (m x n) matrix of bowler x batsman is created for each of the columns deliveries, runsConceded, ER and wicketsTaken.

## 5. Batsman’s times Out

The code below estimates the number of times a batsman would lose his/her wicket to a bowler. As discussed in the algorithm above, the recommendation engine will make an initial estimate of the features for the bowlers and an initial estimate of the parameter vector for the batsmen. Then, using gradient descent, the recommender engine will determine the feature and parameter values such that the overall Mean Squared Error is minimized.

From the plot for the different algorithms it can be seen that UBCF performs the best. However, the ROC curves are far from ideal, even though the AUC is greater than 0.5.

df3 <- select(df, batsman1,bowler1,timesOut)
df6 <- xtabs(timesOut ~ ., df3)
df7 <- as.data.frame.matrix(df6)
df8 <- data.matrix(df7)
df8[df8 == 0] <- NA
r <- as(df8,"realRatingMatrix")
# Filter only rows where the row count is > 10
r0=r[(rowCounts(r) > 10),]
getRatingMatrix(r0)[1:10,1:10]

## 10 x 10 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 10 column names 'A Mishra', 'A Nortje', 'A Zampa' ... ]]

##
## AB de Villiers  . 4 3 . . 3 . 3 . .
## Abdul Samad     . . . . . . . . . .
## Abhishek Sharma . . . . . . . . . .
## AD Russell      1 . . . . . . 6 . .
## AJ Finch        . . . . 3 . . . . .
## AM Rahane       9 . . . . 3 . 3 . .
## AR Patel        . . . . . . . 1 . .
## AT Rayudu       2 . . . . . 1 . . .
## B Kumar         3 . 3 . . . . . . .
## BA Stokes       . . . . . . 3 4 . .

summary(getRatings(r0))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   3.000   3.000   3.463   4.000  21.000

# Evaluate the different recommender methods and plot the curves
evalRecomMethods(r0[1:dim(r0)[1]],k1=5,given1=7,goodRating1=median(getRatings(r0)))

#Evaluate the error
a=eval(r0[1:dim(r0)[1]],0.8,k1=5,given1=7,goodRating1=median(getRatings(r0)),"UBCF")

## Recommender of type 'UBCF' for 'realRatingMatrix'
## learned using 70 users.
## 18 x 145 rating matrix of class 'realRatingMatrix' with 1755 ratings.
##     RMSE      MSE      MAE
## 2.069027 4.280872 1.496388
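For reference, the three error measures printed above are related in a simple way (the MSE is the square of the RMSE). A minimal sketch on made-up held-out and predicted values shows how each one is computed:

```r
# Hypothetical held-out values and the corresponding predictions.
actual    <- c(3, 1, 4, 6)
predicted <- c(2, 2, 4, 3)

err  <- predicted - actual
mse  <- mean(err^2)        # mean squared error
rmse <- sqrt(mse)          # root mean squared error, in the same units as the data
mae  <- mean(abs(err))     # mean absolute error, less sensitive to one large miss
c(RMSE = rmse, MSE = mse, MAE = mae)
```

Because the RMSE is in the same units as the rating, it is the easiest of the three to compare against the summary of the data: an RMSE of about 2 is large when the median number of dismissals is 3.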

b=round(as(a,"matrix")[1:10,1:10])
c <- as(b,"realRatingMatrix")
m=as(c,"data.frame")
names(m) =c("batsman","bowler","TimesOut")


## 6. Batsman’s Strike rate

This section deals with the strike rate of batsmen versus bowlers and estimates the values for those pairs where the data is incomplete, using the UBCF method.

Even here, none of the algorithms performs particularly well. I did try out a few variations but could not lower the error (suggestions welcome!!)

df3 <- select(df, batsman1,bowler1,SR)
df6 <- xtabs(SR ~ ., df3)
df7 <- as.data.frame.matrix(df6)
df8 <- data.matrix(df7)
df8[df8 == 0] <- NA
r <- as(df8,"realRatingMatrix")
r0=r[(rowCounts(r) > 10),]
getRatingMatrix(r0)[1:10,1:10]

## 10 x 10 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 10 column names 'A Mishra', 'A Nortje', 'A Zampa' ... ]]

##
## AB de Villiers   96.8254 171.4286  33.33333  . 66.66667 223.07692   .
## Abdul Samad       .      228.0000   .        .  .       100.00000   .
## Abhishek Sharma 150.0000   .        .        .  .        66.66667   .
## AD Russell      111.4286   .        .        .  .         .         .
## AJ Finch        250.0000 116.6667   .        . 50.00000  85.71429 112.5000
## AJ Tye            .        .        .        .  .         .       100.0000
## AK Markram        .        .        .       50  .         .         .
## AM Rahane       121.1111   .        .        .  .       113.82979 117.9487
## AR Patel        183.3333   .      200.00000  .  .       433.33333   .
## AT Rayudu       126.5432 200.0000 122.22222  .  .       105.55556   .
##
## AB de Villiers  109.52381 .   .
## Abdul Samad       .       .   .
## Abhishek Sharma   .       .   .
## AD Russell      195.45455 .   .
## AJ Finch          .       .   .
## AJ Tye            .       .   .
## AK Markram        .       .   .
## AM Rahane        33.33333 . 200
## AR Patel        171.42857 .   .
## AT Rayudu       204.76190 .   .

summary(getRatings(r0))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   5.882  85.714 116.667 128.529 160.606 600.000

evalRecomMethods(r0[1:dim(r0)[1]],k1=5,given=7,goodRating1=median(getRatings(r0)))

a=eval(r0[1:dim(r0)[1]],0.8, k1=5,given1=7,goodRating1=median(getRatings(r0)),"UBCF")

## Recommender of type 'UBCF' for 'realRatingMatrix'
## learned using 105 users.
## 27 x 145 rating matrix of class 'realRatingMatrix' with 3220 ratings.
##       RMSE        MSE        MAE
##   77.71979 6040.36508   58.58484

b=round(as(a,"matrix")[1:10,1:10])
c <- as(b,"realRatingMatrix")
n=as(c,"data.frame")
names(n) =c("batsman","bowler","SR")


## 7. Batsman’s Sixes

The snippet of code below estimates the sixes hit by the batsmen against bowlers. The ROC curve for UBCF looks a lot better here, as the AUC is significantly greater than 0.5.

df3 <- select(df, batsman1,bowler1,sixes)
df6 <- xtabs(sixes ~ ., df3)
df7 <- as.data.frame.matrix(df6)
df8 <- data.matrix(df7)
df8[df8 == 0] <- NA
r <- as(df8,"realRatingMatrix")
r0=r[(rowCounts(r) > 10),]
getRatingMatrix(r0)[1:10,1:10]

## 10 x 10 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 10 column names 'A Mishra', 'A Nortje', 'A Zampa' ... ]]

##
## AB de Villiers  3 3 . . . 18 .  3 . .
## AD Russell      3 . . . .  . . 12 . .
## AJ Finch        2 . . . .  . .  . . .
## AM Rahane       7 . . . .  3 1  . . .
## AR Patel        4 . 3 . .  6 .  1 . .
## AT Rayudu       5 2 . . .  . .  1 . .
## BA Stokes       . . . . .  . .  . . .
## CA Lynn         . . . . .  . .  9 . .
## CH Gayle       17 . . . . 17 .  . . .
## CH Morris       . . 3 . .  . .  . . .

summary(getRatings(r0))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.00    3.00    3.00    4.68    6.00   33.00

evalRecomMethods(r0[1:dim(r0)[1]],k1=5,given=7,goodRating1=median(getRatings(r0)))

## Timing stopped at: 0.003 0 0.002

a=eval(r0[1:dim(r0)[1]],0.8, k1=5,given1=7,goodRating1=median(getRatings(r0)),"UBCF")

## Recommender of type 'UBCF' for 'realRatingMatrix'
## learned using 52 users.
## 14 x 145 rating matrix of class 'realRatingMatrix' with 1634 ratings.
##      RMSE       MSE       MAE
##  3.529922 12.460350  2.532122

b=round(as(a,"matrix")[1:10,1:10])
c <- as(b,"realRatingMatrix")
o=as(c,"data.frame")
names(o) =c("batsman","bowler","Sixes")


## 8. Batsman’s Fours

The code below estimates 4s for the batsmen

df3 <- select(df, batsman1,bowler1,fours)
df6 <- xtabs(fours ~ ., df3)
df7 <- as.data.frame.matrix(df6)
df8 <- data.matrix(df7)
df8[df8 == 0] <- NA
r <- as(df8,"realRatingMatrix")
r0=r[(rowCounts(r) > 10),]
getRatingMatrix(r0)[1:10,1:10]

## 10 x 10 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 10 column names 'A Mishra', 'A Nortje', 'A Zampa' ... ]]

##
## AB de Villiers   . 1 . . . 24 . 3 . .
## Abhishek Sharma  . . . . .  . . . . .
## AD Russell       1 . . . .  . . 9 . .
## AJ Finch         . 1 . . .  3 2 . . .
## AK Markram       . . . . .  . . . . .
## AM Rahane       11 . . . .  8 7 . . 3
## AR Patel         . . . . .  . . 3 . .
## AT Rayudu       11 2 3 . .  6 . 6 . .
## BA Stokes        1 . . . .  . . . . .
## CA Lynn          . . . . .  . . 6 . .

summary(getRatings(r0))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   3.000   4.000   6.339   9.000  55.000

evalRecomMethods(r0[1:dim(r0)[1]],k1=5,given=7,goodRating1=median(getRatings(r0)))

## Timing stopped at: 0.008 0 0.008

## Warning in .local(x, method, ...):
##   Recommender 'UBCF Pearson' has failed and has been removed from the results!

a=eval(r0[1:dim(r0)[1]],0.8, k1=5,given1=7,goodRating1=median(getRatings(r0)),"UBCF")

## Recommender of type 'UBCF' for 'realRatingMatrix'
## learned using 67 users.
## 17 x 145 rating matrix of class 'realRatingMatrix' with 2083 ratings.
##      RMSE       MSE       MAE
##  5.486661 30.103447  4.060990

b=round(as(a,"matrix")[1:10,1:10])
c <- as(b,"realRatingMatrix")
p=as(c,"data.frame")
names(p) =c("batsman","bowler","Fours")


## 9. Batsman’s Total Runs

The code below estimates the total runs that would have been scored by the batsmen against different bowlers.

df3 <- select(df, batsman1,bowler1,totalRuns)
df6 <- xtabs(totalRuns ~ ., df3)
df7 <- as.data.frame.matrix(df6)
df8 <- data.matrix(df7)
df8[df8 == 0] <- NA
r <- as(df8,"realRatingMatrix")
r0=r[(rowCounts(r) > 10),]
getRatingMatrix(r)[1:10,1:10]

## 10 x 10 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 10 column names 'A Mishra', 'A Nortje', 'A Zampa' ... ]]

##
## A Badoni         .  . . . .   . .   . . .
## A Manohar        .  . . . .   . .   . . .
## A Nortje         .  . . . .   . .   . . .
## AB de Villiers  61 36 3 . 6 261 .  69 . .
## Abdul Samad      . 57 . . .  12 .   . . .
## Abhishek Sharma  3  . . . .   6 .   . . .
## AD Russell      39  . . . .   . . 129 . .
## AF Milne         .  . . . .   . .   . . .
## AJ Finch        15  7 . . 3  18 9   . . .
## AJ Tye           .  . . . .   . 4   . . .

summary(getRatings(r0))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.00    9.00   24.00   41.36   54.00  452.00

evalRecomMethods(r0[1:dim(r0)[1]],k1=5,given1=7,goodRating1=median(getRatings(r0)))

a=eval(r0[1:dim(r0)[1]],0.8, k1=5,given1=7,goodRating1=median(getRatings(r0)),"UBCF")

## Recommender of type 'UBCF' for 'realRatingMatrix'
## learned using 105 users.
## 27 x 145 rating matrix of class 'realRatingMatrix' with 3256 ratings.
##       RMSE        MSE        MAE
##   41.50985 1723.06788   29.52958

b=round(as(a,"matrix")[1:10,1:10])
c <- as(b,"realRatingMatrix")
q=as(c,"data.frame")
names(q) =c("batsman","bowler","TotalRuns")


## 10. Batsman’s Balls Faced

The snippet estimates the balls faced by batsmen versus bowlers

df3 <- select(df, batsman1,bowler1,ballsFaced)
df6 <- xtabs(ballsFaced ~ ., df3)
df7 <- as.data.frame.matrix(df6)
df8 <- data.matrix(df7)
df8[df8 == 0] <- NA
r <- as(df8,"realRatingMatrix")
r0=r[(rowCounts(r) > 10),]
getRatingMatrix(r)[1:10,1:10]

## 10 x 10 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 10 column names 'A Mishra', 'A Nortje', 'A Zampa' ... ]]

##
## A Badoni         .  . . . .   . .  . . .
## A Manohar        .  . . . .   . .  . . .
## A Nortje         .  . . . .   . .  . . .
## AB de Villiers  63 21 9 . 9 117 . 63 . .
## Abdul Samad      . 25 . . .  12 .  . . .
## Abhishek Sharma  2  . . . .   9 .  . . .
## AD Russell      35  . . . .   . . 66 . .
## AF Milne         .  . . . .   . .  . . .
## AJ Finch         6  6 . . 6  21 8  . . .
## AJ Tye           .  . . . .   9 4  . . .

summary(getRatings(r0))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.00    9.00   18.00   30.21   39.00  384.00

evalRecomMethods(r0[1:dim(r0)[1]],k1=5,given=7,goodRating1=median(getRatings(r0)))

a=eval(r0[1:dim(r0)[1]],0.8, k1=5,given1=7,goodRating1=median(getRatings(r0)),"UBCF")

## Recommender of type 'UBCF' for 'realRatingMatrix'
## learned using 112 users.
## 28 x 145 rating matrix of class 'realRatingMatrix' with 3378 ratings.
##       RMSE        MSE        MAE
##   33.91251 1150.05835   23.39439

b=round(as(a,"matrix")[1:10,1:10])
c <- as(b,"realRatingMatrix")
r=as(c,"data.frame")
names(r) =c("batsman","bowler","BallsFaced")
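
The reshaping steps repeated in each of these sections (select, cross-tabulate, convert zeros to NA) can be seen in isolation on a toy dataframe. This is a sketch in base R only; the column names mirror the post, but `df_toy` and its values are made up:

```r
# Toy long-format data; column names mirror the post, values are made up
df_toy <- data.frame(
  batsman1   = c("A", "A", "B", "B", "C", "C"),
  bowler1    = c("X", "Y", "X", "Y", "X", "Y"),
  ballsFaced = c(12, 0, 6, 9, 0, 0)
)

# Cross-tabulate: rows are batsmen, columns are bowlers, cells hold ballsFaced
wide <- xtabs(ballsFaced ~ batsman1 + bowler1, data = df_toy)
m <- data.matrix(as.data.frame.matrix(wide))

# Treat zero as "no data" so that it is seen as a missing rating
m[m == 0] <- NA
m

# Base-R analogue of the rowCounts() filter: keep rows with enough known cells
keep <- rowSums(!is.na(m)) > 1
m[keep, , drop = FALSE]
```

One point to keep in mind is that replacing zeros with NA cannot distinguish "never faced" from a genuine zero. For balls faced the two coincide, but for measures such as times out or wickets taken, a real zero against a bowler who was actually faced would also be treated as missing.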


## 11. Generate the Batsmen Performance Estimate

This code merges the individual estimates into a single dataframe containing both known and ‘predicted’ values.

a1=merge(m,n,by=c("batsman","bowler"))
a2=merge(a1,o,by=c("batsman","bowler"))
a3=merge(a2,p,by=c("batsman","bowler"))
a4=merge(a3,q,by=c("batsman","bowler"))
a5=merge(a4,r,by=c("batsman","bowler"))
a6= select(a5, batsman,bowler,BallsFaced,TotalRuns,Fours, Sixes, SR,TimesOut)

##          batsman          bowler BallsFaced TotalRuns Fours Sixes  SR TimesOut
## 1 AB de Villiers        A Mishra         94       124     7     5 144        5
## 2 AB de Villiers        A Nortje         26        42     4     3 148        3
## 3 AB de Villiers         A Zampa         28        42     5     7 106        4
## 4 AB de Villiers Abhishek Sharma         22        28     0    10 136        5
## 5 AB de Villiers      AD Russell         70       135    14    12 207        4
## 6 AB de Villiers        AF Milne         31        45     6     6 130        3
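
It is worth noting how `merge` behaves when the per-metric dataframes do not contain exactly the same batsman/bowler pairs. The sketch below uses two made-up frames (`runs` and `balls` are illustrative names, not objects from the post) to contrast the default inner join with an outer join, and shows how `Reduce` can chain the merges:

```r
# Toy per-metric estimates; column names mirror the post, values are made up
runs  <- data.frame(batsman = c("A", "A", "B"), bowler = c("X", "Y", "X"),
                    TotalRuns = c(30, 12, 8))
balls <- data.frame(batsman = c("A", "B"), bowler = c("X", "X"),
                    BallsFaced = c(20, 10))

# Default merge is an inner join: the (A, Y) pair is dropped
inner <- merge(runs, balls, by = c("batsman", "bowler"))

# all = TRUE keeps every pair and fills the missing metric with NA
outer <- merge(runs, balls, by = c("batsman", "bowler"), all = TRUE)

# Reduce() chains the same merge over any number of frames
combined <- Reduce(function(x, y) merge(x, y, by = c("batsman", "bowler")),
                   list(runs, balls))
```

If any of the per-metric frames lacks a pair, the default inner join drops that pair from the final table; comparing row counts with `all = TRUE` is one way to check whether that is happening.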


## 12. Bowler analysis

Just like the batsman performance estimation, the bowler’s performance can also be estimated. Consider the following table.

As in the batsman analysis, every batsman can be assigned a set of features (e.g. “strong backfoot player”, “360 degree player”, “power hitter”) with initial values, and every bowler has an associated parameter vector θ. Different bowlers will have performance data against different sets of batsmen. Starting from the initial estimates of the features and the parameters, gradient descent can be used to minimize the error between the predicted and the actual values (e.g. wicketsTaken, which plays the role of the ratings).

load("recom_data/bowlerVsBatsman20_22.rdata")
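
The gradient-descent idea described in this section can be sketched on a toy matrix. This is only an illustration in base R, not the method used in the rest of the post (which relies on UBCF); the matrix `Y`, the number of features `k`, the learning rate `alpha` and the regularisation strength `lambda` are all assumed values:

```r
set.seed(1)

# Toy bowler x batsman matrix of wickets taken; NA marks pairs with no data
Y <- matrix(c( 4, NA,  1,  3,
              NA,  2,  1, NA,
               3,  1, NA,  2),
            nrow = 3, byrow = TRUE)
known <- !is.na(Y)

k      <- 2      # number of latent features (assumed)
alpha  <- 0.01   # learning rate (assumed)
lambda <- 0.1    # regularisation strength (assumed)

Theta <- matrix(rnorm(nrow(Y) * k, sd = 0.1), nrow(Y), k)  # parameter vector per bowler
X     <- matrix(rnorm(ncol(Y) * k, sd = 0.1), ncol(Y), k)  # feature vector per batsman

# Regularised squared error over the known cells only
cost <- function(Theta, X) {
  E <- Theta %*% t(X) - Y
  E[!known] <- 0
  0.5 * sum(E^2) + 0.5 * lambda * (sum(Theta^2) + sum(X^2))
}

cost_start <- cost(Theta, X)
for (i in 1:2000) {
  E <- Theta %*% t(X) - Y
  E[!known] <- 0                          # unknown cells contribute no error
  grad_Theta <- E %*% X + lambda * Theta
  grad_X     <- t(E) %*% Theta + lambda * X
  Theta <- Theta - alpha * grad_Theta
  X     <- X - alpha * grad_X
}
cost_end <- cost(Theta, X)

round(Theta %*% t(X), 1)   # filled-in matrix, including the cells that were NA
```

The product of the learned parameters and features gives an estimate for every bowler/batsman pair, including those with no data, which is exactly what is needed here.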


## 12a. Bowler dataframe

Inspecting the bowler dataframe

head(df2)

##    bowler1        batsman1 balls runsConceded       ER wicketTaken
## 1 A Mishra        A Badoni     0            0 0.000000           0
## 2 A Mishra       A Manohar     0            0 0.000000           0
## 3 A Mishra        A Nortje     0            0 0.000000           0
## 4 A Mishra  AB de Villiers    63           61 5.809524           0
## 5 A Mishra     Abdul Samad     0            0 0.000000           0
## 6 A Mishra Abhishek Sharma     2            3 9.000000           0

names(df2)

## [1] "bowler1"      "batsman1"     "balls"        "runsConceded" "ER"
## [6] "wicketTaken"


## 13. Balls bowled by bowler

The section below estimates the balls bowled by each bowler. In the evaluation, both UBCF Pearson and UBCF Cosine perform well.

df3 <- select(df2, bowler1,batsman1,balls)
df6 <- xtabs(balls ~ ., df3)
df7 <- as.data.frame.matrix(df6)
df8 <- data.matrix(df7)
df8[df8 == 0] <- NA
r <- as(df8,"realRatingMatrix")
r0=r[(rowCounts(r) > 10),]
getRatingMatrix(r0)[1:10,1:10]

## 10 x 10 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 10 column names 'A Badoni', 'A Manohar', 'A Nortje' ... ]]

##
## A Mishra        . . .  63  .  2 35 .  6 .
## A Nortje        . . .  21 25  .  . .  6 .
## A Zampa         . . .   9  .  .  . .  . .
## Abhishek Sharma . . .   9  .  .  . .  6 .
## AD Russell      . . . 117 12  9  . . 21 9
## AF Milne        . . .   .  .  .  . .  8 4
## AJ Tye          . . .  63  .  . 66 .  . .
## Akash Deep      . . .   .  .  .  . .  . .
## AR Patel        . . . 188  5  1 84 . 29 5
## Arshdeep Singh  . . .   6  6 24 18 . 12 .

summary(getRatings(r0))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.00    9.00   18.00   29.61   36.00  384.00

evalRecomMethods(r0[1:dim(r0)[1]],k1=5,given=7,goodRating1=median(getRatings(r0)))

a=eval(r0[1:dim(r0)[1]],0.8,k1=5,given1=7,goodRating1=median(getRatings(r0)),"UBCF")

## Recommender of type 'UBCF' for 'realRatingMatrix'
## learned using 96 users.
## 24 x 195 rating matrix of class 'realRatingMatrix' with 3954 ratings.
##      RMSE       MSE       MAE
##  30.72284 943.89294  19.89204

b=round(as(a,"matrix")[1:10,1:10])
c <- as(b,"realRatingMatrix")
s=as(c,"data.frame")
names(s) =c("bowler","batsman","BallsBowled")
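
To give a feel for what user-based collaborative filtering (UBCF) with cosine similarity does, here is a minimal sketch in base R. The helper functions `cosine_sim` and `predict_cell` and the matrix `R` are hypothetical and written only for illustration; the sketch omits refinements such as row normalisation and limiting the sum to the nearest neighbours, which a full implementation would typically include:

```r
# Toy bowler x batsman matrix of balls bowled; NA = no data (values made up)
R <- matrix(c(63, 21, NA,  6,
              60, 25, 35, NA,
               9, NA, 30, 12),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("bowlerA", "bowlerB", "bowlerC"),
                            c("bat1", "bat2", "bat3", "bat4")))

# Cosine similarity between two rows, using only the columns both have data for
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(NA_real_)
  sum(a[both] * b[both]) / sqrt(sum(a[both]^2) * sum(b[both]^2))
}

# Estimate a missing cell as the similarity-weighted mean of the other rows' values
predict_cell <- function(R, user, item) {
  others <- setdiff(rownames(R), user)
  others <- others[!is.na(R[others, item])]
  sims <- sapply(others, function(o) cosine_sim(R[user, ], R[o, ]))
  sum(sims * R[others, item]) / sum(sims)
}

est <- predict_cell(R, "bowlerA", "bat3")
est
```

Here the estimate for bowlerA against bat3 falls between the values of bowlerB (35) and bowlerC (30), and is pulled towards bowlerB because bowlerB's known figures resemble bowlerA's more closely.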


## 14. Runs conceded by bowler

This section estimates the runs conceded by the bowler. The UBCF Cosine algorithm performs the best, with TPR increasing faster than FPR.

df3 <- select(df2, bowler1,batsman1,runsConceded)
df6 <- xtabs(runsConceded ~ ., df3)
df7 <- as.data.frame.matrix(df6)
df8 <- data.matrix(df7)
df8[df8 == 0] <- NA
r <- as(df8,"realRatingMatrix")
r0=r[(rowCounts(r) > 10),]
getRatingMatrix(r0)[1:10,1:10]

## 10 x 10 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 10 column names 'A Badoni', 'A Manohar', 'A Nortje' ... ]]

##
## A Mishra        . . .  61  .  3  41 . 15  .
## A Nortje        . . .  36 57  .   . .  8  .
## A Zampa         . . .   3  .  .   . .  .  .
## Abhishek Sharma . . .   6  .  .   . .  3  .
## AD Russell      . . . 276 12  6   . . 21  .
## AF Milne        . . .   .  .  .   . . 10  4
## AJ Tye          . . .  69  .  . 138 .  .  .
## Akash Deep      . . .   .  .  .   . .  .  .
## AR Patel        . . . 205  5  . 165 . 33 13
## Arshdeep Singh  . . .  18  3 51  51 .  6  .

summary(getRatings(r0))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.00    9.00   24.00   41.34   54.00  458.00

evalRecomMethods(r0[1:dim(r0)[1]],k1=5,given=7,goodRating1=median(getRatings(r0)))

## Timing stopped at: 0.004 0 0.004

## Warning in .local(x, method, ...):
##   Recommender 'UBCF Pearson' has failed and has been removed from the results!

a=eval(r0[1:dim(r0)[1]],0.8,k1=5,given1=7,goodRating1=median(getRatings(r0)),"UBCF")

## Recommender of type 'UBCF' for 'realRatingMatrix'
## learned using 95 users.
## 24 x 195 rating matrix of class 'realRatingMatrix' with 3820 ratings.
##       RMSE        MSE        MAE
##   43.16674 1863.36749   30.32709

b=round(as(a,"matrix")[1:10,1:10])
c <- as(b,"realRatingMatrix")
t=as(c,"data.frame")
names(t) =c("bowler","batsman","RunsConceded")
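
The TPR and FPR mentioned in this section depend on what counts as a "good" rating, which in the calls above is the median of the known ratings. The sketch below shows the two rates for a single cut-off on made-up numbers. It simply thresholds the predicted values, which is a simplification I am assuming for illustration and may differ from how the evaluation in the post builds its ROC curve:

```r
# Toy held-out values and predictions (made up)
actual    <- c(61, 3, 41, 15, 36, 8, 57, 6)
predicted <- c(48, 10, 30, 28, 20, 5, 40, 26)

good_rating <- median(actual)   # same thresholding idea as goodRating1 in the post

is_good   <- actual    >= good_rating   # pairs that truly are "good"
flag_good <- predicted >= good_rating   # pairs the model would flag as "good"

tpr <- sum(flag_good & is_good)  / sum(is_good)    # true positive rate
fpr <- sum(flag_good & !is_good) / sum(!is_good)   # false positive rate
c(TPR = tpr, FPR = fpr)
```

A method whose TPR rises faster than its FPR as the cut-off is relaxed is flagging mostly genuine "good" pairs, which is the sense in which one algorithm is said to perform better than another.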


## 15. Economy Rate of the bowler

This section estimates the economy rate of the bowler. The performance is not all that good.

df3 <- select(df2, bowler1,batsman1,ER)
df6 <- xtabs(ER ~ ., df3)
df7 <- as.data.frame.matrix(df6)
df8 <- data.matrix(df7)
df8[df8 == 0] <- NA
r <- as(df8,"realRatingMatrix")
r0=r[(rowCounts(r) > 10),]
getRatingMatrix(r0)[1:10,1:10]

## 10 x 10 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 10 column names 'A Badoni', 'A Manohar', 'A Nortje' ... ]]

##
## A Mishra        . . .  5.809524  .     9.00  7.028571 . 15.000000  .
## A Nortje        . . . 10.285714 13.68  .     .        .  8.000000  .
## A Zampa         . . .  2.000000  .     .     .        .  .         .
## Abhishek Sharma . . .  4.000000  .     .     .        .  3.000000  .
## AD Russell      . . . 14.153846  6.00  4.00  .        .  6.000000  .
## AF Milne        . . .  .         .     .     .        .  7.500000  6.0
## AJ Tye          . . .  6.571429  .     .    12.545455 .  .         .
## Akash Deep      . . .  .         .     .     .        .  .         .
## AR Patel        . . .  6.542553  6.00  .    11.785714 .  6.827586 15.6
## Arshdeep Singh  . . . 18.000000  3.00 12.75 17.000000 .  3.000000  .

summary(getRatings(r0))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.3529  5.2500  7.1126  7.8139  9.8000 36.0000

evalRecomMethods(r0[1:dim(r0)[1]],k1=5,given=7,goodRating1=median(getRatings(r0)))

## Timing stopped at: 0.003 0 0.004

## Warning in .local(x, method, ...):
##   Recommender 'UBCF Pearson' has failed and has been removed from the results!

a=eval(r0[1:dim(r0)[1]],0.8,k1=5,given1=7,goodRating1=median(getRatings(r0)),"UBCF")

## Recommender of type 'UBCF' for 'realRatingMatrix'
## learned using 95 users.
## 24 x 195 rating matrix of class 'realRatingMatrix' with 3839 ratings.
##      RMSE       MSE       MAE
##  4.380680 19.190356  3.316556

b=round(as(a,"matrix")[1:10,1:10])
c <- as(b,"realRatingMatrix")
u=as(c,"data.frame")
names(u) =c("bowler","batsman","EconomyRate")
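
The economy rate is the runs conceded per six-ball over, so it can be checked directly against the `balls` and `runsConceded` columns. A small worked example in base R, using two rows from the `head(df2)` output in section 12a:

```r
# A Mishra vs AB de Villiers and vs Abhishek Sharma, from head(df2)
balls        <- c(63, 2)
runsConceded <- c(61, 3)

# Economy rate = runs conceded per over of six balls
er <- runsConceded / balls * 6
round(er, 6)
```

This reproduces the `ER` values 5.809524 and 9.000000 shown in the dataframe. Because the economy rate is a ratio of two quantities that are estimated separately in sections 13 and 14, one possible cross-check is to derive it from the estimated runs and balls and compare it with the directly estimated value. Note also that `round()` without a digits argument, as used above, reduces the estimated economy rate to a whole number; `round(x, 2)` would keep two decimals.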


## 16. Wickets Taken by bowler

The code below estimates the wickets taken by each bowler against different batsmen.

df3 <- select(df2, bowler1,batsman1,wicketTaken)
df6 <- xtabs(wicketTaken ~ ., df3)
df7 <- as.data.frame.matrix(df6)
df8 <- data.matrix(df7)
df8[df8 == 0] <- NA
r <- as(df8,"realRatingMatrix")
r0=r[(rowCounts(r) > 10),]
getRatingMatrix(r0)[1:10,1:10]

## 10 x 10 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 10 column names 'A Badoni', 'A Manohar', 'A Nortje' ... ]]

##
## A Mishra       . . . . . . 1 . . .
## A Nortje       . . . 4 . . . . . .
## A Zampa        . . . 3 . . . . . .
## AD Russell     . . . 3 . . . . . .
## AJ Tye         . . . 3 . . 6 . . .
## AR Patel       . . . 4 . 1 3 . 1 1
## Arshdeep Singh . . . 3 . . 3 . . .
## AS Rajpoot     . . . . . . 3 . . .
## Avesh Khan     . . . . . . 1 . 3 .
## B Kumar        . . . 9 . . 3 . 1 .

summary(getRatings(r0))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   3.000   3.000   3.423   3.000  21.000

evalRecomMethods(r0[1:dim(r0)[1]],k1=5,given=7,goodRating1=median(getRatings(r0)))

## Timing stopped at: 0.003 0 0.003

## Warning in .local(x, method, ...):
##   Recommender 'UBCF Pearson' has failed and has been removed from the results!

a=eval(r0[1:dim(r0)[1]],0.8,k1=5,given1=7,goodRating1=median(getRatings(r0)),"UBCF")

## Recommender of type 'UBCF' for 'realRatingMatrix'
## learned using 64 users.
## 16 x 195 rating matrix of class 'realRatingMatrix' with 1908 ratings.
##     RMSE      MSE      MAE
## 2.672677 7.143203 1.956934

b=round(as(a,"matrix")[1:10,1:10])
c <- as(b,"realRatingMatrix")
v=as(c,"data.frame")
names(v) =c("bowler","batsman","WicketTaken")


## 17. Generate the Bowler Performance Estimate

The entire dataframe is regenerated with known and ‘predicted’ values

r1=merge(s,t,by=c("bowler","batsman"))
r2=merge(r1,u,by=c("bowler","batsman"))
r3=merge(r2,v,by=c("bowler","batsman"))
r4= select(r3,bowler, batsman, BallsBowled,RunsConceded,EconomyRate, WicketTaken)

##     bowler         batsman BallsBowled RunsConceded EconomyRate WicketTaken
## 1 A Mishra  AB de Villiers         102          144           8           4
## 2 A Mishra     Abdul Samad          13           20           7           4
## 3 A Mishra Abhishek Sharma          14           26           8           2
## 4 A Mishra      AD Russell          47           85           9           3
## 5 A Mishra        AJ Finch          45           61          11           4
## 6 A Mishra          AJ Tye          14           20           5           4


## 18. Conclusion

This post showed an approach for estimating batsman and bowler performance. The performance of the recommender engine could have been better. In any case, I think this approach will work for player estimation, provided the recommender algorithm is able to achieve a high degree of accuracy. This would be a good way to estimate performance, as the algorithm can pick up features and nuances of batsmen and bowlers that are not explicitly captured in the data.


# Close encounters with the future

Published in Telecom Asia, Oct 22,2013 – Close encounters with the future

Where a calculator on the ENIAC is equipped with 18,000 vacuum tubes and weighs 30 tons, computers in the future may have only 1,000 vacuum tubes and perhaps weigh 1.5 tons.—POPULAR MECHANICS, 1949

Introduction: Ray Kurzweil, in his non-fiction book “The Singularity is near – When humans transcend biology”, predicts that by the year 2045 the Singularity will allow humans to transcend our ‘frail biological bodies’ and our ‘petty, derivative and circumscribed brains’. Specifically, the book claims “that there will be a ‘technological singularity’ in the year 2045, a point where progress is so rapid it outstrips humans’ ability to comprehend it. Irreversibly transformed, people will augment their minds and bodies with genetic alterations, nanotechnology, and artificial intelligence”.

He believes that advances in robotics, AI, nanotechnology and genetics will grow exponentially and will lead us into a future realm of intelligence that will far exceed biological intelligence. This explosion will be the result of ‘accelerating returns from significant advances in technology’.

Futurescape

Here is a look at some of the more fascinating key trends in technology. You can decide whether we are heading to Singularity or not.

Autonomous Vehicles (AVs): Self-driving cars have moved from the realm of science fiction to reality in recent times. Google’s autonomous cars have already driven around half a million miles. All the major car manufacturers of the world, from BMW, Mercedes, Toyota and Nissan to Ford and GM, are coming out with their own versions of autonomous cars. These cars are equipped with Adaptive Cruise Control and Collision Avoidance technologies and are already taking control away from drivers. Moreover, AVs alert drivers if their attention strays from the road ahead for too long. Autonomous Vehicles work with the help of Vehicular Communication Technology.

Vehicular Communication, along with Intelligent Transport Systems (ITS), achieves safety by enabling communication between vehicles, people and roads. Vehicle-to-vehicle communications are the fundamental building block of autonomous, self-driving cars. They enable the exchange of data between vehicles and allow automobiles to “see” and adapt to driving obstacles more completely, preventing accidents and resulting in more efficient driving.

Smart Assistants: From the defeat of Kasparov in chess by IBM’s Deep Blue in 1997, to the resounding victory in Jeopardy of IBM’s Watson, which is capable of understanding natural human language, to the more prevalent intelligent assistant, Apple’s Siri, Artificially Intelligent (AI) systems have come a long way. The newest trend in this area is Smart Assistants. Robots are currently analyzing documents, filling prescriptions, and handling other tasks that were once exclusively done by humans. Smart Assistants are already taking over the tasks of BPO operators, paralegals, store clerks and baby sitters. Robots, in many ways, are not only smarter than humans, but also do not get easily bored.

Intelligent homes and intelligent offices: Rapid advances in technology will be closer to the home both literally and figuratively. The future home will have the ability to detect the presence of people, pets and smoke, and changes in humidity, moisture, lighting and temperature. Smart devices will monitor the environment and take appropriate steps to save energy, improve safety and enhance the security of homes. Devices will start learning your habits and enhance your comfort and convenience. Everything from thermostats and fire detectors to washing machines and refrigerators will be equipped with electronics capable of adapting to the environment. All gadgets at home will be accessible through laptops, tablets or smartphones, so we will be able to monitor all aspects of our intelligent home from anywhere.

Smart devices will also make major inroads into offices, leading to the birth of intelligent offices where lighting, heating and cooling will be based on the presence of people. This will result in enormous savings in energy. The advances in intelligent homes and intelligent offices will be in the greater context of the Smart Grid.

Swarms of drones: In contrast to weaponized drones used for unmanned aerial survey of enemy territory, we will soon have commercial drones used for civilian purposes. The most compelling aspect of drones these days is the fact that they can be easily manufactured in large quantities, are cheap and can perform complex tasks either singly or collectively. Remotely controlled drones can perform hundreds of civilian jobs, including traffic monitoring, aerial surveying, oil pipeline inspections and monitoring of crop conditions. Drones are also being employed for the conservation of wildlife. In the wilderness of Africa, drones are already helping by providing aerial footage of the landscape, tracking poachers and even herding elephants. However, before drones become a common sight, it is necessary to ensure that appropriate laws are made for maintaining the safety and security of civilians. This is likely to happen in the US in 2015, when the Federal Aviation Administration (FAA) will come up with rules to safely integrate drones into the American skies.

MOOC (Massive Open Online Course): The concept of the MOOC, or ‘Massive Open Online Course’, from top colleges, though just a few years old, is already taking the world by storm. Coursera, edX and Udacity are the top 3 MOOC providers, among many others, and offer a variety of courses on technology, philosophy, sociology, computer science etc. As more courses become available online, the requirement of having a uniform start and end date will diminish gradually. The availability of course lectures at all times and through all devices, namely the laptop, tablet or smartphone, will result in large scale adoption by students of all ages.

In contrast to regimented classes, MOOCs now allow students to take classes at their own pace. It is likely that some students will breeze through an entire semester’s worth of classes in a few weeks. It is also likely that a few students will graduate in 4 years with more than a couple of degrees. MOOCs are a natural development considering that the world is going to be more knowledge driven, with a need for experts with a diverse set of in-depth skills. Here is an interesting article in WSJ, “What College will be like in 2023”.

3D Printing: This is another technology that is bound to become ubiquitous in our future. 3D printers will revolutionize manufacturing in ways we could never imagine. A 3D printer is similar to a hot-glue gun attached to a robotic arm. It creates an object by stacking one layer of material, typically plastic or metal, on top of another. 3D printers have been used for making everything from prosthetic limbs, phone cases and lamps all the way to a NASA funded 3D pizza. Here is a great article in the New York Times, “Dinner is Printed”. It is likely that a 3D printer will be as indispensable to our future homes as the refrigerator and microwave.

Artificial sense organs: A recent news item in Science 2.0, “The Future touch sensitive prosthetic limbs”, discusses the invention of a prosthetic limb that can actually provide the sense of touch by stimulating the regions of the brain that deal with the sense of touch. The researchers identified the neural activity that occurs when grasping or feeling an object and successfully induced these patterns in the brain. Two parallel efforts are underway to understand how the human brain works. They are “The Human Brain Project”, which has 130 members of the European Union, and Obama’s BRAIN project. Both these projects attempt ‘to give us a deeper and more meaningful understanding of how the human brain operates’. Possibilities as in the movies ‘Avatar’ or ‘Terminator’ may not be far away.

The Others: Besides the above, technologies like Big Data, Cloud Computing, Semantic Web, Internet of Things and Smart Grid will also swamp us in the future, and much has already been said about them.

Conclusion: The above sets of technologies represent seismic shifts and are bound to explode in our future in a million ways.

Given the advances in bionic limbs, machine intelligence and AI systems, MOOCs and Autonomous Vehicles, are we on target for the Singularity?

I wouldn’t be surprised at all!

# Singularity

Pete Mettle felt drowsy. He had been working for days on his new inference algorithm. Pete had been in the field of Artificial Intelligence (AI) for close to 3 decades and had established himself as the father of “semantics”. He was particularly renowned for his 3 principles of Artificial Intelligence. He had postulated the Principles of Learning as follows:

The Principle of Knowledge Acquisition: This principle laid out the guidelines for knowledge acquisition by an algorithm. It clearly laid out the rules of what was knowledge and what was not. It could clearly delineate between the wheat and chaff from any textbook or research article.

The Principle of Knowledge Assimilation: This law gave the process for organizing the acquired knowledge in facts, rules and underlying principles. Knowledge assimilation involved storing the individual rules, the relation between the rules and provided the basis for drawing conclusions from them

The Principle of Knowledge Application: This principle, according to Pete, was the most important. It showed how all knowledge acquired and assimilated could be used to draw inferences and conclusions. In fact it also showed how knowledge could be extrapolated to make safe conclusions.

Zengine: The above 3 principles of Pete were hailed as a major landmark in AI. Pete started to work on an inference engine known as “Zengine” based on these 3 principles. Pete was almost finished fine tuning his algorithm. Pete wanted to test his Zengine on the World Wide Web. The World Wide Web had grown to gigantic proportions. A report in the May 2025 issue of the Wall Street Journal mentioned that the total data held on the internet had crossed 400 zettabytes and that the daily data stored on the web was close to 20 terabytes. It was a well known fact that there was an enormous amount of information on the web on a wide variety of topics. Across wikis, blogs, articles, ideas, social networks and so on, there was a lot of information on almost every conceivable topic under the sun.

Pete was given special permission by the governments of the world to run his Zengine on the internet. It was Pete’s theory that it would take the Zengine close to at least a year to process the information on the web and make any reasonable inferences from them. Accompanied by world wide publicity Zengine started its work of trying to assimilate the information on the World Wide Web. The Zengine was programmed to periodically give a status update of its progress to Pete.

A few months passed. Zengine kept giving updates on the number of sites, periodicals and blogs it had condensed into its knowledge database. After about 10 months Pete received a mail. It read “Markets will crash on March 2026. Petrol prices will sky rocket” – Zengine. Pete was surprised at the forecast. So he invoked the API to check on what basis the claim had been made. To his surprise and amazement he found that a lot of events happening in the world, which clearly seemed to point in that direction, had been used to make that claim. A couple of months down the line there was another terse statement: “Rebellion very likely in Mogadishu in Dec 2027” – Zengine. The Zengine also came up with corollaries to Fermat’s last theorem. It was becoming clear to Pete and everybody else that the Zengine was indeed becoming smarter by the day. It became apparent to everybody that Zengine would soon become more powerful than human beings.

Celestial events: Around this time peculiar events were observed all over the world. There were a lot of celestial events happening. Phenomena like the aurora borealis became commonplace. On Dec 12, 2026 there was an unusual amount of electrical activity in the sky. Everywhere there were streaks of lightning. By evening, slivers of lightning hit the earth in several parts of the world. In fact, if anybody had viewed the earth from outer space, it would have resembled a “nebula sphere” with lightning streaks racing towards the earth from all directions. This seemed to happen for many days. Simultaneously the Zengine was getting more and more powerful. In fact it had learnt to spawn off multiple processes to gather information and return it.

Time-space discontinuity: People everywhere were petrified of this strange phenomenon. On the one hand there was the fear of the takeover of the web by the Zengine and on the other was this increased celestial activity. Finally on the morning of Jan 2028 there was a powerful crack followed by a sonic boom and everywhere people had a moment of discontinuity. In the briefest of moments there was a natural time-space discontinuity and mankind had progressed to the next stage in evolution.

The unconscious, subconscious and the conscious all became a single faculty of super consciousness. It has always been known from the time of Plato that man knows everything there is to know. According to the Platonic doctrine of Recollection, human beings are born with a soul possessing all knowledge, and learning is just discovering or recollecting what the soul already knows. Similarly, according to Hindu philosophy, behind the individual consciousness of the Atman is the reality known as the Brahman, which is universal consciousness attained in a deep state of mysticism through self-inquiry.

However this evolution by some strange quirk of coincidence seemed to coincide with the development of the world’s first truly learning machine. In this super conscious state a learning machine was not something to be feared but something which could be used to benefit mankind. Just like cranes can lift and earthmovers perform tasks that are beyond our physical capacity so also a learning machine was a useful invention that could be used to harness the knowledge from mankind’s storehouse – the World Wide Web.