My book ‘Cricket analytics with cricketr and cricpy’ is now on Amazon

‘Cricket analytics with cricketr and cricpy – Analytics harmony with R and Python’ is now available on Amazon in both paperback ($21.99) and kindle ($9.99/Rs 449) versions. The book includes analysis of cricketers using both my R package ‘cricketr’ and my Python package ‘cricpy’ for all formats of the game, namely Test, ODI and T20. Both packages use data from ESPN Cricinfo Statsguru.

Pick up your copy today!

The book includes the following chapters

CONTENTS

Introduction 7
1. Cricket analytics with cricketr 9
1.1. Introducing cricketr! : An R package to analyze performances of cricketers 10
1.2. Taking cricketr for a spin – Part 1 48
1.2. cricketr digs the Ashes! 69
1.3. cricketr plays the ODIs! 97
1.4. cricketr adapts to the Twenty20 International! 139
1.5. Sixer – R package cricketr’s new Shiny avatar 168
1.6. Re-introducing cricketr! : An R package to analyze performances of cricketers 178
1.7. cricketr sizes up legendary All-rounders of yesteryear 233
1.8. cricketr flexes new muscles: The final analysis 277
1.9. The Clash of the Titans in Test and ODI cricket 300
1.10. Analyzing performances of cricketers using cricketr template 338
2. Cricket analytics with cricpy 352
2.1 Introducing cricpy:A python package to analyze performances of cricketers 353
2.2 Cricpy takes a swing at the ODIs 405
Analysis of Top 4 batsman 448
2.3 Cricpy takes guard for the Twenty20s 449
2.4 Analyzing batsmen and bowlers with cricpy template 490
9. Average runs against different opposing teams 493
3. Other cricket posts in R 500
3.1 Analyzing cricket’s batting legends – Through the mirage with R 500
3.2 Mirror, mirror … the best batsman of them all? 527
4. Appendix 541
Cricket analysis with Machine Learning using Octave 541
4.1 Informed choices through Machine Learning – Analyzing Kohli, Tendulkar and Dravid 542
4.2 Informed choices through Machine Learning-2 Pitting together Kumble, Kapil, Chandra 555
Further reading 569
Important Links 570

Also see
1. My book “Deep Learning from first principles” now on Amazon
2. Practical Machine Learning with R and Python – Part 1
3. Revisiting World Bank data analysis with WDI and gVisMotionChart
4. Natural language processing: What would Shakespeare say?
5. Optimal Cloud Computing
6. Pitching yorkpy … short of good length to IPL – Part 1
7. Computer Vision: Ramblings on derivatives, histograms and contours

To see all posts click Index of posts

Revisiting World Bank data analysis with WDI and gVisMotionChart

Note: I wrote a post about 3 years back on World Bank Data Analysis using World Development Indicators (WDI) & gVisMotionCharts, but the motion charts stopped working some time ago. I had always wanted to fix this and have finally got around to doing it. The issue was that 2 of the WDI indicators had changed. After fixing this, I was able to host the generated motion charts using github.io pages. Please make sure that you enable Flash player if you open the motion charts with Google Chrome. You may also have to enable Flash if you use Firefox, IE etc.

Please check out the 2 motion charts with World Bank data

1. World Bank Chart 1
2. World Bank Chart 2

If you are using Chrome, please enable (Allow) ‘flash player’ by clicking on the lock sign in the URL bar as shown

Introduction

Recently I was surfing the web when I came across a really cool post, New R package to access World Bank data, by Markus Gesmann, on using googleVis and motion charts with World Bank Data. The post also introduced me to Hans Rosling, professor at Sweden’s Karolinska Institute. Hans Rosling, the creator of the famous Gapminder chart, the “Health and Wealth of Nations”, displays global trends through animated charts (a must see!!!). As they say, in Hans Rosling’s hands, data dances and sings. Take a look at his TED talks, e.g. Hans Rosling: New insights on poverty. Prof Rosling developed the breakthrough software behind the visualizations at Gapminder. The free software, which can be loaded with any data, was purchased by Google in March 2007.

In this post, I recreate some of the Gapminder charts with the help of the R packages WDI and googleVis. The WDI package by Vincent Arel-Bundock provides a set of really useful functions to get at data based on the World Bank Data indicators. googleVis provides motion charts with which you can animate the data.

You can clone/download the code from Github at worldBankAnalysis which is in the form of an Rmd file.

library(WDI)
library(ggplot2)
library(googleVis)
library(plyr)

1. Get the data from 1960 to 2019 for the following

  1. Population – SP.POP.TOTL
  2. GDP in US $ – NY.GDP.MKTP.CD
  3. Life Expectancy at birth (Years) – SP.DYN.LE00.IN
  4. GDP Per capita income – NY.GDP.PCAP.PP.CD
  5. Fertility rate (Births per woman) – SP.DYN.TFRT.IN
  6. Poverty headcount ratio – SI.POV.NAHC
# World population total
population = WDI(indicator='SP.POP.TOTL', country="all",start=1960, end=2019)
# GDP in US $
gdp= WDI(indicator='NY.GDP.MKTP.CD', country="all",start=1960, end=2019)
# Life expectancy at birth (Years)
lifeExpectancy= WDI(indicator='SP.DYN.LE00.IN', country="all",start=1960, end=2019)
# GDP Per capita
income = WDI(indicator='NY.GDP.PCAP.PP.CD', country="all",start=1960, end=2019)
# Fertility rate (births per woman)
fertility = WDI(indicator='SP.DYN.TFRT.IN', country="all",start=1960, end=2019)
# Poverty head count
poverty= WDI(indicator='SI.POV.NAHC', country="all",start=1960, end=2019)

2. Rename the columns

names(population)[3]="Total population"
names(lifeExpectancy)[3]="Life Expectancy (Years)"
names(gdp)[3]="GDP (US$)"
names(income)[3]="GDP per capita income"
names(fertility)[3]="Fertility (Births per woman)"
names(poverty)[3]="Poverty headcount ratio"

3. Join the data frames

Join the individual data frames into one large, wide data frame with all the indicators for the countries
j1 <- join(population, gdp)

j2 <- join(j1,lifeExpectancy)

j3 <- join(j2,income)

j4 <- join(j3,poverty)

wbData <- join(j4,fertility)
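
The chain of joins above can also be written as a single fold over a list of data frames. The sketch below is only an illustration in base R: it uses small made-up data frames (not real World Bank values) with the key columns iso2c, country and year, and base merge() with all.x = TRUE as a stand-in for the left join that plyr's join() performs by default.

```r
# Toy stand-ins for the data frames returned by WDI(); all values are made up.
keys <- data.frame(iso2c   = c("IN", "IN", "CN", "CN"),
                   country = c("India", "India", "China", "China"),
                   year    = c(2000, 2001, 2000, 2001),
                   stringsAsFactors = FALSE)
population <- cbind(keys, population = c(1.06e9, 1.08e9, 1.26e9, 1.27e9))
gdp        <- cbind(keys, gdp = c(4.7e11, 4.9e11, 1.2e12, 1.3e12))
# One country-year is deliberately missing to show what a left join does
fertility  <- cbind(keys[1:3, ], fertility = c(3.3, 3.2, 1.6))

# Left-join every data frame on the shared key columns in one pass
join_keys <- c("iso2c", "country", "year")
wide <- Reduce(function(x, y) merge(x, y, by = join_keys, all.x = TRUE),
               list(population, gdp, fertility))
print(wide)
```

The missing fertility value shows up as NA in the wide data frame rather than dropping the row, which is the behaviour the later complete.cases() step relies on.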

4. Use WDI_data

Use WDI_data to get the list of indicators and the countries. Join the countries and regions

# This returns a list of 2 matrices
wdi_data =WDI_data
# The 1st matrix in the list is the set of all World Bank Indicators
indicators=wdi_data[[1]]
# The 2nd matrix gives the set of countries and regions
countries=wdi_data[[2]]
df = as.data.frame(countries)
aa <- df$region != "Aggregates"
# Remove the aggregates
countries_df <- df[aa,]
# Subset from the development data only those corresponding to the countries
bb = subset(wbData, country %in% countries_df$country)
cc = join(bb,countries_df)
dd = complete.cases(cc)
developmentDF = cc[dd,]
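
To make the filtering step concrete, here is a minimal sketch on made-up data. The column names country and region mirror those used above; the rows and values are invented for illustration only. It drops the rows whose region is "Aggregates", attaches the region to each remaining row, and then keeps only the complete cases.

```r
# Made-up lookup table: 2 real countries and 1 aggregate
countries_df <- data.frame(country = c("India", "China", "World"),
                           region  = c("South Asia", "East Asia & Pacific", "Aggregates"),
                           stringsAsFactors = FALSE)
# Made-up indicator data with one missing value
wb <- data.frame(country = c("India", "China", "World", "India"),
                 year    = c(2000, 2000, 2000, 2001),
                 gdp     = c(4.7e11, 1.2e12, 3.4e13, NA),
                 stringsAsFactors = FALSE)

# Keep only real countries, then attach the region to every row
real_countries <- countries_df[countries_df$region != "Aggregates", ]
wb_countries   <- wb[wb$country %in% real_countries$country, ]
wb_regions     <- merge(wb_countries, real_countries, by = "country", all.x = TRUE)

# Drop rows that have an NA in any column
wb_complete <- wb_regions[complete.cases(wb_regions), ]
print(wb_complete)
```

Note that complete.cases() removes a row if any indicator is missing, so a sparsely reported indicator can shrink the data considerably.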

5. Create and display the motion chart

# xvar, yvar and sizevar must match the column names set in step 2
gg <- gvisMotionChart(cc,
                      idvar = "country",
                      timevar = "year",
                      xvar = "GDP (US$)",
                      yvar = "Life Expectancy (Years)",
                      sizevar = "Total population",
                      colorvar = "region")
plot(gg)
cat(gg$html$chart, file="chart1.html")

Note: Unfortunately it is not possible to embed the motion chart in WordPress. It has to be hosted on a server as a web page. After exploring several possibilities, I came up with the following process to display the animated chart. The plot is saved as an HTML file using ‘cat’ as shown above. The WorldBank_chart1.html page is then hosted as a Github page (gh-page) on Github.

Here is the gvisMotionChart

Do give World Bank Motion Chart1 a spin. Here is how the Motion Chart has to be used

untitled

You can select Life Expectancy, Population, Fertility etc. by clicking the black arrows. The blue arrow shows the ‘play’ button that animates the motion chart. You can also select the countries and change the size of the circles. Do give it a try. Here are some quick analyses from playing around with the motion charts with different parameters chosen

The set of charts below are screenshots captured by running the motion chart World Bank Motion Chart1

a. Life Expectancy vs Fertility chart

This chart is used by Hans Rosling in his TED talk. The left chart shows low life expectancy and a high fertility rate for several sub-Saharan and East Asia Pacific countries in the early 1960’s. Today fertility has dropped and life expectancy has increased overall. However, the sub-Saharan countries still have a high fertility rate.

pic1

b. Population vs GDP

The chart below shows that India and China have roughly the same GDP from 1973 to 1994, with the US and Japan well ahead.

pic2

From 1998 to 2014 China really pulls away from India and Japan, as seen below

pic3

c. Per capita income vs Life Expectancy

In the 1990’s the per capita income and life expectancy (42-50 years) of the sub-Saharan countries are low. Japan and the US have a good life expectancy in the 1990’s. In 2014 the per capita income of the sub-Saharan countries is still low, though life expectancy has marginally improved.

pic4

d. Population vs Poverty headcount

pic5

In the early 1990’s China had a higher poverty head count ratio than India. By 2004 China had this all figured out and its poverty head count ratio had dropped significantly. This can also be seen in the chart below.

pop_pov3

In the chart above China shows a drastic reduction in poverty headcount ratio vs India. Strangely Zambia shows an increase in the poverty head count ratio.

6. Get the data for the 2nd set of indicators

  1. Total population – SP.POP.TOTL
  2. GDP in US$ – NY.GDP.MKTP.CD
  3. Access to electricity (% population) – EG.ELC.ACCS.ZS
  4. Electricity consumption KWh per capita – EG.USE.ELEC.KH.PC
  5. CO2 emissions – EN.ATM.CO2E.KT
  6. Basic Sanitation Access – SH.STA.BASS.ZS
# World population
population = WDI(indicator='SP.POP.TOTL', country="all",start=1960, end=2016)
# GDP in US $
gdp= WDI(indicator='NY.GDP.MKTP.CD', country="all",start=1960, end=2016)
# Access to electricity (% population)
elecAccess= WDI(indicator='EG.ELC.ACCS.ZS', country="all",start=1960, end=2016)
# Electric power consumption Kwh per capita
elecConsumption= WDI(indicator='EG.USE.ELEC.KH.PC', country="all",start=1960, end=2016)
#CO2 emissions
co2Emissions= WDI(indicator='EN.ATM.CO2E.KT', country="all",start=1960, end=2016)
# Access to basic sanitation (% population); indicator code as in the list above
sanitationAccess= WDI(indicator='SH.STA.BASS.ZS', country="all",start=1960, end=2016)

7. Rename the columns

names(population)[3]="Total population"
names(gdp)[3]="GDP US($)"
names(elecAccess)[3]="Access to Electricity (% popn)"
names(elecConsumption)[3]="Electric power consumption (KWH per capita)"
names(co2Emissions)[3]="CO2 emissions"
names(sanitationAccess)[3]="Access to sanitation(% popn)"

8. Join the individual data frames

Join the individual data frames into one large, wide data frame with all the indicators for the countries

j1 <- join(population, gdp)
j2 <- join(j1,elecAccess)
j3 <- join(j2,elecConsumption)
j4 <- join(j3,co2Emissions)
# Join onto j4 (not j3) so that the CO2 emissions column is kept
wbData1 <- join(j4,sanitationAccess)

9. Use WDI_data

Use WDI_data to get the list of indicators and the countries. Join the countries and regions

# This returns a list of 2 matrices
wdi_data =WDI_data
# The 1st matrix in the list is the set of all World Bank Indicators
indicators=wdi_data[[1]]
# The 2nd matrix gives the set of countries and regions
countries=wdi_data[[2]]
df = as.data.frame(countries)
aa <- df$region != "Aggregates"
# Remove the aggregates
countries_df <- df[aa,]
# Subset from the development data only those corresponding to the countries
ee = subset(wbData1, country %in% countries_df$country)
ff = join(ee,countries_df)
## Joining by: iso2c, country

10. Create and display the motion chart

# xvar, yvar and sizevar must match the column names set in step 7
gg1 <- gvisMotionChart(ff,
                       idvar = "country",
                       timevar = "year",
                       xvar = "GDP US($)",
                       yvar = "Access to Electricity (% popn)",
                       sizevar = "Total population",
                       colorvar = "region")
plot(gg1)
cat(gg1$html$chart, file="chart2.html")

This is World Bank Motion Chart2, which has a different set of parameters like Access to Electricity, CO2 emissions etc.

The set of charts below are screenshots of the motion chart World Bank Motion Chart 2

a. Access to Electricity vs Population

pic6

The above chart shows that in China 100% of the population has access to electricity. India has made decent progress, from 50% in 1990 to 79% in 2012. However, Pakistan seems to have done much better in providing access to electricity, moving from 59% to close to 98%.

b. Power consumption vs population

powercon

The above chart shows Power consumption vs Population. China and India have proportionally much lower consumption than Norway, the US and Canada.

c. CO2 emissions vs Population

pic7

In 1963 the CO2 emissions were fairly low and roughly comparable for all countries. The US and India have shown a steady increase while China shows a steep increase. Interestingly, the UK shows a drop in CO2 emissions.

d. Access to sanitation

san

India shows an improvement but has a long way to go, with only 40% of the population having access to sanitation. China has made much better strides, with 80% having access to sanitation in 2015. Strangely, Nigeria shows a drop in access to sanitation of almost 20% of the population.

The code is available at Github at worldBankAnalysis

Conclusion: So there you have it. I have shown screenshots for some sample parameters of the World Development Indicators. Please try playing around with World Bank Motion Chart1 & World Bank Motion Chart 2 with your own set of parameters and countries. You can also create your own motion chart from the 100s of WDI indicators available at World Bank Data indicator.

Also see
1. My book ‘Deep Learning from first principles:Second Edition’ now on Amazon
2.  Dabbling with Wiener filter using OpenCV
3. My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon
4. Design Principles of Scalable, Distributed Systems
5. Re-introducing cricketr! : An R package to analyze performances of cricketers
6. Natural language processing: What would Shakespeare say?
7. Brewing a potion with Bluemix, PostgreSQL, Node.js in the cloud
8. Simulating an Edge Shape in Android

To see all posts Index of posts

The Clash of the Titans in Test and ODI cricket

Who looks outside, dreams; who looks inside, awakes.
Show me a sane man and I will cure him for you.

            Carl Jung 

 

We’re made of star stuff. We are a way for the cosmos to know itself.
If you want to make an apple pie from scratch, you must first create the universe.

            Carl Sagan

Introduction

The biggest nag in the collective psyche of the cricketing fraternity these days is whether Virat Kohli has surpassed Sachin Tendulkar. This question has been troubling cricket lovers the world over, and particularly in India, for quite a while. It has only grown stronger with Kohli’s 41st ODI century and with Michael Vaughan bestowing the GOAT title on Virat Kohli for ODI cricket. Hence, I decided to do my bit in addressing this by analyzing Kohli’s and Tendulkar’s performances in ODI cricket. I also wanted to address who is the best among the cricketing idols of India in Test cricket, namely Sunil Gavaskar, Sachin Tendulkar and Virat Kohli. Hence this post has 2 parts

  1. Analysis of Tendulkar, Gavaskar and Kohli in Test cricket
  2. Analysis of Tendulkar and Kohli in ODIs

In this post, I analyze the performances of these titans in Test and ODI cricket using my R package cricketr. Some may feel that comparisons are not possible as these batsmen are from different eras. To some extent this is true. I would give some leeway to Gavaskar as he had to bat in a pre-helmet era. But with Tendulkar and Kohli a fair and objective comparison is possible. There were pre-eminent bowlers in the times of Tendulkar as there are now.

From the analysis below, it can be seen that Tendulkar is ahead  of everybody else in Test cricket. However it must be noted that Tendulkar’s performance deteriorated towards the end of his career. Such was not the case with Gavaskar. Kohli has some catching up to do and he still has a lot of Test cricket in him.

In ODIs Kohli can be seen to be pulling ahead of Tendulkar in several aspects.

My R package cricketr can be installed directly from CRAN and you can use it to analyze cricketers.

This package uses the statistics info available in ESPN Cricinfo Statsguru. The current version of this package supports all formats of the game including Test, ODI and Twenty20 versions.

You should be able to install the package from GitHub and use the many functions available in the package. Please be mindful of the ESPN Cricinfo Terms of Use

Take a look at my short video tutorial on my R package cricketr on Youtube – R package cricketr – A short tutorial

Do check out my interactive Shiny app implementation using the cricketr package – Sixer – R package cricketr’s new Shiny avatar

Note 1: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton cricketr template from Github (which is the R Markdown file I have used for the analysis below).

Note 2: I sprinkle the charts with my observations. Feel free to look at them more closely and come to your conclusions.

If you are passionate about cricket, and love analyzing cricket performances, then check out my racy book on cricket ‘Cricket analytics with cricketr and cricpy – Analytics harmony with R & Python’! This book discusses and shows how to use my R package ‘cricketr’ and my Python package ‘cricpy’ to analyze batsmen and bowlers in all formats of the game (Test, ODI and T20). The paperback is available on Amazon at $21.99 and  the kindle version at $9.99/Rs 449/-. A must read for any cricket lover! Check it out!!


Important note: Do check out the python avatar of cricketr, ‘cricpy’ in my post Introducing cricpy:A python package to analyze performances of cricketers

1 Load the cricketr package

# Install cricketr from CRAN into the default library if it is not already present
if (!require("cricketr")){
    install.packages("cricketr")
}
library(cricketr)

A. Test cricket – Analysis of Gavaskar, Tendulkar and Kohli

2. Get player data

tendulkar <- getPlayerData(35320,dir=".",file="tendulkar.csv",type="batting")
kohli <- getPlayerData(253802,dir=".",file="kohli.csv",type="batting")
gavaskar <- getPlayerData(28794,dir=".",file="gavaskar.csv",type="batting")

3a. Basic analyses for Tendulkar

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./tendulkar.csv","Tendulkar")
batsmanMeanStrikeRate("./tendulkar.csv","Tendulkar")
batsmanRunsRanges("./tendulkar.csv","Tendulkar")
dev.off()

3b Basic analyses for Kohli

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./kohli.csv","Kohli")
batsmanMeanStrikeRate("./kohli.csv","Kohli")
batsmanRunsRanges("./kohli.csv","Kohli")
dev.off()

3c Basic analyses for Gavaskar

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./gavaskar.csv","Gavaskar")
batsmanMeanStrikeRate("./gavaskar.csv","Gavaskar")
batsmanRunsRanges("./gavaskar.csv","Gavaskar")
dev.off()

4a. More analyses for Tendulkar

It can be seen that Tendulkar and Gavaskar have been bowled more often than Kohli. Also, Kohli does not have as many sixes in Test cricket as Tendulkar and Gavaskar

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./tendulkar.csv","Tendulkar")
batsman6s("./tendulkar.csv","Tendulkar")
batsmanDismissals("./tendulkar.csv","Tendulkar")
dev.off()

4b. More analyses for Kohli

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./kohli.csv","Kohli")
batsman6s("./kohli.csv","Kohli")
batsmanDismissals("./kohli.csv","Kohli")
dev.off()

4c More analyses for Gavaskar

par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./gavaskar.csv","Gavaskar")
batsman6s("./gavaskar.csv","Gavaskar")
batsmanDismissals("./gavaskar.csv","Gavaskar")
dev.off()

5 Performance of batsmen on different grounds

par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./tendulkar.csv","Tendulkar")
batsmanAvgRunsGround("./kohli.csv","Kohli")
batsmanAvgRunsGround("./gavaskar.csv","Gavaskar")


#dev.off()

6. Performance of batsmen against different Opposition

  1. Tendulkar averages 50 against the following countries – Australia, Bangladesh, England, Sri Lanka, West Indies and Zimbabwe
  2. Kohli averages almost 50 against all the nations he has played – Australia, Bangladesh, England, New Zealand, Sri Lanka and West Indies
  3. Gavaskar averages 50 against Australia, Pakistan, West Indies and Sri Lanka
par(mar=c(4,4,2,2))
batsmanAvgRunsOpposition("./tendulkar.csv","Tendulkar")
batsmanAvgRunsOpposition("./kohli.csv","Kohli")
batsmanAvgRunsOpposition("./gavaskar.csv","Gavaskar")

7. Get player data special

This is required for the next 2 function calls

tendulkarsp <- getPlayerDataSp(35320,tdir=".",tfile="tendulkarsp.csv",ttype="batting")
kohlisp <- getPlayerDataSp(253802,tdir=".",tfile="kohlisp.csv",ttype="batting")
gavaskarsp <- getPlayerDataSp(28794,tdir=".",tfile="gavaskarsp.csv",ttype="batting")

#dev.off()

8 Get contribution of batsmen in matches won and lost

Kohli has made an equal contribution in won and lost matches. Tendulkar’s runs seem not to have helped as much in winning, as only 50% of the matches he has played have been won

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))

batsmanContributionWonLost("tendulkarsp.csv","Tendulkar")
batsmanContributionWonLost("./kohlisp.csv","Kohli")
batsmanContributionWonLost("./gavaskarsp.csv","Gavaskar")
  


9 Performance of batsmen at home and overseas

The boxplots show that Kohli performs better overseas than at home. His 3rd quartile is higher overseas, though the median seems to be lower. For Tendulkar the performance is similar both ways. Gavaskar’s median runs scored overseas is higher.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))


batsmanPerfHomeAway("tendulkarsp.csv","Tendulkar")
batsmanPerfHomeAway("./kohlisp.csv","Kohli")
batsmanPerfHomeAway("./gavaskarsp.csv","Gavaskar")

10. Moving average of runs

Gavaskar’s moving average was very good at the time of his retirement. Kohli seems to be going very strong. Tendulkar’s performance shows signs of deterioration around the time of his retirement.

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))

batsmanMovingAverage("./tendulkar.csv","Tendulkar")
batsmanMovingAverage("./kohli.csv","Kohli")
batsmanMovingAverage("./gavaskar.csv","Gavaskar")

#dev.off()

11 Boxplot and histogram of runs

Kohli has a marginally higher average (50.69) than Tendulkar (48.65), while Gavaskar averages 46. The median runs are the same for Tendulkar and Kohli at 32

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfBoxHist("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfBoxHist("./kohli.csv","Kohli")
batsmanPerfBoxHist("./gavaskar.csv","Gavaskar")

12 Cumulative average Runs for batsmen

Looking at the cumulative average runs we can see a gradual drop in the cumulative average for Tendulkar while Kohli and Gavaskar’s performance seems to be getting better

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("./tendulkar.csv","Tendulkar")
batsmanCumulativeAverageRuns("./kohli.csv","Kohli")
batsmanCumulativeAverageRuns("./gavaskar.csv","Gavaskar")
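
As a rough illustration of what a cumulative average measures, the sketch below computes a running mean of runs over innings in base R. The runs vector is made up, and the calculation is a generic running mean, not necessarily the exact computation cricketr performs (for instance, it ignores not-outs).

```r
# Made-up runs scored in 5 consecutive innings
runs <- c(10, 50, 0, 100, 40)

# Running mean: total runs so far divided by the number of innings so far
cum_avg <- cumsum(runs) / seq_along(runs)
print(cum_avg)
```

A falling curve towards the right therefore means that recent innings are below the career average up to that point, which is how the plots above should be read.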

13. Cumulative average strike rate of batsmen

Tendulkar’s strike rate is better than Kohli’s and Gavaskar’s

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("./tendulkar.csv","Tendulkar")
batsmanCumulativeStrikeRate("./kohli.csv","Kohli")
batsmanCumulativeStrikeRate("./gavaskar.csv","Gavaskar")

14 Performance forecast of batsmen

The forecasted performance for Kohli and Gavaskar is higher than that of Tendulkar

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkar.csv","Sachin Tendulkar")
batsmanPerfForecast("./kohli.csv","Kohli")
batsmanPerfForecast("./gavaskar.csv","Gavaskar")

#dev.off()

15. Relative strike rate of batsmen

par(mar=c(4,4,2,2))

frames <- list("./tendulkar.csv","./kohli.csv","gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeBatsmanSR(frames,names)
#dev.off()

16. Relative Runs frequency of batsmen

par(mar=c(4,4,2,2))
frames <- list("./tendulkar.csv","./kohli.csv","gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeRunsFreqPerf(frames,names)
#dev.off()

17. Relative cumulative average runs of batsmen

Tendulkar leads the way here, but it can be seen that Kohli is catching up.

par(mar=c(4,4,2,2))
frames <- list("./tendulkar.csv","./kohli.csv","gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeBatsmanCumulativeAvgRuns(frames,names)
#dev.off()

18. Relative cumulative average strike rate

Tendulkar has better strike rate than the other two.

par(mar=c(4,4,2,2))
frames <- list("./tendulkar.csv","./kohli.csv","gavaskar.csv")
names <- list("Tendulkar","Kohli","Gavaskar")
relativeBatsmanCumulativeStrikeRate(frames,names)
#dev.off()

19. Check batsman in form

As with the moving average, performance forecast and cumulative average runs, Kohli and Gavaskar are in-form while Tendulkar was out-of-form towards the end.

checkBatsmanInForm("./tendulkar.csv","Sachin Tendulkar")
## [1] "**************************** Form status of Sachin Tendulkar ****************************
\n\n Population size: 294  Mean of population: 50.48 \n Sample size: 33  Mean of sample: 32.42 SD of 
sample: 29.8 \n\n Null hypothesis H0 : Sachin Tendulkar 's sample average is within 95% confidence interval 
of population average\n Alternative hypothesis Ha : Sachin Tendulkar 's sample average is below 
the 95% confidence interval of population average\n\n 
Sachin Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713  is less than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("./kohli.csv","Kohli")
## [1] "**************************** Form status of Kohli ****************************\n\n Population size: 117
  Mean of population: 50.35 \n Sample size: 13  Mean of sample: 53.77 SD of sample: 46.15 \n\n Null 
hypothesis H0 : Kohli 's sample average is within 95% confidence interval of population average\n 
Alternative hypothesis Ha : Kohli 's sample average is below the 95% confidence interval of population
 average\n\n Kohli 's Form Status: In-Form because the p value: 0.603244  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("./gavaskar.csv","Gavaskar")
## [1] "**************************** Form status of Gavaskar ****************************\n\n 
Population size: 125  Mean of population: 44.67 \n Sample size: 14  Mean of sample: 57.86 SD of sample:
 58.55 \n\n Null hypothesis H0 : Gavaskar 's sample average is within 95% confidence interval of population
 average\n Alternative hypothesis Ha : Gavaskar 's sample average is below the 95% confidence interval of 
population average\n\n Gavaskar 's Form Status: In-Form because the p value: 0.793276  is greater 
than alpha=  0.05 \n *******************************************************************************************\n\n"
#dev.off()
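
The hypotheses printed above describe a one-sided test of a recent sample mean against the career (population) mean. The sketch below is an illustration rather than cricketr's own code: form_p_value is a hypothetical helper that computes a lower-tailed, one-sample t-test directly from the summary statistics printed in the output above. The p-values it gives are close to the reported ones, though small differences are expected because the printed inputs are rounded.

```r
# Hypothetical helper: lower-tailed one-sample t-test from summary statistics
form_p_value <- function(pop_mean, sample_mean, sample_sd, n) {
  t_stat <- (sample_mean - pop_mean) / (sample_sd / sqrt(n))
  pt(t_stat, df = n - 1)   # P(T <= t_stat) under H0
}

# Inputs copied from the checkBatsmanInForm output above
p_tendulkar <- form_p_value(50.48, 32.42, 29.80, 33)
p_kohli     <- form_p_value(50.35, 53.77, 46.15, 13)
p_gavaskar  <- form_p_value(44.67, 57.86, 58.55, 14)

alpha <- 0.05
status <- ifelse(c(p_tendulkar, p_kohli, p_gavaskar) < alpha, "Out-of-Form", "In-Form")
print(data.frame(batsman = c("Tendulkar", "Kohli", "Gavaskar"),
                 p_value = round(c(p_tendulkar, p_kohli, p_gavaskar), 4),
                 status  = status))
```

A small p-value means the recent sample mean is too far below the career mean to be explained by chance at the 5% level, hence the Out-of-Form label.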

20. Performance 3D

A 3D regression plane is fitted between the Balls faced, Minutes at crease and Runs scored

par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
battingPerf3d("./tendulkar.csv","Sachin Tendulkar")
battingPerf3d("./kohli.csv","Kohli")
battingPerf3d("./gavaskar.csv","Gavaskar")
#dev.off()

20. Runs likelihood

This function uses K-Means clustering to determine the runs the batsmen are likely to score.

par(mar=c(4,4,2,2))
batsmanRunsLikelihood("./tendulkar.csv","Tendulkar")
## Summary of  Tendulkar 's runs scoring likelihood
## **************************************************
## 
## There is a 16.51 % likelihood that Tendulkar  will make  139 Runs in  251 balls over 353  Minutes 
## There is a 25.08 % likelihood that Tendulkar  will make  66 Runs in  122 balls over  167  Minutes 
## There is a 58.41 % likelihood that Tendulkar  will make  16 Runs in  31 balls over 44  Minutes
batsmanRunsLikelihood("./kohli.csv","Kohli")
## Summary of  Kohli 's runs scoring likelihood
## **************************************************
## 
## There is a 20 % likelihood that Kohli  will make  143 Runs in  232 balls over 330  Minutes 
## There is a 33.85 % likelihood that Kohli  will make  51 Runs in  92 balls over  127  Minutes 
## There is a 46.15 % likelihood that Kohli  will make  11 Runs in  24 balls over 31  Minutes
batsmanRunsLikelihood("./gavaskar.csv","Gavaskar")
## Summary of  Gavaskar 's runs scoring likelihood
## **************************************************
## 
## There is a 33.81 % likelihood that Gavaskar  will make  69 Runs in  159 balls over 214  Minutes 
## There is a 8.63 % likelihood that Gavaskar  will make  172 Runs in  364 balls over  506  Minutes 
## There is a 57.55 % likelihood that Gavaskar  will make  13 Runs in  35 balls over 48  Minutes
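
To show how a summary like the one above can come out of K-Means, here is a minimal sketch with base R's kmeans() on simulated innings. The data are randomly generated, and reading each cluster's share of innings as a "likelihood" is an assumption made for illustration; cricketr's internals may differ.

```r
set.seed(42)
# Simulated innings (not real data): many short, some medium, a few long
innings <- data.frame(
  BF   = c(rnorm(30, 30, 8),  rnorm(15, 120, 15), rnorm(8, 250, 20)),
  Mins = c(rnorm(30, 45, 10), rnorm(15, 170, 20), rnorm(8, 350, 30)),
  Runs = c(rnorm(30, 15, 5),  rnorm(15, 65, 10),  rnorm(8, 140, 15))
)

# Cluster the innings into 3 groups on balls faced, minutes and runs
km <- kmeans(innings, centers = 3, nstart = 10)

# Share of innings in each cluster, read here as the likelihood of that cluster
likelihood <- round(100 * km$size / nrow(innings), 2)
clusters <- data.frame(round(km$centers), likelihood)
clusters <- clusters[order(-clusters$likelihood), ]
print(clusters)
```

Each row gives a cluster centre (balls faced, minutes, runs) and the percentage of innings that fall in it, which matches the shape of the summaries printed above.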

21. Predict runs for a random combination of Balls faced and Minutes at crease

BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
tendulkar <- batsmanRunsPredict("./tendulkar.csv","Tendulkar",newdataframe=newDF)
kohli <- batsmanRunsPredict("./kohli.csv","Kohli",newdataframe=newDF)
gavaskar <- batsmanRunsPredict("./gavaskar.csv","Gavaskar",newdataframe=newDF)
batsmen <-cbind(round(tendulkar$Runs),round(kohli$Runs),round(gavaskar$Runs))
colnames(batsmen) <- c("Tendulkar","Kohli","Gavaskar")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Tendulkar Kohli Gavaskar
## 1          10           30         7     6        4
## 2          38           71        23    24       17
## 3          66          111        39    42       30
## 4          94          152        54    60       43
## 5         121          193        70    78       56
## 6         149          234        86    96       69
## 7         177          274       102   114       82
## 8         205          315       118   132       95
## 9         233          356       134   150      108
## 10        261          396       150   168      121
## 11        289          437       165   186      134
## 12        316          478       181   204      147
## 13        344          519       197   222      160
## 14        372          559       213   240      173
## 15        400          600       229   258      186
#dev.off()
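
The prediction table above is what one would expect from a linear model of Runs on Balls faced and Minutes at crease, i.e. the regression plane of the Performance 3D section. The sketch below fits such a plane with lm() on simulated innings and predicts on the same grid of BF and Mins values. The data and coefficients are made up, and this is an assumption about the general approach rather than a copy of batsmanRunsPredict.

```r
set.seed(1)
# Simulated innings (not real data)
BF   <- round(runif(60, 5, 300))
Mins <- round(BF * 1.4 + rnorm(60, 0, 10))
Runs <- round(0.55 * BF + 0.02 * Mins + rnorm(60, 0, 8))
innings <- data.frame(BF, Mins, Runs)

# Fit the regression plane Runs ~ BF + Mins
fit <- lm(Runs ~ BF + Mins, data = innings)

# Predict runs for the same grid of balls faced and minutes used above
newDF <- data.frame(BF = seq(10, 400, length = 15),
                    Mins = seq(30, 600, length = 15))
predicted <- predict(fit, newdata = newDF)
print(cbind(round(newDF), Runs = round(predicted)))
```

Because the model is a plane, predictions for long innings (such as 400 balls) are linear extrapolations and should be read as indicative only.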

Key findings

  1. Kohli has a marginally higher average than Tendulkar.
  2. Tendulkar has the best strike rate of the 3.
  3. The cumulative average runs and the performance forecast for Kohli and Gavaskar show an improving trend, while Tendulkar’s numbers deteriorate towards the end of his career.
  4. Kohli is fast catching up with Tendulkar on cumulative average runs vs innings in career.

B. ODI cricket – Analysis of Tendulkar and Kohli

The functions below get the ODI data for Tendulkar and Kohli as CSV files so that the analyses can be done

22 Get player data for ODIs

tendulkarOD <- getPlayerDataOD(35320,dir=".",file="tendulkarOD.csv",type="batting")
kohliOD <- getPlayerDataOD(253802,dir=".",file="kohliOD.csv",type="batting")

#dev.off()

23a Basic performance of Tendulkar in ODI

par(mfrow=c(3,2))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./tendulkarOD.csv","Tendulkar")
batsmanRunsRanges("./tendulkarOD.csv","Tendulkar")
batsman4s("./tendulkarOD.csv","Tendulkar")
batsman6s("./tendulkarOD.csv","Tendulkar")
batsmanScoringRateODTT("./tendulkarOD.csv","Tendulkar")
#dev.off()

23b. Basic performance of Kohli in ODI

par(mfrow=c(3,2))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("./kohliOD.csv","Kohli")
batsmanRunsRanges("./kohliOD.csv","Kohli")
batsman4s("./kohliOD.csv","Kohli")
batsman6s("./kohliOD.csv","Kohli")
batsmanScoringRateODTT("./kohliOD.csv","Kohli")
#dev.off()

24. Performance forecast in ODIs

Kohli’s forecasted runs are much higher than Tendulkar’s in ODIs

par(mar=c(4,4,2,2))
batsmanPerfForecast("./tendulkarOD.csv","Tendulkar")
batsmanPerfForecast("./kohliOD.csv","Kohli")

25. Batting performance

A 3D regression plane is fitted between Balls faced, Minutes at crease and Runs scored.

par(mar=c(4,4,2,2))
battingPerf3d("./tendulkarOD.csv","Tendulkar")
battingPerf3d("./kohliOD.csv","Kohli")

26. Predicting runs scored for the ODI batsmen

Kohli is predicted to score more runs than Tendulkar for the same minutes at crease and balls faced.

BF <- seq( 10, 200,length=10)
Mins <- seq(30,220,length=10)
newDF <- data.frame(BF,Mins)
tendulkarDF <- batsmanRunsPredict("./tendulkarOD.csv","Tendulkar",newdataframe=newDF)
kohliDF <- batsmanRunsPredict("./kohliOD.csv","Kohli",newdataframe=newDF)
batsmen <-cbind(round(tendulkarDF$Runs),round(kohliDF$Runs))
colnames(batsmen) <- c("Tendulkar","Kohli")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Tendulkar Kohli
## 1          10           30         7     8
## 2          31           51        26    28
## 3          52           72        45    48
## 4          73           93        64    68
## 5          94          114        83    88
## 6         116          136       102   108
## 7         137          157       121   128
## 8         158          178       140   149
## 9         179          199       159   169
## 10        200          220       178   189
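
The plane fit of section 25 and the prediction grid of section 26 can be sketched in plain Python. This is only an illustration of an ordinary least-squares plane (runs = c0 + c1*BF + c2*Mins); it is not the cricketr implementation, the innings data are made up, and the helper names `solve3`, `fit_plane` and `predict` are hypothetical.

```python
# Hypothetical innings: (balls faced, minutes at crease, runs scored).
innings = [
    (10, 15, 4), (25, 40, 18), (40, 55, 31), (60, 95, 52),
    (80, 110, 66), (95, 150, 90), (120, 170, 104), (150, 230, 141),
]

def solve3(a, b):
    """Solve the 3x3 linear system a @ x = b by Gaussian elimination."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - s) / m[r][r]
    return x

def fit_plane(rows):
    """Least-squares fit of runs = c0 + c1*BF + c2*Mins via the normal equations."""
    design = [(1.0, bf, mins) for bf, mins, _ in rows]
    runs = [r for _, _, r in rows]
    ata = [[sum(d[i] * d[j] for d in design) for j in range(3)] for i in range(3)]
    atb = [sum(d[i] * y for d, y in zip(design, runs)) for i in range(3)]
    return solve3(ata, atb)

def predict(coef, bf, mins):
    """Evaluate the fitted plane at one (balls faced, minutes) point."""
    return coef[0] + coef[1] * bf + coef[2] * mins

coef = fit_plane(innings)
# Same grid as the R code: 10 evenly spaced points for BF (10..200) and Mins (30..220).
grid = [(10 + i * 190 / 9, 30 + i * 190 / 9) for i in range(10)]
for bf, mins in grid:
    print(round(bf), round(mins), round(predict(coef, bf, mins)))
```

The grid reproduces the BallsFaced and MinsAtCrease columns of the table above; the predicted runs depend entirely on the made-up innings.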

27. Runs likelihood for the ODI batsmen

Tendulkar has clusters around 13, 53 and 111 runs while Kohli has clusters around 13, 63 and 116. So it is more likely that Kohli will tend to score higher

par(mar=c(4,4,2,2))
batsmanRunsLikelihood("./tendulkarOD.csv","Tendulkar")
## Summary of  Tendulkar 's runs scoring likelihood
## **************************************************
## 
## There is a 18.09 % likelihood that Tendulkar  will make  111 Runs in  118 balls over 172  Minutes 
## There is a 28.39 % likelihood that Tendulkar  will make  53 Runs in  63 balls over  95  Minutes 
## There is a 53.52 % likelihood that Tendulkar  will make  13 Runs in  18 balls over 27  Minutes
batsmanRunsLikelihood("./kohliOD.csv","Kohli")
## Summary of  Kohli 's runs scoring likelihood
## **************************************************
## 
## There is a 31.41 % likelihood that Kohli  will make  63 Runs in  69 balls over 97  Minutes 
## There is a 49.74 % likelihood that Kohli  will make  13 Runs in  18 balls over  24  Minutes 
## There is a 18.85 % likelihood that Kohli  will make  116 Runs in  113 balls over 163  Minutes
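
The percentages above read like the share of innings that fall into each of three clusters. A minimal sketch of that idea follows, assuming a k-means style grouping on runs alone; the real function also reports balls and minutes, so this is a simplification. The runs data are hypothetical and `kmeans_1d` is a name introduced here for illustration.

```python
def kmeans_1d(values, centers, iterations=20):
    """Plain Lloyd's algorithm in one dimension; returns (centers, labels)."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign every value to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: abs(v - centers[k]))
                  for v in values]
        # Move each center to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:  # keep the old center if a cluster empties out
                centers[k] = sum(members) / len(members)
    return centers, labels

# Hypothetical runs per innings, not real Statsguru data.
runs = [2, 5, 8, 11, 14, 17, 20, 45, 52, 58, 63, 70, 105, 112, 124]
centers, labels = kmeans_1d(runs, centers=[10, 50, 100])

shares = [100 * labels.count(k) / len(runs) for k in range(len(centers))]
for center, share in zip(centers, shares):
    print(f"There is a {share:.2f} % likelihood of about {center:.0f} runs")
```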

28. Runs in different venues for the ODI batsmen

par(mar=c(4,4,2,2))
batsmanAvgRunsGround("./tendulkarOD.csv","Tendulkar")
batsmanAvgRunsGround("./kohliOD.csv","Kohli")

28. Runs against different opposition for the ODI batsmen

Tendulkar has a 50+ average against Bermuda, Kenya and Namibia, while Kohli has a 50+ average against New Zealand, West Indies, South Africa, Zimbabwe and Bangladesh

par(mar=c(4,4,2,2))
batsmanAvgRunsOpposition("./tendulkarOD.csv","Tendulkar")
batsmanAvgRunsOpposition("./kohliOD.csv","Kohli")

29. Moving average of runs for the ODI batsmen

Tendulkar’s moving average shows an improvement (50+) towards the end of his career, but Kohli shows a marked increase to 60+ currently

par(mar=c(4,4,2,2))
batsmanMovingAverage("./tendulkarOD.csv","Tendulkar")
batsmanMovingAverage("./kohliOD.csv","Kohli")
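
For readers who want to see what a moving average of runs means in code, here is a minimal sketch using a trailing window. The smoothing that cricketr applies may well differ; the window size and the scores are hypothetical.

```python
from collections import deque

def moving_average(runs, window):
    """Trailing simple moving average; one value per full window."""
    buf = deque(maxlen=window)
    total = 0
    out = []
    for r in runs:
        if len(buf) == window:
            total -= buf[0]      # the oldest score is about to drop out
        buf.append(r)
        total += r
        if len(buf) == window:
            out.append(total / window)
    return out

# Hypothetical scores in chronological order.
scores = [12, 48, 0, 101, 63, 7, 85, 40]
print(moving_average(scores, window=3))
```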

30. Cumulative average runs of ODI batsmen

Tendulkar plateaus at 40+ while Kohli’s cumulative average runs goes up and up!!!

par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("./tendulkarOD.csv","Tendulkar")
batsmanCumulativeAverageRuns("./kohliOD.csv","Kohli")

31. Cumulative strike rate of ODI batsmen

par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("./tendulkarOD.csv","Tendulkar")
batsmanCumulativeStrikeRate("./kohliOD.csv","Kohli")
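
The two cumulative measures in sections 30 and 31 can be sketched as running totals. The definitions here are one reasonable reading (cumulative runs per innings, and 100 times cumulative runs over cumulative balls); cricketr may, for instance, average the per-innings strike rates instead. The innings are hypothetical.

```python
from itertools import accumulate

# Hypothetical (runs, balls faced) per innings, in chronological order.
innings = [(12, 20), (48, 50), (0, 3), (101, 95), (63, 60)]

cum_runs = list(accumulate(r for r, _ in innings))
cum_balls = list(accumulate(b for _, b in innings))

# Average runs per innings after each innings.
cum_avg = [total / n for n, total in enumerate(cum_runs, start=1)]
# Strike rate over the career so far after each innings.
cum_sr = [100 * r / b for r, b in zip(cum_runs, cum_balls)]

for n, (avg, sr) in enumerate(zip(cum_avg, cum_sr), start=1):
    print(n, round(avg, 2), round(sr, 2))
```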

32. Relative batsmen strike rate

par(mar=c(4,4,2,2))

frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeBatsmanSRODTT(frames,names)
#dev.off()

33. Relative Run Frequency percentages

par(mar=c(4,4,2,2))

frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeRunsFreqPerfODTT(frames,names)
#dev.off()

34. Relative cumulative average runs of ODI batsmen

Kohli breaks away from Tendulkar in cumulative average runs after 100 innings

par(mar=c(4,4,2,2))

frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeBatsmanCumulativeAvgRuns(frames,names)
#dev.off()

35. Relative cumulative strike rate of ODI batsmen

This seems to be a tussle, with Kohli having an edge till about 40 innings and Tendulkar leading from 40+ to 180 innings. Kohli just seems to be edging forward.

par(mar=c(4,4,2,2))

frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
relativeBatsmanCumulativeStrikeRate(frames,names)
#dev.off()

36. Batsmen 4s and 6s

par(mar=c(4,4,2,2))

frames <- list("./tendulkarOD.csv","./kohliOD.csv")
names <- list("Tendulkar","Kohli")
batsman4s6s(frames,names)
##                Tendulkar Kohli
## Runs(1s,2s,3s)     66.29 69.67
## 4s                 29.65 25.90
## 6s                  4.06  4.43
#dev.off()

37. Check ODI batsmen form

par(mar=c(4,4,2,2))

checkBatsmanInForm("./tendulkar.csv","Tendulkar")
## [1] "**************************** Form status of Tendulkar ********
********************\n\n Population size: 294  Mean of population: 50.48 \n
 Sample size: 33  Mean of sample: 32.42 SD of sample: 29.8 \n\n 
Null hypothesis H0 : Tendulkar 's sample average is within 95% confidence
 interval of population average\n Alternative hypothesis 
Ha : Tendulkar 's sample average is below the 95% confidence interval 
of population average\n\n Tendulkar 's Form Status: Out-of-Form because the p value: 0.000713  is less than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("./kohli.csv","Kohli")
## [1] "**************************** Form status of Kohli ***********
*****************\n\n Population size: 117  Mean of population: 50.35 \n
 Sample size: 13  Mean of sample: 53.77 SD of sample: 46.15 \n\n 
Null hypothesis H0 : Kohli 's sample average is within 95% confidence 
interval of population average\n Alternative hypothesis 
Ha : Kohli 's sample average is below the 95% confidence interval 
of population average\n\n Kohli 's Form Status: In-Form because 
the p value: 0.603244  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
#dev.off()
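
The summary statistics printed above look consistent with a one-sided t-test of the recent sample mean against the career mean. That is an inference from the output, not something stated by the package. Under that assumption, a self-contained sketch is shown below; the function names are hypothetical and the t distribution is integrated numerically to avoid external libraries.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) using Simpson's rule on [0, |t|] and the symmetry of the density."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def form_p_value(pop_mean, sample_mean, sample_sd, n):
    """One-sided p-value for Ha: the sample mean is below the population mean."""
    t_stat = (sample_mean - pop_mean) / (sample_sd / math.sqrt(n))
    return t_cdf(t_stat, n - 1)

# Summary statistics taken from the output above.
for name, stats in [("Tendulkar", (50.48, 32.42, 29.8, 33)),
                    ("Kohli", (50.35, 53.77, 46.15, 13))]:
    p = form_p_value(*stats)
    print(name, "Out-of-Form" if p < 0.05 else "In-Form", round(p, 6))
```

Because the inputs are the rounded values from the printed summary, the p-values will only approximate the ones reported above.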

Key Findings

  1. Kohli has a better performance against oppositions like West Indies, South Africa and New Zealand
  2. Kohli breaks away from Tendulkar in cumulative average runs
  3. Tendulkar has been leading the strike rate but Kohli in recent times seems to be breaking loose.

Check out some other players with my R package cricketr

Important note: Do check out my other posts using cricketr at cricketr-posts

Also see

  1. My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon
  2. A primer on Qubits, Quantum gates and Quantum Operations
  3. De-blurring revisited with Wiener filter using OpenCV
  4. Deep Learning from first principles in Python, R and Octave – Part 4
  5. The Many Faces of Latency
  6. Fun simulation of a Chain in Android
  7. Presentation on Wireless Technologies – Part 1
  8. yorkr crashes the IPL party ! – Part 1

To see all posts click Index of posts

Analyzing T20 matches with yorkpy templates

1. Introduction

In this post I create yorkpy templates for end-to-end analysis of any T20 matches that are available on Cricsheet in yaml format. These templates can be used to analyze Intl. T20, IPL, BBL and Natwest T20. In fact they can be used for any T20 games which have been saved in the yaml format specified by Cricsheet.

Note: yorkpy is the clone of my R package yorkr, see yorkr pads up for the Twenty20s: Part 1- Analyzing team’s match performance

With these templates you can convert all T20 match data which is in yaml format to Pandas dataframes and save them as CSV. Note: The data for Intl T20, IPL, BBL and Natwest T20 have already been converted and are available at allYorkpyData. These templates are also available on Github at yorkpyTemplate. The template includes the following steps

  1. Template for conversion and setup
  2. Analysis of Any T20 match
  3. Analysis of a T20 team in all matches against another T20 team
  4. Analysis of a T20 team in all matches against all other teams
  5. Analysis of T20 batsmen and bowlers

You can recreate the files as more matches are added to the Cricsheet site in IPL 2017 and future seasons. This post contains all the steps needed for detailed analysis of IPL matches, teams and IPL players. This will also be my reference if I decide to analyze IPL in future!

Install yorkpy with pip install yorkpy

Data conversion of the yaml files has to be done before any analysis of T20 batsmen and bowlers, any T20 match, matches between any 2 T20 teams, or a team's performance against all other teams can be done

The first step is to convert the YAML files for the different T20 leagues, namely Intl. T20, IPL, BBL and Natwest T20, which are available in yaml format on Cricsheet. For the initial data setup we need to use slightly different functions for each of the T20 leagues since the teams are different. The function to convert yaml to Pandas dataframe and save as CSV is common for all leagues
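
To give a feel for what the conversion step does, here is a rough sketch that flattens a nested ball-by-ball structure into one row per delivery and writes CSV text. It is not yorkpy's code. The nested dictionary only imitates the general shape a YAML parser might return, the real Cricsheet files carry many more fields, and the names `flatten` and `to_csv`, the column names, teams and players are all hypothetical.

```python
import csv
import io

# Hypothetical, heavily trimmed match as a YAML parser might return it.
match = {
    "innings": [
        {"1st innings": {"team": "Team A", "deliveries": [
            {0.1: {"batsman": "Batter A", "bowler": "Bowler X",
                   "runs": {"batsman": 4, "extras": 0, "total": 4}}},
            {0.2: {"batsman": "Batter A", "bowler": "Bowler X",
                   "runs": {"batsman": 0, "extras": 1, "total": 1}}},
        ]}},
        {"2nd innings": {"team": "Team B", "deliveries": [
            {0.1: {"batsman": "Batter C", "bowler": "Bowler Y",
                   "runs": {"batsman": 6, "extras": 0, "total": 6}}},
        ]}},
    ],
}

def flatten(match):
    """Turn the nested match into one flat dict per delivery."""
    rows = []
    for innings in match["innings"]:
        for innings_name, body in innings.items():
            for delivery in body["deliveries"]:
                for ball, d in delivery.items():
                    rows.append({
                        "innings": innings_name,
                        "team": body["team"],
                        "delivery": ball,
                        "batsman": d["batsman"],
                        "bowler": d["bowler"],
                        "runs": d["runs"]["batsman"],
                        "extras": d["runs"]["extras"],
                        "totalRuns": d["runs"]["total"],
                    })
    return rows

def to_csv(rows):
    """Serialise the flat rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = flatten(match)
csv_text = to_csv(rows)
print(csv_text)
```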

A. For International T20

import yorkpy.analytics as yka
# Convert yaml to pandas and save as CSV
#yka.convertAllYaml2PandasDataframesT20(".", "..\\data1")

# Save all matches between any 2 Intl T20 countries
#yka.saveAllMatchesBetween2IntlT20s(dir1)

#Save all matches between an Intl.T20 country and all other countries
#yka.saveAllMatchesAllOppositionIntlT20(dir1)

# Get batting details for a country
#yka.getTeamBattingDetails(<country>,dir=dir1, save=True)

#Get bowling details
#yka.getTeamBowlingDetails(<country>,dir=dir1, save=True)

B. For Indian Premier League (IPL)

import yorkpy.analytics as yka
# Convert yaml to pandas and save as CSV
#yka.convertAllYaml2PandasDataframesT20(".", "..\\data1")

# Save all matches between any 2 IPL teams
#yka.saveAllMatchesBetween2IPLTeams(dir1)

#Save all matches between an IPL team and all other teams
#yka.saveAllMatchesAllOppositionIPLT20(dir1)

# Get batting details for an IPL team
#yka.getTeamBattingDetails(<team1>,dir=dir1, save=True)

#Get bowling details for an IPL team
#yka.getTeamBowlingDetails(<team1>,dir=dir1, save=True)

C. For Big Bash League (BBL)

import yorkpy.analytics as yka
# Convert yaml to pandas and save as CSV
#yka.convertAllYaml2PandasDataframesT20(".", "..\\data1")

# Save all matches between any 2 BBL teams
#yka.saveAllMatchesBetween2BBLTeams(dir1)

#Save all matches between an BBL team and all other teams
#yka.saveAllMatchesAllOppositionBBLT20(dir1)

# Get batting details for an BBL team
#yka.getTeamBattingDetails(<team1>,dir=dir1, save=True)

#Get bowling details for an BBL team
#yka.getTeamBowlingDetails(<team1>,dir=dir1, save=True)

D. For Natwest T20

import yorkpy.analytics as yka
# Convert yaml to pandas and save as CSV
#yka.convertAllYaml2PandasDataframesT20(".", "..\\data1")

# Save all matches between any 2 NWB teams
#yka.saveAllMatchesBetween2NWBTeams(dir1)

#Save all matches between an NWB team and all other teams
#yka.saveAllMatchesAllOppositionNWBT20(dir1)

# Get batting details for an NWB team
#yka.getTeamBattingDetails(<team1>,dir=dir1, save=True)

#Get bowling details for an NWB team
#yka.getTeamBowlingDetails(<team1>,dir=dir1, save=True)

Once the conversion has been done and the data has been set up, we can use any of the yorkpy functions for the 4 leagues (Intl. T20, IPL, BBL or Natwest T20). There are four classes of functions, and they can be used for any of these leagues

  1. Class 1 – Functions that analyze a single T20 match
  2. Class 2 – Functions that analyze the performance of a T20 team in all matches against another T20 team
  3. Class 3 – Functions that analyze the performance of a T20 team against all other teams
  4. Class 4 – Functions that analyze individual T20 batsmen or bowler

2. Class 1 functions

These functions analyze a single T20 match (Intl T20, BBL, IPL or Natwest T20). To see actual usage of Class 1 functions, see Pitching yorkpy … short of good length to IPL – Part 1

import pandas as pd
import yorkpy.analytics as yka
# Get scorecard
#match=pd.read_csv("<match.csv>")
#scorecard,extras=yka.teamBattingScorecardMatch(match,<team1>)

#Get partnership
#match=pd.read_csv("<match.csv>")
#yka.teamBatsmenPartnershipMatch(match,<team1>,<team2>,plot=True/False)

#Batsmen vs bowler
#match=pd.read_csv("<match.csv>")
#yka.teamBatsmenVsBowlersMatch(match,<team1>,<team2>,plot=True/False)

#Bowling scorecard
#match=pd.read_csv("<match.csv>")
#a=yka.teamBowlingScorecardMatch(match,<team1>)

#Wicket Kind
#match=pd.read_csv("<match.csv>")
#yka.teamBowlingWicketKindMatch(match,<team1>,<team2>)

#Wicket Match
#match=pd.read_csv("<match.csv>")
#yka.teamBowlingWicketMatch(match,<team1>,<team2>,plot=True/False)

#Bowler vs Batsman
#match=pd.read_csv("<match.csv>")
#yka.teamBowlersVsBatsmenMatch(match,<team1>,<team2>)

#Match worm chart
#match=pd.read_csv("<match.csv>")
#yka.matchWormChart(match,<team1>,<team2>)

3. Class 2 functions

This set of functions analyzes the performance of a T20 team (Intl T20, IPL, BBL or Natwest T20) in all matches against another T20 team (a country, or an IPL, BBL or Natwest T20 team). To see usages of Class 2 functions see Pitching yorkpy…on the middle and outside off-stump to IPL – Part 2

import pandas as pd
import yorkpy.analytics as yka

# Batting partnerships - Table
#team1_team2_matches = pd.read_csv(<matches_between_2_teams.csv>)
#m=yka.teamBatsmenPartnershiOppnAllMatches(team1_team2_matches,<team1/team2>,report="summary/detailed", top=<n>)

# Batting partnerships - Plot
#team1_team2_matches = pd.read_csv(<matches_between_2_teams.csv>)
#yka.teamBatsmenPartnershipOppnAllMatchesChart(team1_team2_matches,<team1>,<team2>,plot=<True/False>, top=<N>, partnershipRuns=<M>)

#Batsmen vs Bowlers
#team1_team2_matches = pd.read_csv(<matches_between_2_teams.csv>)
#yka.teamBatsmenVsBowlersOppnAllMatches(team1_team2_matches,<team1>,<team2>,plot=<True/False>, top=<N>,runsScored=<M>)

# Batting scorecard
#team1_team2_matches = pd.read_csv(<matches_between_2_teams.csv>)
#scorecard=yka.teamBattingScorecardOppnAllMatches(team1_team2_matches,<team1>,<team2>)

#Bowling scorecard
#team1_team2_matches = pd.read_csv(<matches_between_2_teams.csv>)
#scorecard=yka.teamBowlingScorecardOppnAllMatches(team1_team2_matches,<team1>,<team2>)

#Bowling wicket kind
#team1_team2_matches = pd.read_csv(<matches_between_2_teams.csv>)
#yka.teamBowlingWicketKindOppositionAllMatches(team1_team2_matches,<team1>,<team2>,plot=<True/False>,top=<N>,wickets=<M>)

#Bowler vs batsman
#team1_team2_matches = pd.read_csv(<matches_between_2_teams.csv>)
#yka.teamBowlersVsBatsmenOppnAllMatches(team1_team2_matches,<team1>,<team2>,plot=<True/False>,top=<N>,runsConceded=<M>)

# Wins vs losses
#team1_team2_matches = pd.read_csv(<matches_between_2_teams.csv>)
#yka.plotWinLossBetweenTeams(team1_team2_matches,<team1>,<team2>)

#Wins by win type
#team1_team2_matches = pd.read_csv(<matches_between_2_teams.csv>)
#yka.plotWinsByRunOrWickets(team1_team2_matches,<team1>)

#Wins by toss decision
#team1_team2_matches = pd.read_csv(<matches_between_2_teams.csv>)
#yka.plotWinsbyTossDecision(team1_team2_matches,<team1>,tossDecision=<field/bat>)

4. Class 3 functions

This set of functions deals with analyzing the performance of a T20 team (Intl. T20, IPL, BBL or Natwest T20) in all matches against all other teams. To see usages of Class 3 functions see Pitching yorkpy…swinging away from the leg stump to IPL – Part 3. Once the data for all matches against all oppositions has been saved, it can be used with the functions below

import pandas as pd
import yorkpy.analytics as yka
#Batsman partnerships
#allmatches = pd.read_csv("<allmatchesForteam>")
#m=yka.teamBatsmenPartnershiAllOppnAllMatches(allmatches,<team1>,report=<"summary"/"detailed">, top=<N>,partnershipRuns=<M>)

#Batsmen vs Bowlers
#allmatches = pd.read_csv("<allmatchesForteam>")
#yka.teamBatsmenVsBowlersAllOppnAllMatches(allmatches,<team1>,plot=<True/False>,top=<N>,runsScored=<M>)

#Batting scorecard
#allmatches = pd.read_csv("<allmatchesForteam>")
#scorecard=yka.teamBattingScorecardAllOppnAllMatches(allmatches,<team1>)

#Bowling scorecard
#allmatches = pd.read_csv("<allmatchesForteam>")
#scorecard=yka.teamBowlingScorecardAllOppnAllMatches(allmatches,<team1>)

#Bowling wicket kind
#allmatches = pd.read_csv("<allmatchesForteam>")
#yka.teamBowlingWicketKindAllOppnAllMatches(allmatches,<team1>,plot=<True/False>,top=<N>,wickets=<M>)

# Bowler vs Batsmen
#allmatches = pd.read_csv("<allmatchesForteam>")
#yka.teamBowlersVsBatsmenAllOppnAllMatches(allmatches,<team1>,plot=<True/False>,top=<N>,runsConceded=<M>)

# Wins vs losses
#allmatches = pd.read_csv("<allmatchesForteam>")
#yka.plotWinLossByTeamAllOpposition(allmatches,<team1>,plot=<"summary"/"detailed">)

# Wins by win type
#allmatches = pd.read_csv("<allmatchesForteam>")
#yka.plotWinsByRunOrWicketsAllOpposition(allmatches,<team1>)

# Wins by toss decision
#allmatches = pd.read_csv("<allmatchesForteam>")
#yka.plotWinsbyTossDecisionAllOpposition(allmatches,<team1>,tossDecision='bat'/'field',plot='summary'/'detailed')

5. Class 4 functions

This set of functions is used for analyzing individual batsmen and bowlers. From the converted xxx-BattingDetails.csv and xxx-BowlingDetails.csv we can get the batsman and bowler details as shown below. Subsequently we can perform analyses of the individual batsman and bowler. To see actual usages of Class 4 functions see Pitching yorkpy … in the block hole – Part 4

import yorkpy.analytics as yka

#Batsman analyses
#Get batsman Dataframe
#batsmanDF=yka.getBatsmanDetails(<team1>,<batsman>,dir=dir1)

#Batsman Runs vs Deliveries
#yka.batsmanRunsVsDeliveries(batsmanDF,<batsmanName>)

#Batsman fours and sixes
#yka.batsmanFoursSixes(batsmanDF,<batsmanName>)


#Batsman dismissals
#yka.batsmanDismissals(batsmanDF,<batsmanName>)

#Batsman Runs vs Strike Rate
#yka.batsmanRunsVsStrikeRate(batsmanDF,<batsmanName>)

#Batsman Moving average
#yka.batsmanMovingAverage(batsmanDF,<batsmanName>)


#Batsman Cumulative average
#yka.batsmanCumulativeAverageRuns(batsmanDF,<batsmanName>)

#Batsman Cumulative Strike rate
#yka.batsmanCumulativeStrikeRate(batsmanDF,<batsmanName>)

#Batsman Runs against opposition
#yka.batsmanRunsAgainstOpposition(batsmanDF,<batsmanName>)

#Batsman Runs at venue
#yka.batsmanRunsVenue(batsmanDF,<batsmanName>)


#Bowler analyses
#Get bowler dataframe
#bowlerDF=yka.getBowlerWicketDetails(<team1>,<bowler>,dir=dir1)

#Mean economy rate
#yka.bowlerMeanEconomyRate(bowlerDF,<bowlerName>)


#Mean Runs conceded
#yka.bowlerMeanRunsConceded(bowlerDF,<bowlerName>)

#Moving average of wickets
#yka.bowlerMovingAverage(bowlerDF,<bowlerName>)

# Cumulative average of wickets
#yka.bowlerCumulativeAvgWickets(bowlerDF,<bowlerName>)

# Cumulative economy rate
#yka.bowlerCumulativeAvgEconRate(bowlerDF,<bowlerName>)

# Wicket plot
#yka.bowlerWicketPlot(bowlerDF,<bowlerName>)

# Wicket against opposition
#yka.bowlerWicketsAgainstOpposition(bowlerDF,<bowlerName>)

# Wickets at venue
#yka.bowlerWicketsVenue(bowlerDF,<bowlerName>)

Important note: Do check out my other posts using yorkpy at yorkpy-posts

Conclusion

With the above templates detailed analysis can be done on

  • A T20 match
  • Performance of a team in all matches against another team
  • Performance of a team in all matches against all other teams
  • Individual batting and bowling performances

See also

  1. Deep Learning from first principles in Python, R and Octave – Part 5
  2. My travels through the realms of Data Science, Machine Learning, Deep Learning and (AI)
  3. Practical Machine Learning with R and Python – Part 4
  4. Take 4+: Presentations on ‘Elements of Neural Networks and Deep Learning’ – Parts 1-8
  5. A method to crowd source pothole marking on (Indian) roads

To see all posts click Index of posts

yorkpy takes a hat-trick, bowls out Intl. T20s, BBL and Natwest T20!!!

“Dear, dear! How queer everything is to-day! And yesterday things went on just as usual. I wonder if I’ve been changed in the night? Let me think: was I the same when I got up this morning? I almost think I can remember feeling a little different. But if I’m not the same, the next question is ’Who in the world am I? Ah, that’s the great puzzle!”

             Alice's adventures  in Wonderland, Lewis Carroll

1. Introduction

In this post, yorkpy clean bowls the following T20 formats namely International T20s, Big Bash League and Natwest T20 Blast. I take yorkpy on a spin through these T20 leagues. In the post below, I choose a random set of about 10-12 of the overall 63 functions that yorkpy has, and execute them for each of the different T20 leagues – Intl T20s, BBL and Natwest T20s. yorkpy is the python avatar of my R package yorkr, see Introducing cricket package yorkr: Part 1- Beaten by sheer pace!

There were a couple of new functions that needed to be added for each of the T20 leagues – Intl T20, BBL and Natwest T20 to take into account the different teams in each of these leagues. Further, some bugs were also ironed out in the latest version of yorkpy. yorkpy uses data from Cricsheet. The match data is in the form of YAML files. yorkpy converts these YAML files to dataframes. YAML files are very detailed and include a ball-by-ball account of the match.

– You can clone/fork the latest code for yorkpy from github yorkpy
– This post has also been published in RPubs at yorkpy takes a hat-trick
– You can download the PDF version of this post at yorkpy takes a hat-trick

The data for IPL, Intl. T20, BBL and Natwest T20 have already been converted into pandas dataframes and saved as CSVs. You can download the converted files from Github at allYorkpyT20Data (https://github.com/tvganesh/allYorkpyT20Data)

yorkpy has the following 4 main classes of functions

A. Functions analyzing individual T20 match (Class 1)

This was demonstrated in Pitching yorkpy … short of good length to IPL – Part 1. The functions deal with individual T20 matches. The functions are

  1. convertYaml2PandasDataframeT20()
  2. convertAllYaml2PandasDataframesT20()
  3. teamBattingScorecardMatch()
  4. teamBatsmenPartnershipMatch()
  5. teamBatsmenVsBowlersMatch()
  6. teamBowlingScorecardMatch()
  7. teamBowlingWicketKindMatch()
  8. teamBowlingWicketRunsMatch()
  9. teamBowlingWicketMatch()
  10. teamBowlersVsBatsmenMatch()
  11. matchWormChart()

B. Functions that analyze all matches between 2 T20 teams (Class 2)

Pitching yorkpy…on the middle and outside off-stump to IPL – Part 2 included functions that analyze head-to-head confrontation between any 2 T20 teams. The functions are

  1. getAllMatchesBetweenTeams()
  2. saveAllMatchesBetween2IPLTeams()
  3. getAllMatchesBetweenTeams()
  4. saveAllMatchesBetween2IPLTeams()
  5. teamBatsmenPartnershiOppnAllMatches()
  6. teamBatsmenPartnershipOppnAllMatchesChart()
  7. teamBatsmenVsBowlersOppnAllMatches()
  8. teamBattingScorecardOppnAllMatches()
  9. teamBowlingScorecardOppnAllMatches()
  10. teamBowlingWicketKindOppositionAllMatches()
  11. teamBowlersVsBatsmenOppnAllMatches()
  12. plotWinLossBetweenTeams()
  13. plotWinsByRunOrWickets()
  14. plotWinsbyTossDecision()

C. Functions that analyze the performance of a T20 team against all other teams (Class 3)

The post Pitching yorkpy…swinging away from the leg stump to IPL – Part 3 is based on the Class 3 set of functions shown below

  1. getAllMatchesAllOpposition()
  2. saveAllMatchesAllOppositionIPLT20(dir1)
  3. getAllMatchesAllOpposition()
  4. saveAllMatchesAllOppositionIPLT20()
  5. teamBatsmenPartnershiAllOppnAllMatches()
  6. teamBatsmenPartnershipAllOppnAllMatchesChart()
  7. teamBatsmenVsBowlersAllOppnAllMatches()
  8. teamBattingScorecardAllOppnAllMatches()
  9. teamBowlingScorecardAllOppnAllMatches()
  10. teamBowlingWicketKindAllOppnAllMatches()
  11. teamBowlersVsBatsmenAllOppnAllMatches()
  12. plotWinLossByTeamAllOpposition()
  13. plotWinsByRunOrWicketsAllOpposition()
  14. plotWinsbyTossDecisionAllOpposition()

D. Functions that analyze performances of T20 batsmen and bowlers (Class 4)

This set of functions analyzes individual batsmen and bowlers and has been used in Pitching yorkpy … in the block hole – Part 4. The functions are

  1. getTeamBattingDetails()
  2. getBatsmanDetails()
  3. batsmanRunsVsDeliveries()
  4. batsmanFoursSixes()
  5. batsmanDismissals()
  6. batsmanRunsVsStrikeRate()
  7. batsmanMovingAverage()
  8. batsmanCumulativeAverageRuns()
  9. batsmanCumulativeStrikeRate()
  10. batsmanRunsAgainstOpposition()
  11. batsmanRunsVenue()
  12. getTeamBowlingDetails()
  13. getBowlerWicketDetails()
  14. bowlerMeanEconomyRate()
  15. bowlerMeanRunsConceded()
  16. bowlerMovingAverage()
  17. bowlerCumulativeAvgWickets()
  18. bowlerCumulativeAvgEconRate()
  19. bowlerWicketPlot()
  20. bowlerWicketsAgainstOpposition()
  21. bowlerWicketsVenue()

Additional new functions were added to handle Intl T20s, Big Bash League and Natwest T20 Blast, since the teams are different. They are

59. saveAllMatchesBetween2IntlT20s()
60. saveAllMatchesAllOppositionIntlT20()
61. saveAllMatchesBetween2BBLTeams()
62. saveAllMatchesAllOppositionBBLT20()
63. saveAllMatchesBetween2NWBTeams()
64. saveAllMatchesAllOppositionNWBT20()

All other functions can be used as is! You can get the help of any function in yorkpy using

import yorkpy.analytics as yka
help(yka.teamBatsmenPartnershiOppnAllMatches)
## Help on function teamBatsmenPartnershiOppnAllMatches in module yorkpy.analytics:
## 
## teamBatsmenPartnershiOppnAllMatches(matches, theTeam, report='summary', top=5)
##     Team batting partnership against a opposition all IPL matches
##     
##     Description
##     
##     This function computes the performance of batsmen against all bowlers of an oppositions in 
##     all matches. This function returns a dataframe
##     
##     Usage
##     
##     teamBatsmenPartnershiOppnAllMatches(matches,theTeam,report="summary")
##     Arguments
##     
##     matches     
##     All the matches of the team against the oppositions
##     theTeam     
##     The team for which the the batting partnerships are sought
##     report      
##     If the report="summary" then the list of top batsmen with the highest partnerships 
##     is displayed. If report="detailed" then the detailed break up of partnership is returned 
##     as a dataframe
##     top
##     The number of players to be displayed from the top
##     Value
##     
##     partnerships The data frame of the partnerships
##     
##     Note
##     
##     Maintainer: Tinniam V Ganesh tvganesh.85@gmail.com
##     
##     Author(s)
##     
##     Tinniam V Ganesh
##     
##     References
##     
##     http://cricsheet.org/
##     https://gigadom.wordpress.com/
##     
##     
##     See Also
##     
##     teamBatsmenVsBowlersOppnAllMatchesPlot
##     teamBatsmenPartnershipOppnAllMatchesChart

As I mentioned above I will be randomly choosing a set of 12 functions from Class 1,2,3,4 for each of the T20 leagues (Intl T20, BBL and NWB T20) for analysis

2. International T20s

The following functions were added for handling Intl. T20s

  1. saveAllMatchesBetween2IntlT20s()
  2. saveAllMatchesAllOppositionIntlT20()

These handle the countries in Intl. T20s listed below

Afghanistan, Australia, Bangladesh, Bermuda, Canada, England, Hong Kong, India, Ireland, Kenya, Nepal, Netherlands, New Zealand, Oman, Pakistan, Scotland, South Africa, Sri Lanka, United Arab Emirates, West Indies, Zimbabwe

import os
#os.chdir('C:\\software\\cricket-package\\yorkpyT20\\t20s')
#import yorkpy.analytics as yka
#1.  Convert all YAML files to dataframes and CSV
#yka.convertAllYaml2PandasDataframesT20(".", "..\\data1")
#dir1='C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches'
#2. Save all matches between 2 T20 teams
#yka.saveAllMatchesBetween2IntlT20s(dir1)
#3. Save all matches between a T20 team and all other teams
#dir1='C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches'
#yka.saveAllMatchesAllOppositionIntlT20(dir1)
#4. Get batting details
#dir1='C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches'
#yka.getTeamBattingDetails("Afghanistan",dir=dir1, save=True)
#yka.getTeamBattingDetails("Australia",dir=dir1,save=True)
#yka.getTeamBattingDetails("Bangladesh",dir=dir1,save=True)
#...
#5. Get bowling details
#dir1='C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches'
#yka.getTeamBowlingDetails("Afghanistan",dir=dir1, save=True)
#yka.getTeamBowlingDetails("Australia",dir=dir1,save=True)
#yka.getTeamBowlingDetails("Bangladesh",dir=dir1,save=True)
# ...

Once the data is converted you can use the yorkpy functions. The data has been converted for Intl T20 and is available at Github at IntlT20

To use the yorkpy functions for a new league we need to initially convert the YAML files into the appropriate format for processing by yorkpy functions

This will create the necessary files which are used in the functions below

2.1 Intl. T20 – Team score card  (Class 1)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches"
path=os.path.join(dir1,".\\India-New Zealand-2007-09-16.csv")
ind_nz=pd.read_csv(path)
scorecard,extras=yka.teamBattingScorecardMatch(ind_nz,"India")
print(scorecard)
##             batsman  runs  balls  4s  6s          SR
## 0         G Gambhir    51     34   5   2  150.000000
## 1          V Sehwag    40     18   6   2  222.222222
## 2        RV Uthappa     0      2   0   0    0.000000
## 3          MS Dhoni    24     20   2   0  120.000000
## 4      Yuvraj Singh     5      7   0   0   71.428571
## 5        KD Karthik    17     12   3   0  141.666667
## 6         IK Pathan    11     10   2   0  110.000000
## 7        AB Agarkar     1      2   0   0   50.000000
## 8   Harbhajan Singh     7      6   1   0  116.666667
## 9       S Sreesanth    19     10   4   0  190.000000
## 10         RP Singh     1      1   0   0  100.000000
print(extras)
##    total  wides  noballs  legbyes  byes  penalty  extras
## 0    370      6        0        8     0        0      14
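
The columns of the scorecard above can be derived from ball-by-ball rows. The following is only a sketch of that aggregation, not yorkpy's implementation. It assumes that wides do not count as balls faced, and the deliveries and the name `batting_scorecard` are hypothetical. The strike rate is 100 * runs / balls, which matches the table (51 runs off 34 balls gives 150.0).

```python
from collections import defaultdict

# Hypothetical deliveries: (batsman, runs off the bat, delivery was a wide?).
deliveries = [
    ("Batter A", 4, False), ("Batter A", 1, False), ("Batter B", 0, False),
    ("Batter B", 6, False), ("Batter A", 0, True), ("Batter A", 4, False),
    ("Batter B", 1, False),
]

def batting_scorecard(deliveries):
    """Aggregate runs, balls, 4s, 6s and strike rate per batsman."""
    card = defaultdict(lambda: {"runs": 0, "balls": 0, "4s": 0, "6s": 0})
    for batsman, runs, is_wide in deliveries:
        row = card[batsman]
        row["runs"] += runs
        if not is_wide:          # assumption: a wide is not a ball faced
            row["balls"] += 1
        row["4s"] += runs == 4
        row["6s"] += runs == 6
    for row in card.values():
        row["SR"] = 100 * row["runs"] / row["balls"] if row["balls"] else 0.0
    return dict(card)

scorecard = batting_scorecard(deliveries)
for batsman, row in scorecard.items():
    print(batsman, row)
```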

2.2 Intl. T20 -Team batsmen partnership (Class 1)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches"
path=os.path.join(dir1,".\\South Africa-Australia-2009-03-27.csv")
sa_aus=pd.read_csv(path)
yka.teamBatsmenPartnershipMatch(sa_aus,'Australia','South Africa',plot=True)

2.3 Intl. T20 -Team bowling scorecard match (Class 1)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches"
path=os.path.join(dir1,".\\Sri Lanka-West Indies-2012-09-28.csv")
sl_wi=pd.read_csv(path)
a=yka.teamBowlingScorecardMatch(sl_wi,'Sri Lanka')
print(a)
##          bowler  overs  runs  maidens  wicket  econrate
## 0    A Mohammed      2    13        0       0       6.5
## 1  SA Campbelle      1     8        0       1       8.0
## 2     SC Selman      1     3        0       0       3.0
## 3      SF Daley      2     5        0       1       2.5
## 4     SR Taylor      2     4        0       1       2.0
## 5     TD Smartt      2    17        0       0       8.5
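
In the table above the economy rate is runs divided by overs (13 runs in 2 overs gives 6.5). A minimal sketch of building such a scorecard from ball-by-ball rows follows. It is an illustration rather than yorkpy's code: overs are counted as distinct overs bowled, maidens are left out, and the data and the name `bowling_scorecard` are hypothetical.

```python
from collections import defaultdict

# Hypothetical deliveries: (bowler, over number, runs conceded, wicket fell?).
balls = (
    [("Bowler X", 0, r, False) for r in (1, 0, 4, 0, 0, 1)]
    + [("Bowler Y", 1, r, False) for r in (2, 0, 0, 6, 1, 0)]
    + [("Bowler X", 2, r, w) for r, w in
       ((0, False), (0, True), (1, False), (0, False), (0, False), (0, False))]
)

def bowling_scorecard(balls):
    """Aggregate overs, runs, wickets and economy rate per bowler."""
    overs = defaultdict(set)
    runs = defaultdict(int)
    wickets = defaultdict(int)
    for bowler, over, conceded, wicket in balls:
        overs[bowler].add(over)      # distinct overs this bowler bowled
        runs[bowler] += conceded
        wickets[bowler] += wicket
    return {
        b: {"overs": len(overs[b]), "runs": runs[b], "wicket": wickets[b],
            "econrate": runs[b] / len(overs[b])}
        for b in overs
    }

bowling = bowling_scorecard(balls)
for bowler, row in bowling.items():
    print(bowler, row)
```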

2.4 Intl. T20 -Match Worm chart (Class 1)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches"
path=os.path.join(dir1,".\\England-India-2012-09-29.csv")
eng_ind=pd.read_csv(path)
yka.matchWormChart(eng_ind,"England", "India")

path=os.path.join(dir1,".\\Bangladesh-Ireland-2015-12-05.csv")
ban_ire=pd.read_csv(path)
yka.matchWormChart(ban_ire,"Bangladesh", "Ireland")
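A worm chart plots each team's running total against the over number. The sketch below shows only the cumulative-sum step on made-up runs per over; the helper name worm and the sample numbers are assumptions for illustration, and this is not how yorkpy reads the match dataframe.

```python
from itertools import accumulate

def worm(runs_per_over):
    """Cumulative team total at the end of each over."""
    return list(accumulate(runs_per_over))

# Made-up runs per over for two hypothetical innings (not match data)
team_a = [6, 4, 10, 2, 8]
team_b = [3, 9, 7, 7, 1]
print(worm(team_a))  # [6, 10, 20, 22, 30]
print(worm(team_b))  # [3, 12, 19, 26, 27]
```

Plotting the two lists against the over number gives the two "worms" that the chart compares.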

2.5 Intl. T20 -Team Batting partnerships all matches 2 teams (Class 2)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-allMatchesBetween2Teams"
path=os.path.join(dir1,"India-England-allMatches.csv")
ind_eng_matches = pd.read_csv(path)
theTeam='India'
m=yka.teamBatsmenPartnershiOppnAllMatches(ind_eng_matches,theTeam,report="detailed", top=4)
print(m)
##      batsman  totalPartnershipRuns    non_striker  partnershipRuns
## 0   SK Raina                   265      G Gambhir                2
## 1   SK Raina                   265       KL Rahul               40
## 2   SK Raina                   265      MK Tiwary               24
## 3   SK Raina                   265       MS Dhoni              124
## 4   SK Raina                   265        P Kumar                0
## 5   SK Raina                   265      PP Chawla                4
## 6   SK Raina                   265       R Ashwin                1
## 7   SK Raina                   265      RG Sharma               16
## 8   SK Raina                   265        V Kohli               47
## 9   SK Raina                   265   Yuvraj Singh                7
## 10  MS Dhoni                   264       A Mishra                1
## 11  MS Dhoni                   264      AT Rayudu               18
## 12  MS Dhoni                   264      HH Pandya                8
## 13  MS Dhoni                   264      IK Pathan                2
## 14  MS Dhoni                   264      JJ Bumrah                2
## 15  MS Dhoni                   264      MK Pandey                3
## 16  MS Dhoni                   264  Parvez Rasool               21
## 17  MS Dhoni                   264       R Ashwin               11
## 18  MS Dhoni                   264      RA Jadeja               11
## 19  MS Dhoni                   264      RG Sharma                9
## 20  MS Dhoni                   264        RR Pant                6
## 21  MS Dhoni                   264     RV Uthappa                5
## 22  MS Dhoni                   264       SK Raina               98
## 23  MS Dhoni                   264      YK Pathan               36
## 24  MS Dhoni                   264   Yuvraj Singh               33
## 25   V Kohli                   236      AM Rahane                3
## 26   V Kohli                   236      G Gambhir               78
## 27   V Kohli                   236       KL Rahul               46
## 28   V Kohli                   236      RG Sharma                2
## 29   V Kohli                   236     RV Uthappa                4
## 30   V Kohli                   236       S Dhawan               45
## 31   V Kohli                   236       SK Raina               48
## 32   V Kohli                   236   Yuvraj Singh               10
## 33     M Raj                   176       A Sharma                2
## 34     M Raj                   176         H Kaur               18
## 35     M Raj                   176      J Goswami                6
## 36     M Raj                   176        KV Jain                5
## 37     M Raj                   176       L Kumari                5
## 38     M Raj                   176    N Niranjana                3
## 39     M Raj                   176       N Tanwar               17
## 40     M Raj                   176        PG Raut               41
## 41     M Raj                   176     R Malhotra                5
## 42     M Raj                   176     S Mandhana                8
## 43     M Raj                   176         S Naik               10
## 44     M Raj                   176       S Pandey               19
## 45     M Raj                   176       SK Naidu               37
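In the detailed report above, totalPartnershipRuns for a batsman equals the sum of that batsman's partnershipRuns across all non-strikers. The sketch below checks this for the V Kohli rows; the helper total_partnership_runs is a hypothetical name and the grouping is written in plain Python rather than with yorkpy or pandas.

```python
from collections import defaultdict

# (batsman, non_striker, partnershipRuns) rows copied from the output above
rows = [
    ("V Kohli", "AM Rahane", 3),
    ("V Kohli", "G Gambhir", 78),
    ("V Kohli", "KL Rahul", 46),
    ("V Kohli", "RG Sharma", 2),
    ("V Kohli", "RV Uthappa", 4),
    ("V Kohli", "S Dhawan", 45),
    ("V Kohli", "SK Raina", 48),
    ("V Kohli", "Yuvraj Singh", 10),
]

def total_partnership_runs(rows):
    """Sum partnership runs per batsman across all partners."""
    totals = defaultdict(int)
    for batsman, _non_striker, runs in rows:
        totals[batsman] += runs
    return dict(totals)

print(total_partnership_runs(rows))  # {'V Kohli': 236}
```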

2.6 Intl. T20 -Team Batsmen vs Bowlers all matches 2 teams (Class 2)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-allMatchesBetween2Teams"
path=os.path.join(dir1,"Ireland-Netherlands-allMatches.csv")
ire_nl_matches = pd.read_csv(path)
yka.teamBatsmenVsBowlersOppnAllMatches(ire_nl_matches,'Ireland',"Netherlands",plot=True,top=3,runsScored=10)

2.7 Intl. T20 -Team Bowling scorecard all matches 2 teams (Class 2)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-allMatchesBetween2Teams"
path=os.path.join(dir1,"Bangladesh-Nepal-allMatches.csv")
bang_nep_matches = pd.read_csv(path)
scorecard=yka.teamBowlingScorecardOppnAllMatches(bang_nep_matches,'Bangladesh',"Nepal")
print(scorecard)
##         bowler  overs  runs  maidens  wicket   econrate
## 0      B Regmi      3    14        0       1   4.666667
## 3   SP Gauchan      4    40        0       1  10.000000
## 1   JK Mukhiya      2    16        0       0   8.000000
## 2     P Khadka      3    23        0       0   7.666667
## 4    Sagar Pun      1    16        0       0  16.000000
## 5  Sompal Kami      2    21        0       0  10.500000

2.8 Intl. T20 -Team Batsmen vs Bowlers all Oppositions (Class 3)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-allMatchesAllOpposition\\"
path=os.path.join(dir1,"Australia-allMatchesAllOpposition.csv")
aus_matches = pd.read_csv(path)
yka.teamBatsmenVsBowlersAllOppnAllMatches(aus_matches,"Australia",plot=True,top=3,runsScored=40)

2.9 Intl. T20 -Wins vs Losses of a team against all other teams (Class 3)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-allMatchesAllOpposition\\"
path=os.path.join(dir1,"South Africa-allMatchesAllOpposition.csv")
sa_matches = pd.read_csv(path)
team1='South Africa'
yka.plotWinLossByTeamAllOpposition(sa_matches,team1,plot="detailed")

2.10 Intl. T20 -Batsmen analysis (Class 4)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-BattingBowlingDetails\\"
# Rohit Sharma
name="RG Sharma"
team='India'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeAverageRuns(df,name)

# MJ Guptill
name="MJ Guptill"
team='New Zealand'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)

2.11 Intl. T20 -Bowler analysis (Class 4)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-BattingBowlingDetails\\"
# Shakib Al Hasan
name="Shakib Al Hasan"
team='Bangladesh'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanEconomyRate(df,name)

# SL Malinga
name="SL Malinga"
team='Sri Lanka'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsAgainstOpposition(df,name)

3. Big Bash League

The following functions were added to handle BBL teams

  1. saveAllMatchesBetween2BBLTeams()
  2. saveAllMatchesAllOppositionBBLT20()

The BBL teams included are Adelaide Strikers, Brisbane Heat, Hobart Hurricanes, Melbourne Renegades, Melbourne Stars, Perth Scorchers, Sydney Sixers and Sydney Thunder.

To use the yorkpy functions, the YAML files first have to be converted into pandas dataframes and then saved as CSV, as shown below

import os
import yorkpy.analytics as yka
os.chdir('C:\\software\\cricket-package\\yorkpyBBL\\bbl')
#1. Convert all YAML files to dataframes and save as CSV
#yka.convertAllYaml2PandasDataframesT20(".", "..\\BBLT20-Matches")
#2. Save all matches between 2 BBL teams
dir1='C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches'
#yka.saveAllMatchesBetween2BBLTeams(dir1)
#3. Save T20 matches between a BBL team and all other teams
dir1='C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches'
#yka.saveAllMatchesAllOppositionBBLT20(dir1)
#4. Get the batting details
dir1='C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches'
#yka.getTeamBattingDetails("Adelaide Strikers",dir=dir1, save=True)
#yka.getTeamBattingDetails("Brisbane Heat",dir=dir1,save=True)
#yka.getTeamBattingDetails("Hobart Hurricanes",dir=dir1,save=True)
#...
#5. Get the bowling details
dir1='C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches'
#yka.getTeamBowlingDetails("Adelaide Strikers",dir=dir1, save=True)
#yka.getTeamBowlingDetails("Brisbane Heat",dir=dir1,save=True)
#yka.getTeamBowlingDetails("Hobart Hurricanes",dir=dir1,save=True)
#...
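Rather than writing one call per team, the batting and bowling steps can be driven by a loop. The sketch below is one way to do that, assuming getTeamBattingDetails and getTeamBowlingDetails accept (team, dir=..., save=...) exactly as in the commented calls above. The helper for_each_team is a hypothetical name, and the team list is assembled from the team names that appear in this section.

```python
# Team names as they appear in this section (Melbourne Stars occurs in the
# match files used in the examples that follow)
BBL_TEAMS = [
    "Adelaide Strikers", "Brisbane Heat", "Hobart Hurricanes",
    "Melbourne Renegades", "Melbourne Stars", "Perth Scorchers",
    "Sydney Sixers", "Sydney Thunder",
]

def for_each_team(func, teams, dir1):
    """Call func(team, dir=dir1, save=True) for every team; return results keyed by team."""
    return {team: func(team, dir=dir1, save=True) for team in teams}

# Usage with yorkpy installed (left commented out, like the calls above):
# import yorkpy.analytics as yka
# dir1 = 'C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches'
# batting = for_each_team(yka.getTeamBattingDetails, BBL_TEAMS, dir1)
# bowling = for_each_team(yka.getTeamBowlingDetails, BBL_TEAMS, dir1)
```

Passing the yorkpy function in as an argument keeps the loop identical for the batting and bowling steps.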

The functions below perform analysis on the files generated above. The YAML files have already been converted and are available on Github at BBL

3.1 Big Bash League – Team score card (Class 1)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches"
path=os.path.join(dir1,".\\Adelaide Strikers-Brisbane Heat-2012-12-13.csv")
as_bh=pd.read_csv(path)
scorecard,extras=yka.teamBattingScorecardMatch(as_bh,"Brisbane Heat")
print(scorecard)
##          batsman  runs  balls  4s  6s          SR
## 0  LA Pomersbach    65     42   8   2  154.761905
## 1       JR Hopes     1      2   0   0   50.000000
## 2       JA Burns    37     31   2   2  119.354839
## 3   DT Christian    12     15   0   0   80.000000
## 4    NLTC Perera    12      4   0   2  300.000000
## 5        CA Lynn    19     18   1   1  105.555556
## 6    BCJ Cutting    13      5   0   2  260.000000
## 7     PJ Forrest    12      8   0   1  150.000000
## 8     CD Hartley     5      2   1   0  250.000000
print(extras)
##    total  wides  noballs  legbyes  byes  penalty  extras
## 0    371     10        2        5     0        0      17
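The SR and extras columns above follow from simple arithmetic: strike rate is runs per 100 balls faced, and the extras total is the sum of the individual extras columns. The sketch below reproduces two values from the printed output; the helper names strike_rate and total_extras are hypothetical and this is not yorkpy's code.

```python
def strike_rate(runs, balls):
    """Runs scored per 100 balls faced; None if no balls were faced."""
    if balls == 0:
        return None
    return runs / balls * 100

def total_extras(wides, noballs, legbyes, byes, penalty):
    """Sum of the individual extras columns."""
    return wides + noballs + legbyes + byes + penalty

# LA Pomersbach: 65 runs off 42 balls
print(round(strike_rate(65, 42), 6))   # 154.761905
# Extras row: wides=10, noballs=2, legbyes=5, byes=0, penalty=0
print(total_extras(10, 2, 5, 0, 0))    # 17
```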

3.2 Big Bash League -Team batsmen vs Bowlers (Class 1)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches"
path=os.path.join(dir1,".\\Hobart Hurricanes-Melbourne Renegades-2012-01-18.csv")
hh_mr=pd.read_csv(path)
yka.teamBatsmenVsBowlersMatch(hh_mr,'Hobart Hurricanes','Melbourne Renegades',plot=True)

3.3 Big Bash League -Team bowling scorecard match (Class 1)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches"
path=os.path.join(dir1,".\\Melbourne Stars-Sydney Thunder-2016-01-24.csv")
ms_st=pd.read_csv(path)
a=yka.teamBowlingScorecardMatch(ms_st,'Sydney Thunder')
print(a)
##           bowler  overs  runs  maidens  wicket   econrate
## 0        A Zampa      4    32        0       2   8.000000
## 1  BW Hilfenhaus      2    21        0       0  10.500000
## 2      DJ Hussey      1     9        0       1   9.000000
## 3     DJ Worrall      3    42        0       0  14.000000
## 4      EP Gulbis      2    19        0       0   9.500000
## 5        MA Beer      3    25        0       1   8.333333
## 6     MP Stoinis      4    30        0       3   7.500000

3.4 Big Bash League – Match Worm chart (Class 1)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches"
path=os.path.join(dir1,".\\Sydney Sixers-Melbourne Stars-2011-12-27.csv")
ss_ms=pd.read_csv(path)
yka.matchWormChart(ss_ms,"Melbourne Stars", "Sydney Sixers")

path=os.path.join(dir1,".\\Hobart Hurricanes-Brisbane Heat-2015-01-02.csv")
hh_bh=pd.read_csv(path)
yka.matchWormChart(hh_bh,"Hobart Hurricanes", "Brisbane Heat")

3.5 Big Bash League -Team Batting partnerships all matches 2 teams (Class 2)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-allMatchesBetween2Teams"
path=os.path.join(dir1,"Brisbane Heat-Adelaide Strikers-allMatches.csv")
bh_as_matches = pd.read_csv(path)
yka.teamBatsmenPartnershipOppnAllMatchesChart(bh_as_matches,"Brisbane Heat","Adelaide Strikers",plot=True, top=4, partnershipRuns=20)

3.6 Big Bash League -Team Bowling wicket kind all matches 2 teams (Class 2)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-allMatchesBetween2Teams"
path=os.path.join(dir1,"Sydney Sixers-Perth Scorchers-allMatches.csv")
ss_ps_matches = pd.read_csv(path)
yka.teamBowlingWicketKindOppositionAllMatches(ss_ps_matches,'Perth Scorchers','Sydney Sixers',plot=True,top=5,wickets=1)

3.7 Big Bash League -Team Bowling scorecard all teams (Class 3)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-allMatchesAllOpposition"
path=os.path.join(dir1,"Hobart Hurricanes-allMatchesAllOpposition.csv")
hh_matches = pd.read_csv(path)
scorecard=yka.teamBowlingScorecardAllOppnAllMatches(hh_matches,"Hobart Hurricanes")
print(scorecard)
##              bowler  overs  runs  maidens  wicket   econrate
## 16            B Lee     20   132        0       9   6.600000
## 30         CJ McKay     13   110        0       9   8.461538
## 88    NJ Rimmington     16   103        1       9   6.437500
## 67      JW Hastings     15    88        0       8   5.866667
## 63      JP Faulkner     15   146        0       7   9.733333
## 27        CJ Gannon     17   147        1       7   8.647059
## 93          NM Lyon      8    51        0       7   6.375000
## 20      BCJ Cutting     27   226        0       7   8.370370
## 48          GB Hogg     22   167        0       7   7.590909
## 107       SM Boland     12    96        0       7   8.000000
## 15       B Laughlin     13    99        0       7   7.615385
## 87      MT Steketee     15   134        0       5   8.933333
## 121    Yasir Arafat      9    48        0       4   5.333333
## 96       PJ Cummins      8    83        0       4  10.375000
## 46      Fawad Ahmed     11    64        0       4   5.818182
## 76          MA Beer     12    63        0       4   5.250000
## 108     SNJ O'Keefe     15   104        0       4   6.933333
## 75   M Muralitharan      7    31        0       4   4.428571
## 10           AJ Tye     16   127        0       4   7.937500
## 52          J Botha     13    94        0       4   7.230769
## 56     JL Pattinson      7    71        0       4  10.142857
## 62   JP Behrendorff     16   119        0       4   7.437500
## 3           AC Agar     12    87        0       4   7.250000
## 24     BM Edmondson      4    40        0       4  10.000000
## 37        DJ Hussey      8    47        0       3   5.875000
## 49       GJ Maxwell      8    65        0       3   8.125000
## 84       MN Samuels      4    22        0       3   5.500000
## 81         MG Neser      5    54        0       3  10.800000
## 44     DT Christian      9   114        0       3  12.666667
## 50        GS Sandhu      7    51        0       3   7.285714
## ..              ...    ...   ...      ...     ...        ...
## 43        DP Nannes      8    58        0       1   7.250000
## 51         IA Moran      4    25        0       1   6.250000
## 55         JK Lalor     10    82        0       1   8.200000
## 54        JH Kallis      3    18        0       1   6.000000
## 73   LR Butterworth      4    25        0       1   6.250000
## 4      AC McDermott      2    28        0       1  14.000000
## 70         LA Doran      4    38        0       1   9.500000
## 69    KW Richardson      6    44        0       1   7.333333
## 119     WD Sheridan      2     6        0       0   3.000000
## 2       AB McDonald      1    15        0       0  15.000000
## 115      TD Andrews      3    23        0       0   7.666667
## 11          AK Heal      4    33        0       0   8.250000
## 7        AD Russell      4    40        0       0  10.000000
## 8          AJ Finch      2    15        0       0   7.500000
## 9         AJ Turner      3    28        0       0   9.333333
## 60        JM Mennie      1    20        0       0  20.000000
## 18        BA Stokes      1     9        0       0   9.000000
## 26         CH Gayle      1    16        0       0  16.000000
## 28         CJ Green      4    44        0       0  11.000000
## 95   PD Collingwood      2    20        0       0  10.000000
## 31       CJ Simmons      4    21        0       0   5.250000
## 59       JM Holland      3    34        0       0  11.333333
## 36         DJ Bravo      6    64        0       0  10.666667
## 38     DJ Pattinson      2    16        0       0   8.000000
## 41       DJ Worrall      8    90        0       0  11.250000
## 72      LN O'Connor      6    56        0       0   9.333333
## 71        LJ Wright      3    27        0       0   9.000000
## 68       KA Pollard      1     7        0       0   7.000000
## 58       JM Herrick      4    23        0       0   5.750000
## 92       NM Hauritz      5    42        0       0   8.400000
## 
## [122 rows x 6 columns]

3.8 Big Bash League -Plot wins vs losses against all teams (Class 3)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-allMatchesAllOpposition"
path=os.path.join(dir1,"Sydney Sixers-allMatchesAllOpposition.csv")
ss_matches = pd.read_csv(path)
yka.plotWinLossByTeamAllOpposition(ss_matches,'Sydney Sixers')

3.9 Big Bash League -Wins by runs or wickets against all teams (Class 3)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-allMatchesAllOpposition"
path=os.path.join(dir1,"Adelaide Strikers-allMatchesAllOpposition.csv")
as_matches = pd.read_csv(path)
yka.plotWinsByRunOrWicketsAllOpposition(as_matches,'Adelaide Strikers')

3.10 Big Bash League -Batsmen Analysis (Class 4)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-BattingBowlingDetails"
# CA Lynn
name="CA Lynn"
team='Brisbane Heat'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsStrikeRate(df,name)

# UT Khawaja
name="UT Khawaja"
team='Sydney Thunder'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)

3.11 Big Bash League – Bowler analysis (Class 4)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-BattingBowlingDetails"
# CJ McKay
name="CJ McKay"
team='Sydney Thunder'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgWickets(df,name)

# AU Rashid
name="AU Rashid"
team='Adelaide Strikers'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgEconRate(df,name)

4. Natwest T20 Blast

The following functions were added to handle Natwest T20 teams

  1. saveAllMatchesBetween2NWBTeams()
  2. saveAllMatchesAllOppositionNWBT20()

The Natwest teams are
Derbyshire, Durham, Essex, Glamorgan, Gloucestershire, Hampshire, Kent, Lancashire, Leicestershire, Middlesex, Northamptonshire, Nottinghamshire, Somerset, Surrey, Sussex, Warwickshire, Worcestershire, Yorkshire

In order to perform analysis with yorkpy, the YAML data has to be converted to pandas dataframes and saved as CSV, as shown

#import os
#import yorkpy.analytics as yka
#os.chdir('C:\\software\\cricket-package\\yorkpyNWB\\nwb')
#1. Convert YAML to dataframes and save as CSV
#yka.convertAllYaml2PandasDataframesT20(".", "..\\NWBT20-Matches")
#2. Save all matches between 2 NWBT20 teams
#dir1='C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches'
#yka.saveAllMatchesBetween2NWBTeams(dir1)
#3. Save all matches between a NWB T20 team and all other teams
#dir1='C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches'
#yka.saveAllMatchesAllOppositionNWBT20(dir1)
#4. Compute the batting details
dir1='C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches'
#yka.getTeamBattingDetails("Derbyshire",dir=dir1, save=True)
#yka.getTeamBattingDetails("Durham",dir=dir1,save=True)
#yka.getTeamBattingDetails("Essex",dir=dir1,save=True)
#..
#5. Compute bowling details
dir1='C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches'
#yka.getTeamBowlingDetails("Derbyshire",dir=dir1, save=True)
#yka.getTeamBowlingDetails("Durham",dir=dir1,save=True)
#yka.getTeamBowlingDetails("Essex",dir=dir1,save=True)
#...

Once the data is converted, all yorkpy functions can be used. This has already been done and the converted data is available on Github at NWB

4.1 Natwest T20 Blast – Team score card (Class 1)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches"
path=os.path.join(dir1,".\\Durham-Yorkshire-2016-08-20.csv")
d_y=pd.read_csv(path)
scorecard,extras=yka.teamBattingScorecardMatch(d_y,"Durham")
print(scorecard)
##           batsman  runs  balls  4s  6s          SR
## 0     MD Stoneman    25     20   4   0  125.000000
## 1     KK Jennings    11     13   1   0   84.615385
## 2       BA Stokes    56     37   4   3  151.351351
## 3   MJ Richardson    29     23   4   1  126.086957
## 4     JTA Burnham    17     15   1   1  113.333333
## 5      RD Pringle    10      9   1   0  111.111111
## 6  PD Collingwood     2      3   0   0   66.666667
## 7        U Arshad     1      1   0   0  100.000000
print(extras)
##    total  wides  noballs  legbyes  byes  penalty  extras
## 0    305      2        0        5     0        0       7

4.2 Natwest T20 Blast -Team batsmen vs Bowlers (Class 1)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches"
path=os.path.join(dir1,".\\Derbyshire-Lancashire-2016-07-13.csv")
d_l=pd.read_csv(path)
yka.teamBatsmenVsBowlersMatch(d_l,'Lancashire','Derbyshire',plot=True)

4.3 Natwest T20 Blast -Team bowling scorecard match (Class 1)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches"
path=os.path.join(dir1,".\\Essex-Surrey-2016-05-20.csv")
e_s=pd.read_csv(path)
a=yka.teamBowlingScorecardMatch(e_s,'Essex')
print(a)
##           bowler  overs  runs  maidens  wicket   econrate
## 0  Azhar Mahmood      3    38        0       4  12.666667
## 1       GJ Batty      4    33        0       1   8.250000
## 2       JE Burke      1    18        0       0  18.000000
## 3     MW Pillans      3    28        0       0   9.333333
## 4      SM Curran      4    23        0       2   5.750000
## 5      TK Curran      4    21        0       3   5.250000

4.4 Natwest T20 Blast -Match Worm chart (Class 1)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches"
path=os.path.join(dir1,".\\Gloucestershire-Glamorgan-2016-06-10.csv")
glo_gla=pd.read_csv(path)
yka.matchWormChart(glo_gla,"Gloucestershire", "Glamorgan")

path=os.path.join(dir1,".\\Leicestershire-Northamptonshire-2016-05-20.csv")
lei_nor=pd.read_csv(path)
yka.matchWormChart(lei_nor,"Northamptonshire", "Leicestershire")

4.5 Natwest T20 Blast -Team Batting partnerships all matches 2 teams (Class 2)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-allMatchesBetween2Teams"
path=os.path.join(dir1,"Hampshire-Sussex-allMatches.csv")
h_s_matches = pd.read_csv(path)
yka.teamBatsmenPartnershipOppnAllMatchesChart(h_s_matches,"Hampshire","Sussex",plot=True, top=4, partnershipRuns=10)

4.6 Natwest T20 Blast -Team Bowlers vs Batsmen all matches 2 teams (Class 2)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-allMatchesBetween2Teams"
path=os.path.join(dir1,"Kent-Somerset-allMatches.csv")
k_s_matches = pd.read_csv(path)
yka.teamBowlersVsBatsmenOppnAllMatches(k_s_matches,'Kent','Somerset',plot=True,
top=5,runsConceded=10)

4.7 Natwest T20 Blast -Team Bowling scorecard all teams (Class 3)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-allMatchesAllOpposition"
path=os.path.join(dir1,"Middlesex-allMatchesAllOpposition.csv")
m_matches = pd.read_csv(path)
scorecard=yka.teamBowlingScorecardAllOppnAllMatches(m_matches,"Middlesex")
print(scorecard)
##               bowler  overs  runs  maidens  wicket   econrate
## 1             AJ Tye      8    75        0       6   9.375000
## 5         BAC Howell      8    41        0       5   5.125000
## 26         GR Napier      7    65        0       5   9.285714
## 15        DI Stevens      4    31        0       4   7.750000
## 19       DW Lawrence      6    37        0       4   6.166667
## 32       JW Dernbach      4    33        0       3   8.250000
## 7          BTJ Wheal      4    43        0       3  10.750000
## 18         DR Briggs      4    24        0       3   6.000000
## 50     RK Kleinveldt      4    24        0       3   6.000000
## 46         R McLaren      7    59        0       3   8.428571
## 47         R Rampaul      3    21        0       3   7.000000
## 34         L Gregory      6    51        0       2   8.500000
## 33   KMDN Kulasekara      2    24        0       2  12.000000
## 40          MG Hogan      3    17        0       2   5.666667
## 43        MTC Waller      4    31        0       2   7.750000
## 49        RJ Gleeson      4    20        0       2   5.000000
## 48  RE van der Merwe      5    24        0       2   4.800000
## 51  RN ten Doeschate      4    32        0       2   8.000000
## 53        S Prasanna      4    20        0       2   5.000000
## 56           SW Tait      3    17        0       2   5.666667
## 57     Shahid Afridi      8    55        0       2   6.875000
## 59  T van der Gugten      3    13        1       2   4.333333
## 64          TS Mills      3    34        0       2  11.333333
## 65          WAT Beer      4    23        0       2   5.750000
## 31          JH Davey      4    28        0       2   7.000000
## 68         ZS Ansari      3    16        0       2   5.333333
## 25         GM Andrew      3    19        0       2   6.333333
## 23          GJ Batty      6    55        0       2   9.166667
## 16          DJ Bravo      3    27        0       2   9.000000
## 41          MR Quinn      6    65        0       1  10.833333
## ..               ...    ...   ...      ...     ...        ...
## 24     GL van Buuren      7    49        0       1   7.000000
## 37           MD Hunn      3    35        0       1  11.666667
## 36        LC Norwell      6    62        0       1  10.333333
## 29       JC Tredwell      4    35        0       1   8.750000
## 35         LA Dawson      6    53        0       1   8.833333
## 62           TL Best      4    51        0       0  12.750000
## 58         T Westley      2    12        0       0   6.000000
## 4         Azharullah      3    24        0       0   8.000000
## 60     TD Groenewald      1    21        0       0  21.000000
## 61         TK Curran      4    35        0       0   8.750000
## 38         MD Taylor      3    30        0       0  10.000000
## 30        JG Myburgh      1     5        0       0   5.000000
## 8          C Overton      2    18        0       0   9.000000
## 2        Ashar Zaidi      1     5        0       0   5.000000
## 66          WR Smith      2    25        0       0  12.500000
## 28         J Overton      2    24        0       0  12.000000
## 6          BJ Taylor      1     6        0       0   6.000000
## 22          GG White      4    31        0       0   7.750000
## 55          SP Crook      1     9        0       0   9.000000
## 39        ME Claydon      4    40        0       0  10.000000
## 52         RS Bopara      4    32        0       0   8.000000
## 10           CD Nash      2    19        0       0   9.500000
## 11         CH Morris      4    36        0       0   9.000000
## 12         DA Cosker      3    32        0       0  10.666667
## 13      DA Griffiths      4    39        0       0   9.750000
## 45          PD Trego      1    11        0       0  11.000000
## 44   PA van Meekeren      2    19        0       0   9.500000
## 42          MS Crane      2    25        0       0  12.500000
## 20        FK Cowdrey      1    19        0       0  19.000000
## 14        DD Masters      2    16        0       0   8.000000
## 
## [69 rows x 6 columns]

4.8 Natwest T20 Blast -Plot wins vs losses against all teams (Class 3)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-allMatchesAllOpposition"
path=os.path.join(dir1,"Warwickshire-allMatchesAllOpposition.csv")
w_matches = pd.read_csv(path)
yka.plotWinLossByTeamAllOpposition(w_matches,'Warwickshire')

4.9 Natwest T20 Blast -Batsmen Analysis (Class 4)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-BattingBowlingDetails"
# M Klinger
name="M Klinger"
team='Gloucestershire'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)

# CA Ingram
name="CA Ingram"
team='Glamorgan'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)

4.10 Natwest T20 Blast -Bowler analysis (Class 4)

import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-BattingBowlingDetails"
# BAC Howell
name="BAC Howell"
team='Gloucestershire'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgEconRate(df,name)

# GR Napier
name="GR Napier"
team='Essex'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsVenue(df,name)

Note: yorkpy will work for all T20 leagues which are in YAML format as specified in Cricsheet.

You can clone/fork the latest code for yorkpy from github yorkpy

The data for IPL, Intl. T20, BBL and Natwest T20 have already been converted into pandas dataframes and saved as CSVs. You can download the converted files from Github at allYorkpyT20Data (https://github.com/tvganesh/allYorkpyT20Data)

Conclusion

This post shows the kind of detailed analysis that can be performed with yorkpy. In fact, with all the converted data it should also be possible to train a Machine Learning model, which I will probably keep for another day. You could go ahead and use the data in other innovative ways. Do keep me posted if you do!!

Important note: Do check out my other posts using yorkpy at yorkpy-posts

Have fun with yorkpy!!

See also
1. Take 4+: Presentations on ‘Elements of Neural Networks and Deep Learning’ – Parts 1-8
2. My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon
3. Hand detection through Haartraining: A hands-on approach
4. My book ‘Deep Learning from first principles:Second Edition’ now on Amazon
5. Introducing QCSimulator: A 5-qubit quantum computing simulator in R
6. The 3rd paperback & kindle editions of my books on Cricket, now on Amazon

To see all posts click Index of posts

Pitching yorkpy … in the block hole – Part 4

A good programmer is someone who always looks both ways before crossing a one-way street.  Doug Linder

There are two ways to write error-free programs; only the third one works. Alan J. Perlis

In order to understand recursion, one must first understand recursion. Anonymous

This is the fourth and final part of the series on my Python package yorkpy. In this part yorkpy, the python avatar of my R package yorkr (see Introducing cricket package yorkr: Part 1- Beaten by sheer pace!), develops wings and is prepared for take-off. The yorkpy package uses data from Cricsheet.

You can clone/download the code at Github yorkpy
This post has been published to RPubs at yorkpy-Part4
You can download this post as PDF at IPLT20-yorkpy-part4
You can download all the data used in this post and the previous post at yorkpyData

This post is a continuation of the earlier posts on yorkpy

1. Pitching yorkpy … short of good length to IPL – Part 1 In this part I included functions that convert the YAML data of IPL matches into pandas dataframes, which are then saved as CSV. This part can perform analysis of individual IPL matches. Note: The converted data is available at yorkpyData
2. Pitching yorkpy … on the middle and outside off-stump to IPL – Part 2 This part included functions to create a large data frame for head-to-head confrontations between any 2 IPL teams, say CSK-MI, DD-KKR etc., which can be saved as CSV. Analysis is then performed on these team-2-team confrontations. Note: The converted data is available at yorkpyData
3. Pitching yorkpy … swinging away from the leg stump to IPL – Part 3 The 3rd part includes the performance of any IPL team against all other IPL teams. The data can also be saved as CSV. Note: The converted data is available at yorkpyData

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton yorkpy-template from Github (which is the R Markdown file I have used for the analysis below).

This 4th and final part includes analysis of batting and bowling performances of any IPL player. The batting and bowling details for all teams have already been converted and are available at IPLT20-Batting-BowlingDetails

This part includes the following new functions

Batsman functions

  1. batsmanRunsVsDeliveries
  2. batsmanFoursSixes
  3. batsmanDismissals
  4. batsmanRunsVsStrikeRate
  5. batsmanMovingAverage
  6. batsmanCumulativeAverageRuns
  7. batsmanCumulativeStrikeRate
  8. batsmanRunsAgainstOpposition
  9. batsmanRunsVenue

Bowler functions

  1. bowlerMeanEconomyRate
  2. bowlerMeanRunsConceded
  3. bowlerMovingAverage
  4. bowlerCumulativeAvgWickets
  5. bowlerCumulativeAvgEconRate
  6. bowlerWicketPlot
  7. bowlerWicketsAgainstOpposition
  8. bowlerWicketsVenue

A. Batsman functions

1. Get IPL Team Batting details

The function below gets the overall IPL team batting details based on the CSV files that were saved for IPL T20 matches. These files are also available on Github at yorkpyData. The batting details of the IPL team in each match are extracted, and a large data frame is created by combining the batting details from every match. This can be saved as a CSV file with a name such as Delhi Daredevils-BattingDetails.csv.

dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
#csk_details = yka.getTeamBattingDetails("Chennai Super Kings",dir=dir1, save=True)
#dd_details = yka.getTeamBattingDetails("Delhi Daredevils",dir=dir1,save=True)
#kkr_details = yka.getTeamBattingDetails("Kolkata Knight Riders",dir=dir1,save=True)
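
The stacking step described above can be pictured with a small sketch. This is not yorkpy's implementation (the package works with Pandas data frames); it uses only the standard library, and the column names, the helper name `stack_matches` and the two in-memory CSV strings with made-up figures are assumptions chosen purely for illustration.

```python
import csv
import io

# Two hypothetical per-match CSV files, held in memory for the example.
# The players and figures are made up.
MATCH_1 = "batsman,runs,balls\nPlayer A,38,20\nPlayer B,12,10\n"
MATCH_2 = "batsman,runs,balls\nPlayer A,7,9\n"

def stack_matches(csv_texts):
    """Read each per-match CSV and stack all rows into one list of dicts."""
    rows = []
    for match_id, text in enumerate(csv_texts, start=1):
        for row in csv.DictReader(io.StringIO(text)):
            row["match_id"] = match_id       # remember which match the row came from
            row["runs"] = int(row["runs"])   # CSV values arrive as strings
            row["balls"] = int(row["balls"])
            rows.append(row)
    return rows

combined = stack_matches([MATCH_1, MATCH_2])
print(len(combined), "rows in the combined table")
```

Once every match is in one table, per-player records can be pulled out by filtering on the batsman's name.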

2. Get IPL batsman details

This function is used to get the individual IPL T20 batting record of the specified batsman of a team, as in the calls below.

For the batsman functions below I have chosen Rishabh Pant, Kane Williamson and Ambati Rayudu for the analysis, as they top the batting lists. You can choose any IPL batsman for the analysis.

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
rpant=yka.getBatsmanDetails(team,name,dir=dir1)

3. Batsman Runs vs Deliveries (in IPL matches)

This function plots the runs scored against the deliveries faced by a batsman

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsDeliveries(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsDeliveries(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsDeliveries(df,name)
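
The three blocks above differ only in the team and the player name, so the same calls could be driven from a list. The helper below is a hypothetical convenience wrapper and not part of yorkpy. It takes the lookup and plotting functions as arguments, so the sketch runs here with stand-in functions and without the data directory.

```python
def plot_for_players(players, get_details, plot, data_dir):
    """Call plot(details, name) for every (team, name) pair; return the names handled."""
    handled = []
    for team, name in players:
        df = get_details(team, name, dir=data_dir)  # same keyword argument as in this post
        plot(df, name)
        handled.append(name)
    return handled

players = [
    ("Delhi Daredevils", "RR Pant"),
    ("Sunrisers Hyderabad", "KS Williamson"),
    ("Mumbai Indians", "AT Rayudu"),
]

# Stand-ins so that the sketch runs without yorkpy installed.
def fake_get_details(team, name, dir=None):
    return {"team": team, "name": name}

def fake_plot(df, name):
    print("plotting", name, "for", df["team"])

handled = plot_for_players(players, fake_get_details, fake_plot, data_dir="data3")
```

With yorkpy installed, passing `yka.getBatsmanDetails`, `yka.batsmanRunsVsDeliveries` and `dir1` in place of the stand-ins should reproduce the three blocks above, assuming those functions accept the arguments exactly as they are called in this post.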

4. Batsman fours and sixes (in IPL matches)

This plots the fours, sixes and the total runs for a batsman

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanFoursSixes(df,name)


# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanFoursSixes(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanFoursSixes(df,name)

5. Batsman dismissals (in IPL matches)

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanDismissals(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanDismissals(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanDismissals(df,name)

6. Batsman Runs vs Strike Rate (in IPL matches)

The plots below give the Runs vs Strike rate for batsmen

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsStrikeRate(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsStrikeRate(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsStrikeRate(df,name)
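
For readers new to the metric, strike rate is the number of runs scored per 100 balls faced. The snippet below is a minimal plain-Python sketch of that formula with made-up innings; it does not show how yorkpy computes or plots the values.

```python
def strike_rate(runs, balls):
    """Batting strike rate: runs scored per 100 balls faced."""
    if balls == 0:
        return 0.0  # no deliveries faced; avoid dividing by zero
    return 100.0 * runs / balls

# Made-up innings as (runs, balls) pairs.
innings = [(30, 20), (12, 15), (64, 32)]
rates = [strike_rate(runs, balls) for runs, balls in innings]
print(rates)
```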

7. Batsman Moving average of runs (in IPL matches)

The functions below compute and plot the moving average of runs for the batsmen

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanMovingAverage(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanMovingAverage(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanMovingAverage(df,name)
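
As a rough illustration of what a moving average does, the sketch below smooths a made-up list of scores with a trailing window. The window size of 3 and the handling of the first few innings are assumptions; yorkpy may use a different window or smoothing method.

```python
def moving_average(values, window=3):
    """Trailing moving average; early points average over the innings available so far."""
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages

runs = [10, 20, 60, 30, 0]  # made-up scores in match order
ma = moving_average(runs, window=3)
print(ma)
```

A larger window gives a smoother curve but reacts more slowly to a change in form.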

8. Batsman Cumulative average of runs (in IPL matches)

The functions below plot the cumulative average of the batsmen

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeAverageRuns(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeAverageRuns(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeAverageRuns(df,name)
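
A cumulative average is the mean of all scores up to and including each innings. The following is a minimal sketch with made-up scores, shown only to make the idea concrete; it is not taken from yorkpy.

```python
from itertools import accumulate

def cumulative_average(values):
    """Mean of values[0..i] for every position i."""
    return [total / n for n, total in enumerate(accumulate(values), start=1)]

runs = [10, 20, 60, 30]  # made-up scores in match order
cum_avg = cumulative_average(runs)
print(cum_avg)
```

Unlike the moving average, every earlier innings keeps contributing, so the curve settles down as the career gets longer.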

9. Batsman Cumulative Strike Rate (in IPL matches)

The functions below plot the cumulative strike rate of the batsmen

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)

10. Batsman performance against opposition (in IPL matches)

The plots below show how the batsmen performed against other IPL teams

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)

11. Batsman performance at different venues (in IPL matches)

The plots below show how the batsmen performed at different venues

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVenue(df,name)

# 2. Kane Williamson
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVenue(df,name)

#3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVenue(df,name)
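
Both the opposition and the venue views rest on grouping a batsman's innings by one column and summarising the runs in each group. The sketch below shows that grouping in plain Python. The record layout, the team and ground names, and the choice of the mean as the summary are assumptions; yorkpy may summarise the groups differently.

```python
from collections import defaultdict

def mean_runs_by(records, key):
    """Average the 'runs' field of the records, grouped by the given key."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum of runs, innings count]
    for rec in records:
        bucket = totals[rec[key]]
        bucket[0] += rec["runs"]
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}

# Made-up innings for one batsman.
records = [
    {"opposition": "Team A", "venue": "Ground X", "runs": 40},
    {"opposition": "Team B", "venue": "Ground X", "runs": 10},
    {"opposition": "Team A", "venue": "Ground Y", "runs": 20},
]

by_opposition = mean_runs_by(records, "opposition")
by_venue = mean_runs_by(records, "venue")
print(by_opposition)
print(by_venue)
```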

B. Bowler functions

12. Get bowling details in IPL matches

The function below gets the overall team IPL T20 bowling details based on the CSV files saved for IPL T20 matches. These files are also available on Github at yorkpyData. The IPL T20 bowling details of the IPL team in each match are extracted, and a large data frame is created by stacking the individual dataframes. This can be saved as a CSV file, e.g. Chennai Super Kings-BowlingDetails.csv

dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
#kkr_bowling = yka.getTeamBowlingDetails("Kolkata Knight Riders",dir=dir1,save=True)
#csk_bowling = yka.getTeamBowlingDetails("Chennai Super Kings",dir=dir1,save=True)
#kxip_bowling = yka.getTeamBowlingDetails("Kings XI Punjab",dir=dir1,save=True)

13. Get bowling details of the individual IPL bowlers

This function is used to get the individual bowling record of a specified bowler of a team, as in the calls below.

The plots below deal with bowlers’ performances. For this analysis I have chosen Amit Mishra, Piyush Chawla and Bhuvneshwar Kumar. You can choose any other IPL bowler

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
#df=yka.getBowlerWicketDetails(team,name,dir=dir1)

14. Bowler Economy Rate (in IPL matches)

The plots below show the economy rate of the selected bowlers

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanEconomyRate(df,name)

# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanEconomyRate(df,name)

#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanEconomyRate(df,name)
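
Economy rate is the number of runs a bowler concedes per over bowled. The snippet below is a minimal sketch of that formula using made-up spells; it does not reproduce how yorkpy aggregates the rate across matches.

```python
def economy_rate(runs_conceded, balls_bowled):
    """Runs conceded per over, with an over counted as 6 legal deliveries."""
    overs = balls_bowled / 6
    if overs == 0:
        return 0.0  # bowler did not bowl; avoid dividing by zero
    return runs_conceded / overs

# Made-up spells as (runs conceded, legal balls bowled).
spells = [(28, 24), (45, 24), (9, 6)]
economies = [economy_rate(runs, balls) for runs, balls in spells]
print(economies)
```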

15. Bowler Mean Runs conceded (in IPL matches)

The plots below show the mean runs conceded by the selected bowlers

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanRunsConceded(df,name)

# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanRunsConceded(df,name)

#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanRunsConceded(df,name)

16. Moving average of wickets for bowler (in IPL matches)

The moving averages of wickets taken by the bowlers are plotted below

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMovingAverage(df,name)

# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMovingAverage(df,name)

#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMovingAverage(df,name)

17. Cumulative average wickets for bowler (in IPL matches)

The cumulative average wickets for each bowler is computed and plotted

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgWickets(df,name)

# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgWickets(df,name)

#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgWickets(df,name)

18. Cumulative average economy rate for bowler (in IPL matches)

The plots below give the cumulative average economy rate for each bowler

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgEconRate(df,name)

# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgEconRate(df,name)

#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgEconRate(df,name)

19. Bowler wicket plot (in IPL matches)

The plots below give the over vs wickets for bowlers

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketPlot(df,name)

# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketPlot(df,name)

#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketPlot(df,name)

20. Bowler wickets against opposition (in IPL matches)

The performance of the bowlers against different IPL teams is shown below

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsAgainstOpposition(df,name)

# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsAgainstOpposition(df,name)

#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsAgainstOpposition(df,name)

21. Bowler wickets at different venues (in IPL matches)

The plots below show how the bowlers perform at different venues

import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsVenue(df,name)

# 2. Piyush Chawla
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsVenue(df,name)

#3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsVenue(df,name)

Note: You can clone/download the code at Github yorkpy

Important note: Do check out my other posts using yorkpy at yorkpy-posts

Conclusion: This concludes the python package yorkpy. Go ahead and give yorkpy a spin!

Also see
1. Take 4+: Presentations on ‘Elements of Neural Networks and Deep Learning’ – Parts 1-8
2. My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon
3. Hand detection through Haartraining: A hands-on approach
4. My book ‘Deep Learning from first principles:Second Edition’ now on Amazon
5. Big Data-1: Move into the big league:Graduate from Python to Pyspark
6. Cricpy takes a swing at the ODIs

To see all posts click Index of posts

My book ‘Practical Machine Learning in R and Python: Third edition’ on Amazon

Are you wondering whether to get into the ‘R’ bus or ‘Python’ bus?
My suggestion to you is “Why not get into the ‘R and Python’ train?”

The third edition of my book ‘Practical Machine Learning with R and Python – Machine Learning in stereo’ is now available in both paperback ($12.99) and kindle ($8.99/Rs449) versions. In the third edition all code sections have been re-formatted to use the fixed width font ‘Consolas’. This neatly aligns output that has columns, such as confusion matrices and dataframes, making the code more readable. There is a science to formatting too, which improves the look and feel! It is little wonder that Steve Jobs had a keen passion for calligraphy! Additionally some typos have been fixed.

 

In this book I implement some of the most common, yet important, Machine Learning algorithms in R, along with equivalent Python code.
1. Practical Machine Learning with R and Python: Third Edition – Machine Learning in Stereo (Paperback-$12.99)
2. Practical Machine Learning with R and Python: Third Edition – Machine Learning in Stereo (Kindle- $8.99/Rs449)

This book is ideal for both beginners and experts in R and/or Python. Those starting their journey into data science and ML will find the first 3 chapters useful, as they touch upon the most important programming constructs in R and Python and also deal with equivalent statements in R and Python. Those who are expert in either of the languages, R or Python, will find the equivalent code ideal for brushing up on the other language. And finally, those who are proficient in both languages can use the R and Python implementations to internalize the ML algorithms better.

Here is a look at the topics covered

Table of Contents
Preface 4
Introduction 6
1. Essential R 8
2. Essential Python for Datascience 57
3. R vs Python 81
4. Regression of a continuous variable 101
5. Classification and Cross Validation 121
6. Regression techniques and regularization 146
7. SVMs, Decision Trees and Validation curves 191
8. Splines, GAMs, Random Forests and Boosting 222
9. PCA, K-Means and Hierarchical Clustering 258
References 269

Pick up your copy today!!
Hope you have a great time learning as I did while implementing these algorithms!