# Big Data-5: kNiFi-ing through cricket data with yorkpy

“The temptation to form premature theories upon insufficient data is the bane of our profession.”

Sherlock Holmes in The Valley of Fear by Arthur Conan Doyle

“If we have data, let’s look at data. If all we have are opinions, let’s go with mine.”

Jim Barksdale, former CEO Netscape

In this post I use an Apache NiFi dataflow pipeline along with my Python package yorkpy to crunch through cricket data from Cricsheet. The data pipeline flows all the way from the source to the target analytics output. Apache NiFi was created to automate and manage the flow of data between systems. This post automates the flow of data from Cricsheet: the zip file is downloaded, unpacked, processed and transformed, and finally the T20 players are ranked.

While this is a straightforward example of what can be done, the pattern can be applied to real Big Data systems. Hypothetically, we could get several parallel streams of cricket data or, for that matter, any sports-related data. There could be parallel dataflow pipelines that get the data from the sources. These would be followed by data transformation modules and finally a module for generating analytics. At the other end, a UI based on AngularJS or ReactJS could display the results in a cool and awesome way.

Incidentally, the NiFi pipeline that I discuss in this post is a simplistic example and does not use the Big Data stack like HDFS, Hive, Spark etc. Nevertheless, the pattern used has all the modules of a Big Data pipeline, namely ingestion, unpacking, transformation and finally analytics. This NiFi pipeline demonstrates the flow using the regular file system of a Mac and my Python-based package yorkpy. The concepts could be used in a real Big Data scenario which has much fatter pipes of data coming in. In that case the NiFi pipeline would use HDFS/Hive for storing the ingested data, PySpark/Scala for the transformation and analytics, and other related technologies.
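The four-stage pattern described above can be sketched in plain Python. This is only an illustration of the idea, not how NiFi itself is implemented: `run_pipeline` and the four stage functions are hypothetical names, and each stage here merely records that it ran.

```python
from functools import reduce
from typing import Any, Callable, Iterable

Stage = Callable[[Any], Any]


def run_pipeline(source: Any, stages: Iterable[Stage]) -> Any:
    """Pass `source` through each stage in order and return the final result."""
    return reduce(lambda data, stage: stage(data), stages, source)


# Hypothetical stand-ins for the real stages: each one appends its own name
# so that the order of execution is visible in the result.
def ingest(trace: list) -> list:
    return trace + ["ingest"]


def unpack(trace: list) -> list:
    return trace + ["unpack"]


def transform(trace: list) -> list:
    return trace + ["transform"]


def analyze(trace: list) -> list:
    return trace + ["analyze"]


if __name__ == "__main__":
    print(run_pipeline([], [ingest, unpack, transform, analyze]))
```

In a real Big Data setup each stage would be replaced by the corresponding technology (for example HDFS for ingestion and Spark for transformation), while the chaining stays the same.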

A pictorial representation is given below

In the diagram above each of the vertical boxes could be any technology from the ever proliferating Big Data stack, namely HDFS, Hive, Spark, Sqoop, Kafka, Impala and so on. Such a dataflow automation could be created for any big sporting event, as long as the data generated is large and there is a need for dynamic and automated reporting. The UI could be based on AngularJS/ReactJS and could display analytical tables and charts.

This post demonstrates one such scenario, in which IPL T20 data is downloaded from the Cricsheet site, unpacked and stored in a specific directory. This dataflow automation is based on my yorkpy package. To know more about the yorkpy package see Pitching yorkpy … short of good length to IPL – Part 1 and the associated parts. The zip file from Cricsheet contains individual IPL T20 matches in YAML format. The convertYaml2PandasDataframeT20() function is used to convert the YAML files into Pandas dataframes before storing them as CSV files. After this is done, the rankIPLT20Batting() function is used to perform the overall ranking of the T20 players. My yorkpy Python package has 50+ functions that perform various analytics on any T20 data. For example, it has the following classes of functions

• analyze T20 matches
• analyze performance of a T20 team in all matches against another T20 team
• analyze performance of a T20 team against all other T20 teams
• analyze performance of T20 batsman and bowlers
• rank T20 batsmen and bowlers

The functions of yorkpy generate tables or charts. While this post demonstrates one scenario, we could use any of the yorkpy T20 functions, generate the output and display it on a widget in a UI created with cool technologies like AngularJS/ReactJS, possibly in near real time as data keeps coming in.

To use yorkpy with NiFi the following packages have to be installed in your environment

- pip install yorkpy
- pip install pyyaml
- pip install pandas
- yum install python-devel (equivalent in Windows)
- pip install matplotlib
- pip install seaborn
- pip install scikit-learn
- pip install datetime
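Before wiring the scripts into NiFi, it may help to confirm that the Python interpreter NiFi will call can actually see these packages. The sketch below uses only the standard library. The helper name `missing_packages` is hypothetical, and the mapping reflects the fact that some packages are imported under a different name than the one used with pip (pyyaml is imported as `yaml`, and scikit-learn as `sklearn`).

```python
import importlib.util

# pip package name -> top-level module name used in `import`
REQUIRED = {
    "yorkpy": "yorkpy",
    "pyyaml": "yaml",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def missing_packages(required: dict) -> list:
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable")
```

Run it with the same interpreter that the NiFi processors are configured to use, since a package installed for a different Python will not be found.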

I have created a video of the NiFi pipeline with the real dataflow from the source to the ranked IPL T20 batsmen. Take a look at RankingT20PlayersWithNiFiYorkpy

You can clone/fork the NiFi template from rankT20withNiFiYorkpy

The NiFi Data Flow Automation is shown below

## 1. Overall flow

The overall NiFi flow contains 2 Process Groups: a) DownloadAndUnpack b) Convert and Rank IPL batsmen. While it appears that the Process Groups are disconnected, they are not. The first process group downloads the T20 zip file, unpacks it and saves the YAML files in a specific folder. The second process group monitors this folder and starts processing as soon as the YAML files are available. It converts each YAML file into a dataframe before storing it as a CSV file. The next processor then does the actual ranking of the batsmen before writing the output into IPLrank.txt
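To make the hand-off between the two process groups concrete, here is a minimal standard-library sketch of the same idea outside NiFi. It is not the NiFi implementation. The function names, the `download.zip` and `yaml` names, and the assumption that the YAML files sit at the top level of the archive are all illustrative.

```python
import urllib.request
import zipfile
from pathlib import Path


def download_and_unpack(url: str, work_dir: Path) -> Path:
    """Download a zip file and unpack it; return the folder with the contents."""
    work_dir.mkdir(parents=True, exist_ok=True)
    zip_path = work_dir / "download.zip"
    urllib.request.urlretrieve(url, str(zip_path))
    target = work_dir / "yaml"
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


def new_yaml_files(folder: Path, seen: set) -> list:
    """Return YAML files in `folder` not returned before, and remember them."""
    fresh = sorted(p for p in folder.glob("*.yaml") if p.name not in seen)
    seen.update(p.name for p in fresh)
    return fresh


if __name__ == "__main__":
    folder = download_and_unpack("https://cricsheet.org/downloads/ipl.zip", Path("nifi-demo"))
    print(len(new_yaml_files(folder, set())), "YAML files ready for processing")
```

The first function plays the role of the download and unpack group, and calling `new_yaml_files` repeatedly with the same `seen` set mimics a processor that watches the folder and only picks up files it has not handled yet.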

This process group is shown below

The ${T20data} variable points to the specific T20 format that needs to be downloaded. I have set this to https://cricsheet.org/downloads/ipl.zip. This could be set to any other data set. In fact we could have parallel data flows for different T20/sports data sets and generate …

#### 1.1.2 SaveUnpackedData

This processor stores the YAML files in a predetermined folder, so that the data can be picked up by the 2nd Process Group for processing

### 1.2 ProcessAndRankT20Players Process Group

This is the second process group, which converts the YAML files to pandas dataframes before storing them as CSV files. The RankIPLPlayers processor will then read all the CSV files, stack them and then proceed to rank the IPL players. The Process Group is shown below

#### 1.2.1 ListFile and FetchFile Processors

The left 2 processors, ListFile and FetchFile, get all the YAML files from the folder and pass them to the next processor

#### 1.2.2 convertYaml2DataFrame Processor

The convertYaml2DataFrame Processor uses ExecuteStreamCommand, which calls a Python script. The Python script invokes the yorkpy function convertYaml2PandasDataframeT20() as shown below

The ${convertYaml2Dataframe} variable points to the Python file below, which invokes the yorkpy function yka.convertYaml2PandasDataframeT20()

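    import argparse

    import yorkpy.analytics as yka

    # The YAML file to convert is passed as a positional command-line argument
    parser = argparse.ArgumentParser(description='convert')
    parser.add_argument("yamlFile", help="YAML file")
    args = parser.parse_args()
    yamlFile = args.yamlFile
    # The second and third arguments are the source (YAML) and target (CSV) folders
    yka.convertYaml2PandasDataframeT20(yamlFile, "/Users/tvganesh/backup/software/nifi/ipl", "/Users/tvganesh/backup/software/nifi/ipldata")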
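NiFi runs this script once per FlowFile. To try the same per-file conversion outside NiFi, a thread pool from the standard library is one possible stand-in. In the sketch below `convert_all` and `convert_one` are hypothetical names. `convert_one` is any callable that takes a file name, for instance a small wrapper around yka.convertYaml2PandasDataframeT20() with the two folders filled in.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def convert_all(
    yaml_files: Iterable[str],
    convert_one: Callable[[str], object],
    workers: int = 8,
) -> List[object]:
    """Apply `convert_one` to every file using up to `workers` threads.

    Results are returned in the same order as the input files.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert_one, yaml_files))


if __name__ == "__main__":
    # Stand-in conversion that only reports what it would do
    def describe(name: str) -> str:
        return f"would convert {name}"

    for line in convert_all(["match1.yaml", "match2.yaml"], describe, workers=2):
        print(line)
```

Since the conversion is mostly CPU-bound pandas work, threads in a single Python process may not give the speed-up that separate processes do, so `ProcessPoolExecutor` could be a better fit in practice.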

The yorkpy conversion script takes as input $filename, which comes from the FetchFile processor as a FlowFile. I have added a concurrency of 8 to handle up to 8 FlowFiles at a time. The rule of thumb, as I read on the internet, is 2x to 4x the number of cores of your system. Since I have an 8-core Mac, I could possibly have gone up to ~30 concurrent threads. Also, the number of concurrent threads should be lower when the flow is run in an Oracle VirtualBox virtual machine, since a vCore < an actual core.

The scheduling tab is as below

Here are the 8 concurrent Python threads on Mac at bottom right… (pretty cool!)

I have not fully tested how changes to the latency vs throughput slider affect the performance.

#### 1.2.3 MergeContent Processor

This processor’s only job is to trigger the rankIPLPlayers when all the FlowFiles have merged into 1 file.

#### 1.2.4 RankT20Players

This processor is an ExecuteStreamCommand Processor that executes a Python script which invokes the yorkpy function rankIPLT20Batting()

    import yorkpy.analytics as yka
    rank=yka.rankIPLT20Batting("/Users/tvganesh/backup/software/nifi/ipldata")
    print(rank.head(15))

#### 1.2.5 OutputRankofT20Player Processor

This processor writes the generated rank to an output file.

### 1.3 Final Ranking of IPL T20 players

The Nodejs based web server picks up this file and displays the final ranks on the web page (the code is based on a good YouTube video on reading from a file)

## 2. Final thoughts

As I have mentioned above, though this NiFi cricket dataflow automation does not use the Hadoop ecosystem, the pattern used is valid and can be applied, with some customization, to parallel streams in Big Data flows. I could have also done this on Oracle VirtualBox, but since the code is based on Python and Pandas I thought there was no real advantage in running on VirtualBox.

Give the NiFi flow a shot. Have fun!!!

To see all posts click Index of posts

# Ranking T20 players in Intl T20, IPL, BBL and Natwest using yorkpy

There is a voice that doesn’t use words, listen.

When someone beats a rug, the blows are not against the rug, but against the dust in it.

I lost my hat while gazing at the moon, and then I lost my mind.

Rumi

## Introduction

After a long hiatus, I am back to my big, bad, blogging ways! In this post I rank T20 players from several different leagues namely

• International T20
• Indian Premier League (IPL) T20
• Big Bash League (BBL) T20
• Natwest Blast (NTB) T20

I have added 8 new functions to my Python package yorkpy, which perform the ranking for the above 4 T20 league formats. To know more about my Python package see Pitching yorkpy . short of good length to IPL – Part 1, and the related posts on yorkpy. The code can be easily extended to other leagues which have the same ‘yaml’ format for the matches. I also fixed some issues which started to crop up, possibly because a few things have changed in the new data.

The new functions are

1. rankIntlT20Batting()
2. rankIntlT20Bowling()
3. rankIPLT20Batting()
4. rankIPLT20Bowling()
5. rankBBLT20Batting()
6. rankBBLT20Bowling()
7. rankNTBT20Batting()
8. rankNTBT20Bowling()

The yorkpy package uses data from Cricsheet

You can clone/fork the code for yorkpy at yorkpy

You can download the PDF of the post from Rank T20

yorkpy can be installed with ‘pip install yorkpy’

## 1. International T20

The steps to do before ranking for International T20 matches are

1. Download International T20 zip file from Cricsheet Intl T20
2. Unzip the file. This will create a folder with yaml files

The conversion call is

    import yorkpy.analytics as yka
    #yka.convertAllYaml2PandasDataframesT20("../t20s","../data")

The above step will convert the yaml files into CSV files. Now do the ranking as below

## 1a. Ranking of International T20 batsmen

    import yorkpy.analytics as yka
    intlT20RankBatting=yka.rankIntlT20Batting("C:\\software\\cricket-package\\yorkpyPkg\\data\\data")
    intlT20RankBatting.head(15)
    ##                   matches  runs_mean     SR_mean
    ## batsman
    ## V Kohli                58  38.672414  125.212402
    ## KS Williamson          42  32.595238  122.884631
    ## Mohammad Shahzad       52  31.942308  118.212288
    ## CH Gayle               50  31.140000  111.869984
    ## BB McCullum            69  29.492754  117.011666
    ## MM Lanning             48  28.812500   98.582663
    ## SJ Taylor              44  28.659091   98.684856
    ## MJ Guptill             68  28.573529  117.673702
    ## DA Warner              71  28.507042  121.142746
    ## DPMD Jayawardene       53  27.584906  107.787092
    ## KC Sangakkara          54  26.407407  106.039838
    ## JP Duminy              68  26.294118  114.606717
    ## TM Dilshan             78  26.243590   97.910384
    ## RG Sharma              65  25.907692  113.056548
    ## H Masakadza            53  25.566038   99.453880

## 1b. Ranking of International T20 bowlers

    import yorkpy.analytics as yka
    intlT20RankBowling=yka.rankIntlT20Bowling("C:\\software\\cricket-package\\yorkpyPkg\\data\\data")
    intlT20RankBowling.head(15)
    ##                   matches  wicket_mean  econrate_mean
    ## bowler
    ## Umar Gul               58     1.603448       7.637931
    ## SL Malinga             78     1.500000       7.409188
    ## Saeed Ajmal            63     1.492063       6.451058
    ## DW Steyn               46     1.478261       7.014855
    ## A Shrubsole            45     1.422222       6.294444
    ## M Morkel               41     1.292683       7.680894
    ## KMDN Kulasekara        57     1.280702       7.476608
    ## TG Southee             51     1.274510       8.759804
    ## SCJ Broad              53     1.264151            inf
    ## Shakib Al Hasan        58     1.241379       6.836207
    ## R Ashwin               44     1.204545       7.162879
    ## Nida Dar               44     1.204545       6.083333
    ## KH Brunt               44     1.204545       5.982955
    ## KD Mills               42     1.166667       8.289683
    ## SR Watson              46     1.152174       8.246377

## 2. Indian Premier League (IPL) T20

The steps to do before ranking for IPL T20 matches are

1. Download IPL T20 zip file from Cricsheet IPL T20
2. Unzip the file. This will create a folder with yaml files

The conversion call is

    import yorkpy.analytics as yka
    #yka.convertAllYaml2PandasDataframesT20("../ipl","../ipldata")

The above step will convert the yaml files into CSV files in the /ipldata folder. Now do the ranking as below

## 2a. Ranking of batsmen in IPL T20

    import yorkpy.analytics as yka
    IPLT20RankBatting=yka.rankIPLT20Batting("C:\\software\\cricket-package\\yorkpyPkg\\data\\ipldata")
    IPLT20RankBatting.head(15)
    ##                   matches  runs_mean     SR_mean
    ## batsman
    ## DA Warner             129  37.589147  119.917864
    ## CH Gayle              123  36.723577  125.256818
    ## SE Marsh               70  36.314286  114.707578
    ## KL Rahul               59  33.542373  123.424971
    ## MEK Hussey             60  33.400000  100.439187
    ## V Kohli               174  32.413793  115.830849
    ## KS Williamson          42  31.690476  120.443172
    ## AB de Villiers        143  30.923077  128.967081
    ## JC Buttler             45  30.800000  132.561154
    ## AM Rahane             118  30.330508  102.240398
    ## SR Tendulkar           79  29.949367  101.651959
    ## F du Plessis           65  29.415385  112.462114
    ## Q de Kock              51  29.333333  110.973836
    ## SS Iyer                47  29.170213  102.144222
    ## G Gambhir             155  28.741935  103.997558

## 2b. Ranking of bowlers in IPL T20

    import yorkpy.analytics as yka
    IPLT20RankBowling=yka.rankIPLT20Bowling("C:\\software\\cricket-package\\yorkpyPkg\\data\\ipldata")
    IPLT20RankBowling.head(15)
    ##                   matches  wicket_mean  econrate_mean
    ## bowler
    ## SL Malinga            122     1.540984       7.173361
    ## Imran Tahir            43     1.465116       8.155039
    ## A Nehra                88     1.375000       7.923295
    ## MJ McClenaghan         56     1.339286       8.638393
    ## Rashid Khan            46     1.304348       6.543478
    ## Sandeep Sharma         79     1.303797       7.860759
    ## MM Patel               63     1.301587       7.530423
    ## DJ Bravo              131     1.282443       8.458333
    ## M Morkel               70     1.257143       7.760714
    ## SP Narine             109     1.256881       6.747706
    ## YS Chahal              83     1.228916       8.103659
    ## R Vinay Kumar         104     1.221154       8.556090
    ## RP Singh               82     1.219512       8.149390
    ## CH Morris              52     1.211538       7.854167
    ## B Kumar               117     1.205128       7.536325

## 3. Natwest T20

The steps to do before ranking for Natwest T20 matches are

1. Download Natwest T20 zip file from Cricsheet NTB T20
2. Unzip the file. This will create a folder with yaml files

The conversion call is

    import yorkpy.analytics as yka
    #yka.convertAllYaml2PandasDataframesT20("../ntb","../ntbdata")

The above step will convert the yaml files into CSV files in the /ntbdata folder. Now do the ranking as below

## 3a. Ranking of NTB batsmen

    import yorkpy.analytics as yka
    NTBT20RankBatting=yka.rankNTBT20Batting("C:\\software\\cricket-package\\yorkpyPkg\\data\\ntbdata")
    NTBT20RankBatting.head(15)
    ##                   matches  runs_mean     SR_mean
    ## batsman
    ## Babar Azam             13  44.461538  121.268809
    ## T Banton               13  42.230769  139.376274
    ## JJ Roy                 12  41.250000  142.182147
    ## DJM Short              12  40.250000  131.182294
    ## AN Petersen            12  37.916667  132.522727
    ## IR Bell                13  37.615385  130.104721
    ## M Klinger              26  35.346154  112.682922
    ## EJG Morgan             16  35.062500  129.817650
    ## AJ Finch               19  34.578947  137.093465
    ## MH Wessels             26  33.884615  116.300969
    ## S Steel                11  33.545455  140.118207
    ## DJ Bell-Drummond       21  33.142857  108.566309
    ## Ashar Zaidi            11  33.000000  178.553331
    ## DJ Malan               26  33.000000  120.127202
    ## T Kohler-Cadmore       23  32.956522  112.493019

## 3b. Ranking of NTB bowlers

    import yorkpy.analytics as yka
    NTBT20RankBowling=yka.rankNTBT20Bowling("C:\\software\\cricket-package\\yorkpyPkg\\data\\ntbdata")
    NTBT20RankBowling.head(15)
    ##                   matches  wicket_mean  econrate_mean
    ## bowler
    ## MW Parkinson           11     2.000000       7.628788
    ## HF Gurney              23     1.956522       8.831884
    ## GR Napier              12     1.916667       8.694444
    ## R Rampaul              19     1.736842       7.131579
    ## P Coughlin             11     1.727273       8.909091
    ## AJ Tye                 26     1.692308       8.227564
    ## GC Viljoen             12     1.666667       7.708333
    ## BAC Howell             21     1.666667       6.857143
    ## BW Sanderson           12     1.583333       7.902778
    ## KJ Abbott              14     1.571429       9.398810
    ## JE Taylor              13     1.538462       9.839744
    ## JDS Neesham            12     1.500000      10.812500
    ## MJ Potts               12     1.500000       8.486111
    ## TT Bresnan             21     1.476190       8.817460
    ## T van der Gugten       13     1.461538       7.211538

## 4. Big Bash League (BBL) T20

The steps to do before ranking for BBL T20 matches are

1. Download BBL T20 zip file from Cricsheet BBL T20
2. Unzip the file. This will create a folder with yaml files

The conversion call is

    import yorkpy.analytics as yka
    #yka.convertAllYaml2PandasDataframesT20("../bbl","../bbldata")

The above step will convert the yaml files into CSV files in the /bbldata folder. Now do the ranking as below

## 4a. Ranking of BBL batsmen

    import yorkpy.analytics as yka
    BBLT20RankBatting=yka.rankBBLT20Batting("C:\\software\\cricket-package\\yorkpyPkg\\data\\bbldata")
    BBLT20RankBatting.head(15)
    ##                   matches  runs_mean     SR_mean
    ## batsman
    ## DJM Short              43  40.883721  118.773047
    ## SE Marsh               47  39.148936  113.616053
    ## AJ Finch               62  36.306452  120.271231
    ## AT Carey               37  34.945946  120.125341
    ## UT Khawaja             41  31.268293  107.355655
    ## CA Lynn                74  31.162162  121.746578
    ## MS Wade                46  30.782609  120.310081
    ## TM Head                45  30.000000  126.769564
    ## MEK Hussey             23  29.173913  109.492934
    ## BJ Hodge               29  29.000000  124.438040
    ## BR Dunk                39  28.230769  106.149913
    ## AD Hales               31  27.161290  117.678008
    ## BB McCullum            34  27.058824  115.486392
    ## GJ Bailey              57  27.000000  121.159220
    ## MR Marsh               47  26.510638  114.994909

## 4b. Ranking of BBL bowlers

    import yorkpy.analytics as yka
    BBLT20RankBowling=yka.rankBBLT20Bowling("C:\\software\\cricket-package\\yorkpyPkg\\data\\bbldata")
    BBLT20RankBowling.head(15)
    ##                   matches  wicket_mean  econrate_mean
    ## bowler
    ## Yasir Arafat           15     2.000000       7.587778
    ## CH Morris              15     1.733333       8.572222
    ## TK Curran              27     1.629630       8.716049
    ## TT Bresnan             13     1.615385       8.775641
    ## JR Hazlewood           18     1.555556       7.361111
    ## CJ McKay               15     1.533333       8.555556
    ## DR Sams                36     1.527778       8.581019
    ## AC McDermott           14     1.500000       9.166667
    ## JP Faulkner            20     1.500000       8.345833
    ## SP Narine              12     1.500000       7.395833
    ## AJ Tye                 51     1.490196       8.101307
    ## M Kelly                21     1.476190       8.908730
    ## SA Abbott              73     1.438356       8.737443
    ## B Laughlin             82     1.426829       8.332317
    ## SW Tait                31     1.419355       8.895161

## Conclusion

You should now be able to rank players in the above formats as new data is added to Cricsheet. yorkpy can also be used for other leagues which follow the Cricsheet format.

To see all posts click Index of posts

# yorkpy takes a hat-trick, bowls out Intl. T20s, BBL and Natwest T20!!!

“Dear, dear! How queer everything is to-day! And yesterday things went on just as usual. I wonder if I’ve been changed in the night? Let me think: was I the same when I got up this morning?
I almost think I can remember feeling a little different. But if I’m not the same, the next question is ’Who in the world am I? Ah, that’s the great puzzle!”  Alice's adventures in Wonderland, Lewis Carroll ## 1. Introduction In this post, yorkpy clean bowls the following T20 formats namely International T20s, Big Bash League and Natwest T20 Blast. I take yorkpy on a spin through these T20 leagues. In the post below,I choose a random set of about 10-12 of the overall 63 functions that yorkpy has, and execute them for each of the different T20 leagues – Intl T20s, BBL and Natwest T20s. yorkpy, is the python avatar of my R package yorkr, see Introducing cricket package yorkr: Part 1- Beaten by sheer pace! There were a couple of new functions that needed to be added for each of the T20 leagues – Intl T20, BBL and Natwest T20 to take into account the different teams in each of these leagues. Further some bugs were also ironed out in tje latest version of yorkpy. yorkpy uses data from Cricsheet . The match data is in the form of YAML files. yorkpy converts these YAML files to dataframes. YAML files are very detailed and include a ball-by-ball account of the match. – You can clone/fork the latest code for yorkpy from github yorkpy – This post has also been published in RPubs at yorkpy takes a hat-trick – You can download the PDF version of this post at yorkpy takes a hat-trick The data for IPL, Intl. T20, BBL and Natwest T20 have already been converted into pandas dataframes and saved as CSVs. You can download the converted files from Github at [allYorkpyT20Data])(https://github.com/tvganesh/allYorkpyT20Data) yorkpy has the following 4 main classes of functions ### A.Functions analyzing individual T20 match (Class 1) This was demonstrated in Pitching yorkpy . short of good length to IPL – Part 1 The functions deal with individual T20 matches. The functions are 1. convertYaml2PandasDataframeT20() 2. convertAllYaml2PandasDataframesT20() 3. teamBattingScorecardMatch() 4. 
teamBatsmenPartnershipMatch() 5. teamBatsmenVsBowlersMatch() 6. teamBowlingScorecardMatch() 7. teamBowlingWicketKindMatch() 8. teamBowlingWicketRunsMatch() 9. teamBowlingWicketMatch() 10. teamBowlersVsBatsmenMatch() 11. matchWormChart() ### B. Functions that analyze all matches between 2 T20 teams (Class 2 Pitching yorkpy.on the middle and outside off-stump to IPL – Part 2 included functions that analyze head-to-head confrontation between any 2 T20 teams The functions are 1. getAllMatchesBetweenTeams() 2. saveAllMatchesBetween2IPLTeams() 3. getAllMatchesBetweenTeams() 4. saveAllMatchesBetween2IPLTeams() 5. teamBatsmenPartnershiOppnAllMatches() 6. teamBatsmenPartnershipOppnAllMatchesChart() 7. teamBatsmenVsBowlersOppnAllMatches() 8. teamBattingScorecardOppnAllMatches() 9. teamBowlingScorecardOppnAllMatches() 10. teamBowlingWicketKindOppositionAllMatches() 11. teamBowlersVsBatsmenOppnAllMatches() 12. plotWinLossBetweenTeams() 13. plotWinsByRunOrWickets() 23.plotWinsbyTossDecision() ### C. Functions that analyze the performance of a T20 team against all other teams (Class 3) The post Pitching yorkpy.swinging away from the leg stump to IPL – Part 3 is based on Class C set of functions shown below 1. getAllMatchesAllOpposition() 2. saveAllMatchesAllOppositionIPLT20(dir1) 3. getAllMatchesAllOpposition() 4. saveAllMatchesAllOppositionIPLT20() 5. teamBatsmenPartnershiAllOppnAllMatches() 6. teamBatsmenPartnershipAllOppnAllMatchesChart() 7. teamBatsmenVsBowlersAllOppnAllMatches() 8. teamBattingScorecardAllOppnAllMatches() 9. teamBowlingScorecardAllOppnAllMatches() 10. teamBowlingWicketKindAllOppnAllMatches() 11. teamBowlersVsBatsmenAllOppnAllMatches() 12. plotWinLossByTeamAllOpposition() 13. plotWinsByRunOrWicketsAllOpposition() 14. plotWinsbyTossDecisionAllOpposition() ### D. Functions that analyze performances of T20 batsmen and bowlers (Class 4) These set of functions analyze individual batsmen and bowlers and have been used in Pitching yorkpy . 
in the block hole – Part 4 The functions are 1. getTeamBattingDetails() 2. getBatsmanDetails() 3. batsmanRunsVsDeliveries() 4. batsmanFoursSixes() 5. batsmanDismissals() 6. batsmanRunsVsStrikeRate() 7. batsmanMovingAverage() 8. batsmanCumulativeAverageRuns() 9. batsmanCumulativeStrikeRate() 10. batsmanRunsAgainstOpposition() 11. batsmanRunsVenue 12. getTeamBowlingDetails() 13. getBowlerWicketDetails() 14. bowlerMeanEconomyRate() 15. bowlerMeanRunsConceded() 16. bowlerMovingAverage() 17. bowlerCumulativeAvgWickets() 18. bowlerCumulativeAvgEconRate() 19. bowlerWicketPlot() 20. bowlerWicketsAgainstOpposition() 21. bowlerWicketsVenue() Additional new functions were added to handle Intl T20s, Big Bash League and Natwest T20 Blast, since the teams are different. They are 59. saveAllMatchesBetween2IntlT20s() 60. saveAllMatchesAllOppositionIntlT20() 61. saveAllMatchesBetween2BBLTeams() 62 saveAllMatchesAllOppositionBBLT20() 63. saveAllMatchesBetween2NWBTeams() 64. saveAllMatchesAllOppositionNWBT20() All other functions can be used as is! You can get the help of any function in yorkpy using import yorkpy.analytics as yka help(yka.teamBatsmenPartnershiOppnAllMatches) ## Help on function teamBatsmenPartnershiOppnAllMatches in module yorkpy.analytics: ## ## teamBatsmenPartnershiOppnAllMatches(matches, theTeam, report='summary', top=5) ## Team batting partnership against a opposition all IPL matches ## ## Description ## ## This function computes the performance of batsmen against all bowlers of an oppositions in ## all matches. This function returns a dataframe ## ## Usage ## ## teamBatsmenPartnershiOppnAllMatches(matches,theTeam,report="summary") ## Arguments ## ## matches ## All the matches of the team against the oppositions ## theTeam ## The team for which the the batting partnerships are sought ## report ## If the report="summary" then the list of top batsmen with the highest partnerships ## is displayed. 
If report="detailed" then the detailed break up of partnership is returned ## as a dataframe ## top ## The number of players to be displayed from the top ## Value ## ## partnerships The data frame of the partnerships ## ## Note ## ## Maintainer: Tinniam V Ganesh tvganesh.85@gmail.com ## ## Author(s) ## ## Tinniam V Ganesh ## ## References ## ## http://cricsheet.org/ ## https://gigadom.wordpress.com/ ## ## ## See Also ## ## teamBatsmenVsBowlersOppnAllMatchesPlot ## teamBatsmenPartnershipOppnAllMatchesChart As I mentioned above I will be randomly choosing a set of 12 functions from Class 1,2,3,4 for each of the T20 leagues (Intl T20, BBL and NWB T20) for analysis ## 2. International T20s The following functions were added for handling Intl. T20s 1. saveAllMatchesBetween2IntlT20s() 2. saveAllMatchesAllOppositionIntlT20() To handle the countries in Intl. T20s below Afghanistan, Australia, Bangladesh, Bermuda, Canada, England,Hong Kong,India, Ireland, Kenya, Nepal, Netherlands, “New Zealand, Oman,Pakistan,Scotland,South Africa, Sri Lanka, United Arab Emirates,West Indies, Zimbabwe import os #os.chdir('C:\\software\\cricket-package\\yorkpyT20\\t20s') #import yorkpy.analytics as yka #1. Convert all YAML files to dataframes and CSV #yka.convertAllYaml2PandasDataframesT20(".", "..\\data1") #dir1='C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches' #2. Save all matches between 2 T20 teams #yka.saveAllMatchesBetween2IntlT20s(dir1) #3. Save all matches between a T20 team and all other teams #dir1='C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches' #yka.saveAllMatchesAllOppositionIntlT20(dir1) #4. Get batting details #dir1='C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches #yka.getTeamBattingDetails("Afghanistan",dir=dir1, save=True) #yka.getTeamBattingDetails("Australia",dir=dir1,save=True) #yka.getTeamBattingDetails("Bangladesh",dir=dir1,save=True) #... #5. 
Get bowling details #dir1='C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches #yka.getTeamBowlingDetails("Afghanistan",dir=dir1, save=True) #yka.getTeamBowlingDetails("Australia",dir=dir1,save=True) #yka.getTeamBowlingDetails("Bangladesh",dir=dir1,save=True) # ... Once the data is converted you can use the yorkpy functions. The data has been converted for Intl T20 and is available at Github at IntlT20 To use the yorkpy functions for a new league we need to initial convert the YAML files into appropriate format for processing by yorkpy functions This will create the necessary files which are are used in the functions below ### 2.2 2.1 Intl. T20 – Team score card (Class 1) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches" path=os.path.join(dir1,".\\India-New Zealand-2007-09-16.csv") ind_nz=pd.read_csv(path) scorecard,extras=yka.teamBattingScorecardMatch(ind_nz,"India") print(scorecard) ## batsman runs balls 4s 6s SR ## 0 G Gambhir 51 34 5 2 150.000000 ## 1 V Sehwag 40 18 6 2 222.222222 ## 2 RV Uthappa 0 2 0 0 0.000000 ## 3 MS Dhoni 24 20 2 0 120.000000 ## 4 Yuvraj Singh 5 7 0 0 71.428571 ## 5 KD Karthik 17 12 3 0 141.666667 ## 6 IK Pathan 11 10 2 0 110.000000 ## 7 AB Agarkar 1 2 0 0 50.000000 ## 8 Harbhajan Singh 7 6 1 0 116.666667 ## 9 S Sreesanth 19 10 4 0 190.000000 ## 10 RP Singh 1 1 0 0 100.000000 print(extras) ## total wides noballs legbyes byes penalty extras ## 0 370 6 0 8 0 0 14 ### 2.2 Intl. T20 -Team batsmen partnership (Class 1) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches" path=os.path.join(dir1,".\\South Africa-Australia-2009-03-27.csv") sa_aus=pd.read_csv(path) yka.teamBatsmenPartnershipMatch(sa_aus,'Australia','New Zealand',plot=True) ### 2.3 Intl. 
T20 -Team bowling scorecard match (Class 1) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches" path=os.path.join(dir1,".\\Sri Lanka-West Indies-2012-09-28.csv") sl_wi=pd.read_csv(path) a=yka.teamBowlingScorecardMatch(sl_wi,'Sri Lanka') print(a) ## bowler overs runs maidens wicket econrate ## 0 A Mohammed 2 13 0 0 6.5 ## 1 SA Campbelle 1 8 0 1 8.0 ## 2 SC Selman 1 3 0 0 3.0 ## 3 SF Daley 2 5 0 1 2.5 ## 4 SR Taylor 2 4 0 1 2.0 ## 5 TD Smartt 2 17 0 0 8.5 ### 2.4 Intl. T20 -Match Worm chart (Class 1) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-Matches" path=os.path.join(dir1,".\\England-India-2012-09-29.csv") eng_ind=pd.read_csv(path) yka.matchWormChart(eng_ind,"England", "India") path=os.path.join(dir1,".\\Bangladesh-Ireland-2015-12-05.csv") ban_ire=pd.read_csv(path) yka.matchWormChart(ban_ire,"Bangladesh", "Ireland") ### 2.5 Intl. T20 -Team Batting partnerships all matches 2 teams (Class 2) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-allMatchesBetween2Teams" path=os.path.join(dir1,"India-England-allMatches.csv") dc_mi_matches = pd.read_csv(path) theTeam='India' m=yka.teamBatsmenPartnershiOppnAllMatches(dc_mi_matches,theTeam,report="detailed", top=4) print(m) ## batsman totalPartnershipRuns non_striker partnershipRuns ## 0 SK Raina 265 G Gambhir 2 ## 1 SK Raina 265 KL Rahul 40 ## 2 SK Raina 265 MK Tiwary 24 ## 3 SK Raina 265 MS Dhoni 124 ## 4 SK Raina 265 P Kumar 0 ## 5 SK Raina 265 PP Chawla 4 ## 6 SK Raina 265 R Ashwin 1 ## 7 SK Raina 265 RG Sharma 16 ## 8 SK Raina 265 V Kohli 47 ## 9 SK Raina 265 Yuvraj Singh 7 ## 10 MS Dhoni 264 A Mishra 1 ## 11 MS Dhoni 264 AT Rayudu 18 ## 12 MS Dhoni 264 HH Pandya 8 ## 13 MS Dhoni 264 IK Pathan 2 ## 14 MS Dhoni 264 JJ Bumrah 2 ## 15 MS Dhoni 264 MK Pandey 3 ## 16 MS Dhoni 264 Parvez Rasool 21 ## 17 MS Dhoni 264 
R Ashwin 11 ## 18 MS Dhoni 264 RA Jadeja 11 ## 19 MS Dhoni 264 RG Sharma 9 ## 20 MS Dhoni 264 RR Pant 6 ## 21 MS Dhoni 264 RV Uthappa 5 ## 22 MS Dhoni 264 SK Raina 98 ## 23 MS Dhoni 264 YK Pathan 36 ## 24 MS Dhoni 264 Yuvraj Singh 33 ## 25 V Kohli 236 AM Rahane 3 ## 26 V Kohli 236 G Gambhir 78 ## 27 V Kohli 236 KL Rahul 46 ## 28 V Kohli 236 RG Sharma 2 ## 29 V Kohli 236 RV Uthappa 4 ## 30 V Kohli 236 S Dhawan 45 ## 31 V Kohli 236 SK Raina 48 ## 32 V Kohli 236 Yuvraj Singh 10 ## 33 M Raj 176 A Sharma 2 ## 34 M Raj 176 H Kaur 18 ## 35 M Raj 176 J Goswami 6 ## 36 M Raj 176 KV Jain 5 ## 37 M Raj 176 L Kumari 5 ## 38 M Raj 176 N Niranjana 3 ## 39 M Raj 176 N Tanwar 17 ## 40 M Raj 176 PG Raut 41 ## 41 M Raj 176 R Malhotra 5 ## 42 M Raj 176 S Mandhana 8 ## 43 M Raj 176 S Naik 10 ## 44 M Raj 176 S Pandey 19 ## 45 M Raj 176 SK Naidu 37 ### 2.6 Intl. T20 -Team Batsmen vs Bowlers all matches 2 teams (Class 2) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-allMatchesBetween2Teams" path=os.path.join(dir1,"Ireland-Netherlands-allMatches.csv") ire_nl_matches = pd.read_csv(path) yka.teamBatsmenVsBowlersOppnAllMatches(ire_nl_matches,'Ireland',"Netherlands",plot=True,top=3,runsScored=10) ### 2.7 Intl. T20 -Team Bowling scorecard all matches 2 teams (Class 2) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-allMatchesBetween2Teams" path=os.path.join(dir1,"Bangladesh-Nepal-allMatches.csv") bang_nep_matches = pd.read_csv(path) scorecard=yka.teamBowlingScorecardOppnAllMatches(bang_nep_matches,'Bangladesh',"Nepal") print(scorecard) ## bowler overs runs maidens wicket econrate ## 0 B Regmi 3 14 0 1 4.666667 ## 3 SP Gauchan 4 40 0 1 10.000000 ## 1 JK Mukhiya 2 16 0 0 8.000000 ## 2 P Khadka 3 23 0 0 7.666667 ## 4 Sagar Pun 1 16 0 0 16.000000 ## 5 Sompal Kami 2 21 0 0 10.500000 ### 2.8 Intl. 
T20 -Team Batsmen vs Bowlers all Oppositions (Class 3)

```python
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-allMatchesAllOpposition\\"
path=os.path.join(dir1,"Australia-allMatchesAllOpposition.csv")
aus_matches = pd.read_csv(path)
yka.teamBatsmenVsBowlersAllOppnAllMatches(aus_matches,"Australia",plot=True,top=3,runsScored=40)
```

### 2.9 Intl. T20 -Wins vs Losses of a team against all other teams (Class 3)

```python
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-allMatchesAllOpposition\\"
path=os.path.join(dir1,"South Africa-allMatchesAllOpposition.csv")
sa_matches = pd.read_csv(path)
team1='South Africa'
yka.plotWinLossByTeamAllOpposition(sa_matches,team1,plot="detailed")
```

### 2.10 Intl. T20 -Batsmen analysis (Class 4)

```python
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-BattingBowlingDetails\\"
# Rohit Sharma
name="RG Sharma"
team='India'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeAverageRuns(df,name)

# MJ Guptill
name="MJ Guptill"
team='New Zealand'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)
```

### 2.11 Intl. T20 -Bowler analysis (Class 4)

```python
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyT20\\IntlT20-BattingBowlingDetails\\"
# Shakib Al Hasan
name="Shakib Al Hasan"
team='Bangladesh'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanEconomyRate(df,name)

# SL Malinga
name="SL Malinga"
team='Sri Lanka'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsAgainstOpposition(df,name)
```

## 3. Big Bash League

The following functions were added to handle BBL teams

1. saveAllMatchesBetween2BBLTeams()
2. saveAllMatchesAllOppositionBBLT20()

The BBL teams included are Adelaide Strikers, Brisbane Heat, Hobart Hurricanes, Melbourne Renegades, Perth Scorchers, Sydney Sixers, Sydney Thunder

To use the yorkpy functions, the YAML files first have to be converted into pandas dataframes and then saved as CSV, as shown below

```python
import os
import yorkpy.analytics as yka
os.chdir('C:\\software\\cricket-package\\yorkpyBBL\\bbl')

#1. Convert all YAML files to dataframes and save as CSV
#yka.convertAllYaml2PandasDataframesT20(".", "..\\BBLT20-Matches")

#2. Save all matches between 2 BBL teams
dir1='C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches'
#yka.saveAllMatchesBetween2BBLTeams(dir1)

#3. Save T20 matches between a BBL team and all other teams
dir1='C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches'
#yka.saveAllMatchesAllOppositionBBLT20(dir1)

#4. Get the batting details
dir1='C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches'
#yka.getTeamBattingDetails("Adelaide Strikers",dir=dir1, save=True)
#yka.getTeamBattingDetails("Brisbane Heat",dir=dir1,save=True)
#yka.getTeamBattingDetails("Hobart Hurricanes",dir=dir1,save=True)
#...

#5. Get the bowling details
dir1='C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches'
#yka.getTeamBowlingDetails("Adelaide Strikers",dir=dir1, save=True)
#yka.getTeamBowlingDetails("Brisbane Heat",dir=dir1,save=True)
#yka.getTeamBowlingDetails("Hobart Hurricanes",dir=dir1,save=True)
#...
```

The functions below perform analysis on the files generated above.
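The CSV paths used throughout this post suggest a naming pattern for the generated files: `<team1>-<team2>-allMatches.csv` for head-to-head data and `<team>-allMatchesAllOpposition.csv` for a team against everyone else. As a rough sketch, the expected file names for the BBL teams listed above can be enumerated as follows. The helpers `head_to_head_files` and `all_opposition_files` are my own illustration and not part of yorkpy, and the order of the two teams in a real file name may differ from what this produces.

```python
from itertools import combinations

# BBL teams as listed in this post
BBL_TEAMS = [
    "Adelaide Strikers", "Brisbane Heat", "Hobart Hurricanes",
    "Melbourne Renegades", "Perth Scorchers", "Sydney Sixers",
    "Sydney Thunder",
]

def head_to_head_files(teams):
    """Hypothetical helper: one '<team1>-<team2>-allMatches.csv' name per pair of teams."""
    return [f"{t1}-{t2}-allMatches.csv" for t1, t2 in combinations(teams, 2)]

def all_opposition_files(teams):
    """Hypothetical helper: one '<team>-allMatchesAllOpposition.csv' name per team."""
    return [f"{team}-allMatchesAllOpposition.csv" for team in teams]

if __name__ == "__main__":
    # Print every file name the conversion steps would be expected to leave behind
    for fname in head_to_head_files(BBL_TEAMS) + all_opposition_files(BBL_TEAMS):
        print(fname)
```

A list like this is handy for checking that the conversion steps produced every file before running the analysis functions.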
The YAML files have already been converted and are available at Github at BBL ### 3.1 Big Bash League – Team score card (Class 1) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches" path=os.path.join(dir1,".\\Adelaide Strikers-Brisbane Heat-2012-12-13.csv") as_bh=pd.read_csv(path) scorecard,extras=yka.teamBattingScorecardMatch(as_bh,"Brisbane Heat") print(scorecard) ## batsman runs balls 4s 6s SR ## 0 LA Pomersbach 65 42 8 2 154.761905 ## 1 JR Hopes 1 2 0 0 50.000000 ## 2 JA Burns 37 31 2 2 119.354839 ## 3 DT Christian 12 15 0 0 80.000000 ## 4 NLTC Perera 12 4 0 2 300.000000 ## 5 CA Lynn 19 18 1 1 105.555556 ## 6 BCJ Cutting 13 5 0 2 260.000000 ## 7 PJ Forrest 12 8 0 1 150.000000 ## 8 CD Hartley 5 2 1 0 250.000000 print(extras) ## total wides noballs legbyes byes penalty extras ## 0 371 10 2 5 0 0 17 ### 3.2 Big Bash League -Team batsmen vs Bowlers (Class 1) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches" path=os.path.join(dir1,".\\Hobart Hurricanes-Melbourne Renegades-2012-01-18.csv") hh_mr=pd.read_csv(path) yka.teamBatsmenVsBowlersMatch(hh_mr,'Hobart Hurricanes','Melbourne Renegades',plot=True) ### 3.3 Big Bash League -Team bowling scorecard match (Class 1) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches" path=os.path.join(dir1,".\\Melbourne Stars-Sydney Thunder-2016-01-24.csv") ms_st=pd.read_csv(path) a=yka.teamBowlingScorecardMatch(ms_st,'Sydney Thunder') print(a) ## bowler overs runs maidens wicket econrate ## 0 A Zampa 4 32 0 2 8.000000 ## 1 BW Hilfenhaus 2 21 0 0 10.500000 ## 2 DJ Hussey 1 9 0 1 9.000000 ## 3 DJ Worrall 3 42 0 0 14.000000 ## 4 EP Gulbis 2 19 0 0 9.500000 ## 5 MA Beer 3 25 0 1 8.333333 ## 6 MP Stoinis 4 30 0 3 7.500000 ### 3.4 Big Bash League – Match Worm chart (Class 1) import os import pandas as pd import 
yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-Matches" path=os.path.join(dir1,".\\Sydney Sixers-Melbourne Stars-2011-12-27.csv") ss_ms=pd.read_csv(path) yka.matchWormChart(ss_ms,"Melbourne Stars", "Sydney Sixers") path=os.path.join(dir1,".\\Hobart Hurricanes-Brisbane Heat-2015-01-02.csv") hh_bh=pd.read_csv(path) yka.matchWormChart(hh_bh,"Hobart Hurricanes", "Brisbane Heat") ### 3.5 Big Bash League -Team Batting partnerships all matches 2 teams (Class 2) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-allMatchesBetween2Teams" path=os.path.join(dir1,"Brisbane Heat-Adelaide Strikers-allMatches.csv") bh_as_matches = pd.read_csv(path) yka.teamBatsmenPartnershipOppnAllMatchesChart(bh_as_matches,"Brisbane Heat","Adelaide Strikers",plot=True, top=4, partnershipRuns=20) ### 3.6 Big Bash League -Team Bowling wicket kind all matches 2 teams (Class 2) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-allMatchesBetween2Teams" path=os.path.join(dir1,"Sydney Sixers-Perth Scorchers-allMatches.csv") ss_ps_matches = pd.read_csv(path) yka.teamBowlingWicketKindOppositionAllMatches(ss_ps_matches,'Perth Scorchers','Sydney Sixers',plot=True,top=5,wickets=1) ### 3.7 Big Bash League -Team Bowling scorecard all teams (Class 3) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-allMatchesAllOpposition" path=os.path.join(dir1,"Hobart Hurricanes-allMatchesAllOpposition.csv") hh_matches = pd.read_csv(path) scorecard=yka.teamBowlingScorecardAllOppnAllMatches(hh_matches,"Hobart Hurricanes") print(scorecard) ## bowler overs runs maidens wicket econrate ## 16 B Lee 20 132 0 9 6.600000 ## 30 CJ McKay 13 110 0 9 8.461538 ## 88 NJ Rimmington 16 103 1 9 6.437500 ## 67 JW Hastings 15 88 0 8 5.866667 ## 63 JP Faulkner 15 146 0 7 9.733333 ## 27 CJ Gannon 17 147 1 7 8.647059 ## 
93 NM Lyon 8 51 0 7 6.375000 ## 20 BCJ Cutting 27 226 0 7 8.370370 ## 48 GB Hogg 22 167 0 7 7.590909 ## 107 SM Boland 12 96 0 7 8.000000 ## 15 B Laughlin 13 99 0 7 7.615385 ## 87 MT Steketee 15 134 0 5 8.933333 ## 121 Yasir Arafat 9 48 0 4 5.333333 ## 96 PJ Cummins 8 83 0 4 10.375000 ## 46 Fawad Ahmed 11 64 0 4 5.818182 ## 76 MA Beer 12 63 0 4 5.250000 ## 108 SNJ O'Keefe 15 104 0 4 6.933333 ## 75 M Muralitharan 7 31 0 4 4.428571 ## 10 AJ Tye 16 127 0 4 7.937500 ## 52 J Botha 13 94 0 4 7.230769 ## 56 JL Pattinson 7 71 0 4 10.142857 ## 62 JP Behrendorff 16 119 0 4 7.437500 ## 3 AC Agar 12 87 0 4 7.250000 ## 24 BM Edmondson 4 40 0 4 10.000000 ## 37 DJ Hussey 8 47 0 3 5.875000 ## 49 GJ Maxwell 8 65 0 3 8.125000 ## 84 MN Samuels 4 22 0 3 5.500000 ## 81 MG Neser 5 54 0 3 10.800000 ## 44 DT Christian 9 114 0 3 12.666667 ## 50 GS Sandhu 7 51 0 3 7.285714 ## .. ... ... ... ... ... ... ## 43 DP Nannes 8 58 0 1 7.250000 ## 51 IA Moran 4 25 0 1 6.250000 ## 55 JK Lalor 10 82 0 1 8.200000 ## 54 JH Kallis 3 18 0 1 6.000000 ## 73 LR Butterworth 4 25 0 1 6.250000 ## 4 AC McDermott 2 28 0 1 14.000000 ## 70 LA Doran 4 38 0 1 9.500000 ## 69 KW Richardson 6 44 0 1 7.333333 ## 119 WD Sheridan 2 6 0 0 3.000000 ## 2 AB McDonald 1 15 0 0 15.000000 ## 115 TD Andrews 3 23 0 0 7.666667 ## 11 AK Heal 4 33 0 0 8.250000 ## 7 AD Russell 4 40 0 0 10.000000 ## 8 AJ Finch 2 15 0 0 7.500000 ## 9 AJ Turner 3 28 0 0 9.333333 ## 60 JM Mennie 1 20 0 0 20.000000 ## 18 BA Stokes 1 9 0 0 9.000000 ## 26 CH Gayle 1 16 0 0 16.000000 ## 28 CJ Green 4 44 0 0 11.000000 ## 95 PD Collingwood 2 20 0 0 10.000000 ## 31 CJ Simmons 4 21 0 0 5.250000 ## 59 JM Holland 3 34 0 0 11.333333 ## 36 DJ Bravo 6 64 0 0 10.666667 ## 38 DJ Pattinson 2 16 0 0 8.000000 ## 41 DJ Worrall 8 90 0 0 11.250000 ## 72 LN O'Connor 6 56 0 0 9.333333 ## 71 LJ Wright 3 27 0 0 9.000000 ## 68 KA Pollard 1 7 0 0 7.000000 ## 58 JM Herrick 4 23 0 0 5.750000 ## 92 NM Hauritz 5 42 0 0 8.400000 ## ## [122 rows x 6 columns] ### 3.8 Big Bash League -Plot 
wins vs losses against all teams (Class 3)

```python
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-allMatchesAllOpposition"
path=os.path.join(dir1,"Sydney Sixers-allMatchesAllOpposition.csv")
ss_matches = pd.read_csv(path)
yka.plotWinLossByTeamAllOpposition(ss_matches,'Sydney Sixers')
```

### 3.9 Big Bash League -Wins vs losses by toss decision (Class 3)

```python
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-allMatchesAllOpposition"
path=os.path.join(dir1,"Adelaide Strikers-allMatchesAllOpposition.csv")
as_matches = pd.read_csv(path)
yka.plotWinsByRunOrWicketsAllOpposition(as_matches,'Adelaide Strikers')
```

### 3.10 Big Bash League -Batsmen Analysis (Class 4)

```python
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-BattingBowlingDetails"
# CA Lynn
name="CA Lynn"
team='Brisbane Heat'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsStrikeRate(df,name)

# UT Khawaja
name="UT Khawaja"
team='Sydney Thunder'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)
```

### 3.11 Big Bash League – Bowler analysis (Class 4)

```python
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyBBL\\BBLT20-BattingBowlingDetails"
# CJ McKay
name="CJ McKay"
team='Sydney Thunder'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgWickets(df,name)

# AU Rashid
name="AU Rashid"
team='Adelaide Strikers'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgEconRate(df,name)
```

## 4. Natwest T20 Blast

The following functions were added to handle Natwest T20 teams

1. saveAllMatchesBetween2NWBTeams()
2. saveAllMatchesAllOppositionNWBT20()

The Natwest teams are Derbyshire, Durham, Essex, Glamorgan, Gloucestershire, Hampshire, Kent, Lancashire, Leicestershire, Middlesex, Northamptonshire, Nottinghamshire, Somerset, Surrey, Sussex, Warwickshire, Worcestershire, Yorkshire

In order to perform analysis with yorkpy, the YAML data has to be converted to pandas dataframes and saved as CSV, as shown

```python
#import os
#import yorkpy.analytics as yka
#os.chdir('C:\\software\\cricket-package\\yorkpyNWB\\nwb')

#1. Convert YAML to dataframes and save as CSV
#yka.convertAllYaml2PandasDataframesT20(".", "..\\NWBT20-Matches")

#2. Save all matches between 2 NWBT20 teams
#dir1='C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches'
#yka.saveAllMatchesBetween2NWBTeams(dir1)

#3. Save all matches between a NWB T20 team and all other teams
#dir1='C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches'
#yka.saveAllMatchesAllOppositionNWBT20(dir1)

#4. Compute the batting details
dir1='C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches'
#yka.getTeamBattingDetails("Derbyshire",dir=dir1, save=True)
#yka.getTeamBattingDetails("Durham",dir=dir1,save=True)
#yka.getTeamBattingDetails("Essex",dir=dir1,save=True)
#...

#5. Compute bowling details
dir1='C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-Matches'
#yka.getTeamBowlingDetails("Derbyshire",dir=dir1, save=True)
#yka.getTeamBowlingDetails("Durham",dir=dir1,save=True)
#yka.getTeamBowlingDetails("Essex",dir=dir1,save=True)
#...
```

Once the data is converted, all yorkpy functions can be used.
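With 18 Natwest teams, writing out the batting and bowling detail calls one by one gets tedious. Here is a minimal sketch of a loop that does the same thing. `build_details` is a hypothetical helper of mine, not a yorkpy function, and it assumes only the call shape used in this post, namely `func(team, dir=..., save=True)`. The stub `fake_details` stands in for yorkpy so that the sketch runs without the package or the data.

```python
# Natwest T20 Blast teams as listed in this post
NWB_TEAMS = [
    "Derbyshire", "Durham", "Essex", "Glamorgan", "Gloucestershire",
    "Hampshire", "Kent", "Lancashire", "Leicestershire", "Middlesex",
    "Northamptonshire", "Nottinghamshire", "Somerset", "Surrey", "Sussex",
    "Warwickshire", "Worcestershire", "Yorkshire",
]

def build_details(teams, details_func, data_dir):
    """Hypothetical helper: call details_func(team, dir=data_dir, save=True)
    for every team and collect the results in a dict keyed by team name."""
    results = {}
    for team in teams:
        results[team] = details_func(team, dir=data_dir, save=True)
    return results

def fake_details(team, dir=None, save=False):
    # Stand-in for a yorkpy details function; returns a label instead of a dataframe
    return f"{team}-BattingDetails.csv"

if __name__ == "__main__":
    details = build_details(NWB_TEAMS, fake_details, "NWBT20-Matches")
    print(len(details), "teams processed")
```

With yorkpy installed, `build_details(NWB_TEAMS, yka.getTeamBattingDetails, dir1)` should do the same job as the commented calls above, assuming the function accepts those keyword arguments as shown in this post.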
This has already been done and is available at github NWB ### 4.1 Natwest T20 Blast – Team score card (Class 1) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\\yorkpyNWB\\NWBT20-Matches" path=os.path.join(dir1,".\\Durham-Yorkshire-2016-08-20.csv") d_y=pd.read_csv(path) scorecard,extras=yka.teamBattingScorecardMatch(d_y,"Durham") print(scorecard) ## batsman runs balls 4s 6s SR ## 0 MD Stoneman 25 20 4 0 125.000000 ## 1 KK Jennings 11 13 1 0 84.615385 ## 2 BA Stokes 56 37 4 3 151.351351 ## 3 MJ Richardson 29 23 4 1 126.086957 ## 4 JTA Burnham 17 15 1 1 113.333333 ## 5 RD Pringle 10 9 1 0 111.111111 ## 6 PD Collingwood 2 3 0 0 66.666667 ## 7 U Arshad 1 1 0 0 100.000000 print(extras) ## total wides noballs legbyes byes penalty extras ## 0 305 2 0 5 0 0 7 ### 4.2 Natwest T20 Blast -Team batsmen vs Bowlers (Class 1) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\\yorkpyNWB\\NWBT20-Matches" path=os.path.join(dir1,".\\Derbyshire-Lancashire-2016-07-13.csv") d_l=pd.read_csv(path) yka.teamBatsmenVsBowlersMatch(d_l,'Lancashire','Derbyshire',plot=True) ### 4.3 Natwest T20 Blast -Team bowling scorecard match (Class 1) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\\yorkpyNWB\\NWBT20-Matches" path=os.path.join(dir1,".\\Essex-Surrey-2016-05-20.csv") e_s=pd.read_csv(path) a=yka.teamBowlingScorecardMatch(e_s,'Essex') print(a) ## bowler overs runs maidens wicket econrate ## 0 Azhar Mahmood 3 38 0 4 12.666667 ## 1 GJ Batty 4 33 0 1 8.250000 ## 2 JE Burke 1 18 0 0 18.000000 ## 3 MW Pillans 3 28 0 0 9.333333 ## 4 SM Curran 4 23 0 2 5.750000 ## 5 TK Curran 4 21 0 3 5.250000 ### 4.4 Natwest T20 Blast -Match Worm chart (Class 1) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\\yorkpyNWB\\NWBT20-Matches" path=os.path.join(dir1,".\\Gloucestershire-Glamorgan-2016-06-10.csv") 
ss_ms=pd.read_csv(path) yka.matchWormChart(ss_ms,"Gloucestershire", "Glamorgan") path=os.path.join(dir1,".\\Leicestershire-Northamptonshire-2016-05-20.csv") hh_bh=pd.read_csv(path) yka.matchWormChart(hh_bh,"Northamptonshire", "Leicestershire") ### 4.5 Natwest T20 Blast -Team Batting partnerships all matches 2 teams (Class 2) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-allMatchesBetween2Teams" path=os.path.join(dir1,"Hampshire-Sussex-allMatches.csv") h_s_matches = pd.read_csv(path) yka.teamBatsmenPartnershipOppnAllMatchesChart(h_s_matches,"Hampshire","Sussex",plot=True, top=4, partnershipRuns=10) ### 4.6 Natwest T20 Blast -Team Bowling wicket kind all matches 2 teams (Class 2) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-allMatchesBetween2Teams" path=os.path.join(dir1,"Kent-Somerset-allMatches.csv") k_s_matches = pd.read_csv(path) yka.teamBowlersVsBatsmenOppnAllMatches(k_s_matches,'Kent','Somerset',plot=True, top=5,runsConceded=10) ### 4.7 Natwest T20 Blast -Team Bowling scorecard all teams (Class 3) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-allMatchesAllOpposition" path=os.path.join(dir1,"Middlesex-allMatchesAllOpposition.csv") m_matches = pd.read_csv(path) scorecard=yka.teamBowlingScorecardAllOppnAllMatches(m_matches,"Middlesex") print(scorecard) ## bowler overs runs maidens wicket econrate ## 1 AJ Tye 8 75 0 6 9.375000 ## 5 BAC Howell 8 41 0 5 5.125000 ## 26 GR Napier 7 65 0 5 9.285714 ## 15 DI Stevens 4 31 0 4 7.750000 ## 19 DW Lawrence 6 37 0 4 6.166667 ## 32 JW Dernbach 4 33 0 3 8.250000 ## 7 BTJ Wheal 4 43 0 3 10.750000 ## 18 DR Briggs 4 24 0 3 6.000000 ## 50 RK Kleinveldt 4 24 0 3 6.000000 ## 46 R McLaren 7 59 0 3 8.428571 ## 47 R Rampaul 3 21 0 3 7.000000 ## 34 L Gregory 6 51 0 2 8.500000 ## 33 KMDN Kulasekara 2 24 0 2 12.000000 ## 40 MG Hogan 
3 17 0 2 5.666667 ## 43 MTC Waller 4 31 0 2 7.750000 ## 49 RJ Gleeson 4 20 0 2 5.000000 ## 48 RE van der Merwe 5 24 0 2 4.800000 ## 51 RN ten Doeschate 4 32 0 2 8.000000 ## 53 S Prasanna 4 20 0 2 5.000000 ## 56 SW Tait 3 17 0 2 5.666667 ## 57 Shahid Afridi 8 55 0 2 6.875000 ## 59 T van der Gugten 3 13 1 2 4.333333 ## 64 TS Mills 3 34 0 2 11.333333 ## 65 WAT Beer 4 23 0 2 5.750000 ## 31 JH Davey 4 28 0 2 7.000000 ## 68 ZS Ansari 3 16 0 2 5.333333 ## 25 GM Andrew 3 19 0 2 6.333333 ## 23 GJ Batty 6 55 0 2 9.166667 ## 16 DJ Bravo 3 27 0 2 9.000000 ## 41 MR Quinn 6 65 0 1 10.833333 ## .. ... ... ... ... ... ... ## 24 GL van Buuren 7 49 0 1 7.000000 ## 37 MD Hunn 3 35 0 1 11.666667 ## 36 LC Norwell 6 62 0 1 10.333333 ## 29 JC Tredwell 4 35 0 1 8.750000 ## 35 LA Dawson 6 53 0 1 8.833333 ## 62 TL Best 4 51 0 0 12.750000 ## 58 T Westley 2 12 0 0 6.000000 ## 4 Azharullah 3 24 0 0 8.000000 ## 60 TD Groenewald 1 21 0 0 21.000000 ## 61 TK Curran 4 35 0 0 8.750000 ## 38 MD Taylor 3 30 0 0 10.000000 ## 30 JG Myburgh 1 5 0 0 5.000000 ## 8 C Overton 2 18 0 0 9.000000 ## 2 Ashar Zaidi 1 5 0 0 5.000000 ## 66 WR Smith 2 25 0 0 12.500000 ## 28 J Overton 2 24 0 0 12.000000 ## 6 BJ Taylor 1 6 0 0 6.000000 ## 22 GG White 4 31 0 0 7.750000 ## 55 SP Crook 1 9 0 0 9.000000 ## 39 ME Claydon 4 40 0 0 10.000000 ## 52 RS Bopara 4 32 0 0 8.000000 ## 10 CD Nash 2 19 0 0 9.500000 ## 11 CH Morris 4 36 0 0 9.000000 ## 12 DA Cosker 3 32 0 0 10.666667 ## 13 DA Griffiths 4 39 0 0 9.750000 ## 45 PD Trego 1 11 0 0 11.000000 ## 44 PA van Meekeren 2 19 0 0 9.500000 ## 42 MS Crane 2 25 0 0 12.500000 ## 20 FK Cowdrey 1 19 0 0 19.000000 ## 14 DD Masters 2 16 0 0 8.000000 ## ## [69 rows x 6 columns] ### 4.8 Natwest T20 Blast -Plot wins vs losses against all teams(Class 3) import os import pandas as pd import yorkpy.analytics as yka dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-allMatchesAllOpposition" path=os.path.join(dir1,"Warwickshire-allMatchesAllOpposition.csv") w_matches = pd.read_csv(path) 
yka.plotWinLossByTeamAllOpposition(w_matches,'Warwickshire')

### 4.9 Natwest T20 Blast -Batsmen Analysis (Class 4)

```python
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-BattingBowlingDetails"
# M Klinger
name="M Klinger"
team='Gloucestershire'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)

# CA Ingram
name="CA Ingram"
team='Glamorgan'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)
```

### 4.10 Natwest T20 Blast -Bowler analysis (Class 4)

```python
import os
import pandas as pd
import yorkpy.analytics as yka
dir1="C:\\software\\cricket-package\\yorkpyNWB\\NWBT20-BattingBowlingDetails"
# BAC Howell
name="BAC Howell"
team='Gloucestershire'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgEconRate(df,name)

# GR Napier
name="GR Napier"
team='Essex'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsVenue(df,name)
```

Note: yorkpy will work for all T20 leagues whose data is in the YAML format specified by Cricsheet.

You can clone/fork the latest code for yorkpy from github yorkpy

The data for IPL, Intl. T20, BBL and Natwest T20 has already been converted into pandas dataframes and saved as CSVs. You can download the converted files from Github at [allYorkpyT20Data](https://github.com/tvganesh/allYorkpyT20Data)

## Conclusion

This post shows the kind of detailed analysis that can be performed with yorkpy. In fact, with all the converted data it should also be possible to train a Machine Learning model, which I will probably keep for another day. You could go ahead and use the data in other innovative ways. Do keep me posted if you do!!

Important note: Do check out my other posts using yorkpy at yorkpy-posts

Have fun with yorkpy!!

To see all posts click Index of posts

# Pitching yorkpy … in the block hole – Part 4

A good programmer is someone who always looks both ways before crossing a one-way street.

Doug Linder

There are two ways to write error-free programs; only the third one works.

Alan J. Perlis

In order to understand recursion, one must first understand recursion.

Anonymous

This is the fourth and final part of my Python package yorkpy. In this part yorkpy, the python avatar of my R package yorkr (see Introducing cricket package yorkr: Part 1- Beaten by sheer pace!), develops wings and is prepared for take-off. The yorkpy package uses data from Cricsheet.

You can clone/download the code at Github yorkpy

This post has been published to RPubs at yorkpy-Part4

You can download this post as PDF at IPLT20-yorkpy-part4

You can download all the data used in this post and the previous post at yorkpyData

This post is a continuation of the earlier posts on yorkpy

1. Pitching yorkpy … short of good length to IPL – Part 1
In this part I included functions that convert the YAML data of IPL matches into pandas dataframes, which are then saved as CSV. This part can perform analysis of individual IPL matches. Note: The converted data is available at yorkpyData
2. Pitching yorkpy … on the middle and outside off-stump to IPL – Part 2
This part included functions to create a large data frame for head-to-head confrontations between any 2 IPL teams, say CSK-MI, DD-KKR etc., which can be saved as CSV. Analysis is then performed on these team-2-team confrontations. Note: The converted data is available at yorkpyData
3. Pitching yorkpy … swinging away from the leg stump to IPL – Part 3
The 3rd part covers the performance of any IPL team against all other IPL teams. The data can also be saved as CSV. Note: The converted data is available at yorkpyData

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton yorkpy-template from Github (which is the R Markdown file I have used for the analysis below).

This 4th and final part includes analysis of batting and bowling performances of any IPL player.
The batting and bowling details for all teams have already been converted and are available at IPLT20-Batting-BowlingDetails

This part includes the following new functions

#### Batsman functions

1. batsmanRunsVsDeliveries
2. batsmanFoursSixes
3. batsmanDismissals
4. batsmanRunsVsStrikeRate
5. batsmanMovingAverage
6. batsmanCumulativeAverageRuns
7. batsmanCumulativeStrikeRate
8. batsmanRunsAgainstOpposition
9. batsmanRunsVenue

#### Bowler functions

1. bowlerMeanEconomyRate
2. bowlerMeanRunsConceded
3. bowlerMovingAverage
4. bowlerCumulativeAvgWickets
5. bowlerCumulativeAvgEconRate
6. bowlerWicketPlot
7. bowlerWicketsAgainstOpposition
8. bowlerWicketsVenue

## A. Batsman functions

### 1. Get IPL Team Batting details

The function below gets the overall IPL team batting details based on the CSV files that were saved for IPL T20 matches. This is currently also available in Github at yorkpyData. The batting details of the IPL team in each match are computed, and a large data frame is created by combining the batting details from each match. This can be saved as a CSV file named, for example, Delhi Daredevils-BattingDetails.csv.

```python
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
#csk_details = yka.getTeamBattingDetails("Chennai Super Kings",dir=dir1, save=True)
#dd_details = yka.getTeamBattingDetails("Delhi Daredevils",dir=dir1,save=True)
#kkr_details = yka.getTeamBattingDetails("Kolkata Knight Riders",dir=dir1,save=True)
```

### 2. Get IPL batsman details

This function is used to get the individual IPL T20 batting record for the specified batsman of the team, as in the functions below. For the batsmen functions below I have chosen Rishabh Pant, Kane Williamson and Ambati Rayudu, as they top the batting lists. You can choose any IPL batsman for the analysis

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
rpant=yka.getBatsmanDetails(team,name,dir=dir1)
```

### 3. Batsman Runs vs Deliveries (in IPL matches)

This function plots the runs vs deliveries faced for a batsman

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsDeliveries(df,name)

# 2. Kane Williamson
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsDeliveries(df,name)

# 3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsDeliveries(df,name)
```

### 4. Batsman fours and sixes (in IPL matches)

This plots the fours, sixes and the total runs for a batsman

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanFoursSixes(df,name)

# 2. Kane Williamson
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanFoursSixes(df,name)

# 3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanFoursSixes(df,name)
```

### 5. Batsman dismissals (in IPL matches)

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanDismissals(df,name)

# 2. Kane Williamson
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanDismissals(df,name)

# 3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanDismissals(df,name)
```

### 6. Batsman Runs vs Strike Rate (in IPL matches)

The plots below give the Runs vs Strike rate for the batsmen

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsStrikeRate(df,name)

# 2. Kane Williamson
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsStrikeRate(df,name)

# 3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVsStrikeRate(df,name)
```

### 7. Batsman Moving average of runs (in IPL matches)

The functions below compute and plot the moving average of runs for the batsmen

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanMovingAverage(df,name)

# 2. Kane Williamson
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanMovingAverage(df,name)

# 3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanMovingAverage(df,name)
```

### 8. Batsman Cumulative average of runs (in IPL matches)

The functions below plot the cumulative average of the batsmen

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeAverageRuns(df,name)

# 2. Kane Williamson
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeAverageRuns(df,name)

# 3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeAverageRuns(df,name)
```

### 9. Batsman Cumulative Strike Rate (in IPL matches)

The functions below plot the cumulative strike rate of the batsmen

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)

# 2. Kane Williamson
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)

# 3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanCumulativeStrikeRate(df,name)
```

### 10. Batsman performance against opposition (in IPL matches)

The plots below show how the batsmen performed against other IPL teams

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)

# 2. Kane Williamson
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)

# 3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsAgainstOpposition(df,name)
```

### 11. Batsman performance at different venues (in IPL matches)

The plots below show how the batsmen performed at different venues

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Rishabh Pant
name="RR Pant"
team='Delhi Daredevils'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVenue(df,name)

# 2. Kane Williamson
name="KS Williamson"
team='Sunrisers Hyderabad'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVenue(df,name)

# 3. Ambati Rayudu
name="AT Rayudu"
team='Mumbai Indians'
df=yka.getBatsmanDetails(team,name,dir=dir1)
yka.batsmanRunsVenue(df,name)
```

## B. Bowler functions

### 12. Get bowling details in IPL matches

The function below gets the overall team IPL T20 bowling details based on the CSV files that were saved for IPL T20 matches. This is currently also available in Github at yorkpyData. The IPL T20 bowling details of the IPL team in each match are computed, and a large data frame is created by stacking the individual dataframes. This can be saved as a CSV file, for example Chennai Super Kings-BowlingDetails.csv

```python
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
#kkr_bowling = yka.getTeamBowlingDetails("Kolkata Knight Riders",dir=dir1,save=True)
#csk_bowling = yka.getTeamBowlingDetails("Chennai Super Kings",dir=dir1,save=True)
#kxip_bowling = yka.getTeamBowlingDetails("Kings XI Punjab",dir=dir1,save=True)
```

### 13. Get bowling details of the individual IPL bowlers

This function is used to get the individual bowling record for a specified bowler of the team, as in the functions below. The plots below deal with the bowlers' performance. For this analysis I have chosen Amit Mishra, Piyush Chawla and Bhuvneshwar Kumar. You can choose any other IPL bowler

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
#df=yka.getBowlerWicketDetails(team,name,dir=dir1)
```

### 14. Bowler Economy Rate (in IPL matches)

The plots below show the economy rate of the selected bowlers

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanEconomyRate(df,name)

# 2. Piyush Chawla
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanEconomyRate(df,name)

# 3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanEconomyRate(df,name)
```

### 15. Bowler Mean Runs conceded (in IPL matches)

The plots below show the mean runs conceded by the selected bowlers

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanRunsConceded(df,name)

# 2. Piyush Chawla
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanRunsConceded(df,name)

# 3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMeanRunsConceded(df,name)
```

### 16. Moving average of wickets for bowler (in IPL matches)

The moving averages of wickets for the bowlers are plotted below

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMovingAverage(df,name)

# 2. Piyush Chawla
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMovingAverage(df,name)

# 3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerMovingAverage(df,name)
```

### 17. Cumulative average wickets for bowler (in IPL matches)

The cumulative average wickets for each bowler are computed and plotted

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgWickets(df,name)

# 2. Piyush Chawla
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgWickets(df,name)

# 3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgWickets(df,name)
```

### 18. Cumulative average economy rate for bowler (in IPL matches)

The plots below give the cumulative average economy rate for each bowler

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgEconRate(df,name)

# 2. Piyush Chawla
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgEconRate(df,name)

# 3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerCumulativeAvgEconRate(df,name)
```

### 19. Bowler wicket plot (in IPL matches)

The plots below give the overs vs wickets for the bowlers

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketPlot(df,name)

# 2. Piyush Chawla
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketPlot(df,name)

# 3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketPlot(df,name)
```

### 20. Bowler wickets against opposition (in IPL matches)

The performance of the bowlers against different IPL teams is shown below

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsAgainstOpposition(df,name)

# 2. Piyush Chawla
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsAgainstOpposition(df,name)

# 3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsAgainstOpposition(df,name)
```

### 21. Bowler wickets at different venues (in IPL matches)

The plots below show how the bowlers perform at different venues

```python
import pandas as pd
import os
import yorkpy.analytics as yka
dir1= "C:\\software\\cricket-package\\yorkpyIPLData\\data3"
# 1. Amit Mishra
name="A Mishra"
team='Delhi Daredevils'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsVenue(df,name)

# 2. Piyush Chawla
name="PP Chawla"
team='Kolkata Knight Riders'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsVenue(df,name)

# 3. Bhuvneshwar Kumar
name="B Kumar"
team='Sunrisers Hyderabad'
df=yka.getBowlerWicketDetails(team,name,dir=dir1)
yka.bowlerWicketsVenue(df,name)
```

Note: You can clone/download the code at Github yorkpy

Important note: Do check out my other posts using yorkpy at yorkpy-posts

Conclusion: This concludes the python package yorkpy. Go ahead and give yorkpy a spin!

To see all posts click Index of posts

# Pitching yorkpy … short of good length to IPL – Part 1

I fear not the man who has practiced 10,000 kicks once, but I fear the man who has practiced one kick 10,000 times.

Bruce Lee

I’ve missed more than 9000 shots in my career. I’ve lost almost 300 games. 26 times, I’ve been trusted to take the game winning shot and missed. I’ve failed over and over and over again in my life. And that is why I succeed.

Michael Jordan

Man, it doesn’t matter where you come in to bat, the score is still zero

Viv Richards

## Introduction

“If cricketr is to cricpy, then yorkr is to _____?”. Yes, you guessed it right, it is yorkpy. In this post, I introduce my 2nd python package, yorkpy, which is a python clone of my R package yorkr. This package is based on data from Cricsheet. yorkpy currently handles IPL T20 matches.
When I created cricpy, the python avatar of my R package cricketr (see Introducing cricpy:A python package to analyze performances of cricketers), I had decided that I should avoid doing a python avatar of my R package yorkr (see Introducing cricket package yorkr: Part 1- Beaten by sheer pace!), as it was more involved, and required the parsing of match data available as yaml files.

Just out of curiosity, I tried the python package ‘yaml’ to read the match data, and lo and behold, I was sucked into developing the package and so, yorkpy was born. Of course, it goes without saying that, usually when I am in the thick of developing something, I occasionally wonder why I am doing it, for whom and for what purpose. Maybe it is the joy of ideation, the problem-solving, the programmer’s high, or the sharing of my ideas. Anyway, whatever be the reason, I hope you enjoy this post and also find yorkpy useful.

You can clone/download the code at Github yorkpy

This post has been published to RPubs at yorkpy-Part1

You can download this post as PDF at IPLT20-yorkpy-part1

Note: If you would like to do a similar analysis for a different set of batsmen and bowlers, you can clone/download my skeleton yorkpy-template from Github (which is the R Markdown file I have used for the analysis below).

The IPL T20 functions in yorkpy are described in the sections below.

### 2. Install the package using ‘pip install’

import pandas as pd
import yorkpy.analytics as yka
#pip install yorkpy

### 3. Load a yaml file from Cricsheet

There are 2 functions that can be used to convert the IPL Twenty20 yaml files to pandas dataframes

1. convertYaml2PandasDataframeT20
2. convertAllYaml2PandasDataframesT20

Note 1: While I have already converted the IPL T20 files, you will need to use these functions for future IPL matches

### 4. Convert and save IPL T20 yaml file to pandas dataframe

This function will convert an IPL T20 yaml file, in the format specified by Cricsheet, to a pandas dataframe.
This will be saved as a CSV file in the target directory. The name of the file will have the following format: team1-team2-date.csv. The IPL T20 zip file can be downloaded from Indian Premier League matches. An example of how a yaml file can be converted to a dataframe and saved is shown below.

import pandas as pd
import yorkpy.analytics as yka
#convertYaml2PandasDataframe(".\\1082593.yaml","..\\ipl", "..\\data")

### 5. Convert and save all IPL T20 yaml files to dataframes

This function will convert all IPL T20 yaml files from a source directory to dataframes, and save them in the target directory, with the names as mentioned above. Since I have already done this, I will not be executing this again. You can download the zip of all the converted CSV files from Github at yorkpyData

import pandas as pd
import yorkpy.analytics as yka
#convertAllYaml2PandasDataframes("..\\ipl", "..\\data")

You can download the zip of the files and use it directly in the functions as follows. For the analysis below I have chosen a set of random IPL matches.

The randomly selected IPL T20 matches are

• Chennai Super Kings vs Kings XI Punjab, 2014-05-30
• Deccan Chargers vs Delhi Daredevils, 2012-05-10
• Gujarat Lions vs Mumbai Indians, 2017-04-29
• Kolkata Knight Riders vs Rajasthan Royals, 2010-04-17
• Rising Pune Supergiants vs Royal Challengers Bangalore, 2017-04-29

### 6. Team batting scorecard

The function below computes the batting scorecard of a team in an IPL match. The scorecard gives the balls faced, the runs scored, 4s, 6s and strike rate. The example below is based on the CSK KXIP match on 30 May 2014.
You can check against the actual scores in this match Chennai Super Kings-Kings XI Punjab-2014-05-30

import pandas as pd
import yorkpy.analytics as yka
csk_kxip=pd.read_csv(".\\Chennai Super Kings-Kings XI Punjab-2014-05-30.csv")
scorecard,extras=yka.teamBattingScorecardMatch(csk_kxip,"Chennai Super Kings")
print(scorecard)
##         batsman  runs  balls  4s  6s          SR
## 0      DR Smith     7     12   0   0   58.333333
## 1  F du Plessis     0      1   0   0    0.000000
## 2      SK Raina    87     26  12   6  334.615385
## 3   BB McCullum    11     16   0   0   68.750000
## 4     RA Jadeja    27     22   2   1  122.727273
## 5     DJ Hussey     1      3   0   0   33.333333
## 6      MS Dhoni    42     34   3   3  123.529412
## 7      R Ashwin    10     11   0   0   90.909091
## 8     MM Sharma     1      3   0   0   33.333333
print(extras)
##    total  wides  noballs  legbyes  byes  penalty  extras
## 0    428     14        3        5     5        0      27
print("\n\n")
scorecard1,extras1=yka.teamBattingScorecardMatch(csk_kxip,"Kings XI Punjab")
print(scorecard1)
##       batsman  runs  balls  4s  6s          SR
## 0    V Sehwag   122     62  12   8  196.774194
## 1     M Vohra    34     33   1   2  103.030303
## 2  GJ Maxwell    13      8   1   1  162.500000
## 3   DA Miller    38     19   5   1  200.000000
## 4   GJ Bailey     1      2   0   0   50.000000
## 5     WP Saha     6      4   0   1  150.000000
## 6  MG Johnson     1      1   0   0  100.000000
print(extras1)
##    total  wides  noballs  legbyes  byes  penalty  extras
## 0    428     14        3        5     5        0      27

Let’s take another random match between Gujarat Lions and Mumbai Indians on 29 Apr 2017

Gujarat Lions-Mumbai Indians-2017-04-29

import pandas as pd
gl_mi=pd.read_csv(".\\Gujarat Lions-Mumbai Indians-2017-04-29.csv")
import yorkpy.analytics as yka
scorecard,extras=yka.teamBattingScorecardMatch(gl_mi,"Gujarat Lions")
print(scorecard)
##          batsman  runs  balls  4s  6s          SR
## 0   Ishan Kishan    48     38   6   2  126.315789
## 1    BB McCullum     6      4   1   0  150.000000
## 2       SK Raina     1      3   0   0   33.333333
## 3       AJ Finch     0      3   0   0    0.000000
## 4     KD Karthik     2      9   0   0   22.222222
## 5      RA Jadeja    28     22   2   1  127.272727
## 6    JP Faulkner    21     29   2   0   72.413793
## 7      IK Pathan     2      3   0   0   66.666667
## 8         AJ Tye    25     12   2   2  208.333333
## 9   Basil Thampi     2      4   0   0   50.000000
## 10    Ankit Soni     7      2   0   1  350.000000
print(extras)
##    total  wides  noballs  legbyes  byes  penalty  extras
## 0    306      8        3        1     0        0      12
print("\n\n")
scorecard1,extras1=yka.teamBattingScorecardMatch(gl_mi,"Mumbai Indians")
print(scorecard1)
##             batsman  runs  balls  4s  6s          SR
## 0          PA Patel    70     45   9   1  155.555556
## 1        JC Buttler     9      7   2   0  128.571429
## 2            N Rana    19     16   1   1  118.750000
## 3         RG Sharma     5     13   0   0   38.461538
## 4        KA Pollard    15     11   2   0  136.363636
## 5         KH Pandya    29     20   2   1  145.000000
## 6         HH Pandya     4      5   0   0   80.000000
## 7   Harbhajan Singh     0      1   0   0    0.000000
## 8    MJ McClenaghan     1      1   0   0  100.000000
## 9         JJ Bumrah     0      1   0   0    0.000000
## 10       SL Malinga     0      1   0   0    0.000000
print(extras1)
##    total  wides  noballs  legbyes  byes  penalty  extras
## 0    306      8        3        1     0        0      12

### 7. Plot the team batting partnerships

The functions below plot the team batting partnerships in the match, showing what the partnerships were.

Note: Many of the plots include an additional parameter plot which is either True or False. The default value is plot=True. When plot=True the plot will be displayed. When plot=False the data frame will be returned to the user. The user can use this to create an interactive chart using one of the packages like rcharts, ggvis, googleVis or plotly.
import pandas as pd
import yorkpy.analytics as yka
dc_dd=pd.read_csv(".\\Deccan Chargers-Delhi Daredevils-2012-05-10.csv")
yka.teamBatsmenPartnershipMatch(dc_dd,'Deccan Chargers','Delhi Daredevils')
yka.teamBatsmenPartnershipMatch(dc_dd,'Delhi Daredevils','Deccan Chargers',plot=True)
# Print partnerships as a dataframe
rps_rcb=pd.read_csv(".\\Rising Pune Supergiant-Royal Challengers Bangalore-2017-04-29.csv")
m=yka.teamBatsmenPartnershipMatch(rps_rcb,'Royal Challengers Bangalore','Rising Pune Supergiant',plot=False)
print(m)
##            batsman     non_striker  runs
## 0   AB de Villiers         V Kohli     3
## 1         AF Milne         V Kohli     5
## 2        KM Jadhav         V Kohli     7
## 3           P Negi         V Kohli     3
## 4        S Aravind         V Kohli     0
## 5        S Aravind       YS Chahal     8
## 6         S Badree         V Kohli     2
## 7        STR Binny         V Kohli     1
## 8      Sachin Baby         V Kohli     2
## 9          TM Head         V Kohli     2
## 10         V Kohli  AB de Villiers    17
## 11         V Kohli        AF Milne     5
## 12         V Kohli       KM Jadhav     4
## 13         V Kohli          P Negi     9
## 14         V Kohli       S Aravind     2
## 15         V Kohli        S Badree     8
## 16         V Kohli     Sachin Baby     1
## 17         V Kohli         TM Head     9
## 18       YS Chahal       S Aravind     4

### 8. Batsmen vs Bowler

The function below computes and plots the performances of the batsmen vs the bowlers. As before the plot parameter can be set to True or False.
By default it is plot=True

import pandas as pd
import yorkpy.analytics as yka
gl_mi=pd.read_csv(".\\Gujarat Lions-Mumbai Indians-2017-04-29.csv")
yka.teamBatsmenVsBowlersMatch(gl_mi,"Gujarat Lions","Mumbai Indians", plot=True)
# Print the batsmen vs bowlers data as a dataframe
csk_kxip=pd.read_csv(".\\Chennai Super Kings-Kings XI Punjab-2014-05-30.csv")
m=yka.teamBatsmenVsBowlersMatch(csk_kxip,'Chennai Super Kings','Kings XI Punjab',plot=False)
print(m)
##          batsman           bowler  runs
## 0    BB McCullum         AR Patel     4
## 1    BB McCullum       GJ Maxwell     1
## 2    BB McCullum  Karanveer Singh     6
## 3      DJ Hussey          P Awana     1
## 4       DR Smith       MG Johnson     7
## 5       DR Smith          P Awana     0
## 6       DR Smith   Sandeep Sharma     0
## 7   F du Plessis       MG Johnson     0
## 8      MM Sharma         AR Patel     0
## 9      MM Sharma       MG Johnson     0
## 10     MM Sharma          P Awana     1
## 11      MS Dhoni         AR Patel    12
## 12      MS Dhoni  Karanveer Singh     2
## 13      MS Dhoni       MG Johnson    11
## 14      MS Dhoni          P Awana    15
## 15      MS Dhoni   Sandeep Sharma     2
## 16      R Ashwin         AR Patel     1
## 17      R Ashwin  Karanveer Singh     4
## 18      R Ashwin       MG Johnson     1
## 19      R Ashwin          P Awana     1
## 20      R Ashwin   Sandeep Sharma     3
## 21     RA Jadeja         AR Patel     5
## 22     RA Jadeja       GJ Maxwell     3
## 23     RA Jadeja  Karanveer Singh    19
## 24     RA Jadeja          P Awana     0
## 25      SK Raina       MG Johnson    21
## 26      SK Raina          P Awana    40
## 27      SK Raina   Sandeep Sharma    26

### 9. Bowling Scorecard

This function provides the bowling performance: the number of overs bowled, maidens, runs conceded, wickets taken and economy rate for the IPL match

import pandas as pd
import yorkpy.analytics as yka
dc_dd=pd.read_csv(".\\Deccan Chargers-Delhi Daredevils-2012-05-10.csv")
a=yka.teamBowlingScorecardMatch(dc_dd,'Deccan Chargers')
print(a)
##        bowler  overs  runs  maidens  wicket  econrate
## 0  AD Russell      4    39        0       0      9.75
## 1   IK Pathan      4    46        0       1     11.50
## 2    M Morkel      4    32        0       1      8.00
## 3    S Nadeem      4    39        0       0      9.75
## 4    VR Aaron      4    30        0       2      7.50
rps_rcb=pd.read_csv(".\\Rising Pune Supergiant-Royal Challengers Bangalore-2017-04-29.csv")
b=yka.teamBowlingScorecardMatch(rps_rcb,'Royal Challengers Bangalore')
print(b)
##               bowler  overs  runs  maidens  wicket  econrate
## 0          DL Chahar      2    18        0       0      9.00
## 1       DT Christian      4    25        0       1      6.25
## 2        Imran Tahir      4    18        0       3      4.50
## 3         JD Unadkat      4    19        0       1      4.75
## 4        LH Ferguson      4     7        1       3      1.75
## 5  Washington Sundar      2     7        0       1      3.50

### 10. Wicket Kind

The plots below provide the kind of wicket taken by the bowler (caught, bowled, lbw etc.) for the IPL match

import pandas as pd
import yorkpy.analytics as yka
kkr_rr=pd.read_csv(".\\Kolkata Knight Riders-Rajasthan Royals-2010-04-17.csv")
yka.teamBowlingWicketKindMatch(kkr_rr,'Kolkata Knight Riders','Rajasthan Royals')
csk_kxip=pd.read_csv(".\\Chennai Super Kings-Kings XI Punjab-2014-05-30.csv")
m = yka.teamBowlingWicketKindMatch(csk_kxip,'Chennai Super Kings','Kings XI Punjab',plot=False)
print(m)
##             bowler     kind  player_out
## 0         AR Patel  run out           1
## 1         AR Patel  stumped           1
## 2  Karanveer Singh  run out           1
## 3       MG Johnson   caught           1
## 4          P Awana   caught           2
## 5   Sandeep Sharma   bowled           1

### 11. Wicket vs Runs conceded

The plots below provide the wickets taken and the runs conceded by the bowler in the IPL T20 match

import pandas as pd
import yorkpy.analytics as yka
dc_dd=pd.read_csv(".\\Deccan Chargers-Delhi Daredevils-2012-05-10.csv")
yka.teamBowlingWicketMatch(dc_dd,"Deccan Chargers", "Delhi Daredevils",plot=True)
print("\n\n")
rps_rcb=pd.read_csv(".\\Rising Pune Supergiant-Royal Challengers Bangalore-2017-04-29.csv")
a=yka.teamBowlingWicketMatch(rps_rcb,"Royal Challengers Bangalore", "Rising Pune Supergiant",plot=False)
print(a)
##               bowler      player_out  kind
## 0       DT Christian         V Kohli     1
## 1        Imran Tahir        AF Milne     1
## 2        Imran Tahir          P Negi     1
## 3        Imran Tahir        S Badree     1
## 4         JD Unadkat         TM Head     1
## 5        LH Ferguson  AB de Villiers     1
## 6        LH Ferguson       KM Jadhav     1
## 7        LH Ferguson       STR Binny     1
## 8  Washington Sundar     Sachin Baby     1

### 12. Bowler Vs Batsmen

The functions compute and display how the different bowlers of the IPL team performed against the batting opposition.

import pandas as pd
import yorkpy.analytics as yka
csk_kxip=pd.read_csv(".\\Chennai Super Kings-Kings XI Punjab-2014-05-30.csv")
yka.teamBowlersVsBatsmenMatch(csk_kxip,"Chennai Super Kings","Kings XI Punjab")
print("\n\n")
kkr_rr=pd.read_csv(".\\Kolkata Knight Riders-Rajasthan Royals-2010-04-17.csv")
m =yka.teamBowlersVsBatsmenMatch(kkr_rr,"Rajasthan Royals","Kolkata Knight Riders",plot=False)
print(m)
##        batsman      bowler  runs
## 0     AC Voges    AB Dinda     1
## 1     AC Voges  JD Unadkat     1
## 2     AC Voges   LR Shukla     1
## 3     AC Voges    M Kartik     5
## 4     AJ Finch    AB Dinda     3
## 5     AJ Finch  JD Unadkat     3
## 6     AJ Finch   LR Shukla    13
## 7     AJ Finch    M Kartik     2
## 8     AJ Finch     SE Bond     0
## 9      AS Raut    AB Dinda     1
## 10     AS Raut  JD Unadkat     1
## 11    FY Fazal    AB Dinda     1
## 12    FY Fazal   LR Shukla     3
## 13    FY Fazal    M Kartik     3
## 14    FY Fazal     SE Bond     6
## 15     NV Ojha    AB Dinda    10
## 16     NV Ojha  JD Unadkat     5
## 17     NV Ojha   LR Shukla     0
## 18     NV Ojha    M Kartik     1
## 19     NV Ojha     SE Bond     2
## 20     P Dogra  JD Unadkat     2
## 21     P Dogra   LR Shukla     5
## 22     P Dogra    M Kartik     1
## 23     P Dogra     SE Bond     0
## 24  SK Trivedi    AB Dinda     4
## 25    SK Warne    AB Dinda     2
## 26    SK Warne    M Kartik     1
## 27    SK Warne     SE Bond     0
## 28   SR Watson    AB Dinda     2
## 29   SR Watson  JD Unadkat    13
## 30   SR Watson   LR Shukla     1
## 31   SR Watson    M Kartik    18
## 32   SR Watson     SE Bond    10
## 33   YK Pathan  JD Unadkat     1
## 34   YK Pathan   LR Shukla     7

### 13. Match worm chart

The plots below provide the match worm graph for the IPL Twenty 20 matches

import pandas as pd
import yorkpy.analytics as yka
dc_dd=pd.read_csv(".\\Deccan Chargers-Delhi Daredevils-2012-05-10.csv")
yka.matchWormChart(dc_dd,"Deccan Chargers", "Delhi Daredevils")
gl_mi=pd.read_csv(".\\Gujarat Lions-Mumbai Indians-2017-04-29.csv")
yka.matchWormChart(gl_mi,"Mumbai Indians","Gujarat Lions")

Feel free to clone/download the code from Github yorkpy

## Conclusion

This post covered all the yorkpy functions for a match between 2 IPL teams in IPL Twenty20 matches. As mentioned above, the yaml match files have already been converted to dataframes and are available for download from Github at yorkpyData

After having used Python and R for analytics, Machine Learning and Deep Learning, I have now realized that neither language is superior or inferior. Both have some good packages and some that are not so well suited.

To be continued. Watch this space!

Important note: Do check out my other posts using yorkpy at yorkpy-posts

To see all posts click Index of posts

# The 3rd paperback & kindle editions of my books on Cricket, now on Amazon

The 3rd paperback & kindle editions of both my books on cricket are now available on Amazon

a) Cricket analytics with cricketr, Third Edition. The paperback edition is $12.99 and the kindle edition is $4.99/Rs320. This book is based on my R package ‘cricketr‘, available on CRAN and uses ESPN Cricinfo Statsguru

b) Beaten by sheer pace! Cricket analytics with yorkr, 3rd edition. The paperback is $12.99 and the kindle version is $6.99/Rs448.
This is based on my R package ‘yorkr‘ on CRAN and uses data from Cricsheet

Pick up your copies today!!

Note: In the 3rd edition of the paperback book, the charts will be in black and white. If you would like the charts to be in color, please check out the 2nd edition of these books, see More book, more cricket! 2nd edition of my books now on Amazon

To see all posts see Index of posts

# Using Linear Programming (LP) for optimizing bowling change or batting lineup in T20 cricket

In my recent post, My travels through the realms of Data Science, Machine Learning, Deep Learning and (AI), I had recounted my journey in the domains of Data Science, Machine Learning (ML), and more recently Deep Learning (DL), all of which are useful while analyzing data. Of late, I have come to the realization that there are many facets to data. To glean insights from data, Data Science, ML and DL alone are not sufficient, and one also needs to have a good handle on linear programming and optimization. My colleague at IBM Research concurred with this view and told me he had arrived at this conclusion several years ago.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen and bowlers, besides evaluating team & match performances in Tests, ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr and Beaten by sheer pace-Cricket analytics with yorkr. A must read for any cricket lover! Check it out!!

While ML & DL are very useful for making inferences and predicting outputs from input variables, optimization computes the choice of inputs that maximizes or minimizes the output. So I made a small course correction and started on a course from India’s own NPTEL, Introduction to Linear Programming by Prof G. Srinivasan of IIT Madras (highly recommended!). The lectures are delivered with remarkable clarity by the Prof, and I was just about halfway through the course (each lecture is of 50-55 min duration) when I decided that I needed to try to formulate and solve some real world Linear Programming problem.

As usual, I turned towards cricket for some appropriate situations, and sure enough it was there in the open. For this LP formulation I take International T20 and IPL, though International ODI will also work equally well.  You can download the associated code and data for this from Github at LP-cricket-analysis

In T20 matches the captain has to choose how to rotate the bowlers with the aim of restricting the batting side. Conversely, the batsmen need to take advantage of the weaker bowlers to maximize the runs scored.

Note:
a) A simple and obvious strategy would be
– If the ith bowler’s economy rate is less than the economy rate of the jth bowler i.e.
$er_{i}$ < $er_{j}$ then have bowler ‘i’ to bowl more overs as his/her economy rate is better

b) A better strategy would be to consider the economy rate of each bowler against each batsman. How often have we witnessed bowlers with a great bowling average get thrashed time and again by the same batsman, or a bowler who is generally very poor being very effective against a particular batsman? That is, $er_{ij}$ < $er_{ik}$ when the jth bowler is more effective than the kth bowler against the ith batsman. This now becomes a linear optimization problem, as there are several possible combinations of number of overs X economy rate for the different bowlers, and we have to solve this algorithmically to determine the lowest score for the bowling side or the highest score for the batting order.

This post uses the latter approach to optimize bowling change and batting lineup.

Let us take a hypothetical situation
Assume there are 3 bowlers – $bwlr_{1},bwlr_{2},bwlr_{3}$
and there are 4 batsmen – $bman_{1},bman_{2},bman_{3},bman_{4}$

Let the economy rate $er_{ij}$ be the Economy Rate of the jth bowler to the ith batsman. Also if remaining overs for the bowlers are $o_{1},o_{2},o_{3}$
and the total number of overs left to be bowled are
$o_{1}+o_{2}+o_{3} = N$ then the question is

a) Given the economy rate of each bowler per batsman, how many overs should each bowler bowl, so that the total runs scored by all the batsmen are minimum?

b) Alternatively, if we know the individual strike rate of a batsman against the individual bowlers, how many overs should each batsman face from each bowler so that the total runs scored is maximized?

## 1. LP Formulation for bowling order

Let the economy rate $er_{ij}$ be the Economy Rate of the jth bowler to the ith batsman.
Objective function : Minimize –
$er_{11}*o_{11} + er_{12}*o_{12} +..+er_{1n}*o_{1n}+ er_{21}*o_{21} + er_{22}*o_{22}+..+ er_{2n}*o_{2n}+..+ er_{m1}*o_{m1}+..+ er_{mn}*o_{mn}$
i.e.
$\sum_{i=1}^{i=m}\sum_{j=1}^{j=n}er_{ij}*o_{ij}$
Constraints
Where $o_{j}$ is the number of overs remaining for the jth bowler, to be shared among the ‘m’ batsmen
$o_{1j} + o_{2j} + .. + o_{mj} \leq o_{j}$
and if the total number of overs remaining to be bowled is N then
$o_{1} + o_{2} +...+ o_{n} = N$ or
$\sum_{j=1}^{j=n} o_{j} =N$
The overs that any bowler can bowl is $o_{j} >=0$
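
To make the formulation concrete, here is a minimal Python sketch that evaluates the objective and checks the constraints for one candidate allocation. It is not part of the original analysis: the function names `total_runs` and `is_feasible`, the 2 x 2 numbers and the separate `total_overs` argument are assumptions made for illustration. The indexing follows the definition at the start of this section, with rows for batsmen (i) and columns for bowlers (j).

```python
def total_runs(er, overs):
    """Objective: sum of er[i][j] * overs[i][j] over batsmen i and bowlers j."""
    return sum(e * o
               for er_row, o_row in zip(er, overs)
               for e, o in zip(er_row, o_row))


def is_feasible(overs, bowler_caps, total_overs):
    """Check non-negativity, the per-bowler caps and the total overs bowled."""
    if any(o < 0 for row in overs for o in row):
        return False
    # Overs bowled by bowler j = sum of column j across all batsmen
    per_bowler = [sum(row[j] for row in overs) for j in range(len(bowler_caps))]
    if any(used > cap for used, cap in zip(per_bowler, bowler_caps)):
        return False
    return sum(per_bowler) == total_overs


# Illustrative numbers only (hypothetical, not taken from match data)
er = [[4, 2],     # batsman 1 against bowlers 1 and 2
      [2, 5]]     # batsman 2 against bowlers 1 and 2
overs = [[1, 2],
         [1, 1]]

print(total_runs(er, overs))
print(is_feasible(overs, bowler_caps=[3, 4], total_overs=5))
```

An LP solver searches over all feasible `overs` matrices for the one with the smallest `total_runs`; the sketch only scores a single candidate.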

## 2. LP Formulation for batting lineup

Let the strike rate $sr_{ij}$  be the Strike Rate of the ith batsman to the jth bowler
Objective function : Maximize –
$sr_{11}*o_{11} + sr_{12}*o_{12} +..+ sr_{1n}*o_{1n}+ sr_{21}*o_{21} + sr_{22}*o_{22}+..+ sr_{2n}*o_{2n}+..+ sr_{m1}*o_{m1}+..+ sr_{mn}*o_{mn}$
i.e.
$\sum_{i=1}^{i=m}\sum_{j=1}^{j=n}sr_{ij}*o_{ij}$
Constraints
Where $o_{j}$ is the number of overs remaining for the jth bowler, to be shared among the ‘m’ batsmen
$o_{1j} + o_{2j} + .. + o_{mj} \leq o_{j}$
and if the total number of overs remaining to be bowled is N then
$o_{1} + o_{2} +...+ o_{n} = N$ or
$\sum_{j=1}^{j=n} o_{j} =N$
The overs that any bowler can bowl is
$o_{j} >=0$

lpSolveAPI– For this maximization and minimization problem I used lpSolveAPI.

Below I take 2 simple examples (example1 & 2)  to ensure that my LP formulation and solution is correct before applying it on real T20 cricket data (Intl. T20 and IPL)

## 3. LP formulation (Example 1)

Initially I created a test example to ensure that I get the LP formulation and solution correct. Here er1=4 and er2=3 are the economy rates of bowlers 1 & 2, and o1 & o2 are the overs they bowl, with o1+o2=4. The possible combinations in this example are as below

o1 o2 Obj Fun(=4o1+3o2)
1    3      13
2    2      14
3    1      15

library(lpSolveAPI)
library(dplyr)
library(knitr)
lprec <- make.lp(0, 2)
a <-lp.control(lprec, sense="min")
set.objfn(lprec, c(4, 3))  # Economy Rate of 4 and 3 for er1 and er2
add.constraint(lprec, c(1, 1), "=",4)  # o1 + o2 =4
add.constraint(lprec, c(1, 0), ">=",1)  # o1 >= 1
add.constraint(lprec, c(0, 1), ">=",1)  # o2 >= 1
lprec
## Model name:
##             C1    C2
## Minimize     4     3
## R1           1     1   =  4
## R2           1     0  >=  1
## R3           0     1  >=  1
## Kind       Std   Std
## Type      Real  Real
## Upper      Inf   Inf
## Lower        0     0
b <-solve(lprec)
get.objective(lprec) # 13
## [1] 13
get.variables(lprec) # 1    3 
## [1] 1 3

Note 1: In the above example 13 runs is the minimum that can be scored and this requires

LP solution:
Minimum runs=13

• o1=1
• o2=3

Note 2: The numbers in the columns represent the number of overs that need to be bowled by a bowler to the corresponding batsman.
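
As a quick cross-check of the solver output above, the same small problem can be enumerated directly in Python. This is only a sketch: it swaps the LP solver for brute-force enumeration over whole overs, which works here because the optimum happens to be integral. The names `ER`, `TOTAL_OVERS`, `MIN_OVERS` and `best_allocation` are illustrative and not part of lpSolveAPI.

```python
from itertools import product

ER = (4, 3)        # economy rates er1 and er2 from Example 1
TOTAL_OVERS = 4    # o1 + o2 = 4
MIN_OVERS = 1      # each bowler bowls at least 1 over


def best_allocation():
    """Return (minimum runs, (o1, o2)) by trying every whole-over split."""
    best = None
    for o1, o2 in product(range(MIN_OVERS, TOTAL_OVERS + 1), repeat=2):
        if o1 + o2 != TOTAL_OVERS:
            continue
        runs = ER[0] * o1 + ER[1] * o2
        if best is None or runs < best[0]:
            best = (runs, (o1, o2))
    return best


print(best_allocation())  # (13, (1, 3))
```

The enumeration visits the same three rows as the hand-computed table and agrees with the LP solution of 13 runs.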

## 4. LP formulation (Example 2)

In this formulation there are 2 bowlers and 2 batsmen. o11, o12 are the overs bowled by bowler 1 to batsmen 1 & 2, and o21, o22 are the overs bowled by bowler 2 to batsmen 1 & 2. The economy rates are er11=4, er12=2, er21=2, er22=5 and o11+o12+o21+o22=5

The manually computed runs for the possible combinations of o11, o12, o21, o22 are given below

o11  o12  o21  o22  Runs=(4*o11+2*o12+2*o21+5*o22)
1    1    1    2    18
1    2    1    1    15
2    1    1    1    17
1    1    2    1    15

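lprec <- make.lp(0, 4)
a <-lp.control(lprec, sense="min")
set.objfn(lprec, c(4, 2,2,5))
# Constraints as listed in the model printed below
add.constraint(lprec, c(1,1,0,0), "<=",8)  # o11 + o12 <= 8
add.constraint(lprec, c(0,0,1,1), "<=",7)  # o21 + o22 <= 7
add.constraint(lprec, c(1,1,1,1), "=",5)   # o11 + o12 + o21 + o22 = 5
add.constraint(lprec, c(1,0,0,0), ">=",1)  # o11 >= 1
add.constraint(lprec, c(0,1,0,0), ">=",1)  # o12 >= 1
add.constraint(lprec, c(0,0,1,0), ">=",1)  # o21 >= 1
add.constraint(lprec, c(0,0,0,1), ">=",1)  # o22 >= 1
lprec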
## Model name:
##             C1    C2    C3    C4
## Minimize     4     2     2     5
## R1           1     1     0     0  <=  8
## R2           0     0     1     1  <=  7
## R3           1     1     1     1   =  5
## R4           1     0     0     0  >=  1
## R5           0     1     0     0  >=  1
## R6           0     0     1     0  >=  1
## R7           0     0     0     1  >=  1
## Kind       Std   Std   Std   Std
## Type      Real  Real  Real  Real
## Upper      Inf   Inf   Inf   Inf
## Lower        0     0     0     0
b<-solve(lprec)
get.objective(lprec) 
## [1] 15
get.variables(lprec) 
## [1] 1 2 1 1

Note: In the above example 15 runs is the minimum that can be scored and this requires

LP Solution:
Minimum runs=15

• o11=1
• o12=2
• o21=1
• o22=1

It is also possible to set the minimum overs to other values and solve the problem again.
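
The manual table for this example shows two allocations that both give 15 runs, while the solver reports only one of them. The sketch below, again using brute-force enumeration over whole overs rather than an LP solver, lists every allocation that attains the minimum. The constraint values are read off the printed model; the names `COEFFS` and `all_minimizers` are hypothetical.

```python
from itertools import product

COEFFS = (4, 2, 2, 5)   # er11, er12, er21, er22 from Example 2
TOTAL_OVERS = 5         # o11 + o12 + o21 + o22 = 5


def all_minimizers():
    """Return (minimum runs, list of every whole-over allocation attaining it)."""
    best_runs, winners = None, []
    # Each variable is at least 1, as in rows R4 to R7 of the printed model
    for o in product(range(1, TOTAL_OVERS + 1), repeat=4):
        o11, o12, o21, o22 = o
        if sum(o) != TOTAL_OVERS or o11 + o12 > 8 or o21 + o22 > 7:
            continue
        runs = sum(c * x for c, x in zip(COEFFS, o))
        if best_runs is None or runs < best_runs:
            best_runs, winners = runs, [o]
        elif runs == best_runs:
            winners.append(o)
    return best_runs, winners


print(all_minimizers())
```

When an LP has tied optima like this, the solver is free to return any one of them, which is why only 1 2 1 1 appears in the output above.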

## 5. LP formulation for International T20 India vs Australia (Batting lineup)

To analyze batting and bowling lineups in the cricket world I needed to get the ball-by-ball details of runs scored by each batsman against each of the bowlers. Fortunately I had already created this with my R package yorkr. yorkr processes yaml data from Cricsheet. So I copied the data of all matches between Australia and India in International T20s. You can download my processed data for International T20 at Inswinger

load("Australia-India-allMatches.RData")
dim(matches)
## [1] 3541   25

The following functions compute the ‘Strike Rate’ of a batsman against a bowler, expressed as runs per over

SR=(totalRuns/ballsFaced)*6

Also the Economy Rate is computed as

ER=(totalRuns/ballsBowled)*6

Incidentally, for a given batsman-bowler pair, SR=ER

# Compute the Strike Rate of the batsman
computeSR <- function(batsman1,bowler1){
a <- matches %>% filter(batsman==batsman1 & bowler==bowler1)
a1 <- a %>% summarize(totalRuns=sum(runs),count=n()) %>% mutate(SR=(totalRuns/count)*6)
a1
}

# Compute the Economy Rate of the bowler
computeER <- function(batsman1,bowler1){
a <- matches %>% filter(batsman==batsman1 & bowler==bowler1)
a1 <- a %>% summarize(totalRuns=sum(runs),count=n()) %>% mutate(ER=(totalRuns/count)*6)
a1
}
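
For readers following along in Python, the same runs-per-over calculation can be sketched without pandas or dplyr. The field names `batsman`, `bowler` and `runs` mirror the columns used in the R functions above; the function name `runs_per_over` and the ball-by-ball records are hypothetical and used only for illustration.

```python
def runs_per_over(deliveries, batsman, bowler):
    """Return (total_runs, balls, runs_per_over) for one batsman-bowler pair."""
    faced = [d for d in deliveries
             if d["batsman"] == batsman and d["bowler"] == bowler]
    if not faced:
        return None  # the pair never met in this data
    total_runs = sum(d["runs"] for d in faced)
    balls = len(faced)
    # Same quantity as (totalRuns/count)*6 in the R code
    return total_runs, balls, total_runs * 6 / balls


# Hypothetical ball-by-ball records, for illustration only
deliveries = [
    {"batsman": "Batsman A", "bowler": "Bowler X", "runs": r}
    for r in (1, 0, 4, 2, 0, 1)
] + [{"batsman": "Batsman A", "bowler": "Bowler Y", "runs": 6}]

print(runs_per_over(deliveries, "Batsman A", "Bowler X"))
```

With the totals reported in this post for Kohli against Watson (45 runs off 37 balls), the same arithmetic gives 45 * 6 / 37, or about 7.30 runs per over.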

Here I compute the Strike Rate of Virat Kohli, Yuvraj Singh and MS Dhoni against Shane Watson, Brett Lee and MA Starc

 # Kohli
kohliWatson<- computeSR("V Kohli","SR Watson")
kohliWatson
##   totalRuns count       SR
## 1        45    37 7.297297
kohliLee <- computeSR("V Kohli","B Lee")
kohliLee
##   totalRuns count       SR
## 1        10     7 8.571429
kohliStarc <- computeSR("V Kohli","MA Starc")
kohliStarc
##   totalRuns count       SR
## 1        11     9 7.333333
# Yuvraj
yuvrajWatson<- computeSR("Yuvraj Singh","SR Watson")
yuvrajWatson
##   totalRuns count       SR
## 1        24    22 6.545455
yuvrajLee <- computeSR("Yuvraj Singh","B Lee")
yuvrajLee
##   totalRuns count       SR
## 1        12     7 10.28571
yuvrajStarc <- computeSR("Yuvraj Singh","MA Starc")
yuvrajStarc
##   totalRuns count SR
## 1        12     8  9
# MS Dhoni
dhoniWatson<- computeSR("MS Dhoni","SR Watson")
dhoniWatson
##   totalRuns count       SR
## 1        33    28 7.071429
dhoniLee <- computeSR("MS Dhoni","B Lee")
dhoniLee
##   totalRuns count  SR
## 1        26    20 7.8
dhoniStarc <- computeSR("MS Dhoni","MA Starc")
dhoniStarc
##   totalRuns count   SR
## 1        11     8 8.25

When we consider the batting lineup, the problem is one of maximization. In the LP formulation below V Kohli has a SR of 7.29, 8.57, 7.33 against Watson, Lee & Starc,
Yuvraj has a SR of 6.54, 10.28, 9 against Watson, Lee & Starc
and Dhoni has a SR of 7.07, 7.8, 8.25 against Watson, Lee and Starc

The constraints are that Watson, Lee and Starc have 3, 4 & 3 overs remaining respectively, and the total number of overs remaining to be bowled is 9. Other constraints could be that a bowler bowls at least 1 over, etc.
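
Because the problem is small, it can also be cross-checked by enumeration. The sketch below is an assumption-laden illustration rather than the method used in this post: it replaces the LP solver with a brute-force search over whole overs, uses the rounded strike rates printed above, and mirrors the constraints as they are coded in the R listing that follows, where each cap of 3, 4 and 3 overs is applied to the three variables in one batsman's row (Kohli, Yuvraj, Dhoni). The names `SR`, `ROW_CAPS`, `MIN_OVERS` and `best_lineup` are hypothetical.

```python
from itertools import product

# Runs per over against (Watson, Lee, Starc), rounded values from the post
SR = {
    "Kohli":  (7.297297, 8.571429, 7.333333),
    "Yuvraj": (6.545455, 10.28571, 9.0),
    "Dhoni":  (7.071429, 7.8, 8.25),
}
ROW_CAPS = {"Kohli": 3, "Yuvraj": 4, "Dhoni": 3}   # caps as coded in the R constraints
MIN_OVERS = {"Kohli": (1, 0, 0), "Yuvraj": (1, 1, 0), "Dhoni": (1, 0, 0)}
TOTAL_OVERS = 9


def rows_for(name):
    """All whole-over rows for one batsman that respect the cap and lower bounds."""
    cap, lows = ROW_CAPS[name], MIN_OVERS[name]
    return [row for row in product(range(cap + 1), repeat=3)
            if sum(row) <= cap and all(o >= lo for o, lo in zip(row, lows))]


def best_lineup():
    """Return (maximum runs, overs per batsman) over all feasible allocations."""
    names = list(SR)
    best_runs, best_plan = float("-inf"), None
    for plan in product(*(rows_for(n) for n in names)):
        if sum(map(sum, plan)) != TOTAL_OVERS:
            continue
        runs = sum(s * o for n, row in zip(names, plan)
                   for s, o in zip(SR[n], row))
        if runs > best_runs:
            best_runs, best_plan = runs, dict(zip(names, plan))
    return best_runs, best_plan


runs, plan = best_lineup()
print(round(runs, 2), plan)
```

Enumeration only scales to toy sizes like this one; for anything larger the LP solver remains the practical choice.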

Formulating and solving

# 3 batsman x 3 bowlers
lprec <- make.lp(0, 9)
# Maximization
a<-lp.control(lprec, sense="max")

# Set the objective function
set.objfn(lprec, c(kohliWatson$SR, kohliLee$SR,kohliStarc$SR, yuvrajWatson$SR,yuvrajLee$SR,yuvrajStarc$SR,
dhoniWatson$SR,dhoniLee$SR,dhoniStarc$SR))

#Assume the bowlers have 3,4,3 overs left respectively
add.constraint(lprec, c(1, 1,1,0,0,0, 0,0,0), "<=",3)
add.constraint(lprec, c(0,0,0,1,1,1,0,0,0), "<=",4)
add.constraint(lprec, c(0,0,0,0,0,0,1,1,1), "<=",3)
#o11+o12+o13+o21+o22+o23+o31+o32+o33=9 (overs remaining)
add.constraint(lprec, c(1,1,1,1,1,1,1,1,1), "=",9)

add.constraint(lprec, c(1,0,0,0,0,0,0,0,0), ">=",1) #o11 >=1
add.constraint(lprec, c(0,1,0,0,0,0,0,0,0), ">=",0) #o12 >=0
add.constraint(lprec, c(0,0,1,0,0,0,0,0,0), ">=",0) #o13 >=0
add.constraint(lprec, c(0,0,0,1,0,0,0,0,0), ">=",1) #o21 >=1
add.constraint(lprec, c(0,0,0,0,1,0,0,0,0), ">=",1) #o22 >=1
add.constraint(lprec, c(0,0,0,0,0,1,0,0,0), ">=",0) #o23 >=0
add.constraint(lprec, c(0,0,0,0,0,0,1,0,0), ">=",1) #o31 >=1
add.constraint(lprec, c(0,0,0,0,0,0,0,1,0), ">=",0) #o32 >=0
add.constraint(lprec, c(0,0,0,0,0,0,0,0,1), ">=",0) #o33 >=0

lprec
## Model name:
##   a linear program with 9 decision variables and 13 constraints
b <-solve(lprec)
get.objective(lprec) #
## [1] 77.16418
get.variables(lprec) #
## [1] 1 2 0 1 3 0 1 0 1

This shows that the maximum runs that can be scored at the current strike rates is 77.16 runs in 9 overs. The breakup is as follows

get.variables(lprec) #
## [1] 1 2 0 1 3 0 1 0 1

This is also shown below

e <- as.data.frame(rbind(c(1,2,0,3),c(1,3,0,4),c(1,0,1,2)))
names(e) <- c("S Watson","B Lee","MA Starc","Overs")
rownames(e) <- c("Kohli","Yuvraj","Dhoni")
e

LP Solution:
Maximum runs that can be scored by India against Australia is 77.164 if the 9 overs to be faced by the batsmen are as below

##        S Watson B Lee MA Starc Overs
## Kohli         1     2        0     3
## Yuvraj        1     3        0     4
## Dhoni         1     0        1     2
#Total overs=9

Note: This assumes that the batsmen perform at their current Strike Rate. However anything can happen in a real game, but nevertheless this is a fairly reasonable estimate of the performance.

Note 2: The numbers in the columns represent the number of overs that need to be bowled by a bowler to the corresponding batsman.

Note 3: You could try other combinations of overs for the above SR. For the above constraints 77.16 is the highest score for the given number of overs.

## 6. LP formulation for International T20 India vs Australia (Bowling lineup)

For this I compute how the bowling should be rotated between R Ashwin, RA Jadeja and JJ Bumrah when taking into account their performance against batsmen like Shane Watson, AJ Finch and David Warner. For the bowling performance I take the Economy rate of the bowlers. The data is the same as above

computeER <- function(batsman1,bowler1){
    a <- matches %>% filter(batsman==batsman1 & bowler==bowler1)
    a1 <- a %>% summarize(totalRuns=sum(runs),count=n()) %>% mutate(ER=(totalRuns/count)*6)
    a1
}

# RA Jadeja
jadejaWatson<- computeER("SR Watson","RA Jadeja")
jadejaWatson
##   totalRuns count       ER
## 1        60    29 12.41379
jadejaFinch <- computeER("AJ Finch","RA Jadeja")
jadejaFinch
##   totalRuns count       ER
## 1        36    33 6.545455
jadejaWarner <- computeER("DA Warner","RA Jadeja")
jadejaWarner
##   totalRuns count       ER
## 1        23    11 12.54545

# Ashwin
ashwinWatson<- computeER("SR Watson","R Ashwin")
ashwinWatson
##   totalRuns count       ER
## 1        41    26 9.461538
ashwinFinch <- computeER("AJ Finch","R Ashwin")
ashwinFinch
##   totalRuns count   ER
## 1        63    36 10.5
ashwinWarner <- computeER("DA Warner","R Ashwin")
ashwinWarner
##   totalRuns count       ER
## 1        38    28 8.142857

# JJ Bumrah
bumrahWatson<- computeER("SR Watson","JJ Bumrah")
bumrahWatson
##   totalRuns count  ER
## 1        22    20 6.6
bumrahFinch <- computeER("AJ Finch","JJ Bumrah")
bumrahFinch
##   totalRuns count       ER
## 1        25    19 7.894737
bumrahWarner <- computeER("DA Warner","JJ Bumrah")
bumrahWarner
##   totalRuns count ER
## 1         2     4  3

As can be seen from above, RA Jadeja has an ER of 12.4, 6.54, 12.54 against Watson, AJ Finch and Warner. Ashwin has an ER of 9.46, 10.5, 8.14 against Watson, Finch and Warner. Similarly Bumrah has an ER of 6.6, 7.89, 3 against Watson, Finch and Warner.

The constraints are that Jadeja, Ashwin and Bumrah have 4, 3 & 4 overs remaining and the total overs remaining to be bowled is 10.

Formulating and solving the bowling lineup is shown below

lprec <- make.lp(0, 9)
a <-lp.control(lprec, sense="min")

# Set the objective function
set.objfn(lprec, c(jadejaWatson$ER, jadejaFinch$ER,jadejaWarner$ER,
ashwinWatson$ER,ashwinFinch$ER,ashwinWarner$ER, bumrahWatson$ER,bumrahFinch$ER,bumrahWarner$ER))

add.constraint(lprec, c(1,1,1,0,0,0,0,0,0), "<=",4)   # Jadeja has 4 overs left
add.constraint(lprec, c(0,0,0,1,1,1,0,0,0), "<=",3)   # Ashwin has 3 overs left
add.constraint(lprec, c(0,0,0,0,0,0,1,1,1), "<=",4)   # Bumrah has 4 overs left
add.constraint(lprec, c(1,1,1,1,1,1,1,1,1), "=",10) # Total overs = 10

lprec
## Model name:
##   a linear program with 9 decision variables and 13 constraints
b <-solve(lprec)
get.objective(lprec) #  
## [1] 73.58775
get.variables(lprec) # 
## [1] 1 2 1 0 1 1 0 1 3

The minimum runs that will be conceded by these 3 bowlers in 10 overs is 73.58 assuming the bowling is rotated as follows

e <- as.data.frame(rbind(c(1,0,0),c(2,1,1),c(1,1,3),c(4,2,4)))
names(e) <- c("RA Jadeja","R Ashwin","JJ Bumrah")
rownames(e) <- c("S Watson","AJ Finch","DA Warner","Overs")
e 

LP Solution:
Minimum runs that will be conceded by India against Australia is 73.58 in 10 overs if the overs bowled are as follows

##           RA Jadeja R Ashwin JJ Bumrah
## S Watson          1        0         0
## AJ Finch          2        1         1
## DA Warner         1        1         3
## Overs             4        2         4
#Total overs=10  
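The objective value reported above can be cross-checked by hand, since it is just the sum of each economy rate multiplied by the overs assigned to that bowler-batsman pair. Here is a minimal base-R sketch using the totalRuns and count values printed earlier; the names `runs`, `balls`, `er` and `overs` are my own and not part of the original code.

```r
# Runs conceded and balls bowled for each pair, in the same order as the
# objective function: Jadeja, Ashwin, Bumrah, each vs Watson, Finch, Warner
runs  <- c(60, 36, 23, 41, 63, 38, 22, 25, 2)
balls <- c(29, 33, 11, 26, 36, 28, 20, 19, 4)

# Economy rate = runs per ball scaled to a 6-ball over
er <- runs / balls * 6

# Solution vector as reported by get.variables(lprec)
overs <- c(1, 2, 1, 0, 1, 1, 0, 1, 3)

# Expected runs conceded = sum over pairs of (economy rate x overs bowled)
total <- sum(er * overs)
print(round(total, 5))
```

This should reproduce the 73.58775 returned by get.objective(lprec), which is a quick way to confirm that the table and the objective describe the same allocation.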

## 7. LP formulation for IPL (Mumbai Indians – Kolkata Knight Riders – Bowling lineup)

As in the case of International T20s, I have also processed IPL data derived from my R package yorkr. yorkr processes yaml data from Cricsheet. The processed data for all IPL matches can be downloaded from GooglyPlus

load("Mumbai Indians-Kolkata Knight Riders-allMatches.RData")
dim(matches)
## [1] 4237   25
# Compute the Economy Rate of the Mumbai Indian bowlers against Kolkata Knight Riders

# Gambhir
gambhirMalinga <- computeER("G Gambhir","SL Malinga")
gambhirHarbhajan <- computeER("G Gambhir","Harbhajan Singh")
gambhirPollard <- computeER("G Gambhir","KA Pollard")

#Yusuf Pathan
yusufMalinga <- computeER("YK Pathan","SL Malinga")
yusufHarbhajan <- computeER("YK Pathan","Harbhajan Singh")
yusufPollard <- computeER("YK Pathan","KA Pollard")

#JH Kallis
kallisMalinga <- computeER("JH Kallis","SL Malinga")
kallisHarbhajan <- computeER("JH Kallis","Harbhajan Singh")
kallisPollard <- computeER("JH Kallis","KA Pollard")

#RV Uthappa
uthappaMalinga <- computeER("RV Uthappa","SL Malinga")
uthappaHarbhajan <- computeER("RV Uthappa","Harbhajan Singh")
uthappaPollard <- computeER("RV Uthappa","KA Pollard")

Here

gambhirMalinga, yusufMalinga, kallisMalinga, uthappaMalinga are the ERs of Malinga against Gambhir, Yusuf Pathan, Kallis and Uthappa
gambhirHarbhajan, yusufHarbhajan, kallisHarbhajan, uthappaHarbhajan are the ERs of Harbhajan against Gambhir, Yusuf Pathan, Kallis and Uthappa
gambhirPollard, yusufPollard, kallisPollard, uthappaPollard are the ERs of Kieron Pollard against Gambhir, Yusuf Pathan, Kallis and Uthappa
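Each of these values comes down to runs conceded per ball, scaled to a six-ball over. The sketch below is a base-R counterpart of computeER run on a tiny made-up data frame. The rows in `toy` and the helper name `economy_rate` are hypothetical; only the column names batsman, bowler and runs are taken from the function shown earlier.

```r
# Hypothetical ball-by-ball rows (NOT Cricsheet data)
toy <- data.frame(
  batsman = c("A", "A", "A", "A", "A", "A", "B", "B"),
  bowler  = c("X", "X", "X", "X", "X", "X", "X", "Y"),
  runs    = c(1, 4, 0, 6, 0, 1, 2, 3)
)

# Base-R counterpart of computeER: runs per ball, scaled to a 6-ball over
economy_rate <- function(df, batsman1, bowler1) {
  rows <- df[df$batsman == batsman1 & df$bowler == bowler1, ]
  total_runs <- sum(rows$runs)
  count <- nrow(rows)
  data.frame(totalRuns = total_runs, count = count, ER = total_runs / count * 6)
}

res <- economy_rate(toy, "A", "X")
print(res)
```

Note that if a batsman and bowler never faced each other, count is 0 and the rate becomes NaN, so such pairs would need to be handled before the value goes into the objective function.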

The constraints are that Malinga, Harbhajan and Pollard have 4 overs each and the remaining overs to be bowled is 10.

Formulating and solving this for the bowling lineup of Mumbai Indians against Kolkata Knight Riders

 library("lpSolveAPI")
lprec <- make.lp(0, 12)
a=lp.control(lprec, sense="min")

set.objfn(lprec, c(gambhirMalinga$ER, yusufMalinga$ER,kallisMalinga$ER,uthappaMalinga$ER,
gambhirHarbhajan$ER,yusufHarbhajan$ER,kallisHarbhajan$ER,uthappaHarbhajan$ER,
gambhirPollard$ER,yusufPollard$ER,kallisPollard$ER,uthappaPollard$ER))

lprec
## Model name:
##   a linear program with 12 decision variables and 16 constraints
 b=solve(lprec)
get.objective(lprec) #  
## [1] 55.57887
 get.variables(lprec) # 
##  [1] 3 1 0 0 0 1 0 1 3 1 0 0
e <- as.data.frame(rbind(c(3,1,0,0,4),c(0, 1, 0,1,2),c(3, 1, 0,0,4)))
names(e) <- c("Gambhir","Yusuf","Kallis","Uthappa","Overs")
rownames(e) <- c("Malinga","Harbhajan","Pollard")
e

LP Solution: Mumbai Indians can restrict Kolkata Knight Riders to 55.57 in 10 overs
if the overs are bowled as below

##           Gambhir Yusuf Kallis Uthappa Overs
## Malinga         3     1      0       0     4
## Harbhajan       0     1      0       1     2
## Pollard         3     1      0       0     4
#Total overs=10  
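Rather than typing the table by hand, the solution vector can be reshaped directly, which avoids transcription slips. This is a small base-R sketch that assumes the variables are ordered bowler by bowler, exactly as in the set.objfn call above; the name `alloc` is mine.

```r
# Solution vector from get.variables(lprec), ordered bowler by bowler
x <- c(3, 1, 0, 0, 0, 1, 0, 1, 3, 1, 0, 0)

# Fill row by row so each row is one bowler and each column one batsman
alloc <- matrix(x, nrow = 3, byrow = TRUE,
                dimnames = list(c("Malinga", "Harbhajan", "Pollard"),
                                c("Gambhir", "Yusuf", "Kallis", "Uthappa")))

# Append the overs bowled by each bowler as a final column
e <- as.data.frame(cbind(alloc, Overs = rowSums(alloc)))
print(e)
```

The byrow = TRUE argument matters here: without it R fills the matrix column by column and the overs would be attributed to the wrong bowler-batsman pairs.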

## 8. LP formulation for IPL (Mumbai Indians – Kolkata Knight Riders – Batting lineup)

As I mentioned, it is possible to perform a maximization with the same formulation since computeSR<==>computeER

This just flips the problem around and computes the maximum runs that can be scored for the batsman’s Strike rate (this is same as the bowler’s Economy rate) i.e.

gambhirMalinga, yusufMalinga, kallisMalinga, uthappaMalinga are the SRs of Gambhir, Yusuf Pathan, Kallis and Uthappa against Malinga
gambhirHarbhajan, yusufHarbhajan, kallisHarbhajan, uthappaHarbhajan are the SRs of Gambhir, Yusuf Pathan, Kallis and Uthappa against Harbhajan
gambhirPollard, yusufPollard, kallisPollard, uthappaPollard are the SRs of Gambhir, Yusuf Pathan, Kallis and Uthappa against Kieron Pollard.

The constraints are that Malinga, Harbhajan and Pollard have 4 overs each and the remaining overs to be bowled is 11.
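Since the bowling and batting problems differ only in the optimization sense, both can be set up by one small helper. The sketch below is an illustration only: `solve_lineup` is a hypothetical helper, the rates are made-up numbers rather than the real ERs, and it adds only the per-bowler caps and the total-overs row, with every variable left at lpSolveAPI's default lower bound of 0. Its results will therefore not match the figures in this post.

```r
library(lpSolveAPI)

# Solve the over-allocation LP for rates laid out bowler by bowler
# (all batsmen for bowler 1, then bowler 2, ...). Hypothetical helper,
# not part of yorkr or lpSolveAPI.
solve_lineup <- function(rates, sense, caps, total) {
  n <- length(rates)
  per_bowler <- n / length(caps)
  lp <- make.lp(0, n)
  invisible(lp.control(lp, sense = sense))
  set.objfn(lp, rates)
  # One "<=" row per bowler limiting that bowler's overs
  for (b in seq_along(caps)) {
    row <- rep(0, n)
    row[((b - 1) * per_bowler + 1):(b * per_bowler)] <- 1
    add.constraint(lp, row, "<=", caps[b])
  }
  # All remaining overs must be bowled
  add.constraint(lp, rep(1, n), "=", total)
  status <- solve(lp)
  list(status = status, value = get.objective(lp), overs = get.variables(lp))
}

# Made-up rates for 3 bowlers x 4 batsmen (NOT the real Cricsheet figures)
rates <- c(6, 7, 8, 9,   7.5, 6.5, 8.5, 9.5,   5, 10, 11, 12)

bowling <- solve_lineup(rates, "min", caps = c(4, 4, 4), total = 10)
batting <- solve_lineup(rates, "max", caps = c(4, 4, 4), total = 11)
print(bowling$value)
print(batting$value)
```

With these made-up rates the minimum works out to 57 and the maximum to 113. Because there are no per-pair minimums, the solver simply gives every bowler's overs to his single best (or worst) match-up, which is a reminder of why lower-bound constraints like the `>=` rows in the India vs Australia batting example matter.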

 library("lpSolveAPI")
lprec <- make.lp(0, 12)
a=lp.control(lprec, sense="max")

a <-set.objfn(lprec, c(gambhirMalinga$ER, yusufMalinga$ER,kallisMalinga$ER,uthappaMalinga$ER,
gambhirHarbhajan$ER,yusufHarbhajan$ER,kallisHarbhajan$ER,uthappaHarbhajan$ER,
gambhirPollard$ER,yusufPollard$ER,kallisPollard$ER,uthappaPollard$ER))

lprec
## Model name:
##   a linear program with 12 decision variables and 16 constraints
 b=solve(lprec)
get.objective(lprec) #  
## [1] 94.22649
 get.variables(lprec) # 
##  [1] 0 3 0 0 0 1 0 3 0 1 3 0
e <- as.data.frame(rbind(c(0,3,0,0,3),c(0, 1, 0,3,4),c(0, 1, 3,0,4)))
names(e) <- c("Gambhir","Yusuf","Kallis","Uthappa","Overs")
rownames(e) <- c("Malinga","Harbhajan","Pollard")
e

LP Solution: Kolkata Knight Riders can score a maximum of 94.22 in 11 overs against Mumbai Indians
if the number of overs KKR face is as below

##           Gambhir Yusuf Kallis Uthappa Overs
## Malinga         0     3      0       0     3
## Harbhajan       0     1      0       3     4
## Pollard         0     1      3       0     4
#Total overs=11  
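A reported solution can also be checked against the bowler caps without re-running the solver. In the base-R sketch below the cap rows are built with kronecker(), which gives a 3 x 12 matrix with a block of ones for each bowler; the names `A`, `caps` and `overs_per_bowler` are my own.

```r
# Solution reported for the batting problem, ordered bowler by bowler
x <- c(0, 3, 0, 0, 0, 1, 0, 3, 0, 1, 3, 0)

# 3 x 12 matrix: row b has ones over the 4 variables belonging to bowler b
A <- kronecker(diag(3), matrix(1, nrow = 1, ncol = 4))

# Overs bowled by Malinga, Harbhajan and Pollard respectively
overs_per_bowler <- drop(A %*% x)
caps <- c(4, 4, 4)

within_caps <- all(overs_per_bowler <= caps)
total_overs <- sum(x)
print(overs_per_bowler)
print(within_caps)
print(total_overs)
```

For this solution the per-bowler totals are 3, 4 and 4, all within the cap of 4 overs, and the total is 11 overs, in line with the table above.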

Conclusion: It is thus possible to determine the optimum number of overs to give to a specific bowler based on his/her Economy Rate against a particular batsman. Similarly, one can determine the maximum runs that can be scored by batsmen based on their strike rates against bowlers. Cricket, like many other games, is a game of strategy, skill, talent and some amount of luck. So while the LP formulation can provide some direction, one must be aware that anything could happen in a game of cricket!