GooglyPlus: yorkr analyzes IPL players, teams, matches with plots and tables

In this post I introduce my new Shiny app,“GooglyPlus”, which is a  more evolved version of my earlier Shiny app “Googly”. My R package ‘yorkr’,  on which both these Shiny apps are based, has the ability to output either a dataframe or plot, depending on a parameter plot=TRUE or FALSE. My initial version of the app only included plots, and did not exercise the yorkr package fully. Moreover, I am certain, there may be a set of cricket aficionados who would prefer, numbers to charts. Hence I have created this enhanced version of the Googly app and appropriately renamed it as GooglyPlus. GooglyPlus is based on the yorkr package which uses data from Cricsheet. The app is based on IPL data from  all IPL matches from 2008 up to 2016. Feel free to clone/fork or download the code from Github at GooglyPlus.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen, bowlers besides evaluating team & match performances in Tests , ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr  and Beaten by sheer pace-Cricket analytics with yorkr  A must read for any cricket lover! Check it out!!

1

 

Click  GooglyPlus to access the Shiny app!

The changes for GooglyPlus over the earlier Googly app is only in the following 3 tab panels

  • IPL match
  • Head to head
  • Overall Performance

The analysis of IPL batsman and IPL bowler tabs are unchanged. These charts are as they were before.

The changes are only in  tabs i) IPL match ii) Head to head and  iii) Overall Performance. New functionality has been added and existing functions now have the dual option of either displaying a plot or a table.

The changes are

A) IPL Match
The following additions/enhancements have been done

-Match Batting Scorecard – Table
-Batting Partnerships – Plot, Table (New)
-Batsmen vs Bowlers – Plot, Table(New)
-Match Bowling Scorecard   – Table (New)
-Bowling Wicket Kind – Plot, Table (New)
-Bowling Wicket Runs – Plot, Table (New)
-Bowling Wicket Match – Plot, Table (New)
-Bowler vs Batsmen – Plot, Table (New)
-Match Worm Graph – Plot

B) Head to head
The following functions have been added/enhanced

-Team Batsmen Batting Partnerships All Matches – Plot, Table {Summary (New) and Detailed (New)}
-Team Batting Scorecard All Matches – Table (New)
-Team Batsmen vs Bowlers all Matches – Plot, Table (New)
-Team Wickets Opposition All Matches – Plot, Table (New)
-Team Bowling Scorecard All Matches – Table (New)
-Team Bowler vs Batsmen All Matches – Plot, Table (New)
-Team Bowlers Wicket Kind All Matches – Plot, Table (New)
-Team Bowler Wicket Runs All Matches – Plot, Table (New)
-Win Loss All Matches – Plot

C) Overall Performance
The following additions/enhancements have been done in this tab

-Team Batsmen Partnerships Overall – Plot, Table {Summary (New) and Detailed (New)}
-Team Batting Scorecard Overall –Table (New)
-Team Batsmen vs Bowlers Overall – Plot, Table (New)
-Team Bowler vs Batsmen Overall – Plot, Table (New)
-Team Bowling Scorecard Overall – Table (New)
-Team Bowler Wicket Kind Overall – Plot, Table (New)

Included below are some random charts and tables. Feel free to explore the Shiny app further

1) IPL Match
a) Match Batting Scorecard (Table only)
This is the batting score card for the Chennai Super Kings & Deccan Chargers 2011-05-11

untitled

b)  Match batting partnerships (Plot)
Delhi Daredevils vs Kings XI Punjab – 2011-04-23

untitled

c) Match batting partnerships (Table)
The same batting partnership  Delhi Daredevils vs Kings XI Punjab – 2011-04-23 as a table

untitled

d) Batsmen vs Bowlers (Plot)
Kolkata Knight Riders vs Mumbai Indians 2010-04-19

Untitled.png

e)  Match Bowling Scorecard (Table only)
untitled

B) Head to head

a) Team Batsmen Partnership (Plot)
Deccan Chargers vs Kolkata Knight Riders all matches

untitled

b)  Team Batsmen Partnership (Summary – Table)
In the following tables it can be seen that MS Dhoni has performed better that SK Raina  CSK against DD matches, whereas SK Raina performs better than Dhoni in CSK vs  KKR matches

i) Chennai Super Kings vs Delhi Daredevils (Summary – Table)

untitled

ii) Chennai Super Kings vs Kolkata Knight Riders (Summary – Table)
untitled

iii) Rising Pune Supergiants vs Gujarat Lions (Detailed – Table)
This table provides the detailed partnership for RPS vs GL all matches

untitled

c) Team Bowling Scorecard (Table only)
This table gives the bowling scorecard of Pune Warriors vs Deccan Chargers in all matches

untitled

C) Overall performances
a) Batting Scorecard All Matches  (Table only)

This is the batting scorecard of Royal Challengers Bangalore. The top 3 batsmen are V Kohli, C Gayle and AB Devilliers in that order

untitled

b) Batsman vs Bowlers all Matches (Plot)
This gives the performance of Mumbai Indian’s batsman of Rank=1, which is Rohit Sharma, against bowlers of all other teams

untitled

c)  Batsman vs Bowlers all Matches (Table)
The above plot as a table. It can be seen that Rohit Sharma has scored maximum runs against M Morkel, then Shakib Al Hasan and then UT Yadav.

untitled

d) Bowling scorecard (Table only)
The table below gives the bowling scorecard of CSK. R Ashwin leads with a tally of 98 wickets followed by DJ Bravo who has 88 wickets and then JA Morkel who has 83 wickets in all matches against all teams

Untitled.png

This is just a random selection of functions. Do play around with the app and checkout how the different IPL batsmen, bowlers and teams stack against each other. Do read my earlier post Googly: An interactive app for analyzing IPL players, matches and teams using R package yorkr  for more details about the app and other functions available.

Click GooglyPlus to access the Shiny app!

You can clone/fork/download the code from Github at GooglyPlus

Hope you have fun playing around with the Shiny app!

Note: In the tabs, for some of the functions, not all controls  are required. It is possible to enable the controls selectively but this has not been done in this current version. I may make the changes some time in the future.

Take a look at my other Shiny apps
a.Revisiting crimes against women in India
b. Natural language processing: What would Shakespeare say?

Check out some of my other posts
1. Analyzing World Bank data with WDI, googleVis Motion Charts
2. Video presentation on Machine Learning, Data Science, NLP and Big Data – Part 1
3. Singularity
4. Design principles of scalable, distributed systems
5. Simulating an Edge shape in Android
6. Dabbling with Wiener filter in OpenCV

To see all posts click Index of Posts

The making of cricket package yorkr – Part 1

Introduction

Here is a sneak preview of my latest package cricket package yorkr in R. My earlier package ‘cricketr’ (see Introducing cricketr: An R package for analyzing performances of cricketers) was based on data from ESPN Cricinfo Statsguru. My current package ‘yorkr’ is based on data from Cricsheet. The data for Test, ODI, Twenty20 matches in Cricheet are formatted as yaml files.

While the data available from ESPN Cricinfo Statsguru is a summary of the player’s performances, Cricsheet data is more detailed and granular. Cricsheet gives a ball-by-ball detail for each match as can be seen from the above website. Hence the type of analyses possible can be much more detailed and richer. Some cool functions in this package, include charts for batsman partnerships, performance of batsman against bowlers and how bowlers fared against batsman for a single ODI match or for all ODI matches between 2 opposing sides (for e.g Australia-India or West Indies-Sri Lanka)

This current post includes my first stab at analysing ODI data from Cricsheet. To do this I had to parse the Yaml files and flatten them out as data frames. That was a fairly involved task and I think I now have done it. I then perform analyses on these flattened 1000’s of data frames. This post contains my initial analyses of the ODI data from Cricsheet.

Since the package ‘yorkr’ is still work in progress. I will be adding more functions, refining existing functions and crossing t’s and dotting the i’s. I hope to have the yorkr package wrapped up in about 6-10 weeks time. The package and code should be available after that. Please ‘hold your horses’ till this time.

If you are passionate about cricket, and love analyzing cricket performances, then check out my 2 racy books on cricket! In my books, I perform detailed yet compact analysis of performances of both batsmen, bowlers besides evaluating team & match performances in Tests , ODIs, T20s & IPL. You can buy my books on cricket from Amazon at $12.99 for the paperback and $4.99/$6.99 respectively for the kindle versions. The books can be accessed at Cricket analytics with cricketr  and Beaten by sheer pace-Cricket analytics with yorkr  A must read for any cricket lover! Check it out!!

1

 

This report is also available at Rpubs at yorkr1 york1. The report can also be downloaded as a PDF document at yorkr-1.pdf

 

Checkout my interactive Shiny apps GooglyPlus (plots & tables) and Googly (only plots) which can be used to analyze IPL players, teams and matches.

Important note: Do check out all the posts on the python avatar of yorkr, namely ‘yorkpy’ in my post ‘Pitching yorkpy … short of good length to IPL – Part 1

The current set of functions developed fall into 4 main categories

  • batsmen performance in match
  • bowlers performance in match
  • batsmen performance against opposition
  • bowlers performance against opposition

In the first part of the post I have taken an single Australia-India ODI match on 24 Feb 2008 at Sydney. (For details on this match look up Australia – India, Sydney)

The second part of the past looks at all ODI matches between Australia-India (there are 40 ODI matches between India and Australia)

While this post analyses 1 ODI match and all matches between 2 opposing sides (Australia vs India), the functions developed in yorkr(Part 1) can be used for any of 1000+ ODI matches and any combination of opposing countries!!!

So without much ado let me dive into the functions created

library(dplyr)
library(ggplot2)
library(yorkr)

Get the match details (Aus-Ind,24 Feb 2008,Sydney)

match <- getMatchDetails()

Team batting performances of the opposing teams

In this post I pick a ODI match played between India and Australia on 24 Feb 2008 at Sydney.

1. Team batting details (ODI Match)

This function gives the overall scores of the team for which the function is invoked

Team batting details (ODI Match)
This function gives the overall scores of the team for which the function is invoked

teamBattingDetailsMatch(match,"India")
## Total= 272
## Source: local data frame [11 x 5]
## 
##            batsman ballsPlayed fours sixes  runs
##             (fctr)       (int) (dbl) (dbl) (dbl)
## 1         V Sehwag          18     3     0    17
## 2     SR Tendulkar           3     0     0     2
## 3        G Gambhir         118     9     1   113
## 4        RG Sharma           3     0     0     1
## 5     Yuvraj Singh           3     1     0     5
## 6         MS Dhoni          64     4     0    36
## 7       RV Uthappa          40     4     1    51
## 8        IK Pathan          20     2     0    22
## 9  Harbhajan Singh          11     3     0    20
## 10     S Sreesanth           4     0     0     3
## 11        I Sharma           3     0     0     2
teamBattingDetailsMatch(match,"Australia")
## Total= 303
## Source: local data frame [7 x 5]
## 
##        batsman ballsPlayed fours sixes  runs
##         (fctr)       (int) (dbl) (dbl) (dbl)
## 1 AC Gilchrist           7     3     0    16
## 2    ML Hayden          61     5     1    54
## 3   RT Ponting         132     7     1   124
## 4    MJ Clarke          38     0     0    31
## 5    A Symonds          48     6     2    59
## 6   MEK Hussey          10     1     0    15
## 7     JR Hopes           3     0     0     4

2. Batsmen partnership (ODI Match)

The plot below shows the partnerships between batsman. Gautham Gambhir scored the highest followed by Uthappa. Gambhir had a good partnership with Sehway, Dhoni and Uthappa. On the Australian side Ponting had a good partnership with Hayden,Clarke and Symonds.

batsmenPartnershipMatch(match,"India")

partnershipmatch-1

batsmenPartnershipMatch(match,"Australia")

partnershipmatch-2

3. Batsmen vs Bowlers (ODI Match)

This chart shows how each batsman fared against the bowlers. Gambhir scored maximum from Hogg and Clarke. Ponting scores maximum from Pathan, Ishant Sharma, Sreesanth.

batsmenVsBowlersMatch(match,"India")

batsmenbowler-1

batsmenVsBowlersMatch(match,"Australia")

batsmenbowler-2

4. Team bowling details (ODI Match)

The table gives bowling details of each team

teamBowlingDetailsMatch(match,"India")
## Source: local data frame [6 x 5]
## 
##       bowler overs maidens  runs wickets
##       (fctr) (int)   (int) (dbl)   (dbl)
## 1      B Lee    10       2    58       5
## 2 NW Bracken    10       0    53       1
## 3   SR Clark    10       0    55       2
## 4   JR Hopes     6       0    27       1
## 5    GB Hogg     9       0    62       1
## 6  MJ Clarke     5       0    33       0
teamBowlingDetailsMatch(match,"Australia")
## Source: local data frame [6 x 5]
## 
##            bowler overs maidens  runs wickets
##            (fctr) (int)   (int) (dbl)   (dbl)
## 1     S Sreesanth     8       0    58       2
## 2        I Sharma    10       0    65       1
## 3       IK Pathan     9       0    73       0
## 4 Harbhajan Singh     9       0    50       2
## 5        V Sehwag     6       0    28       2
## 6    Yuvraj Singh     8       0    38       0

5. Wicket kind (ODI Match)

This chart gives the wicket kind or the type of wicket for the bowler vs the runs scored

teamBowlingWicketKindMatch(match,"India")

wicketKindmatch-1

teamBowlingWicketKindMatch(match,"Australia")

wicketKindmatch-2

6. Wickets Runs (ODI Match)l

This plot gives the number of wickets taken and the runs conceded by the bowler

teamBowlingWicketRunsMatch(match,"India")

wicketRunsMatch-1

teamBowlingWicketRunsMatch(match,"Australia")

wicketRunsMatch-2

7. Wicket (batsman) and total runs scored (ODI Match)

This plot gives the details of the wickets taken and the runs conceded. Brett Lee has the performance with 5 scalps. On the Indian side Sreesanth, Harbhajan and Sehwag have 2 wickets apiece. Sreesanth is the most expensive,

teamBowlingWicketMatch(match,"India")

wicketMatch-1

teamBowlingWicketMatch(match,"Australia")

wicketMatch-2

8. Bowler vs Batsman (ODI Match)

This plot below shows which of the batsman was most brutal against the bowler or who scored the most against the bowler. Ponting scores most against Pathan.

bowlersVsBatsmanMatch(match,"India")


batsmanMatch,-1

bowlersVsBatsmanMatch(match,"Australia")


batsmanMatch-2

9.

Worm graph (ODI Match) This chart gives the match worm of runs scored against the number deliveries.

matchWormGraph(match,team1="Australia",team2="India")

worm-1

The following charts show the performances of the batsmen and against the opposition. In this case I have chosen India and Australia. Hence the plots below show the best performers(batsmen and bowlers) of either team against their adversary. The below analyses are based on all ODI confrontations between Australia and India. There are a total of 40 head-on confrontations between Aus-India.

allMatches <- getOppositionDetails()

10.Batsman partnership against opposition (all ODI matches)

The report below gives the batsman who has had the best partnetship in Australia-India matches. On the Indian side the top 3 are Mahendra Singh Dhoni, Rohit Sharma followed by Tendulkar. Ponting, Hussey and Bailey are the top 3 for the Autralians. As far as ODI is concerned Dhoni towers over all others. Of course similar analyses can be done between India-Pakistan, India-South Africa etc. But at least against the Australians we need to have Dhoni and Rohit Sharma I think The report below gives a summary of the partnership runs

report <- batsmanPartnershipOppn(allMatches,"India",report="summary")
report
## Source: local data frame [44 x 2]
## 
##         batsman partnershipRuns
##          (fctr)           (dbl)
## 1      MS Dhoni            1156
## 2     RG Sharma             914
## 3  SR Tendulkar             910
## 4       V Kohli             902
## 5     G Gambhir             532
## 6  Yuvraj Singh             524
## 7      SK Raina             509
## 8      S Dhawan             471
## 9      V Sehwag             287
## 10   RV Uthappa             279
## ..          ...             ...
report <- batsmanPartnershipOppn(allMatches,"Australia",report="summary")
report
## Source: local data frame [48 x 2]
## 
##       batsman partnershipRuns
##        (fctr)           (dbl)
## 1  RT Ponting             876
## 2  MEK Hussey             753
## 3   GJ Bailey             610
## 4   SR Watson             609
## 5   MJ Clarke             607
## 6   ML Hayden             573
## 7   A Symonds             536
## 8    AJ Finch             525
## 9   SPD Smith             467
## 10  DA Warner             391
## ..        ...             ...

The report below gives a detailed breakup of the partnership runs

report <- batsmanPartnershipOppn(allMatches,"India",report="detailed")
report[1:40,]
##         batsman      nonStriker runs partnershipRuns
## 1      MS Dhoni    SR Tendulkar   71            1156
## 2      MS Dhoni        R Dravid   27            1156
## 3      MS Dhoni    Yuvraj Singh  128            1156
## 4      MS Dhoni        SK Raina  187            1156
## 5      MS Dhoni          M Kaif    6            1156
## 6      MS Dhoni        D Mongia   23            1156
## 7      MS Dhoni Harbhajan Singh   16            1156
## 8      MS Dhoni       IK Pathan   42            1156
## 9      MS Dhoni       G Gambhir  117            1156
## 10     MS Dhoni       RG Sharma   56            1156
## 11     MS Dhoni      RV Uthappa   51            1156
## 12     MS Dhoni     S Sreesanth   19            1156
## 13     MS Dhoni        I Sharma    4            1156
## 14     MS Dhoni         P Kumar    1            1156
## 15     MS Dhoni         V Kohli   78            1156
## 16     MS Dhoni       RA Jadeja  103            1156
## 17     MS Dhoni        R Ashwin   78            1156
## 18     MS Dhoni        R Sharma    2            1156
## 19     MS Dhoni   R Vinay Kumar   30            1156
## 20     MS Dhoni          Z Khan    6            1156
## 21     MS Dhoni       AM Rahane   47            1156
## 22     MS Dhoni       MK Pandey   34            1156
## 23     MS Dhoni Gurkeerat Singh    1            1156
## 24     MS Dhoni         B Kumar   26            1156
## 25     MS Dhoni        RR Powar    3            1156
## 26    RG Sharma    SR Tendulkar   66             914
## 27    RG Sharma    Yuvraj Singh    5             914
## 28    RG Sharma        SK Raina   69             914
## 29    RG Sharma        MS Dhoni   90             914
## 30    RG Sharma               4    0             914
## 31    RG Sharma       G Gambhir   35             914
## 32    RG Sharma         V Kohli  248             914
## 33    RG Sharma       RA Jadeja   13             914
## 34    RG Sharma        R Ashwin   11             914
## 35    RG Sharma        S Dhawan  247             914
## 36    RG Sharma       AM Rahane   77             914
## 37    RG Sharma       MK Pandey   53             914
## 38 SR Tendulkar        R Dravid   12             910
## 39 SR Tendulkar        V Sehwag  111             910
## 40 SR Tendulkar    Yuvraj Singh  173             910
report <- batsmanPartnershipOppn(allMatches,"Australia",report="detailed")
report[1:40,]
##       batsman   nonStriker runs partnershipRuns
## 1  RT Ponting    SR Watson  140             876
## 2  RT Ponting    DR Martyn   35             876
## 3  RT Ponting    MJ Clarke   63             876
## 4  RT Ponting    BJ Haddin   33             876
## 5  RT Ponting    ML Hayden  117             876
## 6  RT Ponting    A Symonds   41             876
## 7  RT Ponting   MEK Hussey   74             876
## 8  RT Ponting AC Gilchrist  113             876
## 9  RT Ponting     TD Paine   68             876
## 10 RT Ponting     CL White   84             876
## 11 RT Ponting    DA Warner    6             876
## 12 RT Ponting      MS Wade    9             876
## 13 RT Ponting    DJ Hussey   20             876
## 14 RT Ponting     SE Marsh   45             876
## 15 RT Ponting     BJ Hodge   28             876
## 16 MEK Hussey   RT Ponting   85             753
## 17 MEK Hussey    MJ Clarke   74             753
## 18 MEK Hussey    BJ Haddin   24             753
## 19 MEK Hussey      GB Hogg   19             753
## 20 MEK Hussey   MG Johnson   43             753
## 21 MEK Hussey     SR Clark    4             753
## 22 MEK Hussey    ML Hayden    5             753
## 23 MEK Hussey    A Symonds    5             753
## 24 MEK Hussey        B Lee   39             753
## 25 MEK Hussey   NW Bracken    3             753
## 26 MEK Hussey     JR Hopes   83             753
## 27 MEK Hussey     CL White  185             753
## 28 MEK Hussey    DA Warner   10             753
## 29 MEK Hussey      MS Wade   35             753
## 30 MEK Hussey    DJ Hussey   10             753
## 31 MEK Hussey   PJ Forrest   59             753
## 32 MEK Hussey     AC Voges   59             753
## 33 MEK Hussey MC Henriques   11             753
## 34  GJ Bailey    SR Watson   79             610
## 35  GJ Bailey    BJ Haddin    7             610
## 36  GJ Bailey            4    0             610
## 37  GJ Bailey    DA Warner    6             610
## 38  GJ Bailey     AJ Finch   22             610
## 39  GJ Bailey    SPD Smith  149             610
## 40  GJ Bailey   GJ Maxwell  133             610

11. Partnership runs against opposition (all ODI matches)

The chart below gives the overall partnership. It is graphical representation of the chart above.

batsmanPartnershipOppnChart(allMatches,"India")

partnershipOppnChart,-1

batsmanPartnershipOppnChart(allMatches,"Australia")

partnershipOppnChart,-2

12. Batsmen vs Bowlers against opposition (all ODI matches)

The chart below gives how the batsmen fared against the bowlers of the opposition.)

batsmanVsBowlersOppn(allMatches,"India")

batsmenVsBowlers,-1

batsmanVsBowlersOppn(allMatches,"Australia"

bowlersVsBatsmen,-2

13. Team batting details opposition (all ODI matches)

The table below gives the total runs scores by each batsman and is dsiplayed in descending order. Dhoni, Rohit Sharma and Tendulkar are the top 3 for India and Ponting, Hussey and Bailey lead for Australia

teamBattingDetailsOppn(allMatches,"India")
## Total= 8313
## Source: local data frame [44 x 5]
## 
##         batsman  runs fours sixes ballsPlayed
##          (fctr) (dbl) (int) (int)       (int)
## 1      MS Dhoni  1156    78    22        1406
## 2     RG Sharma   914    72    24        1015
## 3  SR Tendulkar   910   103     6        1157
## 4       V Kohli   902    87     6         961
## 5     G Gambhir   532    43     2         677
## 6  Yuvraj Singh   524    52    11         664
## 7      SK Raina   509    43    11         536
## 8      S Dhawan   471    55     6         470
## 9      V Sehwag   287    42     4         303
## 10   RV Uthappa   279    28     7         295
## ..          ...   ...   ...   ...         ...
teamBattingDetailsOppn(allMatches,"Australia")
## Total= 9993
## Source: local data frame [48 x 5]
## 
##       batsman  runs fours sixes ballsPlayed
##        (fctr) (dbl) (int) (int)       (int)
## 1  RT Ponting   876    86     8        1107
## 2  MEK Hussey   753    56     5         816
## 3   GJ Bailey   610    50    13         578
## 4   SR Watson   609    81    10         653
## 5   MJ Clarke   607    45     5         786
## 6   ML Hayden   573    72     8         660
## 7   A Symonds   536    43    15         543
## 8    AJ Finch   525    52     9         617
## 9   SPD Smith   467    44     7         431
## 10  DA Warner   391    40     6         385
## ..        ...   ...   ...   ...         ...

14. Bowler vs Batsman against opposition (all ODI matches)

The charts below give the performance of the bowlers against batsman

bowlersVsBatsmanOppn(allMatches,"India")

bowlersVsBatsmen,-1

bowlersVsBatsmanOppn(allMatches,"Australia")

bowlersVsBatsmen,-2

15. Bowling details against opposition (all ODI matches)

For matches between Australia and India the top 3 wicket takes for Australia are Mitchell Johnson, Brett Lee and JR Faulkner. For India it is Ishant Sharma, Harbhajan Singh and R A Jadeja.

teamBowlingDetailsOppn(allMatches,"India")
## Source: local data frame [39 x 5]
## 
##          bowler overs maidens  runs wickets
##          (fctr) (int)   (int) (dbl)   (dbl)
## 1    MG Johnson    40       0  1012      18
## 2         B Lee    21       1   667      15
## 3   JP Faulkner    33       0   598      13
## 4     SR Watson    24       0   532      12
## 5       GB Hogg    15       0   427      12
## 6      CJ McKay    17       0   403      12
## 7    NW Bracken    28       2   429      11
## 8      MA Starc    12       2   251      11
## 9      JR Hopes    18       0   346       8
## 10 DE Bollinger    11       4   174       8
## ..          ...   ...     ...   ...     ...
teamBowlingDetailsOppn(allMatches,"Australia")
## Source: local data frame [37 x 5]
## 
##             bowler overs maidens  runs wickets
##             (fctr) (int)   (int) (dbl)   (dbl)
## 1         I Sharma    44       1   739      20
## 2  Harbhajan Singh    40       0   926      15
## 3        RA Jadeja    39       0   867      14
## 4        IK Pathan    42       1   702      11
## 5         UT Yadav    37       2   606      10
## 6          P Kumar    27       0   501      10
## 7           Z Khan    33       1   500      10
## 8      S Sreesanth    34       0   454      10
## 9         R Ashwin    43       0   680       9
## 10   R Vinay Kumar    31       1   380       9
## ..             ...   ...     ...   ...     ...

16. Wicket kind against opposition (all ODI matches)

These charts give the wicket kind for each of the top 9 bowlers from each side.

teamBowlingWicketKindOppn(allMatches,"India")

wicketKindOppn-1

teamBowlingWicketKindOppn(allMatches,"Australia")

wicketKindOppn-2

17. Wicket runs against opposition (all ODI matches)

These given the runs conceded by the bowlers

teamBowlingWicketRunsOppn(allMatches,"India")

wicketRunsOppn-1

teamBowlingWicketRunsOppn(allMatches,"Australia")

wicketRunsOppn-2

18. Wickets against opposition (all ODI matches)

The charts below depict the wickets taken by each bowler. If you notice Mitchel Johnson has the most wickets.

teamBowlingWicketsOppn(allMatches,"India")

wicketOppn,-1

teamBowlingWicketsOppn(allMatches,"Australia")

wicketOppn,-2

Conclusion :

Some key findings

In the ODI confrontations between Australia and India the top 3 batsmen of India are

  1. Mahendra Dhoni 2.Rohit Sharma
  2. Sachin Tendulkar.

The best bowlers for India are

  1. Ishant Sharma
  2. Harbhajan Singh
  3. R A Jadeja

For the Australian side the top 3 batsmen are

  1. R A Ponting
  2. M Hussey
  3. G J Bailey

The top 3 bowlers are

1. Mitchell Johnson
2. Brett Lee
3. J P Faulkner

Note: This is the first part of my yorkr package. I will be adding more functions in the weeks to come. Clearly the data from Cricsheet is more granular and allows for more detailed analyses. I should have the next set of functions soon.

(Take a look at The making of cricket package yorkr – Part 2)

Important note: Do check out my other posts using yorkr at yorkr-posts

Watch this space!!!

Also see

  1. Cricket analytics with cricketr
  2. Introducing cricketr! : An R package to analyze performances of cricketers
  3. Sixer – R package cricketr’s new Shiny avatar
  4. Informed choices through Machine Learning – Analyzing Kohli, Tendulkar and Dravid

You may also like

  1. Natural language processing: What would Shakespeare say?
  2. Revisiting crimes against women in India
  3. Literacy in India – A deepR dive
  4. TWS-4: Gossip protocol: Epidemics and rumors to the rescue
  5. Singularity
  6. Simulating an Edge shape in Android
  7. Programming Zen and now – Sime essential tips
  8. Rock N’ Roll with Bluemix, Cloudant & NodeExpress
  9. Architecting a cloud based IP Multimedia System (IMS)

Literacy in India – A deepR dive

Published in R-bloggers: Literacy in India – A deepR dive
You can do magic!
You can have anything,
That you desire
Magic…
You can do magic – song by America (1982)

That is exactly how I feel when I write code in R. A few lines of R, lo behold, hundreds of rows and columns are magically transformed into  easily understandable graphs, regression curves or choropleth maps. (By the way, the song is a really cool! Listen to it if you have not heard it before). You really can do magic with R

In this post I do a deep dive into literacy in India The dataset is taken from Open Government Data (OGD) platform India was used for this purpose. This data is based on the 2001 census. Though the data is a little dated, it is extremely rich with literacy details across different age groups, and over all Indian States. The data includes the total number of persons/males/females who are in the primary, middle.matric, college,technical diploma, non-technical diploma and so on. In fact the data also includes the educational background of people in the districts in each state. I slice and dice the data across multiple parameters. I have created an interactive Shiny App which will provide very detailed visualization based on the parameters chosen

Do try out my interactive Shiny app : IndiaLiteracy

The entire code for this app is on GitHub. Feel free to download/clone/fork/modify or enhance the code – IndiaLiteracy

For analyzing   such a rich data set as the Census data of 2001, I create 4 tabs
1) State Literacy
2) Educational Levels vs Age
3) India Literacy and
4) District Literacy

Here are the details of these 4 tabs in my Shiny app

A) State Literacy
This tab provides the age wise distribution of people (Persons/Males/Females) who attend educational institutions. This is shown as a barplot. The plot also includes the national average. In the plot below which is for entire India we see that the national average

1

The distribution of females attending primary school in the state of Haryana is shown. Also included is the national average. As can be seen there are options for (Total/Urban/Rural) against (Persons/Males/Females) and whether these people attend educational institutions are illiterate of literate.

2

I also have another option under “Who’ which is “All” This will plot the age wise distribution of males/females/persons in urban/rural or entire state.

3

B. Educational Institutions vs Age plot

This plot displays the the educational institutions attended by people in a particular age group. So for example in the state of Orissa for the 18 year age group we can see that there persons who are in (Primary, Matric, Higher Secondary, Non-Technical Diploma and Technical Diploma). The bar length for each color is the percentage of the total persons at that level of education

4

C. Literacy across India
This tab plots a chorpleth map for a region(Urban+Rural, Urban, Rural), Who(Persons, Males, Females) and the literacy level (attending educational institutions, primary, higher secondary, Matric etc) across the whole of India.

6

D. Literacy within a state
This tab plots a chorpleth map of literacy in the districts of a state. A sample plot for Karnataka is shown below

7

E. Key observations

There is a wealth of insights you can glean by looking at the various charts. Here a few insights from my initial observations
1) The literacy in Kerala across ages is higher than the national average while in Bihar it is less than the national average

a) Kerala

8b) Bihar

9
2) In Rajasthan The Males Attending education instituions is higher than the national average while for females it less than the national average. However the situation is reverse in Chandigarh where there are the percentage of females attending education instiuons is higher than the national average and the males

a) Rajasthan

10b) Chandigarh

11
3) When we look at the number of persons attending educational institution across India the north-eastern states lead with Manipur, Nagaland and Sikkim in the top 3.

12

We have heard that Kerala is the most literate state. But  it looks like Manipur, Nagaland, Sikkim actually edge Kerala out. If we look at the State literacy chart for Kerala and Manipur this becomes more clear

a) Kerala

13

b) Manipur

14

It can be seen that in Manipur the number of persons attending educational instition in the age range 13-24 years it is much higher than the national average and much higher than Kerala

4) If we take a look at the District wise literacy for the state of Bihar we see that the literacy is lower in the north eastern districts.,

15

5) Here is another interesting observation I made. The top 3 states which are most ‘literate with no education’ are i) Rajasthan ii) Madhya Pradesh iii) Chhattisgarh

16

While I have included several charts with accompanying explanation, this is largely unnecessary as  most of the charts are self-explanatory.

Do try out the Shiny app and see for yourself the literacy in each state/district/age group educational  level etc –IndiaLiteracy

Feel free to clone/fork my code and make your own enhancements –IndiaLiteracy

You may also like
1.  Natural Language Processing: What would Shakespeare say?
2. Introducing cricketr! : An R package to analyze performances of cricketers
3. Revisiting crimes against women in India
4. Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
5. Re-working the Lucy-Richardson Algorithm in OpenCV
6.  What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
7.  Bend it like Bluemix, MongoDB with autoscaling – Part 2
8. TWS-4: Gossip protocol: Epidemics and rumors to the rescue
9. Thinking Web Scale (TWS-3): Map-Reduce – Bring compute to data
10.  Simulating an Edge Shape in Android

Revisiting crimes against women in India

Here I go again, raking the muck about crimes against women in India. My earlier post “A crime map of India in R: Crimes against women in India” garnered a lot of responses from readers. In fact one of the readers even volunteered to create the only choropleth map in that post. The data for this post is taken from http://data.gov.in. You can download the data from the link “Crimes against women in India

I was so impressed by the choropleth map that I decided to do that for all crimes against women.(Wikipedia definition: A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map). Personally, I think pictures tell the story better. I am sure you will agree!

So here, I have it a Shiny app which will plot choropleth maps for a chosen crime in a given year.

You can try out my interactive Shiny app at  Crimes against women in India

Checkout out my book  on Amazon available in both  Paperback ($9.99) and a Kindle version($6.99/Rs449/). (see ‘Practical Machine Learning with R and Python – Machine Learning in stereo‘)

The following technique can be used to determine the ‘goodness’ of a hypothesis or how well the hypothesis can fit the data and can also generalize to new examples not in the training set.

In the picture below  are the details of  ‘Rape” in the year 2015.
1

Interestingly the ‘Total Crime against women’ in 2001 shows the Top 5 as
1) Uttar Pradresh 2) Andhra Pradesh 3) Madhya Pradesh 4) Maharashtra 5) Rajasthan

2

But in 2015 West Bengal tops the list, as the real heavy weight in crimes against women. The new pecking order in 2015 for ‘Total Crimes against Women’ is

1) West Bengal 2) Andhra Pradesh 3) Uttar Pradesh  4) Rajasthan 5) Maharashtra

3

Similarly for rapes, West Bengal is nowhere in the top 5 list in 2001. In 2015, it is in second only to the national rape leader Madhya Pradesh.  Also in 2001 West Bengal is not in the top 5 for any of 6 crime heads. But in 2015, West Bengal is in the top 5 of 6 crime heads. The emergence of West Bengal as the leader in Crimes against Women is due to the steep increase in crime rate  over the years.Clearly the law and order situation in West Bengal is heading south.

In Dowry Deaths, UP, Bihar, MP, West Bengal lead the pack, and in that order in 2015.

The usual suspects for most crime categories are West Bengal, UP, MP, AP & Maharashtra.

The state-wise crime charts plot the incidence of the crime (rape, dowry death, assault on women etc) over the years. Data for each state and for each crime was available from 2001-2013. The data for period 2014-2018 are projected using linear regression. The shaded portion in the plots indicate the 95% confidence level in the prediction (i.e in other words we can be 95% certain that the true mean of the crime rate in the projected years will lie within the shaded region)

4

Several  interesting requests came from readers to my earlier post. Some of them were to to plot the crimes as function of population and per capita income of the State/Union Territory to see if the plots  throw up new crime leaders. I have not got the relevant state-wise population distribution data yet. I intend to update this when I get my hands on this data.

I have included the crimes.csv which has been used to generate the visualization. However for the Shiny app I save this as .RData for better performance of the app.

You can clone/download  the code for the Shiny app from GitHub at  crimesAgainWomenIndia

Please checkout my Shiny app : Crimes against women

I also intend to add further interactivity to my visualizations in a future version. Watch this space. I’ll be back!

You may like
1. My book ‘Practical Machine Learning with R and Python’ on Amazon
2. Natural Language Processing: What would Shakespeare say?
3. Introducing cricketr! : An R package to analyze performances of cricketers
4. A peek into literacy in India: Statistical Learning with R
5. Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
6. Re-working the Lucy-Richardson Algorithm in OpenCV
7.  What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
8.  Bend it like Bluemix, MongoDB with autoscaling – Part 2
9. TWS-4: Gossip protocol: Epidemics and rumors to the rescue
10. Thinking Web Scale (TWS-3): Map-Reduce – Bring compute to data
11.  Simulating an Edge Shape in Android

A crime map of India in R – Crimes against women

In this post I take a look at the gory crime scene across India to determine which states are the heavy weights in crimes. Who is the undisputed champion of rapes in a year? Which state excels in cruelty by husbands and the relatives to wives? Which state leads in dowry deaths? To get the answers to these questions I perform analysis of the state-wise crime data against women with the data  from Open Government Data (OGD) Platform India. The dataset  for this analysis was taken for the Crime against Women from OGD.

(Do see my post Revisiting crimes against women in India which includes an interactive Shiny app)

The data in OGD is available for crimes against women in different states under different ‘crime heads’ like rape, dowry deaths, kidnapping & abduction etc. The data is available for years from 2001 to 2012. This data is plotted as a scatter plot and a linear regression line is then fit on the available data. Based on this linear model,  the projected incidence of crimes likes rapes, dowry deaths, abduction & kidnapping is performed for each of the states. This is then used to build a table of  different crime heads for all the states predicting the number of crimes till the year 2018. Fortunately, R  crunches through the data sets quite easily. The overall projections of crimes against as women is shown below based on the linear regression for each of these states

Projections over the next couple of years
The tables below are based on the projected incidence of crimes under various categories assuming that these states maintain their torrid crime rate. A cursory look at the tables below clearly indicate the Uttar Pradesh is the undisputed heavy weight champion in 4 of 5 categories shown. Maharashtra and Andhra Pradesh take 2nd and 3rd ranks in the total crimes against women and are significant contenders in other categories too.

A) Projected rapes in India
The top 3 heavy weights in projected rapes over the next 5 years are 1) Madhya Pradesh  2) Uttar Pradesh 3) Maharashtra

rapes

Full table: Rape.csv
B) Projected Dowry deaths in India 
dowrydeaths

Full table: Dowry Deaths.csv
C) Kidnapping & Abduction
kidnapping

Full table: Kidnapping&Abduction.csv
D) Cruelty by husband & relatives
cruelty

Full table: Cruelty by husbands_relatives.csv
E) Total crimes against women

total

Full table: Total crimes.csv
Here is a visualization of ‘Total crimes against women’  created as a choropleth map

1The implementation for this analysis was done using the  R language.  The R code, dataset, output and the crime charts can be accessed at GitHub at crime-against-women

Directory structure
– R code
dataset used
output
statewise-crime-charts

The analysis has been completely parametrized. A quick look at the implementation is shown  below. A function state crime was created as given below

statecrime.R
This function (statecrime.R)  does the following
a) Creates a scatter plot for the state for the crime head
b) Computes a best linear regression fir and draws this line
c) Uses the model parameters (coefficients) to compute the projected crime in the years to come
d) Writes the projected values to a text file
c) Creates a directory with the name of the state if it does not exist and stores the jpeg of the plot there.

statecrime <- function(indiacrime, row, state,crime) {
year <- c(2001:2012)
# Make seperate folders for each state
if(!file.exists(state)) {
dir.create(state)
}
setwd(state)
crimeplot <- paste(crime,".jpg")
jpeg(crimeplot)

# Plot the details of the crime
plot(year,thecrime ,pch= 15, col="red", xlab = "Year", ylab= crime, main = atitle,
,xlim=c(2001,2018),ylim=c(ymin,ymax), axes=FALSE)

A linear regression line is fit using ‘lm’

# Fit a linear regression model
lmfit <-lm(thecrime~year)
# Draw the lmfit line
abline(lmfit)

The model parameters are then used to draw the line and also project for the next 5 years from 2013 to 2018

nyears <-c(2013:2018)
nthecrime <- rep(0,length(nyears))
# Projected crime incidents from 2013 to 2018 using a linear regression model
for (i in seq_along(nyears)) {
nthecrime[i] <- lmfit$coefficients[2] * nyears[i] + lmfit$coefficients[1]
}

The projected data for each state is appended into an appropriate file which is then used to display the tables at the top of this post

# Write the projected crime rate in a file
nthecrime <- round(nthecrime,2)
nthecrime <- c(state, nthecrime, "\n")
print(nthecrime)
#write(nthecrime,file=fileconn, ncolumns=9, append=TRUE,sep="\t")
filename <- paste(crime,".txt")
# Write the output in the ./output directory
setwd("./output")
cat(nthecrime, file=filename, sep=",",append=TRUE)

The above function is then repeatedly called for each state for the different crime heads. (Note: It is possible to check the read both the states and crime heads with R and perform the computation repeatedly. However, I have done this the manual way!)

crimereport.R
# 1. Andhra Pradesh
i <- 1
statecrime(indiacrime, i, "Andhra Pradesh","Rape")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Kidnapping& Abduction")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Dowry Deaths")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Assault on Women")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Insult to modesty")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Cruelty by husband_relatives")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Imporation of girls from foreign country")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Immoral traffic act")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Dowry prohibition act")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Indecent representation of Women Act")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Commission of Sati Act")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Total crimes against women")
...
...

and so on for all the states

Charts for different crimes against women

1) Uttar Pradesh

The plots for  Uttar Pradesh  are shown below

Rapes in UP

Rape

Dowry deaths in UP

Dowry Deaths

Cruelty by husband/relative

Cruelty by husband_relatives

Total crimes against women in Uttar Pradesh

Total crimes against women

You can find more charts in GitHub by clicking Uttar Pradesh

2) Maharashtra : Some of the charts for Maharashtra

Rape

Rape

Kidnapping & Abduction

Kidnapping& Abduction

Total crimes against women in Maharashtra

Total crimes against women

More crime charts  for Maharashtra

Crime charts can be accessed for the following states from GitHub ( in alphabetical order)

3) Andhra Pradesh
4) Arunachal Pradesh
5) Assam
6) Bihar
7) Chattisgarh
8) Delhi (Added as an exception based on its notoriety)
9) Goa
10) Gujarat
11) Haryana
12) Himachal Pradesh
13) Jammu & Kashmir
14) Jharkhand
15) Karnataka
16) Kerala
17) Madhya Pradesh
18) Manipur
19) Meghalaya
20) Mizoram
21) Nagaland
22) Odisha
23) Punjab
24) Rajasthan
25) Sikkim
26) Tamil Nadu
27) Tripura
28) Uttarkhand
29) West Bengal

The code, dataset and the charts can be cloned/forked from GitHub at crime-against-women

Let me know if you find any interesting patterns in the data.
Thoughts, comments welcome!


See also
My book ‘Practical Machine Learning with R and Python’ on Amazon
A peek into literacy in India: Statiscal learning with R

You may also like
– Analyzing cricket’s batting legends – Through the mirage with R
– What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
– Bend it like Bluemix, MongoDB with autoscaling – Part 1