Revisiting crimes against women in India

Here I go again, raking the muck about crimes against women in India. My earlier post “A crime map of India in R: Crimes against women in India” garnered a lot of responses from readers. In fact, one of the readers even volunteered to create the only choropleth map in that post. You can download the data for this post from the link “Crimes against women in India”.

I was so impressed by the choropleth map that I decided to do that for all crimes against women. (Wikipedia definition: A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map.) Personally, I think pictures tell the story better. I am sure you will agree!

So here I have it: a Shiny app which plots choropleth maps for a chosen crime in a given year.

You can try out my interactive Shiny app at  Crimes against women in India

Check out my book on Amazon, available in both Paperback ($9.99) and Kindle ($6.99/Rs449) versions (see ‘Practical Machine Learning with R and Python – Machine Learning in stereo’).


The picture below shows the details of ‘Rape’ in the year 2015.

Interestingly, the ‘Total Crimes against Women’ in 2001 shows the Top 5 as
1) Uttar Pradesh 2) Andhra Pradesh 3) Madhya Pradesh 4) Maharashtra 5) Rajasthan


But in 2015 West Bengal tops the list as the real heavyweight in crimes against women. The new pecking order in 2015 for ‘Total Crimes against Women’ is

1) West Bengal 2) Andhra Pradesh 3) Uttar Pradesh  4) Rajasthan 5) Maharashtra
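
A ranking like the two lists above can be produced with a few lines of base R. The sketch below is only an illustration: the data frame `crimes_long`, its column names, the helper `top_states()` and all the counts are made up for the example and do not reflect the layout or the values of the actual crimes.csv.

```r
# Hypothetical long-format data: one row per state, crime head and year.
# State names and counts are placeholders, not real figures.
crimes_long <- data.frame(
  state = rep(c("State A", "State B", "State C", "State D", "State E", "State F"), times = 2),
  crime = "Total crimes against women",
  year  = rep(c(2001, 2015), each = 6),
  count = c(120, 450, 300, 80, 210, 95,
            500, 470, 310, 600, 220, 90),
  stringsAsFactors = FALSE
)

# Return the n states with the highest count for a crime head in a given year
top_states <- function(df, crime_head, yr, n = 5) {
  sel <- df[df$crime == crime_head & df$year == yr, ]
  sel <- sel[order(sel$count, decreasing = TRUE), ]
  head(sel$state, n)
}

top_2001 <- top_states(crimes_long, "Total crimes against women", 2001)
top_2015 <- top_states(crimes_long, "Total crimes against women", 2015)
print(top_2001)
print(top_2015)
```

Comparing the two vectors shows at a glance which states entered or left the top 5 between the two years.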


Similarly for rapes, West Bengal is nowhere in the top 5 list in 2001. In 2015, it is second only to the national rape leader, Madhya Pradesh. Also, in 2001 West Bengal is not in the top 5 for any of the 6 crime heads, but in 2015 it is in the top 5 for 6 crime heads. The emergence of West Bengal as the leader in Crimes against Women is due to the steep increase in its crime rate over the years. Clearly the law and order situation in West Bengal is heading south.

In Dowry Deaths, UP, Bihar, MP, West Bengal lead the pack, and in that order in 2015.

The usual suspects for most crime categories are West Bengal, UP, MP, AP & Maharashtra.

The state-wise crime charts plot the incidence of the crime (rape, dowry death, assault on women etc.) over the years. Data for each state and for each crime was available from 2001-2013. The data for the period 2014-2018 are projected using linear regression. The shaded portion in the plots indicates the 95% confidence interval of the prediction (in other words, we can be 95% certain that the true mean of the crime rate in the projected years will lie within the shaded region).
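
For readers who want to reproduce such a band numerically, here is a minimal sketch using base R's `lm()` and `predict()`. The yearly counts are invented for the example, and this is not necessarily how the charts in the app were drawn; it only shows one way to obtain a projected value together with the lower and upper limits of a 95% confidence interval for the mean.

```r
# Hypothetical yearly counts for one state and one crime head (made-up numbers)
year     <- 2001:2013
thecrime <- c(100, 108, 115, 119, 131, 136, 148, 151, 160, 171, 175, 188, 192)

# Fit a straight line to the observed years
lmfit <- lm(thecrime ~ year)

# Project 2014-2018 and ask for the 95% confidence interval of the mean response
future <- data.frame(year = 2014:2018)
proj   <- predict(lmfit, newdata = future, interval = "confidence", level = 0.95)
proj   <- cbind(future, round(proj, 1))
print(proj)   # columns: year, fit, lwr, upr
```

Note that `interval = "confidence"` describes the uncertainty in the fitted mean. Using `interval = "prediction"` instead gives a wider band that covers individual future observations.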


Several interesting requests came from readers of my earlier post. Some of them were to plot the crimes as a function of the population and per capita income of the State/Union Territory, to see if the plots throw up new crime leaders. I have not got the relevant state-wise population distribution data yet. I intend to update this post when I get my hands on this data.

I have included the crimes.csv which has been used to generate the visualization. However for the Shiny app I save this as .RData for better performance of the app.
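
The CSV-to-.RData step can be sketched as follows. This is an illustration with a tiny made-up stand-in for crimes.csv written to a temporary file; the object name `crimes` and the file names are assumptions, not taken from the app's code.

```r
# A tiny stand-in for crimes.csv (made-up values), written to a temporary file
csv_file   <- tempfile(fileext = ".csv")
rdata_file <- tempfile(fileext = ".RData")
write.csv(data.frame(state = c("State A", "State B"), count = c(10, 20)),
          csv_file, row.names = FALSE)

# One-off conversion: parse the CSV once and store the data frame in binary form
crimes <- read.csv(csv_file, stringsAsFactors = FALSE)
save(crimes, file = rdata_file)

# At app start-up: load() restores the object under its original name, "crimes"
rm(crimes)
restored <- load(rdata_file)
print(restored)        # name(s) of the restored object(s)
print(nrow(crimes))
```

Loading the binary file skips the text parsing that `read.csv()` performs, which is where the start-up saving comes from.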

You can clone/download  the code for the Shiny app from GitHub at  crimesAgainWomenIndia

Please check out my Shiny app: Crimes against women

I also intend to add further interactivity to my visualizations in a future version. Watch this space. I’ll be back!


A crime map of India in R – Crimes against women

In this post I take a look at the gory crime scene across India to determine which states are the heavyweights in crimes. Who is the undisputed champion of rapes in a year? Which state excels in cruelty by husbands and relatives to wives? Which state leads in dowry deaths? To get the answers to these questions, I analyze the state-wise data on crimes against women from the Open Government Data (OGD) Platform India. The dataset for this analysis is the ‘Crime against Women’ dataset from OGD.

(Do see my post Revisiting crimes against women in India which includes an interactive Shiny app)

The data in OGD is available for crimes against women in different states under different ‘crime heads’ like rape, dowry deaths, kidnapping & abduction etc. The data is available for the years 2001 to 2012. This data is plotted as a scatter plot and a linear regression line is then fit to the available data. Based on this linear model, the incidence of crimes like rapes, dowry deaths and kidnapping & abduction is projected for each of the states. This is then used to build a table of the different crime heads for all the states, predicting the number of crimes till the year 2018. Fortunately, R crunches through the data sets quite easily. The overall projections of crimes against women, based on the linear regression for each of these states, are shown below.

Projections over the next couple of years
The tables below are based on the projected incidence of crimes under various categories, assuming that these states maintain their torrid crime rate. A cursory look at the tables below clearly indicates that Uttar Pradesh is the undisputed heavyweight champion in 4 of the 5 categories shown. Maharashtra and Andhra Pradesh take 2nd and 3rd ranks in the total crimes against women and are significant contenders in other categories too.

A) Projected rapes in India
The top 3 heavy weights in projected rapes over the next 5 years are 1) Madhya Pradesh  2) Uttar Pradesh 3) Maharashtra


Full table: Rape.csv
B) Projected Dowry deaths in India 

Full table: Dowry Deaths.csv
C) Kidnapping & Abduction

Full table: Kidnapping&Abduction.csv
D) Cruelty by husband & relatives

Full table: Cruelty by husbands_relatives.csv
E) Total crimes against women


Full table: Total crimes.csv
Here is a visualization of ‘Total crimes against women’  created as a choropleth map

The implementation for this analysis was done using the R language. The R code, dataset, output and the crime charts can be accessed on GitHub at crime-against-women

Directory structure
– R code
– dataset used

The analysis has been completely parametrized. A quick look at the implementation is shown below. A function statecrime was created as given below

This function (statecrime.R) does the following
a) Creates a scatter plot for the state for the crime head
b) Computes a best linear regression fit and draws this line
c) Uses the model parameters (coefficients) to compute the projected crime in the years to come
d) Writes the projected values to a text file
e) Creates a directory with the name of the state if it does not exist and stores the jpeg of the plot there.

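statecrime <- function(indiacrime, row, state, crime) {
  year <- c(2001:2012)
  # Make separate folders for each state
  if (!file.exists(state)) {
    dir.create(state)
  }
  # Name of the jpeg for this crime head (paste0 avoids a space before ".jpg")
  crimeplot <- paste0(crime, ".jpg")

  # Note: this excerpt omits how thecrime, atitle, ymin and ymax are set
  # Plot the details of the crime
  plot(year, thecrime, pch = 15, col = "red", xlab = "Year", ylab = crime, main = atitle,
       xlim = c(2001, 2018), ylim = c(ymin, ymax), axes = FALSE)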

A linear regression line is fit using ‘lm’

# Fit a linear regression model
lmfit <-lm(thecrime~year)
# Draw the lmfit line

The model parameters are then used to draw the line and also to project the crime for the years 2013 to 2018

nyears <- c(2013:2018)
nthecrime <- rep(0, length(nyears))
# Projected crime incidents from 2013 to 2018 using a linear regression model
for (i in seq_along(nyears)) {
  # slope * year + intercept
  nthecrime[i] <- lmfit$coefficients[2] * nyears[i] + lmfit$coefficients[1]
}
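
As a cross-check on the manual projection, the same values can be obtained from `predict()`. The following is a small sketch with made-up counts standing in for one row of the dataset; the variable names `manual` and `viapredict` are introduced here only for illustration.

```r
# Made-up counts for 2001-2012, standing in for one row of the dataset
year     <- 2001:2012
thecrime <- c(50, 54, 61, 63, 70, 74, 79, 85, 88, 95, 99, 104)
lmfit    <- lm(thecrime ~ year)

nyears <- 2013:2018

# Manual projection: slope * year + intercept, vectorised over nyears
manual <- unname(coef(lmfit)[2] * nyears + coef(lmfit)[1])

# Same projection through predict(); the column name must match the model term "year"
viapredict <- unname(predict(lmfit, newdata = data.frame(year = nyears)))

print(all.equal(manual, viapredict))
```

Either form works; `predict()` becomes more convenient once confidence limits are also wanted.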

The projected data for each state is appended into an appropriate file which is then used to display the tables at the top of this post

# Write the projected crime rate in a file
nthecrime <- round(nthecrime, 2)
nthecrime <- c(state, nthecrime, "\n")
#write(nthecrime,file=fileconn, ncolumns=9, append=TRUE,sep="\t")
# paste0 avoids a space before ".txt" in the file name
filename <- paste0(crime, ".txt")
# Write the output in the ./output directory
cat(nthecrime, file = filename, sep = ",", append = TRUE)

The above function is then repeatedly called for each state for the different crime heads. (Note: It is possible to read both the states and the crime heads with R and perform the computation in a loop. However, I have done this the manual way!)

# 1. Andhra Pradesh
i <- 1
statecrime(indiacrime, i, "Andhra Pradesh","Rape")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Kidnapping& Abduction")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Dowry Deaths")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Assault on Women")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Insult to modesty")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Cruelty by husband_relatives")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Imporation of girls from foreign country")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Immoral traffic act")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Dowry prohibition act")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Indecent representation of Women Act")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Commission of Sati Act")
i <- i+38
statecrime(indiacrime, i, "Andhra Pradesh","Total crimes against women")

and so on for all the states
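
The manual calls above could also be driven by a loop. The sketch below is one possible way to do it, not the code in the repository: `statecrime()` is replaced by a stand-in that only records its arguments so that the example runs without the dataset, and `process_state()` and `crimeheads` are names introduced here for illustration. The stride of 38 rows between crime heads is taken from the calls above.

```r
# Stand-in for the post's statecrime(); it only records its arguments so the
# loop can be run without the dataset. Replace it with the real function.
calls <- list()
statecrime <- function(indiacrime, row, state, crime) {
  calls[[length(calls) + 1]] <<- list(row = row, state = state, crime = crime)
}

# Crime heads in the order used by the manual calls (spelling kept as in the post)
crimeheads <- c("Rape", "Kidnapping& Abduction", "Dowry Deaths", "Assault on Women",
                "Insult to modesty", "Cruelty by husband_relatives",
                "Imporation of girls from foreign country", "Immoral traffic act",
                "Dowry prohibition act", "Indecent representation of Women Act",
                "Commission of Sati Act", "Total crimes against women")

# The manual calls advance the row index by 38 for each crime head
process_state <- function(indiacrime, firstrow, state, stride = 38) {
  for (k in seq_along(crimeheads)) {
    statecrime(indiacrime, firstrow + (k - 1) * stride, state, crimeheads[k])
  }
}

process_state(indiacrime = NULL, firstrow = 1, state = "Andhra Pradesh")
print(sapply(calls, function(x) x$row))
```

An outer loop over the states would then call `process_state()` once per state with that state's first row. How the first row of each state is determined depends on the layout of the dataset, so it is left out here.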

Charts for different crimes against women

1) Uttar Pradesh

The plots for  Uttar Pradesh  are shown below

Rapes in UP


Dowry deaths in UP

Dowry Deaths

Cruelty by husband/relative

Cruelty by husband_relatives

Total crimes against women in Uttar Pradesh

Total crimes against women

You can find more charts in GitHub by clicking Uttar Pradesh

2) Maharashtra : Some of the charts for Maharashtra



Kidnapping & Abduction

Kidnapping& Abduction

Total crimes against women in Maharashtra

Total crimes against women

More crime charts  for Maharashtra

Crime charts can be accessed for the following states from GitHub (in alphabetical order)

3) Andhra Pradesh
4) Arunachal Pradesh
5) Assam
6) Bihar
7) Chhattisgarh
8) Delhi (Added as an exception based on its notoriety)
9) Goa
10) Gujarat
11) Haryana
12) Himachal Pradesh
13) Jammu & Kashmir
14) Jharkhand
15) Karnataka
16) Kerala
17) Madhya Pradesh
18) Manipur
19) Meghalaya
20) Mizoram
21) Nagaland
22) Odisha
23) Punjab
24) Rajasthan
25) Sikkim
26) Tamil Nadu
27) Tripura
28) Uttarakhand
29) West Bengal

The code, dataset and the charts can be cloned/forked from GitHub at crime-against-women

Let me know if you find any interesting patterns in the data.
Thoughts, comments welcome!

See also
My book ‘Practical Machine Learning with R and Python’ on Amazon
A peek into literacy in India: Statistical learning with R
