A peek into literacy in India: Statistical Learning with R

In this post I take a peek into the literacy landscape across India using the R language. The dataset comes from the Open Government Data (OGD) Platform India and is based on the 2011 census. I downloaded the Excel sheet for each state separately. The Union Territories were not included in the analysis.

A thin slice of data was taken from the sheet for each individual state. (Note: this could also have been done from the consolidated india.xls Excel sheet, which I came to know of much later.)

I calculate the following for each age group:

Males (%) attending educational institutions = (Males attending educational institutions * 100) / Total males
Females (%) attending educational institutions = (Females attending educational institutions * 100) / Total females
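
As a minimal sketch of these two formulas, the snippet below computes the percentage for three age groups. The counts are made-up illustrative numbers, not census figures, and the helper name percent_attending is hypothetical.

```r
# Hypothetical helper: percentage of a group attending educational institutions
percent_attending <- function(attending, total) {
  round(attending * 100 / total, 1)
}

# Made-up counts for three age groups (NOT census data)
age_groups      <- c("5", "6", "7")
total_males     <- c(1000, 1200, 1100)
attending_males <- c(650, 960, 990)

# One percentage per age group, labelled by age
percent_males <- percent_attending(attending_males, total_males)
names(percent_males) <- age_groups
print(percent_males)
```

The same helper applies unchanged to the female counts.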

This is then plotted as a bar chart against age. I then overlay the national average on each state's bar chart to check whether literacy in the state is above or below the national average. The implementation in R is included below.
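
The overlay idea can be sketched as follows, using made-up percentages rather than census data. One way to line the national-average curve up with the bars is to reuse the bar midpoints that barplot() returns. That is an illustrative choice and not necessarily how the code in the repository positions the line. The variable names are hypothetical.

```r
# Made-up percentages for four age groups (NOT census data)
age           <- c("5", "6", "7", "8")
state_percent <- c(60, 82, 90, 93)
india_percent <- c(55, 78, 88, 91)

# barplot() returns the x coordinate of the centre of each bar
bar_mid <- barplot(state_percent, names.arg = age, ylim = c(0, 100),
                   col = "lightblue", xlab = "Age", ylab = "Percentage",
                   main = "State vs. national average (illustrative)")

# Draw the national average over the bars at the bar centres
lines(bar_mid, india_percent, col = "red", lty = 2, lwd = 3)
points(bar_mid, india_percent, pch = 15)
legend("bottomright", legend = "National average",
       col = "red", lty = 2, pch = 15, bty = "n")
```

Bars that rise above the dashed line mark age groups where the state is ahead of the national figure.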

The code and data can be forked/cloned from GitHub at india-literacy

The results of the analysis are given below.

  1. Kerala is clearly the top ranker, with literacy rates for both males and females well above the average
  2. The states with above average literacy are – Kerala, Himachal Pradesh, Uttarakhand, Tamil Nadu, Haryana, Karnataka, Maharashtra, Punjab
  3. The states with just about average literacy – Karnataka, Andhra Pradesh, Chhattisgarh, Gujarat, Madhya Pradesh, Odisha, West Bengal
  4. The states with below average literacy – Uttar Pradesh, Bihar, Jharkhand, Arunachal Pradesh, Assam, Jammu and Kashmir, Rajasthan
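
The grouping above was read off the plots. As a rough sketch of how such a grouping could be automated, the snippet below compares a state's mean percentage across age groups with the national mean. The tolerance of 2 percentage points, the function name classify_state and all numbers are assumptions made for illustration; the post does not define a numeric threshold.

```r
# Hypothetical rule: compare mean percentages, allowing +/- 'tolerance' points
classify_state <- function(state_percent, india_percent, tolerance = 2) {
  diff <- mean(state_percent) - mean(india_percent)
  if (diff > tolerance) {
    "above average"
  } else if (diff < -tolerance) {
    "below average"
  } else {
    "about average"
  }
}

# Made-up percentages per age group (NOT census data)
india   <- c(55, 78, 88, 91)
state_a <- c(70, 90, 96, 98)
state_b <- c(54, 79, 87, 92)
state_c <- c(40, 60, 75, 80)

# Classify each illustrative state against the national figures
groups <- sapply(list(A = state_a, B = state_b, C = state_c),
                 classify_state, india_percent = india)
print(groups)
```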

 

A brief implementation of the basic code in R is shown below.

# Read the Arunachal Pradesh literacy related data
arunachal = read.csv("arunachal.csv")
# Create as a matrix
arunachalmat = as.matrix(arunachal)
arunachalTotal = arunachalmat[2:19,7:28]
# Take transpose as this is necessary for plotting bar charts
arunachalmat = t(arunachalTotal)
# Set the scipen option to format the y axis (otherwise prints as e^05 etc.)
getOption("scipen")
opt <- options("scipen" = 20)
getOption("scipen")
#Create a vector of total Males & Females
arunachalTotalM = arunachalmat[3,]
arunachalTotalF = arunachalmat[4,]
#Create a vector of males & females attending education institution
arunachalM = arunachalmat[6,]
arunachalF = arunachalmat[7,]
#Calculate percent of males attending education of total
arunachalpercentM = round(as.numeric(arunachalM) *100/as.numeric(arunachalTotalM),1)
barplot(arunachalpercentM,names.arg=arunachalmat[1,],main ="Percentage males attending educational institutions in Arunachal Pradesh",
xlab = "Age", ylab= "Percentage",ylim = c(0,100), col ="lightblue", legend= c("Males"))
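# Overlay the national average for males.
# Note: 'age' and 'indiapercentM' are not defined in this snippet; they are assumed
# to have been computed beforehand from the all-India data (see the GitHub repo)
points(age,indiapercentM,pch=15)
lines(age,indiapercentM,col="red",pch=20,lty=2,lwd=3)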
legend( x="bottomright",
legend=c("National average"),
col=c("red"), bty="n" , lwd=1, lty=c(2),
pch=c(15) )
#Calculate percent of females attending education of total
arunachalpercentF = round(as.numeric(arunachalF) *100/as.numeric(arunachalTotalF),1)
barplot(arunachalpercentF,names.arg=arunachalmat[1,],main ="Percentage females attending educational institutions in Arunachal Pradesh",
xlab = "Age", ylab= "Percentage", ylim = c(0,100), col ="lightblue", legend= c("Females"))
# Overlay the national average for females.
# Note: 'indiapercentF' is likewise assumed to come from the all-India data
points(age,indiapercentF,pch=15)
lines(age,indiapercentF,col="red",pch=20,lty=2,lwd=3)
legend( x="bottomright",
legend=c("National average"),
col=c("red"), bty="n" , lwd=1, lty=c(2),
pch=c(15) )
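
The snippet above raises scipen so that large axis values are not printed in scientific notation, and keeps the previous setting in opt. A minimal sketch of the effect, and of restoring the saved setting afterwards, follows. It assumes R's default scipen of 0 as the starting point and uses an arbitrary number for the demonstration.

```r
# Start from R's default penalty (assumed here so the demo is reproducible)
options(scipen = 0)

default_fmt <- format(100000)      # scientific notation under the default

# options() returns the previous values, so they can be restored later
old <- options(scipen = 20)
fixed_fmt <- format(100000)        # fixed notation once the penalty is raised

# Restore the setting that was in effect before the change
options(old)

print(c(default = default_fmt, fixed = fixed_fmt))
```

Restoring the old value keeps the change from leaking into later plots or printed output in the same session.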

A) Overall plot for India

a) India – males

india-males

b) India – females

india-females

The plots for each individual state are given below.

1. Literacy in Tamil Nadu

Tamil Nadu is slightly over the national average. Women seem to do marginally better than men.

a) Tamil Nadu – males

tn-males

b) Tamil Nadu – females

tn-females

2. Literacy in Uttar Pradesh

UP is slightly below the national average. Women are comparatively below men here.

a) Uttar Pradesh – males

UP-males

b) Uttar Pradesh – females

UP-females

3. Literacy in Bihar

Bihar is well below the national average for both men and women.

a) Bihar – males

bihar-males

b) Bihar – females

bihar-females

4. Literacy in Kerala

Kerala is the winner all the way in literacy with almost 100% literacy across all age groups

a) Kerala – males


kerala-males

b) Kerala – females

kerala-females

 

5. Literacy in Andhra Pradesh

AP just meets the national average for literacy.

a) Andhra Pradesh – males

andhra-males

b) Andhra Pradesh – females

andhra-females

6. Literacy in Arunachal Pradesh

Arunachal Pradesh is below average for most of the age groups

a) Arunachal Pradesh – males

arunachal-males

b) Arunachal Pradesh – females

arunachal-females

7. Literacy in Assam

Assam is below national average

a) Assam – males

assam-males

b) Assam – females

assam-females

 

8. Literacy in Chhattisgarh

Chhattisgarh is on par with the national average for both men and women.

a) Chhattisgarh – males

chattisgarh-males

b) Chhattisgarh – females

chattisgarh-females

 

9. Literacy in Gujarat

Gujarat is just about average

a) Gujarat – males

gujarat-males

b) Gujarat – females

gujarat-females

10. Literacy in Haryana

Haryana is slightly above average

a) Haryana – males

haryana-males

b) Haryana – females

haryana-females

11. Literacy in Himachal Pradesh

Himachal Pradesh is cool and above average.

a) Himachal Pradesh – males

himachal-males

 

b) Himachal Pradesh – females

himachal-females

12. Literacy in Jammu and Kashmir

J & K is marginally below average

a) Jammu and Kashmir – males

jk-males

b) Jammu and Kashmir – females

jk-females

 

13. Literacy in Jharkhand

Jharkhand is some way below average.

a) Jharkhand – males

jharkand-males

b) Jharkhand – females

jharkand-feamles

14. Literacy in Karnataka

Karnataka is at the national average for men. Women seem to do better than men here.

a) Karnataka – males

karnataka-males

b) Karnataka – females

karnataka-females

15. Literacy in Madhya Pradesh

Madhya Pradesh meets the national average

a) Madhya Pradesh – males

mp-males

b) Madhya Pradesh – females

mp-females

16. Literacy in Maharashtra

Maharashtra is a front-runner in literacy.

a) Maharashtra – males

maharashtra

b) Maharashtra – females

maharashtra-feamles

 

17. Literacy in Odisha

Odisha meets national average

a) Odisha – males

odisha-males

b) Odisha – females

odisha-females

 

18. Literacy in Punjab

Punjab is marginally above average with women doing even better

a) Punjab – males

punjab-males

b) Punjab – females

punjab-females

19. Literacy in Rajasthan

Rajasthan is average for males and below average for females

a) Rajasthan – males

rajashthan-males

b) Rajasthan – females

rajasthan-females

20. Literacy in Uttarakhand

Uttarakhand rocks and is above average

a) Uttarakhand – males

uttarkhan-males

b) Uttarakhand – females

uttarkhand-females

21. Literacy in West Bengal

West Bengal just about meets the national average.

a) West Bengal – males

wb-males

 

b) West Bengal – females

wb-females

The code can be cloned/forked from GitHub at india-literacy. I have done my analysis on the overall data. The data is further subdivided by district within each state, and further into urban and rural areas. Many different analyses are possible; one method is shown here.
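
As one possible further cut, the sketch below compares attendance in rural and urban areas. The data frame, its column names and all counts are made up for illustration and do not reflect the layout of the actual census sheets.

```r
# Made-up counts by area and age group (NOT census data; column names are hypothetical)
census <- data.frame(
  area      = c("Rural", "Rural", "Urban", "Urban"),
  age       = c("5", "6", "5", "6"),
  total     = c(1000, 1200, 800, 900),
  attending = c(600, 900, 640, 810)
)

# Sum the counts over age groups within each area
sums <- aggregate(cbind(total, attending) ~ area, data = census, FUN = sum)

# Percentage attending educational institutions per area
sums$percent <- round(sums$attending * 100 / sums$total, 1)
print(sums)
```

The same pattern would work for a district column in place of area.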

Conclusion

  1. Kerala is clearly head and shoulders above all states when it comes to literacy
  2. Many states are above average. They are Kerala, Himachal Pradesh, Uttarakhand, Tamil Nadu, Haryana, Karnataka, Maharashtra, Punjab
  3. States with average literacy are – Karnataka, Andhra Pradesh, Chhattisgarh, Gujarat, Madhya Pradesh, Odisha, West Bengal
  4. States which fall below the national average are – Uttar Pradesh, Bihar, Jharkhand, Arunachal Pradesh, Assam, Jammu and Kashmir, Rajasthan

