Here are some basic R incantations that will get you started with R
A) Scalars & Vectors:
Chant 1 – Now repeat after me, with your right hand forward at shoulder height “In R there are no scalars. There are only vectors of length 1”.
Just kidding:-)
To create an integer variable x with a value 5 we write
x <- 5 or
x = 5
While the former notation may seem odd, it is actually more logical considering that the RHS is assigned to LHS. Anyway both seem to work
Vectors can be created as follows
a <- c( 2:10)
b <- c("This", "is", 'R","language")
B) Sequences:
There are several ways of creating sequences of numbers which you intend to use for your computation
<- seq(5:25) # Sequence from 5 to 25
Other ways to create sequences
Increment by 2
> seq(5, 25, by=2)
[1] 5 7 9 11 13 15 17 19 21 23 25
>seq(5,25,length=18) # Create sequence from 5 to 25 with a total length of 18
[1] 5.000000 6.176471 7.352941 8.529412 9.705882 10.882353 12.058824 13.235294
[9] 14.411765 15.588235 16.764706 17.941176 19.117647 20.294118 21.470588 22.647059
[17] 23.823529 25.000000
C) Conditions and loops
An if-else if-else construct goes like this
if(condition) {
do something
} else if (condition) {
do something
} else {
do something
}
Note: Make sure the statements appear as above with the else if and else appearing on the same line as the closing braces, otherwise R complains about ‘unexpected else’ in else statement
D) Loops
I would like to mention 2 ways of doing ‘for’ loops in R.
a) for (i in 1:10) {
statement
}
> a <- seq(5,25,length=10)
> a
[1] 5.000000 7.222222 9.444444 11.666667 13.888889 16.111111 18.333333
[8] 20.555556 22.777778 25.000000
b) Sequence along the vector sequence. Note: This is useful as we don’t have to know the length of the vector/sequence
for (i in seq_along(a)){
+ print(a[i])
+ }
[1] 5
[1] 7.222222
[1] 9.444444
[1] 11.66667
…
There are others ways of looping with ‘while’ and ‘repeat’ which I have not included in this post.
R makes manipulation of matrices and data frames really easy. All the elements in a matrix are numeric while data frames can have different types for each of the element
E) Matrix
> rnorm(12,5,2)
[1] 2.699961 3.160208 5.087478 3.969129 3.317840 4.551565 2.585758 2.397780
[9] 5.297535 6.574757 7.468268 2.440835
a) Create a vector of 12 random numbers with a mean of 5 and SD of 2
> a <-rnorm(12,5,2)
b) Convert the vector to a matrix with 4 rows and 3 columns
> mat <- matrix(a,4,3)
> mat[,1] [,2] [,3]
[1,] 5.197010 3.839281 9.022818
[2,] 4.053590 5.321399 5.587495
[3,] 4.225763 4.873768 6.648151
[4,] 4.709784 4.129093 2.575523
c) Subset rows 1 & 2 from the matrix
> mat[1:2,]
[,1] [,2] [,3]
[1,] 5.19701 3.839281 9.022818
[2,] 4.05359 5.321399 5.587495
d) Subset matrix a rows 1& 2 and with columns 2 & 3
> mat[1:2,2:3]
[,1] [,2]
[1,] 3.839281 9.022818
[2,] 5.321399 5.587495
e) Subset matrix a for all row elements for the column 3
> mat[,3]
[1] 9.022818 5.587495 6.648151 2.575523
e) Add row names and column names for the matrix as follows
> names <- c(“tim”,”pat”,”joe”,”jim”)
> v <- data.frame(names,mat)
> v
names X1 X2 X3
1 tim 5.197010 3.839281 9.022818
2 pat 4.053590 5.321399 5.587495
3 joe 4.225763 4.873768 6.648151
4 jim 4.709784 4.129093 2.575523
> colnames(v) <- c("names","a","b","c")
> v
names a b c
1 tim 5.197010 3.839281 9.022818
2 pat 4.053590 5.321399 5.587495
3 joe 4.225763 4.873768 6.648151
4 jim 4.709784 4.129093 2.575523
F) Data Frames
In R data frames are the most important method to manipulate large amounts of data. One can read data in .csv format into data frame using
df <- read.csv(“mydata.csv”)
To get a feel of data frames it is useful to play around with the numerous data sets that are available with the installation of R
To check the available dataframes do
>data()
AirPassengers Monthly Airline Passenger Numbers 1949-1960
BJsales Sales Data with Leading Indicator
BJsales.lead (BJsales) Sales Data with Leading Indicator
BOD Biochemical Oxygen Demand
CO2 Carbon Dioxide Uptake in Grass Plants
ChickWeight Weight versus age of chicks on different diets
...
…
I will be using the mtcars data frame. Here are some of the most important commands on data frames
a) load data from mtcars
data(mtcars)
b) > head(mtcars,3) # Display the top 3 rows of the data frame
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
c) > tail(mtcars,4) # Display the boittom 4 rows of the data frame
mpg cyl disp hp drat wt qsec vs am gear carb
Ford Pantera L 15.8 8 351 264 4.22 3.17 14.5 0 1 5 4
Ferrari Dino 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
Maserati Bora 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8
Volvo 142E 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
d) > names(mtcars) # Display the names of the columns of the data frame
[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
e) > summary(mtcars) # Display the summary of the data frame
mpg cyl disp hp drat wt
Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0 Min. :2.760 Min. :1.513
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5 1st Qu.:3.080 1st Qu.:2.581
Median :19.20 Median :6.000 Median :196.3 Median :123.0 Median :3.695 Median :3.325
Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7 Mean :3.597 Mean :3.217
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0 3rd Qu.:3.920 3rd Qu.:3.610
Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0 Max. :4.930 Max. :5.424
qsec vs am gear carb
Min. :14.50 Min. :0.0000 Min. :0.0000 Min. :3.000 Min. :1.000
1st Qu.:16.89 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
Median :17.71 Median :0.0000 Median :0.0000 Median :4.000 Median :2.000
Mean :17.85 Mean :0.4375 Mean :0.4062 Mean :3.688 Mean :2.812
3rd Qu.:18.90 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :22.90 Max. :1.0000 Max. :1.0000 Max. :5.000 Max. :8.000
f) > str(mtcars) # Generate a concise description of the data frame - values in each column, factors
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
g) > mtcars[mtcars$mpg == 10.4,] #Select all rows in mtcars where the mpg column has a value 10.4
mpg cyl disp hp drat wt qsec vs am gear carb
Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
h) > mtcars[(mtcars$mpg >20) & (mtcars$mpg <24),] # Select all rows in mtcars where the mpg > 20 and mpg < 24
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
i) > myset <- mtcars[(mtcars$cyl == 6) | (mtcars$cyl == 4),] # Get all calls which are either 4 or 6 cylinder
> myset
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2…
…
…
j) > mean(myset$mpg) # Determine the mean of the set created above
[1] 23.97222
k) > table(mtcars$cyl) #Create a table of cars which have 4,6, or 8 cylinders
4 6 8
11 7 14
G) lapply,sapply,tapply
I use the iris data set for these commands
a) > data(iris)
#Load iris data set
b) > names(iris) #Show the column names of the data set
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
c) > lapply(iris,class) #Show the class of all the columns in iris
$Sepal.Length
[1] "numeric"
$Sepal.Width
[1] "numeric"
$Petal.Length
[1] "numeric"
$Petal.Width
[1] "numeric"
$Species
[1] "factor"
d) > sapply(iris,class) # Display a summary of the class of the iris data set
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
"numeric" "numeric" "numeric" "numeric" "factor"
e) tapply: Instead of getting the mean for each of the species as below we can use tapply
> a <-iris[iris$Species == "setosa",]
> mean(a$Sepal.Length)
[1] 5.006
> b <-iris[iris$Species == "versicolor",]
> mean(b$Sepal.Length)
[1] 5.936
> c <-iris[iris$Species == "virginica",]
> mean(c$Sepal.Length)
[1] 6.588
> tapply(iris$Sepal.Length,iris$Species,mean)
setosa versicolor virginica
5.006 5.936 6.588
Hopefully this highly condensed version of R will set you on a R-oll.
You may like
– A peek into literacy in India:Statistical learning with R
– A crime map of India in R: Crimes against women
– Analyzing cricket’s batting legends – Through the mirage with R