# R incantations for the uninitiated

Here are some basic R incantations that will get you started with R

A) Scalars & Vectors:
Chant 1 – Now repeat after me, with your right hand forward at shoulder height “In R there are no scalars. There are only vectors of length 1”.
Just kidding:-)

To create an integer variable x with a value 5 we write
```x <- 5 or x = 5```
While the former notation may seem odd, it is actually more logical considering that the RHS is assigned to LHS. Anyway both seem to work
Vectors can be created as follows
```a <- c( 2:10) b <- c("This", "is", 'R","language")```

B) Sequences:
There are several ways of creating sequences of numbers which you intend to use for your computation
`<- seq(5:25) # Sequence from 5 to 25`

Other ways to create sequences
Increment by 2
```> seq(5, 25, by=2) [1]  5  7  9 11 13 15 17 19 21 23 25```

```>seq(5,25,length=18) # Create sequence from 5 to 25 with a total length of 18 [1]  5.000000  6.176471  7.352941  8.529412  9.705882 10.882353 12.058824 13.235294 [9] 14.411765 15.588235 16.764706 17.941176 19.117647 20.294118 21.470588 22.647059 [17] 23.823529 25.000000```

C) Conditions and loops
An if-else if-else construct goes like this
```if(condition) { do something } else if (condition) { do something } else { do something }```

Note: Make sure the statements appear as above with the else if and else appearing on the same line as the closing braces, otherwise R complains about ‘unexpected else’ in else statement

D) Loops
I would like to mention 2 ways of doing ‘for’ loops  in R.
a) ```for (i in 1:10) { statement }```
```> a <- seq(5,25,length=10) > a [1]  5.000000  7.222222  9.444444 11.666667 13.888889 16.111111 18.333333 [8] 20.555556 22.777778 25.000000```

b) Sequence along the vector sequence. Note: This is useful as we don’t have to know  the length of the vector/sequence
```for (i in seq_along(a)){ +   print(a[i]) + }```

[1] 5
[1] 7.222222
[1] 9.444444
[1] 11.66667

There are others ways of looping with ‘while’ and ‘repeat’ which I have not included in this post.

R makes manipulation of matrices and data frames really easy. All the elements in a matrix are numeric while data frames can have different types for each of the element

E) Matrix
```> rnorm(12,5,2) [1] 2.699961 3.160208 5.087478 3.969129 3.317840 4.551565 2.585758 2.397780 [9] 5.297535 6.574757 7.468268 2.440835```

a) Create a vector of 12 random numbers with a mean of 5 and SD of 2
```> a <-rnorm(12,5,2) b) Convert the vector to a matrix with 4 rows and 3 columns > mat <- matrix(a,4,3) > mat[,1]     [,2]     [,3] [1,] 5.197010 3.839281 9.022818 [2,] 4.053590 5.321399 5.587495 [3,] 4.225763 4.873768 6.648151 [4,] 4.709784 4.129093 2.575523```

c) Subset rows 1 & 2 from the matrix
```> mat[1:2,] [,1]     [,2]     [,3] [1,] 5.19701 3.839281 9.022818 [2,] 4.05359 5.321399 5.587495 ```
d) Subset matrix a rows 1& 2 and with columns 2 & 3
```> mat[1:2,2:3] [,1]     [,2] [1,] 3.839281 9.022818 [2,] 5.321399 5.587495 ```
e) Subset matrix a for all row elements for the column 3
```> mat[,3] [1] 9.022818 5.587495 6.648151 2.575523 ```
e) Add row names and column names for the matrix as follows
> names <- c(“tim”,”pat”,”joe”,”jim”)
> v <- data.frame(names,mat)
```> v names       X1       X2       X3 1   tim 5.197010 3.839281 9.022818 2   pat 4.053590 5.321399 5.587495 3   joe 4.225763 4.873768 6.648151 4   jim 4.709784 4.129093 2.575523```

```> colnames(v) <- c("names","a","b","c") > v names        a        b        c 1   tim 5.197010 3.839281 9.022818 2   pat 4.053590 5.321399 5.587495 3   joe 4.225763 4.873768 6.648151 4   jim 4.709784 4.129093 2.575523```

F) Data Frames
In R data frames are the most important method to manipulate large amounts of data. One can read data in .csv format into data frame using
`df <- read.csv(“mydata.csv”)`
To get a feel of data frames it is useful to play around with the numerous data sets that are available with the installation of R
To check the available dataframes do
```>data() AirPassengers                    Monthly Airline Passenger Numbers 1949-1960 BJsales                          Sales Data with Leading Indicator BJsales.lead (BJsales)           Sales Data with Leading Indicator BOD                              Biochemical Oxygen Demand CO2                              Carbon Dioxide Uptake in Grass Plants ChickWeight                      Weight versus age of chicks on different diets ...```

I will be using the mtcars data frame. Here are some of the most important commands on data frames
a) load data from mtcars
`data(mtcars)`
b) ```> head(mtcars,3) # Display the top 3 rows of the data frame mpg cyl disp  hp drat    wt  qsec vs am gear carb Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1```

c) ```> tail(mtcars,4) # Display the boittom 4 rows of the data frame mpg cyl disp  hp drat   wt qsec vs am gear carb Ford Pantera L 15.8   8  351 264 4.22 3.17 14.5  0  1    5    4 Ferrari Dino   19.7   6  145 175 3.62 2.77 15.5  0  1    5    6 Maserati Bora  15.0   8  301 335 3.54 3.57 14.6  0  1    5    8 Volvo 142E     21.4   4  121 109 4.11 2.78 18.6  1  1    4    2```

d) ```> names(mtcars)  # Display the names of the columns of the data frame [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"```

e) ```> summary(mtcars) # Display the summary of the data frame mpg             cyl             disp             hp             drat             wt Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0   Min.   :2.760   Min.   :1.513 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:3.080   1st Qu.:2.581 Median :19.20   Median :6.000   Median :196.3   Median :123.0   Median :3.695   Median :3.325 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7   Mean   :3.597   Mean   :3.217 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.920   3rd Qu.:3.610 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0   Max.   :4.930   Max.   :5.424 qsec             vs               am              gear            carb Min.   :14.50   Min.   :0.0000   Min.   :0.0000   Min.   :3.000   Min.   :1.000 1st Qu.:16.89   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000 Median :17.71   Median :0.0000   Median :0.0000   Median :4.000   Median :2.000 Mean   :17.85   Mean   :0.4375   Mean   :0.4062   Mean   :3.688   Mean   :2.812 3rd Qu.:18.90   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000 Max.   :22.90   Max.   :1.0000   Max.   :1.0000   Max.   :5.000   Max.   :8.000```

f) ```> str(mtcars) # Generate a concise description of the data frame - values in each column, factors 'data.frame':   32 obs. of  11 variables: \$ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ... \$ cyl : num  6 6 4 6 8 6 8 4 4 6 ... \$ disp: num  160 160 108 258 360 ... \$ hp  : num  110 110 93 110 175 105 245 62 95 123 ... \$ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ... \$ wt  : num  2.62 2.88 2.32 3.21 3.44 ... \$ qsec: num  16.5 17 18.6 19.4 17 ... \$ vs  : num  0 0 1 1 0 1 0 1 1 1 ... \$ am  : num  1 1 1 0 0 0 0 0 0 0 ... \$ gear: num  4 4 4 3 3 3 3 4 4 4 ... \$ carb: num  4 4 1 1 2 1 4 2 2 4 ... ```
g) ```> mtcars[mtcars\$mpg == 10.4,] #Select all rows in mtcars where the mpg column has a value 10.4 mpg cyl disp  hp drat    wt  qsec vs am gear carb Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4 Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4```

h) ```> mtcars[(mtcars\$mpg >20) & (mtcars\$mpg <24),] # Select all rows in mtcars where the mpg > 20 and mpg < 24 mpg cyl  disp  hp drat    wt  qsec vs am gear carb Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2 Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1 Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2```

i) ```> myset <- mtcars[(mtcars\$cyl == 6) | (mtcars\$cyl == 4),] # Get all calls which are either 4 or 6 cylinder > myset mpg cyl  disp  hp drat    wt  qsec vs am gear carb Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2… … …```

j) ```> mean(myset\$mpg) # Determine the mean of the set created above [1] 23.97222 ```
k) `> table(mtcars\$cyl) #Create a table of cars which have 4,6, or 8 cylinders`

4  6  8
11  7 14

G) lapply,sapply,tapply
I use the iris data set for these commands
`a) > data(iris)` #Load iris data set

b)``` > names(iris)  #Show the column names of the data set [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species" c) > lapply(iris,class) #Show the class of all the columns in iris \$Sepal.Length [1] "numeric" \$Sepal.Width [1] "numeric" \$Petal.Length [1] "numeric" \$Petal.Width [1] "numeric" \$Species [1] "factor"```

d) ```> sapply(iris,class) # Display a summary of the class of the iris data set Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species "numeric"    "numeric"    "numeric"    "numeric"     "factor"```

e) ```tapply: Instead of getting the mean for each of the species as below we can use tapply > a <-iris[iris\$Species == "setosa",] > mean(a\$Sepal.Length) [1] 5.006 > b <-iris[iris\$Species == "versicolor",] > mean(b\$Sepal.Length) [1] 5.936 > c <-iris[iris\$Species == "virginica",] > mean(c\$Sepal.Length) [1] 6.588```

> tapply(iris\$Sepal.Length,iris\$Species,mean)
setosa versicolor  virginica
5.006      5.936      6.588

Hopefully this highly condensed version of R will set you on a R-oll.