Introduction
This is the final part of my series on ‘Practical Machine Learning with R and Python’. In this series I included implementations of the most common Machine Learning algorithms in R and Python. The earlier posts in the series were
1. Practical Machine Learning with R and Python – Part 1 In this initial post, I touch upon regression of a continuous target variable. Specifically I cover Univariate, Multivariate and Polynomial regression as well as KNN regression, in both R and Python
2. Practical Machine Learning with R and Python – Part 2 In this post, I discuss Logistic Regression, KNN classification and Cross Validation error for both LOOCV and K-Fold in both R and Python
3. Practical Machine Learning with R and Python – Part 3 This 3rd part covers feature selection in Machine Learning. Specifically I touch upon best fit, forward fit, backward fit, ridge (L2 regularization) and lasso (L1 regularization). The post includes equivalent code in R and Python.
4. Practical Machine Learning with R and Python – Part 4 In this part I discussed SVMs, Decision Trees, Validation, Precision-Recall, AUC and ROC curves
5. Practical Machine Learning with R and Python – Part 5 In this penultimate part, I touch upon B-splines, natural splines, smoothing splines, Generalized Additive Models (GAMs), Decision Trees, Random Forests and Gradient Boosted Trees.
In this last part I cover Unsupervised Learning. Specifically I cover implementations of Principal Component Analysis (PCA), K-Means and Hierarchical Clustering. You can download this R Markdown file from Github at MachineLearning-RandPython-Part6
Note: Please check out my video presentations on Machine Learning on YouTube
1. Machine Learning in plain English-Part 1
2. Machine Learning in plain English-Part 2
3. Machine Learning in plain English-Part 3
Check out my compact and minimal book “Practical Machine Learning with R and Python: Third edition – Machine Learning in stereo”, available on Amazon in paperback ($12.99) and Kindle ($8.99) versions. My book includes implementations of key ML algorithms and associated measures and metrics. The book is ideal for anybody who is familiar with the concepts and would like a quick reference to the different ML algorithms that can be applied to problems, and how to select the best model. Pick up your copy today!!

1.1a Principal Component Analysis (PCA) – R code
Principal Component Analysis is used to reduce the dimensionality of the input. In the code below, the 8 x 8 pixel images of handwritten digits are reduced to their principal components. A scatter plot of the first 2 principal components then gives a very good visual representation of the data.
library(dplyr)
library(ggplot2)
# Read the 8 x 8 pixel digits data; the last column holds the digit label
digits <- read.csv("digits.csv")
digitClasses <- factor(digits$X0.000000000000000000e.00.29)
# Compute the principal components of the 64 pixel columns
digitsPCA <- prcomp(digits[,1:64])
df <- data.frame(digitsPCA$x)
df1 <- cbind(df, digitClasses)
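
A minimal ggplot2 sketch of the scatter plot described above, using the df1 data frame from the preceding snippet (the point styling is an illustrative choice):

# Scatter plot of the first 2 principal components, colored by digit class
ggplot(df1, aes(x = PC1, y = PC2, color = digitClasses)) +
  geom_point(alpha = 0.6) +
  labs(title = "First 2 principal components of handwritten digits",
       x = "PC1", y = "PC2")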

1.1b Variance explained vs no. of principal components – R code
In the code below the variance explained vs the number of principal components is plotted. It can be seen that with 20 principal components almost 90% of the variance is explained by this reduced-dimension model.
# Read the digits data and compute the principal components
digits <- read.csv("digits.csv")
digitClasses <- factor(digits$X0.000000000000000000e.00.29)
digitsPCA <- prcomp(digits[,1:64])
# The variance explained by each component is the square of its standard deviation
sd <- digitsPCA$sdev
digitsVar <- digitsPCA$sdev^2
# Proportion of total variance explained by each principal component
percentVarExp <- digitsVar/sum(digitsVar)
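
A minimal sketch of the plot described above, using the percentVarExp vector computed in the snippet; cumsum gives the cumulative proportion of variance explained (the 90% reference line is only for orientation):

# Plot cumulative % variance explained vs number of principal components
plot(cumsum(percentVarExp) * 100, type = "o",
     xlab = "Number of principal components",
     ylab = "Cumulative % variance explained",
     main = "Variance explained vs no. of principal components")
abline(h = 90, lty = 2)  # reference line at 90%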

1.1c Principal Component Analysis (PCA) – Python code
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits

# Load the 8 x 8 digits data and project it onto the first 2 principal components
digits = load_digits()
pca = PCA(2)
projected = pca.fit_transform(digits.data)
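
A minimal sketch of the corresponding scatter plot, using the projected array from the snippet above; digits.target supplies the digit labels for coloring (the colormap is an arbitrary choice):

# Scatter plot of the first 2 principal components, colored by digit class
plt.scatter(projected[:, 0], projected[:, 1],
            c=digits.target, cmap='Spectral', alpha=0.6)
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.colorbar(label='digit')
plt.show()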

1.1d Variance explained vs no. of principal components – Python code
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits

# Keep all 64 principal components so the full variance profile can be examined
digits = load_digits()
pca = PCA(64)
projected = pca.fit_transform(digits.data)
# Proportion of variance explained by each component and its cumulative percentage
varianceExp = pca.explained_variance_ratio_
totVarExp = np.cumsum(np.round(varianceExp, decimals=4) * 100)
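
A minimal sketch of the plot of cumulative variance explained, using totVarExp computed above (the 90% reference line is only for orientation):

# Plot cumulative % variance explained vs number of principal components
plt.plot(np.arange(1, 65), totVarExp, marker='o')
plt.axhline(y=90, linestyle='--')  # roughly 20 components explain ~90% of the variance
plt.xlabel('Number of principal components')
plt.ylabel('Cumulative % variance explained')
plt.title('Variance explained vs no. of principal components')
plt.show()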

1.2a K-Means – R code
In the code below, the first 2 Principal Components of the handwritten digits are first plotted as a scatter plot. The centroids of the 10 clusters found by K-Means, corresponding to the 10 different digits, are then overlaid on this scatter plot.
library(ggplot2)
# Read the digits data and compute the principal components
digits <- read.csv("digits.csv")
digitClasses <- factor(digits$X0.000000000000000000e.00.29)
digitsPCA <- prcomp(digits[,1:64])
df <- data.frame(digitsPCA$x)
df1 <- cbind(df, digitClasses)
# K-Means with 10 clusters on the first 2 principal components (up to 1000 iterations)
a <- df[,1:2]
k <- kmeans(a, 10, 1000)
df2 <- data.frame(k$centers)
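
A minimal ggplot2 sketch of the overlay described above, using df1 and the centroid data frame df2 from the preceding snippet (marker shape and size are illustrative choices):

# Scatter plot of the first 2 principal components with the 10 K-Means centroids overlaid
ggplot(df1, aes(x = PC1, y = PC2, color = digitClasses)) +
  geom_point(alpha = 0.5) +
  geom_point(data = df2, aes(x = PC1, y = PC2), inherit.aes = FALSE,
             color = "black", size = 4, shape = 8) +
  labs(title = "K-Means centroids over the first 2 principal components")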

1.2b K-Means – Python code
The centroids of the 10 clusters of handwritten digits are plotted over the scatter plot of the first 2 principal components.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits
from sklearn.cluster import KMeans

# Project the digits onto the first 2 principal components
digits = load_digits()
pca = PCA(2)
projected = pca.fit_transform(digits.data)
# Fit K-Means with 10 clusters (one per digit) and get the cluster centroids
kmeans = KMeans(n_clusters=10)
kmeans.fit(projected)
y_kmeans = kmeans.predict(projected)
centers = kmeans.cluster_centers_
centers
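
A minimal sketch of the overlay plot, using projected, y_kmeans and centers from the snippet above (colormap and marker styling are illustrative choices):

# Scatter plot of the 2 principal components colored by K-Means cluster,
# with the 10 cluster centroids overlaid in black
plt.scatter(projected[:, 0], projected[:, 1], c=y_kmeans, cmap='Spectral', alpha=0.5)
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, marker='*')
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.title('K-Means centroids over the first 2 principal components')
plt.show()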

1.3a Hierarchical clustering – R code
Hierarchical clustering is another type of unsupervised learning. It successively joins the closest pair of objects (points or clusters) based on some ‘distance’ metric. In this type of clustering we do not have to choose the number of centroids in advance. We can cut the resulting dendrogram at an appropriate height to get a desired and reasonable number of clusters. The following linkage methods determine the ‘distance’ used while combining successive objects
- Ward
- Complete
- Single
- Average
- Centroid
# Use the iris data, dropping the Species column before clustering
iris <- datasets::iris
iris2 <- iris[,-5]
species <- iris[,5]
# Euclidean distance matrix and hierarchical clustering with average linkage
d_iris <- dist(iris2)
hc_iris <- hclust(d_iris, method = "average")
# Cut the dendrogram to obtain 3 clusters
sub_grp <- cutree(hc_iris, k = 3)
table(sub_grp)
## sub_grp
## 1 2 3
## 50 64 36
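
A minimal sketch of the corresponding dendrogram, using the hc_iris object from the snippet above; rect.hclust highlights the k = 3 cut:

# Plot the dendrogram and mark the 3-cluster cut with rectangles
plot(hc_iris, hang = -1, cex = 0.6,
     main = "Hierarchical clustering of iris (average linkage)")
rect.hclust(hc_iris, k = 3, border = 2:4)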

1.3b Hierarchical clustering – Python code
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Hierarchical clustering of the iris data with average linkage
iris = load_iris()
Z = linkage(iris.data, 'average')
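
A minimal sketch of plotting the dendrogram from the linkage matrix Z computed above (the figure size is an arbitrary choice):

# Plot the dendrogram for the average-linkage clustering of iris
plt.figure(figsize=(10, 5))
dendrogram(Z)
plt.title('Hierarchical clustering of iris (average linkage)')
plt.xlabel('Sample index')
plt.ylabel('Distance')
plt.show()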