In both linear and logistic regression the choice of the degree of the polynomial for the hypothesis function is extremely critical. Too low a degree can underfit the data, while too high a degree can overfit it, as shown below

In the figure on the left the data is underfit: we try to fit the data with a first-order polynomial, which is a straight line. This is a case of high 'bias'

In the rightmost figure a polynomial of much higher degree is used. The curve passes through all the data points, yet it is not effective at predicting new values. This is a case of overfitting, or high variance.

The middle figure is just right, as it fits the data points in the best possible way.
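To see this numerically, here is a small NumPy sketch (the data and the choice of degrees are hypothetical): as the degree of the fitted polynomial grows, the training error keeps shrinking, even though the high-degree fit would generalize worst.

```python
import numpy as np

# Hypothetical noisy quadratic data
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.05, x.size)

# Fit polynomials of degree 1 (underfit), 2 (just right) and 15 (overfit)
mse = {}
for degree in (1, 2, 15):
    coeffs = np.polyfit(x, y, degree)           # least-squares polynomial fit
    residuals = y - np.polyval(coeffs, x)
    mse[degree] = float(np.mean(residuals**2))  # training error only

print(mse)  # training error falls as the degree rises
```

Low training error alone says nothing about how well the hypothesis predicts unseen values, which is exactly the overfitting trap described above.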

A similar problem exists with logistic regression as shown below

There are two ways to handle overfitting

a) Reducing the number of features selected

b) Using regularization

In regularization the magnitude of the parameters θ is decreased to reduce the effect of overfitting

Hence if we choose a hypothesis function

h_θ(x) = θ_0 + θ_1 x + θ_2 x^2 + θ_3 x^3 + θ_4 x^4

The cost function for this without regularization, as mentioned in earlier posts, is

J(θ) = 1/2m Σ (h_θ(x^(i)) - y^(i))^2

where the key is to minimize the above function for the least error
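The unregularized cost can be computed in a few lines of NumPy. The data below is a hypothetical toy set where y = 2x exactly, so the matching parameters give zero cost:

```python
import numpy as np

def cost(theta, X, y):
    """Squared-error cost J(theta) = 1/(2m) * sum_i (h_theta(x_i) - y_i)^2."""
    m = y.size
    errors = X @ theta - y
    return float(errors @ errors) / (2 * m)

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])   # first column of ones is the intercept term
y = np.array([2.0, 4.0, 6.0])

print(cost(np.array([0.0, 2.0]), X, y))  # 0.0 - a perfect fit
```

Gradient descent (or any other optimizer) then searches for the θ that drives this value as low as possible.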

The cost function with regularization becomes

J(θ) = 1/2m [ Σ (h_θ(x^(i)) - y^(i))^2 + λ Σ θ_j^2 ]

As can be seen, regularization adds a term θ_j^2 for each parameter to the cost function that needs to be minimized, which penalizes large parameter values.
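Extending the earlier cost function with the penalty term is a one-line change. Note that by convention θ_0, the intercept, is not penalized (the toy numbers are hypothetical):

```python
import numpy as np

def cost_reg(theta, X, y, lam):
    """Regularized cost: 1/(2m) * (sum of squared errors + lam * sum_j theta_j^2)."""
    m = y.size
    errors = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)  # theta_0 is conventionally left unpenalized
    return float(errors @ errors + penalty) / (2 * m)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # toy data, y = 2x exactly
y = np.array([2.0, 4.0, 6.0])
theta = np.array([0.0, 2.0])  # fits the data perfectly

print(cost_reg(theta, X, y, 0.0))  # 0.0 - reduces to the unregularized cost
print(cost_reg(theta, X, y, 6.0))  # 4.0 - the penalty 6 * 2^2 / (2*3)
```

With λ = 0 the regularized cost reduces to the original one; a positive λ makes large parameters expensive even when they fit the training data perfectly.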

Hence with the regularization term the problem of overfitting can be controlled

However, the trick is to determine the value of λ. If λ is too big, it will result in underfitting, i.e. high bias.

Similarly the regularized equation for logistic regression is as shown below

J(θ) = -1/m Σ [ y log(h_θ(x)) + (1 - y) log(1 - h_θ(x)) ] + λ/2m Σ θ_j^2
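The regularized logistic cost translates directly to NumPy as well. The toy data below is hypothetical; with θ = 0 every prediction is 0.5, so the cross-entropy term equals log(2) regardless of the labels:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_reg(theta, X, y, lam):
    """-1/m * sum[y*log(h) + (1-y)*log(1-h)] + lam/(2m) * sum_j theta_j^2."""
    m = y.size
    h = sigmoid(X @ theta)
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)  # theta_0 not penalized
    return float(cross_entropy + penalty)

X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])  # hypothetical toy data
y = np.array([0.0, 1.0, 1.0])

print(logistic_cost_reg(np.zeros(2), X, y, 1.0))  # ~0.6931 = log(2)
```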

Some tips suggested by Prof Andrew Ng for determining the parameters and features for regression

a) Get as many training examples as possible. It is worth spending more effort in collecting additional examples

b) Add additional features

c) Observe changes to the learning algorithm with different values of λ
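Tip (c) can be sketched numerically. Regularized linear regression has a closed-form solution (the normal equation with a ridge term), so we can solve it for several values of λ and watch the fitted parameters shrink; the data and the λ values below are hypothetical:

```python
import numpy as np

# Hypothetical data: a gently curving trend, with intercept, x and x^2 features
X = np.column_stack([np.ones(5),
                     np.linspace(0.0, 1.0, 5),
                     np.linspace(0.0, 1.0, 5) ** 2])
y = np.array([1.0, 1.4, 1.7, 1.9, 2.0])

norms = []
for lam in (0.0, 1.0, 100.0):
    L = lam * np.eye(X.shape[1])
    L[0, 0] = 0.0  # leave the intercept unpenalized
    # Normal equation with regularization: (X'X + lam*L) theta = X'y
    theta = np.linalg.solve(X.T @ X + L, X.T @ y)
    norms.append(float(np.linalg.norm(theta[1:])))

print(norms)  # magnitude of the non-intercept parameters falls as lambda grows
```

As λ grows the non-intercept parameters are pushed toward zero, moving the model from a possible overfit toward an underfit, which is why sweeping λ and observing the learning algorithm is so informative.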

This post is continued in my next post – Simplifying ML: Impact of degree of polynomial on bias, variance and other insights

Note: This post, in line with my previous posts on Machine Learning, is based on the Coursera course on Machine Learning by Professor Andrew Ng

