Wednesday, March 27, 2019

R code for Gradient Descent

Assume that we have one-dimensional data and y = 1.2*(x - 2)^2 + 3.2. The functional form of y is thus known in closed form, a priori, so it is straightforward to obtain the first derivative and perform gradient descent. Here is the R code for the above.
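
A minimal sketch, using the analytic derivative dy/dx = 2.4*(x - 2); the learning rate alpha, the iteration count, and the starting point are illustrative choices:

    # Gradient descent on y = 1.2*(x - 2)^2 + 3.2 using the analytic derivative.
    f  <- function(x) 1.2 * (x - 2)^2 + 3.2   # objective
    fp <- function(x) 2.4 * (x - 2)           # first derivative

    alpha <- 0.1   # learning rate (illustrative choice)
    iters <- 100   # number of iterations (illustrative choice)
    x     <- 0.1   # starting point (illustrative choice)

    for (i in 1:iters) {
      x <- x - alpha * fp(x)   # step against the gradient
    }
    cat("minimum at x =", x, ", y =", f(x), "\n")   # expect x ~ 2, y ~ 3.2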



Now suppose we still have one-dimensional data but the functional form of y is unknown. How would we do gradient descent in that case? Below is the R code to do just that.
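
A minimal sketch, assuming the derivative is approximated numerically with a central finite difference (the step size h, learning rate, and starting point are illustrative):

    # Gradient descent when only function evaluations are available.
    f <- function(x) 1.2 * (x - 2)^2 + 3.2   # treated as a black box here

    num_grad <- function(f, x, h = 1e-6) {
      (f(x + h) - f(x - h)) / (2 * h)        # central finite-difference approximation
    }

    alpha <- 0.1
    iters <- 100
    x     <- 0.1
    for (i in 1:iters) {
      x <- x - alpha * num_grad(f, x)
    }
    cat("minimum at x =", x, ", y =", f(x), "\n")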



Now here are some variants of these to experiment with:
a. Replace the squared loss with a differentiable loss function of your choice and observe the impact on your favorite data set.
b. How do the parameters of the algorithm (alpha, the number of iterations, the choice of starting point) affect its convergence time?
c. How will you modify this code to implement the stochastic gradient descent algorithm? (A minimal sketch follows this list.)
d. Suppose we added a bias term to the hypothesis and repeated the experiments. What do you observe -- is the bias term beneficial?
e. Suppose we changed the hypothesis to be nonlinear, e.g. h(x) = w^2 x + wx + b. Is the solution you find any better?
f. How will you modify the code above to implement Newton's method? (Hint: use the Taylor expansion to represent the function with higher-order terms.)
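
For variant (c), here is one way stochastic gradient descent might look for the hypothesis h(x) = w*x under squared loss; the synthetic data, learning rate, and epoch count are all assumptions made for illustration:

    # Stochastic gradient descent for h(x) = w * x with squared loss (variant c).
    set.seed(42)
    n <- 100
    x <- runif(n, -2, 2)              # synthetic inputs (assumed)
    y <- 3 * x + rnorm(n, sd = 0.1)   # synthetic targets; true weight is 3

    w      <- 0      # initial weight
    alpha  <- 0.01   # learning rate (assumed)
    epochs <- 50     # number of passes over the data (assumed)

    for (e in 1:epochs) {
      for (i in sample(n)) {                    # visit examples in random order
        grad <- 2 * (w * x[i] - y[i]) * x[i]    # gradient of (w*x - y)^2 at one example
        w    <- w - alpha * grad
      }
    }
    cat("estimated w =", w, "\n")               # expect w close to 3

Unlike batch gradient descent, each update here uses a single randomly chosen example, so the weight fluctuates around the optimum rather than converging smoothly.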

If you can report the above results on your favorite data set, I'd like to hear from you!

Also, feel free to ask questions in the comments if you are wondering about something.

Friday, March 15, 2019

Contour Plots of Loss Functions in R

In machine learning, loss functions are used to estimate how well learning algorithms perform. A loss is often written as loss = L(y, y_hat), where y is the true label and y_hat is the prediction. Commonly used loss functions include the squared, absolute (Laplace), Huber, hinge, and logistic losses, among others.
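
For concreteness, here is how a few of these might be written as R functions (the Huber delta default and the hinge label convention are the usual ones, stated here as assumptions):

    # A few common loss functions, written as functions of (y, y_hat).
    squared  <- function(y, y_hat) (y - y_hat)^2
    absolute <- function(y, y_hat) abs(y - y_hat)            # a.k.a. Laplace loss
    huber    <- function(y, y_hat, delta = 1) {              # delta = 1 is an assumption
      r <- y - y_hat
      ifelse(abs(r) <= delta, 0.5 * r^2, delta * (abs(r) - 0.5 * delta))
    }
    hinge    <- function(y, y_hat) pmax(0, 1 - y * y_hat)    # assumes y in {-1, +1}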

Using the Iris data set from the UC Irvine Machine Learning Repository, I demonstrate how contour plots of loss functions can be obtained using R.

Background information: After being downloaded, the Iris data was pre-processed in the following manner: only two classes (100 examples) were selected so that the problem remained one of binary classification, and only two attributes were retained to enable visualization via contour plots. The glmnet package was used to build a lasso model, as shown below:
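
A sketch of what that model fit might look like, using R's built-in iris data in place of a manual download; the two attributes (Sepal.Length, Sepal.Width) and the lambda used to extract coefficients are assumptions:

    library(glmnet)

    # Two classes (100 examples) and two attributes, per the pre-processing above.
    dat <- iris[iris$Species %in% c("setosa", "versicolor"), ]
    X   <- as.matrix(dat[, c("Sepal.Length", "Sepal.Width")])   # two attributes (assumed)
    y   <- ifelse(dat$Species == "setosa", 0, 1)                # binary labels

    # Lasso-penalized logistic regression (alpha = 1 selects the lasso penalty).
    fit <- glmnet(X, y, family = "binomial", alpha = 1)
    coef(fit, s = 0.01)   # coefficients at an illustrative lambda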



The filled contour plot can then be generated from this, as shown below.
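
One way to produce it, assuming the squared loss is evaluated over a grid of the two coefficients (the grid ranges are arbitrary choices):

    # Filled contour of the squared loss over a grid of the two coefficients.
    dat <- iris[iris$Species %in% c("setosa", "versicolor"), ]
    X   <- as.matrix(dat[, c("Sepal.Length", "Sepal.Width")])
    y   <- ifelse(dat$Species == "setosa", 0, 1)

    w1 <- seq(-2, 2, length.out = 100)   # grid ranges are arbitrary choices
    w2 <- seq(-2, 2, length.out = 100)
    loss <- outer(w1, w2, Vectorize(function(a, b) {
      mean((y - (a * X[, 1] + b * X[, 2]))^2)   # mean squared loss at (a, b)
    }))

    filled.contour(w1, w2, loss,
                   xlab = "w1 (Sepal.Length)", ylab = "w2 (Sepal.Width)",
                   main = "Squared loss contours on the Iris data")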
Add in your favorite loss function -- Huber or hinge -- and see some nice contour plots with the Iris data!