Wednesday, March 4, 2020

Creating 3D Surface Plots in R -- Examples Using Iris Data


I downloaded the Iris data set from the UCI Machine Learning Repository and pre-processed it as follows: I kept only the first two classes (I wanted a binary classification dataset), which gave 100 instances, and changed the class labels to 0/1. I retained all four attributes -- Sepal Length, Sepal Width, Petal Length and Petal Width.
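If you want to reproduce the pre-processing without downloading anything, here is a minimal sketch. It assumes R's built-in iris data frame as a stand-in for the UCI file (a few values may differ slightly from the UCI copy), and it assumes that setosa maps to 0 and versicolor to 1. The column names are chosen to match the model formulas used later in this post.

```r
# Sketch only: the built-in iris data stands in for the UCI download (assumption).
A <- iris[iris$Species %in% c("setosa", "versicolor"), ]

# Rename the four attributes to match the formulas used in this post.
names(A)[1:4] <- c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width")

# Binary label: setosa = 0, versicolor = 1 (the mapping is an assumption).
A$Class <- ifelse(A$Species == "setosa", 0, 1)
A$Species <- NULL

# Expect 50 instances in each class.
print(table(A$Class))
```

From here, write.csv(A, "iris_new.csv", row.names = FALSE) would save a file in the shape the plotting script expects.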

Next, I took two attributes at a time and fitted a linear model (lm) in R to the data. I wanted to visualize the surface of the loss function under both Absolute Loss and Squared Loss. Here is the code I used to generate my plots.

# Surface plot of the absolute loss (absolute residuals) of a linear model
# Testing with Iris Data
require(grDevices)
A <- read.csv('/Users/haimontidutta/Courses/MGS662/Spring2020/Labs/Lab2/iris_new.csv', header=TRUE, sep=",")
fit.linear1 <- lm(Class ~ Sepal_Width + Petal_Width, data=A)
summary(fit.linear1)
# Divide the Sepal_Width range into 25 grid points
xgrid <- seq(min(A$Sepal_Width), max(A$Sepal_Width), length.out=25)
# Divide the Petal_Width range into 4 grid points
ygrid <- seq(min(A$Petal_Width), max(A$Petal_Width), length.out=4)
# Absolute loss: the 100 absolute residuals, in data-row order, reshaped to 25 x 4
zt <- matrix(abs(fit.linear1$residuals), 25, 4)
persp(x=xgrid, y=ygrid, z=zt, main="Absolute Loss Function", xlab="Sepal Width", ylab="Petal Width", zlab="Absolute Error", theta=30, phi=15, col=c('orangered', 'salmon'), shade=0.2)
# For squared loss, use zt <- matrix(fit.linear1$residuals^2, 25, 4) and zlab="Squared Error"
#filled.contour(x=xgrid, y=ygrid, z=zt, plot.title=title(main="Absolute Loss Function", xlab="Sepal Width", ylab="Petal Width"), color.palette=terrain.colors)
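A loss surface can also be drawn over the model's coefficients instead of over the data, which is a different view from the plot above. The sketch below is one way to do that, not the code behind my plots. It assumes the built-in iris data in place of iris_new.csv, holds the intercept at its fitted value, and uses an arbitrary window of plus or minus 1 around each fitted slope. The helper mean_loss is a name I made up for illustration.

```r
# Sketch only: built-in iris stands in for iris_new.csv (assumption).
A <- iris[1:100, 1:4]
names(A) <- c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width")
A$Class <- as.integer(iris$Species[1:100] == "versicolor")  # setosa = 0, versicolor = 1

fit <- lm(Class ~ Sepal_Width + Petal_Width, data = A)
b <- coef(fit)

# Candidate slopes centred on the fitted values (window width is arbitrary).
b1 <- seq(b[["Sepal_Width"]] - 1, b[["Sepal_Width"]] + 1, length.out = 41)
b2 <- seq(b[["Petal_Width"]] - 1, b[["Petal_Width"]] + 1, length.out = 41)

# Hypothetical helper: mean loss over the data for one pair of slopes,
# with the intercept held at its fitted value.
mean_loss <- function(s1, s2, loss) {
  pred <- b[["(Intercept)"]] + s1 * A$Sepal_Width + s2 * A$Petal_Width
  mean(loss(A$Class - pred))
}

# Evaluate both losses at every point of the 41 x 41 slope grid.
sq_surface  <- outer(b1, b2, Vectorize(function(s1, s2) mean_loss(s1, s2, function(e) e^2)))
abs_surface <- outer(b1, b2, Vectorize(function(s1, s2) mean_loss(s1, s2, abs)))

persp(b1, b2, sq_surface, main = "Squared Loss over Slopes",
      xlab = "Sepal Width slope", ylab = "Petal Width slope", zlab = "Mean Squared Error",
      theta = 30, phi = 15, col = "salmon", shade = 0.2)
```

Swapping sq_surface for abs_surface in the persp call gives the absolute-loss version. The squared-loss surface is a smooth bowl whose lowest grid point sits at the fitted slopes, since lm minimizes squared error.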

I present below a summary of my results:

Model 1a. Absolute Loss – Sepal Length, Sepal Width
> summary(fit.linear1)
Call:
lm(formula = Class ~ Sepal_Length + Sepal_Width, data = A)
Residuals:
     Min       1Q   Median       3Q      Max
-0.50174 -0.15280  0.00804  0.11770  0.47855
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.30877    0.24897   -1.24    0.218
Sepal_Length  0.48334    0.03305   14.62   <2e-16 ***
Sepal_Width  -0.59327    0.04455  -13.32   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2069 on 97 degrees of freedom
Multiple R-squared: 0.8339, Adjusted R-squared: 0.8305
F-statistic: 243.6 on 2 and 97 DF, p-value: < 2.2e-16




Model 1b. Absolute Loss – Sepal Length, Petal Length
Call:
lm(formula = Class ~ Sepal_Length + Petal_Length, data = A)
Residuals:
     Min       1Q   Median       3Q      Max
-0.29105 -0.06426 -0.01290  0.06173  0.39573
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.14751    0.13595   1.085    0.281
Sepal_Length -0.13747    0.03009  -4.568 1.45e-05 ***
Petal_Length  0.38596    0.01333  28.951  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.112 on 97 degrees of freedom
Multiple R-squared: 0.9513, Adjusted R-squared: 0.9503
F-statistic: 947.2 on 2 and 97 DF, p-value: < 2.2e-16




Model 1c. Absolute Loss – Sepal Length, Petal Width
Call:
lm(formula = Class ~ Sepal_Length + Petal_Width, data = A)
Residuals:
     Min       1Q   Median       3Q      Max
-0.39458 -0.08604  0.00170  0.05645  0.33772
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.12734    0.17173   0.742   0.4602
Sepal_Length -0.06211    0.03566  -1.742   0.0847 .
Petal_Width   0.90761    0.04041  22.461   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1397 on 97 degrees of freedom
Multiple R-squared: 0.9243, Adjusted R-squared: 0.9227
F-statistic: 591.9 on 2 and 97 DF, p-value: < 2.2e-16




Model 1d. Absolute Loss – Sepal Width, Petal Length
Call:
lm(formula = Class ~ Sepal_Width + Petal_Length, data = A)
Residuals:
     Min       1Q   Median       3Q      Max
-0.24538 -0.07411 -0.00882  0.05725  0.35507
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.171799   0.102235   1.680   0.0961 .
Sepal_Width  -0.173716   0.027256  -6.374 6.23e-09 ***
Petal_Length  0.302473   0.008957  33.768  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1037 on 97 degrees of freedom
Multiple R-squared: 0.9583, Adjusted R-squared: 0.9574
F-statistic: 1114 on 2 and 97 DF, p-value: < 2.2e-16




Model 1e. Absolute Loss – Sepal Width, Petal Width

Call:
lm(formula = Class ~ Sepal_Width + Petal_Width, data = A)
Residuals:
     Min       1Q   Median       3Q      Max
-0.31670 -0.05463  0.00531  0.07211  0.25097
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.61014    0.09891   6.169 1.59e-08 ***
Sepal_Width -0.22490    0.02803  -8.023 2.40e-12 ***
Petal_Width  0.74613    0.02357  31.662  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.11 on 97 degrees of freedom
Multiple R-squared: 0.9531, Adjusted R-squared: 0.9521
F-statistic: 984.6 on 2 and 97 DF, p-value: < 2.2e-16



Model 1f. Absolute Loss – Petal Length, Petal Width

Call:
lm(formula = Class ~ Petal_Length + Petal_Width, data = A)
Residuals:
     Min       1Q   Median       3Q      Max
-0.24365 -0.07120 -0.01361  0.04766  0.39400
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.39256    0.04210  -9.323 3.88e-15 ***
Petal_Length  0.24956    0.04091   6.100 2.17e-08 ***
Petal_Width   0.22717    0.10465   2.171   0.0324 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1206 on 97 degrees of freedom
Multiple R-squared: 0.9436, Adjusted R-squared: 0.9424
F-statistic: 810.7 on 2 and 97 DF, p-value: < 2.2e-16




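Of the six models, the one with Sepal Width and Petal Length has the highest Adjusted R-squared (0.9574), narrowly ahead of Sepal Width and Petal Width (0.9521). The model with Sepal Length and Sepal Width fits worst (0.8305).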
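The comparison across all six attribute pairs can also be done in one pass instead of reading each summary by hand. Here is a rough sketch, again assuming the built-in iris data in place of iris_new.csv, so the exact numbers may differ slightly from the ones I report above. The names attr_pairs and results are just for illustration.

```r
# Sketch only: built-in iris stands in for iris_new.csv (assumption).
A <- iris[1:100, 1:4]
names(A) <- c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width")
A$Class <- as.integer(iris$Species[1:100] == "versicolor")  # setosa = 0, versicolor = 1

# All 6 ways of choosing 2 of the 4 attributes.
attr_pairs <- combn(names(A)[1:4], 2, simplify = FALSE)

# Fit Class ~ attr1 + attr2 for each pair and pull out the adjusted R-squared.
adj_r2 <- sapply(attr_pairs, function(p) {
  fit <- lm(reformulate(p, response = "Class"), data = A)
  summary(fit)$adj.r.squared
})

results <- data.frame(
  model  = sapply(attr_pairs, paste, collapse = " + "),
  adj_r2 = adj_r2
)

# Best-fitting pair first.
results <- results[order(-results$adj_r2), ]
print(results, row.names = FALSE)
```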

What did you discover? Please do let me know! Also, if you want to play around with colors in R -- here is a nice reference. Enjoy coding in R!