---
title: "ADA2: Class 03, Ch 02 Introduction to Multiple Linear Regression"
author: "Name Here"
date: "mm/dd/yyyy"
output:
  pdf_document:
    number_sections: yes
    toc: yes
  html_document:
    toc: true
    number_sections: true
    code_folding: show
---
```{R, echo=FALSE}
# I set some GLOBAL R chunk options here.
# (to hide this message add "echo=FALSE" to the code chunk options)
knitr::opts_chunk$set(comment = NA, message = FALSE, warning = FALSE, width = 100)
knitr::opts_chunk$set(fig.align = "center", fig.height = 4, fig.width = 6)
knitr::opts_chunk$set(cache = TRUE, autodep=TRUE) #$
```
# Auction selling price of antique grandfather clocks
The data include the selling price in pounds sterling at auction of 32 antique grandfather clocks,
the age of the clock in years, and the number of people who made a bid.
In the sections below, describe the relationship between variables and develop a model
for predicting selling `Price` given `Age` and `Bidders`.
```{R}
fn.data <- "http://statacumen.com/teach/ADA2/worksheet/ADA2_WS_03_auction.txt"
auction <- read.table(fn.data, header=TRUE)
str(auction)
summary(auction)
```
## __(1 p)__ Scatterplot matrix
_In the scatterplot matrix below, interpret the relationship between each pair of variables.
If a transformation is suggested by the plot (that is, because there is a curved relationship),
also plot the data on the transformed scale and
perform the following analysis on the transformed scale.
Otherwise indicate that no transformation is necessary._
```{R}
library(ggplot2)
library(GGally)
p <- ggpairs(auction)
print(p)
```
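If the matrix suggests a curved relationship, the chunk below is a minimal sketch of one way to put a variable on a transformed scale before re-plotting. The helper name `add_log_column()` is hypothetical, and the natural log is only one candidate transformation; which variable to transform, if any, is a judgment to make from the plot.

```{R}
# Hypothetical helper (not part of the original worksheet):
# append log(var) to a data frame as a new column named "log_<var>".
add_log_column <- function(df, var) {
  stopifnot(var %in% names(df), all(df[[var]] > 0))  # log() needs positive values
  df[[paste0("log_", var)]] <- log(df[[var]])
  df
}

# Example use; "Price" is only a placeholder for whichever variable you choose:
# auction_log <- add_log_column(auction, "Price")
# print(ggpairs(auction_log))
```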
### Solution
## __(1 p)__ Correlation matrix
_Below is the correlation matrix and tests for the hypothesis that each correlation is equal to zero.
Interpret the hypothesis tests and relate this to the plot that you produced above._
```{R}
# correlation matrix and associated p-values testing "H0: rho == 0"
library(Hmisc)
rcorr(as.matrix(auction))
```
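To see where a p-value for "H0: rho == 0" comes from, the sketch below computes the usual $t$-statistic $t = r \sqrt{(n-2)/(1-r^2)}$ for a single pair of variables. The helper name `cor_t_test()` is hypothetical, and it assumes Pearson correlation with no missing values.

```{R}
# Hypothetical helper (not part of the original worksheet):
# t-test of H0: rho = 0 for one pair of numeric vectors without missing values.
cor_t_test <- function(x, y) {
  n <- length(x)
  r <- cor(x, y)                            # Pearson correlation
  t_stat <- r * sqrt((n - 2) / (1 - r^2))   # t statistic on n - 2 df
  p_value <- 2 * pt(-abs(t_stat), df = n - 2)  # two-sided p-value
  c(r = r, t = t_stat, df = n - 2, p.value = p_value)
}
```

For example, `cor_t_test(auction$Price, auction$Age)` should agree with `cor.test(auction$Price, auction$Age)`.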
### Solution
## __(1 p)__ Plot interpretation
_Below are two plots.
The first has $y =$ Price, $x =$ Age, and colour = Bidders,
and the second has $y =$ Price, $x =$ Bidders, and colour = Age.
Interpret the relationships between all three variables, simultaneously.
For example, say how Price relates to Age,
then also how Price relates to Bidders conditional on Age being a specific value._
```{R, fig.height = 4, fig.width = 10, echo=FALSE}
auction$id <- seq_len(nrow(auction))
# ggplot: Plot the data with linear regression fit and confidence bands
library(ggplot2)
p1 <- ggplot(auction, aes(x = Age, y = Price, label = id))
p1 <- p1 + geom_point(aes(colour = Bidders), size=3)
# plot labels next to points
p1 <- p1 + geom_text(hjust = 0.5, vjust = -0.5, alpha = 1/4, colour = 2)
# plot regression line and confidence band
p1 <- p1 + geom_smooth(method = lm)
p1 <- p1 + labs(title="Selling Price by Age, colored by Bidders")
#print(p1)
# ggplot: Plot the data with linear regression fit and confidence bands
library(ggplot2)
p2 <- ggplot(auction, aes(x = Bidders, y = Price, label = id))
p2 <- p2 + geom_point(aes(colour = Age), size=3)
# plot labels next to points
p2 <- p2 + geom_text(hjust = 0.5, vjust = -0.5, alpha = 1/4, colour = 2)
# plot regression line and confidence band
p2 <- p2 + geom_smooth(method = lm)
p2 <- p2 + labs(title="Selling Price by Bidders, colored by Age")
#print(p2)
library(gridExtra)
grid.arrange(grobs = list(p1, p2), nrow=1)
```
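One rough way to look at a relationship "conditional on" a third variable is to fit the simple regression separately within groups of that variable. The sketch below does this with a hypothetical helper, `slopes_by_group()`; cutting at quantiles into three groups is an arbitrary choice, and with 32 clocks each group is small, so treat the slopes as descriptive only.

```{R}
# Hypothetical helper (not part of the original worksheet):
# slope of y on x within groups formed by cutting `by` at its quantiles.
slopes_by_group <- function(df, y, x, by, n_groups = 3) {
  probs <- seq(0, 1, length.out = n_groups + 1)
  breaks <- unique(quantile(df[[by]], probs = probs))  # drop tied break points
  grp <- cut(df[[by]], breaks = breaks, include.lowest = TRUE)
  sapply(split(df, grp), function(d) {
    coef(lm(reformulate(x, response = y), data = d))[[x]]
  })
}
```

For example, `slopes_by_group(auction, "Price", "Bidders", "Age")` gives the slope of Price on Bidders within low, middle, and high Age groups.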
### Solution
## __(2 p)__ Multiple regression assumptions (assessing model fit)
_The multiple regression model is fit below.
Start by assessing the model assumptions by interpreting what you learn from each of the six plots._
_If assumptions are not met, attempt to address this by transforming a variable and
restart at the beginning using the new transformed variable._
```{R}
# fit the multiple linear regression model
lm.p.a.b <- lm(Price ~ Age + Bidders, data = auction)
```
Plot diagnostics.
```{R, fig.height = 6, fig.width = 10, echo=FALSE}
# plot diagnostics
par(mfrow=c(2,3))
plot(lm.p.a.b, which = c(1,4,6))
plot(auction$Age, lm.p.a.b$residuals, main="Residuals vs Age")
# horizontal line at zero
abline(h = 0, col = "gray75")
plot(auction$Bidders, lm.p.a.b$residuals, main="Residuals vs Bidders")
# horizontal line at zero
abline(h = 0, col = "gray75")
# Normality of Residuals
library(car)
# car >= 3.0 takes point labelling options through `id = list(...)`
qqPlot(lm.p.a.b$residuals, las = 1, id = list(n = 3), main="QQ Plot")
## residuals vs order of data
#plot(lm.p.a.b$residuals, main="Residuals vs Order of data")
# # horizontal line at zero
# abline(h = 0, col = "gray75")
```
### Solution
From the diagnostic plots above,
(1)
(2)
(3)
(4)
(5)
(6)
## __(1 p)__ Added variable plots
_Use partial regression residual plots (added variable plots)
to check for the need for transformations.
If linearity is not supported, address this and restart at the beginning._
```{R, fig.height = 4, fig.width = 8, echo=FALSE}
library(car)
# car >= 3.0 takes point labelling options through `id = list(...)`
avPlots(lm.p.a.b, id = list(n = 3))
```
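An added variable plot for one predictor can also be built by hand: regress the response on the other predictor, regress the predictor of interest on the other predictor, and plot the two sets of residuals against each other. The sketch below assumes a model with exactly two predictors, as here; the helper name `av_coordinates()` is hypothetical.

```{R}
# Hypothetical helper (not part of the original worksheet):
# coordinates of the added variable plot for `focal`, after adjusting
# both the response and `focal` for the `other` predictor.
av_coordinates <- function(df, response, focal, other) {
  e_y <- residuals(lm(reformulate(other, response = response), data = df))
  e_x <- residuals(lm(reformulate(other, response = focal), data = df))
  data.frame(e_x = e_x, e_y = e_y)
}

# Example use:
# av <- av_coordinates(auction, "Price", "Age", "Bidders")
# plot(av$e_x, av$e_y, main = "Added variable plot for Age (by hand)")
# coef(lm(e_y ~ e_x, data = av))["e_x"]  # matches the Age coefficient in lm.p.a.b
```

The slope of `e_y` on `e_x` equals the coefficient of the focal predictor in the full multiple regression, which is why curvature in this plot points to a problem with how that predictor enters the model.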
### Solution
## __(1 p)__ Multiple regression hypothesis tests
_State the hypothesis test and conclusion for each regression coefficient._
```{R}
# fit the multiple linear regression model
lm.p.a.b <- lm(Price ~ Age + Bidders, data = auction)
# use summary() to get t-tests of parameters (slope, intercept)
summary(lm.p.a.b)
```
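Each row of the coefficient table tests $H_0: \beta_j = 0$ against $H_A: \beta_j \ne 0$, given the other terms in the model. As a minimal sketch of how those numbers arise, the hypothetical helper `coef_t_table()` below rebuilds the table from the estimates and their standard errors.

```{R}
# Hypothetical helper (not part of the original worksheet):
# rebuild the coefficient table of a fitted lm object by hand.
coef_t_table <- function(fit) {
  est <- coef(fit)
  se <- sqrt(diag(vcov(fit)))     # standard errors of the estimates
  t_stat <- est / se              # t statistic for H0: beta_j = 0
  df_resid <- df.residual(fit)    # n minus the number of coefficients
  p_value <- 2 * pt(-abs(t_stat), df = df_resid)  # two-sided p-value
  cbind(Estimate = est, SE = se, t = t_stat, p.value = p_value)
}
```

For example, `coef_t_table(lm.p.a.b)` should match the `Coefficients` block printed by `summary(lm.p.a.b)`.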
### Solution
## __(1 p)__ Multiple regression interpret coefficients
_Interpret the coefficients of the multiple regression model._
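A slope in a multiple regression is the change in the predicted response for a one-unit increase in that predictor while the other predictor is held fixed. The sketch below checks this numerically with a hypothetical helper, `unit_change()`; it assumes an additive model with no interaction or transformed terms.

```{R}
# Hypothetical helper (not part of the original worksheet):
# change in the fitted value when `var` increases by one unit and
# every other predictor in `newdata` stays fixed.
unit_change <- function(fit, newdata, var) {
  shifted <- newdata
  shifted[[var]] <- shifted[[var]] + 1
  unname(predict(fit, shifted) - predict(fit, newdata))
}

# Example use; the Age and Bidders values are arbitrary illustrative inputs:
# unit_change(lm.p.a.b, data.frame(Age = 150, Bidders = 10), "Age")
```

Whatever starting values are chosen, the result equals the corresponding coefficient of the fitted model.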
### Solution
## __(1 p)__ Multiple regression $R^2$
_Interpret the Multiple R-squared value._
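As a reminder of what the value measures, $R^2 = 1 - SSE/SST$ is the proportion of the variation in the response explained by the regression. The sketch below computes it by hand for a model with an intercept; the helper name `r_squared_by_hand()` is hypothetical.

```{R}
# Hypothetical helper (not part of the original worksheet):
# R^2 = 1 - SSE / SST for a fitted lm object that includes an intercept.
r_squared_by_hand <- function(fit) {
  y <- model.response(model.frame(fit))
  sse <- sum(residuals(fit)^2)   # variation left unexplained by the model
  sst <- sum((y - mean(y))^2)    # total variation about the mean
  1 - sse / sst
}
```

For example, `r_squared_by_hand(lm.p.a.b)` should agree with the Multiple R-squared reported by `summary(lm.p.a.b)`.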
### Solution
## __(1 p)__ Summary
_Summarize your findings in one sentence._
### Solution
```{R}
## Aside: I generally recommend against 3D plots for a variety of reasons,
## but here is a 3D version of the plot so you can visualize the fitted surface.
## I will point out a feature in this plot that we wouldn't see in other plots
## and that would typically only be detected by careful consideration
## of a "more complicated" second-order model that includes curvature.
# library(rgl)
# library(car)
# scatter3d(Price ~ Age + Bidders, data = auction)
```