---
title: "ADA1: Class 09, Linear Regression"
author: "Your Name Here"
date: "`r format(Sys.time(), '%B %d, %Y')`"
output:
  html_document:
    toc: true
---
# Rubric
Answer the questions in this document, compile to html, print to pdf, and submit to UNM Learn.
__Do not__ add this to your "ALL" `.Rmd` document.
## Part 1, simple linear regression intuition-building exercise
(20-25 min)
Use this online app to work through the challenges below.
* Open the applet: http://www.shodor.org/interactivate/activities/Regression/
* Top left: plot area where you'll click to add points
* Bottom left: "Reset" the plot area
* Top right: choose to fit your own line (then also select "Move Your Fit Line" in the bottom right) or to show the "best fit" line
* Bottom right: select an option to:
    * Add Points (then click in the plot area to add a point)
    * Remove Points (then click on a point to remove it)
    * Move Points (then click and drag a point to move it)
    * Move Your Fit Line (then click on the green anchor points to move the line)
The Applet works well, but you do need to have the correct option selected in the bottom right.
Try not to have too much displayed at the same time.
Reset the plot to clean up between each challenge.
Have fun!
1. __Choose 3 points where the line of best fit is $y = 0 + 1 x$.__
    1. Click "Add Points" and click 3 times in the plot area.
    2. Click "Display line of best fit" to show the red "best fit" line.
    3. Click "Move Points" then move the points in the plot area.
    4. Try to get within $\pm 0.05$ of the target intercept of 0 and slope of 1. Note that the app displays the equation's slope before the intercept, as in $y = 1 x + 0$.
    * When done, deselect "Display line of best fit" and click "Reset".
2. __Fit your own line to 7 points.__
    1. Click "Add Points" and click 7 times in the plot area.
    2. Click "Fit your own line" to show the green "your own" line.
    3. Click "Move Your Fit Line" then use the green circles on the line to move it.
    4. Recall that the least-squares ("best fit") line passes through the point of means $(\bar{x}, \bar{y})$ and minimizes the sum of squared errors (residuals) measured in the $y$ direction.
    5. Click "Display line of best fit" to see how close your line was to the red "best fit" line.
    6. Repeat a couple of times.
        * Another good app for comparing your own line to the best fit: https://www.geogebra.org/m/xC6zq7Zv
    * When done, deselect "Fit your own line" and "Display line of best fit" and click "Reset".
3. __Illustrate the concept of leverage.__
    * "Leverage" is a measure of how extreme (outlying) a point is in the $x$ direction.
      It's called leverage because points with high leverage can have a lot of
      influence on the slope of the regression line, pulling it up or down like a lever.
    1. Click "Display line of best fit" to show the red "best fit" line.
    2. Click "Add Points" and place 9 points in a "cluster" on one side of the plot.
    3. Place 1 "solo" point by itself on the other side of the plot.
    4. Click "Move Points" then move the "solo" point up and down and notice how the regression line responds.
    5. Move one of the "cluster" points up and down and notice how the regression line responds.
    * When done, deselect "Display line of best fit" and click "Reset".
4. __Relationship between correlation and slope.__
    1. Click "Add Points" and click 7 times in the plot area.
    2. Click "Display line of best fit" to show the red "best fit" line.
    3. Click "Move Points" then move the points in the plot area.
    4. Try to make both of these conditions true at the same time: (1) $r < 0$ and (2) a best fit line with a positive slope.
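The least-squares property recalled in challenge 2 can also be checked numerically. The following is a minimal R sketch on simulated data (the seven points are hypothetical, not taken from the applet): it fits `lm()`, confirms that the fitted line passes through $(\bar{x}, \bar{y})$, and shows that a nearby hand-picked line has a larger sum of squared errors (SSE).

```r
# Simulated data: 7 hypothetical points (not from the applet or the rocket data)
set.seed(1)
x <- runif(7, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(7, sd = 1)

# Least-squares fit
fit <- lm(y ~ x)
b <- coef(fit)  # b[1] = intercept, b[2] = slope

# 1) The fitted line passes through the point of means (x-bar, y-bar)
yhat_at_xbar <- unname(b[1] + b[2] * mean(x))
all.equal(yhat_at_xbar, mean(y))

# 2) A different line ("your own line") has a larger SSE in the y direction
sse <- function(intercept, slope) sum((y - (intercept + slope * x))^2)
sse_best  <- sse(b[1], b[2])
sse_other <- sse(b[1] + 0.3, b[2] - 0.1)  # a nearby, hand-picked line
c(best = sse_best, other = sse_other)
sse_best < sse_other
```

Because the least-squares solution is the unique minimizer of SSE, any other intercept/slope pair you try will give a larger value.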
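The leverage idea from challenge 3 can be sketched in R as well, assuming simulated data that mimic the applet setup (9 hypothetical "cluster" points on the left and 1 "solo" point far to the right). Base R's `hatvalues()` returns each point's leverage; in simple linear regression this equals $h_i = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$, so it depends only on $x$. The sketch then shifts the solo point and one cluster point up by the same amount and compares how much the slope moves.

```r
# Simulated data: 9 "cluster" points on the left, 1 "solo" point on the right
set.seed(2)
x <- c(runif(9, min = 1, max = 3), 9)   # point 10 is the solo point
y <- c(runif(9, min = 4, max = 6), 5)

fit <- lm(y ~ x)
slope_fit <- coef(fit)[["x"]]

# Leverage (hat values): largest for the point farthest from mean(x)
h <- hatvalues(fit)
round(h, 2)
which.max(h)  # the solo point

# Shift the solo point up by 4 and refit
shift <- 4
y_solo <- y
y_solo[10] <- y[10] + shift
slope_solo <- coef(lm(y_solo ~ x))[["x"]]

# Shift one cluster point up by the same amount and refit
y_cluster <- y
y_cluster[1] <- y[1] + shift
slope_cluster <- coef(lm(y_cluster ~ x))[["x"]]

# The slope moves much more when the high-leverage point moves
c(original = slope_fit, solo_moved = slope_solo, cluster_moved = slope_cluster)
```

Moving a single $y_k$ by $\delta$ changes the slope by $\delta (x_k - \bar{x}) / \sum_j (x_j - \bar{x})^2$, which is why the point farthest from $\bar{x}$ acts like the long end of a lever.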
## Part 2, interpreting analysis
### Five questions to answer
Refer to the __data and output__ below to answer these questions.
Answer the questions in this document, compile to html, print to pdf, and submit to UNM Learn.
1. (2 p) Write the fitted regression equation, $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, by replacing $a$ and $b$ in the equation below with their estimated values.
    $\hat{y} = a + b x$
2. (2 p) Interpret the slope.
3. (2 p) Interpret $R^2$.
4. (2 p) Complete this table of predictions.
    Replace the question marks with values. You can use R as a calculator.

    `agewks` | predicted `shearpsi`
    --- | ---
    5 | ?
    20 | ?
    40 | ?

5. (2 p) Predictions: How comfortable do you feel ("good" or "bad") about the model predictions for each of these values, and why?
    * `agewks` = 5:
    * `agewks` = 20:
    * `agewks` = 40:
### Data and output
A rocket motor is manufactured by bonding an igniter propellant and a sustainer
propellant together inside a metal housing. The shear strength of the bond
between the two types of propellant is an important quality characteristic. It
is suspected that shear strength is related to the age, in weeks, of the batch of
sustainer propellant. Twenty observations on these two characteristics are
read in below. The first column is shear strength in psi (`shearpsi`); the second is age of
propellant in weeks (`agewks`).
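```{r}
## Save the ADA1_CL_09_Data-RocketPropellant.dat file to your computer,
##   in the same folder as this .Rmd file.
# This file uses spaces as delimiters, so use read.table().
dat_rocket <- read.table("ADA1_CL_09_Data-RocketPropellant.dat", header = TRUE)
str(dat_rocket)
head(dat_rocket)

library(ggplot2)
# scatterplot of shear strength vs propellant age, with the least-squares line
p <- ggplot(dat_rocket, aes(x = agewks, y = shearpsi))
p <- p + theme_bw()
p <- p + geom_point()
# formula = y ~ x states the simple linear model explicitly;
#   fullrange = TRUE extends the line across the whole x-axis
p <- p + geom_smooth(method = lm, formula = y ~ x, se = FALSE, fullrange = TRUE)
# start the x-axis at 0 so the line is drawn back to agewks = 0
p <- p + xlim(0, NA)
print(p)

# fit the simple linear regression model
lm_s_a <- lm(shearpsi ~ agewks, data = dat_rocket)
# use summary() to get the parameter estimates (intercept, slope), R^2, and other summaries
summary(lm_s_a)
```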
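For the prediction table, a fitted model's predictions are just $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ evaluated at new $x$ values. The minimal R sketch below uses simulated data (hypothetical values, NOT the rocket propellant data, and hypothetical names `dat_sim`, `new_x`) to show that base R's `predict()` with a `newdata` data frame returns the same numbers as plugging $x$ into the equation by hand.

```r
# Simulated data (hypothetical, NOT the rocket propellant data)
set.seed(3)
dat_sim <- data.frame(x = 1:20)
dat_sim$y <- 10 - 0.4 * dat_sim$x + rnorm(20)

fit <- lm(y ~ x, data = dat_sim)
b <- coef(fit)

# New x values go in a data frame whose column name matches the predictor
new_x <- data.frame(x = c(5, 12, 18))

# predict() returns yhat = b0 + b1 * x for each new x
pred_auto <- predict(fit, newdata = new_x)

# The same values computed "by hand", using R as a calculator
pred_hand <- b[["(Intercept)"]] + b[["x"]] * new_x$x

cbind(new_x, pred_auto, pred_hand)
```

The same pattern applies to `lm_s_a`, with a `newdata` column named `agewks`.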