---
title: "S4R: Class 25 Simple linear regression"
author: "Your Name Here"
date: "`r format(Sys.time(), '%B %d, %Y')`"
output:
  html_document:
    toc: true
---
---
This assignment is separate from your project.
Include your answers in this document in the sections below the rubric.
__Plan:__
Use the height and hand span data collected in class to fit and interpret a simple linear regression model:

* plot the data,
* center the explanatory variable `HandSpan_cm`,
* fit a simple linear regression model, and
* interpret the parameter estimate table.

# Rubric
Answer the questions using the data that we collected earlier in the semester.
## Read data and plot by gender
```{r}
library(tidyverse)
```
```{R}
dat.hand <-
read_csv("https://statacumen.com/teach/S4R/worksheet/S4R_WS_23_Correlation_Height-HandSpan_Data.csv")
dat.hand <- na.omit(dat.hand)
dat.hand$Gender_M_F <- factor(dat.hand$Gender_M_F, levels = c("F", "M"))
str(dat.hand)
```
Plot data for `Height_in` vs `HandSpan_cm` for Females and Males.
```{R}
library(ggplot2)
p <- ggplot(dat.hand, aes(x = HandSpan_cm, y = Height_in))
# linear regression fit and confidence bands
p <- p + geom_smooth(method = lm, se = TRUE)
# jitter a little to uncover duplicate points
p <- p + geom_jitter(position = position_jitter(.1), alpha = 0.75)
# separate for Females and Males
p <- p + facet_wrap(~ Gender_M_F, nrow = 1)
print(p)
```
---
__Change the code to use Males for the remaining analysis.__
```{R}
# choose one by uncommenting the one you want to use and commenting the other:
dat.use <-
dat.hand %>%
filter(Gender_M_F == "F") # use Females
#filter(Gender_M_F == "M") # use Males
```
## Center the explanatory variable `HandSpan_cm`
Recentering the $x$-variable doesn't change the model,
but it does provide an interpretation for the intercept of the model.
For example, if you interpret the intercept for the regression lines above,
it's the "expected height for a person with a hand span of zero",
but that's not meaningful.
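
To see why centering leaves the model unchanged, the fitted line can be regrouped around a centering value. Here $c$ is a placeholder symbol (not in the original worksheet) for whatever value you choose to center on:

$$
\begin{aligned}
\hat{y} &= \hat{\beta}_0 + \hat{\beta}_1 x \\
        &= (\hat{\beta}_0 + \hat{\beta}_1 c) + \hat{\beta}_1 (x - c).
\end{aligned}
$$

The slope $\hat{\beta}_1$ is the same in both forms; only the intercept changes, from the expected $y$ at $x = 0$ to the expected $y$ at $x = c$.
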
__(2 p)__
Choose a sensible value to center your data on.
A good choice is a nice round number near the mean (or center) of your data.
The model intercept is then the expected value of $y$ at this value of the
original variable (that is, where the centered $x=0$).
I use the value 20, which means that our new `HandSpan_cm_centered` is 0 for a
Female with a "typical" hand span of 20 cm, -2 for 18 cm, and +2 for 22 cm.
```{R}
dat.use <-
dat.use %>%
mutate(
HandSpan_cm_centered = HandSpan_cm - 20
)
# let's look at the data to see that the centered variable makes sense
dat.use
```
## Fit a simple linear regression model
```{R}
# fit model
lm.fit <- lm(Height_in ~ HandSpan_cm_centered, data = dat.use)
```
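
As a quick check on what centering does to the fit, here is a minimal sketch on made-up numbers. The `toy.c.*` objects and their values are hypothetical and are not the class data. It fits the same line with and without centering at 20 and compares the results.

```{r}
# hypothetical toy data (NOT the class data): hand span (cm) and height (in)
toy.c.dat <- data.frame(
    HandSpan_cm = c(17, 18, 19, 20, 21, 22, 23)
  , Height_in   = c(62, 64, 63, 66, 67, 69, 68)
  )
toy.c.dat$HandSpan_cm_centered <- toy.c.dat$HandSpan_cm - 20

# same response, uncentered vs centered explanatory variable
toy.c.fit.raw      <- lm(Height_in ~ HandSpan_cm         , data = toy.c.dat)
toy.c.fit.centered <- lm(Height_in ~ HandSpan_cm_centered, data = toy.c.dat)

# the slopes agree; the intercepts differ by 20 * slope
coef(toy.c.fit.raw)
coef(toy.c.fit.centered)

# the fitted values are identical, so the model itself is unchanged
all.equal(fitted(toy.c.fit.raw), fitted(toy.c.fit.centered))
```

The centered intercept is the fitted height at a hand span of 20 cm, which is inside the range of the data, unlike a hand span of 0 cm.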
Here's the data you're using for the linear regression,
with the fitted regression line.
```{R}
library(ggplot2)
p <- ggplot(dat.use, aes(x = HandSpan_cm_centered, y = Height_in))
p <- p + geom_vline(xintercept = 0, alpha = 0.25)
# linear regression fit (se = FALSE omits the confidence band)
p <- p + geom_smooth(method = lm, se = FALSE)
# jitter a little to uncover duplicate points
p <- p + geom_jitter(position = position_jitter(.1), alpha = 0.75)
p <- p + labs(
title = "Height on centered Hand span for Females"
, x = "Hand span centered at 20 cm"
)
print(p)
```
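
If you also want numeric confidence and prediction intervals around the line, one possible approach is `predict()` with its `interval` argument. The sketch below uses made-up numbers; the `toy.i.*` objects are hypothetical and are not the class data.

```{r}
# hypothetical toy data (NOT the class data), already centered at 20 cm
toy.i.dat <- data.frame(
    HandSpan_cm_centered = c(-3, -2, -1, 0, 1, 2, 3)
  , Height_in            = c(62, 64, 63, 66, 67, 69, 68)
  )
toy.i.fit <- lm(Height_in ~ HandSpan_cm_centered, data = toy.i.dat)

# centered hand spans at which to evaluate the line
toy.i.new <- data.frame(HandSpan_cm_centered = c(-2, 0, 2))

# confidence interval: uncertainty in the mean height at each hand span
toy.i.ci <- predict(toy.i.fit, newdata = toy.i.new, interval = "confidence", level = 0.95)
# prediction interval: uncertainty for one new person's height at each hand span
toy.i.pi <- predict(toy.i.fit, newdata = toy.i.new, interval = "prediction", level = 0.95)

cbind(toy.i.new, toy.i.ci)
cbind(toy.i.new, toy.i.pi)
```

The prediction interval is always wider than the confidence interval at the same $x$, because it includes the person-to-person variation around the line in addition to the uncertainty in the line itself.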
## Interpret the parameter estimate table
Here's the parameter estimate table.
We're estimating the $\beta$ parameter coefficients in the regression model
$y_i = \beta_0 + \beta_1 x_i + e_i$,
which gives the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.
```{R}
summary(lm.fit)
```
__(2 p)__
Assuming the model fits well, complete the equation below by replacing each
$\hat{\beta}$ with its estimate from the table above
(3 numbers: each beta and the HandSpan centering value).
__The regression line is__
$\widehat{\textrm{Height_in}} = \hat{\beta}_0 + \hat{\beta}_1 \textrm{(HandSpan_cm - 20)}$.
__(2 p)__
State the hypothesis test related to the slope of the line,
indicate the p-value for the test,
and state the conclusion.
Words and notation:
* Words:
* Notation: $H_0:\beta_? = ?$ vs $H_A:\beta_? \ne ?$
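
To see where the numbers in the slope row of the table come from, here is a minimal sketch on made-up data. The `toy.t.*` objects are hypothetical and are not the class data. It pulls the coefficient table out of `summary()` and recomputes the slope's t statistic and two-sided p-value by hand.

```{r}
# hypothetical toy data (NOT the class data), already centered at 20 cm
toy.t.dat <- data.frame(
    HandSpan_cm_centered = c(-3, -2, -1, 0, 1, 2, 3)
  , Height_in            = c(62, 64, 63, 66, 67, 69, 68)
  )
toy.t.fit <- lm(Height_in ~ HandSpan_cm_centered, data = toy.t.dat)

# coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
toy.t.coef <- coef(summary(toy.t.fit))
toy.t.coef

# slope row only
toy.t.slope <- toy.t.coef["HandSpan_cm_centered", ]

# t statistic = estimate / standard error, on the residual degrees of freedom
toy.t.stat <- unname(toy.t.slope["Estimate"] / toy.t.slope["Std. Error"])
toy.t.pval <- 2 * pt(abs(toy.t.stat), df = df.residual(toy.t.fit), lower.tail = FALSE)
c(t = toy.t.stat, p = toy.t.pval)
```

The hand-computed values should match the `t value` and `Pr(>|t|)` columns of the slope row.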
__(2 p)__
Interpret the slope coefficient in the context of the model by changing this
generic sentence to use your variables, units, and estimate.
For each unit increase in $x$, we expect an increase of $\hat{\beta}_1$ in $y$.
__(2 p)__
State and interpret the $R^2$ value.
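
As a reminder of what $R^2$ measures, here is a minimal sketch on made-up data. The `toy.r.*` objects are hypothetical and are not the class data. It reads $R^2$ from `summary()`, recomputes it as the proportion of variation in $y$ explained by the line, and compares it with the squared correlation.

```{r}
# hypothetical toy data (NOT the class data), already centered at 20 cm
toy.r.dat <- data.frame(
    HandSpan_cm_centered = c(-3, -2, -1, 0, 1, 2, 3)
  , Height_in            = c(62, 64, 63, 66, 67, 69, 68)
  )
toy.r.fit <- lm(Height_in ~ HandSpan_cm_centered, data = toy.r.dat)

# R^2 as reported by summary()
toy.r.r2 <- summary(toy.r.fit)$r.squared

# by hand: 1 - (residual sum of squares) / (total sum of squares)
toy.r.sse     <- sum(resid(toy.r.fit)^2)
toy.r.sst     <- sum((toy.r.dat$Height_in - mean(toy.r.dat$Height_in))^2)
toy.r.r2.hand <- 1 - toy.r.sse / toy.r.sst

# in simple linear regression, R^2 equals the squared Pearson correlation
toy.r.r2.cor <- cor(toy.r.dat$HandSpan_cm_centered, toy.r.dat$Height_in)^2

c(summary = toy.r.r2, by.hand = toy.r.r2.hand, cor.squared = toy.r.r2.cor)
```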