---
title: "ADA2: Class 01, R, Review"
author: Your Name
date: last-modified
description: |
  [Advanced Data Analysis 2](https://StatAcumen.com/teach/ada2),
  Stat 428/528, Spring 2024, Prof. Erik Erhardt, UNM
format:
  html:
    theme: litera
    highlight-style: atom-one
    page-layout: full # article, full # https://quarto.org/docs/output-formats/page-layout.html
    toc: true
    toc-location: body # body, left, right
    number-sections: false
    self-contained: false # !!! this can cause a render error
    code-overflow: scroll # scroll, wrap
    code-block-bg: true
    code-block-border-left: "#30B0E0"
    code-copy: false # true, false, hover; a copy button in the top-right of the code block
    fig-width: 6
    fig-height: 4
    fig-align: center # default, left, right, or center
execute: # https://quarto.org/docs/computations/execution-options.html, https://quarto.org/docs/computations/r.html
  cache: false # false, true
  echo: true # true, false Include the source code in output
  warning: true # true, false Include warnings in the output.
  error: true # true, false Include errors in the output (note that this implies that errors executing code will not halt processing of the document).
---
Write R code to answer the quiz questions on Learn using the dataset below.
# Rubric for grading
Points are awarded for the following questions:
* 3. (2 p) plot and interpretation.
* 5. (2 p) plot and interpretation.
* 7. (2 p) plot and interpretation.
* 10. (4 p) code and output appear correct, no errors.
Because the __Quiz 1 questions__ also use these data, they appear in this document in preformatted text, like this:
```
Quiz 1. What was the lowest recorded punting distance among the 13 participants?
```
---
# American Football Punters
## [Description](http://www.statsci.org/data/general/punting.html)
Investigators studied physical characteristics and ability in 13 football
punters. Each volunteer punted a football ten times. The investigators recorded
the average distance for the ten punts, in feet. They also recorded the average
hang time (time the ball is in the air before the receiver catches it) for the
ten punts, in seconds. In addition, the investigators recorded five measures of
strength and flexibility for each punter: right leg strength (pounds), left leg
strength (pounds), right hamstring muscle flexibility (degrees), left hamstring
muscle flexibility (degrees), and overall leg strength (foot-pounds). From the
study "The relationship between selected physical performance variables and
football punting ability" by the Department of Health, Physical Education and
Recreation at the Virginia Polytechnic Institute and State University, 1983.
```
Variable       Description
-------------  ---------------------------------------
Distance       Distance travelled in feet
Hang           Time in air in seconds
R_Strength     Right leg strength in pounds
L_Strength     Left leg strength in pounds
R_Flexibility  Right leg flexibility in degrees
L_Flexibility  Left leg flexibility in degrees
O_Strength     Overall leg strength in foot-pounds
```
Data File: `ADA2_CL_01_punting.csv`
## Source
The Relationship Between Selected Physical Performance Variables and Football
Punting Ability. Department of Health, Physical Education and Recreation,
Virginia Polytechnic Institute and State University, 1983.
---
# Rubric
1. Read the data set into R.
```{R}
library(erikmisc)
library(tidyverse)
# First, download the data to your computer and
#   save it in the same folder as this document file.
# read the data
dat_punt <- read_csv("ADA2_CL_01_punting.csv", skip = 1)
str(dat_punt)
#dat_punt
```
2. Generate summaries with `summary()` and frequency tables with `table()` for each variable.
Answer questions 1--7.
```{R}
# I'll get you started with the code, the rest is up to you.
summary(dat_punt)
apply(dat_punt, 2, table)
```
Note that you can do better than reading the numbers off the output above to answer the specific __quiz questions__.
Instead (optional, not required), you can write code that returns exactly the values you want.
For example:
* 1. The minimum distance is `r min(dat_punt$Distance)` ft.
```
Quiz 1. What was the lowest recorded punting distance among the 13 participants?
Quiz 2. What was the highest recorded hang time among the 13 participants?
Quiz 3. Is the range of values for R_Strength the same or different than the range of values for L_Strength?
Quiz 4. What percentage of the sample has a L_Strength of 110 pounds?
Quiz 5. Is the range of values for R_Flexibility the same or different than the range of values for L_Flexibility?
Quiz 6. What percentage of the sample has a L_Flexibility of 106 degrees?
Quiz 7. What is the most common value for O_Strength (i.e., what is the modal value)?
```
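As a hedged illustration of returning specific values with code, here is a minimal sketch on a small made-up data frame. The object `toy_strength` and all of its values are hypothetical and are not the punting data. It relies on the fact that the mean of a logical vector is the proportion of `TRUE` values.

```r
# Toy data (made-up values, NOT the punting data)
toy_strength <- data.frame(
  strength = c(110, 120, 110, 130, 150, 110, 120, 140)
)

# Smallest value, largest value, and both together
min(toy_strength$strength)
max(toy_strength$strength)
range(toy_strength$strength)

# Percentage of rows equal to a given value:
#   the mean of a logical vector is the proportion of TRUE values
pct_110 <- 100 * mean(toy_strength$strength == 110)
pct_110

# Frequency table, then the value(s) with the highest count (the mode);
#   this returns more than one value if there is a tie
tab <- table(toy_strength$strength)
mode_val <- as.numeric(names(tab)[tab == max(tab)])
mode_val
```

The same pattern should carry over to `dat_punt` by swapping in the column and value of interest.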
3. (2 p) Plot $y=$`Distance` and $x=$`Hang` and interpret the plot in terms of
linearity and strength of correlation.
```{R}
# plot distance by hang
library(ggplot2)
# p <- ggplot(dat_punt, aes(x = , y = ))
# ...
# print(p)
```
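One way to build such a scatterplot is sketched below on made-up data. The data frame `toy_xy` and its columns `x` and `y` are hypothetical placeholders, and the added least-squares line is an optional aid for judging linearity rather than a required part of the plot.

```r
library(ggplot2)

# Toy data (made-up values, NOT the punting data)
toy_xy <- data.frame(
  x = c(1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0),
  y = c(2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2)
)

p <- ggplot(toy_xy, aes(x = x, y = y))
p <- p + geom_point()
# least-squares line with a confidence band, to help judge linearity
p <- p + geom_smooth(method = "lm", formula = y ~ x, se = TRUE)
p <- p + labs(title = "Toy example: y versus x", x = "x", y = "y")
print(p)
```

When interpreting, look at whether the points follow a straight line or a curve, and how tightly they cluster around it.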
4. Calculate the Pearson correlation between `Distance` and `Hang`
(read the help for performing the hypothesis test).
Answer questions 8--9.
```
Quiz 8. What is the correlation between Distance and Hang?
Quiz 9. The corresponding p-value for the correlation between Distance and Hang is ____.
```
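A minimal sketch of a Pearson correlation test with `cor.test()` from base R, using made-up data. The data frame `toy_cor` and its values are hypothetical. The returned object stores the estimate, p-value, and confidence interval as list elements, so they can be extracted by name.

```r
# Toy data (made-up values, NOT the punting data)
toy_cor <- data.frame(
  x = c(1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0),
  y = c(2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2)
)

# Test H0: rho = 0 using the Pearson correlation
ct <- cor.test(toy_cor$x, toy_cor$y, method = "pearson")
ct

ct$estimate   # sample correlation, r
ct$p.value    # two-sided p-value
ct$conf.int   # 95% confidence interval for rho
ct$parameter  # degrees of freedom, n - 2
```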
5. (2 p) Create a new categorical (factor) variable, `O_StrengthFac`, from the quantitative
variable overall leg strength (`O_Strength`) to indicate high leg strength:
code less than 200 as 0 (low leg strength) and at least 200 as 1 (high leg strength).
```{R}
# create categorical variable
```
Plot $y=$`Distance` and $x=$`O_StrengthFac` and interpret the comparison of
distance by strength group.
```{R}
# plot distance by strength group
library(ggplot2)
# p <- ggplot(dat_punt, aes(x = , y = ))
# ...
# print(p)
```
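The two parts of this step, dichotomizing a numeric variable at a cutoff and then comparing a response across the resulting groups, might be sketched as follows. The data frame `toy_grp`, its columns, and its values are all made up for illustration; only the cutoff rule (below 200 coded 0, at least 200 coded 1) mirrors the text above.

```r
library(ggplot2)

# Toy data (made-up values, NOT the punting data)
toy_grp <- data.frame(
  score = c(150, 180, 199, 200, 210, 250),
  y     = c( 10,  12,  11,  15,  16,  18)
)

# Below 200 -> 0 (low), at least 200 -> 1 (high), stored as a factor
toy_grp$score_fac <- factor(
  ifelse(toy_grp$score < 200, 0, 1),
  levels = c(0, 1)
)

# Cross-tabulate original and new variables to check the coding
table(toy_grp$score, toy_grp$score_fac)

# Boxplots by group with the individual points and group means overlaid
p <- ggplot(toy_grp, aes(x = score_fac, y = y))
p <- p + geom_boxplot(width = 0.5, alpha = 0.5)
p <- p + geom_jitter(width = 0.05, height = 0)
p <- p + stat_summary(fun = mean, geom = "point", shape = 18, size = 4,
                      colour = "red")
p <- p + labs(x = "score_fac (0 = low, 1 = high)", y = "y")
print(p)
```

Cross-tabulating the original variable against the new factor is a quick way to confirm that a value exactly at the cutoff landed in the intended group.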
6. Use a two-sample $t$-test (assume equal variance) to test whether
$H_0: \mu_{\textrm{low}} = \mu_{\textrm{high}}$, that the population means for distance are equal for the
two overall leg strength groups you created.
Answer questions 10--11.
```
Quiz 10. Is distance significantly associated with overall strength (categorical) at an alpha = 0.05 level?
Quiz 11. What is the mean distance in feet for the low and high strength groups, respectively?
```
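A minimal sketch of a pooled-variance two-sample $t$-test using `t.test()` with `var.equal = TRUE`, on made-up data. The data frame `toy_tt`, its group labels, and its values are hypothetical.

```r
# Toy data (made-up values, NOT the punting data)
toy_tt <- data.frame(
  y     = c(10, 12, 11, 13, 15, 16, 18, 17),
  group = factor(rep(c("low", "high"), each = 4), levels = c("low", "high"))
)

# Two-sample t-test of H0: mu_low = mu_high, assuming equal variances
tt <- t.test(y ~ group, data = toy_tt, var.equal = TRUE)
tt

tt$estimate   # group means, in the order of the factor levels
tt$p.value    # two-sided p-value
tt$parameter  # degrees of freedom, n_low + n_high - 2
```

Setting the factor levels explicitly controls the order in which the group means are reported.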
7. (2 p) Plot $y=$`Distance` and $x=$`R_Flexibility` and interpret the relationship.
```{R}
library(ggplot2)
# p <- ggplot(dat_punt, aes(x = , y = ))
# ...
# print(p)
```
8. Regress $y=$`Distance` on $x=$`R_Flexibility`.
Answer questions 12--13.
```
Quiz 12. What is the expected increase in distance for each degree increase in flexibility?
Quiz 13. Is distance significantly associated with flexibility at an alpha = 0.05 level?
```
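A simple linear regression with `lm()` might look like the sketch below, again on made-up data (`toy_lm` and its values are hypothetical). The slope estimate is the expected change in $y$ per one-unit increase in $x$, and its p-value appears in the coefficient table of `summary()`.

```r
# Toy data (made-up values, NOT the punting data)
toy_lm <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.2, 3.9, 6.1, 8.0, 9.8, 12.1)
)

# Fit y = b0 + b1 * x by least squares
fit <- lm(y ~ x, data = toy_lm)
summary(fit)

# Slope: expected change in y for a one-unit increase in x
slope <- unname(coef(fit)["x"])
slope

# p-value for the test of H0: slope = 0
p_slope <- coef(summary(fit))["x", "Pr(>|t|)"]
p_slope
```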
9. Create a new variable which is the mean of the right leg and left leg
flexibility variables, `O_Flexibility`. Generate a frequency distribution for
this new variable.
Answer questions 14--15.
```
Quiz 14. What is the median value for your new variable that is the mean of the right and left leg flexibility?
Quiz 15. What percentage of the sample has a mean flexibility no more than 86 degrees?
```
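Averaging two columns row by row and summarizing the result could be sketched like this. The data frame `toy_flex`, its columns `right` and `left`, and its values are made up; the cutoff of 86 simply echoes the quiz wording.

```r
# Toy data (made-up values, NOT the punting data)
toy_flex <- data.frame(
  right = c(80, 90, 100, 85),
  left  = c(84, 86,  96, 87)
)

# Row-wise mean of the two columns
#   (rowMeans(toy_flex[, c("right", "left")]) gives the same result)
toy_flex$avg <- (toy_flex$right + toy_flex$left) / 2

# Frequency distribution of the new variable
table(toy_flex$avg)

# Median of the new variable
med_avg <- median(toy_flex$avg)
med_avg

# Percentage of rows with a value no more than 86
pct_le_86 <- 100 * mean(toy_flex$avg <= 86)
pct_le_86
```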
10. (4 p) Upload your error-free program (HTML output saved as a PDF file) showing your work and your
plots for additional points.