ADA2: Class 01, R, Review

Advanced Data Analysis 2, Stat 428/528, Spring 2023, Prof. Erik Erhardt, UNM

Author

Your Name

Published

December 17, 2022

Write R code to answer the quiz questions on Learn using the dataset below.

Rubric for grading

For these questions below:

    1. (2 p) plot and interpretation.
    1. (2 p) plot and interpretation.
    1. (2 p) plot and interpretation.
    1. (4 p) code and output appear correct, no errors.

Because the Quiz 1 questions use the same data, they also appear in this document, typeset as preformatted text, like this:

  Quiz 1. What was the lowest recorded punting distance among the 13 participants?

American Football Punters

Description

Investigators studied physical characteristics and ability in 13 football punters. Each volunteer punted a football ten times. The investigators recorded the average distance for the ten punts, in feet. They also recorded the average hang time (time the ball is in the air before the receiver catches it) for the ten punts, in seconds. In addition, the investigators recorded five measures of strength and flexibility for each punter: right leg strength (pounds), left leg strength (pounds), right hamstring muscle flexibility (degrees), left hamstring muscle flexibility (degrees), and overall leg strength (foot-pounds). From the study “The relationship between selected physical performance variables and football punting ability” by the Department of Health, Physical Education and Recreation at the Virginia Polytechnic Institute and State University, 1983.

Variable        Description
-------------   --------------------------------
Distance        Distance travelled in feet
Hang            Time in air in seconds
R_Strength      Right leg strength in pounds
L_Strength      Left leg strength in pounds
R_Flexibility   Right leg flexibility in degrees
L_Flexibility   Left leg flexibility in degrees
O_Strength      Overall leg strength in foot-pounds

Data File: ADA2_CL_01_punting.csv

Source

The Relationship Between Selected Physical Performance Variables and Football Punting Ability. Department of Health, Physical Education and Recreation, Virginia Polytechnic Institute and State University, 1983.


Rubric

  1. Read the data set into R.
library(erikmisc)
── Attaching packages ─────────────────────────────────────── erikmisc 0.1.20 ──
✔ tibble 3.1.8     ✔ dplyr  1.1.0
── Conflicts ─────────────────────────────────────────── erikmisc_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
erikmisc, solving common complex data analysis workflows
  by Dr. Erik Barry Erhardt <erik@StatAcumen.com>
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ readr     2.1.4
✔ ggplot2   3.4.1     ✔ stringr   1.5.0
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# First, download the data to your computer,
#   save in the same folder as this Rmd file.

# read the data
dat_punt <- read_csv("ADA2_CL_01_punting.csv", skip = 1)
Rows: 13 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (7): Distance, Hang, R_Strength, L_Strength, R_Flexibility, L_Flexibilit...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(dat_punt)
spc_tbl_ [13 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Distance     : num [1:13] 162 144 148 164 192 ...
 $ Hang         : num [1:13] 4.75 4.07 4.04 4.18 4.35 4.16 4.43 3.2 3.02 3.64 ...
 $ R_Strength   : num [1:13] 170 140 180 160 170 150 170 110 120 130 ...
 $ L_Strength   : num [1:13] 170 130 170 160 150 150 180 110 110 120 ...
 $ R_Flexibility: num [1:13] 106 92 93 103 104 101 108 86 90 85 ...
 $ L_Flexibility: num [1:13] 106 93 78 93 93 87 106 92 86 80 ...
 $ O_Strength   : num [1:13] 241 195 153 197 267 ...
 - attr(*, "spec")=
  .. cols(
  ..   Distance = col_double(),
  ..   Hang = col_double(),
  ..   R_Strength = col_double(),
  ..   L_Strength = col_double(),
  ..   R_Flexibility = col_double(),
  ..   L_Flexibility = col_double(),
  ..   O_Strength = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
#dat_punt
  1. Generate summaries with summary() and frequency tables with table() for each variable. Answer questions 1–7.
# I'll get you started with the code, the rest is up to you.
summary(dat_punt)
    Distance          Hang         R_Strength      L_Strength   
 Min.   :104.9   Min.   :3.020   Min.   :110.0   Min.   :110.0  
 1st Qu.:140.2   1st Qu.:3.640   1st Qu.:130.0   1st Qu.:130.0  
 Median :150.2   Median :4.040   Median :150.0   Median :150.0  
 Mean   :148.2   Mean   :3.921   Mean   :147.7   Mean   :143.8  
 3rd Qu.:163.5   3rd Qu.:4.180   3rd Qu.:170.0   3rd Qu.:160.0  
 Max.   :192.0   Max.   :4.750   Max.   :180.0   Max.   :180.0  
 R_Flexibility    L_Flexibility      O_Strength   
 Min.   : 85.00   Min.   : 78.00   Min.   :130.2  
 1st Qu.: 90.00   1st Qu.: 86.00   1st Qu.:153.9  
 Median : 93.00   Median : 93.00   Median :197.1  
 Mean   : 95.69   Mean   : 91.23   Mean   :196.2  
 3rd Qu.:103.00   3rd Qu.: 94.00   3rd Qu.:240.6  
 Max.   :108.00   Max.   :106.00   Max.   :266.6  
apply(dat_punt, 2, table)
$Distance

104.93 105.67 117.59 140.25    144  147.5 150.17    162  162.5  163.5 165.17 
     1      1      1      1      1      1      1      1      1      1      1 
171.75    192 
     1      1 

$Hang

3.02  3.2  3.6 3.64 3.68 3.85 4.04 4.07 4.16 4.18 4.35 4.43 4.75 
   1    1    1    1    1    1    1    1    1    1    1    1    1 

$R_Strength

110 120 130 140 150 160 170 180 
  1   2   1   2   1   2   3   1 

$L_Strength

110 120 130 140 150 160 170 180 
  2   1   2   1   3   1   2   1 

$R_Flexibility

 85  86  89  90  92  93  95 101 103 104 106 108 
  1   1   1   1   2   1   1   1   1   1   1   1 

$L_Flexibility

 78  80  83  86  87  92  93  94  95 106 
  1   1   1   1   1   1   3   1   1   2 

$O_Strength

130.24 132.68 152.99 153.92 154.64 195.49 197.09 205.88 219.25 240.57 260.56 
     1      1      1      1      1      1      1      1      1      2      1 
266.56 
     1 

You can do better than reading the numbers off the output above to answer the quiz questions: optionally (not required), write code that returns the specific values you want. For example:

    1. The minimum distance is 104.93 ft.
  Quiz 1. What was the lowest recorded punting distance among the 13 participants?
  Quiz 2. What was the highest recorded hang time among the 13 participants?
  Quiz 3. Is the range of values for R_Strength the same as or different from the range of values for L_Strength?
  Quiz 4. What percentage of the sample has an L_Strength of 110 pounds?
  Quiz 5. Is the range of values for R_Flexibility the same as or different from the range of values for L_Flexibility?
  Quiz 6. What percentage of the sample has an L_Flexibility of 106 degrees?
  Quiz 7. What is the most common value for O_Strength (i.e., what is the modal value)?
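
The single-variable questions above can be answered with a few lines of code instead of reading values off the printed tables. The following is a minimal sketch, not the required solution. To stay self-contained it re-enters three columns from the frequency tables printed above; the vector names (`distance`, `l_strength`, `o_strength`) are illustrative, and because the tables are sorted, the rows no longer line up across variables. With the data loaded, the columns of `dat_punt` would be used instead.

```r
# Values re-entered from the frequency tables printed above.
# They are sorted, so rows no longer line up across variables;
# with the data loaded, use dat_punt$Distance etc. instead.
distance   <- c(104.93, 105.67, 117.59, 140.25, 144, 147.5, 150.17,
                162, 162.5, 163.5, 165.17, 171.75, 192)
l_strength <- rep(seq(110, 180, by = 10), times = c(2, 1, 2, 1, 3, 1, 2, 1))
o_strength <- c(130.24, 132.68, 152.99, 153.92, 154.64, 195.49, 197.09,
                205.88, 219.25, 240.57, 240.57, 260.56, 266.56)

# Smallest recorded value (Quiz 1 style)
min_distance <- min(distance)

# Range as c(min, max) (Quiz 3 and 5 style)
range_l_strength <- range(l_strength)

# Percentage of the sample with a given value (Quiz 4 and 6 style):
# the mean of a logical vector is the proportion of TRUEs
pct_l_110 <- 100 * mean(l_strength == 110)

# Modal value(s): the most frequent entries of the frequency table (Quiz 7 style)
tab_o <- table(o_strength)
mode_o_strength <- as.numeric(names(tab_o)[tab_o == max(tab_o)])

min_distance
range_l_strength
round(pct_l_110, 1)
mode_o_strength
```

The mode is taken from the frequency table because base R's `mode()` returns the storage mode of an object, not the statistical mode.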
  1. (2 p) Plot \(y=\)Distance and \(x=\)Hang and interpret the plot in terms of linearity and strength of correlation.
# plot distance by hang
library(ggplot2)
# p <- ggplot(dat_punt, aes(x = , y = ))
# ...
# print(p)
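
One way to fill in the plot skeleton is sketched here. The data frame `dat_demo` holds made-up values for illustration only (it is not the punting data); replacing it with `dat_punt` gives the plot for the assignment. The fitted line is an optional visual aid for judging linearity.

```r
library(ggplot2)

# Made-up values for illustration only (NOT the punting data);
# replace dat_demo with dat_punt once the CSV has been read.
dat_demo <- data.frame(
  Hang     = c(3.0, 3.5, 3.8, 4.0, 4.3, 4.7),
  Distance = c(105, 120, 145, 150, 165, 170)
)

p <- ggplot(dat_demo, aes(x = Hang, y = Distance)) +
  geom_point() +
  # least-squares line as a straight-line reference, without a confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Hang time (s)", y = "Distance (ft)",
       title = "Distance by hang time")
print(p)
```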
  1. Calculate the Pearson correlation between Distance and Hang (read the help for performing the hypothesis test). Answer questions 8–9.
  Quiz 8. What is the correlation between Distance and Hang?
  Quiz 9. The corresponding p-value for the correlation between Distance and Hang is ____.
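
The correlation and its test can both be read from a single `cor.test()` call. The sketch below uses the made-up data frame `dat_demo` (illustrative values, not the punting data); with the real data, pass the columns of `dat_punt` instead.

```r
# Made-up values for illustration only (NOT the punting data).
dat_demo <- data.frame(
  Hang     = c(3.0, 3.5, 3.8, 4.0, 4.3, 4.7),
  Distance = c(105, 120, 145, 150, 165, 170)
)

# Pearson correlation with a two-sided test of H0: rho = 0
ct <- cor.test(dat_demo$Distance, dat_demo$Hang, method = "pearson")

ct$estimate  # sample correlation r
ct$p.value   # p-value of the test
ct$conf.int  # 95% confidence interval for rho
```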
  1. (2 p) Create a new categorical (factor) variable, O_StrengthFac, from the quantitative variable overall leg strength (O_Strength) to indicate high leg strength: code less than 200 as 0 (low leg strength) and at least 200 as 1 (high leg strength).
# create categorical variable
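
A minimal sketch of the recoding follows. For a self-contained example, the `O_Strength` values are re-entered (sorted) from the frequency table printed earlier; `o_strength` and `o_strength_fac` are illustrative names. With the data loaded, the same expression would be assigned to a new column of `dat_punt`.

```r
# O_Strength values re-entered (sorted) from the frequency table above.
o_strength <- c(130.24, 132.68, 152.99, 153.92, 154.64, 195.49, 197.09,
                205.88, 219.25, 240.57, 240.57, 260.56, 266.56)

# 0 = low (less than 200), 1 = high (at least 200)
o_strength_fac <- factor(as.integer(o_strength >= 200), levels = c(0, 1))

# Check the recoding against the original values
table(o_strength_fac)
tapply(o_strength, o_strength_fac, range)

# With the data loaded, the equivalent dplyr step would be:
# dat_punt <- dat_punt %>%
#   mutate(O_StrengthFac = factor(as.integer(O_Strength >= 200), levels = c(0, 1)))
```

Using `>=` rather than `>` matters here only if a value equals exactly 200; the rule in the question places such a value in the high group.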

Plot \(y=\)Distance and \(x=\)O_StrengthFac and interpret the comparison of distance by strength group.

# plot distance by strength group
library(ggplot2)
# p <- ggplot(dat_punt, aes(x = , y = ))
# ...
# print(p)
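
For a quantitative response by a two-level factor, side-by-side boxplots with the raw points overlaid are one reasonable choice. This sketch again uses a made-up `dat_demo` (not the punting data) that already contains an `O_StrengthFac` column.

```r
library(ggplot2)

# Made-up values for illustration only (NOT the punting data).
dat_demo <- data.frame(
  Distance      = c(105, 120, 145, 150, 165, 170),
  O_StrengthFac = factor(c(0, 0, 0, 1, 1, 1), levels = c(0, 1))
)

p <- ggplot(dat_demo, aes(x = O_StrengthFac, y = Distance)) +
  geom_boxplot() +
  geom_point(alpha = 0.5) +  # show the individual punters
  labs(x = "Overall leg strength group (0 = low, 1 = high)",
       y = "Distance (ft)")
print(p)
```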
  1. Use a two-sample \(t\)-test (assume equal variance) to test \(H_0: \mu_{\textrm{low}} = \mu_{\textrm{high}}\), that is, that the population mean distances are equal for the two overall leg strength groups you created. Answer questions 10–11.
  Quiz 10. Is distance significantly associated with overall strength (categorical) at an alpha = 0.05 level?
  Quiz 11. What is the mean distance in feet for the low and high strength groups, respectively?
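
The equal-variance (pooled) two-sample test is requested with `var.equal = TRUE`; the default in `t.test()` is the Welch test, which does not assume equal variances. A sketch on made-up data (`dat_demo`, not the punting data):

```r
# Made-up values for illustration only (NOT the punting data).
dat_demo <- data.frame(
  Distance      = c(105, 120, 145, 150, 165, 170),
  O_StrengthFac = factor(c(0, 0, 0, 1, 1, 1), levels = c(0, 1))
)

# Pooled-variance two-sample t-test of H0: mu_low = mu_high
tt <- t.test(Distance ~ O_StrengthFac, data = dat_demo, var.equal = TRUE)

tt$estimate   # group means, low (0) first, then high (1)
tt$parameter  # degrees of freedom, n_low + n_high - 2
tt$p.value    # compare with alpha = 0.05
```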
  1. (2 p) Plot \(y=\)Distance and \(x=\)R_Flexibility and interpret the relationship.
library(ggplot2)
# p <- ggplot(dat_punt, aes(x = , y = ))
# ...
# print(p)
  1. Regress \(y=\)Distance on \(x=\)R_Flexibility. Answer questions 12–13.
  Quiz 12. What is the expected increase in distance for each degree increase in flexibility?
  Quiz 13. Is distance significantly associated with flexibility at an alpha = 0.05 level?
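
The slope of a simple linear regression is the expected change in the response per one-unit increase in the predictor, and its \(t\)-test assesses the association. A sketch with `lm()` on made-up data (`dat_demo`, not the punting data):

```r
# Made-up values for illustration only (NOT the punting data).
dat_demo <- data.frame(
  R_Flexibility = c(85, 90, 93, 95, 103, 108),
  Distance      = c(105, 120, 145, 150, 165, 170)
)

fit <- lm(Distance ~ R_Flexibility, data = dat_demo)

# Coefficient table: estimate, standard error, t value, p-value
coef_tab <- summary(fit)$coefficients
coef_tab

slope   <- coef_tab["R_Flexibility", "Estimate"]  # feet per degree
p_slope <- coef_tab["R_Flexibility", "Pr(>|t|)"]  # test of H0: slope = 0
```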
  1. Create a new variable which is the mean of the right leg and left leg flexibility variables, O_Flexibility. Generate a frequency distribution for this new variable. Answer questions 14–15.
  Quiz 14. What is the median value for your new variable that is the mean of the right and left leg flexibility?
  Quiz 15. What percentage of the sample has a mean flexibility no more than 86 degrees?
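
The new variable is a row-wise mean of two columns, so it depends on how the right and left values pair up within each punter. The sketch below therefore uses made-up paired values (`dat_demo`, not the punting data); the same steps apply to `dat_punt`.

```r
# Made-up values for illustration only (NOT the punting data).
dat_demo <- data.frame(
  R_Flexibility = c(85, 90, 93, 95, 103, 108),
  L_Flexibility = c(80, 86, 93, 87, 93, 106)
)

# Row-wise mean of right and left flexibility
dat_demo$O_Flexibility <- (dat_demo$R_Flexibility + dat_demo$L_Flexibility) / 2

# Frequency distribution, median, and percentage at or below 86 degrees
table(dat_demo$O_Flexibility)
med_flex  <- median(dat_demo$O_Flexibility)
pct_le_86 <- 100 * mean(dat_demo$O_Flexibility <= 86)

# With the data loaded, the equivalent dplyr step would be:
# dat_punt <- dat_punt %>%
#   mutate(O_Flexibility = (R_Flexibility + L_Flexibility) / 2)
```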
  1. (4 p) Upload your error-free program (html output as PDF file) showing your work and your plots for additional points.