The following data are from a study on the compressive strength of an alloy fastener used in the construction of aircraft. Ten pressure loads, increasing in units of 200 psi from 2500 psi to 4300 psi, were used with different numbers of fasteners being tested at each of the loads. The table below gives the number of fasteners failing out of the number tested at each load.
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
dat_fastener <-read_csv("ADA2_CL_17_fastener.csv" ) %>%# Augment the dataset with the `Load` we wish to predict in a later part.bind_rows(c(Load =3400, Tested =NA, Failed =NA) )
Rows: 10 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (3): Load, Tested, Failed
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(ggplot2)p <-ggplot(dat_fastener, aes(x = Load, y = p_hat))p <- p +geom_smooth(se =FALSE)p <- p +geom_point()p <- p +scale_y_continuous(limits =c(0,1))p <- p +labs(title ="Observed proportion of fasteners failing at each load")print(p)
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Comment on how the proportion of failures depends on load.
Solution
[answer]
(2 p) Compute the empirical logits and interpret plot against load
Compute the empirical logits at each load.
# emperical logits## replace "NA" with the equation for the empirical logits## SOLUTION HERE# emperical logitsdat_fastener <- dat_fastener %>%mutate(emp_logit =NA )
Present a graphical summary that provides information on the adequacy of a logistic regression model relating the probability of fastener failure as a function of load.
library(ggplot2)p <-ggplot(dat_fastener, aes(x = Load, y = emp_logit))p <- p +geom_point()p <- p +geom_smooth(method = lm, se =FALSE)p <- p +labs(title ="Empirical logits")print(p)
`geom_smooth()` using formula = 'y ~ x'
Interpret the plot regarding whether the empirical logits appear linear.
Solution
[answer]
(2 p) Fit a logistic regression model, interpret deviance lack-of-fit
Fit a logistic model relating the probability of fastener failure to load.
glm_fa <-glm(cbind(Failed, Tested - Failed) ~ Load , family = binomial , data = dat_fastener )# Test residual deviance for lack-of-fit (if > 0.10, little-to-no lack-of-fit)dev_p_val <-1-pchisq(glm_fa$deviance, glm_fa$df.residual)dev_p_val
[1] 0.999957
Look at the residual deviance lack-of-fit statistic. Is there evidence of any gross deficiencies with the model?
Solution
[answer]
(2 p) Interpret logistic regression coefficients
Does load appear to be a useful predictor of the probability of failure? Interpret the hypothesis test.
summary(glm_fa)
Call:
glm(formula = cbind(Failed, Tested - Failed) ~ Load, family = binomial,
data = dat_fastener)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.29475 -0.11129 0.04162 0.08847 0.35016
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.3397115 0.5456932 -9.785 <2e-16 ***
Load 0.0015484 0.0001575 9.829 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 112.83207 on 9 degrees of freedom
Residual deviance: 0.37192 on 8 degrees of freedom
(1 observation deleted due to missingness)
AIC: 49.088
Number of Fisher Scoring iterations: 3
Solution
[answer]
(1 p) Write model equation
Provide an equation relating the fitted probability of fastener failure to the load on the probability scale: \(\tilde{p} = \ldots\). I have provided the equation on the logit scale.
The MLE of the predicted probabilities satisfy the logit equation \[
\log \left( \frac{\tilde{p}}{1-\tilde{p}} \right)
=
-5.34
+ 0.00155 \textrm{ Load}
.
\]
Solution
[answer]
(0 p) Plot the fitted probabilities as a function of Load
I’ll give you this one for free.
# predict() uses all the Load values in dataset, including appended valuesfit_logit_pred <-predict( glm_fa , data.frame(Load = dat_fastener$Load) , type ="link" , se.fit =TRUE ) %>%as_tibble()# put the fitted values in the data.framedat_fastener <- dat_fastener %>%mutate(fit_logit = fit_logit_pred$fit , fit_logit_se = fit_logit_pred$se.fit# added "fit_p" to make predictions at appended Load values , fit_p =exp(fit_logit) / (1+exp(fit_logit))# CI for p fitted values , fit_p_lower =exp(fit_logit -1.96* fit_logit_se) / (1+exp(fit_logit -1.96* fit_logit_se)) , fit_p_upper =exp(fit_logit +1.96* fit_logit_se) / (1+exp(fit_logit +1.96* fit_logit_se)) )
library(ggplot2)p <-ggplot(dat_fastener, aes(x = Load, y = p_hat))# predicted curve and point-wise 95% CIp <- p +geom_ribbon(aes(x = Load, ymin = fit_p_lower, ymax = fit_p_upper), alpha =0.2)p <- p +geom_line(aes(x = Load, y = fit_p), colour ="red")# fitted valuesp <- p +geom_point(aes(y = fit_p), size =2, colour ="red")# observed valuesp <- p +geom_point(size =2)p <- p +scale_y_continuous(limits =c(0, 1))p <- p +labs(title ="Observed and predicted probability of fastener failure" , y ="Probability" )print(p)