---
title: "ADA1: Class 24, Statistical Communication"
author: "Your Name Here"
date: "`r format(Sys.time(), '%B %d, %Y')`"
output:
  html_document:
    toc: true
---
Include your answers in this document in the sections below the rubric.
# Rubric
---
There's a lot we could talk about regarding statistical communication.
For example, we could talk about:

* lying with statistics (where the numbers might be entirely made up),
* making a claim contradicted by the statistical evidence (or not checking the evidence at all),
* ignoring the baseline (failing to answer the question "compared to what?"),
* making many arbitrary comparisons, or data dredging until a significant result turns up,
* making misleading comparisons,
* biases introduced by poor sampling (selection bias),
* or any number of other things.
We all already know that 73% of statistics are just made up (including this one).
Below I've selected (pseudorandomly) a few ideas for us to think about and
discuss in your tables with a peer mentor.
# Key statistical principles
It is possible to lie (or to make mistakes) by ignoring some key statistical principles.
Below, quickly give an example of each that you've seen this semester, or make up an example.
1. __(1 p)__ Correlation does not imply causation.
2. __(2 p)__ "Statistically significant" does not necessarily mean "important".
3. __(2 p)__ Not "statistically significant" is not the same as zero (or no effect).
4. __(1 p)__ Misleading extrapolation.
# Searching for statistical significance
A colleague who works at a university statistical consulting service reported
the following story. A company wanted to get a drug approved, but their study
appeared to have no statistically significant results. The researchers at the
company broke up the data into subgroups in about 15 or 20 ways, and then they
found something significant.
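A quick back-of-the-envelope calculation shows why hunting across many
subgroups is dangerous. Assuming 15 independent tests, each at the 0.05
level, with every null hypothesis true (the independence assumption and
the count of 15 are ours, for illustration), the chance of at least one
spurious "significant" result is already better than a coin flip:

```{r}
# Family-wise error rate: probability of at least one false positive
# across m independent tests, each at level alpha, when all nulls are true
alpha <- 0.05
m     <- 15
fwer  <- 1 - (1 - alpha)^m
fwer  # about 0.54
```

With 20 subgroups the probability rises to about 0.64, so "finding
something significant" is close to guaranteed even when the drug does
nothing.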
5. __(2 p)__ Is this data manipulation? What should the statistician do? In
this case, the company reported the results and its stock went up 50%.
# Models for guessing on multiple-choice exams
Consider a test with several true/false questions. If all the students answer a
question correctly, then presumably they all know the answer. Now suppose that
half the students get a certain question correct. Then, how many students do
you think knew the correct answer? 50%? One possibility is that none of the
students knew the correct answer, and they were all guessing!
Now consider a question that is answered correctly by 80% of the students. If a
student chosen at random knows the correct answer with probability $p$, or
guesses with probability $1-p$, then we can write, approximately, $p+0.5(1-p) =
0.8$, which yields the estimate $p = 0.6$. Thus, a reasonable guess is that 60%
of the students actually know the correct answer and 40% were guessing. The
conditional probability of a student knowing the correct answer, given that he
or she answered the question correctly, is $60\%/80\% = 0.75$.
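The arithmetic above can be checked directly. This is a sketch of the
known-versus-guessing model from the text; the 0.5 guessing rate assumes
true/false questions:

```{r}
# Model: P(correct) = p + 0.5 * (1 - p), where p = P(student knows answer)
p_correct <- 0.80                     # observed proportion answering correctly
p_know    <- (p_correct - 0.5) / 0.5  # solve p + 0.5*(1 - p) = 0.8 for p
p_know                                # estimate: 0.6 of students knew it

# Conditional probability of knowing the answer, given a correct response
p_know / p_correct                    # 0.6 / 0.8 = 0.75
```

Note that a question answered correctly by exactly 50% of students gives
`p_know = 0`, matching the "everyone was guessing" scenario above.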
_Where is the ethical dilemma here?_ Consider now the task of giving each
student a total grade for the exam. The reasoning above suggests that they
should get no credit for correctly answering the question that 50% of the
students answered correctly (since the evidence is that they were all
guessing), and they should get 0.75 credit for answering the question that 80%
answered correctly, and so forth. Thus, the number of points a question is
worth would depend on the probability that a student's correct answer was not
due to guessing.
6. __(2 p)__ The ethical question is of the fairness of deciding the grading
system after the exam has been taken by the students. Is it fair for two
students to get the same number of questions correct on the exam but different
total scores, because the "circumstantial evidence" suggests that one of the
students was more likely than the other to be guessing?