Include your answers in this document in the sections below the rubric.


There’s a lot we could talk about regarding statistical communication. For example, we could talk about lying with statistics (where the numbers might actually be made up); making a claim that is contradicted by statistical evidence (or not checking the evidence at all); ignoring the baseline (failing to answer the question “compared to what?”); making many arbitrary comparisons, or dredging the data, until a significant result turns up; making misleading comparisons; biases introduced by poor sampling (selection bias); or any number of other things. We all already know that 73% of statistics are just made up (including this one).

Below I’ve selected (pseudorandomly) a few ideas to think about and discuss at your tables with a peer mentor.

Key statistical principles

It is possible to lie (or to make mistakes) by ignoring some key statistical principles. For each principle below, briefly give an example that you’ve seen this semester, or make one up.

  1. (1 p) Correlation does not imply causation.

  2. (2 p) “Statistically significant” does not necessarily mean “important”.

  3. (2 p) “Not statistically significant” does not mean that the effect is zero (or that there is no effect).

  4. (1 p) Misleading extrapolation.

Searching for statistical significance

A colleague who works at a university statistical consulting service reported the following story. A company wanted to get a drug approved, but their study appeared to have no statistically significant results. The researchers at the company broke up the data into subgroups in about 15 or 20 ways, and then they found something significant.

  1. (2 p) In this case, the company reported the results and its stock went up 50%. Is this data manipulation? What should the statistician do?
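
To get a sense of scale for the subgroup search described above, here is a minimal sketch of how often at least one comparison would look significant by chance alone. It assumes, purely for illustration, that the subgroup tests are independent and that each uses a 0.05 threshold. Neither detail is given in the story, and real subgroup splits of a single data set overlap, so the true figure would differ. The function name `prob_any_false_positive` is hypothetical.

```python
# Chance of at least one "significant" result when there is no real effect.
# Assumptions (not stated in the story): the tests are independent and
# each one uses a significance threshold of alpha = 0.05.


def prob_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n_tests null tests falls below alpha."""
    # Each test stays non-significant with probability (1 - alpha);
    # all n_tests stay non-significant with probability (1 - alpha) ** n_tests.
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for n_tests in (1, 15, 20):
        print(n_tests, round(prob_any_false_positive(n_tests), 3))
```

Under these assumptions, 15 to 20 looks at data with no real effect would turn up at least one “significant” result more often than not (roughly 54% to 64% of the time).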

Models for guessing on multiple-choice exams

Consider a test with several true/false questions. If all the students answer a question correctly, then presumably they all know the answer. Now suppose that half the students get a certain question correct. What fraction of the students do you think knew the correct answer? 50%? One possibility is that none of the students knew the correct answer, and they were all guessing!

Now consider a question that is answered correctly by 80% of the students. Suppose a student chosen at random knows the correct answer with probability \(p\) and guesses with probability \(1-p\), and that a guess on a true/false question is right half the time. Then we can write, approximately, \(p+0.5(1-p) = 0.8\), which yields the estimate \(p = 0.6\). Thus, a reasonable estimate is that 60% of the students actually knew the correct answer and 40% were guessing. The conditional probability that a student knew the correct answer, given that the student answered the question correctly, is \(60\%/80\% = 0.75\).
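
The same reasoning can be written out for any question. The symbols \(c\) (the fraction of students who answered correctly) and \(g\) (the chance that a pure guess is right, \(g = 0.5\) for true/false) are introduced here only for illustration:

\[
p + g(1-p) = c
\quad\Longrightarrow\quad
p = \frac{c-g}{1-g},
\qquad
\Pr(\text{knows} \mid \text{correct}) = \frac{p}{c}.
\]

With \(c = 0.8\) and \(g = 0.5\), this gives \(p = 0.3/0.5 = 0.6\) and \(p/c = 0.6/0.8 = 0.75\), matching the numbers above.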

Where is the ethical dilemma here? Consider now the task of giving each student a total grade for the exam. The reasoning above suggests that students should get no credit for correctly answering the question that 50% of the students answered correctly (since the evidence is that they were all guessing), 0.75 credit for correctly answering the question that 80% answered correctly, and so forth. Thus, the number of points that a question is worth should depend on the probability that a student’s correct answer was not due to guessing.
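
The credit scheme described above can be sketched in a few lines. This is only an illustration of the guessing model, not a recommended grading rule. The function names are hypothetical, and clamping the estimate to the range 0 to 1 is an added assumption for questions that fewer than half the students answer correctly, where the model would otherwise give a negative value.

```python
# Guessing model for a true/false question:
#   frac_correct = p + guess_rate * (1 - p)
# where p is the fraction of students who actually knew the answer.


def fraction_who_knew(frac_correct: float, guess_rate: float = 0.5) -> float:
    """Solve the guessing model for p, given the observed fraction correct."""
    p = (frac_correct - guess_rate) / (1.0 - guess_rate)
    # Assumption: clamp to [0, 1], since the model can go negative when
    # fewer students are correct than guessing alone would predict.
    return min(1.0, max(0.0, p))


def credit_for_correct(frac_correct: float, guess_rate: float = 0.5) -> float:
    """Estimated P(student knew the answer | student answered correctly)."""
    if frac_correct == 0:
        return 0.0  # nobody answered correctly, so there is no credit to assign
    return fraction_who_knew(frac_correct, guess_rate) / frac_correct


if __name__ == "__main__":
    for frac_correct in (0.5, 0.8, 1.0):
        print(
            frac_correct,
            round(fraction_who_knew(frac_correct), 2),
            round(credit_for_correct(frac_correct), 2),
        )
```

For the two questions discussed above, the sketch reproduces credits of 0 (50% correct) and 0.75 (80% correct). A question that everyone answers correctly earns full credit.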

  1. (2 p) The ethical question concerns the fairness of deciding the grading system after the students have taken the exam. Is it fair for two students to get the same number of questions correct on the exam but different total scores, because the “circumstantial evidence” suggests that one of the students was more likely than the other to be guessing?