ABDA

UNM Stat 579: Applied Bayesian Data Analysis

This course is cancelled for Fall 2016.

I’m sad to cancel this course, which I was extremely excited for.  Because I was named a 2016-17 UNM Teaching Fellow, I will instead be focused on a “flipped”-style redesign of Stat 145 during the 2016-17 academic year.  I intend to revisit this course after my 2017-18 sabbatical.

Goal

Learn to apply Bayesian data analysis methods (generalized linear multilevel modeling (GLMMs)) by using modern statistical software (R, Rstudio) and MCMC sampling (Stan) while producing beautiful (LaTeX) and reproducible (knitr) reports with informative plots (ggplot2) and tables (xtable), which you’ll be proud to present (poster/presentation).  If successful, this course will make Bayesian analysis your default statistical methodology.

ThisIsStatistics: Why You Need to Study Statistics

The Fall 2016 syllabus is below the table.


Fall 2016 schedule; Time: TR 1230-1345; Location: CENT 1032; Stat 579.002, CRN 56760


News:

News will appear here.

Did you receive a registration error? Send me an email with the following answers:
1. What registration error did you get (copy/paste is best)?
2. What is your UNM ID?
3. What is your Math/Stat background (that is, do you have the pre-reqs)?
If you are waitlisted and qualified and we have enough seats, I will override you into the course. Don’t worry.

Step 0: Before our first class please read through the following and install the required software on your computer. If you don’t have a computer, let me know and I will help you find a resource.

  1. Create a Google account (if you don’t already have one) to use with crowdgrader.
  2. Sign up for crowdgrader (which uses your Gmail account).
  3. Complete a quick 6-question survey so I can link your crowdgrader gmail account with your UNM user ID for homework assignments.
  4. Install or upgrade R (windows or mac) then Rstudio. Videos that may be helpful:
  5. Install R packages, also update all packages within RStudio.
  6. Install RStan (to interface with R)
  7. Install the Rethinking R package.
  8. Install Mendeley.
  9. Install LaTeX (for poster at end of semester).

Timetable

Each week has this structure:

  1. Pre-class (Tuesday): Reading, Video, Quiz (due before class — solutions become available Tue 12:30, after the quiz is due)
  2. In-class: Activities in class Tuesday submitted to UNM Learn (evaluated by TA within 1 week), Tues 5pm turn in what you have, Wed 5pm turn in completed assignment. Thursday we will start the homework in class to allow you to struggle but get questions answered before finishing on your own.
  3. Post-class (Thursday): Homework (crowdgrader, due following Thursday before class)
  4. Post-class (Following Thursday-Tuesday): Grading (crowdgrader, following 1 week + Tuesday before class)

We will use:
UNM Learn
Video lectures
In-class team assignments.

Course notes and code

will be posted here.

Wk-Date | Cl | Topic | Reading, Video, Quiz | In-class Worksheet, Data | Homework | HW Submit | Grading Due before class

 



Syllabus

Description: Learn to apply Bayesian data analysis methods (generalized linear multilevel modeling (GLMMs)) by using modern statistical software (R, Rstudio) and MCMC sampling (Stan) while producing beautiful (LaTeX) and reproducible (knitr) reports with informative plots (ggplot2) and tables (xtable), which you’ll be proud to present (poster/presentation).   If successful, this course will make Bayesian analysis your default statistical methodology.
Prerequisite: Stat 428/528 (ADA2, or equivalent) and Stat 561 (Probability, can take concurrently)
Semesters offered: Varies
Lecture: Time: TR 1230-1345; Location: CENT 1032; Stat 579.002, CRN 56760 Video
Office hours: TBA, and by appointment in SMLC 312
email: “Erik B. Erhardt” <erike@stat.unm.edu>, please include “ABDA” in subject line
Textbooks:
Required:  Statistical Rethinking, by Richard McElreath. Buy it here: CRC Press, Amazon.com (Don’t buy the Kindle version.) (book review)
Recommended: ARM by Gelman and Hill; BDA3 by Andrew Gelman, et al.; BIDA by Christensen et al
Laptops running R: You are required to bring a laptop to class each day to participate in the exercises.  If you don’t have a laptop, let me know and I will help you find one to use for the semester.

Student learning outcomes (not yet updated)

At the end of the course, you will be able to:
General outcomes:

  1. Organize knowledge in graphs, tables, and code to support concise, comprehensible, and scientifically defensible written interpretations to produce knowledge within a reproducible research environment.
  2. Distinguish a testable scientific hypothesis or data-supported interpretation from an opinion.
  3. Understand from a data story the goals of the study and apply the correct statistical procedure.
  4. Explain the scientific aspects of a problem to nonscientists in a fashion that enhances understanding and decision making.

Topical outcomes:

  1. Define parameters of interest and hypotheses in words and notation.
  2. Summarize data visually, numerically, and descriptively and interpret the observed characteristics. Calculate and interpret numerical summaries such as mean, variance, five-number summary, confidence intervals, and p-values, and create visual summaries such as bar plots, scatter plots, and histograms. (Never pie charts!)
  3. Distinguish between statistical significance and scientific relevance.
  4. Use statistical software, such as R, to read and manage data, create informative plots, report numerical summaries, and apply statistical models, by recommended programming practice including abstraction and documentation.
  5. Understand the differences and limitations of controlled experiments and observational studies. Design experiments to infer causal treatment effects. Analyze observational data to infer associations between measured variables.
  6. Identify and explain the statistical methods, assumptions, and limitations used in reported studies in scientific literature or popular media.
  7. Evaluate and criticize published studies, the work of peers, and your own work and assess what was done well, what could be done better, and examine whether their conclusions are supported using statistical principles.
  8. Make evidence-based decisions by constructing and deciding between testable hypotheses using appropriate data and methods.
  9. Discover relationships and make predictions through model development and selection.

Meeting the learning outcomes

You will acquire new information in this class, but the emphasis is comprehending, integrating, and applying information. Rote factual memorization is the lowest form of learning. Effective learning takes place by explaining, integrating, applying, and analyzing facts, hypotheses, and theories.

Learning in this class occurs by:

  1. Doing – completion of exercises that require analysis of data to answer questions and test hypotheses, or researching answers to reading assignments.
  2. Discussion – interaction with classmates to assemble and synthesize information, utilizing the collective skills and knowledge base of the group.
  3. Listening, acting, and reflecting – activities during class time provide insights into information not available in readings and include review of difficult material to aid comprehension. Note taking permits later reflection on lecture content. Listening to the professor lecture is the least effective learning tool for students, however, and you should plan on coming to every class prepared to participate in active and reflective learning opportunities.

Assessment

This is roughly correct.  I will adjust this by the start of the semester.

  • Quizzes will be due each Tuesday before class.  Purpose: to assess reading and video comprehension and ensure you’re prepared to actively participate in class activities with minimal lecture. (About 12, 20% of final grade, the lowest few are dropped.)  Most weeks plan for 1-2 hours of reading and video and a 20-minute quiz.
  • In-class assignments are due each day by the end of day (midnight), submitted to UNM Learn.  Purpose: to struggle and find success in class with the concepts and skills. (About 24, includes class participation, 20% of final grade, the lowest several are dropped.) Most weeks plan to finish in class.
  • Homework (HW) assignments are assigned each Thursday and due the following Thursday, submitted to crowdgrader.org (75% of HW grade). Purpose: to apply concepts and skills to your class poster project. (About 12, 40% of final grade, the lowest few are dropped.) Most weeks plan on 1-4 hours per assignment.
  • Peer grading is due by the Tuesday after each homework is due (25% of HW grade). Purpose: to gain skill assessing the work of others, as well as see alternative strategies to answer questions.  Most weeks this will take about 30 minutes to grade 5 other students’ HW.
  • Poster will be developed through the semester (most HW assignments contribute to the poster); in the last couple of weeks we’ll complete them, and in the last week we’ll have poster presentations. Purpose: to have an overarching set of questions to answer using methods learned in the course, with a deliverable you can be proud of! (1 poster and presentation: 12% poster, 2% presentation, and 2% evaluations of others, for 16% of final grade.)  In the last couple of weeks, assembling this poster may take 3-5 hours, using a template provided to you.
  • Course surveys are due at the beginning and end of the course. Purpose: to participate in national project-based learning projects and improve course.  (About 2, 4% of final grade.)
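
As a rough illustration of how the weights above combine, here is a minimal sketch in Python (not the official grade calculation). The function names and the example scores are hypothetical, dropped low scores are assumed to be removed before each component percent is computed, and the weights themselves may still be adjusted before the semester starts.

```python
# Component weights in percentage points of the final grade, taken from the list above.
# These are the "roughly correct" values and may change.
WEIGHTS = {
    "quizzes": 20,
    "in_class": 20,
    "homework": 40,            # combines submission and peer grading, see homework_score
    "poster": 12,
    "presentation": 2,
    "poster_evaluations": 2,
    "surveys": 4,
}


def homework_score(submission, peer_grading):
    """Combine the two HW parts: 75% submission, 25% peer grading (both 0-100)."""
    return 0.75 * submission + 0.25 * peer_grading


def weighted_percent(scores):
    """Weighted course percent; scores maps each component name to a percent (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student, for illustration only.
example_scores = {
    "quizzes": 90,
    "in_class": 100,
    "homework": homework_score(submission=80, peer_grading=100),
    "poster": 90,
    "presentation": 100,
    "poster_evaluations": 100,
    "surveys": 100,
}
print(weighted_percent(example_scores))
```

The weights sum to 100, so a student with 100% on every component gets 100% overall.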

Final grade may include a small buffer at the discretion of the instructor. For example, final grade could be the total points earned divided by the total possible points times 0.95 for graduate students and 0.90 for undergraduate students. That is [Final Grade] = [Points Earned]/[Points possible * 0.95], so that your grade is slightly higher than you earned.
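
The buffer formula above can be sketched in a few lines of Python. This is only an illustration of the example formula, not a commitment to a specific buffer; the function name and the point totals are hypothetical.

```python
def final_grade(points_earned, points_possible, buffer=0.95):
    """Final grade as a fraction: points earned / (points possible * buffer).

    buffer=0.95 is the graduate example above; 0.90 is the undergraduate example.
    """
    if points_possible <= 0 or buffer <= 0:
        raise ValueError("points_possible and buffer must be positive")
    return points_earned / (points_possible * buffer)


# Hypothetical totals: 855 points earned out of 1000 possible.
graduate = final_grade(855, 1000, buffer=0.95)       # about 0.90 instead of 0.855
undergraduate = final_grade(855, 1000, buffer=0.90)  # about 0.95 instead of 0.855
print(round(graduate, 3), round(undergraduate, 3))
```

Because the denominator shrinks, the buffered grade is always at least the raw fraction of points earned.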

 

All homework assignments in this class are electronic, submitted to UNM Learn or crowdgrader.org for grading, except for the final poster.

Crowdgrader:

  1. Students usually get far more feedback on their work than they would get from over-worked teaching assistants/faculty.
  2. Students get to see what other students are doing, and they can learn from the work of others (taking the best ideas, and leaving the rest).
  3. In exchange for this, they need to put in some amount of work in reviewing the work of others.
  4. It is important that students understand that their final grade is determined both by the quality of their work, and by the precision of the grades they give, and the helpfulness of the reviews they write.

Late assignments will not be accepted.

Rubrics guide assessment (and self-assessment) of homework, code, projects, exams, and presentations.  Each assignment will have its own specific rubric.

All R code for the assignment should be included with the part of the problem it addresses (for code and output use a fixed-width font, such as Courier).
Do NOT use your R code and output as your answer to the problem, but include them to show me how you arrived at your answer. Your prose solution (in a non-fixed-width font) should be provided in addition to R output.

Collaboration and citation

For homeworks I encourage you to work together. Please discuss the data, code, and problems with one another, but do your own exploration and write up. We expect everyone to hand in substantially different homeworks, and we will enforce this under the honor code. The small benefit you might get from plagiarism is not worth the severe penalty (of lost trust, being reported to the dean, no points for the assignment, etc.).

As in life, please use any resources available to you. Projects and some homeworks will explicitly encourage you to use resources on the internet, but showing extra initiative will always be appreciated. You may find R programming tough at first, so feel free to discuss your problems with other classmates or meet with or email questions to the TAs or me.

I encourage you to use the ideas of others, but make them your own, giving credit. For projects have a formal bibliography, for homework cite casually, and for code simply copy the URL in as a comment (which is doubly helpful for finding the resource again).

Disability statement

If you have a documented disability that will impact your work in this class, please contact me to discuss your needs. You’ll also need to register with the Accessibility Resource Center in 2021 Mesa Vista Hall (building 56) across the courtyard east from the SUB.


Our Classroom

Why the round tables, video lectures, and in-class teamwork on assignments?  We’re doing this because:

  • I want you to be empowered with statistics.
  • I believe everyone should get out of this course with awesome skills.
  • Real-time feedback promotes efficient learning.

“It encourages me to engage actively with the course material and take responsibility for my learning.”

GAISE Connections

Our six recommendations include the following:

  1. Emphasize statistical literacy and develop statistical thinking
  2. Use real data
  3. Stress conceptual understanding, rather than mere knowledge of procedures
  4. Foster active learning in the classroom
  5. Use technology for developing conceptual understanding and analyzing data
  6. Use assessments to improve and evaluate student learning

Learning without thought is labor lost.
What I hear, I forget.
What I see, I remember.
What I do, I understand.
– Confucius


Archive

Help

LaTeX wiki, lshort, Detexify LaTeX symbols (linux texlive package management)
R tutorials: TryR (gentle), Kelly Black
R style matters. There is a lot of online help on R, such as at UCLA. Usually try searching for “R [mytopic]” and you’ll get lots of results.
Knitr in Rstudio (knitr is a modern version of Sweave: intro, demo, guide)
xtable to produce LaTeX tabular environment from R data.frames
Cookbook for R for helpful examples, visualization tutorials, diagrams
Image formats: vector (pdf, eps) vs raster (jpeg, bmp, tiff, gif)

Resources

UNM has license for free online access to the definitive books for the Lattice and ggplot2 graphing platforms. Note you must be on campus or logged in through the UNM proxy to access these.

R is currently available in these UNM Locations: DSH 141 and 143, Econ 1004, SMLC pods, and SUB IT-LoboLab Pod and IT-LoboLab Classroom.

R style matters. There is a lot of online help on R, such as at UCLA, try-r, and Google’s Intro to R video series. Usually try searching for “R [mytopic]” and you’ll get lots of results.  ggplot2 plotting cookbook.

R reference card by Jonathan Baron.

Translate between MATLAB and R.

Figure checklist.  Choosing the right chart.  Nature Methods points of view on visualization.

Statistical consulting and collaboration slides

Raster vs vector graphics.

Statistics pre-req refresher from Khan Academy.

Coursera has a free 4-week course on computing for data analysis with R.

Muddy points in perspective.

R+LaTeX+knitr for reproducible research.  See my SC1 lecture notes (Ch01).

Asking smart questions

“Smart Questions” guide (note “hackers build things, crackers break them”)
Email Question Rubric:
* Send one email per question.
— Use “Reply” to continue conversation on a question; send a new email for a new question.
*  Include “ABDA” as the first word of the subject line in new emails (if replying, just use reply).
*  Begin email with a short question summary.
*  When possible, include commented code in email body
— Comments should indicate where the problem is, what the expected behavior is, and what steps are necessary to reproduce problem.
— Code should include a “Minimum representative test case” (http://www.catb.org/esr/faqs/smart-questions.html#code)
*  If attaching code, please include all the files necessary to run your code (data, etc.).

Acumen in Statistics