ADA2 S16

Archived from Spring 2016 (Current year here.)

UNM Stat 428/528: Advanced Data Analysis II (ADA2)

The Spring 2016 syllabus is below the table. Spring 2016 schedule: Time: TR 1530-1645; Location: DSH 224; Stat 428, CRN 25445; Stat 528, CRN 25449 + Peer mentors via UNM Stat 495/595: Statistics Education Practicum (SEP) Stat 495.— or Stat 595.—, CRN —– or —–


This Is Statistics

Learn to produce beautiful (markdown) and reproducible (knitr) reports with informative plots (ggplot2) and tables (xtable) by writing code (R, Rstudio) to answer questions using fundamental statistical methods (analysis of covariance, logistic regression, and multivariate methods), which you’ll be proud to present (poster).



See you next semester.
Saving data: If you’re using classroom computers, use flash drives or UNM’s OneDrive (available in LoboMail) to save your files. I recommend using a very systematic folder structure, such as a main folder called Stat428_ADA2 with subfolders called homework, in-class, reading, poster, etc.

Course content

Weekly structure (also see Assessment below)

  1. Pre-class (Tuesday): Reading, Video, Quiz (due before class — solutions become available Tue 3:30, after the quiz is due)
  2. In-class: Tuesday’s in-class activities are submitted to UNM Learn (evaluated by the TA within 1 week): turn in what you have by Tue 5pm, and turn in the completed assignment by Wed 5pm. On Thursday we will start the homework in class so you can struggle but get your questions answered before finishing on your own.
  3. Post-class (Thursday): Homework (crowdgrader, due following Thursday before class, feedback available until Tuesday after grading).  Assignments will be common for all students.
  4. Post-class (following Thursday to Tuesday): Peer grading (crowdgrader; due before class on the Tuesday after the homework is due).
Course content is on UNM Learn, and video lectures are in a YouTube video playlist (try 1.5x speed, then pause/rewatch as needed). Video: Upgrading R on Windows.

Course notes, code, data, and video lectures

Notes from Spring 2016: ADA2_notes_S16.pdf includes all chapters in one document.
Lecture notes for Advanced Data Analysis 2 (ADA2) Stat 428/528, University of New Mexico, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at
Ch Chapter Title Notes R code Datasets Video lectures playlist
01 R statistical software and review pdf R turkey.csv, rocket.dat 01-1, 01-2
02 Introduction to Multiple Linear Regression pdf R indian.dat, gce.dat 02-1, 02-2
03 A Taste of Model Selection for Multiple Regression pdf R ratliver.csv 03-1, 03-2
04 One Factor Designs and Extensions pdf R none 04
05 Paired Experiments and Randomized Block Experiments pdf R battery.dat, beetles.dat, itch.csv, ratinsulin.dat 05-0 05-1 05-2 05-3 05-4 05-5 05-6 05-7 05-8 05-9
06 A Short Discussion of Observational Studies pdf R sat.dat 06
07 Analysis of Covariance: Comparing Regression Lines pdf R tools.dat, toolsfake.dat, twins.dat 07-1 07-2 07-3 HW helper video
08 Polynomial Regression pdf R cloudpoint.dat, mooney.dat 08-1 08-2
09 Discussion of Response Models with Factors and Predictors pdf R faculty.dat 09-1 09-2 09-3
10 Automated Model Selection for Multiple Regression pdf R oxygen.dat 10-1 10-2 10-3
11 Logistic Regression pdf R beetles.dat, leuk.dat, menarche.csv, shuttle.csv, trauma.dat 11-1 11-2 11-3 11-4
12 An Introduction to Multivariate Methods pdf R none 12
13 Principal Component Analysis pdf R bgs.dat, shells.dat, sparrows.dat, temperature.dat 13-1 13-2 13-3
14 Cluster Analysis pdf R birthdeath.dat, teeth.dat 14-1 14-2 14-3
15 Multivariate Analysis of Variance pdf R shells_mf.dat 15
16 Discriminant Analysis pdf R mower.dat 16-1 16-2
17 Classification pdf R business.dat 17-1 17-2 17-3
18 Data Cleaning pdf R conversions.txt, dalton.txt, dirty_iris.csv, edits.txt, people.txt, unnamed.txt
lm_diag_plots.R: a function that produces a large set of standard diagnostic plots

Passion Driven Statistics (PDS) data

Install PDS package. AddHealthW1 Sampling Design, Codebook, RData. AddHealthW4  Sampling Design, Codebook, RData. NESARC  Sampling Design, Codebook, RData. OutlookOnLife  Sampling Design, Codebook, RData. GapMinder  Sampling Design, Codebook, RData. (I reserve the right to continue to improve the materials throughout the semester.)


Wk-Date Cl Topic Reading, Video, Quiz In-class Worksheet, Data Homework HW Submit Grading Due before class
00-01/18 00 Install software See Step 0 video: 00
01-01/19 01 01 R, Review read: Ch 01 video: 01-1, 01-2 quiz: in-class 01 R Review Rmd html dat Videos: 1, 2, 3 We will grade this assignment in class on Thursday; plan to finish it in the first half of Thursday’s class. Note: numbers refer to week numbers.
01-01/21 02 3:30 Submit as HW to CG –> 4:00 Grade 4:20 Review feedback 4:30 Calculate grades, discuss Turn in and grade: 00 crowdgrader 1/21 Submit 1/21 Grade No HW 01
02-01/26 03 02 Introduction to Multiple Linear Regression read: Ch 02 video: 02-1, 02-2 quiz: 02 In-class: Rmd html dat Submit your html with solutions by 5pm. Quiz 02
02-01/28 04 02 Mult LR Rmd html dat Submit your html to crowdgrader. 02 crowdgrader 2/4 Submit 2/9 Grade solutions
03-02/02 05 03 A Taste of Model Selection for Multiple Linear Regression read: Ch 03, 04 video: 03-1, 03-2, 04 quiz: 03 In-class: Rmd html dat sol: Rmd html Submit your html with solutions by 5pm. Quiz 03
03-02/04 06 04 Experimental Design: One and Two Factor Designs 03 Taste Model Sel Rmd html dat 03 crowdgrader 2/11 Submit 2/16 Grade solutions Turn in HW 02
04-02/09 07 05 Paired Experiments and Randomized Block Designs read: Ch 05 (start – 5.2) video: 05-0 05-1 05-2 05-3 05-4 05-5 quiz: 04 In-class: Rmd html Quiz 04, Grade HW 02
04-02/11 08 04 Experiments 1 Rmd html 04 crowdgrader 2/18 Submit 2/23 Grade solutions Turn in HW 03
05-02/16 09 read: Ch 05 (5.3 – end) video: 05-6 05-7 05-8 05-9 quiz: 05 In-class: Rmd html dat Quiz 05, Grade HW 03
05-02/18 10 05 Experiments 2 Rmd html dat 05 crowdgrader 2/25 Submit 3/1 Grade solutions Turn in HW 04
06-02/23 11 06 Discussion of Observational Studies read: Ch 06-07 video: 06 07-1 07-2 07-3 quiz: 0607 In-class: html (turn in paper version) Quiz 06, Grade HW 04
06-02/25 12 07 Analysis of Covariance: Comparing Regression Lines 06 ANCOVA 1 Rmd html dat 06 crowdgrader 3/3 Submit 3/8 Grade solutions Turn in HW 05
07-03/01 13 08 Polynomial Regression read: Ch 08-09 video: 08-1 08-2 09-1 09-2 09-3 quiz: 07 In-class: Rmd html dat Quiz 07, Grade HW 05
07-03/03 14 09 Response Models with Factors and Predictors 07 ANCOVA 2 Rmd html dat Helper video 07 crowdgrader 3/10 Submit 5pm 3/22 Grade a solution Turn in HW 06
08-03/08 15 10 Model Selection for Multiple Regression read: Ch 10 video: 10-1 10-2 10-3 quiz: 08 Quiz 08, Grade HW 06
08-03/10 16 08 Model Selection Turn in HW 07
09-03/15 17 Spring Break
09-03/17 18 Spring Break
10-03/22 19 11 Logistic Regression read: Ch 11 video: 11-1 11-2 11-3 11-4 quiz: 10 In-class: Rmd html dat Choose/define poster project requiring a method from class: ANCOVA, Logistic multiple regression, PCA, etc. Poster Planning Rmd html Due 3/29 Quiz 10, Grade HW 07
10-03/24 20 10 Logistic Regression Rmd html dat 10 crowdgrader 3/31 Submit 4/5 Grade solutions
11-03/29 21 12 An Introduction to Multivariate Methods read: Ch 12-13 video: 12 13-1 13-2 13-3 quiz: 11 In-class: Rmd html dat Quiz 11
11-03/31 22 13 Principal Components Analysis (PCA) 11 PCA Rmd html dat 11 crowdgrader 4/7 Submit 4/12 Grade solutions Turn in HW 10
12-04/05 23 14 Cluster Analysis read: Ch 14-15 video: 14-1 14-2 14-3 15 quiz: 12 In-class: Clustering Rmd html dat Quiz 12, Grade HW 10
12-04/07 24 15 Multivariate Analysis of Variance (MANOVA) 12 MANOVA Rmd html dat 12 crowdgrader 4/14 Submit 4/19 Grade solutions Turn in HW 11
13-04/12 25 16 Discriminant Analysis 17 Classification read: Ch 16-17 video: 16-1 16-2 17-1 17-2 17-3 quiz: 13 In-class: Discriminant analysis for classification Rmd html dat Quiz 13, Grade HW 11
13-04/14 26 13+11+17 PCA and logistic regression classification 13+11+17 PCA and logistic Classification Rmd html dat 13 crowdgrader 4/21 Submit 4/26 Grade solutions Turn in HW 12
14-04/19 27 Posters begin Poster document 1/2: Analysis, Due Friday 4/22 Rmd html Grade HW 12
14-04/21 28 14 crowdgrader 4/22 Submit 4/26 Grade solutions Turn in HW 13, Turn in Poster Doc 1/2 Fri 4/22
15-04/26 29 Poster document 2/2: Intro/Discuss/Bib, Due Friday 4/29 Rmd html Grade HW 13
15-04/28 30 15 crowdgrader 4/29 Submit 5/3 Grade solutions Turn in Poster Doc 2/2 Fri 4/29
16-05/03 31 Survey; Poster finalize. Poster template: pdf, Rnw, sty, bib, logo. Example poster: pdf, Rnw. Transition from Markdown to LaTeX: Video for poster transition. Poster printing: ARI Graphix ($9 poster printing), 4716 McLeod Rd NE, open Mon-Fri 7:30-5:30. Do not use their website! Instead, email them, ask them to print “in color on bond paper”, and attach your poster pdf file. Price is $0.75/sq ft.
16-05/05 32 POSTERS Poster session in SMLC lobby 3:30-5:30pm Poster reviewing rubric Submit poster pdf to UNM Learn Fri 5/6 5pm
17-05/10 FINALS WEEK (no final) Surveys due: submit the receipt or confirmation page to UNM Learn. Learning Studio; EvalKit in Learn. Surveys Due 5/10 5pm


Description: A continuation of 427/527 that focuses on methods for analyzing multivariate data and categorical data. Topics include MANOVA, principal components, discriminant analysis, classification, factor analysis, and analysis of contingency tables, including log-linear models for multidimensional tables and logistic regression.
Prerequisite: Stat 427 (ADA1)
Semesters offered: Spring
Lecture: Stat 428/528.001 (CRN 25445 or 25449), TR 1530-1645, DSH 224 (Video)
Email: “Erik B. Erhardt” <>, please include “ADA2” in the subject line
Textbook: Peter Dalgaard, “Introductory Statistics with R”, Second Edition, 2008, ISBN: 978-0-387-79053-4. The book is not required, but it will provide a backup for what you learn in class.
Office hours: SMLC 312, TR 1300-1500
Laptops running R: I encourage you to bring a laptop to class each day so you can try the R programming exercises in class. If you don’t have one, no problem: there are some laptops in class, and teamwork is encouraged, so sit next to someone friendly who likes to share.

Teaching Assistants and Peer Mentors

Stat grad students TAs

Chauntal Andrews <>, office hours Mon 0900-1100 in SMLC 301

Peer Mentors

Carrie Booth <>, Education grad student, ADA course alumnus, and Delta Alpha Pi Honor Society (Disability Achievement Pride) member

John Pesko, Stat PhD candidate

Igor Litvinovich, Stat graduate student

Adam Barkalow, ADA course alumnus

Student learning outcomes

Similar to those in ADA1, but at a higher level.

Assessment

  • Quizzes will be due each Tuesday before class. Purpose: to assess reading and video comprehension and ensure you’re prepared to actively participate in class activities with minimal lecture. (About 12, 20% of final grade, the lowest few are dropped.) Most weeks, plan for 1-3 hours of reading and video and a 30-60 minute quiz.
  • In-class assignments are due each day by 5pm, submitted to UNM Learn. Purpose: to struggle and find success in class with the concepts and skills. (About 12, includes class participation, 20% of final grade, the lowest few are dropped.) Plan to start and finish in class.
  • Homework (HW) assignments are assigned each Thursday and due the following Thursday, submitted to crowdgrader (75% of HW grade). Purpose: to apply concepts and skills to your class poster project. (About 12, 40% of final grade, the lowest few are dropped.) Most weeks, plan on 2-8 hours per assignment.
  • Peer grading is due by the Tuesday after each homework is due (25% of HW grade). Purpose: to gain skill assessing the work of others, as well as to see alternative strategies for answering questions. Most weeks this will take about 45 minutes to grade 5 other students’ HW.
  • The poster will be developed and completed in the last weeks of the semester, and in the last week we’ll have poster presentations. Purpose: to have an overarching set of questions to answer using methods learned in the course, with a deliverable you can be proud of! (1 poster and presentation, 16% of final grade: 2% preparation, 10% poster, 2% presentation, and 2% evaluations of others.) In the last couple of weeks, assembling this poster may take 3-5 hours, using a template provided to you.
  • Course surveys collect information to help facilitate the class or to encourage participation in course evaluations. Purpose: to participate in national project-based learning projects and to improve the course. (About 2, 4% of final grade [and a simple way to go from B+ to A].)
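The category weights listed above add up to 100% (20 + 20 + 40 + 2 + 10 + 2 + 2 + 4), with homework itself split 75% submission and 25% peer grading. As a rough Python sketch only (not a course tool), assuming each category score is already a percentage and that the “lowest few are dropped” rule has been applied beforehand, the combination could be computed like this. The category names and the functions `homework_score` and `weighted_total` are hypothetical.

```python
# Hypothetical sketch: weighted course grade from category percentages (0-100).
# Assumptions: each score is already a percentage, and the "lowest few are
# dropped" rule has already been applied (it is not modeled here).

# Share of the final grade for each category, from the assessment list.
WEIGHTS = {
    "quizzes": 0.20,
    "in_class": 0.20,
    "homework": 0.40,  # combined HW score; see homework_score()
    "poster_preparation": 0.02,
    "poster": 0.10,
    "poster_presentation": 0.02,
    "poster_evaluations": 0.02,
    "surveys": 0.04,
}


def homework_score(submission: float, peer_grading: float) -> float:
    """Combine HW parts: 75% submitted work, 25% peer grading."""
    return 0.75 * submission + 0.25 * peer_grading


def weighted_total(scores: dict) -> float:
    """Return the weighted final percentage; every category must be present."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Made-up example scores, for illustration only.
    example = {name: 100.0 for name in WEIGHTS}
    example.update(
        quizzes=90.0,
        in_class=95.0,
        homework=homework_score(submission=88.0, peer_grading=100.0),
        poster=85.0,
    )
    print(f"Weighted total: {weighted_total(example):.2f}")  # Weighted total: 91.90
```

Keeping the weights in one dictionary makes it easy to check that they sum to 1 and to see how much each category moves the final grade.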
Final grade may include a small buffer at the discretion of the instructor. For example, the final grade could be the total points earned divided by the total possible points times 0.95 for graduate students, or times 0.90 for undergraduate students. That is, for graduate students, [Final Grade] = [Points Earned]/([Points Possible] * 0.95), so that your grade is slightly higher than the raw proportion of points you earned.

Student Attendance: If a student has more than 3 absences, I reserve the right to assign to that student a WF and drop mid-semester or assign an F at the end of the semester without warning. Students in this situation need to speak with Erik immediately.

All homework assignments in this class are electronic, submitted to crowdgrader for grading, except for the final poster.
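A worked example of the buffer formula, using made-up totals of 855 points earned out of 1000 possible (a raw proportion of 0.855), shows the effect of each multiplier $b$; whether and how a buffer is applied remains at the instructor’s discretion.

```latex
\[
\text{Final Grade} = \frac{\text{Points Earned}}{\text{Points Possible} \times b},
\qquad b = 0.95 \ \text{(graduate)}, \quad b = 0.90 \ \text{(undergraduate)}
\]
\[
\text{Graduate: } \frac{855}{1000 \times 0.95} = \frac{855}{950} = 0.90,
\qquad
\text{Undergraduate: } \frac{855}{1000 \times 0.90} = \frac{855}{900} = 0.95
\]
```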

Peer grading

  1. Students usually get far more feedback on their work than they would get from over-worked teaching assistants/faculty.
  2. Students get to see what other students are doing, and they can learn from the work of others (taking the best ideas, and leaving the rest).
  3. In exchange for this, they need to put in some amount of work in reviewing the work of others.
  4. It is important that students understand that their final grade is determined by the quality of their work, the precision of the grades they give, and the helpfulness of the reviews they write.
Late assignments will not be accepted. Rubrics guide assessment (and self-assessment) of homework, code, projects, exams, and presentations. Each assignment will have its own specific rubric. All R code for the assignment should be included with the part of the problem it addresses (for code and output use a fixed-width font, such as Courier). Do NOT use your R code and output as your answer to the problem, but include them to show me how you arrived at your answer. Your prose solution (in a non-fixed-width font) should be provided in addition to R output.

Collaboration and citation

For homeworks, I encourage you to work together. Please discuss the data, code, and problems with one another, but do your own exploration and write-up. We expect everyone to hand in substantially different homeworks, and we will enforce this under the honor code. The small benefit you might get from plagiarism is not worth the severe penalty (lost trust, being reported to the dean, no points for the assignment, etc.). As in life, please use any resources available to you. Projects and some homeworks will explicitly encourage you to use resources on the internet, and showing extra initiative will always be appreciated. You may find R programming tough at first, so feel free to discuss your problems with other classmates or meet with or email questions to the TAs or me. I encourage you to use the ideas of others, but make them your own and give credit. For projects, include a formal bibliography; for homework, cite casually; and for code, simply copy the URL in as a comment (which is doubly helpful for finding the resource again).


Disability statement

If you have a documented disability that will impact your work in this class, please contact me to discuss your needs. You’ll also need to register with the Accessibility Resource Center in 2021 Mesa Vista Hall (building 56), across the courtyard east of the SUB. Peer mentor Carrie Booth <>, Education grad student, course alumnus, and member of the Delta Alpha Pi Honor Society (Disability Achievement Pride), has a background in special education and is familiar with challenges surrounding learning and accessibility for students with disabilities in college. She has offered to be available for assistance if you choose to seek her out. I legally can’t connect her to you, but you can let her know if you have needs she’ll be particularly qualified to help with. I’m glad to have her available, since one of her goals through DAP is to reduce the stigma against students with disabilities in academia through visibility, achievement, and community service.

Title IX statement

In an effort to meet obligations under Title IX, UNM faculty, Teaching Assistants, and Graduate Assistants are considered “responsible employees” by the Department of Education (see pg 15). This designation requires that any report of gender discrimination, which includes sexual harassment, sexual misconduct, and sexual violence, made to a faculty member, TA, or GA must be reported to the Title IX Coordinator at the Office of Equal Opportunity. For more information, see the campus policy regarding sexual misconduct.

Our Classroom

We’re doing this because:
  • We want you to be empowered with statistics.
  • We believe everyone should get out of this course with awesome skills.
  • Real-time feedback promotes efficient learning.
“It encourages me to engage actively with the course material and take responsibility for my learning.”

GAISE Connections

The six GAISE recommendations are:
  1. Emphasize statistical literacy and develop statistical thinking
  2. Use real data
  3. Stress conceptual understanding, rather than mere knowledge of procedures
  4. Foster active learning in the classroom
  5. Use technology for developing conceptual understanding and analyzing data
  6. Use assessments to improve and evaluate student learning

Learning without thought is labor lost. What I hear, I forget. What I see, I remember. What I do, I understand. – Confucius


Did you receive a registration error? Send me an email with the following answers: 1. What registration error did you get (copy/paste is best)? 2. What is your UNM ID? 3. What is your Math/Stat background (that is, do you have the pre-reqs)? If you are waitlisted and qualified and we have enough seats, I will override you into the course. Don’t worry.

Step 0: Before our first class (Tue 1/19), please read through the following and install the required software on your computer. If you don’t have a computer, there are classroom computers, which have limited availability when the room is open.
  1. Create a Google account (if you don’t already have one) to use with crowdgrader.
  2. Sign up for crowdgrader (which uses your Gmail account).
  3. Complete a quick 5-question survey so I can link your crowdgrader Gmail account with your UNM user ID for homework assignments.
  4. Install or upgrade R (Windows or Mac), then RStudio. Videos that may be helpful:
  5. Install R packages, and also update all packages within RStudio.
  6. Install Mendeley.
  7. Install LaTeX (for poster at end of semester).

Random stuff

innovationAcademy video.
UNM has a license for free online access to the definitive books for the Lattice and ggplot2 graphing platforms. Note that you must be on campus or logged in through the UNM proxy to access these.
R is currently available in these UNM locations: DSH 141 and 143, Econ 1004, SMLC pods, and SUB IT-LoboLab Pod and IT-LoboLab Classroom.
R style matters. There is a lot of online help on R, such as at UCLA, try-r, and Google’s Intro to R video series. Usually try searching for “R [mytopic]” and you’ll get lots of results.
ggplot2 plotting cookbook. R reference card by Jonathan Baron. Translate between MATLAB and R. Figure checklist. Choosing the right chart. Nature Methods points of view on visualization. Statistical consulting and collaboration slides. Raster vs vector graphics. Statistics pre-req refresher from Khan Academy. Coursera has a free 4-week course on computing for data analysis with R. Muddy points in perspective.
R+LaTeX+knitr for reproducible research. See my SC1 lecture notes (Ch01), and Mohammad Arbabshirani’s notes (pdf, rnw).
Asking smart questions: “Smart Questions” guide (note “hackers build things, crackers break them”).
Email Question Rubric:
* Send one email per question. Use “Reply” to continue the conversation on a question; send a new email for a new question.
* Include “ADA2” as the first word of the subject line in new emails (if replying, just use reply).
* Begin the email with a short question summary.
* When possible, include commented code in the email body. Comments should indicate where the problem is, what the expected behavior is, and what steps are necessary to reproduce the problem. Code should include a “minimum representative test case”.
* If attaching code, please include all the files necessary to run your code (data, etc.).
Help: LaTeX wiki, lshort, Detexify LaTeX symbols (linux texlive package management).
R tutorials: TryR (gentle), Kelly Black.
Knitr in RStudio (knitr is the modern version of Sweave; intro, demo, guide). xtable to produce a LaTeX tabular environment from R data.frames. Cookbook for R for helpful examples, visualization tutorials, diagrams. Image formats: vector (pdf, eps) vs raster (jpeg, bmp, tiff, gif).

Why stats now?

140,000 analysts are needed. Statistics is important enough that the US has a Chief Data Scientist (1) (2).

Citing and using notes, including previous editions

Citing lecture notes, example: Erhardt EB, Bedrick EJ, and Schrader RM. (2016) Lecture notes for Advanced Data Analysis 2. Retrieved Mar 1, 2016, from, 136–144.

Previous editions (each licensed under the same Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License as the current notes):
Notes from Spring 2015 using R: ADA2_notes_S15.pdf includes all chapters in one document.
Notes from Spring 2014 using R: ADA2_notes_S14.pdf includes all chapters in one document.
Notes from Spring 2013 using R: ADA2_notes_S13.pdf includes all chapters in one document.
Notes from Spring 2012 using SAS: ADA2_notes_S12.pdf includes all chapters in one document.

Table of selected statistical methods

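The data and design determine which method you use (see the original table or the UCLA version). Here’s a table of methods with the applicable ADA semester and chapter: for example, “2-11” means ADA2, Chapter 11, and “not” marks methods not covered in ADA.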
Number of Dependent Variables Number of Independent Variables Type of Dependent Variable(s) Type of Independent Variable(s) Measure Test(s) ADA-Ch
1 0 (1 population) continuous normal not applicable (none) mean one-sample t-test 1-02
continuous non-normal median one-sample median 1-06
categorical proportions Chi Square goodness-of-fit, binomial test 1-07
1 (2 independent populations) normal 2 categories mean 2 independent sample t-test 1-03
non-normal medians Mann Whitney, Wilcoxon rank sum test 1-06
categorical proportions Chi-square test, Fisher’s exact test 1-07
0 (1 population measured twice) or 1 (2 matched populations) normal not applicable/ categorical means paired t-test 1-02
non-normal medians Wilcoxon signed ranks test 1-06
categorical proportions McNemar, Chi-square test 1-07
1 (3 or more populations) normal categorical means one-way ANOVA 1-05
non-normal medians Kruskal Wallis 1-06
categorical proportions Chi square test 1-07
2 or more (e.g., 2-way ANOVA) normal categorical means Factorial ANOVA 2-05
non-normal medians Friedman test not
categorical proportions log-linear, logistic regression 2-11
0 (1 population measured 3 or more times) normal not applicable means Repeated measures ANOVA not
1 normal continuous correlation, simple linear regression 1-08
non-normal non-parametric correlation 1-08
categorical categorical or continuous logistic regression 2-11
continuous discriminant analysis 2-16
2 or more normal continuous multiple linear regression 2-02
categorical logistic regression 2-11
normal mixed categorical and continuous Analysis of Covariance, General Linear Models (regression) 2-09
categorical logistic regression 2-11
2 2 or more normal categorical MANOVA 2-15
2 or more 2 or more normal continuous multivariate multiple linear regression not
2 sets of 2 or more 0 normal not applicable canonical correlation not
2 or more 0 normal not applicable factor analysis not
0 or more mixed categorical and continuous principal component analysis (w/multiple regression) 2-13
categorical cluster analysis 2-14
discriminant analysis 2-16
classification 2-17

Acumen in Statistics