UNM Stat 428/528: Advanced Data Analysis II (ADA2)
Spring 2021 Syllabus is below tables
 Spring 2021
 Time: None/Always (Remote Arranged)
 Location: Zoom
 Stat 428.001, CRN 33933; Stat 528.001, CRN 33935
Step 0
Before our first “class” (Mon 1/18/21) please read through the following actions and install the required software on your computer.
 Install R (Windows or Mac) or upgrade, then RStudio.
 Install R video (5 min).
 Install R packages, also update all packages within RStudio.
 Set up your computer
 RStudio disable notebook
 Operating system to be more friendly to programming.
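A minimal sketch of the package step, to run in the RStudio console. The package names are an assumption drawn from the tools named in the Goal below (tidyverse, which includes ggplot2, plus knitr and rmarkdown); the course's own package list takes precedence.

```r
# Assumed package list (not the official course list).
needed <- c("tidyverse", "knitr", "rmarkdown")

# Packages from the list that are not yet installed on this computer.
missing_pkgs <- setdiff(needed, rownames(installed.packages()))

# Install only what is missing; the interactive() guard keeps this from
# running when the code is executed non-interactively (for example, while knitting).
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}

# Console equivalent of updating all packages from the RStudio Packages tab:
# update.packages(ask = FALSE)
```

Checking `missing_pkgs` first avoids reinstalling packages you already have, which matters on slow connections.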
Goal
Learn to produce beautiful (markdown) and reproducible (knitr) reports with informative plots (ggplot2) and tables (kable) by writing code (R, tidyverse, RStudio) to answer questions using fundamental statistical methods (multiple regression, analysis of covariance, logistic regression, and multivariate methods), which you’ll be proud to present (poster).
Course content
Weekly structure
(also see “Assessment” below)
 Preparation (Tuesday): Reading, Video, Quiz due Tue 11:50 PM.
 Worksheet 1 (Tuesday): Assignment due by Fri 11:50 PM.
 Worksheet 2 (Thursday): Assignment due by Mon 11:50 PM.
 UNM Learn for taking quizzes (graded automatically) and submitting assignments (evaluated by TA within 1 week).
 Lectures: YouTube Video playlist (try 1.5 speed, then pause and review as needed).
 Assignments: YouTube Video playlist (walkthrough of each assignment)
Course notes, code, data, and video lectures
Notes from Spring 2020: ADA2_notes_S20.pdf includes all chapters in one document. Lecture notes for Advanced Data Analysis 2 (ADA2) Stat 428/528 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at https://statacumen.com/teach/ADA2/notes/ADA2_notes_S20.pdf.
Ch  Chapter Title  Notes  R code  Datasets  Video lectures playlist 

01  R statistical software and review  R  turkey.csv, rocket.dat  (Videos based on S16 notes) 011, 012  
02  Introduction to Multiple Linear Regression  R  indian.dat, gce.dat  021, 022  
03  A Taste of Model Selection for Multiple Regression  R  ratliver.csv  031, 032  
04  One Factor Designs and Extensions  R  none  04  
05  Paired Experiments and Randomized Block Experiments  R  battery.dat, beetles.dat, itch.csv, ratinsulin.dat  050 051 052 053 054 055 056 057 058 059  
06  A Short Discussion of Observational Studies  R  sat.csv  06  
07  Analysis of Covariance: Comparing Regression Lines  R  tools.dat, toolsfake.dat, twins.dat  071 072 073 HW helper video  
08  Polynomial Regression  R  cloudpoint.dat, mooney.dat  081 082  
09  Discussion of Response Models with Factors and Predictors  R  faculty.dat  091 092 093  
10  Automated Model Selection for Multiple Regression  R  oxygen.dat  101 102 103  
11  Logistic Regression  R  beetles.dat, leuk.dat, menarche.csv, shuttle.csv, trauma.dat  111 112 113 114  
12  An Introduction to Multivariate Methods  R  none  12  
13  Principal Component Analysis  R  bgs.dat, shells.dat, sparrows.dat, temperature.dat  131 132 133  
14  Cluster Analysis  R  birthdeath.dat, teeth.dat  141 142 143  
15  Multivariate Analysis of Variance  R  shells_mf.dat  15  
16  Discriminant Analysis  R  mower.dat  161 162  
17  Classification  R  business.dat  171 172 173  
18  Data Cleaning  R  conversions.txt, dalton.txt, dirty_iris.csv, edits.txt, people.txt, unnamed.txt 
(I reserve the right to continue to improve the materials throughout the semester.)
Timetable
Date  Class  Topic  Reading, Video, Quiz  class Worksheet, Data 

01/10  00  Install software 


01/19  01  01 R statistical software and review  
01/21  02 


01/26  03  02 Introduction to Multiple Linear Regression  
01/28  04  
02/02  05  03 A Taste of Model Selection for Multiple Linear Regression  
02/04  06  04 Experimental Design: One and Two Factor Designs  
02/09  07  05 Paired Experiments and Randomized Block Designs  
02/11  08  
02/16  09  
02/18  10  
02/23  11  06 Discussion of Observational Studies  
02/25  12  07 Analysis of Covariance: Comparing Regression Lines  
03/02  13  08 Polynomial Regression  
03/04  14  09 Response Models with Factors and Predictors 


03/09  15  10 Model Selection for Multiple Regression  
03/11  16  
03/16  Spring Break  
03/18  Spring Break  
03/23  17  11 Logistic Regression  
03/25  18  
03/30  19  12 An Introduction to Multivariate Methods 13 Principal Components Analysis (PCA)  
04/01  20  
04/06  21  PCA, continued  
04/08  22  
04/13  23  14 Cluster Analysis  
04/15  24  
04/20  25  15 Multivariate Analysis of Variance (MANOVA) 


04/22  26  
04/27  27  16 Discriminant Analysis 17 Classification  
04/29  28  
05/04  29  13+11+17 PCA and logistic regression classification 


05/06  30  
05/11  FINALS WEEK  (no final)  Surveys Due — submit receipt or confirmation page to UNM Learn * Learning Studio * EvalKit in Learn 
Syllabus
 Description: A continuation of 427/527 that focuses on methods for analyzing multivariate data and categorical data. Topics include MANOVA, principal components, discriminant analysis, classification, factor analysis, analysis of contingency tables including loglinear models for multidimensional tables and logistic regression.
 Prerequisite: Stat 427/527 (ADA1)
 Semesters offered: Spring
 Lecture: Stat 428/528.001 (CRN 33933 or 33935) (usually: TR 1530-1645, CTLB 300 Video)
 Email: Please include “ADA2” in the subject line of all emails.
Instructors
 Professor
 Erik Erhardt <erike@stat.unm.edu>, he/him
 Teaching Assistants
 Ola Anifowoshe <oanifowoshe@unm.edu>, he/him
 Mohammad Ahmadi <mahmadi@unm.edu>, he/him
 Peer Learning Facilitators (PLF)
 Pratap Khattri <pkhattri@unm.edu>, he/him
Office hours
See email “ADA2, Stat 428/528, Announcements” from 1/21/21 for Zoom links and instructions.
Time  Mon  Tue  Wed  Thu  Fri  Sat  Sun 
8 AM  
9 AM  
10 AM  OA  PK  PK  
11 AM  OA  PK  PK  PK  
12 PM  MA  PK  
1 PM  EE  EE  EE  
2 PM  EE  OA  EE  MA  EE  
3 PM  OA  OA  
4 PM  MA  OA  
5 PM  
6 PM  PK  
7 PM  PK  
8 PM  
9 PM 
 We are also all available by appointment by email if these many hours do not work for you.
Student learning outcomes
Similar to ADA1, but at a higher level.
Assessment
 Quizzes will be due each Tuesday before class. Purpose: to assess reading and video comprehension and ensure you’re prepared to actively participate in class activities with minimal lecture. (About 12, 20% of final grade.) Most weeks plan for 1-2 hours of reading and video and a 20-minute quiz. Quizzes are not timed and can be taken twice; the higher of the two scores is used for grade calculation.
 Viewing quiz solutions after the due date in UNM Learn is not intuitive. Click on the “Begin” button (this is the non-intuitive part since you are not actually beginning the quiz), then click “View All Attempts” to see the scores. Finally, click the score in the “Calculated Grade” column to see the feedback for each question of the quiz.
 Worksheet assignments. Purpose: to struggle and find success in class with the concepts and skills. (About 24, includes class participation, 78% of final grade) Most weeks plan to finish in class.
 Course surveys are due at the end of the course (EvalKit). (About 1, 2% of final grade.)
 The lowest two weeks’ worth of assignments are dropped, so your lowest 2 quizzes and 4 worksheet assignments are not included in the calculation of your grade.
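The weighting and drop rules above can be sketched in R as follows. This is an illustration only: the scores are made up, and it assumes every quiz and every worksheet counts equally within its category, which the syllabus does not state. The helper name `drop_lowest` is hypothetical.

```r
# Made-up percent scores for illustration (not real data).
quiz      <- c(rep(100, 10), 0, 50)        # about 12 quizzes
worksheet <- c(rep(90, 20), 0, 0, 40, 60)  # about 24 worksheets
survey    <- 100                           # course survey completed

# Keep all but the n_drop lowest scores.
drop_lowest <- function(scores, n_drop) {
  tail(sort(scores), length(scores) - n_drop)
}

# Weights from the Assessment list: 20% quizzes, 78% worksheets, 2% survey.
final_grade <- 0.20 * mean(drop_lowest(quiz, 2)) +
               0.78 * mean(drop_lowest(worksheet, 4)) +
               0.02 * survey
final_grade
```

With these made-up scores the two missed quizzes and the four weakest worksheets fall out of the averages, so a rough week or two does not pull the grade down.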
Submission
 All assignments in this class are electronic, submitted to UNM Learn. For all submissions: (1) In RMarkdown, knit Rmd file to HTML, (2) Open HTML file in your internet browser, (3) Print HTML to pdf file, (4) Submit pdf to UNM Learn.
 Browser choice: Chrome is the best browser choice. On a Mac, Safari adds “.txt” to RMarkdown files when downloaded, and Firefox sometimes fails on upload of a pdf to UNM Learn.
 Late assignments will not be accepted.
 Rubrics guide assessment (and self-assessment) of homework, code, projects, exams, and presentations. Each assignment will have its own specific rubric.
 The use of R and RMarkdown is required for the course. Your submission will include all of the R code for the assignment, placed with the part of the problem it addresses, in a fixed-width font with syntax highlighting. You will weave your code with prose narrations of your work and solutions.
Collaboration and citation
 For homework, I encourage you to work together. Please discuss the data, code, and problems with one another, but do your own exploration and write up. We expect everyone to hand in substantially different homework, and we will enforce this under the honor code. The small benefit you might get from plagiarism is not worth the severe penalty (of lost trust, being reported to the dean, no points for the assignment, etc.).
 As in life, please use any resources available to you. Projects and some homework will explicitly encourage you to use resources on the internet, but showing extra initiative will always be appreciated. You may find R programming tough at first, so feel free to discuss your problems with other classmates or meet with or email questions to the TAs or me.
 I encourage you to use the ideas of others, but make them your own, giving credit. For projects have a formal bibliography, for homework cite casually, and for code simply copy the URL into your code as a comment (which is doubly helpful to you for finding the resource again).
Statements
Accommodations
 In accordance with University Policy 2310 and the Americans with Disabilities Act (ADA), academic accommodations may be made for any student who notifies the instructor of the need for an accommodation. It is imperative that you take the initiative to bring such needs to the instructor’s attention, as I am not legally permitted to inquire. Students who may require assistance in emergency evacuations should contact the instructor as to the most appropriate procedures to follow. Contact Accessibility Resource Center at 277-3506 for additional information.
 UNM is committed to providing courses that are inclusive and accessible for all participants. As your instructor, it is my objective to facilitate an accessible classroom setting, in which students have full access and opportunity. If you are experiencing physical or academic barriers, or concerns related to mental health, physical health and/or COVID-19, please consult with me after class, via email/phone or during office hours. You are also encouraged to contact the Accessibility Resource Center at arcsrvs@unm.edu or by phone 277-3506.
Credit-hours
 This is a three-credit-hour course. Class meets for two 75-minute sessions of direct instruction for fifteen weeks during the semester. Students are expected to complete a minimum of six hours of out-of-class work (or homework, study, assignment completion, and class preparation) each week.
Title IX statement
 In an effort to meet obligations under Title IX, UNM faculty, Teaching Assistants, and Graduate Assistants are considered “responsible employees” by the Department of Education (see page 15 of https://www2.ed.gov/about/offices/list/ocr/docs/qa201404titleix.pdf). This designation requires that any report of gender discrimination that includes sexual harassment, sexual misconduct and sexual violence made to a faculty member, TA, or GA must be reported to the Title IX Coordinator at the Office of Equal Opportunity (https://oeo.unm.edu). For more information on the campus policy regarding sexual misconduct, see: https://policy.unm.edu/universitypolicies/2000/2740.html
Citizenship and/or Immigration Status
 All students are welcome in this class regardless of citizenship, residency, or immigration status. Your professor will respect your privacy if you choose to disclose your status. As for all students in the class, family emergency-related absences are normally excused with reasonable notice to the professor, as noted in the attendance guidelines above. UNM as an institution has made a core commitment to the success of all our students, including members of our undocumented community. The Administration’s welcome is found on our website: http://undocumented.unm.edu/.
Support in Receiving Help and in Doing What is Right
 I encourage students to be familiar with services and policies that can help them navigate UNM successfully. Many services exist to help you succeed academically and to find your place at UNM; see students.unm.edu or ask me for information about the right resource center or person to contact. UNM has important policies to preserve and protect the academic community, especially policies on student grievances (Faculty Handbook D175 and D176), academic dishonesty (FH D100), and respectful campus (FH CO9). These are in the Student Pathfinder (https://pathfinder.unm.edu) and the Faculty Handbook (https://handbook.unm.edu). Please ask for help in understanding and avoiding plagiarism or academic dishonesty, which can both have very serious disciplinary consequences.
Land Acknowledgement
 Founded in 1889, the University of New Mexico sits on the traditional homelands of the Pueblo of Sandia. The original peoples of New Mexico (Pueblo, Navajo, and Apache) have, since time immemorial, had deep connections to the land and have made significant contributions to the broader community statewide. We honor the land itself and those who remain stewards of this land throughout the generations and also acknowledge our committed relationship to Indigenous peoples. We gratefully recognize our history.
Our Classroom
We’re doing this because:
 We want you to be empowered with statistics.
 We believe everyone should get out of this course with awesome skills.
 Real-time feedback promotes efficient learning.
GAISE Connections
Our six recommendations include the following:
 Emphasize statistical literacy and develop statistical thinking
 Use real data
 Stress conceptual understanding, rather than mere knowledge of procedures
 Foster active learning in the classroom
 Use technology for developing conceptual understanding and analyzing data
 Use assessments to improve and evaluate student learning
Learning without thought is labor lost. What I hear, I forget. What I see, I remember. What I do, I understand. – Confucius
Archive
Passion Driven Statistics (PDS) data
 Install PDS package.
 AddHealthW1 Sampling Design, Codebook, RData.
 AddHealthW4 Sampling Design, Codebook, RData.
 NESARC Sampling Design, Codebook, RData.
 OutlookOnLife Sampling Design, Codebook, RData.
 GapMinder Sampling Design, Codebook, RData.
Step 0
Before our first class (Tue 1/21) please read through the following actions and install the required software on your computer and complete the brief survey. If you don’t have a computer, there are classroom computers which will be available only when the classroom is open. Video for this process (ignore the “crowdgrader” portion).
 Complete surveys:
 a short Opinio pre-survey required for classroom assessment (1/20 – 2/1/2020).
 Install R (Windows or Mac) or upgrade, then RStudio. Videos that may be helpful:
 Install R on Mac (2 min).
 Install R for Windows (3 min).
 Install R and RStudio on Windows (5 min).
 Install R packages.
 Run RStudio
 Run code in R packages.
 Update all packages, RStudio Packages tab, click “update”, click “select all”, and “Install Updates”. Say “Yes” to restart R, but if it asks a second time, say “No”. Say “No” to “install from sources” if it asks.
 Set up your computer
 RStudio disable notebook
 Operating system to be more friendly to programming.
 (Postpone until later: Install LaTeX (for poster at end of the semester).)
RMarkdown and knitr issues
 R errors, unresolved, and out of time: If you’re saying “An error while knitting keeps me from turning in the assignment…”, then use the code chunk option error = TRUE (that is, write the chunk header as ```{r, error = TRUE}) to ignore the error and continue. This will allow you to turn in partial assignments with errors.
 Unicode compile problems: If you knit to pdf you may get this error: “! Package inputenc Error: Unicode char”. ASCII is the small character set that we program in; Unicode is an extended character set that looks pretty (for example, "straight quotes" become “curly quotes”) but causes code to break. You get unwanted Unicode when you copy/paste from a pdf or some other source into your code. To fix this, you have to find the Unicode and replace it with its ASCII equivalent. To do this: press Ctrl+F to find, search for “[^\x00-\x7F]” (without quotes), select “Regex” for regular expressions, and find the “Next” one. As it finds instances, replace the characters manually until there are no more. These characters will typically be curly quotes or fancy dashes.
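If you would rather locate the offending characters from the R console than by searching in the editor, the sketch below applies the same regular expression to a few made-up lines. The example lines, the file name in the comment, and the choice of which characters to replace are assumptions for illustration; check each replacement before saving over your own file.

```r
# Made-up lines standing in for an Rmd file; with a real file you might use
# rmd_lines <- readLines("my_assignment.Rmd", encoding = "UTF-8")
rmd_lines <- c(
  "x <- c(1, 2, 3)",
  "label <- \u201cmean\u201d",  # curly quotes, as pasted from a pdf
  "y <- x \u2013 1"             # en dash where a minus sign was intended
)

# Same pattern as the editor search: any character outside the ASCII range.
pattern <- "[^\\x00-\\x7F]"
has_unicode <- grepl(pattern, rmd_lines, perl = TRUE)
which(has_unicode)  # positions of the lines to inspect

# Replace the usual offenders with their ASCII equivalents.
fixed <- gsub("[\u201c\u201d]", "\"", rmd_lines)  # curly double quotes
fixed <- gsub("[\u2018\u2019]", "'", fixed)       # curly single quotes
fixed <- gsub("[\u2013\u2014]", "-", fixed)       # en and em dashes

# FALSE means no non-ASCII characters remain.
any(grepl(pattern, fixed, perl = TRUE))
```

The backslashes are doubled inside the R string so that the regular expression engine, not the R parser, interprets \x00 and \x7F.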
Pre-course to-dos
Did you receive a registration error for Spring 2021? Send me an email with the following answers:
 What registration error did you get (copy/paste is best)?
 What is your UNM ID?
 What is your Math/Stat background (that is, do you have the prerequisites)?
3/1/17 – Data resources for poster:
 List of 50+
 kaggle
 drivendata
 538
 agridat package
 wise data sources
 statsci datasets
 vanderbilt datasets
Citing and using notes, including previous editions
Citing lecture notes: Erhardt EB, Bedrick EJ, and Schrader RM. (2020) Lecture notes for Advanced Data Analysis 2. Retrieved Mar 1, 2020, from statacumen.com/teach/ADA2/notes/ADA2_notes_S20.pdf, 136–144.
 Notes from Spring 2020 using R with tidyverse: ADA2_notes_S20.pdf includes all chapters in one document. Lecture notes for Advanced Data Analysis 2 (ADA2) Stat 428/528 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at https://statacumen.com/teach/ADA2/notes/ADA2_notes_S20.pdf.
 Notes from Spring 2017 using R: ADA2_notes_S17.pdf
 Notes from Spring 2016 using R: ADA2_notes_S16.pdf
 Notes from Spring 2015 using R: ADA2_notes_S15.pdf
 Notes from Spring 2014 using R: ADA2_notes_S14.pdf
 Notes from Spring 2013 using R: ADA2_notes_S13.pdf
 Notes from Spring 2012 using SAS: ADA2_notes_S12.pdf