UNM Stat 427/527: Advanced Data Analysis I (ADA1)
Fall 2020 Syllabus is below tables
Fall 2020
Time: None/Always (Remote Arranged)
Location: Zoom
Stat 427.001, CRN 59508; Stat 527.001, CRN 59509
COVID-19 Year
Remote Arranged: A fully remote course in which all components are delivered remotely and there are no set times for face-to-face or remote meetings. Coursework will be done remotely, and your coursework for a given day or week, such as viewing lectures and completing modules, can be completed online within deadlines set by the instructor.
Step 0
Before our first “class” (Fri 8/21) please read through the following actions and install the required software on your computer.
 Install R (windows or mac) or upgrade, then RStudio.
 Install R video (5 min).
 Install R packages, also update all packages within RStudio.
 Set up your computer
 RStudio disable notebook
 Operating system to be more friendly to programming.
Goal
Learn to produce beautiful (markdown) and reproducible (knitr) reports with informative plots (ggplot2) and tables (kable) by writing code (R, tidyverse, RStudio) to answer questions using fundamental statistical methods (all one- and two-variable methods), which you’ll be proud to present (poster).
Course content
Weekly structure
(also see “Assessment” below)
 Preparation (Tuesday): Reading, Video, Quiz due Tue 11:59 PM.
 Worksheet 1 (Tuesday): Assignment due by Fri 11:59 PM.
 Worksheet 2 (Thursday): Assignment due by Mon 11:59 PM.
UNM Learn for taking quizzes (graded automatically) and submitting assignments (evaluated by TA within 1 week).
Lectures: YouTube Video playlist (try 1.5 speed, then pause as needed).
Assignments: YouTube Video playlist
Course notes, code, data, and video lectures
Second text: PDS Textbook
Notes from Fall 2019: ADA1_notes_F19.pdf includes all chapters in one document.
Citing lecture notes example: Erhardt EB, Bedrick EJ, and Schrader RM. (2019) Lecture notes for Advanced Data Analysis 1. Retrieved Sep 1, 2019, from statacumen.com/teach/ADA1/notes/ADA1_notes.pdf, 136–144.
Lecture notes for Advanced Data Analysis 1 (ADA1) Stat 427/527 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at https://statacumen.com/teach/ADA1/notes/ADA1_notes_F19.pdf.
Ch  Chapter Title  Notes  R code  Datasets  Video lectures playlist  Helper videos 

00  Introduction to R, Rstudio, and ggplot  R  001 002  markdown, 01 PDS codebook, 01 HW codebook, 02 HW Lit review 

01  Summarizing and Displaying Data  R  011  03 HW 03 subset  
02  Estimation in One-Sample Problems  R  021 022 023  
03  Two-Sample Inferences  R  031 032 033  
04  Checking Assumptions  R  041  
05  One-Way Analysis of Variance  R  CHDS dat desc  051 (no videos recorded) 

06  Nonparametric Methods  R  061 one-sample, 062 paired, 063 two-sample, 064 ANOVA, 065 perm test.  
07  Categorical Data Analysis  R  071 intro, 072 single prop, 073 GOFtest, 074 two prop & cond prob, …  
08  Correlation and Regression  R  BodyMass dat desc pdf  081 corr/log, 082 corr hyp test, 083 LS reg eq, 084 085  
09  Introduction to the Bootstrap  R  091  
10  Power and Sample size  R  101  
11  Data Cleaning  R  111  14 HW to poster  
12  ADA2 Ch 11 Logistic Regression  R  121 122 123 124  Upgrading R on Windows 
ada_functions.R function for a large set of standard diagnostic plots.
Passion-Driven Statistics (PDS) data
 NESARC Sampling Design, Codebook, RData. Alcohol abuse and related conditions.
 Unique ID “IDNUM”.
 A few numeric variables: AGE (in data, not in codebook), …
Erik’s example homework document: NESARC data, nicotine and depression.
Use these files as a model for your assignments: .Rmd + .bib = .html.
These are the files that Erik develops in the assignment videos: Rmd html
Timetable
Date  Class  Topic  Reading, Video, Quiz  class Worksheet, Data 

08/17  00  Install software, survey  Step 0 (above)  
08/18  01  
08/20  02  RMarkdown  read: PDS Ch 3; video: Rmd, Ch 23 (Only 9:55 – 15:15 PDS Data), Med records (Ignore “crowdgrader” in last minute) 
01a Medical records Rmd html Download Rmd file to your computer, open in RStudio, edit it, print HTML to pdf, turn in assignment by Monday midnight to UNM Learn.  Class 02, Medical Records (separate)  Due M 08/24 
08/25  03  Codebook  video: 01 Personal codebook; read: PDS Ch 23; quiz: 02 codebook (see additional codebook link under Quizzes in Learn)  Quiz 03, Codebook  Due F 08/25 
Quiz due Thu 8/27, and “literature review” questions won’t be graded. 
08/27  04  Class 04, Personal Codebook Rmd html
(Find this assignment contained in the Outline below) video: CL04  Due M 08/31 

09/01  05  R programming, data subset and numerical summaries  read: Ch 00 R, Ch 01 R; video: Ch 00 p1, Ch 00 p2, Ch 01; quiz: 03 programming, univariate  Quiz 05, Plotting univariate  Due T 09/01 
ADA1 ALL Outline file Rmd html Start using this Rmd file. All of your assignments will be written in this file. Read dataset in R, create subset of data, rename variables, numerical summaries.  Class 05, Data subset and numerical summaries  video: CL05a, CL05b, CL05c  Due F 09/04 
09/03  06  Plotting univariate  (Due date one day later for Labor Day)  Class 06, Plotting univariate
video: CL06a, CL06b, CL06c, CL06d  Due T 09/08 
09/08  07  Plotting bivariate, numeric response  read: PDS Ch 9, Ch 00 R; quiz: quiz  Quiz 07, Plotting bivariate  Due T 09/08 
Class 07, Plotting bivariate, numeric response
Due F 09/11 
09/10  08  Plotting bivariate, categorical response  Class 08, Plotting bivariate, categorical response
Due M 09/14 

09/15  09  Simple linear regression, intro  read: Ch 8.4, 8.2 R; video: 081 corr/log, 083 LS reg eq; quiz: quiz  Quiz 09, Simple linear regression, Logarithm transformation  Due T 09/15 
Rmd html dat Build intuition using SLR App, interpret properties of linear regression fit.  Class 09, Simple linear regression (separate)  video: CL09  Due F 09/18 
09/17  10  Class 10, Simple linear regression
video: CL10 Due M 09/21 

09/22  11  Logarithm transformation  (novel example)
Quiz 11, (NONE) 
Rmd html dat Plot, transform, plot, and interpret. video: CL11 Class 11, Logarithmic transformation, intro (separate) Due F 09/25 
09/24  12  Class 12, Logarithmic transformation
video: CL12 Due M 09/28 

09/29  13  Correlation  read: Ch 8.1, 8.3.1 R, Ch 7.5.1 only sections on “conditional probability” and the following example R; video: 081 corr/log, 082 corr hyp test, 074 two prop & cond prob; quiz: Quiz 13, Correlation, Categorical contingency table  Due T 09/29 
Rmd html dat1 dat2
Class 13, Correlation, intro (separate)  video: CL13  Due F 10/02 
10/01  14  Categorical contingency tables  quiz 06b, Guess Ages (for next in-class)  In-class: Rmd html d1 Interpret conditional proportions in two examples. Simpson’s Paradox.  Class 14, Categorical contingency table (separate)  Due M 10/05 
10/06  15  Quiz 15, (NONE)  Class 15, Correlation and Categorical contingency tables
Due M 10/12 

10/08  Fall Break 1/2 (Wed 10/7)  Spurious Correlations  BBC Radio 4: More or Less, “sampling” 9 min audio  
10/13  16  Parameter estimation (one-sample)  read: Ch 2.1-2.2 R; video: see table above; quiz: quiz  Quiz 16, Inference and Parameter estimation  Due T 10/13 
In-class: Rmd html Guess Ages, Legos. (Legos part 2 Rmd html dat, diagram).  Class 16, Parameter estimation (one-sample) (separate)  Due F 10/16 
10/15  17  In-class: Rmd html Water on Earth.  Class 17, Inference and Parameter estimation (one-sample)  Due M 10/19 

10/20  18  Hypothesis testing (two-sample)  read: Ch 2.3-end R Ch 3 R; video: see table above; quiz: quiz  Quiz 18, Hypothesis testing  Due T 10/20 
In-class: Rmd html one- and two-sample tests using data we collected in class.  Class 18, Hypothesis testing (one- and two-sample) (separate)  Due F 10/23 
10/22  19  Paired data, assumption assessment  Class 19, Paired data, assumption assessment (separate)
Due M 10/26 

10/27  20  ANOVA, post-hoc comparisons  read: Ch 2.2.1, Ch 3.4 & 3.6, Ch 4, Ch 5; video: see table above; quiz: quiz  Quiz 20, ANOVA, Pairwise comparisons  Due T 10/27 
In-class: Rmd html Paired data and checking model assumptions.  Class 20, Hypothesis testing (one- and two-sample)  Due F 10/30 
10/29  21  In-class: Rmd html ANOVA, model assumptions, and paired comparisons.  Class 21, ANOVA, Pairwise comparisons (separate)  Due M 11/02 

11/03  Fall Break 2/2 (Tue 11/03)  
11/05  22  Quiz 22, (NONE)  Class 22, ANOVA and Assessing Assumptions
Due M 11/09 

11/10  23  Nonparametric methods  read: Ch 6, Ch 7.2-7.4, Ch 10; video: see table above; quiz: quiz  Quiz 23, Nonparametric methods, Binomial and Multinomial tests  Due T 11/10 
In-class: Rmd html NP one-sample tests and CIs, and ANOVA with pairwise comparisons.  Class 23, Nonparametric methods (separate)  Due F 11/13 
11/12  24  Binomial and multinomial proportion tests  In-class: Rmd html dat Popular kids.  Class 24, Binomial and Multinomial tests (separate)  Due M 11/16 

11/17  25  Two-way categorical tables  read: Ch 7.8-end, Ch 8.5-8.7; video:; quiz: quiz  Quiz 25, Two-way categorical tables  Due T 11/17 
Class 25, Two-way categorical tables (separate)
Due F 11/20 
11/19  26  Simple linear regression, inference  In-class: Rmd html Regression of height vs hand span using data from our class.  Class 26, Simple linear regression (separate)  Due M 11/23 

11/24  27  Quiz 27, (NONE)  Class 27, Two-way categorical and simple linear regression
Due M 11/30 

11/26  Thanksgiving break  Summary of Methods we’ve covered  
12/01  28  Logistic regression, intro  read: ADA2 Ch 11.13, 11.6, PDS Ch 16; video:; quiz: quiz  Quiz 28, Logistic regression  Due T 12/01 
In-class: Rmd html AddHealth W4 Pregnancy.  Class 28, Logistic regression (separate)  Due F 12/04 
12/03  29  In-class: Course evaluation, submit receipt (capture screen image) as in-class assignment.
Class 29, Logistic regression Due M 12/07


12/06  Finals week  (no final)  Congratulations on a great semester! 
(I reserve the right to continue to modify the schedule and improve the materials throughout the semester.)
Syllabus
Description: Statistical tools for scientific research, including parametric and nonparametric methods for ANOVA and group comparisons, simple linear and multiple linear regression, and basic ideas of experimental design and analysis. Emphasis placed on the use of statistical packages such as R. Course cannot be counted in the hours needed for graduate degrees in Mathematics and Statistics.
Prerequisite: Math 1350 [Stat 145] (or other intro stats course)
Semesters offered: Fall
Lecture: Stat 427.001, CRN 59508; Stat 527.001, CRN 59509; TR 15:30-16:45; Location: Zoom
Instructors
Please include “ADA1” in the subject line of all emails.
Professor
Erik Erhardt <erike@stat.unm.edu>, he/him
Teaching Assistants
Ola Anifowoshe <oanifowoshe@unm.edu>, he/him
Jonathan Emery <jemery2016@unm.edu>, he/him
Peer Learning Facilitators (PLF)
John Romero <johnromero14@unm.edu>, he/him
Pratap Khattri <pkhattri@unm.edu>, he/him
Coby Segay <csegay@unm.edu>, he/him
Jacob Matthew Moya <jmoya67@unm.edu>, he/him
Office hours
See email “ADA1, Stat 427/527, Announcements” from 8/23/20 for Zoom links and instructions.
Time  Mon  Tue  Wed  Thu  Fri  Sat  Sun 
8 AM  
9 AM  
10 AM  JM  
11 AM  OA  OA  OA  CS  
12 PM  OA  OA  OA  PK  
1 PM  EE  EE  EE  
2 PM  EE  EE  EE  JM  JM  
3 PM  JE  JR  JE  JR  JM  JM  
4 PM  CS  JE  JR  JE  JR  
5 PM  CS  JE  JR  JE  
6 PM  PK  PK  
7 PM  PK  CS  PK  
8 PM  CS  
9 PM 
 We are also all available by appointment by email if these many hours do not work for you.
Student learning outcomes
At the end of the course, you will be able to: (student results: R, all years, 2015, 2014, 2013, 2012)
General outcomes:
 Organize knowledge in graphs, tables, and code to support concise, comprehensible, and scientifically defensible written interpretations to produce knowledge within a reproducible research environment.
 Distinguish a testable scientific hypothesis or data-supported interpretation from an opinion.
 Understand from a data story the goals of the study and apply the correct statistical procedure.
 Explain the scientific aspects of a problem to nonscientists in a fashion that enhances understanding and decision making.
Topical outcomes:
 Define parameters of interest and hypotheses in words and notation.
 Summarize data visually, numerically, and descriptively and interpret the observed characteristics. Calculate and interpret numerical summaries such as mean, variance, five-number summary, confidence intervals, and p-values, and create visual summaries such as bar plots, scatter plots, and histograms. (Never pie charts!)
 Distinguish between statistical significance and scientific relevance.
 Use statistical software, such as R, to read and manage data, create informative plots, report numerical summaries, and apply statistical models, by recommended programming practice including abstraction and documentation.
 Understand the differences and limitations of controlled experiments and observational studies. Design experiments to infer causal treatment effects. Analyze observational data to infer associations between measured variables.
 Identify and explain the statistical methods, assumptions, and limitations used in reported studies in scientific literature or popular media.
 Evaluate and criticize published studies, the work of peers, and your own work and assess what was done well, what could be done better, and examine whether their conclusions are supported using statistical principles.
 Make evidence-based decisions by constructing and deciding between testable hypotheses using appropriate data and methods.
 Discover relationships and make predictions through model development and selection.
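As a rough illustration of the numerical summaries named in the topical outcomes, the sketch below uses only base R. The data vector and the null value of 5 are made up for illustration and do not come from the course materials.

```r
# Made-up measurements, for illustration only.
x <- c(2, 4, 6, 8, 10)

mean(x)      # sample mean
var(x)       # sample variance (n - 1 denominator)
fivenum(x)   # five-number summary: min, lower hinge, median, upper hinge, max

# One-sample t-test of H0: mu = 5 (an arbitrary null value chosen for this sketch).
# The result also carries a 95% confidence interval for the mean.
tt <- t.test(x, mu = 5)
tt$conf.int
tt$p.value
```

The same calls work on a column of a data frame, for example a numeric variable from your own dataset.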
Meeting the learning outcomes
You will acquire new information in this class, but the emphasis is comprehending, integrating, and applying information. Rote factual memorization is the lowest form of learning. Effective learning takes place by explaining, integrating, applying, and analyzing facts, hypotheses, and theories.
Learning in this class occurs by:
 Doing – completion of exercises that require analysis of data to answer questions and test hypotheses, or researching answers to reading assignments.
 Discussion – interaction with classmates to assemble and synthesize information utilizing the collective skills and knowledge base of the group.
 Listening, acting, and reflecting – activities during class time provide insights into information not available in readings and include review of difficult material to aid comprehension. Note-taking permits later reflection on lecture content. Listening to the professor lecture is the least effective learning tool for students, however, so you should plan on coming to every class prepared to participate in active and reflective learning opportunities.
Assessment
 Quizzes will be due each Tuesday before class. Purpose: to assess reading and video comprehension and ensure you’re prepared to actively participate in class activities with minimal lecture. (About 12, 20% of final grade.) Most weeks, plan for 1-2 hours of reading and video and a 20-minute quiz. Quizzes are not timed and can be taken twice; the higher of the two scores is used for grade calculation.
 Viewing quiz solutions after the due date in UNM Learn is not intuitive. Click on the “Begin” button (this is the nonintuitive part since you are not actually beginning the quiz), then click “View All Attempts” to see the scores. Finally, click the score in the “Calculated Grade” column to see the feedback for each question of the quiz.
 Worksheet assignments. Purpose: to struggle and find success in class with the concepts and skills. (About 24, includes class participation, 78% of final grade) Most weeks plan to finish in class.
 Course surveys are due at the end of the course (EvalKit). (About 1, 2% of final grade.)
 The lowest 2 weeks’ worth of assignments are dropped, that is, your lowest 2 quizzes and 4 worksheet assignments.
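The quiz rules above (higher of two attempts, lowest scores dropped) can be sketched in base R as follows. The scores are made up, `mean_after_drop` is a hypothetical helper name, and averaging the kept scores equally is an assumption, since the syllabus does not say how scores are combined.

```r
# Made-up quiz scores (proportion correct); NA means the second attempt was not taken.
attempt1 <- c(0.70, 0.90, 0.50, 1.00, 0.80)
attempt2 <- c(0.85, NA,   0.60, NA,   0.75)

# The higher of the two attempts counts for each quiz.
best <- pmax(attempt1, attempt2, na.rm = TRUE)

# Hypothetical helper: drop the n_drop lowest scores, then average the rest.
# Equal weighting of the kept scores is an assumption of this sketch.
mean_after_drop <- function(scores, n_drop) {
  kept <- sort(scores, decreasing = TRUE)[seq_len(length(scores) - n_drop)]
  mean(kept)
}

quiz_avg <- mean_after_drop(best, n_drop = 2)
quiz_avg
```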
Final grade may include a small buffer at the discretion of the instructor. For example, final grade could be the total points earned divided by the product of the total possible points and a buffer factor: 0.98 for graduate students and 0.95 for undergraduate students. That is, for undergraduates, [Final Grade] = [Points Earned] / ([Points Possible] * 0.95), so that your grade is slightly higher than you earned.
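The example buffer formula can be checked with a few lines of R. This is only a sketch of the example given above, not the actual grading code. `final_grade` is a hypothetical function name and the point totals are made up.

```r
# Hypothetical helper implementing the example formula:
# [Final Grade] = [Points Earned] / ([Points Possible] * buffer)
final_grade <- function(points_earned, points_possible, buffer = 0.95) {
  points_earned / (points_possible * buffer)
}

# Made-up point totals, for illustration only.
undergrad <- final_grade(855, 1000, buffer = 0.95)
grad      <- final_grade(855, 1000, buffer = 0.98)

# Express as percentages; both exceed the unbuffered 85.5%.
round(100 * c(undergraduate = undergrad, graduate = grad), 1)
```

Because the buffer is below 1, dividing by it always raises the grade relative to the unbuffered ratio.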
All assignments in this class are electronic, submitted to UNM Learn. For all submissions: (1) In RMarkdown, knit Rmd file to HTML, (2) Open HTML file in your internet browser, (3) Print HTML to pdf file, (4) Submit pdf to UNM Learn.
Browser choice: Chrome is the best choice. On a Mac, Safari adds “.txt” to RMarkdown files when downloaded, and Firefox sometimes fails on upload of a pdf to UNM Learn.
Late assignments will not be accepted.
Rubrics guide assessment (and self-assessment) of homework, code, projects, exams, and presentations. Each assignment will have its own specific rubric.
Use of R and RMarkdown is required for the course. Your submissions will include all of the R code for the assignment, alongside the part of the problem it addresses, in a fixed-width font with syntax highlighting. You will weave your code with prose narrations of your work and solutions.
Collaboration and citation
For homework, I encourage you to work together. Please discuss the data, code, and problems with one another, but do your own exploration and write up. We expect everyone to submit substantially different homework, and we will enforce this under the honor code. The small benefit you might get from plagiarism is not worth the severe penalty (of lost trust, being reported to the dean, no points for the assignment, etc.).
As in life, please use any resources available to you. Projects and some homework will explicitly encourage you to use resources on the internet, but showing extra initiative will always be appreciated. You may find R programming tough at first, so feel free to discuss your problems with other classmates or meet with or email questions to me or the TAs.
I encourage you to use the ideas of others, but make them your own, giving credit. For projects have a formal bibliography, for homework cite casually, and for code simply copy the URL in as a comment (which is doubly helpful for finding the resource again). You won’t be the first person to do anything in this class, so give credit where it’s due.
Statements
Accommodations
In accordance with University Policy 2310 and the Americans with Disabilities Act (ADA), academic accommodations may be made for any student who notifies the instructor of the need for an accommodation. It is imperative that you take the initiative to bring such needs to the instructor’s attention, as I am not legally permitted to inquire. Students who may require assistance in emergency evacuations should contact the instructor as to the most appropriate procedures to follow. Contact Accessibility Resource Center at 277-3506 for additional information.
UNM is committed to providing courses that are inclusive and accessible for all participants. As your instructor, it is my objective to facilitate an accessible classroom setting, in which students have full access and opportunity. If you are experiencing physical or academic barriers, or concerns related to mental health, physical health and/or COVID-19, please consult with me after class, via email/phone or during office hours. You are also encouraged to contact the Accessibility Resource Center at arcsrvs@unm.edu or by phone 277-3506.
Credithours
This is a three-credit-hour course. Class meets for two 75-minute sessions of direct instruction for fifteen weeks during the semester. Students are expected to complete a minimum of six hours of out-of-class work (or homework, study, assignment completion, and class preparation) each week.
Title IX statement
In an effort to meet obligations under Title IX, UNM faculty, Teaching Assistants, and Graduate Assistants are considered “responsible employees” by the Department of Education (see page 15 of https://www2.ed.gov/about/offices/list/ocr/docs/qa201404titleix.pdf). This designation requires that any report of gender discrimination that includes sexual harassment, sexual misconduct, and sexual violence made to a faculty member, TA, or GA must be reported to the Title IX Coordinator at the Office of Equal Opportunity (https://oeo.unm.edu). For more information on the campus policy regarding sexual misconduct, see: https://policy.unm.edu/universitypolicies/2000/2740.html
Citizenship and/or Immigration Status
All students are welcome in this class regardless of citizenship, residency, or immigration status. Your professor will respect your privacy if you choose to disclose your status. As for all students in the class, family emergency-related absences are normally excused with reasonable notice to the professor, as noted in the attendance guidelines above. UNM as an institution has made a core commitment to the success of all our students, including members of our undocumented community. The Administration’s welcome is found on our website: http://undocumented.unm.edu/.
Support in Receiving Help and in Doing What is Right
I encourage students to be familiar with services and policies that can help them navigate UNM successfully. Many services exist to help you succeed academically and to find your place at UNM; see students.unm.edu or ask me for information about the right resource center or person to contact. UNM has important policies to preserve and protect the academic community, especially policies on student grievances (Faculty Handbook D175 and D176), academic dishonesty (FH D100), and respectful campus (FH CO9). These are in the Student Pathfinder (https://pathfinder.unm.edu) and the Faculty Handbook (https://handbook.unm.edu). Please ask for help in understanding and avoiding plagiarism or academic dishonesty, which can both have very serious disciplinary consequences.
Land Acknowledgement
Founded in 1889, the University of New Mexico sits on the traditional homelands of the Pueblo of Sandia. The original peoples of New Mexico (Pueblo, Navajo, and Apache), since time immemorial, have deep connections to the land and have made significant contributions to the broader community statewide. We honor the land itself and those who remain stewards of this land throughout the generations and also acknowledge our committed relationship to Indigenous peoples. We gratefully recognize our history.
Our Classroom
We’re doing this because:
 We want you to be empowered with statistics.
 We believe everyone should get out of this course with awesome skills.
 Real-time feedback promotes efficient learning.
“It encourages me to engage actively with the course material and to take responsibility for my learning.”
GAISE Connections
Our six recommendations include the following:
 Teach statistical thinking.
   Teach statistics as an investigative process of problem-solving and decision making.
   Give students experience with multivariable thinking.
 Focus on conceptual understanding.
 Integrate real data with a context and purpose.
 Foster active learning.
 Use technology to explore concepts and analyze data.
 Use assessments to improve and evaluate student learning.
Learning without thought is labor lost.
What I hear, I forget.
What I see, I remember.
What I do, I understand.
– Confucius
Archive
Pre-course to-dos
Did you receive a registration error for Fall 2020? Send me an email with the following answers:
1. What registration error did you get (copy/paste is best)?
2. What is your UNM ID?
3. What is your Math/Stat background (that is, do you have the prerequisites)?
If you are waitlisted, as long as there are seats available I will override you into the course. Don’t worry.
Course introduction materials
Problems installing PDS package? Solution.
If you had problems installing the PDS package, no problem; here’s how to get the data:
1. Download the “.RData” file above for your dataset.
2. Where I have “library(PDS)” in my code, change it to the two lines below. You’ll need to update the “PATH_TO_FILE” below to the path on your computer’s hard drive, and “filename” needs to be changed to the name of the file. This will directly read the data file.
# library(PDS)
setwd("/PATH_TO_FILE")
load("filename.RData")
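To confirm what `load()` actually restored, it can help to capture its return value, which is the character vector of object names it created. The sketch below is self-contained: it writes a small stand-in data frame to a temporary `.RData` file rather than using the real NESARC file, and `demo_data` and its values are made up for illustration.

```r
# Stand-in for a downloaded .RData file, so the example runs anywhere.
# The column names IDNUM and AGE mirror the NESARC variables mentioned above;
# the values are invented.
tmp <- tempfile(fileext = ".RData")
demo_data <- data.frame(IDNUM = 1:3, AGE = c(23, 41, 35))
save(demo_data, file = tmp)
rm(demo_data)

# load() invisibly returns the names of the objects it restored.
loaded <- load(tmp)
print(loaded)

# Inspect the first restored object by name.
str(get(loaded[1]))
```

With the real data you would pass the path to your downloaded file to `load()` instead of `tmp`.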
Unicode compile problems: If you knit to pdf you may get this error: “! Package inputenc Error: Unicode char”. ASCII is the small character set that we use to program in; Unicode is an extended character set that looks pretty (for example, “straight quotes” become “curly quotes”) but causes code to break. You get unwanted Unicode when you copy/paste from a pdf or some other source into your code. To fix this, you have to find the Unicode and replace it with its ASCII equivalent. To do this: press Ctrl+F to find, search for “[^\x00-\x7F]” (without quotes), select “Regex” for regular expressions, and find the “Next” one. As it finds instances, replace the characters manually until there are no more. These characters will typically be curly quotes or fancy dashes.
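The same search can also be run from R. The sketch below is one possible approach, not part of the course materials: `find_non_ascii` and `to_ascii_punct` are hypothetical helper names, and the list of replacements covers only curly quotes and en/em dashes.

```r
# Hypothetical helper: report which lines contain any non-ASCII character.
# The backslashes are doubled because the pattern sits inside an R string.
find_non_ascii <- function(lines) {
  hits <- grepl("[^\\x00-\\x7F]", lines, perl = TRUE)
  data.frame(line = which(hits), text = lines[hits], stringsAsFactors = FALSE)
}

# Hypothetical helper: swap common offenders for ASCII equivalents.
to_ascii_punct <- function(lines) {
  from <- c("\u201c", "\u201d", "\u2018", "\u2019", "\u2013", "\u2014")
  to   <- c("\"",     "\"",     "'",      "'",      "-",      "-")
  for (i in seq_along(from)) {
    lines <- gsub(from[i], to[i], lines, fixed = TRUE)
  }
  lines
}

# Made-up example lines; with a real file you might start from
# readLines("your_file.Rmd", encoding = "UTF-8") instead.
code_lines <- c("plot(x, y)", "title <- \u201cAge\u201d", "a \u2013 b")
found   <- find_non_ascii(code_lines)
cleaned <- to_ascii_punct(code_lines)
found$line
cleaned
```

Anything `find_non_ascii` still reports after cleaning is a character outside the replacement list and needs to be fixed by hand.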