UNM Stat 428/528: Advanced Data Analysis II (ADA2)
Spring 2023
The syllabus is below the tables.
- Spring 2023
- Time: 9:30-10:45 AM
- Location: CTLB 300
- Stat 428.001, CRN 33933; Stat 528.001, CRN 33935
Goal
Learn to produce beautiful (markdown) and reproducible (quarto) reports with informative plots (ggplot2) and tables (kable) by writing code (R, tidyverse, RStudio) to answer questions using fundamental statistical methods (multiple regression, analysis of covariance, logistic regression, and multivariate methods), which you’ll be proud to present (poster).

Pre-course to-dos
Did you receive a registration error for Spring 2023? Send me an email with the following answers:
- What registration error did you get (copy/paste is best)?
- What is your UNM ID?
- What is your Math/Stat background (that is, do you have the prerequisites)?
Step 0
Before our first “class” (Mon 1/16/23) please read through the following actions and install the required software on your computer.
- Install:
  - Install R packages.
    - Follow these instructions: R packages. (Ignore warning about rtools or any packages unavailable.)
    - In RStudio, open the Packages tab, click on “Update”, Select All, Install Updates (“No” to restart, “No” to compile from source).
  - Install erikmisc package (also at the end of “Install R packages”, above).
    - Submit these two lines to the R console:
      install.packages("devtools")
      devtools::install_github("erikerhardt/erikmisc")
    - If it asks to update packages (it should not ask this if you updated packages above), press 3 [Enter] for “None”.
    - If it asks about the “make” command, click “Cancel”.
    - If it asks about the “git” command, click “Cancel”.
    - Make sure it works by printing the logo:
      library(erikmisc)
      erikmisc_logo()
- Set up your computer
- RStudio disable notebook
- Operating system to be more friendly to programming.
type="binary"
option.
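As a quick check that the installs above succeeded, the sketch below reports which of the named packages R can find. The helper name `is_installed` is an assumption made for illustration and is not part of the course materials; `requireNamespace()` is base R. The code only checks availability and installs nothing.

```r
# Hypothetical helper (not from the course materials):
# TRUE when the package can be loaded, FALSE otherwise.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Packages named in the install steps above.
pkgs <- c("devtools", "erikmisc")

# Named logical vector, one entry per package.
status <- vapply(pkgs, is_installed, logical(1))
print(status)
```

Any `FALSE` entry suggests repeating the corresponding install step.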
Course content
Weekly structure
(also see “Assessment” below)
- Preparation (Tuesday): Reading, Video, Quiz due Tue 9:30 AM (before class).
- Worksheet 1 (Tuesday): Assignment due by Fri 11:50 PM.
- Worksheet 2 (Thursday): Assignment due by Mon 11:50 PM.
- This is the typical schedule; some dates may differ depending on the circumstance.
- UNM Canvas for taking quizzes (graded automatically) and submitting assignments (evaluated by TA within 1 week).
- Lectures: YouTube Video playlist (try 1.5 speed, pause and review as needed).
- Assignments: YouTube Video playlist (walk-through of each assignment).
Course notes, code, data, and video lectures
Notes from Spring 2020: ADA2_notes_S20.pdf includes all chapters in one document.
Ch | Chapter Title | Notes | R code | Datasets | Video lectures playlist |
---|---|---|---|---|---|
01 | R statistical software and review | | R | turkey.csv, rocket.dat | (Videos based on S16 notes) 01-1, 01-2 |
02 | Introduction to Multiple Linear Regression | | R | indian.dat, gce.dat | 02-1, 02-2 |
03 | A Taste of Model Selection for Multiple Regression | | R | ratliver.csv | 03-1, 03-2 |
04 | One Factor Designs and Extensions | | R | none | 04 |
05 | Paired Experiments and Randomized Block Experiments | | R | battery.dat, beetles.dat, itch.csv, ratinsulin.dat | 05-0 05-1 05-2 05-3 05-4 05-5 05-6 05-7 05-8 05-9 |
06 | A Short Discussion of Observational Studies | | R | sat.csv | 06 |
07 | Analysis of Covariance: Comparing Regression Lines | | R | tools.dat, toolsfake.dat, twins.dat | 07-1 07-2 07-3 HW helper video |
08 | Polynomial Regression | | R | cloudpoint.dat, mooney.dat | 08-1 08-2 |
09 | Discussion of Response Models with Factors and Predictors | | R | faculty.dat | 09-1 09-2 09-3 |
10 | Automated Model Selection for Multiple Regression | | R | oxygen.dat | 10-1 10-2 10-3 |
11 | Logistic Regression | | R | beetles.dat, leuk.dat, menarche.csv, shuttle.csv, trauma.dat | 11-1 11-2 11-3 11-4 |
12 | An Introduction to Multivariate Methods | | R | none | 12 |
13 | Principal Component Analysis | | R | bgs.dat, shells.dat, sparrows.dat, temperature.dat | 13-1 13-2 13-3 |
14 | Cluster Analysis | | R | birthdeath.dat, teeth.dat | 14-1 14-2 14-3 |
15 | Multivariate Analysis of Variance | | R | shells_mf.dat | 15 |
16 | Discriminant Analysis | | R | mower.dat | 16-1 16-2 |
17 | Classification | | R | business.dat | 17-1 17-2 17-3 |
18 | Data Cleaning | | R | conversions.txt, dalton.txt, dirty_iris.csv, edits.txt, people.txt, unnamed.txt | |
(I reserve the right to continue to improve the materials throughout the semester.)
Timetable
Date | Class | Topic | Reading, Video, Quiz | Class Worksheet, Data |
---|---|---|---|---|
01/16 | 00 | Install software | ||
01/17 | 01 | 01 R statistical software and review | ||
01/19 | 02 | |||
01/24 | 03 | 02 Introduction to Multiple Linear Regression | ||
01/26 | 04 | |||
01/31 | 05 | 03 A Taste of Model Selection for Multiple Linear Regression | ||
02/02 | 06 | 04 Experimental Design: One- and Two-Factor Designs | ||
02/07 | 07 | 05 Paired Experiments and Randomized Block Designs | ||
02/09 | 08 | |||
02/14 | 09 | |||
02/16 | 10 | |||
02/21 | 11 | 06 Discussion of Observational Studies | ||
02/23 | 12 | 07 Analysis of Covariance: Comparing Regression Lines | ||
02/28 | 13 | 08 Polynomial Regression | ||
03/02 | 14 | 09 Response Models with Factors and Predictors | ||
03/07 | 15 | 10 Model Selection for Multiple Regression | ||
03/09 | 16 | |||
03/14 | Spring Break | |||
03/16 | Spring Break | |||
03/21 | 17 | 11 Logistic Regression | ||
03/23 | 18 | |||
03/28 | 19 | 12 An Introduction to Multivariate Methods 13 Principal Components Analysis (PCA) | ||
03/30 | 20 | PCA, continued | ||
04/04 | 21 | 14 Cluster Analysis 15 Multivariate Analysis of Variance (MANOVA) | ||
04/06 | 22 | |||
04/11 | 23 | 16 Discriminant Analysis 17 Classification | ||
04/13 | 24 | |||
04/18 | 25 | 13+11+17 PCA and logistic regression classification | ||
04/20 | 26 | |||
04/25 | 27 | 10 Model Selection for Multiple Regression, revisited: ATUS data subset and model selection | ||
04/27 | 28 | MS Stat Qual exam, you can do it! | ||
05/02 | 29 | |||
05/04 | 30 | |||
05/09 | FINALS WEEK | (no final) | Congratulations on a great semester! |
Syllabus
- Description: A continuation of 427/527 that focuses on methods for analyzing multivariate data and categorical data. Topics include MANOVA, principal components, discriminant analysis, classification, factor analysis, analysis of contingency tables including log-linear models for multidimensional tables and logistic regression.
- Prerequisite: Stat 427/527 (ADA1)
- Semesters offered: Spring
- Lecture: Stat 428/528.001 (CRN 33933 or 33935) (TR 0930-1045, CTLB 300 Video)
- Email: Please include “ADA2” in the subject line of all emails.
Instructors
- Professor
- Erik Erhardt <erike@stat.unm.edu>, he/him
- Teaching Assistants
- Behzad FallahiFard <bfallahifard@unm.edu>, he/him
- Mingyue Liu <mingyueliu@unm.edu>, she/her
- Peer Learning Facilitators (PLF)
- Alexis P Amodio-Cardwell, she/her
Office hours
See email “ADA2, Stat 428/528, Announcements” from 1/14/23 for Zoom links and instructions.
Time | Mon | Tue | Wed | Thu | Fri | Sat | Sun |
8 AM | |||||||
9 AM | Class | Class | |||||
10 AM | BF | Class | BF | Class | |||
11 AM | BF | EE | EE | ||||
12 PM | |||||||
1 PM | |||||||
2 PM | EE | EE | |||||
3 PM | EE 3:30 | ML | EE 3:30 | ||||
4 PM | ML | ML | |||||
5 PM | ML | BF | |||||
6 PM | ML | BF | |||||
7 PM | |||||||
8 PM | |||||||
9 PM |
- We are all also available by appointment (arrange by email) if none of these hours work for you.
Student learning outcomes
Similar to ADA1, but at a higher level.
Assessment
- Quizzes will be due each Tuesday before class (for fully face-to-face semesters). Purpose: to assess reading and video comprehension and ensure you’re prepared to actively participate in class activities with minimal lecture. (About 12, 15% of final grade.) Most weeks, plan for 1-2 hours of reading and video and a 20-minute quiz. Quizzes are not timed and can be taken twice; the higher of the two scores is used for grade calculation.
- Viewing quiz solutions after the due date in UNM Canvas is not intuitive. Click on the “Begin” button (this is the non-intuitive part since you are not actually beginning the quiz), then click “View All Attempts” to see the scores. Finally, click the score in the “Calculated Grade” column to see the feedback for each question of the quiz.
- Worksheet assignments. Purpose: to struggle and find success in class with the concepts and skills. (About 21, includes class participation, 83% of final grade) Most weeks plan to finish in class.
- Poster will be developed through the semester (most assignments contribute to the poster); in the last couple of weeks we’ll complete the posters, and in the last week we’ll have poster presentations. Purpose: to have an overarching set of questions to answer using methods learned in the course, with a deliverable you can be proud of! (1 poster and presentation: 12% poster, 2% presentation, and 2% evaluations of others, of final grade.) In the last couple of weeks, assembling this poster may take 5-10 hours, using a template provided to you.
- Course surveys are due at the end of the course (EvalKit). (About 2, 2% of final grade.)
- Roughly speaking, the lowest two weeks’ worth of assignments are dropped, so your lowest 2 quizzes and 4 worksheet assignments are not included in the calculation of your grade (this may not include full-week assignments).
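The quiz rules above (higher of two attempts, lowest 2 quizzes dropped) can be sketched in a few lines of R. The scores below are made up, and the actual grade calculation in UNM Canvas may differ in its details; this is only an illustration of the arithmetic.

```r
# Hypothetical quiz scores (percent) for 5 quizzes, two attempts each.
# NA marks an attempt that was not taken.
attempt1 <- c(70, 85, 40, 90, 60)
attempt2 <- c(80, NA, 55, 88, NA)

# Higher of the two attempts for each quiz (ignoring missing attempts).
best <- pmax(attempt1, attempt2, na.rm = TRUE)

# Drop the lowest 2 quiz scores, then average what remains.
kept <- sort(best)[-(1:2)]
quiz_average <- mean(kept)
print(quiz_average)
```

With these made-up scores, the best attempts are 80, 85, 55, 90, and 60; dropping 55 and 60 leaves an average of 85.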
Collaboration and citation
- For homework, I encourage you to work together. Please discuss the data, code, and problems with one another, but do your own exploration and write-up. We expect everyone to hand in substantially different homework, and we will enforce this under the honor code. The small benefit you might get from plagiarism is not worth the severe penalty (of lost trust, being reported to the dean, no points for the assignment, etc.).
- As in life, please use any resources available to you. Projects and some homework will explicitly encourage you to use resources on the internet, but showing extra initiative will always be appreciated. You may find R programming tough at first, so feel free to discuss your problems with other classmates or meet with or email questions to the TAs or me.
- I encourage you to use the ideas of others, but make them your own, giving credit. For projects have a formal bibliography, for homework cite casually, and for code simply copy the URL into your code as a comment (which is doubly helpful to you for finding the resource again).
Statements
COVID-19 Health and Awareness
UNM is a mask friendly, but not a mask required, community. To be registered or employed at UNM, students, faculty, and staff must all meet UNM’s Administrative Mandate on Required COVID-19 vaccination. If you are experiencing COVID-19 symptoms, please do not come to class. If you have a positive COVID-19 test, please stay home for five days and isolate yourself from others, per the Centers for Disease Control (CDC) guidelines. If you do need to stay home, please communicate with me by email; I can work with you to provide alternatives for course participation and completion. UNM faculty and staff know that these are challenging times. Please let me, an advisor, or another UNM staff member know that you need support so that we can connect you to the right resources. Please be aware that UNM will publish information on websites and email about any changes to our public health status and community response.
Support:
- Student Health and Counseling (SHAC) at (505) 277-3136. If you are having active respiratory symptoms (e.g., fever, cough, sore throat, etc.) AND need testing for COVID-19, OR if you recently tested positive and may need oral treatment, call SHAC.
- LoboRESPECT Advocacy Center (505) 277-2911 can offer help with contacting faculty and managing challenges that impact your UNM experience.
Accommodations
UNM is committed to providing equitable access to learning opportunities for students with documented disabilities. As your instructor, it is my objective to facilitate an inclusive classroom setting, in which students have full access and opportunity to participate. To engage in a confidential conversation about the process for requesting reasonable accommodations for this class and/or program, please contact Accessibility Resource Center at arcsrvs@unm.edu or by phone at 505-277-3506. Support: Contact me by email or in office/check-in hours and contact Accessibility Resource Center (https://arc.unm.edu/) at arcsrvs@unm.edu or (505) 277-3506.
Credit-hours
This is a three-credit-hour course. Class meets for two 75-minute sessions of direct instruction for fifteen weeks during the Spring 2023 semester. Please plan for a minimum of six hours of out-of-class work (or homework, study, assignment completion, and class preparation) each week. Support: Center for Academic Program Support (CAPS). Many students have found that time management workshops can help them meet their goals (consult the CAPS website under “services”).
Title IX statement
Our classroom and our university should always be spaces of mutual respect, kindness, and support, without fear of discrimination, harassment, or violence. Should you ever need assistance or have concerns about incidents that violate this principle, please access the resources available to you on campus. Please note that, because UNM faculty, TAs, and GAs are considered “responsible employees”, any disclosure of gender discrimination (including sexual harassment, sexual misconduct, and sexual violence) made to a faculty member, TA, or GA must be reported by that faculty member, TA, or GA to the university’s Title IX coordinator. For more information on the campus policy regarding sexual misconduct and reporting, please see: https://policy.unm.edu/
Citizenship and/or Immigration Status
All students are welcome in this class regardless of citizenship, residency, or immigration status. Your professor will respect your privacy if you choose to disclose your status. As for all students in the class, family emergency-related absences are normally excused with reasonable notice to the professor, as noted in the attendance guidelines above. UNM as an institution has made a core commitment to the success of all our students, including members of our undocumented community. The Administration’s welcome is found on our website: http://undocumented.
Land Acknowledgement
Founded in 1889, the University of New Mexico sits on the traditional homelands of the Pueblo of Sandia. The original peoples of New Mexico (Pueblo, Navajo, and Apache), since time immemorial, have deep connections to the land and have made significant contributions to the broader community statewide. We honor the land itself and those who remain stewards of this land throughout the generations and also acknowledge our committed relationship to Indigenous peoples. We gratefully recognize our history.
Respectful and Responsible Learning
We all have shared responsibility for ensuring that learning occurs safely, honestly, and equitably. Submitting material as your own work that has been generated on a website, in a publication, by an artificial intelligence algorithm, by another person, or by breaking the rules of an assignment constitutes academic dishonesty. It is a student code of conduct violation that can lead to a disciplinary procedure. Please ask me for help in finding the resources you need to be successful in this course. I can help you use study resources responsibly and effectively. Off-campus paper writing services, problem-checkers and services, websites, and AIs can be incorrect or misleading. Learning the course material depends on completing and submitting your own work. UNM preserves and protects the integrity of the academic community through multiple policies, including policies on student grievances (Faculty Handbook D175 and D176), academic dishonesty (FH D100), and respectful campus (FH CO9). These are in the Student Pathfinder (https://pathfinder.unm.edu) and the Faculty Handbook (https://handbook.unm.edu). Support: Many students have found that time management workshops or work with peer tutors can help them meet their goals. These and other resources are available through Student Learning Support at the Center for Teaching and Learning.
Connecting to Campus and Finding Support
UNM has many resources and centers to help you thrive, including opportunities to get involved, mental health resources, academic support including tutoring, resource centers for people like you, free food at Lobo Food Pantry, and jobs on campus. Your advisor, staff at the resource centers and Dean of Students, and I can help you find the right opportunities for you.
Support in Receiving Help
Students who ask for help are successful students. I encourage students to be familiar with services and policies that can help them navigate UNM successfully. Many services exist to help you succeed academically, such as peer tutoring at CAPS and http://mentalhealth.unm.edu. There are plenty of ways to find your place and your pack at UNM: see the “student guide” tab on my.unm, students.unm.edu, or ask me for information about the right resource center or person to contact.
Doing the Right Thing
UNM has policies to preserve and protect you and the academic community available in the Student Pathfinder as well as in the Faculty Handbook. These include policies on student grievances D175 (undergraduates) and D176 (graduate and professional students), academic dishonesty (D100), and respectful campus (CO9). Please ask for help in understanding and avoiding plagiarism (passing the work or words of others off as your own work or words) or other forms of academic dishonesty. Doing something dishonest in a class or on an assignment can lead to serious academic consequences. Come talk with me about your concerns or needs for academic flexibility or talk with support staff at one of our student resource centers before you do something that may endanger your career.
Our Classroom
We’re doing this because:
- We want you to be empowered with statistics.
- We believe everyone should get out of this course with awesome skills
- Real-time feedback promotes efficient learning
GAISE Connections
Our six recommendations include the following:
- Emphasize statistical literacy and develop statistical thinking
- Use real data
- Stress conceptual understanding, rather than mere knowledge of procedures
- Foster active learning in the classroom
- Use technology for developing conceptual understanding and analyzing data
- Use assessments to improve and evaluate student learning
Learning without thought is labor lost. What I hear, I forget. What I see, I remember. What I do, I understand. – Confucius
Archive
Passion Driven Statistics (PDS) data
- Install PDS package.
- AddHealthW1 Sampling Design, Codebook, RData.
- AddHealthW4 Sampling Design, Codebook, RData.
- NESARC Sampling Design, Codebook, RData.
- OutlookOnLife Sampling Design, Codebook, RData.
- GapMinder Sampling Design, Codebook, RData.
Step 0
Before our first class (Tue 1/21) please read through the following actions, install the required software on your computer, and complete the brief survey. If you don’t have a computer, there are classroom computers which will be available only when the classroom is open. Video for this process (ignore the “crowdgrader” portion).
- Complete surveys
- a short Opinio pre-survey required for classroom assessment (1/20 – 2/1/2020).
- Install R (Windows or Mac) or upgrade, then RStudio. Videos that may be helpful:
- Install R on Mac (2 min).
- Install R for Windows (3 min).
- Install R and RStudio on Windows (5 min).
- Install R packages,
- Run RStudio
- Run code in R packages.
- Update all packages, RStudio Packages tab, click “update”, click “select all”, and “Install Updates”. Say “Yes” to restart R, but if it asks a second time, say “No”. Say “No” to “install from sources” if it asks.
- Set up your computer
- RStudio disable notebook
- Operating system to be more friendly to programming.
- (Postpone until later: Install LaTeX (for poster at end of the semester).)
Asking smart questions
- “Smart Questions” guide (note “hackers build things, crackers break them”)
- Follow this Rubric when emailing a question:
- Send a new email for each new question. Use “Reply” to continue a conversation on a question (do not start a new email, again).
- Include “ADA2” as the first word of the subject line in new emails (if replying, use reply), with the rest of the subject indicating the assignment and type of problem.
- Begin the email with a short question summary (that is, don’t bury your question in the middle of the third paragraph). Then, begin the detail of your question in the second paragraph.
- When possible, include commented code in the email body — Comments (starting with # symbol) should indicate where the problem is, what the expected behavior is, and what steps are necessary to reproduce the problem.
- Attach your qmd file so that the instructor can reproduce the problem. If attaching code, please include all the files necessary to run your code (data, etc.)
- [Attaching code supersedes this: Code should include a “Minimum representative test case” (http://www.catb.org/esr/faqs/smart-questions.html#code)]
- Assume the best. Your instructors want to help and we will do our best. Do not abuse your helpers even if you feel frustrated.
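A question email that follows the rubric above might carry a code body like the sketch below. The data and the problem are hypothetical, chosen only to show the shape of a small, self-contained test case with comments marking the expected behavior, the actual behavior, and the steps to reproduce.

```r
# ADA2 example question: mean() returns NA (hypothetical problem for illustration)

# Smallest data set that reproduces the problem (made up; no external files needed).
dat <- data.frame(y = c(2.1, NA, 3.4, 5.0))

# Expected behavior: a single number, the mean of the observed values.
# Actual behavior: the result is NA.
# Steps to reproduce: run the two lines below in a fresh R session.
m <- mean(dat$y)
print(m)
```

Because the example builds its own data, a helper can paste it into the console and see the same result without asking for additional files.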
RMarkdown and knitr issues
- R errors, unresolved, and out of time: If you’re saying “An error while knitting keeps me from turning in the assignment…”, then use the code chunk option ```{r, error = TRUE} to ignore the error and continue. This will allow you to turn in partial assignments with errors.
- Unicode compile problems: If you knit to pdf you may get this error: “! Package inputenc Error: Unicode char”. ASCII is the small character set that we use to program in; Unicode is an extended character set that looks pretty (for example “straight quotes” become “curly quotes”) but causes code to break. You get unwanted Unicode when you copy/paste from a pdf or some other source into your code. To fix this, you have to find the Unicode and replace it with its ASCII equivalent. To do this: Ctrl-F to find, search for “[^\x00-\x7F]” (without quotes), select “Regex” for regular expressions, and find the “Next” one. As it finds instances, replace the characters manually until there are no more. These characters will typically be curly quotes or fancy dashes.
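The same search for non-ASCII characters can also be run from the R console. The sketch below is an alternative to the Ctrl-F regex search, not a replacement for it: instead of a regular expression it checks each character's code point with base R's `utf8ToInt()`. The helper name `find_non_ascii` and the example lines are assumptions made for illustration.

```r
# Hypothetical helper: return the line numbers that contain any
# non-ASCII character (code point above 127).
find_non_ascii <- function(lines) {
  has_non_ascii <- vapply(
    lines,
    function(s) any(utf8ToInt(s) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  which(has_non_ascii)
}

# Example input. For a real file, something like
# lines <- readLines("your_file.qmd", encoding = "UTF-8")
# could be used instead (the file name here is a placeholder).
lines <- c(
  "x <- \"straight quotes\"",
  "y <- \u201ccurly quotes\u201d",
  "z <- 1 \u2013 2"
)

# Lines 2 and 3 contain curly quotes and an en dash.
print(find_non_ascii(lines))
```

The reported line numbers tell you where to look; the characters still need to be replaced by hand with their ASCII equivalents.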
3/1/17 – Data resources for poster:
- List of 50+
- kaggle
- drivendata
- 538
- agridat package
- wise data sources
- statsci datasets
- vanderbilt datasets
Citing and using notes, including previous editions
Citing lecture notes: Erhardt EB, Bedrick EJ, and Schrader RM. (2020) Lecture notes for Advanced Data Analysis 2. Retrieved Mar 1, 2020, from statacumen.com/teach/ADA2/notes/ADA2_notes_S20.pdf, 136–144.
- Notes from Spring 2020 using R with tidyverse: ADA2_notes_S20.pdf includes all chapters in one document.
Lecture notes for Advanced Data Analysis 2 (ADA2) Stat 428/528 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at https://statacumen.com/teach/ADA2/notes/ADA2_notes_S20.pdf.
- Notes from Spring 2017 using R: ADA2_notes_S17.pdf
- Notes from Spring 2016 using R: ADA2_notes_S16.pdf
- Notes from Spring 2015 using R: ADA2_notes_S15.pdf
- Notes from Spring 2014 using R: ADA2_notes_S14.pdf
- Notes from Spring 2013 using R: ADA2_notes_S13.pdf
- Notes from Spring 2012 using SAS: ADA2_notes_S12.pdf