UNM Stat 428/528: Advanced Data Analysis II (ADA2)
Goal
Learn to produce beautiful (markdown) and reproducible (knitr) reports with informative plots (ggplot2) and tables (kable) by writing code (R, tidyverse, RStudio) to answer questions using fundamental statistical methods (multiple regression, analysis of covariance, logistic regression, and multivariate methods), which you’ll be proud to present (poster).
News
Use the new COVID-19 Timetable near the top of this page for the rest of the semester’s assignments.
COVID-19 Timetable
The rest of the semester will follow this schedule. Please ignore the original timetable further down. Changes:
- No poster project or presentation. No final project.
- Remove MANOVA assignment (24), but keep reading and quiz content.
- Quizzes still due Tuesday, but by midnight. (Extra time the first week, until Thursday.)
- Spread all assignments out, one assignment per week (instead of two). Each assignment “starts” on Tuesday with a due date of the following Monday at midnight.
- Instructor and TA assistance will be via Zoom remote computer conferencing (more details below timetable).
- Video COVID-19 introduction. A great agent-based model of COVID-19 disease spread.
Date | Cl | Topic | Reading, Video, Quiz | In-class Worksheet, Data | Homework |
---|---|---|---|---|---|
03/24 | 19 | 11 Logistic Regression | read: Ch 11 video: 11-1 11-2 11-3 11-4 quiz: 10 Due: Thu 3/26 11:59PM | In-class: Rmd html dat video Due: Mon 3/30 11:59PM | |
03/26 | 20 | ||||
03/31 | 21 | | | | HW: 20 Logistic Regression Rmd html dat video Due: Mon 4/6 11:59PM |
04/02 | 22 | ||||
04/07 | 23 | 12 An Introduction to Multivariate Methods | read: Ch 12-13 video: 12 13-1 13-2 13-3 quiz: 11 (2 parts) Due: Tue 4/7 11:59PM | In-class: Rmd html dat video Due: Mon 4/13 11:59PM | |
04/09 | 24 | ||||
04/14 | 25 | 13 Principal Components Analysis (PCA) | | | HW: 22 PCA Rmd html dat video Due: Mon 4/20 11:59PM |
04/16 | 26 | ||||
04/21 | 27 | 14 Cluster Analysis | read: Ch 14-15 video: 14-1 14-2 14-3 15 quiz: 12 (2 parts) Due: Tue 4/21 11:59PM | In-class: Clustering Rmd html dat video Due: Mon 4/27 11:59PM | |
04/23 | 28 | ||||
04/28 | 29 | 16 Discriminant Analysis 17 Classification | read: Ch 16-17 video: 16-1 16-2 17-1 17-2 17-3 quiz: 13 (2 parts) Due: Tue 4/28 11:59PM | In-class: Discriminant analysis for classification Rmd html dat video Due: Mon 5/4 11:59PM | |
04/30 | 30 | ||||
05/05 | 31 | 13+11+17 PCA and logistic regression classification | | | HW: 26+22+28 PCA and logistic Classification Rmd html dat video Due: Mon 5/11 11:59PM |
05/07 | 32 | ||||
05/12 | FINALS WEEK | (no final) | Surveys Due — * Learning Studio * EvalKit in Learn (log in, then right side under “EvaluationKIT”) | (no poster) |
COVID-19 Instructor support
Instructors
- Erik Erhardt <erike@stat.unm.edu>, he/him, Zoom
- Leah Puglisi <lhpuglisi@unm.edu>, she/her, Zoom
- Ola Anifowoshe <oanifowoshe@unm.edu>, he/him, Zoom
- Mohammad Ahmadi <mahmadi@unm.edu>, he/him, Zoom
Procedure for online meetings via Zoom
- Click on the Zoom link (within email: “ADA2 Zoom personal meeting rooms”) to connect to the instructor’s Personal Meeting Room.
- If prompted, download and install the Zoom client for your computer and let it run.
- Be prepared to share your screen with the instructor, either just your RStudio window or your desktop.
- If someone else is already in a meeting with the instructor, you’ll be put on hold (into the “waiting room”) and helped in the order that you called in.
Times
- Mon: 12-3p Ola, 1-3p Leah, 3-5p Erik
- Tue: 10a-12p Leah, 12-2p Mohammad, 2-4p Erik
- Wed: 9a-12p Ola, 1-3p Leah, 2-5p Mohammad
- Thu: 10a-12p Mohammad, 10a-12p Leah, 12-2p Ola, 2-4p Erik
- Fri: 10a-12p Ola, 10a-12p Leah, 12-2p Erik, 2-5p Mohammad
- Sat: NA
- Sun: NA
Course content
Weekly schedule (also see Assessment below)
- Pre-class (pre-Tuesday): Reading, Video, Quiz (due before Tue class — solutions become available Tue 3:30 pm, after quiz is due)
- In-class (Tue): Worksheet started in class Tuesday submitted to UNM Learn by Wed 11:59 pm.
- In-class (Thu): Homework started in class Thursday submitted to UNM Learn by the next Thu 3:30 pm.
Course notes, code, data, and video lectures
Notes from Spring 2020: ADA2_notes_S20.pdf includes all chapters in one document.
Lecture notes for Advanced Data Analysis 2 (ADA2) Stat 428/528 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at https://statacumen.com/teach/ADA2/notes/ADA2_notes_S20.pdf.
Ch | Chapter Title | Notes | R code | Datasets | Video lectures playlist |
---|---|---|---|---|---|
01 | R statistical software and review | | R | turkey.csv, rocket.dat | (Videos based on S16 notes) 01-1, 01-2 |
02 | Introduction to Multiple Linear Regression | | R | indian.dat, gce.dat | 02-1, 02-2 |
03 | A Taste of Model Selection for Multiple Regression | | R | ratliver.csv | 03-1, 03-2 |
04 | One Factor Designs and Extensions | | R | none | 04 |
05 | Paired Experiments and Randomized Block Experiments | | R | battery.dat, beetles.dat, itch.csv, ratinsulin.dat | 05-0 05-1 05-2 05-3 05-4 05-5 05-6 05-7 05-8 05-9 |
06 | A Short Discussion of Observational Studies | | R | sat.dat | 06 |
07 | Analysis of Covariance: Comparing Regression Lines | | R | tools.dat, toolsfake.dat, twins.dat | 07-1 07-2 07-3 HW helper video |
08 | Polynomial Regression | | R | cloudpoint.dat, mooney.dat | 08-1 08-2 |
09 | Discussion of Response Models with Factors and Predictors | | R | faculty.dat | 09-1 09-2 09-3 |
10 | Automated Model Selection for Multiple Regression | | R | oxygen.dat | 10-1 10-2 10-3 |
11 | Logistic Regression | | R | beetles.dat, leuk.dat, menarche.csv, shuttle.csv, trauma.dat | 11-1 11-2 11-3 11-4 |
12 | An Introduction to Multivariate Methods | | R | none | 12 |
13 | Principal Component Analysis | | R | bgs.dat, shells.dat, sparrows.dat, temperature.dat | 13-1 13-2 13-3 |
14 | Cluster Analysis | | R | birthdeath.dat, teeth.dat | 14-1 14-2 14-3 |
15 | Multivariate Analysis of Variance | | R | shells_mf.dat | 15 |
16 | Discriminant Analysis | | R | mower.dat | 16-1 16-2 |
17 | Classification | | R | business.dat | 17-1 17-2 17-3 |
18 | Data Cleaning | | R | conversions.txt, dalton.txt, dirty_iris.csv, edits.txt, people.txt, unnamed.txt | |
(I reserve the right to continue to improve the materials throughout the semester.)
Timetable (OLD, do not use after 3/22)
Date | Cl | Topic | Reading, Video, Quiz | In-class Worksheet, Data | Homework |
---|---|---|---|---|---|
01/10 | 00 | Install software | See Step 0 video: 00 | ||
01/21 | 01 | 01 R, Review | read: Ch 01 video: 01-1, 01-2 | ||
01/23 | 02 | In-class quiz | In-class: 02 R Review Rmd html dat Videos: 1, 2, 3 | No HW 02 | |
01/28 | 03 | 02 Introduction to Multiple Linear Regression | read: Ch 02 video: 02-1, 02-2 quiz: 02 | In-class: Rmd html dat Submit pdf with solutions by Wed 5pm. | |
01/30 | 04 | | | | HW: 04 Mult LR Rmd html dat Submit your pdf to UNM Learn. |
02/04 | 05 | 03 A Taste of Model Selection for Multiple Linear Regression | read: Ch 03, 04 video: 03-1, 03-2, 04 quiz: 03 (2 parts) | In-class: Rmd html dat | |
02/06 | 06 | 04 Experimental Design: One and Two Factor Designs | | | HW: 06 Taste Model Sel Rmd html dat |
02/11 | 07 | 05 Paired Experiments and Randomized Block Designs | read: Ch 05 (start – 5.2) video: 05-0 05-1 05-2 05-3 05-4 05-5 quiz: 04 | In-class: Rmd html | |
02/13 | 08 | | | | HW: 08 Experiments 1 Rmd html |
02/18 | 09 | | read: Ch 05 (5.3 – end) video: 05-6 05-7 05-8 05-9 quiz: 05 | In-class: Rmd html dat | |
02/20 | 10 | | | | HW: 10 Experiments 2 Rmd html dat |
02/25 | 11 | 06 Discussion of Observational Studies | read: Ch 06-07 video: 06 07-1 07-2 07-3 quiz: 06 (2 parts) | In-class: html turn in paper version Erik will bring print paper worksheets. | |
02/27 | 12 | 07 Analysis of Covariance: Comparing Regression Lines | | | HW: 12 ANCOVA 1 Rmd html dat Discuss Wald test matrix specification. |
03/03 | 13 | 08 Polynomial Regression | read: Ch 08-09 video: 08-1 08-2 09-1 09-2 09-3 quiz: 07 (2 parts) | In-class: Rmd html dat | |
03/05 | 14 | 09 Response Models with Factors and Predictors | | | HW: 14 ANCOVA 2 Rmd html dat Helper video |
03/10 | 15 | 10 Model Selection for Multiple Regression | read: Ch 10 video: 10-1 10-2 10-3 quiz: 08 | HW 07 Continued in class | |
03/12 | 16 | HW 14 Continued in class, due Friday by midnight. | |||
03/17 | 17 | Spring Break | |||
03/19 | 18 | Spring Break | |||
03/24 | 19 | 11 Logistic Regression | read: Ch 11 video: 11-1 11-2 11-3 11-4 quiz: 10 | In-class: Rmd html dat | Poster: Poster Planning Rmd html Due Tuesday. Choose/define poster project requiring a method from class: ANCOVA, Logistic multiple regression, PCA, etc. |
03/26 | 20 | | | | HW: 20 Logistic Regression Rmd html dat |
03/31 | 21 | 12 An Introduction to Multivariate Methods | read: Ch 12-13 video: 12 13-1 13-2 13-3 quiz: 11 (2 parts) | In-class: Rmd html dat | |
04/02 | 22 | 13 Principal Components Analysis (PCA) | | | HW: 22 PCA Rmd html dat |
04/07 | 23 | 14 Cluster Analysis | read: Ch 14-15 video: 14-1 14-2 14-3 15 quiz: 12 (2 parts) | In-class: Clustering Rmd html dat | |
04/09 | 24 | 15 Multivariate Analysis of Variance (MANOVA) | | | HW: 24 MANOVA Rmd html dat |
04/14 | 25 | 16 Discriminant Analysis 17 Classification | read: Ch 16-17 video: 16-1 16-2 17-1 17-2 17-3 quiz: 13 (2 parts) | In-class: Discriminant analysis for classification Rmd html dat | |
04/16 | 26 | 13+11+17 PCA and logistic regression classification | | | HW: 26+22+28 PCA and logistic Classification Rmd html dat |
04/21 | 27 | Posters begin | | | HW: Poster document 1 of 2: Analysis, Due Friday Rmd html |
04/23 | 28 | ||||
04/28 | 29 | | | | HW: Poster document 2 of 2: Intro/Discuss/Bib, Due Friday Rmd html |
04/30 | 30 | ||||
05/05 | 31 | Survey Poster finalize | Poster template pdf, Rnw, sty, bib, logo Example poster pdf, Rnw Transition from Markdown to LaTeX Video for poster transition | $10 poster printing Minuteman Press, Eubank 1631 Eubank Boulevard NE, Suite D, Albuquerque, NM 87112 (505)881-0164 Open Mon-Fri 8a-5p Submit poster to website Project name: “UNM ADA2 class poster” Due Date: try to submit a few days early so the printer isn’t overwhelmed by requests Additional Details: “3’x4′ portrait poster on bond paper” File #1: Name the poster pdf with your name in the filename, such as “FirstLast_ADA1_poster.pdf”. Arrange to pick up the poster. | |
05/07 | 32 | POSTERS | Poster session in SMLC lobby 3:30-6:30pm | Poster: Submit poster pdf to UNM Learn Due Fri 5pm Poster reviewing rubric | |
05/12 | FINALS WEEK | (no final) | Surveys Due — submit receipt or confirmation page to UNM Learn * Learning Studio * EvalKit in Learn |
Syllabus
Description: A continuation of 427/527 that focuses on methods for analyzing multivariate data and categorical data. Topics include MANOVA, principal components, discriminant analysis, classification, factor analysis, analysis of contingency tables including log-linear models for multidimensional tables and logistic regression.
Prerequisite: Stat 427 (ADA1)
Semesters offered: Spring
Lecture: Stat 428/528.001 (CRN 33933 or 33935), TR 1530-1645, CTLB 300 Video
Email: “Erik B. Erhardt” <erike@stat.unm.edu>, please include “ADA2” in the subject line
Textbook: Peter Dalgaard, “Introductory Statistics with R”, Second Edition, 2008, ISBN: 978-0-387-79053-4. The book is not required, but it will provide a backup for what you learn in class.
Laptops running R: I encourage you to bring a laptop to class each day so you can work on the exercises in class. If you don’t have one, no problem, there are laptops in class and teamwork is encouraged; sit next to someone friendly and discuss your work.
Classroom computers: Please reboot classroom laptops at the end of class period by request of the IT staff.
Saving data: If you’re using classroom computers, use flash drives or UNM’s OneDrive (available in LoboMail) for saving files. I recommend using a simple but systematic folder structure: one main folder called Stat428_ADA2 with all of your assignments (keep the original filenames) with subfolders for lecture notes and your poster.
Instructors
Please include “ADA2” in the subject line of all emails.
Professor
- Erik Erhardt <erike@stat.unm.edu>, he/him, SMLC 312
Teaching Assistants
- Leah Puglisi <lhpuglisi@unm.edu>, she/her, SMLC 319
- Ola Anifowoshe <oanifowoshe@unm.edu>, he/him, SMLC 208
- Mohammad Ahmadi <mahmadi@unm.edu>, he/him, SMLC 323
Additional Assistants, Peer Mentors, SEP
- Kelli Kasper, she/her
- Grace Mayer, she/her
Office hours
- Mon: 14:00-16:00 Leah
- Tue: 12:30-13:30 Leah, 13:30-15:00 Erik
- Wed: 9:00-11:00 Ola, 14:00-16:00 Mohammad
- Thu: 12:30-13:30 Ola, 13:30-15:00 Erik
- Fri: 14:00-16:00 Mohammad
- We are also all available by appointment by email if these many hours do not work for you.
Student learning outcomes
Similar to ADA1, but at a higher level.
Assessment
- Quizzes will be due each Tuesday before class. Purpose: to assess reading and video comprehension and ensure you’re prepared to actively participate in class activities with a minimal lecture. (About 12, 20% of the final grade, the lowest few are dropped.) Most weeks plan for 1-3 hours of reading and video and a 30-60 minute quiz.
- Viewing quiz solutions after the due date in UNM Learn is not intuitive. Click on the “Begin” button (this is the non-intuitive part since you are not actually beginning the quiz), then click “View All Attempts” to see the scores. Finally, click “Calculated Grade” to see the feedback for each question of the quiz.
- In-class assignments are due the following day (Wed) by 5 pm, submitted to UNM Learn. Purpose: to struggle and find success in class with the concepts and skills. (About 12, includes class participation, 20% of the final grade, the lowest few are dropped.) Plan to start and finish in class, sometimes 1-2 hours beyond class.
- Homework (HW) assignments are assigned each Thursday and due the following Thursday, submitted to UNM Learn. Purpose: to apply concepts and skills to your class poster project. (About 12, 40% of the final grade, the lowest few are dropped.) Most weeks plan on 2-12 hours per assignment.
- Poster will be developed and completed in the last weeks of the semester, and the last week we’ll have poster presentations. Purpose: to have an overarching set of questions to answer using methods learned in the course, with a deliverable you can be proud of! (16% of the final grade in total for 1 poster and presentation: 2% preparation, 10% poster, 2% presentation, and 2% evaluations of others.) In the last couple of weeks, assembling this poster may take 3-5 hours, using a template provided to you.
- Course surveys are to collect information to help facilitate the class or to encourage participation in course evaluations. Purpose: to participate in national project-based learning projects and improve the course. (About 2, 4% of final grade [and a simple way to go from B+ to A].)
Collaboration and citation
For homework, I encourage you to work together. Please discuss the data, code, and problems with one another, but do your own exploration and write-up. We expect everyone to hand in substantially different homework, and we will enforce this under the honor code. The small benefit you might get from plagiarism is not worth the severe penalty (of lost trust, being reported to the dean, no points for the assignment, etc.).
As in life, please use any resources available to you. Projects and some homework will explicitly encourage you to use resources on the internet, but showing extra initiative will always be appreciated. You may find R programming tough at first, so feel free to discuss your problems with other classmates or meet with or email questions to the TAs or me. I encourage you to use the ideas of others, but make them your own, giving credit. For projects, have a formal bibliography; for homework, cite casually; and for code, simply copy the URL into your code as a comment (which is doubly helpful to you for finding the resource again).
Statements
Disability statement
If you have a documented disability that will impact your work in this class, please contact me to discuss your needs. You’ll also need to register with the Accessibility Resource Center in 2021 Mesa Vista Hall (building 56) across the courtyard east from the SUB.
Title IX statement
In an effort to meet obligations under Title IX, UNM faculty, Teaching Assistants, and Graduate Assistants are considered “responsible employees” by the Department of Education (see pg 15). This designation requires that any report of gender discrimination, which includes sexual harassment, sexual misconduct, and sexual violence, made to a faculty member, TA, or GA must be reported to the Title IX Coordinator at the Office of Equal Opportunity. For more information, see the campus policy regarding sexual misconduct.
UNM Indigenous Peoples Land and Territory Acknowledgment
I would like to acknowledge the original peoples of this land. The Sandia Pueblo (other pueblo communities) and the Navajo nation have ties and stories on this land and within the broader community that are connected within New Mexico. I am grateful to be able to work here in relationship and strengthen community on this territory.
Our Classroom
We’re doing this because:
- We want you to be empowered with statistics.
- We believe everyone should get out of this course with awesome skills
- Real-time feedback promotes efficient learning
GAISE Connections
Our six recommendations include the following:
- Emphasize statistical literacy and develop statistical thinking
- Use real data
- Stress conceptual understanding, rather than mere knowledge of procedures
- Foster active learning in the classroom
- Use technology for developing conceptual understanding and analyzing data
- Use assessments to improve and evaluate student learning
Learning without thought is labor lost. What I hear, I forget. What I see, I remember. What I do, I understand. – Confucius
Archive
Passion Driven Statistics (PDS) data
Install PDS package.
- AddHealthW1: Sampling Design, Codebook, RData.
- AddHealthW4: Sampling Design, Codebook, RData.
- NESARC: Sampling Design, Codebook, RData.
- OutlookOnLife: Sampling Design, Codebook, RData.
- GapMinder: Sampling Design, Codebook, RData.
Old news
Step 0
Before our first class (Tue 1/21), please read through the following actions, install the required software on your computer, and complete the brief survey. If you don’t have a computer, there are classroom computers, which will be available only when the classroom is open. Video for this process (ignore the “crowdgrader” portion).
- Complete surveys
- a short Opinio pre-survey required for classroom assessment (1/20 – 2/1/2020).
- Install R (Windows or Mac) or upgrade, then RStudio. Videos that may be helpful:
- Install R on Mac (2 min).
- Install R for Windows (3 min).
- Install R and RStudio on Windows (5 min).
- Install R packages,
- Run RStudio
- Run code in R packages.
- Update all packages: in the RStudio Packages tab, click “update”, click “select all”, and then “Install Updates”. Say “Yes” to restart R, but if it asks a second time, say “No”. Say “No” to “install from sources” if it asks.
- Set up your computer
- RStudio disable notebook
- Operating system to be more friendly to programming.
- (Postpone until later: Install LaTeX (for poster at end of the semester).)
RMarkdown and knitr issues
R errors, unresolved, and out of time
If you’re saying: “An error while knitting keeps me from turning in the assignment…”, then use the code chunk option `error = TRUE`, that is, start the chunk with `{r, error = TRUE}`, to ignore the error and continue. This will allow you to turn in partial assignments with errors.
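If several chunks fail, the same option can be set once for the whole document instead of in each chunk header. The following is a minimal sketch, assuming the line is placed in the first (setup) chunk of your .Rmd file; it relies on knitr’s `opts_chunk$set()`, which changes the default options for the chunks that come after it.

```r
# Assumption: this line sits in the first ("setup") chunk of the .Rmd file.
# knitr::opts_chunk$set() changes the default options for all later chunks,
# so each of them behaves as if its header included error = TRUE.
knitr::opts_chunk$set(error = TRUE)
```

With this default in place, an error in a later chunk is printed in the knitted report and knitting continues, so read through the output for error messages before you submit.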
Pre-course to-dos
Did you receive a registration error for Fall 2019? Send me an email with the following answers:
1. What registration error did you get (copy/paste is best)?
2. What is your UNM ID?
3. What is your Math/Stat background (that is, do you have the pre-requisites)?
If you are waitlisted, as long as there are seats available I will override you into the course. Don’t worry.
3/1/17 – Data resources for poster: List of 50+, kaggle, drivendata, 538, agridat package, wise data sources, statsci datasets, vanderbilt datasets