ADA2_Pre-flipped

UNM Stat 428/528: Advanced Data Analysis II (ADA2)

The Spring 2015 syllabus is below the table.


Spring 2015 schedule; Time: TR 1230-1345; Location: DSH 120; Stat 428, CRN 25445; Stat 528, CRN 25449
News:
Tentative Timetable
Wk-Date Ch Topic Slides Code Data pts HW sol Data Read HW Review HW Due Plot
01-01/13 01 R, Review Ch 01 R d1 d2 60 HW01 sol dat, FB 01/22 01/27 rivers
01-01/15 fire pie
02-01/20 02 Introduction to Multiple Linear Regression Ch 02 R d1 d2 60 HW02 sol dat, FB 01/29 02/03 lifecycle
02-01/22 03 A Taste of Model Selection for Multiple Linear Regression Ch 03 R d1 85 HW03 sol dat, FB 02/05 02/10 diss
03-01/27 04 Experimental Design: One and Two Factor Designs Ch 04 support
03-01/29 snow
04-02/03 05 Paired Experiments and Randomized Block Designs Ch 05 Coef R d1 d2 d3 d4 140 HW05 sol dat, FB 02/19 02/24 city
04-02/05 music
05-02/10 history
05-02/12 olymp 2
06-02/17 freq
06-02/19 06 Discussion of Observational Studies Ch 06 R d1 tilt 2
07-02/24 07 Analysis of Covariance: Comparing Regression Lines Ch 07 R d1 d2 d3 80 HW07 sol dat, FB 03/03 03/05 drought
07-02/26 08 Polynomial Regression Ch 08 R d1 d2 band 2
08-03/03 09 Response Models with Factors and Predictors Ch 09 R d1 100 HW09 sol (dat = HW05), FB 03/19 03/24
08-03/05 10 Model Selection for Multiple Regression Ch 10 R d1 pleasant
09-03/10 Spring Break best13
09-03/12 Spring Break
10-03/17 11 Logistic Regression Ch 11 R d1 d2 d3 d4 d5 80 HW11 sol dat, FB 03/31 04/02 elem
10-03/19 moon
11-03/24 12 An Introduction to Multivariate Methods Ch 12 R
11-03/26
12-03/31 13 Principal Components Analysis (PCA) Ch 13 R d1 d2 d3 d4 60 HW13 sol dat, FB 04/14 04/16 water
12-04/02 daily
13-04/07 14 Cluster Analysis Ch 14 R d1 d2
13-04/09 15 Multivariate Analysis of Variance (MANOVA) Ch 15 R d1
14-04/14 16 Discriminant Analysis Ch 16 R d1 die-d3
14-04/16 17 Classification Ch 17 R 100 HW17 sol R dat, FB 04/28 04/30 spiral
15-04/21 deadly
15-04/23 18 Data Cleaning Ch 18 R d1 d2 d3 d4 d5 90 HW18 sol dat, FB Had SN 05/07 by 3pm, slid under my office door, Math&Stat (SMLC 312) wcloud images
16-04/28 histomap
16-04/30 Evaluations (click “login” on the left)
17-05/07 Finals Week FB
I recommend printing (two to a page, double-sided) only the upcoming chapter the day before class because future chapters are subject to edits.
Notes from Spring 2014 using R: ADA2_notes.pdf includes all chapters in one document. Lecture notes for Advanced Data Analysis 2 (ADA2) Stat 428/528 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at https://statacumen.com/teach/ADA2/ADA2_notes.pdf.
Citing lecture notes, example: Bedrick EJ, Schrader RM, and Erhardt EB. (2013) Lecture notes for Advanced Data Analysis 2. Retrieved Mar 1, 2013, from statacumen.com/teach/ADA2/ADA2_notes.pdf, 136–144.
Notes from Spring 2013 using R: ADA2_notes_S13.pdf includes all chapters in one document. Lecture notes for Advanced Data Analysis 2 (ADA2) Stat 428/528 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at https://statacumen.com/teach/ADA2/ADA2_notes_S13.pdf.
Notes from Spring 2012 using SAS: ADA2_notes_S12.pdf includes all chapters in one document. Lecture notes for Advanced Data Analysis 2 (ADA2) Stat 428/528 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at https://statacumen.com/teach/ADA2/ADA2_notes_S12.pdf.

Syllabus

Description: A continuation of Stat 427/527 that focuses on methods for analyzing multivariate data and categorical data. Topics include MANOVA, principal components, discriminant analysis, classification, factor analysis, analysis of contingency tables (including log-linear models for multidimensional tables), and logistic regression.
Prerequisite: Stat 427 (ADA1)
Semesters offered: Spring
Lecture: Stat 428/528.001 (CRN 25445 or 25449), TR 9:30–10:45, DSH 120
Email: “Erik B. Erhardt” <erike@stat.unm.edu>; please include “ADA2” in the subject line.
Textbook: Peter Dalgaard, “Introductory Statistics with R”, Second Edition, 2008, ISBN: 978-0-387-79053-4. The book is not required, but it will provide a backup for what you learn in class.
Office hours: SMLC 312, Tue 11:00-12:00, Thu 14:00-15:00
Teaching Assistants: Andisheh Dadashi, andishehATmath.unm.edu, office hours Mon and Wed 12:00-13:00 at the table outside the SMLC 312 door; Xichen Li, jessieliATunm.edu, office hours Fri 13:00-14:00, SMLC 201.
R tutorials: TryR (gentle), Kelly Black, Cookbook for R for helpful examples, visualization tutorials, diagrams.

Assessment

Rubrics guide self-assessment of homework and code. Homework is designed to encourage you to review the material we’ve learned, synthesize new information from the R help pages or the web, and apply (and learn!) your new skills. Expect to spend 4-5 hours a week (outside of class!) to do well, and maybe double that to do outstandingly well. Start working on the homework when it is assigned, not the weekend before it’s due. Homework is due 1 week (or 2 classes, whichever is shorter) after we complete each chapter. Homework score includes an ethos (credibility) multiplier:
[points earned]*1.1 for exceptional work, code, and exposition (rubric top level)
[points earned]*1.0 for acceptable to great work
[points earned]*0.9 for unorganized work or undocumented code (rubric lower levels)
Header for homework assignments should include:
First Last ADA2 Stat 428 (or 528) HW ## MM/DD/YYYY
All R code for the assignment should be included with each problem. Please hand in a physical copy of your homework; a grader will write comments on it and return it to you. An electronic version will be accepted only under exceptional circumstances (almost never). Late assignments will be penalized 20% if handed in (or slid under my office door) by 5pm the following day, and will not be accepted after that. The final grade is the proportion of HW points earned, possibly with a safety cushion built in (such as by reducing the denominator). In S15, the cushion divided the final score by 0.99 for graduate students and by 0.97 for undergraduate students.
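As a worked illustration of how the ethos multiplier and the S15 cushion might combine, here is a sketch under stated assumptions: the multiplier m_i is assumed to apply to each assignment's earned points e_i, p_i is that assignment's possible points, and c is the cushion (0.99 for graduate, 0.97 for undergraduate students). Dividing the final score by c is the same as shrinking the denominator. The 800 weighted points are hypothetical; 855 is the sum of the pts column in the timetable.

```latex
% Assumed: multiplier m_i applied per assignment; c = cushion
\[
\text{final score} = \frac{\sum_i m_i\, e_i}{c \sum_i p_i},
\qquad m_i \in \{0.9,\ 1.0,\ 1.1\},
\qquad c = \begin{cases} 0.99 & \text{graduate} \\ 0.97 & \text{undergraduate} \end{cases}
\]
% Hypothetical example: \sum_i m_i e_i = 800 weighted points, \sum_i p_i = 855 possible points
\[
\frac{800}{855} \approx 0.936, \qquad
\frac{800}{0.99 \times 855} = \frac{800}{846.45} \approx 0.945, \qquad
\frac{800}{0.97 \times 855} = \frac{800}{829.35} \approx 0.965
\]
```

The cushion raises everyone's proportion by the same relative amount, so it never changes the ordering of students' scores.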

Disability statement

If you have a documented disability that will impact your work in this class, please contact me to discuss your needs. You’ll also need to register with the Accessibility Resource Center in 2021 Mesa Vista Hall (building 56), across the courtyard east of the SUB.

Random stuff: The UNM R programming group, organized and taught by Christian Gunning, meets at 12:00pm on Fridays in the PIBBS space in Castetter Hall. UNM has a license for free online access to the definitive books for the Lattice and ggplot2 graphing platforms; note that you must be on campus or logged in through the UNM proxy to access these. R style matters. There is a lot of online help on R, such as at UCLA; try searching for “R [mytopic]” and you’ll get lots of results. ggplot2 plotting cookbook. R reference card by Jonathan Baron. Translate between MATLAB and R. Figure checklist. Choosing the right chart. Raster vs vector graphics. Statistics pre-req refresher from Khan Academy.

Spring 2014 preamble: Did you receive a registration error? Send me an email with the following answers:
1. What registration error did you get (copy/paste is best)?
2. What is your UNM ID?
3. What is your Math/Stat background (that is, do you have the pre-reqs)?
If you are waitlisted, I will override you into the course. Don’t worry.
Get started before class: Step 0: Set up R with RStudio: (1) download R for Windows or Mac, (2) install RStudio, and (3) install a package we’ll use with the following R command: install.packages("ggplot2").

Spring 2014 schedule: Time: TR 0930-1045; Location: Hibben 105; Stat 428, CRN 25445; Stat 528, CRN 25449

Installing GGally on Mac OS X:
1. Download http://cran.r-project.org/src/contrib/GGally_0.5.0.tar.gz into your Downloads folder.
2. Download the X11 program from http://xquartz.macosforge.org/landing/ and install it.
3. Run X11; you’ll be at a command prompt where you can navigate your hard drive.
4. Type “pwd” (don’t type the quotes); it will probably tell you that you’re in “/Users/[your user name]”.
5. Change to the Downloads folder with “cd Downloads”.
6. Type “pwd”; it should tell you that you’re in “/Users/[your user name]/Downloads”.
7. List the files with “ls” and verify that the file GGally_0.5.0.tar.gz is there.
8. Install GGally with “R CMD INSTALL GGally_0.5.0.tar.gz”; it will print some installation messages, ending with “DONE (GGally)”.
9. Restart RStudio and it will work (fingers crossed)!


Table of selected statistical methods

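The data and design determine which method to use (sources: original, UCLA). Here’s a table of methods with the applicable ADA semester and chapter: in the ADA-Ch column, 2-11 means ADA2, Chapter 11, and “not covered” means the method is not covered in ADA.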
Number of Dependent Variables Number of Independent Variables Type of Dependent Variable(s) Type of Independent Variable(s) Measure Test(s) ADA-Ch
1 0 (1 population) continuous normal not applicable (none) mean one-sample t-test 1-02
continuous non-normal median one-sample median test 1-06
categorical proportions chi-square goodness-of-fit test, binomial test 1-07
1 (2 independent populations) normal 2 categories mean 2 independent sample t-test 1-03
non-normal medians Mann-Whitney, Wilcoxon rank-sum test 1-06
categorical proportions chi-square test, Fisher’s exact test 1-07
0 (1 population measured twice) or 1 (2 matched populations) normal not applicable/ categorical means paired t-test 1-02
non-normal medians Wilcoxon signed-rank test 1-06
categorical proportions McNemar test, chi-square test 1-07
1 (3 or more populations) normal categorical means one-way ANOVA 1-05
non-normal medians Kruskal-Wallis test 1-06
categorical proportions chi-square test 1-07
2 or more (e.g., 2-way ANOVA) normal categorical means Factorial ANOVA 2-05
non-normal medians Friedman test not covered
categorical proportions log-linear, logistic regression 2-11
0 (1 population measured 3 or more times) normal not applicable means Repeated measures ANOVA not covered
1 normal continuous correlation, simple linear regression 1-08
non-normal non-parametric correlation 1-08
categorical categorical or continuous logistic regression 2-11
continuous discriminant analysis 2-16
2 or more normal continuous multiple linear regression 2-02
non-normal
categorical logistic regression 2-11
normal mixed categorical and continuous Analysis of Covariance, General Linear Models (regression) 2-09
non-normal
categorical logistic regression 2-11
2 2 or more normal categorical MANOVA 2-15
2 or more 2 or more normal continuous multivariate multiple linear regression not covered
2 sets of 2 or more 0 normal not applicable canonical correlation not covered
2 or more 0 normal not applicable factor analysis not covered
0 or more mixed categorical and continuous principal component analysis (w/multiple regression) 2-13
categorical cluster analysis 2-14
discriminant analysis 2-16
classification 2-17

Acumen in Statistics