The current course is at ADA1; this page is from a previous year.
UNM Stat 427/527: Advanced Data Analysis I (ADA1)
This page is a reminder for what the course looked like before “flipping” it.
Fall 2014 schedule; Time: TR 12:30–13:45; Location: Hibben 105; Stat 427.001, CRN; Stat 527.001, CRN
Did you receive a registration error for Fall 20xx? Send me an email with the following answers:
1. What registration error did you get (copy/paste is best)?
2. What is your UNM ID?
3. What is your Math/Stat background (that is, do you have the prereqs)?
If you are waitlisted, I will override you into the course. Don’t worry.
Before the first day:
Step 0a: Set up R and RStudio
(1) Download R for Windows or Mac, (2) install RStudio, and (3) install a package we'll use with the following R command:
install.packages("ggplot2")
R style matters. There is a lot of online help on R, such as at UCLA. Usually try searching for “R [mytopic]” and you’ll get lots of results.
Step 0c: A few more to come …
News: (I reserve the right to continue to improve the notes.)
Tentative Timetable for Fall 2014
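| Wk | Date | Ch | Topic | Slides, Code, Data | pts | HW, sol, Data | Read ISWR | HW Due | Plot |
| 01 | 08/19 | 00 | Introduction to R, RStudio, and ggplot | Ch 00, R | 10 | HW00, sol, data? | Step 0 above | 08/21 | day |
| 01 | 08/21 | | (Find a HW/R buddy) | | | | 1.2, 1.3 | | Minard |
| 02 | 08/26 | | | | | | Ch 2 | | pac, pi |
| 02 | 08/28 | 01 | Summarizing and Displaying Data | Ch 01, R | 60 | HW01, sol, d1, d2, FB | | 09/11 | crash |
| 03 | 09/02 | | | | | | Ch 4 | | Nobel |
| 03 | 09/04 | | | | | | | | Space |
| 04 | 09/09 | 02 | Estimation in One-Sample Problems | Ch 02, R | 120 | HW02, sol, d1, d2, d3, FB | | 09/25 | 9/11 |
| 04 | 09/11 | | N: Uncert | | | | 5.1 | | baby |
| 05 | 09/16 | | | | | | | | fx23456 |
| 05 | 09/18 | 03 | Two-Sample Inferences | Ch 03, R | | includes Ch 4 | 5.3 | | ebola null |
| 06 | 09/23 | | | | | | | | rad |
| 06 | 09/25 | 04 | Checking Assumptions | Ch 04, R | 130 | HW03, sol, d1, d2, d3, FB | | 10/07 | boyfr |
| 07 | 09/30 | | | | | | | | sig |
| 07 | 10/02 | 05 | One-way ANOVA | Ch 05, R, CHDS dat, desc | 80 | HW05, sol, FB | 7.1 | 10/21 | worst, 2 |
| 08 | 10/07 | | | | | | | | Obudg, 2 |
| 08 | 10/09 | | Fall Break | | | | | | |
| 09 | 10/14 | | Cancelled (ACASA) | | | | | | bball |
| 09 | 10/16 | 06 | Nonparametric Methods (Midterm Review) | Ch 06, R | 175 | HW06, sol, d1, d2, d3, d4, FB | 5.2, 5.5, 5.7, 7.2, 7.4 | 11/04 | bball2 |
| 10 | 10/21 | | | | | | | | |
| 10 | 10/23 | | | | | | | | feel, 2 |
| 11 | 10/28 | | Midterm, Chs 1–5. Bring: UNM ID, pen(cil), and 4×6″ handwritten “help” card | | | sol, FB | | | vote, 2 |
| 11 | 10/30 | 07 | Categorical Data Analysis | Ch 07, R | 105 | HW07, sol, FB | Ch 8 | 11/20 | choc p |
| 12 | 11/04 | | | | | | | | cause grid |
| 12 | 11/06 | | | | | | | | work |
| 13 | 11/11 | | | | | | | | occupy |
| 13 | 11/13 | 08 | Correlation and Regression | Ch 08, R | 80 | HW08, sol, d1, FB | Ch 6 | 12/04 | food |
| 14 | 11/18 | | | | | | | | terr |
| 14 | 11/20 | | | | | | | | extrap, 2 |
| 15 | 11/25 | | | | | | | | roulette |
| 15 | 11/27 | | Thanksgiving break | | | | | | sodapop |
| 16 | 12/02 | 09 | Bootstrap | Ch 09, R | | | | | text |
| 16 | 12/04 | | Cancelled (travel) | wordcloud R | | | | | insur |
| 17 | 12/09 | | Finals week (no final) | | | FB | | | income |
| | | 10 | Power and Sample size | Ch 10, R | | | | | |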
R functions written for these notes appearing in other chapters.
Statistical consulting and collaboration slides
Notes from Fall 2014 using R: ADA1_notes_F14.pdf includes all chapters in one document.
Lecture notes for Advanced Data Analysis 1 (ADA1) Stat 427/527 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at http://statacumen.com/teach/ADA1/ADA1_notes_F14.pdf.
Notes from Fall 2013 using R: ADA1_notes_F13.pdf includes all chapters in one document.
Lecture notes for Advanced Data Analysis 1 (ADA1) Stat 427/527 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at http://statacumen.com/teach/ADA1/ADA1_notes_F13.pdf.
Notes from Fall 2012 using R: ADA1_notes_F12.pdf includes all chapters in one document.
Lecture notes for Advanced Data Analysis 1 (ADA1) Stat 427/527 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at http://statacumen.com/teach/ADA1/ADA1_notes_F12.pdf.
Notes from Fall 2011 using Minitab: ADA1_notes_F11.pdf includes all chapters in one document.
Lecture notes for Advanced Data Analysis 1 (ADA1) Stat 427/527 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at http://statacumen.com/teach/ADA1/ADA1_notes_F11.pdf.
Syllabus
Description: Statistical tools for scientific research, including parametric and nonparametric methods for ANOVA and group comparisons, simple linear and multiple linear regression and basic ideas of experimental design and analysis. Emphasis placed on the use of statistical packages such as R. Course cannot be counted in the hours needed for graduate degrees in Mathematics and Statistics.
Prerequisite: Stat 145 (or other intro stats course)
Semesters offered: Fall
Lecture: Stat 427/527.001, TR 12:30–13:45, Hibben 105
Office hours: Tue 11:00–12:00, Thu 15:30–16:30, and by appointment in SMLC 312
email: “Erik B. Erhardt” <erike@stat.unm.edu>, please include “ADA1” in subject line
Textbook: Peter Dalgaard, “Introductory Statistics with R”, Second Edition, 2008, ISBN: 9780387790534. The book is not required, but it will provide a backup for what you learn in class.
i>Clickers: Yes, we're going to use clickers! You don't need to buy a new one: you can get a used one, or you can share with someone who isn't also in our class. Please bring the same one to class each day. Sorry, web clickers are not an option, since they won't meet our needs (using the web clicker system also carries an expense that the simple i>Clicker system does not).
Laptops running R: I encourage you to bring a laptop to class each day so you can try the R programming exercises in class. If you don't have one, no problem: teamwork is encouraged, so sit next to someone friendly who likes to share.
Teaching Assistant and Grader
Stat grad student TAs:
Zhanna Galochkina <zhanna
Miao (Maggie) Yu <miaoyu
Student learning outcomes
At the end of the course, you will be able to: (student results: R, all years, 2014, 2013, 2012)
General outcomes:
1. Organize knowledge in graphs, tables, and code to support concise, comprehensible, and scientifically defensible written interpretations to produce knowledge.
2. Distinguish a testable scientific hypothesis or data-supported interpretation from an opinion.
3. Understand from a data story the goals of the study and apply the correct statistical procedure.
4. Explain the scientific aspects of a problem to nonscientists in a fashion that enhances understanding and decision making.
Topical outcomes:
5. Define parameters of interest and hypotheses in words and notation.
6. Summarize data visually, numerically, and descriptively and interpret the observed characteristics. Calculate and interpret numerical summaries such as mean, variance, five-number summary, confidence intervals, and p-values, and create visual summaries such as bar plots, scatter plots, and histograms. (Never pie charts!)
7. Distinguish between statistical significance and scientific relevance.
8. Use statistical software, such as R, to read and manage data, create informative plots, report numerical summaries, and apply statistical models, following recommended programming practice including abstraction and documentation.
9. Understand the differences and limitations of controlled experiments and observational studies. Design experiments to infer causal treatment effects. Analyze observational data to infer associations between measured variables.
10. Identify and explain the statistical methods, assumptions, and limitations used in reported studies in scientific literature or popular media.
11. Evaluate and criticize published studies, the work of peers, and your own work and assess what was done well, what could be done better, and examine whether their conclusions are supported using statistical principles.
12. Make evidence-based decisions by constructing and deciding between testable hypotheses using appropriate data and methods.
13. Discover relationships and make predictions through model development and selection.
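As a rough illustration of outcome 6, the numerical and visual summaries it names can be produced with base R functions. This is a minimal sketch rather than course material: it assumes the built-in `faithful` data set, and the null value `mu = 3.5` is chosen only for illustration.

```r
# Numerical and visual summaries in base R, using the built-in faithful data
x <- faithful$eruptions                # Old Faithful eruption durations (minutes)

mean(x)                                # sample mean
var(x)                                 # sample variance
fivenum(x)                             # Tukey's five-number summary
tt <- t.test(x, mu = 3.5)              # one-sample t procedures; H0: mu = 3.5 is illustrative
tt$conf.int                            # 95% confidence interval for the mean
tt$p.value                             # p-value for H0: mu = 3.5

hist(x, main = "Eruption duration", xlab = "Minutes")           # histogram
plot(faithful$waiting, x,
     xlab = "Waiting time (min)", ylab = "Eruption (min)")       # scatter plot
barplot(table(x > 3), names.arg = c("<= 3 min", "> 3 min"))     # bar plot of a derived category
```

In a non-interactive run (for example with `Rscript`), the plots are written to a file instead of a window.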
Meeting the learning outcomes
You will acquire new information in this class, but the emphasis is comprehending, integrating, and applying information. Rote factual memorization is the lowest form of learning. Effective learning takes place by explaining, integrating, applying, and analyzing facts, hypotheses, and theories.
Learning in this class occurs by:
- Doing – completing exercises that require analysis of data to answer questions and test hypotheses, or researching answers to reading assignments.
- Discussion – interacting with classmates to assemble and synthesize information by drawing on the collective skills and knowledge base of the group.
- Listening, acting, and reflecting – class activities provide insights into information not available in the readings and include review of difficult material to aid comprehension. Note taking permits later reflection on lecture content. Listening to the professor lecture is, however, the least effective learning tool, so plan on coming to every class prepared to participate in active and reflective learning opportunities.
Assessment
Rubrics guide assessment (and self-assessment) of homework, code, projects, exams, and presentations.
Homework is due 1 week (or 2 classes, whichever is shorter) after we complete each chapter. Extra points may be given for exceptional work based on the rubrics for homework and code. For example, if you earn a 5+5+5 on the rubric, then your grade will be increased by 10% of what you earned for doing exceptional work.
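The bonus arithmetic can be written out as a small R function. This is a hypothetical sketch: the name `apply_bonus` is invented here, and it assumes the rubric has exactly three scores out of 5 and that only a perfect 5+5+5 earns the 10% increase (extra points "may" be given, so the real decision is discretionary).

```r
# Hypothetical helper: a perfect 5+5+5 rubric raises the earned score
# by 10% of what was earned; any other rubric leaves it unchanged.
apply_bonus <- function(earned, rubric) {
  stopifnot(length(rubric) == 3)       # assumption: three rubric scores
  if (all(rubric == 5)) earned * 1.10 else earned
}

apply_bonus(40, c(5, 5, 5))  # 44
apply_bonus(40, c(5, 4, 5))  # 40, no bonus
```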
Header for homework assignments for each part:
First Last
ADA1 Stat 427 (or 527)
HW ##, Part #
MM/DD/YYYY
All R code for the assignment should be included with the part of the problem it addresses (for code and output use a fixed-width font, such as Courier).
Do NOT use your R code and output as your answer to the problem, but include them to show me how you arrived at your answer. Your prose solution (in a non-fixed-width font) should be provided in addition to R output.
Grading breakdown
Semi-weekly homework: 75%
Midterm exam: 20% (Given a few pages of output, answer several pages of questions by interpreting the plots, tests, CIs, etc.)
Class participation: 5% (i>Clicker)
Final grade will include a small buffer at the discretion of the instructor. For example, your final grade could be the total points earned divided by (the total possible points times 0.95) for graduate students, or times 0.90 for undergraduate students. That is, for graduate students, [Final Grade] = [Points Earned]/[Points Possible * 0.95], so that your grade is slightly higher than the raw proportion of points you earned.
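The grade arithmetic can be sketched in R. This is a hypothetical illustration, not the official gradebook: the name `final_grade` is invented here, and it assumes the 75/20/5 weights are applied to the fraction of points earned in each component before the 0.95 (graduate) or 0.90 (undergraduate) buffer is applied.

```r
# Hypothetical sketch: weighted fraction of points earned, then the buffer.
# hw, midterm, participation: fraction of possible points earned (0 to 1).
final_grade <- function(hw, midterm, participation,
                        level = c("grad", "undergrad")) {
  level <- match.arg(level)
  weighted <- 0.75 * hw + 0.20 * midterm + 0.05 * participation
  buffer <- if (level == "grad") 0.95 else 0.90
  weighted / buffer   # [Points Earned] / [Points Possible * buffer]
}

final_grade(0.90, 0.80, 1.00, "undergrad")  # 0.885 / 0.90, about 0.983
final_grade(0.90, 0.80, 1.00, "grad")       # 0.885 / 0.95, about 0.932
```

Dividing by a number below 1 is what makes the buffer lift every grade slightly; the smaller undergraduate factor gives the larger lift.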
Please hand in a physical version of your homework and projects – a TA will write comments on it and give it back to you. An electronic version will be accepted only under exceptional circumstances (almost never).
Late assignments will be penalized 20% if handed in by 5pm the following day, and will not be accepted after that. Please slide late assignments under my office door (SMLC 312) after writing in the upper left-hand corner the date and time you're turning it in.
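The late policy reduces to simple arithmetic. A hypothetical sketch: the name `late_adjusted` and its status labels are invented here, and scoring work that is no longer accepted as 0 is an assumption.

```r
# Hypothetical helper for the late policy: full credit on time, 20% off if
# turned in by 5pm the following day, and (assumption) 0 after that.
late_adjusted <- function(earned, status = c("on_time", "next_day", "later")) {
  status <- match.arg(status)
  switch(status,
         on_time  = earned,
         next_day = earned * 0.80,
         later    = 0)
}

late_adjusted(50, "next_day")  # 40
```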
Semi-weekly homework
Homework is designed to encourage you to review the material we've learned, synthesize new information from the R help pages or the web, and apply (and learn!) your new skills. Expect to spend 4–5 hours a week (outside of class!) to do well, and maybe double that to do outstandingly well.
Model answers
My solutions, posted after each HW is due, will provide model answers so you have a sense of the quality and content I'm looking for.
Collaboration and citation
For homeworks (and obviously team projects) I encourage you to work together. Please discuss the data, code, and problems with one another, but do your own exploration and write up. We expect everyone to hand in substantially different homeworks, and we will enforce this under the honor code. The small benefit you might get from plagiarism is not worth the severe penalty.
As in life, please use any resources available to you. Projects and some homeworks will explicitly encourage you to use resources on the internet, but showing extra initiative will always be appreciated. You may find R programming tough at first, so feel free to discuss your problems with other classmates or meet with or email questions to the TAs or me.
I encourage you to use the ideas of others, but make them your own, giving credit. For projects have a formal bibliography, for homework cite casually, and for code copy the URL in a comment (which is doubly helpful for finding the resource again).
Disability statement
If you have a documented disability that will impact your work in this class, please contact me to discuss your needs. You’ll also need to register with the Accessibility Resource Center in 2021 Mesa Vista Hall (building 56) across the courtyard east from the SUB.
Learning without thought is labor lost.
What I hear, I forget.
What I see, I remember.
What I do, I understand.
– Confucius
Random stuff:
UNM has a license for free online access to the definitive books for the Lattice and ggplot2 graphing platforms. Note you must be on campus or logged in through the UNM proxy to access these.
R is currently available in these UNM Locations: DSH 141 and 143, Econ 1004, SMLC pods, and SUB ITLoboLab Pod and ITLoboLab Classroom.
R style matters. There is a lot of online help on R, such as at UCLA, tryr, and Google’s Intro to R video series. Usually try searching for “R [mytopic]” and you’ll get lots of results. ggplot2 plotting cookbook.
R reference card by Jonathan Baron.
Translate between MATLAB and R.
Figure checklist. Choosing the right chart. Nature Methods points of view on visualization.
Raster vs vector graphics.
Statistics prereq refresher from Khan Academy.
Coursera has a free 4-week course on computing for data analysis with R.
Muddy points in perspective.
R+LaTeX+knitr for reproducible research. See my SC1 lecture notes (Ch01), and Mohammad Arbabshirani’s notes (pdf, rnw).
Archive
Team projects (F12 only)
Working in teams is how science gets done. Each member of the team is responsible for every part of the project. I know team projects can be frustrating, requiring maturity, mutual consideration, and professionalism throughout, but I hope to teach some skills that should make it less painful. More details will be provided when we start the first project, but expect to produce a 5–10 page report detailing the analysis of a data set or one you collect from a study you design.
Each project will receive a single grade, but individual grades will be weighted by effort as judged by the entire team.
Teams will be assigned by the TAs and me. Teams can choose to fire team members who are not performing well (after meeting with me as a team), and individuals can choose to quit if they feel they are doing all the work.
Table of selected statistical methods
The data and design determine which method you use: original or UCLA.
Here’s a table of methods with the applicable semester of ADA and Chapter.
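| Number of Dependent Variables | Number of Independent Variables | Type of Dependent Variable(s) | Type of Independent Variable(s) | Measure | Test(s) | ADA-Ch |
| 1 | 0 (1 population) | continuous normal | not applicable (none) | mean | one-sample t-test | 1-02 |
| 1 | 0 (1 population) | continuous non-normal | not applicable (none) | median | one-sample median | 1-06 |
| 1 | 0 (1 population) | categorical | not applicable (none) | proportions | Chi-square goodness-of-fit, binomial test | 1-07 |
| 1 | 1 (2 independent populations) | normal | 2 categories | mean | 2 independent sample t-test | 1-03 |
| 1 | 1 (2 independent populations) | non-normal | 2 categories | medians | Mann-Whitney, Wilcoxon rank sum test | 1-06 |
| 1 | 1 (2 independent populations) | categorical | 2 categories | proportions | Chi-square test, Fisher's exact test | 1-07 |
| 1 | 0 (1 population measured twice) or 1 (2 matched populations) | normal | not applicable/categorical | means | paired t-test | 1-02 |
| 1 | 0 (1 population measured twice) or 1 (2 matched populations) | non-normal | not applicable/categorical | medians | Wilcoxon signed ranks test | 1-06 |
| 1 | 0 (1 population measured twice) or 1 (2 matched populations) | categorical | not applicable/categorical | proportions | McNemar, Chi-square test | 1-07 |
| 1 | 1 (3 or more populations) | normal | categorical | means | one-way ANOVA | 1-05 |
| 1 | 1 (3 or more populations) | non-normal | categorical | medians | Kruskal-Wallis | 1-06 |
| 1 | 1 (3 or more populations) | categorical | categorical | proportions | Chi-square test | 1-07 |
| 1 | 2 or more (e.g., 2-way ANOVA) | normal | categorical | means | Factorial ANOVA | 2-05 |
| 1 | 2 or more (e.g., 2-way ANOVA) | non-normal | categorical | medians | Friedman test | not |
| 1 | 2 or more (e.g., 2-way ANOVA) | categorical | categorical | proportions | log-linear, logistic regression | 2-11 |
| 1 | 0 (1 population measured 3 or more times) | normal | not applicable | means | Repeated measures ANOVA | not |
| 1 | 1 | normal | continuous | | correlation, simple linear regression | 1-08 |
| 1 | 1 | non-normal | continuous | | non-parametric correlation | 1-08 |
| 1 | 1 | categorical | categorical or continuous | | logistic regression | 2-11 |
| 1 | 1 | categorical | continuous | | discriminant analysis | 2-16 |
| 1 | 2 or more | normal | continuous | | multiple linear regression | 2-02 |
| 1 | 2 or more | non-normal | continuous | | | |
| 1 | 2 or more | categorical | continuous | | logistic regression | 2-11 |
| 1 | 2 or more | normal | mixed categorical and continuous | | Analysis of Covariance, General Linear Models (regression) | 2-09 |
| 1 | 2 or more | non-normal | mixed categorical and continuous | | | |
| 1 | 2 or more | categorical | mixed categorical and continuous | | logistic regression | 2-11 |
| 2 | 2 or more | normal | categorical | | MANOVA | 2-15 |
| 2 or more | 2 or more | normal | continuous | | multivariate multiple linear regression | not |
| 2 sets of 2 or more | 0 | normal | not applicable | | canonical correlation | not |
| 2 or more | 0 | normal | not applicable | | factor analysis | not |
| | 0 or more | | mixed categorical and continuous | | principal component analysis (w/multiple regression) | 2-13 |
| | | | categorical | | cluster analysis | 2-13 |
| | | | | | discriminant analysis | 2-16 |
| | | | | | classification | 2-17 |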
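To connect the table to R, here is a minimal sketch of base R (stats package) functions that correspond to several of its univariate rows. It is illustrative, not the course's own code: it assumes the built-in `mtcars` data, and the variables, groupings, and null value `mu = 20` are arbitrary choices. Rows marked "not" and the multivariate methods are omitted.

```r
# Base-R (stats package) calls for several rows of the table, on built-in data.
# The data set, variables, and null values below are illustrative choices only.
mpg   <- mtcars$mpg
trans <- factor(mtcars$am, labels = c("automatic", "manual"))  # 2 independent groups
cyl_f <- factor(mtcars$cyl)                                    # 3 groups

# 1 DV, 0 IVs (one population)
t.test(mpg, mu = 20)                        # one-sample t-test
binom.test(sum(mpg > 20), length(mpg))      # sign test for median = 20 (no mpg equals 20)
wilcox.test(mpg, mu = 20, exact = FALSE)    # Wilcoxon signed-rank test
chisq.test(table(mtcars$cyl))               # chi-square goodness-of-fit (equal proportions)

# 1 DV, 1 IV with 2 independent groups
t.test(mpg ~ trans)                         # two-sample (Welch) t-test
wilcox.test(mpg ~ trans, exact = FALSE)     # Mann-Whitney / Wilcoxon rank sum
tab <- table(trans, mtcars$vs)
fisher.test(tab)                            # Fisher's exact test on a 2 x 2 table

# 1 DV, 1 IV with 3 or more groups
summary(aov(mpg ~ cyl_f))                   # one-way ANOVA
kruskal.test(mpg ~ cyl_f)                   # Kruskal-Wallis

# 1 DV, 1 continuous IV
cor.test(mtcars$wt, mpg)                    # Pearson correlation
cor.test(mtcars$wt, mpg, method = "spearman", exact = FALSE)  # nonparametric correlation
fit   <- lm(mpg ~ wt, data = mtcars)                          # simple linear regression
logit <- glm(am ~ wt, data = mtcars, family = binomial)       # logistic regression, 0/1 DV
summary(fit)
summary(logit)
```

`exact = FALSE` is used because `mpg` and `wt` contain ties, which rule out exact rank-based p-values.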