Archived from Fall 2015 (Current year here.)
UNM Stat 427/527: Advanced Data Analysis I (ADA1)
Learn to produce beautiful (markdown) and reproducible (knitr) reports with informative plots (ggplot2) and tables (xtable) by writing code (R, Rstudio) to answer questions using fundamental statistical methods (all one- and two-variable methods), which you’ll be proud to present (poster).
The Fall 2015 syllabus is below the table.
Fall 2015 schedule; Time: TR 1530-1645; Location: CTLB 300 (building 55, northeast of Zimmerman); Stat 427.002, CRN 54725; Stat 527.002, CRN 54726
+ Peer mentors via UNM Stat 495/595: Statistics Education Practicum (SEP) Stat 495.002 or Stat 595.001, CRN 13764 or 55072 (named “Individual Study”)
9/26 – Course notes
These are now posted above the time table instead of by each day’s assigned reading.
Saving data: If you’re using classroom computers, use flash drives or UNM’s OneDrive (available in LoboMail) to save files. The CTLB computers do not connect to your standard UNM drive space. I recommend using a very systematic folder structure, such as ADA1/HW, ADA1/Class, ADA1/Reading, ADA1/Poster, etc. Do not just work on files in your downloads folder or on your desktop; respect your data and code!
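As an illustration of the folder structure recommended above, here is a minimal sketch in Python (the course itself uses R, so this is only one way to do it). The function name `make_course_folders` and the constant `COURSE_FOLDERS` are hypothetical; the folder names come from the recommendation above.

```python
from pathlib import Path

# Folder names taken from the recommendation above; extend as needed.
COURSE_FOLDERS = ["HW", "Class", "Reading", "Poster"]


def make_course_folders(root, course="ADA1", folders=COURSE_FOLDERS):
    """Create root/course/<folder> for each folder and return the created paths."""
    created = []
    for name in folders:
        path = Path(root) / course / name
        # parents=True also creates the course folder itself;
        # exist_ok=True makes it harmless to run the function again.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

For example, calling `make_course_folders("E:/")` would build the structure on a flash drive mounted at `E:/` (the drive letter is an assumption; use whatever location you actually save to).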
Each week has this structure:
- Pre-class (Tuesday): Reading, Video, Quiz (due before class — solutions become available Tue 3:30, after the quiz is due)
- In-class: Activities in class Tuesday and Thursday due by 5pm each day, submitted to UNM Learn (evaluated by TA within 1 week).
- Post-class (Thursday): Homework (crowdgrader, due following Thursday before class)
- Post-class (Following Thursday-Tuesday): Grading (crowdgrader, following 1 week + Tuesday before class)
Course notes and R code
pdf R Ch 0 Introduction to R, Rstudio, and ggplot
pdf R Ch 1 Summarizing and Displaying Data
pdf R Ch 2 Estimation in One-Sample Problems
pdf R Ch 3 Two-Sample Inferences
pdf R Ch 4 Checking Assumptions
pdf R Ch 5 One-Way Analysis of Variance
pdf R Ch 6 Nonparametric Methods
pdf R Ch 7 Categorical Data Analysis
pdf R Ch 8 Correlation and Regression
pdf R Ch 9 Introduction to the Bootstrap
pdf R Ch 10 Power and Sample size
pdf R Ch 11 Data Cleaning
pdf R ADA2 Ch 11 Logistic Regression
lm_diag_plots.R function for a large set of standard diagnostic plots
Notes from Fall 2015 using R: ADA1_notes_F15.pdf includes all chapters in one document.
Lecture notes for Advanced Data Analysis 1 (ADA1) Stat 427/527 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at http://statacumen.com/teach/ADA1/ADA1_notes_F15.pdf.
(I reserve the right to continue to improve the materials throughout the semester.)
|Wk-Date||Cl||Topic||Reading, Video, Quiz||In-class Worksheet, Data||Homework||HW Submit Grading||Due before class|
|00-08/18||00||Install software, survey||read:
|01-08/18||01||Intro, data, poster||read:
PDS Chs 2-3
video:
Ch 2-3, Med records,
01 Syllabus subset,
01a Medical records Rmd html
|(Intro to using RMarkdown: Rmd html)|
|01-08/20||02||Rmd, codebook, crowdgrader||video:
01 Personal codebook
|Work as a group, each submit own copy (anonymously).
submit 8/18-8/20 16:30
grading 8/20 16:30-17:00
Hey, awesome work today, everyone!
|01 Personal codebook Rmd html||01 crowdgrader
PDS Ch 2-4,
video:
Lit Rev biblio & Mendeley
quiz:
02 codebook and lit review
|In-class: Rmd html
Turn in one question of variable association.
|(UNM Google Scholar)||Quiz 02|
|02-08/27||04||Literature review||In-class: Rmd html
Turn in one citation to a research question.
|02 Literature review Rmd html bib
(The assignment above is short of a research proposal Rmd html; we won’t be doing a research proposal as part of this class.)
|Turn in HW 01|
|03-09/01||05||R programming, data subset and numerical summaries||read:
PDS Chs 5, 8, & 18,
Ch 00 R,
Ch 01 R,
video:
Ch 00 p1,
Ch 00 p2,
03 programming, univariate
|In-class: Rmd html
Look at datasets in R, create subset of data, rename variables, numerical summaries.
|Quiz 03, Grade HW 01|
HW 03 vid
|In-class: Rmd html
Univariate plots of numerical and categorical variables.
|03 Data subset, univariate summaries and plots Rmd html
(See the link above the table “Erik’s NESARC data, nicotine and depression”.)
|Turn in HW 02|
PDS Ch 9,
Ch 00 R,
Ch 11 R,
video:
quiz:
|In-class: Rmd html
Complete at least one bivariate coding relationship.
|Quiz 04, Grade HW 02|
|04-09/10||08||Data cleaning||In-class: Rmd html
Edit rules, run with dataset, assess exceptions, decide what to do with them.
|04 Rmd html||04 crowdgrader
|Turn in HW 03|
|05-09/15||09||Simple linear regression, intro||read:
Ch 8.4, 8.2 R
video:
quiz:
|In-class: Rmd html dat
Build intuition using SLR App, interpret properties of linear regression fit.
|Quiz 05, Grade HW 03|
|05-09/17||10||Logarithm transformation||(novel example)||In-class: Rmd html dat
Plot, transform, plot, and interpret.
|05 Rmd html||05 crowdgrader
|Turn in HW 04|
Ch 8.1, 8.3.1 R,
Ch 7.5.1 only sections on “conditional probability” and the following example R
video:
quiz:
|In-class: Rmd html
Data collection (hand span and word memory), correlation, regression to the mean.
|Spurious Correlations||Quiz 06, Grade HW 04|
|06-09/24||12||Categorical contingency tables||In-class: Rmd html d1
Interpret conditional proportions in two examples.
06 Rmd html
|Turn in HW 05|
Ch 2.1-2.2 R
video:
quiz:
|In-class: Rmd html
Guess Ages and Candy weights.
|Quiz 07, Grade HW 05|
|07-10/01||14||Parameter estimation (one-sample)||In-class: Rmd html
Water on Earth.
|07 Rmd html
PDS Data Sampling Designs:
AddHealth, OOL, NESARC
|Turn in HW 06|
|08-10/06||15||Hypothesis testing (two-sample)||read: Ch 2.3-end R
|In-class: Rmd html
one- and two-sample tests using data we collected in class.
|Quiz 08, Grade HW 06|
|08-10/08||Fall Break||08 Rmd html||08 crowdgrader
|Turn in HW 07|
|09-10/13||16||Paired data, assumption assessment||read:
Ch 3.4 & 3.6,
|In-class: Rmd html
Paired data and checking model assumptions.
|Quiz 09, Grade HW 07|
|09-10/15||17||ANOVA, post-hoc comparisons||In-class: Rmd html
ANOVA, model assumptions, and paired comparisons.
|09 Rmd html||09 crowdgrader
|Turn in HW 08|
|In-class: Rmd html
NP one-sample tests and CIs, and ANOVA with pairwise comparisons.
|Quiz 10, Grade HW 08|
|10-10/22||19||Binomial and multinomial proportion tests||In-class: Rmd html dat
Multinomial: World series number of games.
|10 Rmd html||10 crowdgrader
|Turn in HW 09|
|11-10/27||20||Two-way categorical tables||read:
|In-class: Rmd html dat
|Quiz 11, Grade HW 09|
|11-10/29||21||Simple linear regression, inference||In-class: Rmd html
Regression of height vs hand span using data from our class.
|11 Rmd html||11 crowdgrader
|Turn in HW 10|
|12-11/03||22||Logistic regression, intro||read:
ADA2 Ch 11.1-3,
PDS Ch 16
video:
quiz:
|In-class: Rmd html
AddHealth W4 Pregnancy.
|Summary of Methods we’ve covered||Quiz 12, Grade HW 10|
|12-11/05||23||Experiments and observational studies||In-class: Rmd html
Describing a study reported in the media.
|12 Rmd html||12 crowdgrader
|Turn in HW 11|
|13-11/10||24||Statistical communication||read: PDS Ch 18
|In-class: Rmd html
Key statistical principles, ethics. With additional time, clarify which research questions you’ll present in your poster with a peer mentor. (Null results are ok!)
|Statistics is about communication, including writing and presenting.||Quiz 13, Grade HW 11|
|13-11/12||25||Poster Preparation||In-class: Rmd html
Work on designing poster content at the bottom of your HW document.
|13 Rmd html
Work on your poster content.
Try to complete your poster planning in your HW document.
|Turn in HW 12|
|14-11/17||26||Posters wrapping up||Grade HW 12|
|14-11/19||27||Show poster||Course evaluation, submit receipt as in-class assignment.||14 Rmd html
Due next Wednesday. Complete and submit your poster in template format.
|Turn in HW 13|
|15-11/24||28||Approve poster, final touches||ARI Graphix
$9 poster printing
Open Mon-Wed 7:30-5:30,
Open Mon 11/30 7:30-5:30
Do not use their website! Do:
Email email@example.com, indicate to print “in color on bond paper” and attach poster pdf file. Price is $0.75/sq ft.
|Have a peer mentor approve your poster for printing and presentation. Congratulations!||Grade HW 13
Turn in HW 14 tomorrow (Wed)
|16-12/01||29||POSTERS A||Poster sessions in SMLC||poster template
pdf, Rnw, sty, bib, logo
|16-12/03||30||POSTERS B||Poster sessions in SMLC||Prof Erhardt’s example poster
Transition from Markdown to LaTeX
Video for poster transition
|Poster rubric||Grade HW 14|
|17-12/08||Finals week||(no final)|
Description: Statistical tools for scientific research, including parametric and non-parametric methods for ANOVA and group comparisons, simple linear and multiple linear regression and basic ideas of experimental design and analysis. Emphasis placed on the use of statistical packages such as R. Course cannot be counted in the hours needed for graduate degrees in Mathematics and Statistics.
Prerequisite: Stat 145 (or other intro stats course)
Semesters offered: Fall
Lecture: Stat 427.002, CRN 54725; Stat 527.002, CRN 54726, TR 1530-1645; Location: CTLB 300 (building 55, northeast of Zimmerman) Video
Office hours: Tue/Thu 12:30-13:30, and by appointment in SMLC 312
email: “Erik B. Erhardt” <firstname.lastname@example.org>, please include “ADA1” in subject line
Textbook: Required books will be provided free as pdfs on UNM Learn. Optional: Peter Dalgaard, “Introductory Statistics with R”, Second Edition, 2008, ISBN: 978-0-387-79053-4. The book is not required, but it will provide a backup for what you learn in class.
Laptops running R: I encourage you to bring a laptop to class each day so you can try the R programming exercises in class. If you don’t have one, no problem, there are some laptops in class and teamwork is encouraged — sit next to someone friendly who likes to share.
Teaching Assistants and Peer Mentors
Stat grad student TAs
Chauntal Andrews <email@example.com>, office hours Tue/Thu 14:00-15:00 in SMLC 301
Huan Yu <firstname.lastname@example.org>, office hours Mon 14:00-15:30 and Fri 9:00-10:30 in SMLC 302
Carrie Booth <email@example.com>, Education grad student, ADA course alumnus and Delta Alpha Pi Honor Society (Disability Achievement Pride) Member
Armida Carbajal <firstname.lastname@example.org>, Stat grad student
Andisheh Dadashi <email@example.com>, Stat grad student
Jerry Hatch <firstname.lastname@example.org>, ADA course alumnus, Stat MS student
John Pesko <email@example.com>, Stat PhD student
Ana Oaxaca <firstname.lastname@example.org>, ADA course alumnus
Juan Pablo Madrigal Cianci <email@example.com>, Applied Math grad student, ADA course alumnus
Angela Gregory <firstname.lastname@example.org>, ADA course alumnus, MS
Erin Ochoa <email@example.com>, ADA course alumnus
Student learning outcomes
- Organize knowledge in graphs, tables, and code to support concise, comprehensible, and scientifically defensible written interpretations to produce knowledge within a reproducible research environment.
- Distinguish a testable scientific hypothesis or data-supported interpretation from an opinion.
- Understand from a data story the goals of the study and apply the correct statistical procedure.
- Explain the scientific aspects of a problem to nonscientists in a fashion that enhances understanding and decision making.
- Define parameters of interest and hypotheses in words and notation.
- Summarize data visually, numerically, and descriptively and interpret the observed characteristics. Calculate and interpret numerical summaries such as mean, variance, five-number summary, confidence intervals, and p-values, and create visual summaries such as bar plots, scatter plots, and histograms. (Never pie charts!)
- Distinguish between statistical significance and scientific relevance.
- Use statistical software, such as R, to read and manage data, create informative plots, report numerical summaries, and apply statistical models, following recommended programming practices including abstraction and documentation.
- Understand the differences and limitations of controlled experiments and observational studies. Design experiments to infer causal treatment effects. Analyze observational data to infer associations between measured variables.
- Identify and explain the statistical methods, assumptions, and limitations used in reported studies in scientific literature or popular media.
- Evaluate and criticize published studies, the work of peers, and your own work: assess what was done well and what could be done better, and examine whether the conclusions are supported using statistical principles.
- Make evidence-based decisions by constructing and deciding between testable hypotheses using appropriate data and methods.
- Discover relationships and make predictions through model development and selection.
Meeting the learning outcomes
You will acquire new information in this class, but the emphasis is on comprehending, integrating, and applying information. Rote factual memorization is the lowest form of learning. Effective learning takes place by explaining, integrating, applying, and analyzing facts, hypotheses, and theories.
Learning in this class occurs by:
- Doing – completion of exercises that require analysis of data to answer questions and test hypotheses, or researching answers to reading assignments.
- Discussion – interaction with classmates to assemble and synthesize information, utilizing the collective skills and knowledge base of the group.
- Listening, acting, and reflecting – activities during class time provide insights into information not available in readings and include review of difficult material to aid comprehension. Note taking permits later reflection on lecture content. Listening to the professor lecture is the least effective learning tool for students, however, and you should plan on coming to every class prepared to participate in active and reflective learning opportunities.
- Quizzes will be due each Tuesday before class. Purpose: to assess reading and video comprehension and assure you’re prepared to actively participate in class activities with minimal lecture. (About 12, 20% of final grade, the lowest few are dropped.) Most weeks plan for 1-2 hours reading and video, 20 minute quiz.
- In-class assignments are due each day by the end of day (midnight), submitted to UNM Learn. Purpose: to struggle and find success in class with the concepts and skills. (About 24, includes class participation, 20% of final grade, the lowest several are dropped.) Most weeks plan to finish in class.
- Homework (HW) assignments are assigned each Thursday and due the following Thursday, submitted to crowdgrader.org (75% of HW grade). Purpose: to apply concepts and skills to your class poster project. (About 12, 40% of final grade, the lowest few are dropped.) Most weeks plan on 1-4 hours per assignment.
- Peer grading is due by the Tuesday after each homework is due (25% of HW grade). Purpose: to gain skill assessing the work of others, as well as see alternative strategies to answer questions. Most weeks it will take about 30 minutes to grade 5 other students’ HW.
- Poster will be developed throughout the semester (most HW assignments contribute to the poster); in the last couple of weeks we’ll complete them, and in the last week we’ll have poster presentations. Purpose: to have an overarching set of questions to answer using methods learned in the course, with a deliverable you can be proud of! (1 poster and presentation; of the final grade, 12% poster, 2% presentation, and 2% evaluations of others.) In the last couple of weeks, assembling this poster may take 3-5 hours, using a template provided to you.
- Course surveys are due at the beginning and end of the course. Purpose: to participate in national project-based learning projects and improve course. (About 2, 4% of final grade.)
Final grade may include a small buffer at the discretion of the instructor. For example, final grade could be the total points earned divided by the total possible points times 0.95 for graduate students and 0.90 for undergraduate students. That is [Final Grade] = [Points Earned]/[Points possible * 0.95], so that your grade is slightly higher than you earned.
Final grade calculation: (so much rounding up!)
I’m really proud of my class; you’ve worked hard this semester in a new format. I was especially pleased with the closing poster session, a celebration of what we were able to accomplish.
- Drop lowest 4 in-class, 2 quizzes, and 2 homework assignments (your worst two weeks).
- Take weighted average as discussed above.
- Divide ugrad grade by 0.95 and grad grade by 0.98 (a little extra boost for ugrads).
- Round this number up to the nearest integer (93.1 becomes a 94).
- Assign letter grades with these cutoffs (gets lenient below a B-)
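The calculation steps above can be sketched as follows. This is an illustrative sketch in Python (the course itself uses R), not the instructor's actual gradebook code. All names (`WEIGHTS`, `DROPS`, `BUFFER`, `mean_after_drops`, `homework_score`, `final_grade`) are hypothetical. It assumes every score is on a 0-100 scale, that each homework score already combines the submission (75%) and peer grading (25%), and that each component is a plain average after the lowest scores are dropped. The letter-grade cutoffs are not listed here, so that step is left out.

```python
import math

# Weights from the grading breakdown above (fractions of the final grade).
WEIGHTS = {
    "quiz": 0.20,
    "inclass": 0.20,
    "homework": 0.40,
    "poster": 0.12,
    "presentation": 0.02,
    "evaluations": 0.02,
    "survey": 0.04,
}
# Number of lowest scores dropped per component.
DROPS = {"inclass": 4, "quiz": 2, "homework": 2}
# Divisors that provide the small buffer for each student level.
BUFFER = {"ugrad": 0.95, "grad": 0.98}


def mean_after_drops(scores, n_drop):
    """Average a non-empty list of scores after discarding the n_drop lowest."""
    n_drop = min(n_drop, len(scores) - 1)  # always keep at least one score
    kept = sorted(scores)[n_drop:]
    return sum(kept) / len(kept)


def homework_score(submission, peer_grading):
    """Combine one homework: 75% submission, 25% peer grading."""
    return 0.75 * submission + 0.25 * peer_grading


def final_grade(scores, level):
    """Weighted average, divided by the buffer, rounded up to an integer."""
    weighted = sum(
        weight * mean_after_drops(scores[name], DROPS.get(name, 0))
        for name, weight in WEIGHTS.items()
    )
    boosted = weighted / BUFFER[level]
    # Round away floating-point noise first, so a value that is an exact
    # integer in decimal arithmetic is not pushed up a whole point.
    return math.ceil(round(boosted, 6))
```

Here `scores` is a dictionary mapping each component name in `WEIGHTS` to a list of scores, and `level` is either `"ugrad"` or `"grad"`. Rounding up with `math.ceil` matches the rule that 93.1 becomes a 94.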
All homework assignments in this class are electronic, submitted to crowdgrader.org for grading, except for the final poster.
- Students usually get far more feedback on their work than they would get from over-worked teaching assistants/faculty.
- Students get to see what other students are doing, and they can learn from the work of others (taking the best ideas, and leaving the rest).
- In exchange for this, they need to put in some amount of work in reviewing the work of others.
- It is important that students understand that their final grade is determined both by the quality of their work, and by the precision of the grades they give, and the helpfulness of the reviews they write.
Late assignments will not be accepted.
Rubrics guide assessment (and self-assessment) of homework, code, projects, exams, and presentations. Each assignment will have its own specific rubric.
All R code for the assignment should be included with the part of the problem it addresses (for code and output use a fixed-width font, such as Courier).
Do NOT use your R code and output as your answer to the problem, but include them to show me how you arrived at your answer. Your prose solution (in a non-fixed-width font) should be provided in addition to R output.
Collaboration and citation
For homeworks I encourage you to work together. Please discuss the data, code, and problems with one another, but do your own exploration and write up. We expect everyone to hand in substantially different homeworks, and we will enforce this under the honor code. The small benefit you might get from plagiarism is not worth the severe penalty (of lost trust, being reported to the dean, no points for the assignment, etc.).
As in life, please use any resources available to you. Projects and some homeworks will explicitly encourage you to use resources on the internet, but showing extra initiative will always be appreciated. You may find R programming tough at first, so feel free to discuss your problems with other classmates or meet with or email questions to the TAs or me.
I encourage you to use the ideas of others, but make them your own, giving credit. For projects have a formal bibliography, for homework cite casually, and for code simply copy the URL in as a comment (which is doubly helpful for finding the resource again).
If you have a documented disability that will impact your work in this class, please contact me to discuss your needs. You’ll also need to register with the Accessibility Resource Center in 2021 Mesa Vista Hall (building 56) across the courtyard east from the SUB.
Peer mentor Carrie Booth <firstname.lastname@example.org>, Education grad student, course alumnus, and member of the Delta Alpha Pi Honor Society (Disability Achievement Pride) has a background in special education and is familiar with challenges surrounding learning and accessibility for students with disabilities in college. She has offered to be available, if you choose to seek her out, for assistance. I legally can’t connect her to you, but you can let her know if you have needs she’ll be particularly qualified to be helpful with. I’m glad to have her available since one of her goals through DAP is to reduce stigma of students with disabilities in academia through visibility, achievement, and community service.
We’re doing this because:
- We want you to be empowered with statistics.
- We believe everyone should get out of this course with awesome skills
- Real-time feedback promotes efficient learning
“It encourages me to engage actively with the course material and take responsibility for my learning.”
Our six recommendations include the following:
- Emphasize statistical literacy and develop statistical thinking
- Use real data
- Stress conceptual understanding, rather than mere knowledge of procedures
- Foster active learning in the classroom
- Use technology for developing conceptual understanding and analyzing data
- Use assessments to improve and evaluate student learning
Learning without thought is labor lost.
What I hear, I forget.
What I see, I remember.
What I do, I understand.
R is currently available in these UNM Locations: DSH 141 and 143, Econ 1004, SMLC pods, and SUB IT-LoboLab Pod and IT-LoboLab Classroom.
R style matters. There is a lot of online help on R, such as at UCLA, try-r, and Google’s Intro to R video series. Usually try searching for “R [mytopic]” and you’ll get lots of results. ggplot2 plotting cookbook.
R reference card by Jonathan Baron.
Translate between MATLAB and R.
Statistical consulting and collaboration slides
Raster vs vector graphics.
Statistics pre-req refresher from Khan Academy.
Coursera has a free 4-week course on computing for data analysis with R.
Muddy points in perspective.
Asking smart questions
“Smart Questions” guide (note “hackers build things, crackers break them”)
Email Question Rubric:
* Send one email per question.
— Use “Reply” to continue conversation on a question; send a new email for a new question.
* Include “ADA1” as the first word of the subject line in new emails (if replying, just use reply).
* Begin email with a short question summary.
* When possible, include commented code in email body
— Comments should indicate where the problem is, what the expected behavior is, and what steps are necessary to reproduce problem.
— Code should include a “Minimum representative test case” (http://www.catb.org/esr/faqs/
* If attaching code, please include all the files necessary to run your code (data, etc.).
LaTeX wiki, lshort, Detexify LaTeX symbols (linux texlive package management)
R tutorials: TryR (gentle), Kelly Black
Knitr in Rstudio (knitr is modern version of Sweave intro, demo, guide)
xtable to produce LaTeX tabular environment from R data.frames
Cookbook for R for helpful examples, visualization tutorials, diagrams
Image formats: vector (pdf, eps) vs raster (jpeg, bmp, tiff, gif)
Why stats now?
Before first day:
Step 0: Instructions for “Pre-course Software Install and Survey”: google account, crowdgrader, R+Rstudio, Mendeley, LaTeX, and a pre-course survey. All are required.
Did you receive a registration error for Fall 2015? Send me an email with the following answers:
1. What registration error did you get (copy/paste is best)?
2. What is your UNM ID?
3. What is your Math/Stat background (that is, do you have the pre-reqs)?
If you are waitlisted, as long as there are seats available I will override you into the course. Don’t worry.
Table of selected statistical methods
Here’s a table of methods with the applicable semester of ADA and Chapter.
|continuous normal||not applicable
|normal||2 categories||mean||2 independent
rank sum test
|categorical||proportions||Chi square test
Fisher’s Exact test
signed ranks test
(3 or more
|categorical||proportions||Chi square test||1-07|
|2 or more
(e.g., 2-way ANOVA)
3 or more
|normal||not applicable||means||Repeated measures
|2 or more||normal||continuous||multiple linear
|Analysis of Covariance,
General Linear Models
|2||2 or more||normal||categorical||MANOVA||2-15|
|2 or more||2 or more||normal||continuous||multivariate
|2 sets of
2 or more
|0||normal||not applicable||canonical correlation||not|
|2 or more||0||normal||not applicable||factor analysis||not|
|0 or more||mixed categorical
Citing and using notes
Notes from Fall 2014 using R: ADA1_notes_F14.pdf includes all chapters in one document.
Lecture notes for Advanced Data Analysis 1 (ADA1) Stat 427/527 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at http://statacumen.com/teach/ADA1/ADA1_notes_F14.pdf.
Notes from Fall 2013 using R: ADA1_notes_F13.pdf includes all chapters in one document.
Lecture notes for Advanced Data Analysis 1 (ADA1) Stat 427/527 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at http://statacumen.com/teach/ADA1/ADA1_notes_F13.pdf.
Notes from Fall 2012 using R: ADA1_notes_F12.pdf includes all chapters in one document.
Lecture notes for Advanced Data Analysis 1 (ADA1) Stat 427/527 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at http://statacumen.com/teach/ADA1/ADA1_notes_F12.pdf.
Notes from Fall 2011 using Minitab: ADA1_notes_F11.pdf includes all chapters in one document.
Lecture notes for Advanced Data Analysis 1 (ADA1) Stat 427/527 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at http://statacumen.com/teach/ADA1/ADA1_notes_F11.pdf.