ADA1 F19

UNM Stat 427/527: Advanced Data Analysis I (ADA1)

Fall 2019. The syllabus is below the tables. Fall 2019 schedule: Time: TR 1530-1645; Location: CTLB 300 (building 55, northeast of Zimmerman); Stat 427.001, CRN 59508; Stat 527.001, CRN 59509

Goal

This Is Statistics

Learn to produce beautiful (markdown) and reproducible (knitr) reports with informative plots (ggplot2) and tables (kable) by writing code (R, tidyverse, Rstudio) to answer questions using fundamental statistical methods (all one- and two-variable methods), which you’ll be proud to present (poster).


Course content

Weekly structure

(also see “Assessment” below)
  1. Pre-class (Tuesday): Reading, Video, Quiz due before class Tue 3:30 PM — solutions become available after the quiz is due.
  2. In-class (Tuesday and Thursday): Activities in class due by 5 PM the following day, submitted to UNM Learn (evaluated by TA within 1 week).
  3. Post-class (Thursday): Homework due the following Thursday by 3:30 PM, submitted to UNM Learn (evaluated by TA within 1 week).
UNM Learn for quizzes and in-class assignments. YouTube Video playlist (try 1.5 speed, then pause as needed).

Course notes, code, data, and video lectures

Second text: PDS Textbook Notes from Fall 2019: ADA1_notes_F19.pdf includes all chapters in one document. Citing lecture notes example: Erhardt EB, Bedrick EJ, and Schrader RM. (2019) Lecture notes for Advanced Data Analysis 1. Retrieved Sep 1, 2019, from statacumen.com/teach/ADA1/notes/ADA1_notes.pdf, 136–144. Creative Commons License Lecture notes for Advanced Data Analysis 1 (ADA1) Stat 427/527 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at https://statacumen.com/teach/ADA1/notes/ADA1_notes_F19.pdf.
Ch Chapter Title Notes R code Datasets Video lectures playlist Helper videos
00 Introduction to R, Rstudio, and ggplot pdf R 00-1 00-2 markdown, 01 PDS codebook, 01 HW codebook, 02 HW Lit review
01 Summarizing and Displaying Data pdf R 01-1 03 HW 03 subset
02 Estimation in One-Sample Problems pdf R 02-1 02-2 02-3
03 Two-Sample Inferences pdf R 03-1 03-2 03-3
04 Checking Assumptions pdf R 04-1
05 One-Way Analysis of Variance pdf R CHDS dat desc 05-1
06 Nonparametric Methods pdf R 06-1 one-sample, 06-2 paired, 06-3 two-sample, 06-4 ANOVA, 06-5 perm test.
07 Categorical Data Analysis pdf R 07-1 intro, 07-2 single prop, 07-3 GOF-test, 07-4 two prop & cond prob, …
08 Correlation and Regression pdf R BodyMass dat desc pdf 08-1 corr/log, 08-2 corr hyp test, 08-3 LS reg eq, 08-4 08-5
09 Introduction to the Bootstrap pdf R 09-1
10 Power and Sample size pdf R 10-1
11 Data Cleaning pdf R 11-1 14 HW to poster
12 ADA2 Ch 11 Logistic Regression pdf R 12-1 12-2 12-3 12-4 Upgrading R on Windows
lm_diag_plots.R function for a large set of standard diagnostic plots.

Passion-Driven Statistics (PDS) data

I encourage you to use one of the AddHealth datasets.  Use W1 if you want to understand adolescents when they were young and W4 if you want to understand adult relationships.  NESARC is also interesting for alcohol abuse, depression, and related conditions. It can be difficult to find good numeric variables.  I will include a few under each dataset.
  • AddHealthW1 Sampling Design, Codebook, RData. Adolescents when they were young.
    • Unique ID “AID”.
    • A few numeric variables: age (in data, not in codebook), …
    • Other potential variables (run this code): names(AddHealth)[lapply(AddHealth, class) %in% c("numeric", "integer")]
  • AddHealthW4 Sampling Design, Codebook, RData. Same adolescents when they were older, more life events.
    • Unique ID “aid”.
    • A few numeric variables: agew1 (not in codebook, age in Wave 1 — for age in Wave 4, see the AddHealth4 age example “AH14$age_years”), H4EC1, H4EC2, …
    • Other potential variables (run this code): names(addhealth_public4)[lapply(addhealth_public4, class) %in% c("numeric", "integer")]
  • NESARC Sampling Design, Codebook, RData. Alcohol abuse and related conditions.
    • Unique ID “IDNUM”.
    • A few numeric variables: AGE (in data, not in codebook), …
    • Other potential variables (run this code): names(NESARC)[lapply(NESARC, class) %in% c("numeric", "integer")]
  • OutlookOnLife  Sampling Design, Codebook, RData. Interesting data, but not enough continuous variables to use, unique ID “CASEID”.
  • GapMinder  Sampling Design, Codebook, RData. Country data, but it’s complicated to average large and small countries, unique ID “country”.
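The "run this code" one-liners above can be tried on any data frame. Here is a minimal sketch using a small made-up data frame (`toy` and its columns are hypothetical, not variables from the PDS datasets). It uses `sapply()` rather than `lapply()` so the column classes come back as a plain character vector, which is an illustrative choice and not a required change.

```r
# Hypothetical stand-in for a PDS dataset such as AddHealth or NESARC
toy <- data.frame(
  id     = c("a", "b", "c"),
  age    = c(15.5, 16.2, 14.8),          # stored as numeric
  visits = c(1L, 3L, 2L),                # stored as integer
  sex    = factor(c("F", "M", "F")),     # stored as factor
  stringsAsFactors = FALSE
)

# First class of every column, as a named character vector
col_class <- sapply(toy, function(x) class(x)[1])

# Names of the columns stored as numeric or integer
numeric_vars <- names(toy)[col_class %in% c("numeric", "integer")]
print(numeric_vars)
```

Replace `toy` with the name of your loaded dataset to get the candidate numeric variables for your project.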

SEV LTER data

Sevilleta (SEV) Long Term Ecological Research (LTER) Program Arthropod and Small Mammal Description and Codebook Rmd html, data.zip
Erik’s example homework document: NESARC data, nicotine and depression. Use these files as a model for your assignments: .Rmd + .bib = .html.

Timetable

Wk-Date Cl Topic Reading, Video, Quiz In-class Worksheet, Data Homework
00-08/20 00 Install software, survey Step 0 (above) pre-survey required for classroom assessment (8/7 – 9/6/2019)
01-08/20 01 Intro, data, poster read: PDS Chs 2-3;  video: Rmd, Ch 2-3, Med records CTLB video, Active Learning, 01 Syllabus subset, 01a Medical records Rmd html Turn in assignment in Thursday’s class to learn how UNM Learn works. (Intro to using RMarkdown: Rmd html)
01-08/22 02 Rmd, codebook video: 01 Personal codebook In-class: yesterday’s 01a submit by 16:00 Work as a group, each submit own copy. HW: 01 Personal codebook Rmd html Choose from PDS datasets
02-08/27 03 Research questions read: PDS Ch 2-4;  video: Lit Rev biblio & Mendeley;  quiz: 02 codebook and lit review In-class: Rmd html Turn in one question of variable association. (UNM Google Scholar) (For experienced PubMed users, use TeXMed to get the bibtex format)
02-08/29 04 Citations and Literature review In-class: Rmd html bib Turn in one citation to a research question. HW: 02 Literature review Rmd html bib (While we won’t be doing a research proposal as part of this class, if we were covering more on research methods, then we might continue with a short research proposal (Rmd html).)
03-09/03 05 R programming, data subset and numerical summaries read: PDS Chs 5, 8, & 18, Ch 00 R, Ch 01 R;  video: Ch 00 p1, Ch 00 p2, Ch 01;  quiz: 03 programming, univariate In-class: Rmd html Look at datasets in R, create subset of data, rename variables, numerical summaries. ADA1 ALL Outline file Rmd html All of your assignments will be written in this file.
03-09/05 06 Plotting univariate video: HW 03 vid In-class: Rmd html Univariate plots of numerical and categorical variables. HW: 03 Data subset, univariate summaries and plots Rmd html (See the link above the table “Erik’s NESARC data, nicotine and depression”.)
04-09/10 07 Plotting bivariate, numeric response read: PDS Ch 9, Ch 00 R; quiz: quiz In-class: Rmd html Complete at least one bivariate coding relationship.
04-09/12 08 Plotting bivariate, categorical response In-class: Rmd html Complete at least one bivariate coding relationship. HW: 04 Rmd html
05-09/17 09 Simple linear regression, intro read: Ch 8.4, 8.2 R;  video: 08-1 corr/log, 08-3 LS reg eq;  quiz: quiz In-class: Rmd html dat Build intuition using SLR App, interpret properties of linear regression fit.
05-09/19 10 Logarithm transformation (novel example) In-class: Rmd html dat Plot, transform, plot, and interpret. HW: 05 Rmd html
06-09/24 11 Correlation read: Ch 8.1, 8.3.1 R, Ch 7.5.1 only sections on “conditional probability” and the following example R;  video: 08-1 corr/log, 08-2 corr hyp test, 07-4 two prop & cond prob;  quiz: quiz In-class: Rmd html Data collection (hand span and word memory), correlation, regression to the mean. Spurious Correlations
06-09/26 12 Categorical contingency tables quiz 06b, Guess Ages (for next in-class) In-class: Rmd html d1 Interpret condition proportions in two examples. Simpson’s Paradox HW: 06 Rmd html
07-10/01 13 Inference, intro read: Ch 2.1-2.2 R;  video: see table above;  quiz: quiz In-class: Rmd html Guess Ages, Legos. (Legos part 2 Rmd html dat, diagram). BBC Radio 4: More or Less, “sampling” 9 min audio
07-10/03 14 Parameter estimation (one-sample) In-class: Rmd html Water on Earth. HW: 07 Rmd html PDS Data Sampling Designs: AddHealth, OOL, NESARC
08-10/08 15 Hypothesis testing (two-sample) read: Ch 2.3-end R Ch 3 R;  video: see table above;  quiz: quiz In-class: Rmd html one- and two-sample tests using data we collected in class.
08-10/10 Fall Break HW: 08 Rmd html
09-10/15 16 Paired data, assumption assessment read: Ch 2.2.1, Ch 3.4 & 3.6, Ch 4, Ch 5;  video: see table above;  quiz: quiz In-class: Rmd html Paired data and checking model assumptions.
09-10/17 17 ANOVA, post-hoc comparisons In-class: Rmd html ANOVA, model assumptions, and paired comparisons. HW: 09 Rmd html
10-10/22 18 Nonparametric methods read: Ch 6, Ch 7.2-7.4, Ch 10;  video: see table above;  quiz: quiz In-class: Rmd html NP one-sample tests and CIs, and ANOVA with pairwise comparisons.
10-10/24 19 Binomial and multinomial proportion tests In-class: Rmd html dat Multinomial: World series number of games. HW: 10 Rmd html
11-10/29 20 Two-way categorical tables read: Ch 7.8-end, Ch 8.5-8.7;  video:;  quiz: quiz In-class: Rmd html dat Popular kids.
11-10/31 21 Simple linear regression, inference In-class: Rmd html Regression of height vs hand span using data from our class. HW: 11 Rmd html
12-11/05 22 Logistic regression, intro read: ADA2 Ch 11.1-3, 11.6, PDS Ch 16;  video:;  quiz: quiz In-class: Rmd html AddHealth W4 Pregnancy. Summary of Methods we’ve covered
12-11/07 23 Experiments and observational studies In-class: Rmd html Describing a study reported in the media. HW: 12 Rmd html
13-11/12 24 Statistical communication read: PDS Ch 18;  video:;  quiz: no quiz In-class: Rmd html Key statistical principles, ethics. With additional time, clarify which research questions you’ll present in your poster with a peer mentor. (Null results are ok!) Statistics is about communication, including writing and presenting.
13-11/14 25 Poster Preparation In-class: Rmd html Work on designing poster content at the bottom of your HW document. HW: 13 Rmd html Work on your poster content. Try to complete your poster planning in your HW document.
14-11/19 26 Posters wrapping up poster template pdf,  Rnw, sty, bib, logo Prof Erhardt’s example poster pdf,  Rnw
14-11/21 27 Show poster In-class: Course evaluations, submit receipt (capture screen image) as in-class assignment.
  1. Everyone EvalKit
    1. submit receipt
  2. Post-survey Classroom
    1. submit receipt
  3. PDS Wesleyan U qualtrics survey (email)
    1. no receipt required
See the 12/27 email describing these surveys in more detail.
HW: 14 Rmd html Due next Wednesday 12/7. Complete and submit your poster in LaTeX pdf format. Transition from Markdown to LaTeX Video for poster transition
15-11/26 28 Approve poster, final touches Note: The poster needs to be printed on Wed before the holiday or Friday/Monday after (closed on the weekend).  Try to finish early to reduce the burden on the poster printer company. $10 poster printing Minuteman Press, Eubank 1631 Eubank Boulevard NE, Suite D, Albuquerque, NM 87112 (505)881-0164 Open Mon-Fri 8a-5p, closed Thanksgiving, open Fri 11/29 10a-2p Submit poster to website Project name: “UNM ADA1 class poster” Due Date: 12/02/19 (at latest, try to finish a little early so you can print before holiday) Additional Details: “3’x4′ portrait poster on bond paper” File #1: Name the poster pdf with your name in the filename, such as “FirstLast_ADA1_poster.pdf”. Arrange to pick up the poster. Have a peer mentor approve your poster for printing and presentation. Congratulations!
15-11/28 Thanksgiving break
16-12/03 29 POSTERS Poster sessions in SMLC Atrium Poster Schedule (be on time): 3:30-3:40 Organization 3:40-4:20 Group 1 Grad 4:25-5:05 Group 2 Grad Everyone is expected at both poster presentations.
16-12/05 30 POSTERS Poster sessions in SMLC Atrium Poster Schedule (be on time): 3:30-3:40 Organization 3:40-4:20 Group 1 UGrad 4:25-5:05 Group 2 UGrad Students not presenting will be evaluating other posters.
17-12/08 Finals week (no final) Congratulations on a great semester!
(I reserve the right to continue to modify the schedule and improve the materials throughout the semester.)

Syllabus

Description: Statistical tools for scientific research, including parametric and non-parametric methods for ANOVA and group comparisons, simple linear and multiple linear regression, and basic ideas of experimental design and analysis. Emphasis placed on the use of statistical packages such as R. Course cannot be counted in the hours needed for graduate degrees in Mathematics and Statistics.
Prerequisite: Math 1350 [Stat 145] (or other intro stats course)
Semesters offered: Fall
Lecture: Stat 427.001, CRN 59508; Stat 527.001, CRN 59509; TR 1530-1645; Location: CTLB 300 (building 55, northeast of Zimmerman) Video
Laptops running R: I encourage you to bring a laptop to class each day so you can work on the exercises in class. If you don’t have one, no problem: there are laptops in class and teamwork is encouraged, so sit next to someone friendly and discuss your work.
Saving data: If you’re using classroom computers, use flash drives or UNM’s OneDrive (available in LoboMail) for saving files.  The CTLB computers do not connect to your standard UNM drive space (as of 2016, this may not still be an issue).

Instructors

Please include “ADA1” in the subject line of all emails.

Professor

Erik Erhardt <erike@stat.unm.edu>, he/him, SMLC 312

Teaching Assistants

Kelli Kasper <kkasper@unm.edu>, she/her, SMLC 306
Leah Puglisi <lhpuglisi@unm.edu>, she/her, SMLC 319
Ola Anifowoshe <oanifowoshe@unm.edu>, he/him, SMLC 208

Additional Assistants, Peer Mentors, SEP

Grace Mayer, she/her

Office hours

Mon: 11:00-14:00 Kelli, 14:00-16:00 Ola
Tue: 13:00-15:00 Erik
Wed: 14:00-16:00 Leah
Thu: 13:00-15:00 Erik
Fri: 11:00-12:00 Erik
  • We are also all available by appointment by email if these many hours do not work for you.
  • Leah’s tutoring table hours are for another course, so we should let Leah give priority to students from the other course.  Thanks for understanding.

Student learning outcomes

At the end of the course, you will be able to: (student results: R, all years, 2015, 2014, 2013, 2012)
General outcomes:
  1. Organize knowledge in graphs, tables, and code to support concise, comprehensible, and scientifically defensible written interpretations to produce knowledge within a reproducible research environment.
  2. Distinguish a testable scientific hypothesis or data-supported interpretation from an opinion.
  3. Understand from a data story the goals of the study and apply the correct statistical procedure.
  4. Explain the scientific aspects of a problem to nonscientists in a fashion that enhances understanding and decision making.
Topical outcomes:
  1. Define parameters of interest and hypotheses in words and notation.
  2. Summarize data visually, numerically, and descriptively and interpret the observed characteristics. Calculate and interpret numerical summaries such as mean, variance, five-number summary, confidence intervals, and p-values, and create visual summaries such as bar plots, scatter plots, and histograms. (Never pie charts!)
  3. Distinguish between statistical significance and scientific relevance.
  4. Use statistical software, such as R, to read and manage data, create informative plots, report numerical summaries, and apply statistical models, by recommended programming practice including abstraction and documentation.
  5. Understand the differences and limitations of controlled experiments and observational studies. Design experiments to infer causal treatment effects. Analyze observational data to infer associations between measured variables.
  6. Identify and explain the statistical methods, assumptions, and limitations used in reported studies in scientific literature or popular media.
  7. Evaluate and criticize published studies, the work of peers, and your own work and assess what was done well, what could be done better, and examine whether their conclusions are supported using statistical principles.
  8. Make evidence-based decisions by constructing and deciding between testable hypotheses using appropriate data and methods.
  9. Discover relationships and make predictions through model development and selection.

Meeting the learning outcomes

You will acquire new information in this class, but the emphasis is comprehending, integrating, and applying information. Rote factual memorization is the lowest form of learning. Effective learning takes place by explaining, integrating, applying, and analyzing facts, hypotheses, and theories. Learning in this class occurs by:
  1. Doing – completion of exercises that require analysis of data to answer questions and test hypotheses, or researching answers to reading assignments.
  2. Discussion – interaction with classmates to assemble and synthesize information by utilizing the collective skills and knowledge base of the group.
  3. Listening, acting, and reflecting – activities during class time provide insights into information not available in readings and include review of difficult material to aid comprehension. Note taking permits later reflection on lecture content. Listening to the professor lecture is the least effective learning tool for most students, however, so you should plan on coming to every class prepared to participate in active and reflective learning opportunities.

Assessment

  • Quizzes will be due each Tuesday before class.  Purpose: to assess reading and video comprehension and assure you’re prepared to actively participate in class activities with minimal lecture. (About 12, 20% of final grade, the lowest few are dropped.)  Most weeks plan for 1-2 hours reading and video, 20 minute quiz. Quizzes are not timed, they can be taken twice, and the higher of the two scores is used for grade calculation.
    • Viewing quiz solutions after the due date in UNM Learn is not intuitive.  Click on the “Begin” button (this is the non-intuitive part, since you are not actually beginning the quiz), then click “View All Attempts” to see the scores.  Finally, click “Calculated Grade” to see the feedback for each question of the quiz.
  • In-class assignments are due by 5pm the next day, submitted to UNM Learn.  Purpose: to struggle and find success in class with the concepts and skills. (About 24, includes class participation, 30% of final grade, the lowest several are dropped.) Most weeks plan to finish in class.
  • Homework (HW) assignments are assigned each Thursday and due the following Thursday, submitted to UNM Learn. Purpose: to apply concepts and skills to your class poster project. (About 12, 30% of final grade, the lowest few are dropped.) Most weeks plan on 1-4 hours per assignment.
  • Poster will be developed throughout the semester (most HW assignments contribute to the poster); in the last couple of weeks we’ll complete them, and in the last week we’ll have poster presentations. Purpose: to have an overarching set of questions to answer using methods learned in the course, with a deliverable you can be proud of! (1 poster and presentation: 12% poster, 2% presentation, and 2% evaluations of others, as shares of the final grade.)  In the last couple of weeks, assembling this poster may take 5-10 hours, using a template provided to you.
  • Course surveys are due at the beginning and end of the course. Purpose: to participate in national project-based learning projects and improve course.  (About 2, 4% of final grade.)
Final grade may include a small buffer at the discretion of the instructor. For example, the final grade could be the total points earned divided by the total possible points times 0.95 for graduate students or 0.90 for undergraduate students. That is, [Final Grade] = [Points Earned] / ([Points Possible] * 0.95) for graduate students, so that your grade is slightly higher than you earned. All assignments in this class are electronic, submitted to UNM Learn. Late assignments will not be accepted. Rubrics guide assessment (and self-assessment) of homework, code, projects, exams, and presentations.  Each assignment will have its own specific rubric. Use of R and RMarkdown is required for the course.  Each submission will include all of the R code for the assignment, placed with the part of the problem it addresses, in a fixed-width font with syntax highlighting. You will weave your code with prose narrations of your work and solutions.
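To make the weighting and buffer arithmetic concrete, here is a minimal sketch in R. The component weights come from the Assessment list above. The scores, the number of dropped quizzes, and the helper `mean_drop_lowest()` are hypothetical and only illustrate the calculation, since the actual buffer is at the instructor's discretion.

```r
# Component weights from the Assessment list (shares of the final grade)
weights <- c(quiz = 0.20, inclass = 0.30, homework = 0.30,
             poster = 0.12, presentation = 0.02, evaluations = 0.02,
             surveys = 0.04)

# Hypothetical helper: mean score after dropping the n_drop lowest scores
mean_drop_lowest <- function(scores, n_drop = 1) {
  kept <- tail(sort(scores), length(scores) - n_drop)
  mean(kept)
}

# Hypothetical scores, each as a proportion of the points possible
quiz_scores <- c(0.9, 0.6, 1.0, 0.8)
component <- c(quiz = mean_drop_lowest(quiz_scores, n_drop = 1),
               inclass = 0.95, homework = 0.85, poster = 0.90,
               presentation = 1, evaluations = 1, surveys = 1)

# Weighted proportion of points earned
earned <- sum(weights * component[names(weights)])

# Example buffer: divide by 0.95 (graduate) or 0.90 (undergraduate)
final_grad  <- earned / 0.95
final_ugrad <- earned / 0.90
round(c(earned = earned, grad = final_grad, ugrad = final_ugrad), 3)
```

Dividing by a factor below 1 raises the reported grade slightly, which is the "buffer" described above.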

Collaboration and citation

For homework, I encourage you to work together. Please discuss the data, code, and problems with one another, but do your own exploration and write up. We expect everyone to submit substantially different homework, and we will enforce this under the honor code. The small benefit you might get from plagiarism is not worth the severe penalty (of lost trust, being reported to the dean, no points for the assignment, etc.). As in life, please use any resources available to you. Projects and some homework will explicitly encourage you to use resources on the internet, but showing extra initiative will always be appreciated. You may find R programming tough at first, so feel free to discuss your problems with other classmates or meet with or email questions to me or the TAs. I encourage you to use the ideas of others, but make them your own, giving credit. For projects, have a formal bibliography; for homework, cite casually; and for code, simply copy the URL in as a comment (which is doubly helpful for finding the resource again).  You won’t be the first person to do anything in this class, so give credit where it’s due.

Statements

Accommodation Statement

In accordance with University Policy 2310 and the Americans with Disabilities Act (ADA), academic accommodations may be made for any student who notifies the instructor of the need for an accommodation. It is imperative that you take the initiative to bring such needs to the instructor’s attention, as the instructor is not legally permitted to inquire. Students who may require assistance in emergency evacuations should contact the instructor as to the most appropriate procedures to follow. Contact Accessibility Resource Center at 277-3506 for additional information.

Title IX statement

In an effort to meet obligations under Title IX, UNM faculty, Teaching Assistants, and Graduate Assistants are considered “responsible employees” by the Department of Education (see pg 15).  This designation requires that any report of gender discrimination, which includes sexual harassment, sexual misconduct, and sexual violence, made to a faculty member, TA, or GA must be reported to the Title IX Coordinator at the Office of Equal Opportunity. For more information, see the campus policy regarding sexual misconduct.

UNM Indigenous Peoples Land and Territory Acknowledgment

I would like to acknowledge the original peoples of this land.  The Sandia Pueblo (other pueblo communities) and the Navajo nation have ties and stories on this land and within the broader community that are connected within New Mexico.  I am grateful to be able to work here in relationship and strengthen community on this territory.

Our Classroom

We’re doing this because:
  • We want you to be empowered with statistics.
  • We believe everyone should get out of this course with awesome skills
  • Real-time feedback promotes efficient learning
“It encourages me to engage actively with the course material and to take responsibility for my learning.”

GAISE Connections

Our six recommendations include the following:
  1. Emphasize statistical literacy and develop statistical thinking
  2. Use real data
  3. Stress conceptual understanding, rather than mere knowledge of procedures
  4. Foster active learning in the classroom
  5. Use technology for developing conceptual understanding and analyzing data
  6. Use assessments to improve and evaluate student learning

Learning without thought is labor lost. What I hear, I forget. What I see, I remember. What I do, I understand. – Confucius

Archive

Course introduction materials

Pre-course to-dos

Did you receive a registration error for Fall 2019? Send me an email with the following answers: 1. What registration error did you get (copy/paste is best)? 2. What is your UNM ID? 3. What is your Math/Stat background (that is, do you have the pre-requisites)? If you are waitlisted, as long as there are seats available I will override you into the course. Don’t worry.

Step 0

Before our first class (Tue 8/20) please read through the following actions and install the required software on your computer and complete the brief survey. If you don’t have a computer, there are classroom computers which will be available only when the classroom is open.  Video for this process.
  1. Complete surveys
    1. a short Opinio pre-survey required for classroom assessment (8/7 – 9/6/2019).
    2. Respondus survey about passion driven statistics (PDS) course content, by email (8/20 – 8/27/2019).
  2. Install R (Windows or Mac) or upgrade, then RStudio. Videos that may be helpful:
  3. Install R packages, also update all packages within RStudio.
  4. Set up your computer
    1. RStudio disable notebook
    2. Operating system to be more friendly to programming.
  5. (Postpone until later: Install LaTeX (for poster at end of the semester).)

Problems installing PDS package?  Solution. If you had problems installing the PDS package, no problem; here’s how to get the data: 1. Download the “.RData” file above for your dataset. 2. Where I have “library(PDS)” in my code, change it to the two lines below.  You’ll need to update the “PATH_TO_FILE” below to the path on your computer’s hard drive, and “filename” needs to be changed to the name of the file. This will directly read the data file.

# library(PDS)
setwd("/PATH_TO_FILE")
load("filename.RData")
Joining AddHealth waves 1 and 4 together into a single dataset can be done if you want to use variables from when the participants were adolescents and from when they were adults. See Erik’s example project for the code.
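As a rough illustration of joining two waves, here is a minimal sketch using base R `merge()`. The ID column names "AID" and "aid" are taken from the dataset list above. The two miniature data frames and their values are hypothetical, and Erik's example project may do the join differently.

```r
# Hypothetical miniature versions of the two waves (not real AddHealth records)
wave1 <- data.frame(AID = c("001", "002", "003"),
                    age = c(15, 16, 14),
                    stringsAsFactors = FALSE)
wave4 <- data.frame(aid = c("002", "003", "004"),
                    H4EC1 = c(5, 7, 3),
                    stringsAsFactors = FALSE)

# Inner join: keep only participants who appear in both waves.
# by.x / by.y handle the different spelling of the ID column.
both <- merge(wave1, wave4, by.x = "AID", by.y = "aid")
print(both)
```

Adding `all.x = TRUE` would instead keep every row of the first data frame, with `NA` for participants missing from the second.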
Saving data: If you’re using classroom computers, use Flashdrives or UNM’s OneDrive (available in LoboMail) for saving files.  The CTLB computers do not connect to your standard UNM drive space. I recommend using a very systematic folder structure, such as ADA1/HW, ADA1/Class, ADA1/Reading, ADA1/Poster, etc.  Do not just work on files in your downloads folder or your desktop; respect your data and code!
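One way to set up the suggested folder structure is from R itself. This is a minimal sketch that builds the folders under a temporary directory so it is safe to run as is. The variable names are illustrative, and you would replace `tempdir()` with the location you actually want (an assumption about your setup).

```r
# Root of the course folder; tempdir() is used here only so the sketch is harmless
course_root <- file.path(tempdir(), "ADA1")
subfolders  <- c("HW", "Class", "Reading", "Poster")

# Create each subfolder (and the root, via recursive = TRUE) if it does not exist
for (folder in file.path(course_root, subfolders)) {
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
}

# List what was created
list.dirs(course_root, full.names = FALSE, recursive = FALSE)
```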
Unicode compile problems:  If you knit to pdf you may get this error: “! Package inputenc Error: Unicode char”.  ASCII is the small character set that we program in; Unicode is an extended character set that looks pretty (for example, "straight quotes" become “curly quotes”) but causes code to break.  You get unwanted Unicode when you copy/paste from a pdf or some other source into your code.  To fix this, you have to find the Unicode and replace it with its ASCII equivalent.  To do this: Ctrl-F to find, search for “[^\x00-\x7F]” (without quotes), select “Regex” for regular expressions, and find the “Next” one.  As it finds instances, replace the characters manually until there are no more.  These characters will typically be curly quotes or fancy dashes.
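The same search can be scripted. Here is a minimal sketch, assuming your code has been read into a character vector (for a real file, `readLines()` would do that). The example lines are made up, and the replacement list covers only the usual offenders named above.

```r
# Two made-up lines of code: the first has curly quotes pasted from elsewhere
code_lines <- c(
  "x <- c(\u201cnumeric\u201d, \u201cinteger\u201d)",
  "y <- c(\"numeric\", \"integer\")"
)

# TRUE for lines containing any character outside the ASCII range
has_non_ascii <- grepl("[^\\x00-\\x7F]", code_lines, perl = TRUE)
which(has_non_ascii)   # line numbers that need attention

# Replace typical offenders with their ASCII equivalents
fixed <- gsub("[\u201c\u201d]", "\"", code_lines, perl = TRUE)  # curly double quotes
fixed <- gsub("[\u2018\u2019]", "'", fixed, perl = TRUE)        # curly single quotes
fixed <- gsub("[\u2013\u2014]", "-", fixed, perl = TRUE)        # en and em dashes
```

Anything still flagged after these replacements is best fixed by hand, as described above.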

Random stuff

UNM has a license for free online access to the definitive books for the Lattice and ggplot2 graphing platforms. Note you must be on campus or logged in through the UNM proxy to access these.
R is currently available (2016) in these UNM locations: DSH 141 and 143, Econ 1004, SMLC pods, and SUB IT-LoboLab Pod and IT-LoboLab Classroom.
R style matters. There is a lot of online help on R, such as at UCLA, try-r, and Google’s Intro to R video series. Usually, try searching for “R [mytopic]” and you’ll get lots of results.
  • ggplot2 plotting cookbook.
  • R reference card by Jonathan Baron.
  • Translate between MATLAB and R.
  • Figure checklist.  Choosing the right chart.  Nature Methods points of view on visualization.
  • Statistical consulting and collaboration slides.
  • Raster vs vector graphics. Image formats: vector (pdf, eps) vs raster (jpeg, bmp, tiff, gif).
  • Statistics pre-req refresher from Khan Academy.
  • Coursera has a free 4-week course on computing for data analysis with R.
  • Muddy points in perspective.
  • R+LaTeX+knitr for reproducible research.  See my SC1 lecture notes (Ch01), and Mohammad Arbabshirani’s notes (pdf, rnw).
  • Knitr in Rstudio (knitr is a modern version of Sweave: intro, demo, guide).
  • xtable to produce a LaTeX tabular environment from R data.frames.
  • Cookbook for R for helpful examples, visualization tutorials, diagrams.
  • Help: LaTeX wiki, lshort, Detexify LaTeX symbols (linux texlive package management).
  • R tutorials: TryR (gentle), Kelly Black.
  • Asking smart questions: “Smart Questions” guide (note “hackers build things, crackers break them”).
Email Question Rubric:
  • Send one email per question. Use “Reply” to continue conversation on a question; send a new email for a new question.
  • Include “ADA1” as the first word of the subject line in new emails (if replying, just use reply).
  • Begin email with a short question summary.
  • When possible, include commented code in email body. Comments should indicate where the problem is, what the expected behavior is, and what steps are necessary to reproduce the problem. Code should include a “Minimum representative test case” (http://www.catb.org/esr/faqs/smart-questions.html#code).
  • If attaching code, please include all the files necessary to run your code (data, etc.).

Why stats now?

140,000 analysts needed. Important enough to have a US Chief Data Scientist (1) (2).

Table of selected statistical methods

The data and design determine which method you use: original or UCLA. Here’s a table of methods with the applicable semester of ADA and Chapter.
Number of Dependent Variables Number of Independent Variables Type of Dependent Variable(s) Type of Independent Variable(s) Measure Test(s) ADA-Ch
1 0 (1 population) continuous normal not applicable (none) mean one-sample t-test 1-02
continuous non-normal median one-sample median 1-06
categorical proportions Chi Square goodness-of-fit, binomial test 1-07
1 (2 independent populations) normal 2 categories mean 2 independent sample t-test 1-03
non-normal medians Mann Whitney, Wilcoxon rank sum test 1-06
categorical proportions Chi square test Fisher’s Exact test 1-07
0 (1 population measured twice) or 1 (2 matched populations) normal not applicable/ categorical means paired t-test 1-02
non-normal medians Wilcoxon signed ranks test 1-06
categorical proportions McNemar, Chi-square test 1-07
1 (3 or more populations) normal categorical means one-way ANOVA 1-05
non-normal medians Kruskal Wallis 1-06
categorical proportions Chi square test 1-07
2 or more (e.g., 2-way ANOVA) normal categorical means Factorial ANOVA 2-05
non-normal medians Friedman test not
categorical proportions log-linear, logistic regression 2-11
0 (1 population measured 3 or more times) normal not applicable means Repeated measures ANOVA not
1 normal continuous correlation, simple linear regression 1-08
non-normal non-parametric correlation 1-08
categorical categorical or continuous logistic regression 2-11
continuous discriminant analysis 2-16
2 or more normal continuous multiple linear regression 2-02
non-normal
categorical logistic regression 2-11
normal mixed categorical and continuous Analysis of Covariance, General Linear Models (regression) 2-09
non-normal not
categorical logistic regression 2-11
2 2 or more normal categorical MANOVA 2-15
2 or more 2 or more normal continuous multivariate multiple linear regression not
2 sets of 2 or more 0 normal not applicable canonical correlation not
2 or more 0 normal not applicable factor analysis not
0 or more mixed categorical and continuous principal component analysis (w/multiple regression) 2-13
categorical cluster analysis 2-13
discriminant analysis 2-16
classification 2-17
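To show how a few rows of the table map to R functions, here is a minimal sketch on simulated data. The data, the seed, and the object names are hypothetical. The functions (`t.test()`, `wilcox.test()`, `chisq.test()`, `lm()`) are base R, and the chapter labels follow the ADA-Ch column above.

```r
set.seed(427)  # arbitrary seed so the simulated data are repeatable

# Simulated data: one continuous response measured in two groups
group <- factor(rep(c("A", "B"), each = 20))
y     <- c(rnorm(20, mean = 10, sd = 2), rnorm(20, mean = 12, sd = 2))

# 1 dependent, 0 independent, normal: one-sample t-test (1-02)
t_one <- t.test(y[group == "A"], mu = 10)

# 1 dependent, 1 independent with 2 categories, normal: two-sample t-test (1-03)
t_two <- t.test(y ~ group)

# Same design, non-normal: Wilcoxon rank sum (Mann-Whitney) test (1-06)
w_two <- wilcox.test(y ~ group)

# Categorical response, 1 population: chi-square goodness-of-fit (1-07)
counts <- c(red = 18, green = 22, blue = 20)
gof    <- chisq.test(counts, p = c(1, 1, 1) / 3)

# 1 continuous independent variable, normal: simple linear regression (1-08)
x   <- runif(40, min = 0, max = 10)
fit <- lm(y ~ x)

# Collect the p-values from the four tests
c(one_sample = t_one$p.value, two_sample = t_two$p.value,
  rank_sum = w_two$p.value, gof = gof$p.value)
```

The data and design determine the row of the table, and the row determines which of these calls applies.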

Acumen in Statistics