UNM Stat 145 special: Statistics for Research (S4R)
Syllabus is below timetable.
Our goal is to increase the number and diversity of students exposed to meaningful and empowering data analysis experiences and to inspire the pursuit of advanced data-driven experiences and opportunities for everyone! Learn to produce beautiful (markdown) and reproducible (knitr) reports with informative plots (ggplot2) and tables (kable) by writing code (R, tidyverse, RStudio) to answer questions using fundamental statistical methods (all one- and two-variable methods), which you’ll be proud to present (poster).
Information about the coming week will appear here if necessary; usually there won’t be any.
Did you receive a registration error for Spring 2019? Send me an email with the following answers:
1. What registration error did you get (copy/paste is best)?
2. What is your UNM ID?
3. What is your Math/Stat background (that is, do you have the pre-reqs)?
If you are waitlisted, as long as there are seats available I will override you into the course. Don’t worry.
Step 0: Before our first class (Tue 1/15) please read through the following actions and install the required software on your computer and complete the brief surveys. If you don’t have a computer, there are classroom computers which will be available only when the classroom is open.
- Install R and RStudio:
- R for programming
- RStudio Desktop for a better R experience
- Installers at bottom, choose Windows or Mac OSX.
- Videos that may be helpful for installation:
- Install R packages (copy/paste CODE into console and press [Enter]; this may take 20-30 minutes), also update all packages within RStudio.
- Install Zotero or Mendeley (recommended for your own laptop) for bibliography management.
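The package-install step above might look something like the following sketch. The actual CODE linked from this page is not reproduced here: the package list is an assumption drawn from the tools named in the course goals (tidyverse, which includes ggplot2, plus knitr and rmarkdown), and `find_missing_packages` is a hypothetical helper name.

```r
# Packages assumed from the course goals; the official list is in the linked CODE.
course_packages <- c("tidyverse", "knitr", "rmarkdown")

# Hypothetical helper: return the packages in `pkgs` that are not yet installed.
find_missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

missing_packages <- find_missing_packages(course_packages)

# Install only what is missing, and only when run from an interactive console.
if (interactive() && length(missing_packages) > 0) {
  install.packages(missing_packages)
}

# To update already-installed packages as well, uncomment the next line:
# update.packages(ask = FALSE)
```

Checking for missing packages first keeps a second run short, since only packages that are not yet installed are downloaded.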
Course book and videos
Passion Driven Statistics (PDS) data
I encourage you to use one of the AddHealth datasets or NESARC. Use AddHealth W1 if you want to understand adolescents when they were young and AddHealth W4 if you want to understand adult relationships. NESARC is also interesting for substance abuse issues.
Data available in the PDS package with the command: library(PDS).
- AddHealthW1 Sampling Design, Codebook, RData. Adolescents when they were young, unique ID “AID”.
- AddHealthW4 Sampling Design, Codebook, RData. Same adolescents when they were older, unique ID “aid”.
- NESARC Sampling Design, Codebook, RData. Alcohol abuse and related conditions, unique ID “IDNUM”.
- OutlookOnLife Sampling Design, Codebook, RData. Interesting data, but not enough quantitative variables to use, unique ID “CASEID”.
- GapMinder Sampling Design, Codebook, RData. Country data, but it’s complicated to interpret the average of large and small countries, unique ID “country”.
- Additional data sources
Weekly structure (also see Assessment below)
- Pre-class (Tuesday): Reading, Video, Quiz (due before class — two attempts, higher score used, and solutions become available Tue 3:30pm after the quiz is due)
- In-class: Activities in class Tuesday and Thursday. Tuesday’s assignment is due by Thursday 3:30pm of the same week, submitted to UNM Learn (evaluated by TA within 1 week). Often finished in class.
- Post-class (Thursday): Thursday’s assignment will be left to complete as Homework (due following Thursday by 3:30pm). Occasionally, finished in class, usually not.
- UNM Learn for quizzes and submitting in-class and homework assignments.
- Erik: M/T 13:00-14:00, and by appointment in SMLC 312
- Kelli: M 11-12, W 10-12 in SMLC 306
- Leah: M/W 3-4 in SMLC 319
|Wk-Date||Cl||Topic||Reading, Video, Quiz
|In-class Worksheet, Homework, Data|
|00-01/15||00||Install software, survey||Step 0 – software install
Complete the Learning Studio Opinio pre-semester survey required for classroom assessment.
Dierker Pre-survey (sent by email end of first week)
|01-01/15||01||Intro, RStudio and RMarkdown, poster|
01 Personal codebook Rmd html
Choose from PDS datasets
Ch 04 ZoteroRMarkdown\References.Rmd
Ch 04 ZoteroRMDexample\KCV.Rmd and KCV.bib
In-class: Rmd html
Turn in one citation to a research question.
02 Literature review Rmd html bib
(While we won’t be doing a research proposal as part of this class, if we were covering more on research methods, then we might continue with a short research proposal (Rmd html).)
||ADA1_WS_03_ResearchQuestions.Rmd Rmd html
Turn in one question of variable association.
(UNM Google Scholar)
In-class: Rmd html
Look at datasets in R, create subset of data, rename variables, numerical summaries.
Ch 04 LiteratureReview\LiteratureReview.Rmd and LiteratureReview.bib
Ch 04 LiteratureReview\AnotherExample\AnotherExample.Rmd and references MultipleReferences\*.bib Rmd html
Turn in one citation to a research question.
02 Literature review Rmd html bib
In-class: Rmd html
Univariate plots of numerical and categorical variables.
03 Data subset, univariate summaries and plots Rmd html
(See the link above the table “Erik’s NESARC data, nicotine and depression”.)
|04-02/05||07||Writing About Empirical Research||Read:
PDS Ch 5 Writing About Empirical Research
Video:
PDS Quiz 05 Writing About Empirical Research
Optional Read:
Rep-Res-Book Ch 04
Rep-Res-Book Ch 05
read:
PDS Ch 9,
Ch 00 R,
Ch 11 R; video: 11-1; quiz:
|Research Plan Rmd html.
Ch 05 Assignment ResearchProposal\ResearchProposal.Rmd and ResearchProposal.bib
Ch 05 Assignment ResearchProposal\GradingRubric.Rmd
Rubric STT2810ClassRepo-gh-pages\CoursePacing\GenericPacing.html
In-class: Rmd html
Complete at least one bivariate coding relationship.
|05-02/12||09||Working With Data, Data Management||Read:
PDS Ch 6 Working with Data
PDS Ch 7 Data Management
Video:
PDS Video: 04. Working with Data
PDS Video: 05. Data Management
Quiz:
PDS Quiz 06 Working With Data
PDS Quiz 07 Data Management
Optional Read:
Rep-Res-Book Ch 06
read:
Ch 8.4, 8.2 R; video: 08-1 corr/log, 08-3 LS reg eq; quiz:
|For 2 weeks:
Ch 07: DataManagement
Addhealth example DataManagementAssignment\DataManagementExample.Rmd
Addhealth example, a little more DataManagementAssignment\DataManagementStuff.Rmd
NESARC all parts DataManagementAssignment\DataManagementTemplate.Rmd
NESARC some parts, different from Template DataManagementAssignment\NESARCcommands.Rmd
ADA1 DataManagementStuff_AddHealth_Gapminder.Rmd
The Data Management Assignment is challenging and will require many hours of your time, so please start now, not next week when it is due.
Directions: Select three secondary categorical variables from the data set you are using. Recode your data to create factors with appropriate labels. Create a barplot for each of your factors, and write a sentence or two explaining each barplot. Show all R code used to manage your data and used to create your barplots with R code chunks in your *.Rmd file. Submit the resulting *.html file.
Look at datasets in R, create subset of data, rename variables, numerical summaries.
In-class: Rmd html dat
Build intuition using SLR App, interpret properties of linear regression fit.
|05-02/14||10||Subsetting data and R Programming||video:
ADA1 Data subsetting
|03 Data subset, univariate summaries and plots Rmd html
(See the link above the table “Erik’s NESARC data, nicotine and depression”.)
In-class: Rmd html dat
Plot, transform, plot, and interpret.
05 Rmd html
Rep-Res-Book Ch 07
read:
Ch 8.1, 8.3.1 R,
Ch 7.5.1 only sections on “conditional probability” and the following example R; video: 08-1 corr/log, 08-2 corr hyp test, 07-4 two prop & cond prob; quiz:
quiz 06b, Guess Ages (for next in-class)
PDS Ch 8 Graphing: One Variable at a Time
Video:
PDS Video: 06. Graphing: One Variable at a Time
Quiz:
PDS Quiz 08a Frequency Tables
PDS Quiz 08b Graphing Variables
Optional Read:
Rep-Res-Book Ch 08
Ch 08 Categorical Graphs\CategoricalGraphs.Rmd
Ch 08 Categorical Graphs\Graphs.Rmd
Ch 08 Categorical Graphs\SH.Rmd
Ch 8.3 Mean and Var calculations Statistics\MeanVarianceRV.Rmd
Rmd html Univariate plots of numerical and categorical variables.
In-class: Rmd html
Guess Ages, Legos.
(Legos part 2 Rmd html dat, diagram).
BBC Radio 4: More or Less, “sampling” 9 min audio
PDS Ch 9 Graphing Relationships
Video:
PDS Video: 07. Graphing Relationships
Quiz:
PDS Quiz 09 Graphing Relationships
Optional Read:
Rep-Res-Book Ch 09
|Ch 09 Bivariate Graphs Graphs\BivariateMultivariateGraphs.Rmd
Ch 09 Create numerical variable, plot ReadingData\Exercise.Rmd
Rubric STT2810ClassRepo-gh-pages\CoursePacing\GenericPacing.html
Rmd html Complete at least one bivariate coding relationship.
In-class: Rmd html
one- and two-sample tests using data we collected in class.
PDS Ch 10 Hypothesis Testing
Video:
PDS Video: 08. Hypothesis Testing
Ch 10 Sampling distributions Statistics\SamplingDistributions.Rmd
Ch 10 Project template Statistics\StatisticsTemplate.Rmd
Quiz:
PDS Quiz 10 Hypothesis Testing
Optional Read:
Rep-Res-Book Ch 10
|Ch 10 fake data example Statistics\HypothesesTesting.Rmd
ADA1_HW_08_Inference_HypTestOneTwoSam.Rmd
Rmd html one- and two-sample tests using data we collected in class.
In-class: Rmd html
NP one-sample tests and CIs, and ANOVA with pairwise comparisons.
|10-03/21||18||Rmd html Hypothesis testing (one- and two-sample)|
PDS Ch 11 Analysis of Variance
Video:
PDS Video: 09. Analysis of Variance
Ch 11 Video Statistics\ANOVA.Rmd
Quiz:
PDS Quiz 11 ANOVA
Optional Read:
Rep-Res-Book Ch 11
|Ch 11 ANOVA example Practice\Practice1.Rmd
Rubric STT2810ClassRepo-gh-pages\CoursePacing\GenericPacing.html
Rmd html ANOVA, model assumptions, and paired comparisons.
In-class: Rmd html dat
PDS Ch 12 Chi-Square Test of Independence
Video:
PDS Video: 10. Chi-Square Test of Independence
Ch 12 Video Statistics\ChiSquare.Rmd
Quiz:
PDS Quiz 12 Chi Square
Optional Read:
Rep-Res-Book Ch 12
Rubric STT2810ClassRepo-gh-pages\CoursePacing\GenericPacing.html
Rmd html dat Popular kids.
06 Rmd html
In-class: Rmd html
AddHealth W4 Pregnancy.
Summary of Methods we’ve covered
|13-04/09||23||Correlation and Interactions||Read:
PDS Ch 13 Correlation Coefficient
PDS Ch 14 Moderation
Video:
PDS Video: 11. Correlation
PDS Video: 12. Moderation
Ch 14 Moderation\Moderation.Rmd
Quiz:
PDS Quiz 13 Correlation
PDS Quiz 14 Exploring Moderation
Optional Read:
Rep-Res-Book Ch 13
Ch 13: Correlation\Correlation.Rmd
Rubric STT2810ClassRepo-gh-pages\CoursePacing\GenericPacing.html
Rmd html Data collection (hand span and word memory), correlation, regression to the mean.
06 Rmd html
In-class: Rmd html
Key statistical principles, ethics.
With additional time, clarify which research questions you’ll present in your poster with a peer mentor. (Null results are ok!)
Statistics is about communication, including writing and presenting.
Work on poster
Ch 18 S4R_Content\assess\poster\*.pptx for poster template (need to modify)
Work on poster
In-class: Rmd html
Work on designing poster content at the bottom of your HW document.
13 Rmd html
Work on your poster content.
Try to complete your poster planning in your HW document.
PDS Ch 15 Linear Regression: Summarizing the Pattern of the Data with a Line
PDS Ch 17 Confounding and Multivariate Models
Video:
PDS Video: 13. The Question of Causation
PDS Video: 14. Multivariate Models and Confounding
Ch 15 Reg and Logistic Reg plotting Regression\Regression.Rmd
Quiz:
PDS Quiz 15 Regression
PDS Quiz 17 Confounding
Optional Read:
Rep-Res-Book Ch 14
|Ch 15 Reg example Practice\Practice1.Rmd
Ch 15 Leverage examples Statistics\LevInf.Rmd
ADA1_WS_21_SimpleLinearRegression.Rmd
Rmd html Regression of height vs hand span using data from our class.
11 Rmd html
Rmd html dat
Build intuition using SLR App, interpret properties of linear regression fit.
poster template
pdf, Rnw, sty, bib, logo
Prof Erhardt’s example poster
Work on poster
|15-04/23||27||Sampling and Designing Studies, Poster Presentation||Read:
PDS Ch 16 Sampling and Designing Studies
PDS Ch 18 Poster Presentation
Video:
PDS Video: 15. Writing Your Poster Presentation
Quiz:
Work on poster
Work on poster
In-class: Course evaluations, submit receipt (capture screen image) as in-class assignment.
See email for more details.
Due next Wednesday 12/7. Complete and submit your poster in LaTeX pdf format.
|15-04/25||28||Work on poster|
|16-04/30||29||Complete the Learning Studio Opinio post-semester survey required for classroom assessment.
Survey Dierker Post
|Work on poster
|16-05/02||30||POSTERS||Poster sessions in SMLC Atrium||Poster presentation
Poster Schedule (be on time):
Congratulations on a great semester!
|17-05/06||Finals week||No final!|
(I reserve the right to continue to improve the materials throughout the semester.)
Description: Techniques for the visual presentation of numerical data, descriptive statistics, introduction to probability and basic probability models used in statistics, introduction to sampling and statistical inference, illustrated by examples from a variety of fields. In this special Statistics for Research (S4R) version, we will emphasize the skills of data analysis, visualization, and communication for undergraduate research.
Prerequisite: See UNM catalog
Semesters offered: Spring 2019
Lecture: Spring 2019 schedule
TR 1530-1645, CTLB 330, Stat 145.014, CRN 30479
Location: CTLB 330 (building 55, northeast of Zimmerman) Video
Office hours: Mon/Tue 13:00-14:00, and by appointment in SMLC 312
email: “Erik B. Erhardt” <email@example.com>, please include “S4R” with a descriptive subject line, such as “S4R Homework 02 plot”
Textbook: Required custom book is available for free on this webpage: Passion Driven Statistics.
Laptops running R: I encourage you to bring a laptop to class each day so you can try the R programming exercises in class. If you don’t have one, no problem, there are some laptops in class and teamwork is encouraged — sit next to someone friendly who likes to share.
Saving data: If you’re using classroom computers, use flash drives or UNM’s OneDrive (available in LoboMail) for saving files. The CTLB computers do not connect to your standard UNM drive space.
Teaching Assistants and Peer Mentors
Stat grad student TAs
Kelli Kasper <firstname.lastname@example.org>, office hours M 11-12, W 10-12 in SMLC 306.
Peer Mentors, SEP
Leah Puglisi <email@example.com>, former student, office hours M W 3-4 in SMLC 319.
Student learning outcomes
- Students will learn to use a reproducible research workflow.
- Students will improve their technology expertise.
- Students will learn to work with large data sets.
- Students will learn to create and present graphs for both univariate and multivariate data.
- Students will learn how to construct and test hypotheses.
Meeting the learning outcomes
You will acquire new information in this class, but the emphasis is comprehending, integrating, and applying information. Rote factual memorization is the lowest form of learning. Effective learning takes place by explaining, integrating, applying, and analyzing facts, hypotheses, and theories.
Learning in this class occurs by:
- Doing – completion of exercises that require analysis of data to answer questions and test hypotheses, or researching answers to reading assignments.
- Discussion – interaction with classmates to assemble and synthesize information, utilizing the collective skills and knowledge base of the group.
- Listening, acting, and reflecting – activities during class time provide insights into information not available in readings and include review of difficult material to aid comprehension. Note taking permits later reflection on lecture content. Listening to the professor lecture is the least effective learning tool for students, however, so you should plan on coming to every class prepared to participate in active and reflective learning opportunities.
- Quizzes are due each Tuesday before class. Purpose: to assess reading and video comprehension and assure you’re prepared to actively participate in class activities with minimal lecture. (There are 17, 20% of final grade, the lowest 2 are dropped.) Most weeks plan for 1 hour reading and video with a 20 minute quiz. Quizzes are not timed, they can be taken twice, and the higher of the two scores is used for grade calculation.
- Viewing quiz solutions after the due date in UNM Learn is not intuitive. Click on the “Begin” button (this is the non-intuitive part, since you are not actually beginning the quiz), then click “View All Attempts” to see the scores. Finally, click “Calculated Grade” to see the feedback for each question of the quiz.
- In-class assignments are assigned each Tuesday and due before the Thursday class at 3:30pm, submitted to UNM Learn. Purpose: to struggle and find success in class with the concepts and skills. (About 12, includes class participation, 20% of final grade, the lowest 2 are dropped.) Most weeks plan to finish in class.
- Homework (HW) assignments are assigned each Thursday and due the following Thursday, submitted to UNM Learn. Purpose: to apply concepts and skills to your class poster project. (About 12, 40% of final grade, the lowest few are dropped.) Most weeks plan on 1-4 hours per assignment with a substantial start in class.
- Poster will be developed throughout the semester (most HW assignments contribute to the poster); in the last couple of weeks we’ll complete them, and in the last week we’ll have poster presentations. Purpose: to have an overarching set of questions to answer using methods learned in the course, with a deliverable you can be proud of! (1 poster and presentation; of the final grade, 12% is the poster, 2% the presentation, and 2% your evaluations of others.) In the last couple of weeks, assembling this poster may take 5-10 hours using a template provided to you.
- Course surveys are due at the beginning and end of the course. Purpose: to participate in national project-based learning projects and improve the course. (About 4, 4% of final grade.)
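To see how the weights and drop rules above combine, here is a rough sketch of the arithmetic. It is an illustration under stated assumptions, not the official grade calculation: all scores are invented, the function names are hypothetical, and "the lowest few" homework scores is assumed to mean two.

```r
# Weights as stated in the list above (they sum to 100%).
weights <- c(quizzes = 0.20, in_class = 0.20, homework = 0.40,
             poster = 0.12, presentation = 0.02, peer_evaluations = 0.02,
             surveys = 0.04)

# Average a set of scores (0-100 scale) after dropping the n_drop lowest.
mean_after_drop <- function(scores, n_drop) {
  n_keep <- length(scores) - n_drop
  mean(tail(sort(scores), n_keep))
}

# Weighted final grade from a named vector of component averages (0-100 scale).
final_grade <- function(components, weights) {
  stopifnot(setequal(names(components), names(weights)))
  sum(components[names(weights)] * weights)
}

# Hypothetical student. Each quiz allows two attempts; the higher score counts.
quiz_attempt_1 <- c(60, 80, 0, 90, 70)
quiz_attempt_2 <- c(75, 70, 50, 95, NA)   # NA = second attempt not taken
quiz_best <- pmax(quiz_attempt_1, quiz_attempt_2, na.rm = TRUE)

components <- c(
  quizzes          = mean_after_drop(quiz_best, n_drop = 2),
  in_class         = mean_after_drop(c(100, 90, 0, 85, 95), n_drop = 2),
  homework         = mean_after_drop(c(88, 92, 70, 95), n_drop = 2),  # "few" assumed = 2
  poster           = 90,
  presentation     = 95,
  peer_evaluations = 100,
  surveys          = 100
)

print(round(final_grade(components, weights), 1))
```

Because homework carries 40% of the grade, a dropped low homework score moves the total far more than a dropped quiz does.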
All assignments in this class are electronic and submitted to UNM Learn for grading.
Late assignments will not be accepted.
All R code for the assignment should be included with the part of the problem it addresses (for code and output use a fixed-width font, such as Courier); this will happen automatically by using RMarkdown.
Do NOT use your R code and output as your answer to the problem, but include them to show me how you arrived at your answer. Your prose solution should be provided to interpret the output. Output without explanation will not be given credit.
Collaboration and citation
For homeworks I encourage you to work together. Please discuss the data, code, and problems with one another, but do your own exploration and write up. We expect everyone to hand in substantially different homeworks, and we will enforce this under the honor code. The small benefit you might get from plagiarism is not worth the severe penalty (of lost trust, being reported to the dean, no points for the assignment, etc.).
As in life, please use any resources available to you. Projects and some homeworks will explicitly encourage you to use resources on the internet, but showing extra initiative will always be appreciated. You may find R programming tough at first, so feel free to discuss your problems with other classmates or meet with or email questions to the TAs or me. Meeting in person is often much more productive than questions by email. If emailing, include your Rmd file and any required files (such as your .bib file) and a description of what you’re trying to do and where your error or trouble is.
I encourage you to use the ideas of others, but make them your own, giving credit. For projects have a formal bibliography, for homework cite casually, and for code simply copy the URL in as a comment (which is doubly helpful to you for finding the resource again).
I will follow the UNM absences policy with two unexcused absences. This means I can drop you from the class if you have a third absence; this paragraph serves as your warning. No one wants that, but I have found that I need to take attendance in freshman courses otherwise this policy is abused. If we all respect ourselves and each other then we won’t need attendance sheets and you’ll all achieve more.
If you have a documented disability that will impact your work in this class, please contact me to discuss your needs. You’ll also need to register with the Accessibility Resource Center in 2021 Mesa Vista Hall (building 56) across the courtyard east from the SUB.
Title IX statement
In an effort to meet obligations under Title IX, UNM faculty, Teaching Assistants, and Graduate Assistants are considered “responsible employees” by the Department of Education (see pg 15). This designation requires that any report of gender discrimination which includes sexual harassment, sexual misconduct and sexual violence made to a faculty member, TA, or GA must be reported to the Title IX Coordinator at the Office of Equal Opportunity. For more information, see the campus policy regarding sexual misconduct.
Citizenship and/or Immigration Status
All students are welcome in this class regardless of citizenship, residency, or immigration status. Your professor will respect your privacy if you choose to disclose your status. As for all students in the class, family emergency-related absences are normally excused with reasonable notice to the professor, as noted in the attendance guidelines above. UNM as an institution has made a core commitment to the success of all our students, including members of our undocumented community. The Administration’s welcome is found on the UNM website: http://undocumented.unm.edu/.
We’re doing this because:
- We want you to be empowered with statistics.
- We believe everyone should get out of this course with awesome skills.
- Real-time feedback promotes efficient learning.
“It encourages me to engage actively with the course material and take responsibility for my learning.”
Our six recommendations include the following:
- Teach statistical thinking.
  - Teach statistics as an investigative process of problem-solving and decision-making.
  - Give students experience with multivariable thinking.
- Focus on conceptual understanding.
- Integrate real data with a context and purpose.
- Foster active learning.
- Use technology to explore concepts and analyze data.
- Use assessments to improve and evaluate student learning.
Learning without thought is labor lost.
What I hear, I forget.
What I see, I remember.
What I do, I understand.
Saving data: If you’re using classroom computers, use flash drives or UNM’s OneDrive (available in LoboMail) for saving files. The CTLB computers do not connect to your standard UNM drive space. I recommend using a very systematic folder structure, such as S4R/HW, S4R/Class, S4R/Reading, S4R/Poster, etc. Do not just work on files in your downloads folder or on your desktop; respect your data and code!
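One way to set up the folder structure suggested above is sketched below. `create_course_folders` is a hypothetical helper name, and the demonstration writes to a temporary directory; where you keep the real folder is your choice.

```r
# Create a course folder with one subfolder per kind of work.
# Subfolder names follow the S4R/HW, S4R/Class, ... suggestion above.
create_course_folders <- function(root,
                                  subfolders = c("HW", "Class", "Reading", "Poster")) {
  paths <- file.path(root, subfolders)
  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Demonstration in a temporary directory so nothing is written to your real files.
demo_root <- file.path(tempdir(), "S4R")
demo_paths <- create_course_folders(demo_root)
print(basename(demo_paths))

# On your own machine you might instead run (the location is up to you):
# create_course_folders("~/S4R")
```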
Unicode compile problems: If you knit to pdf you may get this error: “! Package inputenc Error: Unicode char”. ASCII is the small character set that we use to program in; Unicode is an extended character set that looks pretty (for example, “straight quotes” become “curly quotes”) but causes code to break. You get unwanted Unicode when you copy/paste from a pdf or some other source into your code. To fix this, you have to find each Unicode character and replace it with its ASCII equivalent. To do this: press Ctrl-F to find, search for “[^\x00-\x7F]” (without quotes), select “Regex” for regular expressions, and find the “Next” one. As it finds instances, replace the characters manually until there are no more. These characters will typically be curly quotes or fancy dashes.
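If you would rather get a list of the offending lines all at once, the same check can be sketched in R. This is an alternative to the editor search described above, not part of the course materials: it tests character codes above 127 instead of using the regular expression, and the function and file names are hypothetical.

```r
# TRUE if the string contains any character outside the ASCII range 0-127.
has_non_ascii <- function(s) {
  codes <- utf8ToInt(s)
  anyNA(codes) || any(codes > 127L)   # NA means the text is not valid UTF-8
}

# Return the line numbers that contain at least one non-ASCII character.
find_non_ascii_lines <- function(lines) {
  which(vapply(lines, has_non_ascii, logical(1), USE.NAMES = FALSE))
}

# Example lines; \u201c, \u201d, and \u2013 are curly quotes and an en dash.
example_lines <- c(
  "x <- 1",
  "cat(\u201chello\u201d)",
  "y <- 2  # plain ASCII",
  "a \u2013 b"
)
print(find_non_ascii_lines(example_lines))

# For a real file (hypothetical name), read it first:
# find_non_ascii_lines(readLines("my_homework.Rmd", encoding = "UTF-8", warn = FALSE))
```

The function only reports line numbers; the replacement itself is still done by hand, as described above.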