S4R S19

UNM Stat 145 special: Statistics for Research (S4R)

Spring 2019
Syllabus is below timetable.

Spring 2019 schedule
TR 1530-1645, CTLB 330, Stat 145.014, CRN 30479


Goal

This Is Statistics

Our goal is to increase the number and diversity of students exposed to meaningful and empowering data analysis experiences and to inspire the pursuit of advanced data-driven experiences and opportunities for everyone! Learn to produce beautiful (markdown) and reproducible (knitr) reports with informative plots (ggplot2) and tables (kable) by writing code (R, tidyverse, RStudio) to answer questions using fundamental statistical methods (all one- and two-variable methods), which you’ll be proud to present (poster).


News

Information about the coming week will appear here if necessary; usually there won’t be any.


Course content

Course book and videos

Book: Passion Driven Statistics

Erik’s example assignment document
NESARC data, nicotine and depression: .Rmd + .bib = .html

Passion Driven Statistics (PDS) data

I encourage you to use one of the AddHealth datasets or NESARC.  Use AddHealth W1 if you want to understand adolescents when they were young and AddHealth W4 if you want to understand adult relationships.  NESARC is also interesting for substance abuse issues.

Data available in the PDS package with the command: library(PDS).
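To see what comes with the package once it is loaded, something like the following should work. This is a minimal sketch that assumes PDS is already installed on your machine; the variable names `pds_available` and `pds_data` are only for illustration.

```r
# Check whether the PDS package is installed before trying to load it.
pds_available <- requireNamespace("PDS", quietly = TRUE)

if (pds_available) {
  library(PDS)
  # List the datasets bundled with the package, with their titles.
  pds_data <- data(package = "PDS")$results[, c("Item", "Title"), drop = FALSE]
  print(pds_data)
} else {
  message("PDS is not installed yet; install it first, then rerun this.")
}
```

Each name in the Item column can then be used as a data frame after library(PDS).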


Weekly structure (also see Assessment below)

  1. Pre-class (Tuesday): Reading, Video, Quiz (due before class — two attempts, higher score used, and solutions become available Tue 3:30pm after the quiz is due)
  2. In-class: Activities in class Tuesday and Thursday.  Tuesday’s assignment is due by Thursday 3:30pm of the same week, submitted to UNM Learn (evaluated by TA within 1 week).  Often finished in class.
  3. Post-class (Thursday): Thursday’s assignment will be left to complete as Homework (due following Thursday by 3:30pm).  Occasionally, finished in class, usually not.
  • UNM Learn for quizzes and submitting in-class and homework assignments.

Office hours

  • Erik: M/T 13:00-14:00, and by appointment in SMLC 312
  • Kelli: M 11-12, W 10-12 in SMLC 306
  • Leah: M/W 3-4 in SMLC 319

Timetable

Wk-Date Cl Topic Reading, Video, Quiz
Week’s Preparation
In-class Worksheet, Homework, Data
00-01/15 00 Install software, survey Step 0 – software install

Complete the Learning Studio Opinio pre-semester survey required for classroom assessment.

Dierker Pre-survey (sent by email end of first week)

01-01/15 01 Intro, RStudio and RMarkdown, poster
  • Introduction
  • RStudio + RMarkdown
    • RStudio config:
    • Menu: Tools / Global options /
      • General / Save workspace: Never
      • Sweave / Weave Rnw: knitr
    • Disable notebooks
  • Datasets + Codebooks
  • Learn: 01 Intro to using RMarkdown: Rmd html
01-01/17 02 Rmd, codebook

 

02-01/22 03 Datasets, Codebooks,
Personal Codebook

 

02-01/24 04 Citations
  • (CAPS visit scheduled)

I recommend starting next week’s assignments over the weekend since the Literature Review can take a long time.

(UNM Google Scholar)

03-01/29 05 Research Questions
03-01/31 06 Literature Review  

 

04-02/05 07 Working With Data, Data Management
  • Read:
    PDS Ch 6 Working with Data
    PDS Ch 7 Data Management
  • Video:
    PDS Video: 04. Working with Data
    PDS Video: 05. Data Management
    ADA1 Subset variables
  • Quiz:
    PDS Quiz 06 Working With Data
    PDS Quiz 07 Data Management
04-02/07 08 Coding missing and factor labels
05-02/12 09 continued continued
05-02/14 10 continued continued
06-02/19 11 Graphing Univariate
  • Read:
    PDS Ch 8 Graphing: One Variable at a Time
  • Video:
    PDS Video: 06. Graphing: One Variable at a Time
  • Quiz:
    PDS Quiz 08a Frequency Tables
    PDS Quiz 08b Graphing Variables
06-02/21 12 continued  

 

continued

 

07-02/26 13 Graphing Bivariate
  • Read:
    PDS Ch 9 Graphing Relationships
  • Video:
    PDS Video: 07. Graphing Relationships
  • Quiz:
    PDS Quiz 09 Graphing Relationships
  • Optional Read:
    Rep-Res-Book Ch 09
    Ch 09 Bivariate Graphs Graphs\BivariateMultivariateGraphs.Rmd
    Ch 09 Create numerical variable, plot ReadingData\Exercise.Rmd
    ADA1_WS_07_PlottingBivariate.Rmd
    ADA1_HW_04_BivariatePlot_DataCleaning.Rmd
    Rubric STT2810ClassRepo-gh-pages\CoursePacing\GenericPacing.html
  • Rmd html Complete at least one bivariate coding relationship.
  • In-class: Rmd html
    one- and two-sample tests using data we collected in class.
07-02/28 14 continued continued

08 Rmd html

08-03/05 15 Sampling and Designing Studies PDS Ch 16 Sampling and Designing Studies Earth water, African countries in the UN example
ADA1_WS_15_HypTest_OneTwoSam.html#african-countries-in-the-un-example

In-class: Rmd html
Water on Earth.
07 Rmd html
PDS Data Sampling Designs:
AddHealth, OOL, NESARC

08-03/07 16
09-03/12 Spring Break
09-03/14 Spring Break
10-03/19 17 Hypothesis Testing
  • Read:
    PDS Ch 10 Hypothesis Testing
  • Video:
    PDS Video: 08. Hypothesis Testing
    Ch 10 Sampling distributions Statistics\SamplingDistributions.Rmd
    Ch 10 Project template Statistics\StatisticsTemplate.Rmd
  • Quiz:
    PDS Quiz 10 Hypothesis Testing
  • Optional Read:
    Rep-Res-Book Ch 10
    Ch 10 fake data example Statistics\HypothesesTesting.Rmd
    Rubric STT2810ClassRepo-gh-pages\CoursePacing\GenericPacing.html
    ADA1_WS_15_HypTest_OneTwoSam.Rmd
    ADA1_HW_08_Inference_HypTestOneTwoSam.Rmd
  • Rmd html one- and two-sample tests using data we collected in class.
  • In-class: Rmd html
    NP one-sample tests and CIs, and ANOVA with pairwise comparisons.
10-03/21 18 Rmd html Hypothesis testing (one- and two-sample)

In-class: Rmd html dat
Multinomial: World series number of games.
10 Rmd html

11-03/26 19 ANOVA
  • Read:
    PDS Ch 11 Analysis of Variance
  • Video:
    PDS Video: 09. Analysis of Variance
    Ch 11 Video Statistics\ANOVA.Rmd
  • Quiz:
    PDS Quiz 11 ANOVA
  • Optional Read:
    Rep-Res-Book Ch 11
    Ch 11 ANOVA example Practice\Practice1.Rmd
    ADA1_WS_17_ANOVA_PairwiseComparisons.Rmd
    ADA1_HW_09_ANOVA_Assumptions.Rmd
    Rubric STT2810ClassRepo-gh-pages\CoursePacing\GenericPacing.html
  • Rmd html ANOVA, model assumptions, and paired comparisons.
  • In-class: Rmd html dat
    Popular kids.
11-03/28 20 continued

In-class: Rmd html
Regression of height vs hand span using data from our class.
11 Rmd html

12-04/02 21 Contingency tables
  • Read:
    PDS Ch 12 Chi-Square Test of Independence
  • Video:
    PDS Video: 10. Chi-Square Test of Independence
    Ch 12 Video Statistics\ChiSquare.Rmd
  • Quiz:
    PDS Quiz 12 Chi Square
  • Optional Read:
    Rep-Res-Book Ch 12
    ADA1_WS_12_CategoricalTables.Rmd
    ADA1_WS_20_TwowayCatTables.Rmd
    ADA1_HW_11_TwowayCat_SLR.Rmd
    ADA1_HW_06_Corr_CatTab.Rmd
    Rubric STT2810ClassRepo-gh-pages\CoursePacing\GenericPacing.html
  • Rmd html dat Popular kids.
  • 06 Rmd html
  • In-class: Rmd html
    AddHealth W4 Pregnancy.
  • Summary of Methods we’ve covered
12-04/04 22 continued

In-class: Rmd html
Describing a study reported in the media.
12 Rmd html

13-04/09 23 Correlation and Interactions
  • Read:
    PDS Ch 13 Correlation Coefficient
    PDS Ch 14 Moderation
  • Video:
    PDS Video: 11. Correlation
    PDS Video: 12. Moderation
    Ch 14 Moderation\Moderation.Rmd
  • Quiz:
    PDS Quiz 13 Correlation
    PDS Quiz 14 Exploring Moderation
  • Optional Read:
    Rep-Res-Book Ch 13
    ADA1_WS_10_LogTransform.Rmd
    ADA1_WS_11_Correlation.Rmd
    ADA1_HW_06_Corr_CatTab.Rmd
    Ch 13: Correlation\Correlation.Rmd
    Rubric STT2810ClassRepo-gh-pages\CoursePacing\GenericPacing.html
  • Rmd html Data collection (hand span and word memory), correlation, regression to the mean.
    Spurious Correlations
  • 06 Rmd html
  • In-class: Rmd html
    Key statistical principles, ethics.
  • With additional time, clarify which research questions you’ll present in your poster with a peer mentor. (Null results are ok!)
  • Statistics is about communication, including writing and presenting.
13-04/11 24 Continued
Work on poster
Ch 18 S4R_Content\assess\poster\*.pptx for poster template (need to modify)
Work on poster
In-class: Rmd html
Work on designing poster content at the bottom of your HW document.
13 Rmd html
Work on your poster content.
Try to complete your poster planning in your HW document.
14-04/16 25 Linear Regression
  • Read:
    PDS Ch 15 Linear Regression: Summarizing the Pattern of the Data with a Line
    PDS Ch 17 Confounding and Multivariate Models
  • Video:
    PDS Video: 13. The Question of Causation
    PDS Video: 14. Multivariate Models and Confounding
    Ch 15 Reg and Logistic Reg plotting Regression\Regression.Rmd
  • Quiz:
    PDS Quiz 15 Regression
    PDS Quiz 17 Confounding
  • Optional Read:
    Rep-Res-Book Ch 14
    Ch 15 Reg example Practice\Practice1.Rmd
    Ch 15 Leverage examples Statistics\LevInf.Rmd
    ADA1_WS_09_LinearRegression.Rmd
    ADA1_HW_05_SLR_Log.Rmd
    ADA1_WS_21_SimpleLinearRegression.Rmd
  • Rmd html Regression of height vs hand span using data from our class.
  • 11 Rmd html
  • Rmd html dat
    Build intuition using SLR App, interpret properties of linear regression fit.
  • poster template
    pdf, Rnw, sty, bib, logo
  • Prof Erhardt’s example poster
    pdf, Rnw
14-04/18 26 Continued
Work on poster
15-04/23 27 Sampling and Designing Studies, Poster Presentation
  • Read:
    PDS Ch 16 Sampling and Designing Studies
    PDS Ch 18 Poster Presentation
  • Video:
    PDS Video: 15. Writing Your Poster Presentation
  • Quiz:
    none
ADA1_WS_23_ExpObs_DescribeStudy.Rmd

ADA1_HW_13_PosterCompleteInHWDoc.Rmd
ADA1_HW_14_PosterCompleteInTemplate.Rmd
ADA1_HW_ALL_NESARC_Project.Rmd

ADA1_WS_24_StatisticalCommunication.Rmd
ADA1_WS_25_PosterPrep.Rmd
ADA1_WS_CompileAll.Rmd

Work on poster

Rmd html Work on designing poster content at the bottom of your HW document.
13 Rmd html Work on your poster content. Try to complete your poster planning in your HW document.
14 Rmd html

 

Work on poster

In-class: Course evaluations, submit receipt (capture screen image) as in-class assignment.

  1. Everyone EvalKit
  2. Everyone Classroom
  3. UGrads SLOs

See email for more details.

14 Rmd html

Due next Wednesday 12/7. Complete and submit your poster in LaTeX pdf format.

Transition from Markdown to LaTeX
Video for poster transition

15-04/25 28 Work on poster
16-04/30 29 Complete the Learning Studio Opinio post-semester survey required for classroom assessment.

Survey Dierker Post

Work on poster

ARI Graphix
$9 poster printing
Open Mon-Fri 7:30-5:30
Do not use their website!
Email plotting@abqrepro.com,
Subject: UNM S4R class poster
Text: indicate to print “in color on bond paper”.
Attach:
Poster pdf with your name in the filename, such as “FirstLast_S4R_poster.pdf”.
Try to send by Friday noon for the poster to be ready by Monday.
Arrange to pick up the poster.
Price is $0.75/sq ft for Fall 2016.
Have a peer mentor approve your poster for printing and presentation. Congratulations!

16-05/02 30 POSTERS Poster sessions in SMLC Atrium Poster presentation

Poster Schedule (be on time):
3:30-3:40 Organization
3:40-4:40 Group 1
4:45-5:45 Group 2

Congratulations on a great semester!
Poster rubric

17-05/06 Finals week No final!

(I reserve the right to continue to improve the materials throughout the semester.)


Syllabus

Description: Techniques for the visual presentation of numerical data, descriptive statistics, introduction to probability and basic probability models used in statistics, introduction to sampling and statistical inference, illustrated by examples from a variety of fields.  In this special Statistics for Research (S4R) version, we will emphasize the skills of data analysis, visualization, and communication for undergraduate research.
Prerequisite: See UNM catalog
Semesters offered: Spring 2019
Lecture: Spring 2019 schedule
TR 1530-1645, CTLB 330, Stat 145.014, CRN 30479
Location: CTLB 330 (building 55, northeast of Zimmerman) Video
Office hours: Mon/Tue 13:00-14:00, and by appointment in SMLC 312
email: “Erik B. Erhardt” <erike@stat.unm.edu>, please include “S4R” with a descriptive subject line, such as “S4R Homework 02 plot”
Textbook: Required custom book is available for free on this webpage: Passion Driven Statistics.
Laptops running R: I encourage you to bring a laptop to class each day so you can try the R programming exercises in class. If you don’t have one, no problem, there are some laptops in class and teamwork is encouraged — sit next to someone friendly who likes to share.
Saving data: If you’re using classroom computers, use flash drives or UNM’s OneDrive (available in LoboMail) for saving files.  The CTLB computers do not connect to your standard UNM drive space.

Teaching Assistants and Peer Mentors

Stat grad student TAs

Kelli Kasper <kkasper@unm.edu>, office hours M 11-12, W 10-12 in SMLC 306.

Peer Mentors, SEP

Leah Puglisi <lhpuglisi@unm.edu>, former student, office hours M W 3-4 in SMLC 319.

Student learning outcomes

  1. Students will learn to use a reproducible research workflow.
  2. Students will improve their technology expertise.
  3. Students will learn to work with large data sets.
  4. Students will learn to create and present graphs for both univariate and multivariate data.
  5. Students will learn how to construct and test hypotheses.

Meeting the learning outcomes

You will acquire new information in this class, but the emphasis is comprehending, integrating, and applying information. Rote factual memorization is the lowest form of learning. Effective learning takes place by explaining, integrating, applying, and analyzing facts, hypotheses, and theories.

Learning in this class occurs by:

  1. Doing – completion of exercises that require analysis of data to answer questions and test hypotheses, or researching answers to reading assignments.
  2. Discussion – interaction with classmates to assemble and synthesize information, utilizing the collective skills and knowledge base of the group.
  3. Listening, acting, and reflecting – activities during class time provide insights into information not available in readings and include review of difficult material to aid comprehension. Note taking permits later reflection on lecture content. Listening to the professor lecture is the least effective learning tool for students, however, so you should plan on coming to every class prepared to participate in active and reflective learning opportunities.

Assessment

  • Quizzes are due each Tuesday before class.  Purpose: to assess reading and video comprehension and assure you’re prepared to actively participate in class activities with minimal lecture. (There are 17, 20% of final grade, the lowest 2 are dropped.)  Most weeks plan for 1 hour reading and video with a 20 minute quiz. Quizzes are not timed, they can be taken twice, and the higher of the two scores is used for grade calculation.
    • Viewing quiz solutions after the due date in UNM Learn is not intuitive.  Click on the “Begin” button (this is the non-intuitive part, since you are not actually beginning the quiz), then click “View All Attempts” to see the scores.  Finally, click “Calculated Grade” to see the feedback for each question of the quiz.
  • In-class assignments are assigned each Tuesday and due before the Thursday class at 3:30pm, submitted to UNM Learn.  Purpose: to struggle and find success in class with the concepts and skills. (About 12, includes class participation, 20% of final grade, the lowest 2 are dropped.) Most weeks plan to finish in class.
  • Homework (HW) assignments are assigned each Thursday and due the following Thursday, submitted to UNM Learn. Purpose: to apply concepts and skills to your class poster project. (About 12, 40% of final grade, the lowest few are dropped.) Most weeks plan on 1-4 hours per assignment with a substantial start in class.
  • Poster will be developed through the semester (most HW assignments contribute to the poster); in the last couple of weeks we’ll complete the posters, and in the last week we’ll have poster presentations. Purpose: to have an overarching set of questions to answer using methods learned in the course, with a deliverable you can be proud of! (1 poster and presentation; of the final grade, 12% is the poster, 2% the presentation, and 2% your evaluations of others.)  In the last couple of weeks, assembling this poster may take 5-10 hours using a template provided to you.
  • Course surveys are due at the beginning and end of the course. Purpose: to participate in national project-based learning projects and improve the course.  (About 4, 4% of final grade.)
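To see how these weights combine, here is a rough sketch in R. The scores are made up for one hypothetical student, the helper `drop_lowest_mean()` is not part of any course material, and dropping exactly 2 homework scores is an assumption (the syllabus only says "the lowest few").

```r
# Mean of the scores that remain after dropping the n_drop lowest ones.
drop_lowest_mean <- function(scores, n_drop) {
  kept <- tail(sort(scores), length(scores) - n_drop)
  mean(kept)
}

# Weights from the Assessment list above; they should add up to 1.
weights <- c(quizzes = 0.20, in_class = 0.20, homework = 0.40,
             poster = 0.12, presentation = 0.02, peer_evals = 0.02,
             surveys = 0.04)

# Made-up scores on a 0-100 scale.
quiz_scores     <- c(rep(90, 15), 40, 0)    # 17 quizzes, lowest 2 dropped
in_class_scores <- c(rep(100, 10), 50, 0)   # about 12, lowest 2 dropped
hw_scores       <- c(rep(85, 10), 60, 0)    # about 12; dropping 2 is an assumption

components <- c(
  quizzes      = drop_lowest_mean(quiz_scores, 2),
  in_class     = drop_lowest_mean(in_class_scores, 2),
  homework     = drop_lowest_mean(hw_scores, 2),
  poster       = 90,
  presentation = 100,
  peer_evals   = 100,
  surveys      = 100
)

# Weighted sum, matching components to weights by name.
final_grade <- sum(weights * components[names(weights)])
print(final_grade)
```

Because the lowest scores are dropped before averaging, the two missed quizzes in this example do not pull the quiz component below 90.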

All assignments in this class are electronic and submitted to UNM Learn for grading.

Late assignments will not be accepted.

All R code for the assignment should be included with the part of the problem it addresses (for code and output use a fixed-width font, such as Courier); this will happen automatically by using RMarkdown.
Do NOT use your R code and output as your answer to the problem, but include them to show me how you arrived at your answer. Your prose solution should be provided to interpret the output.  Output without explanation will not be given credit.

Collaboration and citation

For homeworks I encourage you to work together. Please discuss the data, code, and problems with one another, but do your own exploration and write up. We expect everyone to hand in substantially different homeworks, and we will enforce this under the honor code. The small benefit you might get from plagiarism is not worth the severe penalty (of lost trust, being reported to the dean, no points for the assignment, etc.).

As in life, please use any resources available to you. Projects and some homeworks will explicitly encourage you to use resources on the internet, but showing extra initiative will always be appreciated. You may find R programming tough at first, so feel free to discuss your problems with other classmates or meet with or email questions to the TAs or me.  Meeting in person is often much more productive than questions by email.  If emailing, include your Rmd file and any required files (such as your .bib file) and a description of what you’re trying to do and where your error or trouble is.

I encourage you to use the ideas of others, but make them your own, giving credit. For projects have a formal bibliography, for homework cite casually, and for code simply copy the URL in as a comment (which is doubly helpful to you for finding the resource again).

Absences policy

I will follow the UNM absences policy with two unexcused absences.  This means I can drop you from the class if you have a third absence; this paragraph serves as your warning.  No one wants that, but I have found that I need to take attendance in freshman courses; otherwise this policy is abused. If we all respect ourselves and each other then we won’t need attendance sheets and you’ll all achieve more.


Statements

Disability statement

If you have a documented disability that will impact your work in this class, please contact me to discuss your needs. You’ll also need to register with the Accessibility Resource Center in 2021 Mesa Vista Hall (building 56) across the courtyard east from the SUB.

Title IX statement

In an effort to meet obligations under Title IX, UNM faculty, Teaching Assistants, and Graduate Assistants are considered “responsible employees” by the Department of Education (see pg 15).   This designation requires that any report of gender discrimination, which includes sexual harassment, sexual misconduct, and sexual violence, made to a faculty member, TA, or GA must be reported to the Title IX Coordinator at the Office of Equal Opportunity. For more information, see the campus policy regarding sexual misconduct.

Citizenship and/or Immigration Status

All students are welcome in this class regardless of citizenship, residency, or immigration status. Your professor will respect your privacy if you choose to disclose your status. As for all students in the class, family emergency-related absences are normally excused with reasonable notice to the professor, as noted in the attendance guidelines above. UNM as an institution has made a core commitment to the success of all our students, including members of our undocumented community. The Administration’s welcome is found on the UNM website: http://undocumented.unm.edu/.


Our Classroom

We’re doing this because:

  • We want you to be empowered with statistics.
  • We believe everyone should get out of this course with awesome skills.
  • Real-time feedback promotes efficient learning.

“It encourages me to engage actively with the course material and take responsibility for my learning.”

GAISE Connections

Our six recommendations include the following:

  1. Teach statistical thinking.
    • Teach statistics as an investigative process of problem-solving and decision-making.
    • Give students experience with multivariable thinking.
  2. Focus on conceptual understanding.
  3. Integrate real data with a context and purpose.
  4. Foster active learning.
  5. Use technology to explore concepts and analyze data.
  6. Use assessments to improve and evaluate student learning.

Learning without thought is labor lost.
What I hear, I forget.
What I see, I remember.
What I do, I understand.
– Confucius

 


Archive

Pre-course to-dos

Did you receive a registration error for Spring 2019? Send me an email with the following answers:
1. What registration error did you get (copy/paste is best)?
2. What is your UNM ID?
3. What is your Math/Stat background (that is, do you have the pre-reqs)?
If you are waitlisted, as long as there are seats available I will override you into the course. Don’t worry.

Step 0: Before our first class (Tue 1/15) please read through the following actions and install the required software on your computer and complete the brief surveys. If you don’t have a computer, there are classroom computers which will be available only when the classroom is open.

  1. Install R and RStudio:
    1. R for programming
      1. Windows (Download R 3.5.x for Windows link) or
      2. Mac (R-3.5.x.pkg link); or
      3. upgrade if you already have R
    2. RStudio Desktop for better R experience
      1. Installers at bottom, choose Windows or Mac OSX.
    3. Videos that may be helpful for installation:
      1. Install R on Mac (2 min).
      2. Install R for Windows (3 min).
      3. Install R and RStudio on Windows (5 min).
  2. Install R packages (copy/paste CODE into console and press [Enter]; this may take 20-30 minutes), also update all packages within RStudio.
  3. Install Zotero or Mendeley (recommended for your own laptop) for bibliography management.
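If you want to check which packages are still missing after the install step, a sketch like the one below can help. The package list here is an assumption taken from the tools named in the course goal (tidyverse, knitr, plus rmarkdown); the course's own CODE link may install more, so treat this as a check rather than a replacement.

```r
# Assumed package list (from the course goal); the official list may be longer.
pkgs <- c("tidyverse", "knitr", "rmarkdown")

# TRUE for each package that is already installed.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
print(missing_pkgs)

# Only install when run interactively (for example, pasted into the RStudio console).
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

An empty result from `print(missing_pkgs)` means all three packages are already available.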

Saving data: If you’re using classroom computers, use flash drives or UNM’s OneDrive (available in LoboMail) for saving files.  The CTLB computers do not connect to your standard UNM drive space. I recommend using a very systematic folder structure, such as S4R/HW, S4R/Class, S4R/Reading, S4R/Poster, etc.  Do not just work on files in your downloads folder or your desktop; respect your data and code!
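One way to set up that folder structure is from R itself. This is only a sketch: it builds the folders under a temporary directory so it is safe to try, and `root` is a placeholder you would point at your own flash drive or OneDrive folder.

```r
# Placeholder location; replace tempdir() with the drive or folder you actually use.
root <- file.path(tempdir(), "S4R")

# The four folders suggested above.
folders <- file.path(root, c("HW", "Class", "Reading", "Poster"))

# Create each folder (and S4R itself) if it does not exist yet.
for (f in folders) {
  dir.create(f, recursive = TRUE, showWarnings = FALSE)
}

print(list.files(root))
```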


Unicode compile problems:  If you knit to pdf you may get this error: “! Package inputenc Error: Unicode char”.  ASCII is the small character set that we use to program in; Unicode is an extended character set that looks pretty (for example “straight quotes” become “curly quotes”) but causes code to break.  You get unwanted Unicode when you copy/paste from a pdf or some other source into your code.  To fix this, you have to find the Unicode and replace it with its ASCII equivalent.  To do this: Ctrl-F to find, search for “[^\x00-\x7F]” (without quotes), select “Regex” for regular expressions, and find the “Next” one.  As it finds instances, replace the characters manually until there are no more.  These characters will typically be curly quotes or fancy dashes.
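The same search can also be run from the R console. The sketch below uses a few made-up lines in place of a real file (in practice you might read your own document with readLines()), and the replacement step only covers curly quotes and dashes, which is an assumption about what you will find.

```r
# Made-up lines standing in for the contents of an Rmd file.
rmd_lines <- c(
  "x <- c(1, 2, 3)",
  "title <- \u201cMy plot\u201d",   # curly double quotes
  "y <- x \u2013 1"                 # en dash instead of a minus sign
)

# Same pattern as in the Find dialog; backslashes are doubled inside an R string.
has_unicode <- grepl("[^\\x00-\\x7F]", rmd_lines, perl = TRUE)
bad_lines <- which(has_unicode)
print(bad_lines)

# Replace the usual offenders with their ASCII equivalents.
fixed <- rmd_lines
fixed <- gsub("[\u201c\u201d]", "\"", fixed)  # curly double quotes
fixed <- gsub("[\u2018\u2019]", "'", fixed)   # curly single quotes
fixed <- gsub("[\u2013\u2014]", "-", fixed)   # en and em dashes

# FALSE here means no non-ASCII characters remain.
still_bad <- any(grepl("[^\\x00-\\x7F]", fixed, perl = TRUE))
print(still_bad)
```

Any character not covered by the three replacements still has to be fixed by hand, as described above.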

 

Acumen in Statistics