ADA1 F23

UNM Stat 427/527: Advanced Data Analysis I (ADA1)

The Fall 2023 syllabus is below the tables.

Goal

This is statistics: learn to produce beautiful (Markdown) and reproducible (knitr/Quarto) reports with informative plots (ggplot2) and tables (kable) by writing code (R, tidyverse, RStudio) to answer questions using fundamental statistical methods (all one- and two-variable methods), which you’ll be proud to present (poster).


Content

Roadmap

Here’s your roadmap for the semester! Each week, follow the general process outlined below:
  • The class maintains a Tuesday/Thursday schedule.
  • Each Tuesday and Thursday:
    • Enjoy reading the assigned chapter, using Video lectures to supplement the reading.
    • If available, experiment with Applets to develop intuition and work through Tutorials to practice R coding with data.
  • Complete the homework assignments in the form of exercises and worksheets.
    • Tuesday assignments are due Friday by 11:50 PM
    • Thursday assignments are due Monday by 11:50 PM
  • The table below has a row for each Tuesday and Thursday.

Resources

Timetable

Date | Class | Chapter | Videos, Tutorials, Applets | Exercises, Worksheets, Labs, Data
08/21 Mon 00 Install R software
  • You are responsible for reading the entire chapter (unless otherwise noted).
  • Videos, Tutorials, and Applets are available to help with most material.
    • Videos will specify sections covered.
    • Tutorials are best done at the end of each chapter.
    • Applets will specify the section they support.
 
Key:
  • Due date (F = Friday, M = Monday, T = Tuesday)
  • Exercises are from the current chapter, listed under the “TOPIC” column.
08/22 Tue 01
  • Due F 08/25
  • Exercises
    • Practice: 1, 2, 3, 5, 9 (ignore rows of data above ellipsis “…”), 13, 17
    • Turn in: 10, 14, 20
  • Worksheet: Medical records
    • qmd html
    • Download the qmd file to your computer, open it in RStudio, edit it, render it to HTML, print the HTML to pdf, and turn in the assignment to UNM Canvas by Friday 11:50 PM.
    • Video: CL01
08/24 Thu 02
08/29 Tue 03
  • Due F 09/01
  • Exercises
    • Practice: 1, 2, 3, 5, 7, 9, 11
    • Turn in: 8, 10, 18, 30
08/31 Thu 04
  • Due T 09/05
  • Worksheet: Study Design and Sampling
09/05 Tue 05
  • Due F 09/08
  • Exercises
    • Practice: 1, 3, 5
    • Turn in: 2, 6
09/07 Thu 06
  • Due M 09/11
  • Exercises
    • Practice: 3, 5, 13, 15, 19, 21, 23, 25
    • Turn in: 10, 12, 14, 16, 26
09/12 Tue 07
09/14 Thu 08 Spurious Correlations
09/19 Tue 09
  • Continued
  • Due F 09/22
  • Exercises
    • Practice: 15, 17, 19, 23, 25, 31
    • Turn in: 18, 20, 22, 24, 28
09/21 Thu 10
  • Correlation and Logarithmic transformation
  • Due M 09/25
  • Worksheet: Logarithmic Transformation
09/26 Tue 11
  • Due F 09/29
  • Exercises
    • Practice: 1, 3, 5, 9, 11, 13
    • Turn in: 6, 10, 12, 14
09/28 Thu 12
10/03 Tue 13
  • Due F 10/06
  • Exercises
    • Practice: 1, 3, 5, 7
    • Turn in: 2, 4, 8
10/05 Thu 14
  • Due M 10/09
  • Worksheet: Logistic regression
    • qmd html dat
    • Video: CL14
    • (It’s OK: it refers to CL28 from a previous semester.)
10/10 Tue 15
  • Due M 10/16
  • Exercises
    • Practice: 1, 3, 5, 7
    • Turn in: 2, 4, 6, 8
10/12 Thu Fall Break
10/17 Tue 16
  • Due F 10/20
  • Exercises
    • Practice: 1, 7
    • Turn in: 4, 8
  • Worksheet: Hypothesis testing with randomization
    • CL 15: qmd html
    • Video: CL15
    • (1 class behind schedule)
10/19 Thu 17
  • Due M 10/23
  • Exercises
    • Practice: 1, 3, 5
    • Turn in: 2, 4, 6
  • Worksheet: Confidence intervals with bootstrapping
    • CL 16: qmd html
    • Video: CL16
    • (1 class behind schedule)
10/24 Tue 18
  • Due F 10/27
  • Exercises Ch 14
    • Practice: 1, 5
    • Turn in: 2, 3 (yes, odd), 6
  • Worksheet: Sampling distributions
    • CL 17: qmd html
    • Video: CL17
    • (1 class behind schedule)
10/26 Thu 19
  • Due M 10/30
  • Exercises Ch 16
    • Practice: 3, 7, 13, 15, 19, 27
    • Turn in: 4, 8, 14, 16, 20, 28
  • Exercises Ch 17
    • Practice: 7, 13, 21
    • Turn in: 8, 14, 22
  • Worksheet: Proportion inference and hypothesis testing
10/31 Tue 20
  • Due F 11/03
  • Exercises Ch 18
    • Practice: 13, 15
    • Turn in: 14, 16
  • Worksheet: One- and two-way tables
11/02 Thu 21  
  • Due M 11/06
  • Exercises Ch 19
    • Practice: 1, 4, 13, 15
    • Turn in: 2, 12, 16
  • Worksheet: Mean inference and hypothesis testing
    • qmd html dat
    • Video: CL21
    • (Video from a previous semester, time 0:00 – 25:00.)
11/07 Tue 22
  • Due F 11/10
  • Exercises Ch 20
    • Practice: 3, 7, 17, 19
    • Turn in: 4, 10, 12, 20
  • Exercises Ch 21
    • Practice: 3, 13, 17
    • Turn in: 4, 14, 18
  • Worksheet: Two means inference and hypothesis testing
    • qmd html dat
    • Video: CL22
    • (Video from a previous semester, time 27:00 – end.)
11/09 Thu 23
  • Due M 11/13
  • Exercises
    • Practice: 3, 5, 9, 11
    • Turn in: 6, 12, 14
  • Worksheet: start on CL23 in Class 24 below
11/14 Tue 24
  • In the Worksheet Video CL23, I demonstrate the use of pairwise comparisons, as seen in ADA 5.2 and 5.3.
  • Due F 11/17
  • Worksheet: ANOVA, Pairwise comparisons
11/16 Thu 25
  • Due M 11/20
  • Exercises
    • Practice: 1, 5, 13, 15
    • Turn in: 2, 10, 14
  • Worksheet: start on CL25 in Class 26 below
11/21 Tue 26
  • Continued
  • Due M 11/27
  • Worksheet: Simple linear regression
11/23 Thu Thanksgiving break
11/28 Tue 27
  • Due F 12/01
  • Exercises
    • Practice: 1, 3, 7
    • Turn in: 2, 4, 6
  • Worksheet: start on CL27 in Class 28 below
11/30 Thu 28
  • Continued
  • Due M 12/04
  • Worksheet: Multiple regression, introduction
12/05 Tue 29
  • Due F 12/08
  • Exercises
    • Practice: 1, 3, 5
    • Turn in: 2, 4, 6
  • Worksheet: start on CL29 in Class 30 below
12/07 Thu 30
  • Confusion matrix and ROC curve
  • Due M 12/11
  • Worksheet: Logistic regression, prediction
12/12 Tue Finals week: no final exam
  • EvalKit course evaluation: print a pdf of your email confirmation that you’ve completed the EvaluationKIT survey and upload it to UNM Canvas. (Due T 12/12)
Congratulations on a great semester!
(I reserve the right to continue to modify the schedule and improve the materials throughout the semester.)

Software, R

Using R (through the RStudio IDE)

R will be used for all homework assignments. You can use R by downloading it onto your own computer. R is freely available at http://www.r-project.org/ and is already installed on many college computers. You are also required to install RStudio (http://rstudio.org/) and to turn in all R assignments using Quarto (RMarkdown). (To render to pdf, you can use the TinyTeX LaTeX distribution: https://yihui.name/tinytex/)

Installing software and packages (Step 0)

Before our first “class” (Mon 8/21), please read through the following actions and install the required software on your computer.  Video
  1. Install:
    1. R (Windows or Mac) or upgrade – Video
    2. RStudio, and
    3. Quarto.
  2. Install R packages.
    1. Follow these instructions: R packages. (Ignore warnings about Rtools or about any unavailable packages.)
    2. In RStudio, open Packages tab, click on “Update”, Select All, Install Updates (“No” to restart, “No” to compile from source).
  3. Make sure the erikmisc package works by printing the logo in the Console:
    1. library(erikmisc)
    2. erikmisc_logo()
  4. Set up your computer
    1. RStudio: disable notebook
    2. Operating system: make it more friendly to programming.
If you have a Chromebook or no laptop, consider using RStudio Cloud > Individuals. When installing packages there, remove the type="binary" option.
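After completing the installation steps above, you can check which packages still cannot be loaded. The sketch below uses a hypothetical helper that is not part of the course materials. `missing_packages()` is an illustrative name, and the package list is a placeholder: replace it with the list from the R packages instructions. The sketch does not install anything. The `install.packages()` calls are left as comments, including the variant without `type = "binary"` for RStudio Cloud.

```r
# Hypothetical helper (not from the course materials):
# return the names of packages that cannot be loaded.
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (instead of an error) when a package is absent
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Placeholder list: replace with the packages named in the course instructions
pkgs <- c("tidyverse", "knitr")
to_install <- missing_packages(pkgs)
print(to_install)

# On your own Windows or Mac computer, prebuilt binaries can be requested:
#   install.packages(to_install, type = "binary")
# On RStudio Cloud, per the note above, drop the type = "binary" argument:
#   install.packages(to_install)
```

An empty result (`character(0)`) means every listed package loads.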

Learning R, self-study



Syllabus

  • Description: Statistical tools for scientific research, including parametric and non-parametric methods for ANOVA and group comparisons, simple linear and multiple linear regression, and basic ideas of experimental design and analysis. Emphasis placed on the use of statistical packages such as R. Course cannot be counted in the hours needed for graduate degrees in Mathematics and Statistics.
  • Prerequisite: Math 1350 [Stat 145] (or other intro stats course)
  • Semesters offered: Fall
  • Lecture: Stat 427.003, CRN 77017; Stat 527.003, CRN 77018; Online MAX Arranged
  • Email: Please include “ADA1” in the subject line of all emails; please do not send messages via UNM Canvas.

Instructors

  • Professor
    • Erik Erhardt <erike@stat.unm.edu>, he/him
  • Teaching Assistants
    • Behzad FallahiFard <bfallahifard@unm.edu>, he/him
    • Azadeh Golduzian <agolduzian96@unm.edu>, she/her

Office hours

See email “ADA1, Stat 427/527, Announcements” from 8/26/22 for Zoom links and instructions. UNM Authentication instructions.
  • Before attending an office hour, please email the instructor above to let us know when you’ll be attending
    • Email example: “ADA1 Office Hours Tue 10 AM: I’ll be there to ask about X, Y, and Z”.
  • We will certainly be at our office hours if you let us know that you’re coming.  However, we reserve the right to cancel an office hour last minute without notification if no one has let us know they will be attending.  This is a way of respecting everyone’s time so we can each be effective in our lives.
  • We are also available by appointment (arranged by email) if these office hours do not work for your schedule.
(EE = Erik Erhardt, BF = Behzad FallahiFard, AG = Azadeh Golduzian)
Time Mon Tue Wed Thu Fri Sat Sun
8 AM
9 AM BF EE BF EE BF
10 AM BF EE BF EE BF
11 AM BF EE BF EE BF
12 PM
1 PM
2 PM EE
3 PM EE
4 PM AG AG AG
5 PM AG AG AG
6 PM AG AG AG
7 PM
8 PM
9 PM
 

Student learning outcomes

At the end of the course, you will be able to do the following (student results: R, all years, 2015, 2014, 2013, 2012).
General outcomes:
  1. Organize knowledge in graphs, tables, and code to support concise, comprehensible, and scientifically defensible written interpretations to produce knowledge within a reproducible research environment.
  2. Distinguish a testable scientific hypothesis or data-supported interpretation from an opinion.
  3. Understand from a data story the goals of the study and apply the correct statistical procedure.
  4. Explain the scientific aspects of a problem to nonscientists in a fashion that enhances understanding and decision making.
Topical outcomes:
  1. Define parameters of interest and hypotheses in words and notation.
  2. Summarize data visually, numerically, and descriptively and interpret the observed characteristics. Calculate and interpret numerical summaries such as mean, variance, five-number summary, confidence intervals, and p-values, and create visual summaries such as bar plots, scatter plots, and histograms. (Never pie charts!)
  3. Distinguish between statistical significance and scientific relevance.
  4. Use statistical software, such as R, to read and manage data, create informative plots, report numerical summaries, and apply statistical models, following recommended programming practices, including abstraction and documentation.
  5. Understand the differences and limitations of controlled experiments and observational studies. Design experiments to infer causal treatment effects. Analyze observational data to infer associations between measured variables.
  6. Identify and explain the statistical methods, assumptions, and limitations used in reported studies in scientific literature or popular media.
  7. Evaluate and critique published studies, the work of peers, and your own work: assess what was done well and what could be done better, and use statistical principles to examine whether the conclusions are supported.
  8. Make evidence-based decisions by constructing and deciding between testable hypotheses using appropriate data and methods.
  9. Discover relationships and make predictions through model development and selection.

Meeting the learning outcomes

You will acquire new information in this class, but the emphasis is on comprehending, integrating, and applying information. Rote factual memorization is the lowest form of learning. Effective learning occurs by explaining, integrating, applying, and analyzing facts, hypotheses, and theories. Learning in this class occurs by:
  1. Doing – completion of exercises that require analysis of data to answer questions and test hypotheses, or researching answers to reading assignments.
  2. Discussion – interaction with classmates to assemble and synthesize information utilizing the collective skills and knowledge base of the group.
  3. Listening, acting, and reflecting – activities during class time provide insights into information not available in readings and include review of difficult material to aid comprehension. Note-taking permits later reflection on lecture content. However, listening to the professor lecture is the least effective learning tool for students, so plan on coming to every class prepared to participate in active and reflective learning opportunities.

Assessment

  • Exercises. Purpose: to assess reading and video comprehension and ensure you’re prepared to actively participate in worksheet activities with minimal lecture. (About 18 assignments, 28% of final grade.) Most weeks, plan for 1-2 hours of reading and video and 1 hour of exercises. Exercises are not timed. They can be taken twice, and the last of the two submissions is graded.
  • Worksheet assignments. Purpose: to struggle with and find success in the concepts and skills in class. (About 21 assignments, 70% of final grade.)
  • Course surveys are due at the end of the course (EvalKit). (About 2 surveys, 2% of final grade.)
  • No late assignments are accepted. Roughly speaking, the lowest two weeks’ worth of assignments are dropped, so your lowest 1-2 exercise assignments and 2-3 worksheet assignments are not included in the calculation of your grade (this could include a worksheet assignment that spanned a full week).
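A simplified sketch can illustrate the drop-the-lowest rule. It assumes every assignment is worth the same number of points and averages the remaining scores after removing the k lowest. The actual calculation in UNM Canvas may weight assignments differently. The function name and scores are made up for illustration.

```r
# Hypothetical sketch, assuming equally weighted assignments:
# average the scores after dropping the k lowest.
mean_drop_lowest <- function(scores, k) {
  stopifnot(k >= 0, k < length(scores))
  kept <- sort(scores, decreasing = TRUE)[seq_len(length(scores) - k)]
  mean(kept)
}

# Made-up exercise scores (proportion of points earned), dropping the lowest one
scores <- c(0.90, 0.50, 0.80, 1.00)
mean_drop_lowest(scores, k = 1)  # 0.9, versus mean(scores) = 0.8
```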
The final grade may include a small buffer at the discretion of the instructor. For example, the final grade could be the total points earned, adjusted not at all (or only a little) for graduate students and a little more for undergraduate students. That is, [Final Grade] = 1 - (1 - [Points Earned])/a, where a = 1.25 for undergraduate students. This makes your grade slightly higher than what you earned, and the increase is larger for lower grades. Here’s R code to see how grades would adjust for a given value of “adjustment”:
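# a = adjustment factor (1.25 for undergraduate students; a = 1 means no adjustment)
adjustment = 1.25
tibble::tibble(
    original   = seq(0, 1, by = 0.05)                # points earned, as a proportion
  , adjusted   = 1 - ((1 - original) / adjustment)   # final grade after adjustment
  , difference = adjusted - original                 # size of the boost
  ) |> 
  print(n=Inf)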
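You can also check the same formula for a single grade by hand or wrap it in a small function. In this sketch, the name `adjust_grade()` and the input checks are illustrative additions. With a = 1.25, a grade of 0.80 becomes 1 - 0.20/1.25 = 0.84, and 0.60 becomes 1 - 0.40/1.25 = 0.68. The boost is (1 - earned)(1 - 1/a), so it is twice as large for the lower grade.

```r
# Sketch of the formula: [Final Grade] = 1 - (1 - [Points Earned]) / a
adjust_grade <- function(points_earned, a = 1.25) {
  # checks added for this sketch: proportions in [0, 1], and a >= 1 (a = 1 means no change)
  stopifnot(all(points_earned >= 0 & points_earned <= 1), a >= 1)
  1 - (1 - points_earned) / a
}

earned   <- c(0.60, 0.80, 1.00)
adjusted <- adjust_grade(earned)
adjusted            # 0.68 0.84 1.00
adjusted - earned   # boost = (1 - earned) * (1 - 1/a): 0.08 0.04 0.00
```

Setting `a = 1` leaves grades unchanged, which corresponds to no adjustment.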
All assignments in this class are electronic and submitted to UNM Canvas. For all submissions: (1) in Quarto, render the qmd file to HTML, (2) open the HTML file in your internet browser, (3) print the HTML to a pdf file, and (4) submit the pdf to UNM Canvas. Always view your submission in Canvas to verify that the grader will also be able to view your assignment!

Browser choice: Chrome is the recommended browser. On a Mac, Safari adds “.txt” to Quarto files when they are downloaded, and Firefox sometimes fails when uploading a pdf to UNM Canvas.

Rubrics guide assessment (and self-assessment) of homework, code, projects, exams, and presentations. Each assignment will have its own specific rubric. The use of R and Quarto is required for the course. Each submission will include all of the R code for the assignment, placed with the part of the problem it addresses, in a fixed-width font with syntax highlighting. You will weave your code with prose narration of your work and solutions.

Collaboration and citation

For homework, I encourage you to work together. Please discuss the data, code, and problems with one another, but do your own exploration and write-up. We expect everyone to submit substantially different homework, and we will enforce this under the honor code. The small benefit you might get from plagiarism is not worth the severe penalty (lost trust, being reported to the dean, no points for the assignment, etc.). As in life, please use any resources available to you. Projects and some homework will explicitly encourage you to use resources on the internet, and showing extra initiative will always be appreciated. You may find R programming tough at first, so feel free to discuss your problems with classmates, or to meet with or email questions to me or the TAs. I encourage you to use the ideas of others, but make them your own and give credit. For projects, include a formal bibliography. For homework, cite casually. For code, simply paste the URL in as a comment (which is doubly helpful for finding the resource again). You won’t be the first person to do anything in this class, so give credit where it’s due.
Why Cheat?
There are many known reasons why we may feel the need to “cheat” on problem sets or exams:
  • An academic environment that values grades above learning.
  • Financial aid that is critical for remaining in school places undue pressure on maintaining a high GPA.
  • Navigating school, work, and/or family obligations that have diverted focus from class.
  • Challenges balancing coursework and mental health.
  • Balancing academic, family, peer, or personal issues.
Being accused of cheating – whether it has occurred or not – can be devastating for students. The college requires me to respond to potential academic dishonesty with a process that is very long and damaging. As your instructor, I care about you and want to offer alternatives to prevent us from having to go through this process.
If you find yourself in a situation where “cheating” seems like the only option, please come talk to me. We will figure this out together.

Statements

COVID-19 Health and Awareness

UNM is a mask-friendly, but not a mask-required, community. If you are experiencing COVID-19 symptoms, please do not come to class. If you do need to stay home, please contact me by email. I can work with you to provide alternatives for course participation and completion. Let me, an advisor, or another UNM staff member know that you need support so that we can connect you to the right resources. Please be aware that UNM will publish information on websites and by email about any changes to our public health status and community response. Support: Student Health and Counseling (SHAC) at (505) 277-3136. Call SHAC if you have active respiratory symptoms (e.g., fever, cough, or sore throat) and need testing for COVID-19. Also call SHAC if you recently tested positive and may need oral treatment. LoboRESPECT Advocacy Center (505) 277-2911 can offer help with contacting faculty and managing challenges that impact your UNM experience.

Accessibility and Privacy

UNM is committed to providing courses that are inclusive and accessible for all participants. As your instructor, it is my objective to facilitate an accessible classroom setting, in which students have full access and opportunity. If you are experiencing physical or academic barriers, or concerns related to mental health, physical health, and/or COVID-19, please consult with me after class, via email/phone, or during office hours. You are also encouraged to contact the Accessibility Resource Center at arcsrvs@unm.edu or by phone 277-3506.

Below are accessibility and privacy statements for the tools we will be using in this course. If you have questions or concerns about any of these, please contact me.

Credit-hours

This is a three credit-hour course delivered in an entirely asynchronous online modality over 16 weeks during the Fall 2023 semester. Please plan for a minimum of 9 hours per week to learn course materials and complete assignments. Support: Resources to support study skills and time management are available through Student Learning Support at the Center for Teaching and Learning.

Title IX statement

Our classroom and our university should always be spaces of mutual respect, kindness, and support, without fear of discrimination, harassment, or violence. Should you ever need assistance or have concerns about incidents that violate this principle, please access the resources available to you on campus. Please note that, because UNM faculty, TAs, and GAs are considered “responsible employees,” any disclosure of gender discrimination (including sexual harassment, sexual misconduct, and sexual violence) made to a faculty member, TA, or GA must be reported by that faculty member, TA, or GA to the university’s Title IX coordinator. For more information on the campus policy regarding sexual misconduct and reporting, please see: https://policy.unm.edu/university-policies/2000/2740.html. Support: LoboRESPECT Advocacy Center, the Women’s Resource Center, and the LGBTQ Resource Center all offer confidential services.

Citizenship and/or Immigration Status

All students are welcome in this class regardless of citizenship, residency, or immigration status. Your professor will respect your privacy if you choose to disclose your status. As for all students in the class, family emergency-related absences are normally excused with reasonable notice to the professor, as noted in the attendance guidelines above. UNM as an institution has made a core commitment to the success of all our students, including members of our undocumented community. The Administration’s welcome is found on our website: http://undocumented.unm.edu/.

Land Acknowledgement

Founded in 1889, the University of New Mexico sits on the traditional homelands of the Pueblo of Sandia. The original peoples of New Mexico (Pueblo, Navajo, and Apache) have had deep connections to the land since time immemorial and have made significant contributions to the broader community statewide. We honor the land itself and those who remain stewards of this land throughout the generations, and we also acknowledge our committed relationship to Indigenous peoples. We gratefully recognize our history. Faculty Resource: Information provided by UNM’s Division for Equity and Inclusion can support building an inclusive classroom, https://diverse.unm.edu/education-and-resources/programs/index.html.

Respectful and Responsible Learning

We all have shared responsibility for ensuring that learning occurs safely, honestly, and equitably. Submitting material as your own work that has been generated on a website, in a publication, by an artificial intelligence algorithm, by another person, or by breaking the rules of an assignment constitutes academic dishonesty. It is a student code of conduct violation that can lead to a disciplinary procedure. Please ask me for help in finding the resources you need to be successful in this course. I can help you use study resources responsibly and effectively. Off-campus paper writing services, problem-checkers and services, websites, and AIs can produce incorrect or misleading results. Learning the course material depends on completing and submitting your own work. UNM preserves and protects the integrity of the academic community through multiple policies, including policies on student grievances (Faculty Handbook D175 and D176), academic dishonesty (FH D100), and respectful campus (FH CO9). These are in the Student Pathfinder (https://pathfinder.unm.edu) and the Faculty Handbook (https://handbook.unm.edu). Support: Many students have found that time management workshops or work with peer tutors can help them meet their goals. These and other resources are available through Student Learning Support at the Center for Teaching and Learning.

Connecting to Campus and Finding Support

UNM has many resources and centers to help you thrive, including opportunities to get involved, mental health resources, academic support such as tutoring, resource centers for people like you, free food at Lobo Food Pantry, and jobs on campus. Your advisor, staff at the resource centers and Dean of Students, and I can help you find the right opportunities for you.

Support in Receiving Help

Students who ask for help are successful students. I encourage students to be familiar with services and policies that can help them navigate UNM successfully. Many services exist to help you succeed academically, such as peer tutoring at CAPS, and mental health resources are available at http://mentalhealth.unm.edu. There are plenty of ways to find your place and your pack at UNM: see the “student guide” tab on my.unm or students.unm.edu, or ask me for information about the right resource center or person to contact.

Doing the Right Thing

UNM has policies to preserve and protect you and the academic community available in the Student Pathfinder as well as in the Faculty Handbook. These include policies on student grievances D175 (undergraduates) and D176 (graduate and professional students), academic dishonesty (D100), and respectful campus (CO9). Please ask for help in understanding and avoiding plagiarism (passing the work or words of others off as your own work or words) or other forms of academic dishonesty. Doing something dishonest in a class or on an assignment can lead to serious academic consequences. Come talk with me about your concerns or needs for academic flexibility or talk with support staff at one of our student resource centers before you do something that may endanger your career.

Our Classroom

We’re doing this because:
  • We want you to be empowered with statistics.
  • We believe everyone should get out of this course with awesome skills.
  • Real-time feedback promotes efficient learning.
“It encourages me to engage actively with the course material and to take responsibility for my learning.”

GAISE Connections

The six GAISE College Report recommendations are:
  1. Teach statistical thinking.
    • Teach statistics as an investigative process of problem-solving and decision-making.
    • Give students experience with multivariable thinking.
  2. Focus on conceptual understanding.
  3. Integrate real data with a context and purpose.
  4. Foster active learning.
  5. Use technology to explore concepts and analyze data.
  6. Use assessments to improve and evaluate student learning.

Learning without thought is labor lost. What I hear, I forget. What I see, I remember. What I do, I understand. – Confucius

Archive

Asking smart questions

  • “Smart Questions” guide (note: “hackers build things, crackers break them”)
  • Follow this Rubric when emailing a question:
    • Send a new email for each new question.  Use “Reply” to continue a conversation on a question (do not start a new email, again).
    • Include “ADA1” as the first word of the subject line in new emails (if replying, use reply), with the rest of the subject indicating the assignment and type of problem.
    • Begin the email with a short question summary (that is, don’t bury your question in the middle of the third paragraph).  Then, begin the detail of your question in the second paragraph.
    • When possible, include commented code in the email body. Comments (starting with the # symbol) should indicate where the problem is, what the expected behavior is, and what steps are necessary to reproduce the problem.
    • Attach your qmd file so that the instructor can reproduce the problem.   If attaching code, please include all the files necessary to run your code (data, etc.)
    • [Attaching code supersedes this: Code should include a “Minimum representative test case” (http://www.catb.org/esr/faqs/smart-questions.html#code)]
    • Assume the best. Your instructors want to help and we will do our best. Do not abuse your helpers even if you feel frustrated.
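For example, a question email that follows this rubric might include a short, self-contained code block like the one below. This is a hypothetical example with made-up data, not from a course assignment. The comments mark where the problem is, what was expected, and how to reproduce it, and no data files are needed.

```r
# ADA1 question: group means come back NA for group "b".
# Expected: a numeric mean for every group.
# To reproduce: run this code as-is (made-up data, no files needed).
dat <- data.frame(
  group = c("a", "a", "b", "b"),
  y     = c(1.2, 3.4, NA, 5.0)
)

# PROBLEM IS HERE: the mean for group "b" prints as NA
tapply(dat$y, dat$group, mean)
```

In this made-up case, the reply would likely point to `mean()`'s `na.rm = TRUE` argument. For example, `tapply(dat$y, dat$group, mean, na.rm = TRUE)` drops the missing value before averaging.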

  Unicode compile problems: If you render to pdf, you may get this error: “! Package inputenc Error: Unicode char”. ASCII is the small character set that we use to program in. Unicode is an extended character set that looks pretty (for example, "straight quotes" become “curly quotes”) but causes code to break. You get unwanted Unicode when you copy/paste from a pdf or some other source into your code. To fix this, find the Unicode characters and replace them with their ASCII equivalents. To do this: press Ctrl-F to find, search for “[^\x00-\x7F]” (without quotes), select “Regex” for regular expressions, and find the “Next” one. As it finds instances, replace the characters manually until there are no more. These characters will typically be curly quotes or fancy dashes.
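You can also run the same search from the R Console. The sketch below uses the regular expression from the paragraph above with `grepl(..., perl = TRUE)` to list the line numbers that contain non-ASCII characters. It also swaps a few common culprits (curly quotes and en/em dashes) for ASCII equivalents. The function names and file name are hypothetical. Any other non-ASCII characters (for example, accented letters) are still reported for manual review. Keep a backup of the qmd file before overwriting it.

```r
# Sketch: report line numbers that contain any non-ASCII character.
find_non_ascii <- function(lines) {
  # perl = TRUE uses PCRE, which understands the \x hex escapes
  which(grepl("[^\\x00-\\x7F]", lines, perl = TRUE))
}

# Sketch: replace common "pretty" characters with ASCII equivalents.
fix_common_unicode <- function(lines) {
  from <- c("\u201C", "\u201D", "\u2018", "\u2019", "\u2013", "\u2014")
  to   <- c("\"",     "\"",     "'",      "'",      "-",      "-")
  for (i in seq_along(from)) {
    lines <- gsub(from[i], to[i], lines, fixed = TRUE)
  }
  lines
}

# Usage with a hypothetical file name:
# lines <- readLines("hw01.qmd", encoding = "UTF-8")
# find_non_ascii(lines)                      # line numbers to inspect
# writeLines(fix_common_unicode(lines), "hw01.qmd")
```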

Acumen in Statistics