# UNM Stat 590: Statistical Computing (SC1)


### Goal

Learn to produce beautiful (LaTeX) and reproducible (knitr) reports with informative plots (lattice, ggplot2) and tables (xtable) by writing code (R, Rstudio) to answer questions using computational or robust statistical methods (MCMC, bootstrap), which you’ll be proud to publish (packaging commented code).

**Fall 2015:** The syllabus is below the schedule table.

Fall 2015 schedule; Stat 590.001 (CRN 53402), TR 14:00–15:15, SMLC 120

**Did you receive a *registration error* for Fall 2015?** Send me an email with the following answers:

1. What registration error did you get (copy/paste is best)?
2. What is your UNM ID?
3. What is your Math/Stat background (that is, do you have the pre-reqs)?

*If you are waitlisted, as long as there are seats available I will override you into the course. Don’t worry.*

**Before first day:**

**Step 0: Set up LaTeX, R, and Rstudio**

Before our first class (Tue 8/18) please read through the following and install the required software on your computer. If you don’t have a computer, campus resources are available (but we’ll need to make special arrangements). This video will guide you through the installations.

- We will be using your **google account** for crowdgrader (create a Google account if you don’t already have one).
- Sign up for crowdgrader (which uses your Gmail account).
- Install R (windows or mac) then Rstudio. These videos may be helpful: Install R on Mac (2 min), Install R for Windows (3 min), Install R and RStudio on Windows (5 min).
- Run Rstudio and install a package we’ll use (this will test that R’s working). In the console window (bottom left) type: install.packages("ggplot2")

- Install LaTeX (for knitr document preparation)
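After the install steps above, a quick check along the following lines can confirm that R runs and can find ggplot2. This is only a sketch, not part of the official course instructions; the variable names `pkg` and `pkg_ok` are made up for illustration.

```r
# Report the R version and whether ggplot2 can be found (sketch only).
cat(R.version.string, "\n")

pkg <- "ggplot2"                                   # package named in the step above
pkg_ok <- requireNamespace(pkg, quietly = TRUE)    # TRUE if installed, FALSE otherwise

if (pkg_ok) {
  cat(pkg, "version", as.character(packageVersion(pkg)), "is installed\n")
} else {
  cat(pkg, "not found; run install.packages(\"ggplot2\") and try again\n")
}
```

Paste it into the Rstudio console; if the package is reported as not found, rerun the install command and check for error messages.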

**News:** (I reserve the right to continue to improve the materials throughout the semester.) *Updating for Fall 2015: this will be a “flipped” class. Lectures will become reading before class with a brief quiz, and class periods will be spent working on assignments and projects, often in teams (instead of having you struggle alone at home with homework).*

**Timetable**

*(content will be filled in soon for F15)*

**Each week has this structure:**

- Pre-class (Tuesday): **Reading, Video, Quiz** (due before class)
- In-class: **Activities** in class Tuesday and Thursday
- Post-class (Thursday): **Homework** (crowdgrader, due following Thursday before class)
- Post-class (Following Thursday-Tuesday): **Grading** (crowdgrader, following 1 week + Tuesday before class)

Wk-Date | Cl | Topic | Reading, Video, Quiz | In-class Worksheet, Data | Homework | HW Submit Grading | DUE BEFORE CLASS |
---|---|---|---|---|---|---|---|
00-08/18 | 00 | Install software | read install, video install | | | | |
01-08/18 | 01 | 01 Reproducible research LaTeX, R, Rstudio, knitr | read: Ch 01 slides R LaTeX: lshort R: 1 2, video Ch 01 | LaTeX+R+Knitr template: SC1_LaTeX_basic.tex, SC1_student_template.Rnw, 01 Help and plots sol | | 01 crowdgrader 8/27 Submit 8/27 Grade | |
01-08/20 | 02 | 02 R visualization using ggplot; 18 More plots | read: Ch 02 slides R video Ch 02 p1 Ch 02 p2 read: Ch 18 R video Ch 18 | Brief DataVis Presentation from Allen and Erhardt book chapter. In-class from CL01. Discuss HW. | 01a Improving a plot, datathief Two examples: 1. suicides 2. voting | 01a crowdgrader 9/1 Submit 9/3 Grade | |
02-08/25 | 03 | 06 R Writing and debugging code, *ply | Ch 06 slides R video Ch 06 p1 Ch 06 p2 | apply | | | |
02-08/27 | 04 | Submit 01 to crowd grader in class. Start discussing functions. | | | 06 functions sol | 02 crowdgrader 9/3 Submit 9/8 Grade | Turn in HW 01 |
03-09/01 | 05 | Basic data manipulation | | | | | Grade HW 01 |
03-09/03 | 06 | | | | 60 HW03 sol | 03 crowdgrader 9/10 Submit 9/15 Grade | Turn in HW 02 |
04-09/08 | 07 | 20 Data manipulation | read: Ch 20 R video Ch 20 p1 p2 | t1 t2 t3 t4 s1 s2 baby birth R (web baby name wizard) | | | Grade HW 02 |
04-09/10 | 08 | | | | 30 HW20 sol | 04 crowdgrader 9/17 Submit 9/22 Grade | Turn in HW 03 |
05-09/15 | 09 | Data Cleaning | Ch 18 R d1 d2 d3 d4 d5 Had SN | | 90 HW18 sol dat, FB | | Grade HW 03 |
05-09/17 | 10 | | | | | 05 crowdgrader 9/24 Submit 9/29 Grade | Turn in HW 04 |
06-09/22 | 11 | 03 Linear regression and matrix vs looped calculations | Ch 03 d1 R | | | | Grade HW 04 |
06-09/24 | 12 | | | | No homework this week. | | Turn in HW 05 |
07-09/29 | 13 | 04 Approximating expectations, generating random numbers, simulation strategies | Ch 4 R | | 140 HW05a sol, FB | | Grade HW 05 |
07-10/01 | 14 | | | | | 07 crowdgrader 10/15 Submit 10/20 Grade | |
08-10/06 | 15 | | | | | | |
08-10/08 | Fall Break | | | | | | |
09-10/13 | 16 | 05 Basics of Monte Carlo methods | Ch 05 R | | 80 HW05b sol, FB | | |
09-10/15 | 17 | | | | | 09 crowdgrader 10/27 Submit 11/3 Grade | Turn in HW 07 |
10-10/20 | 18 | 12 Bootstrap | Ch 12 R | | 110 HW12 | | Grade HW 07 |
10-10/22 | 19 | | | | | 10 crowdgrader 11/3 Submit 11/10 Grade | Turn in HW 09 |
11-10/27 | 20 | 00 Mixture distributions with review of generating random numbers and parametric bootstrap | using HW assignment | | 65 HW04 sol | | Grade HW 09 |
11-10/29 | 21 | | | | | 11 crowdgrader 11/17 Submit 11/24 Grade | Turn in HW 10 |
12-11/03 | 22 | 09 Optimization via NR, Secant, other methods | Ch 09 R | | | | Grade HW 10 |
12-11/05 | 23 | | | | | | Turn in HW 11 |
13-11/10 | 24 | 10 Multivariate maximization | Ch 10 R | | 105 HW10 sol | | Grade HW 11 |
13-11/12 | 25 | | | | | 13 crowdgrader 11/25 Submit 12/03 Grade | Turn in HW 12 |
14-11/17 | 26 | 11 Logistic regression and NR | Ch 11 R | | 120 HW11 dat sol R | | Grade HW 12 |
14-11/19 | 27 | | | | | 14 crowdgrader 12/3 Submit 12/8 Grade | Turn in HW 13 |
15-11/24 | 28 | 17 optim() | Everyone optim()! | Nelder-Mead 1 2 Results R | | | Grade HW 13 |
15-11/26 | Thanksgiving break | | | | | | Turn in HW 14 |
16-12/01 | 29 | | | | | | Grade HW 14 |
16-12/03 | 30 | 19 Assessing test size | Ch 19 R | | | | |
17-12/08 | Finals Week | | | | | | |

**If extra time:** 16 Maps in R Ch 16 R Crime maps: Albuquerque and, discuss a few optim(), wcloud images

Lecture notes for Statistical Computing (SC1) Stat 590 University of New Mexico is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at https://statacumen.com/teach/SC1/SC1_notes.pdf.

### Possible additional topic list

- 06 CI for proportion, small sample properties with readings
- 07 Exact binomial intervals, robustness. HW3: goodness-of-fit, binomial CI, Winsorized means
- 12 Bootstrap for many applications, concept and several examples, Davison and Hinkley. Possible applications: correlation, power analysis, checking normality, null distribution for max correlation in lag window (from brain imaging), regression. HW4: Bootstrap
- 14 Nonparametric and permutation tests
- 15 Packaging code (with inline commenting, with package inlinedocs), or Roxygen. HW15: R package functions from semester with documentation
- Others: Using Hadley’s devtools to make a simple package. Testing packages: RUnit and testthat.

# Syllabus

**Description:** A detailed examination of essential statistical computing skills needed for research and industrial work. Students will use R to develop algorithms for solving a variety of statistical problems using resampling and simulation techniques such as the bootstrap, Monte Carlo methods, and Markov chain methods for approximating probability distributions. Applications to linear and non-linear models will be stressed with emphasis on reproducible research, efficient data manipulation, and visualization throughout.
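To give a flavor of the resampling techniques named in the description, here is a minimal sketch of a nonparametric bootstrap for a sample median. It is not taken from the course notes: the simulated data, the seed, the number of resamples `B`, and all variable names are assumptions chosen for illustration.

```r
# Nonparametric bootstrap of the sample median (illustrative sketch only).
set.seed(590)                      # arbitrary seed so the run is repeatable
x <- rexp(50, rate = 1 / 10)       # simulated data standing in for a real sample
B <- 2000                          # number of bootstrap resamples (assumed value)

# Resample the data with replacement and recompute the median each time.
boot_medians <- replicate(B, median(sample(x, replace = TRUE)))

boot_se <- sd(boot_medians)                          # bootstrap standard error
boot_ci <- quantile(boot_medians, c(0.025, 0.975))   # 95% percentile interval

cat("sample median:", median(x), " bootstrap SE:", boot_se, "\n")
print(boot_ci)
```

The percentile interval is only one of several bootstrap intervals; it is used here because it needs nothing beyond base R.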

**Prerequisite:** Stat 528 (ADA2)

**Semesters offered:** Fall or Spring

**Lecture:** Stat 590.001 (CRN 53402), TR 14:00–15:15, SMLC 120 Video

**Office hours:** Tue 12:30-13:30, Thu 12:30-13:30, and by appointment in SMLC 312

**email:** “Erik B. Erhardt” <erike@stat.unm.edu>, please include “SC1” in subject line

**Textbook:** Peter Dalgaard, “Introductory Statistics with R”, Second Edition, 2008, ISBN: 978-0-387-79053-4.

*The book is not required, but it will provide a backup for what you learn in class. Many other books are available with similar material.*

**Laptops running R:**I encourage you to bring a laptop to class each day so you can try the R programming exercises in class. If you don’t have one, no problem, teamwork is encouraged — sit next to someone friendly who likes to share.

## Teaching Assistants and Peer Mentors

None.

## Student learning outcomes

**General outcomes:**

- Organize knowledge in graphs, tables, and code to support concise, comprehensible, and scientifically defensible written interpretations to produce knowledge within a reproducible research environment (R + knitr + LaTeX).
- Create computational solutions to answer statistical questions using a high-level programming language.
- Evaluate and verify code, and assess/criticize for improvements in correctness or efficiency.

**Topical outcomes:**

- Use statistical software, such as R, to read and manage data, create informative plots, report numerical summaries, and apply statistical models, following recommended programming practice including abstraction and documentation.
- Use the family of apply functions to split, apply, and combine for efficient and effective data processing.
- Apply concepts of data visualization to improve visual communication and to critique figures for improvements.
- Use MCMC methods for statistical estimation, to improve estimates in standard situations and to make estimation possible in unusual situations.
- Implement good optimization strategies for a range of scenarios.
- Evaluate and criticize published studies, the work of peers, and your own work and assess what was done well, what could be done better, and examine whether their conclusions are supported using statistical principles.
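As a small illustration of the split, apply, and combine outcome listed above, the following sketch computes a group mean two ways in base R. It uses the built-in `mtcars` data purely as a stand-in; the variable names are hypothetical and this is not a course assignment.

```r
# Split-apply-combine on the built-in mtcars data (illustrative sketch).
# Split mpg by number of cylinders, apply mean() to each piece, then combine.
pieces   <- split(mtcars$mpg, mtcars$cyl)   # split: a list with one vector per group
by_split <- sapply(pieces, mean)            # apply + combine: a named numeric vector

# tapply() performs the same three steps in a single call.
by_tapply <- tapply(mtcars$mpg, mtcars$cyl, mean)

print(by_split)
print(by_tapply)
```

Both approaches give the same group means; writing the steps out with `split()` and `sapply()` makes each stage visible, while `tapply()` is the more compact form.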

## Meeting the learning outcomes

You will acquire new information in this class, but the emphasis is on comprehending, integrating, and applying information. Rote factual memorization is the lowest form of learning. Effective learning takes place by explaining, integrating, applying, and analyzing facts, hypotheses, and theories. Learning in this class occurs by:

- **Doing** – completion of exercises that require analysis of data to answer questions and test hypotheses, or researching answers to reading assignments.
- **Discussion** – interaction with classmates to assemble and synthesize information, utilizing the collective skills and knowledge base of the group.
- **Listening, acting, and reflecting** – activities during class time provide insights into information not available in readings and include review of difficult material to aid comprehension. Note taking permits later reflection on lecture content.

Listening to the professor lecture is the least effective learning tool for students, however, and you should plan on coming to *every* class prepared to participate in active and reflective learning opportunities.

## Assessment

- **Quizzes** will be due each Tuesday before class. *Purpose: to assess reading and video comprehension and assure you’re prepared to actively participate in class activities with minimal lecture.* (About 12, 15% of final grade.) Most weeks plan for 1-2 hours of reading and video and a 20-minute quiz.
- **Participation** is required in every class. If you’re engaged with the material and your classmates, you’ll get full points. If you’re not in class, working on other things, etc., then you’re not meeting my expectations. *Purpose: to struggle and find success in class with the concepts and skills.* (10% of final grade.)
- **Homework (HW)** assignments are assigned each Thursday and due the following Thursday, submitted to crowdgrader (75% of HW grade). *Purpose: to apply concepts and skills to your class poster project.* (About 12, 75% of final grade.) Most weeks plan on 3-5 hours per assignment.
- **Peer grading** is due by the Tuesday after each homework is due (25% of HW grade). *Purpose: to gain skill assessing the work of others, as well as see alternative strategies to answer questions.* Most weeks this will take about 30 minutes to grade 3 other students’ HW.

**Rubrics** guide assessment (and self-assessment) of homework, code, projects, exams, and presentations. Each assignment will have its own specific rubric. All R code for the assignment should be included with the part of the problem it addresses (for code and output use a fixed-width font, such as Courier). Do NOT use your R code and output as your answer to the problem, but include them to show me how you arrived at your answer. Your prose solution (in a non-fixed-width font) should be provided in addition to R output.

### Collaboration and citation

For homeworks I encourage you to work together. Please discuss the data, code, and problems with one another, but *do your own exploration and write up*. We expect everyone to hand in substantially different homeworks, and we will enforce this under the honor code. The small benefit you might get from plagiarism is not worth the severe penalty (of lost trust, being reported to the dean, no points for the assignment, etc.). As in life, please use any resources available to you. Projects and some homeworks will explicitly encourage you to use resources on the internet, but showing extra initiative will always be appreciated. You may find R programming tough at first, so feel free to discuss your problems with other classmates or meet with or email questions to the TAs or me. I encourage you to use the ideas of others, but make them your own, giving credit. For projects have a formal bibliography, for homework cite casually, and for code simply copy the URL in as a comment (which is doubly helpful for finding the resource again).

## Disability statement

If you have a documented disability that will impact your work in this class, please contact me to discuss your needs. You’ll also need to register with the Accessibility Resource Center in 2021 Mesa Vista Hall (building 56) across the courtyard east from the SUB.

Learning without thought is labor lost. What I hear, I forget. What I see, I remember. What I do, I understand. – Confucius

**Random stuff**

- UNM has a license for free online access to the definitive books for the Lattice and ggplot2 graphing platforms. Note you must be on campus or logged in through the UNM proxy to access these.
- R is currently available in these UNM locations: DSH 141 and 143, Econ 1004, SMLC pods, and SUB IT-LoboLab Pod and IT-LoboLab Classroom.
- R style matters. There is a lot of online help on R, such as at UCLA, try-r, and Google’s Intro to R video series. Usually try searching for “R [mytopic]” and you’ll get lots of results. ggplot2 plotting cookbook.
- R reference card by Jonathan Baron.
- Translate between MATLAB and R.
- Figure checklist. Choosing the right chart. Nature Methods points of view on visualization.
- Statistical consulting and collaboration slides.
- Raster vs vector graphics.
- Statistics pre-req refresher from Khan Academy.
- Coursera has a free 4-week course on computing for data analysis with R.
- Muddy points in perspective.
- R+LaTeX+knitr for reproducible research. See my SC1 lecture notes (Ch01), and Mohammad Arbabshirani’s notes (pdf, rnw).
**Asking smart questions:** “Smart Questions” guide (note “hackers build things, crackers break them”)

Email Question Rubric:

- Send one email per question.
  - Use “Reply” to continue conversation on a question; send a new email for a new question.
- Include “Stat590” as the first word of the subject line in new emails (if replying, just use reply).
- Begin email with a short question summary.
- When possible, include commented code in email body.
  - Comments should indicate where the problem is, what the expected behavior is, and what steps are necessary to reproduce the problem.
  - Code should include a “Minimum representative test case” (http://www.catb.org/esr/faqs/

**Help:**

- LaTeX wiki, lshort, Detexify LaTeX symbols (linux texlive package management)
- R tutorials: TryR (gentle), Kelly Black
- Knitr in Rstudio (knitr is the modern version of Sweave: intro, demo, guide)
- xtable to produce LaTeX tabular environment from R data.frames
- Cookbook for R for helpful examples, visualization tutorials, diagrams
- Image formats: vector (pdf, eps) vs raster (jpeg, bmp, tiff, gif)