Note: For each class, save this file under a new name, updating the last two digits to the class number. That way, you’ll have a record of your progress, as well as of which files you turned in for grading.

Starting in Class 07, we will concatenate all our worksheets together to retain the information needed for subsequent classes. You will also have an opportunity to revisit previous parts to make changes or improvements, such as updating your codebook, modifying your research questions, or improving tables and plots. I’ve provided an initial predicted organization of our sections and subsections using the # and ## symbols. A table of contents is automatically generated using “toc: true” in the YAML header, and the headings in the table of contents are clickable, so you can jump down to each (sub)section.

You will need to update any section symbols to be at least 3 deep (###) if you’ve copied them from an older assignment into this document.

Finally, you can delete all this header text once you don’t need to refer to it.

Research Questions

Class 03 Datasets, Codebooks, Personal Codebook

Question of interest

Dataset: (You need this part.) National Epidemiologic Survey on Alcohol and Related Conditions (NESARC), with codebook NESARC_W1_CodeBook.pdf.

Initial thinking: (My helpful narrative description to help you get going.) While nicotine dependence is a good starting point, I need to determine what it is about nicotine dependence that I am interested in. It strikes me that the friends and acquaintances I have known through the years who became hooked on cigarettes did so over very different periods of time. Some seemed to be dependent soon after their first few experiences with smoking, and others only after many years of generally irregular smoking behavior. I decide that I am most interested in exploring the association between level of smoking and nicotine dependence. I add to my codebook variables reflecting smoking levels (e.g., smoking quantity and frequency).

Topic of interest: (You need this part.) I have decided to investigate the relationship between nicotine dependence and the frequency and quantity of smoking among people up to 25 years old. The association may differ by ethnicity, age, gender, and other factors.

How I did it: (My helpful narrative description to help you get going.) I looked through the codebook and found some variables of interest, searching the text with “Ctrl-F” (find). For each variable, I copied and pasted the description here, then formatted it so it’s organized. You can choose to use a table or an outline format; I found this verbatim text format very easy to work with. I retained the “frequency” of each response because it’s interesting to know and because it was already in the codebook; this value is not required for your codebook.


Dataset: NESARC
Primary association: nicotine dependence vs frequency and quantity of smoking

  Variable description
  Data type (Continuous, Discrete, Nominal, Ordinal)
  Frequency ItemValue Description

  43093 1-43093. Unique Identification number

  18518 1. Male
  24575 2. Female

  43079 18-97. Age in years
     14 98. 98 years or older

   9913 1. Smoked cigarettes in the past 12 months
   8078 2. Smoked cigarettes prior to the last 12 months
     22 9. Unknown
  25080 BL. NA, never or unknown if ever smoked 100+ cigarettes

  38131 0. No nicotine dependence
   4962 1. Nicotine dependence

  14836 1. Every day
    460 2. 5 to 6 Day(s) a week
    687 3. 3 to 4 Day(s) a week
    747 4. 1 to 2 Day(s) a week
    409 5. 2 to 3 Day(s) a month
    772 6. Once a month or less
    102 9. Unknown
  25080 BL. NA, never or unknown if ever smoked 100+ cigarettes

  24507 1. White, Not Hispanic or Latino
   8245 2. Black, Not Hispanic or Latino
    701 3. American Indian/Alaska Native, Not Hispanic or Latino
   1332 4. Asian/Native Hawaiian/Pacific Islander, Not Hispanic or Latino
   8308 5. Hispanic or Latino

  17751 1-98. Cigarette(s)
    262 99. Unknown
  25080 BL. NA, never or unknown if ever smoked 100+ cigarettes
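The special codes in the listing above (9 or 99 for “Unknown”, BL for blank) are not real measurements, so they would need to be treated as missing before any analysis. Below is a minimal sketch of that recoding in plain Python. The records are made up, and the field names (`age`, `smoke_freq`, `cigs_per_day`) are hypothetical stand-ins rather than the actual NESARC variable names. The course itself works in R, so read this only as an illustration of the logic.

```python
# Special codes taken from the codebook listing above.
# Field names are hypothetical, not the real NESARC variable names.
MISSING_CODES = {
    "smoke_freq": {"9", "BL", ""},     # 9 = Unknown, BL = blank
    "cigs_per_day": {"99", "BL", ""},  # 99 = Unknown, BL = blank
}


def recode_missing(record):
    """Return a copy of record with special codes replaced by None."""
    cleaned = dict(record)
    for field, codes in MISSING_CODES.items():
        raw = str(cleaned.get(field, "")).strip()
        cleaned[field] = None if raw in codes else int(raw)
    return cleaned


# Made-up example rows (not real NESARC data).
raw_rows = [
    {"age": 22, "smoke_freq": "1", "cigs_per_day": "20"},
    {"age": 24, "smoke_freq": "9", "cigs_per_day": "99"},
    {"age": 40, "smoke_freq": "3", "cigs_per_day": "5"},
    {"age": 19, "smoke_freq": "BL", "cigs_per_day": "BL"},
]

cleaned_rows = [recode_missing(row) for row in raw_rows]

# Keep only respondents up to 25 years old, as in the topic of interest.
young_rows = [row for row in cleaned_rows if row["age"] <= 25]
```

Recoding before subsetting keeps the two steps independent, so the age cutoff can be changed later without touching the missing-value rules.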

Class 04 Citations

I will use (Beckschäfer et al. 2014; Richert 2013; Murphy 2012; Dean and ebrary, Inc 2014) for this work; these citations were produced using [@beck2014; @rich2013; @murp2012; @dean2014]. In addition, Beckschäfer et al. (2014) say some interesting stuff, and that in-text citation was produced using @beck2014. For more documentation on bibliographies and citations, and for general help, see the R Markdown documentation.

Class 05 Research Questions

See Class 06 below.

Class 06 Literature Review

Dataset: National Epidemiologic Survey on Alcohol and Related Conditions (NESARC), with codebook NESARC_W1_CodeBook.pdf.

Research question:

  1. Is nicotine dependence [S3AQ10D] associated with smoking frequency [S3AQ3B1] and quantity [S3AQ3C1]?
    • Google scholar search: “nicotine dependence smoking frequency”
    • Citation: pdf file available for Dierker et al. (2007)
    • Interesting points: Figures 2 and 3, quantity and frequency both positively related to probability of dependence.
    • Others: Kandel and Chen (2000) and Caraballo, Novak, and Asman (2009)

You don’t need to include images in your literature review. I’m providing them only to illustrate what the cited tables and figures look like.

  2. Is nicotine dependence [S3AQ10D] associated with depression [S4AQ1]?
    • Google scholar search: “nicotine dependence depression”
    • Citation: pdf file available for Naomi Breslau (1995)
    • Interesting points: Table 2, Smoking and Nicotine Dependence both associated with Education. Table 3, Major depression associated with being nicotine dependent and Sex.
    • Others: N. Breslau et al. (1998)
  3. Is the association between nicotine dependence [S3AQ10D] and depression [S4AQ1] different by demographics, such as Education or Sex?
    • In Question 2 we see differences by Education and Sex.

I have decided to further focus my question by examining whether the association between nicotine dependence and depression differs based on how much a person smokes. I am wondering if at low levels of smoking compared to high levels, nicotine dependence is more common among individuals with major depression than those without major depression. I add relevant depression questions/items/variables to my personal codebook as well as several demographic measures (age, gender, ethnicity, education, etc.) and any other variables I may wish to consider.
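The comparison described above amounts to computing the proportion of nicotine-dependent respondents within each combination of smoking level and depression status, then checking whether the depressed-versus-not gap changes between low and high smoking levels. Here is a minimal sketch of that bookkeeping in plain Python. The records are made up purely for illustration, and the two-level "low"/"high" split is an assumption, not something defined in the codebook.

```python
from collections import defaultdict

# Made-up records for illustration only (not real NESARC data).
# Each tuple: (smoking_level, has_major_depression, is_nicotine_dependent)
records = [
    ("low", True, True),
    ("low", True, False),
    ("low", False, False),
    ("low", False, False),
    ("high", True, True),
    ("high", True, True),
    ("high", False, True),
    ("high", False, False),
]


def dependence_rates(rows):
    """Proportion nicotine dependent within each (smoking level, depression) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [dependent, total]
    for level, depressed, dependent in rows:
        cell = counts[(level, depressed)]
        cell[0] += int(dependent)
        cell[1] += 1
    return {key: dep / total for key, (dep, total) in counts.items()}


rates = dependence_rates(records)

# Difference in dependence rate (depressed minus not depressed) per smoking level.
gap_by_level = {
    level: rates[(level, True)] - rates[(level, False)]
    for level in ("low", "high")
}
```

If the gap differs noticeably between the two smoking levels in the real data, that would suggest the association depends on how much a person smokes. A formal test of that idea belongs to the later statistical-methods sections.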

All required variables have been found and added to my personal codebook (by expanding Class 03).

Data Management

Class 07 Working With Data, Data Management

Class 08 Subsetting data and R Programming


Class 11 Graphing Univariate

Class 13 Graphing Bivariate

Statistical methods

Class 17 Hypothesis Testing

Class 19 ANOVA

Class 21 Contingency tables

Class 23 Correlation and Interactions

Class 25 Linear Regression


Class 27 Poster Presentation


Beckschäfer, P, L Fehrmann, RD Harrison, J Xu, and C Kleinn. 2014. “Mapping Leaf Area Index in Subtropical Upland Ecosystems Using RapidEye Imagery and the randomForest Algorithm.” iForest - Biogeosciences and Forestry 7 (1): 1–11. doi:10.3832/ifor0968-006.

Breslau, N., E. L. Peterson, L. R. Schultz, H. D. Chilcoat, and P. Andreski. 1998. “Major Depression and Stages of Smoking: A Longitudinal Investigation.” Archives of General Psychiatry 55 (2): 161–66.

Breslau, Naomi. 1995. “Psychiatric Comorbidity of Smoking and Nicotine Dependence.” Behavior Genetics 25 (2). Springer: 95–101.

Caraballo, Ralph S., Scott P. Novak, and Katherine Asman. 2009. “Linking Quantity and Frequency Profiles of Cigarette Smoking to the Presence of Nicotine Dependence Symptoms Among Adolescent Smokers: Findings from the 2004 National Youth Tobacco Survey.” Nicotine & Tobacco Research, January, ntn008. doi:10.1093/ntr/ntn008.

Dean, Jared, and ebrary, Inc. 2014. Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners. Wiley & SAS Business Series. Hoboken, NJ: Wiley.

Dierker, Lisa C., Eric Donny, Stephen Tiffany, Suzanne M. Colby, Nicholas Perrine, and Richard R. Clayton. 2007. “The Association Between Cigarette Smoking and DSM-IV Nicotine Dependence Among First Year College Students.” Drug and Alcohol Dependence 86 (2–3): 106–14. doi:10.1016/j.drugalcdep.2006.05.025.

Kandel, D. B., and K. Chen. 2000. “Extent of Smoking and Nicotine Dependence in the United States: 1991-1993.” Nicotine & Tobacco Research: Official Journal of the Society for Research on Nicotine and Tobacco 2 (3): 263–74.

Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective. Adaptive Computation and Machine Learning Series. Cambridge, Mass: MIT Press.

Richert, Willi. 2013. Building Machine Learning Systems with Python. Birmingham, UK: Packt Publishing.