Note: Each class, save this file under a new name, updating the last two digits to the class number. You’ll then have a record of your progress, as well as of which files you turned in for grading.

Starting in Class 07, we will concatenate all our worksheets (WSs) into this document to retain the information needed for subsequent classes. You will also have an opportunity to revisit previous parts to make changes or improvements, such as updating your codebook, modifying your research questions, and improving tables and plots. I’ve provided an initial proposed organization of our sections and subsections using the # and ## heading symbols. A table of contents is generated automatically by the `toc: true` setting in the YAML header, and the headings in the table of contents are clickable, so you can jump down to each (sub)section.

If you’ve copied sections from an older assignment into this document, update their section symbols to be at least three deep (###).

Finally, you can delete all this header text once you don’t need to refer to it.

# Research Questions

## Class 03 Datasets, Codebooks, Personal Codebook

### Question of interest

Dataset: (You need this part.) National Epidemiologic Survey on Alcohol and Related Conditions (NESARC), with codebook NESARC_W1_CodeBook.pdf.

Initial thinking: (My helpful narrative description to help you get going.) While nicotine dependence is a good starting point, I need to determine what it is about nicotine dependence that I am interested in. It strikes me that the friends and acquaintances I have known through the years who became hooked on cigarettes did so over very different periods of time. Some seemed to become dependent soon after their first few experiences with smoking, and others only after many years of generally irregular smoking. I decided that I am most interested in exploring the association between level of smoking and nicotine dependence, so I added variables reflecting smoking level (e.g., smoking quantity and frequency) to my codebook.

Topic of interest: (You need this part.) I have decided to investigate the relationship between nicotine dependence and the frequency and quantity of smoking among people up to 25 years old (ages 18 to 25, since NESARC’s youngest respondents are 18). The association may differ by ethnicity, age, gender, and other factors.

How I did it: (My helpful narrative description to help you get going.) I looked through the codebook for variables of interest, searching the text with “Ctrl-F” (find) to locate them. For each variable, I copied and pasted the description here, then formatted it so it’s organized. You can choose to use a table or an outline format; I found this verbatim text format very easy to work with. I retained the “frequency” of each response because it’s interesting to know and was already in the codebook; this value is not required for your codebook.

### Codebook

Dataset: NESARC
Primary association: nicotine dependence vs frequency and quantity of smoking

Key:
VarName
Variable description
Data type (Continuous, Discrete, Nominal, Ordinal)
Frequency ItemValue Description

IDNUM
UNIQUE ID NUMBER WITH NO ALPHABETICS
Nominal
43093 1-43093. Unique Identification number

SEX
SEX
Nominal
18518 1. Male
24575 2. Female

AGE
AGE
Continuous
43079 18-97. Age in years
14 98. 98 years or older

CHECK321
CIGARETTE SMOKING STATUS
Nominal
9913 1. Smoked cigarettes in the past 12 months
8078 2. Smoked cigarettes prior to the last 12 months
22 9. Unknown
25080 BL. NA, never or unknown if ever smoked 100+ cigarettes

TAB12MDX
NICOTINE DEPENDENCE IN THE LAST 12 MONTHS
Nominal
38131 0. No nicotine dependence
4962 1. Nicotine dependence

S3AQ3B1
USUAL FREQUENCY WHEN SMOKED CIGARETTES
Ordinal
14836 1. Every day
460 2. 5 to 6 Day(s) a week
687 3. 3 to 4 Day(s) a week
747 4. 1 to 2 Day(s) a week
409 5. 2 to 3 Day(s) a month
772 6. Once a month or less
102 9. Unknown
25080 BL. NA, never or unknown if ever smoked 100+ cigarettes

ETHRACE2A
IMPUTED RACE/ETHNICITY
Nominal
24507 1. White, Not Hispanic or Latino
8245 2. Black, Not Hispanic or Latino
701 3. American Indian/Alaska Native, Not Hispanic or Latino
1332 4. Asian/Native Hawaiian/Pacific Islander, Not Hispanic or Latino
8308 5. Hispanic or Latino

S3AQ3C1
USUAL QUANTITY WHEN SMOKED CIGARETTES
Discrete
17751 1-98. Cigarette(s)
262 99. Unknown
25080 BL. NA, never or unknown if ever smoked 100+ cigarettes
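Before any analysis, the special codes in this codebook need handling: 9 (or 99 for quantity) means “Unknown,” and BL means the question was left blank. The Python sketch below is a minimal illustration under assumptions, not part of the assignment: it assumes each respondent is a dict keyed by the codebook variable names, with numbers already parsed and blank fields stored as `None`. The helper names `clean_record` and `young_adults` and the sample records are hypothetical.

```python
# Minimal sketch (assumption): each NESARC respondent is a dict keyed by
# codebook variable names, with numbers already parsed and blank (BL) fields
# stored as None. Helper names and sample records are hypothetical.

# Codebook values meaning "Unknown" for each variable.
UNKNOWN_CODES = {
    "CHECK321": {9},
    "S3AQ3B1": {9},
    "S3AQ3C1": {99},
}


def clean_record(record):
    """Return a copy of record with codebook 'Unknown' codes replaced by None."""
    cleaned = dict(record)
    for var, codes in UNKNOWN_CODES.items():
        if cleaned.get(var) in codes:
            cleaned[var] = None
    return cleaned


def young_adults(records, max_age=25):
    """Clean and keep respondents aged max_age or younger (NESARC's minimum age is 18)."""
    return [clean_record(r) for r in records if r["AGE"] <= max_age]


# Hypothetical example records, for illustration only.
sample = [
    {"IDNUM": 1, "AGE": 22, "CHECK321": 1, "S3AQ3B1": 1, "S3AQ3C1": 10, "TAB12MDX": 1},
    {"IDNUM": 2, "AGE": 40, "CHECK321": 2, "S3AQ3B1": 4, "S3AQ3C1": 5, "TAB12MDX": 0},
    {"IDNUM": 3, "AGE": 19, "CHECK321": 9, "S3AQ3B1": 9, "S3AQ3C1": 99, "TAB12MDX": 0},
    {"IDNUM": 4, "AGE": 24, "CHECK321": None, "S3AQ3B1": None, "S3AQ3C1": None, "TAB12MDX": 0},
]

subset = young_adults(sample)
for r in subset:
    print(r["IDNUM"], r["S3AQ3B1"], r["S3AQ3C1"])
```

Treating “Unknown” as missing, rather than as a valid value, keeps a code like 99 from being read as 99 cigarettes when you summarize S3AQ3C1.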

## Class 04 Citations

I will use (Beckschäfer et al. 2014; Richert 2013; Murphy 2012; Dean and ebrary, Inc 2014) for this work; these citations were generated with `[@beck2014; @rich2013; @murp2012; @dean2014]`. Beckschäfer et al. (2014) also says some interesting stuff, and that in-text citation was generated with `@beck2014`. For more documentation on bibliographies and citations with R Markdown, see http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html. For general help with R Markdown, see https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf.

## Class 05 Research Questions

See Class 06 below.

## Class 06 Literature Review

Dataset: National Epidemiologic Survey on Alcohol and Related Conditions (NESARC), with codebook NESARC_W1_CodeBook.pdf.

Research question:

1. Is nicotine dependence [S3AQ10D] associated with smoking frequency [S3AQ3B1] and quantity [S3AQ3C1]?
    - Google Scholar search: “nicotine dependence smoking frequency”
    - Citation: PDF file available for Dierker et al. (2007)
    - Interesting points: Figures 2 and 3 show that quantity and frequency are both positively related to the probability of dependence.
    - Others: Kandel and Chen (2000) and Caraballo, Novak, and Asman (2009)
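A simple descriptive look at this question in our own data is the proportion of respondents with nicotine dependence (TAB12MDX = 1) at each level of smoking frequency (S3AQ3B1). The Python sketch below is a minimal, hypothetical illustration: it assumes records are dicts whose “Unknown” codes have already been set to `None`, and the function name `dependence_rate_by` and the toy records are made up for the example.

```python
from collections import defaultdict

# Labels for S3AQ3B1 levels, taken from the codebook.
FREQUENCY_LABELS = {
    1: "Every day",
    2: "5 to 6 days a week",
    3: "3 to 4 days a week",
    4: "1 to 2 days a week",
    5: "2 to 3 days a month",
    6: "Once a month or less",
}


def dependence_rate_by(records, group_var, outcome_var="TAB12MDX"):
    """Proportion of records with outcome_var == 1 within each level of group_var.

    Records whose group_var value is missing (None) are skipped.
    Returns {level: proportion}, ordered by level.
    """
    counts = defaultdict(lambda: [0, 0])  # level -> [number dependent, total]
    for r in records:
        level = r.get(group_var)
        if level is None:
            continue
        counts[level][1] += 1
        if r[outcome_var] == 1:
            counts[level][0] += 1
    return {level: dep / total for level, (dep, total) in sorted(counts.items())}


# Hypothetical toy records, for illustration only.
toy = [
    {"S3AQ3B1": 1, "TAB12MDX": 1},
    {"S3AQ3B1": 1, "TAB12MDX": 1},
    {"S3AQ3B1": 1, "TAB12MDX": 0},
    {"S3AQ3B1": 6, "TAB12MDX": 0},
    {"S3AQ3B1": 6, "TAB12MDX": 1},
    {"S3AQ3B1": None, "TAB12MDX": 1},
]

rates = dependence_rate_by(toy, "S3AQ3B1")
for level, rate in rates.items():
    print(f"{FREQUENCY_LABELS[level]}: {rate:.2f}")
```

For S3AQ3C1, whose values run from 1 to 98 cigarettes, grouping by every distinct value would produce many tiny groups, so you would likely bin quantity into a few categories first; the bin boundaries are your choice.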

You don’t need to include images in your literature review. I’m providing the additional examples below to illustrate what these entries look like:

2. Is nicotine dependence [S3AQ10D] associated with depression [S4AQ1]?
    - Google Scholar search: “nicotine dependence depression”
    - Citation: PDF file available for Naomi Breslau (1995)
    - Interesting points: Table 2 shows that smoking and nicotine dependence are both associated with Education. Table 3 shows that major depression is associated with being nicotine dependent and with Sex.
    - Others: N. Breslau et al. (1998)
3. Is the association between nicotine dependence [S3AQ10D] and depression [S4AQ1] different across demographic groups, such as Education or Sex?
    - In Question 2 we see differences by Education and Sex.

I have decided to further focus my question by examining whether the association between nicotine dependence and depression differs based on how much a person smokes. I am wondering whether, at low levels of smoking compared with high levels, nicotine dependence is more common among individuals with major depression than among those without major depression. I added relevant depression questions/items/variables to my personal codebook, as well as several demographic measures (age, gender, ethnicity, education, etc.) and any other variables I may wish to consider.
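This moderation idea can be checked descriptively by splitting respondents into smoking-level strata and, within each stratum, comparing the nicotine dependence rate of those with and without major depression. This is a minimal Python sketch under stated assumptions: `smoking_level` (for example, low versus high, however you choose to define it) and `depressed` are hypothetical derived fields, not NESARC variable names, and `stratified_rates` is a made-up helper. A formal test of whether the association really differs across strata (for example, a logistic regression with an interaction term) would be a separate step.

```python
from collections import defaultdict


def stratified_rates(records, stratum_var, group_var, outcome_var="TAB12MDX"):
    """Proportion with outcome_var == 1 for each group within each stratum.

    Returns {stratum: {group: proportion}}. Records missing the stratum or
    group value (None) are skipped.
    """
    # stratum -> group -> [number with outcome, total]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for r in records:
        stratum, group = r.get(stratum_var), r.get(group_var)
        if stratum is None or group is None:
            continue
        cell = counts[stratum][group]
        cell[1] += 1
        if r[outcome_var] == 1:
            cell[0] += 1
    return {
        stratum: {group: dep / total for group, (dep, total) in groups.items()}
        for stratum, groups in counts.items()
    }


# Hypothetical toy records; "smoking_level" and "depressed" are illustrative
# fields an analyst might derive, not NESARC variable names.
toy = [
    {"smoking_level": "low", "depressed": True, "TAB12MDX": 1},
    {"smoking_level": "low", "depressed": True, "TAB12MDX": 0},
    {"smoking_level": "low", "depressed": False, "TAB12MDX": 0},
    {"smoking_level": "low", "depressed": False, "TAB12MDX": 0},
    {"smoking_level": "high", "depressed": True, "TAB12MDX": 1},
    {"smoking_level": "high", "depressed": False, "TAB12MDX": 1},
    {"smoking_level": None, "depressed": True, "TAB12MDX": 1},
]

rates = stratified_rates(toy, "smoking_level", "depressed")
for stratum, by_group in rates.items():
    print(stratum, by_group)
```

In this toy data, dependence is more common among depressed respondents at the low smoking level but not at the high level, which is the kind of pattern the question asks about.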

All required variables have been found and added to my personal codebook (by expanding Class 03).

# References

Beckschäfer, P., L. Fehrmann, R. D. Harrison, J. Xu, and C. Kleinn. 2014. “Mapping Leaf Area Index in Subtropical Upland Ecosystems Using RapidEye Imagery and the randomForest Algorithm.” iForest - Biogeosciences and Forestry 7 (1): 1–11. doi:10.3832/ifor0968-006.

Breslau, N., E. L. Peterson, L. R. Schultz, H. D. Chilcoat, and P. Andreski. 1998. “Major Depression and Stages of Smoking. A Longitudinal Investigation.” Archives of General Psychiatry 55 (2): 161–66.

Breslau, Naomi. 1995. “Psychiatric Comorbidity of Smoking and Nicotine Dependence.” Behavior Genetics 25 (2): 95–101.

Caraballo, Ralph S., Scott P. Novak, and Katherine Asman. 2009. “Linking Quantity and Frequency Profiles of Cigarette Smoking to the Presence of Nicotine Dependence Symptoms Among Adolescent Smokers: Findings from the 2004 National Youth Tobacco Survey.” Nicotine & Tobacco Research, January, ntn008. doi:10.1093/ntr/ntn008.

Dean, Jared, and ebrary, Inc. 2014. Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners. Wiley & SAS Business Series. Hoboken, NJ: Wiley.

Dierker, Lisa C., Eric Donny, Stephen Tiffany, Suzanne M. Colby, Nicholas Perrine, and Richard R. Clayton. 2007. “The Association Between Cigarette Smoking and DSM-IV Nicotine Dependence Among First Year College Students.” Drug and Alcohol Dependence 86 (2–3): 106–14. doi:10.1016/j.drugalcdep.2006.05.025.

Kandel, D. B., and K. Chen. 2000. “Extent of Smoking and Nicotine Dependence in the United States: 1991–1993.” Nicotine & Tobacco Research: Official Journal of the Society for Research on Nicotine and Tobacco 2 (3): 263–74.

Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective. Adaptive Computation and Machine Learning Series. Cambridge, Mass: MIT Press.

Richert, Willi. 2013. Building Machine Learning Systems with Python. Birmingham, UK: Packt Publishing.