# Rubric

PDS Ch 3: Personal Codebook (slightly modified)

1. (1 p) Is there a dataset and a topic of interest?

2. (1 p) Are the variables relevant to the question (or related questions)?

3. (1 p) Is a unique identifier variable included?

4. (4 p) Are there at least 2 categorical and 2 numerical variables (at least 4 “data” variables)?
• 1 categorical variable with only 2 levels
• 1 categorical variable with at least 3 levels
• 2 numerical variables with many possible unique values
• More variables are welcome, and you will likely add more later in the semester

5. (3 p) For each variable, is there a variable description, a data type, and coded value descriptions?
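
Item 4 can be checked mechanically once each variable's data type and number of possible values are written down. The sketch below is one hedged way to do that in Python. The `Variable` class, the `meets_item_4` function, and the cutoff of 10 values for "many" are illustrative assumptions, not part of the assignment. The value counts are taken from the ranges in the example codebook further down.

```python
from dataclasses import dataclass

CATEGORICAL = {"Nominal", "Ordinal"}
NUMERICAL = {"Continuous", "Discrete"}


@dataclass
class Variable:
    name: str
    data_type: str       # Continuous, Discrete, Nominal, or Ordinal
    n_values: int        # possible substantive values, excluding unknown/NA codes
    is_id: bool = False  # the unique identifier does not count as a "data" variable


def meets_item_4(variables, many=10):
    """Check rubric item 4; `many` is an assumed cutoff for 'many possible values'."""
    data_vars = [v for v in variables if not v.is_id]
    two_level = [v for v in data_vars
                 if v.data_type in CATEGORICAL and v.n_values == 2]
    multi_level = [v for v in data_vars
                   if v.data_type in CATEGORICAL and v.n_values >= 3]
    numerical = [v for v in data_vars
                 if v.data_type in NUMERICAL and v.n_values >= many]
    return len(two_level) >= 1 and len(multi_level) >= 1 and len(numerical) >= 2


example = [
    Variable("IDNUM", "Nominal", 43093, is_id=True),
    Variable("Sex", "Nominal", 2),
    Variable("Ethnicity", "Nominal", 5),
    Variable("Age", "Continuous", 81),             # codes 18 through 98
    Variable("DailyCigsSmoked", "Discrete", 98),   # codes 1 through 98
]

print(meets_item_4(example))
```

Dropping either numerical variable from `example` makes the check fail, which matches the requirement of two numerical variables.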

Below I give an example of a personal codebook to help get you started on this assignment. This is the beginning of your investigation toward answering a few questions that you’ll pursue throughout the semester.

The purpose of this assignment is to

1. select a dataset (PDS Ch 2),

2. identify a specific topic of interest (PDS Ch 3), and

3. prepare a codebook of your own (like the example below) from the larger codebook, including the questions/items/variables that measure your selected topic (PDS Ch 3).

Your codebook may continue to develop during your literature review next week.

# Question of interest

Dataset: National Epidemiologic Survey on Alcohol and Related Conditions (NESARC), with codebook wv1codebook.pdf.

Initial thinking (not for your answer, an example of how to think through this): While nicotine dependence is a good starting point, I need to determine what it is about nicotine dependence that I am interested in. It strikes me that the friends and acquaintances I have known over the years who became hooked on cigarettes did so over very different periods of time. Some seemed to become dependent soon after their first few experiences with smoking, and others only after many years of generally irregular smoking. I decide that I am most interested in exploring the association between level of smoking and nicotine dependence. I add to my codebook variables reflecting smoking levels (e.g., smoking quantity and frequency).

Topic of interest: I have decided to investigate the relationship between nicotine dependence and the frequency and quantity of smoking among people up to 25 years old. The association may differ by ethnicity, age, gender, and other factors.

How I did it (not needed for your answer): I looked through the codebook wv1codebook.pdf and searched the text with “Ctrl-F” (find) to locate variables of interest. For each variable, I copied and pasted the description here, then formatted it so that it is organized. You can choose to use a table or an outline format; I found this plain-text format very easy to work with. I retained the “frequency” of each response because it is interesting to know and because it was already in the codebook. This value is not required for your codebook.

# Codebook

Dataset: NESARC
Primary association: nicotine dependence vs frequency and quantity of smoking

Key:
RenamedVarName
Original VarName in dataset
Variable description
Data type (Continuous, Discrete, Nominal, Ordinal)
Frequency ItemValue Description

IDNUM
IDNUM
UNIQUE ID NUMBER WITH NO ALPHABETICS
Nominal
43093 1-43093. Unique Identification number

Sex
SEX
SEX
Nominal
18518 1. Male
24575 2. Female

Age
AGE
AGE in years
Continuous
43079 18-97. Age in years
14 98. 98 years or older

SmokingStatus
CHECK321
CIGARETTE SMOKING STATUS
Nominal
9913 1. Smoked cigarettes in the past 12 months
8078 2. Smoked cigarettes prior to the last 12 months
22 9. Unknown
25080 BL. NA, never or unknown if ever smoked 100+ cigarettes

TobaccoDependence
TAB12MDX
NICOTINE DEPENDENCE IN THE LAST 12 MONTHS
Nominal
38131 0. No nicotine dependence
4962 1. Nicotine dependence

SmokingFreq
S3AQ3B1
USUAL FREQUENCY WHEN SMOKED CIGARETTES
Ordinal
14836 1. Every day
460 2. 5 to 6 Day(s) a week
687 3. 3 to 4 Day(s) a week
747 4. 1 to 2 Day(s) a week
409 5. 2 to 3 Day(s) a month
772 6. Once a month or less
102 9. Unknown
25080 BL. NA, never or unknown if ever smoked 100+ cigarettes

DailyCigsSmoked
S3AQ3C1
USUAL QUANTITY WHEN SMOKED CIGARETTES
Discrete
17751 1-98. Cigarette(s)
262 99. Unknown
25080 BL. NA, never or unknown if ever smoked 100+ cigarettes
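
The Unknown (99) and blank (BL) codes above are not cigarette counts, so they would usually be treated as missing before any analysis. A minimal sketch in plain Python follows. It assumes the raw values arrive as strings and that BL is stored as an empty or whitespace-only string or the literal text "BL"; check how your copy of the data encodes it. `recode_daily_cigs` is a hypothetical helper name.

```python
def recode_daily_cigs(raw):
    """Return cigarettes as an int, or None for the unknown and blank codes."""
    text = raw.strip()
    if text == "" or text.upper() == "BL":  # assumed encodings of the BL code
        return None
    value = int(text)
    if value == 99:                         # 99 = Unknown
        return None
    if not 1 <= value <= 98:                # valid range listed in the codebook
        raise ValueError(f"unexpected S3AQ3C1 code: {raw!r}")
    return value


raw_values = ["20", " ", "99", "5", "BL"]
recoded = [recode_daily_cigs(v) for v in raw_values]
print(recoded)
```

Raising an error on out-of-range codes is a design choice here: it surfaces data-entry problems early instead of silently keeping them.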

Ethnicity
ETHRACE2A
IMPUTED RACE/ETHNICITY
Nominal
24507 1. White, Not Hispanic or Latino
8245 2. Black, Not Hispanic or Latino
701 3. American Indian/Alaska Native, Not Hispanic or Latino
1332 4. Asian/Native Hawaiian/Pacific Islander, Not Hispanic or Latino
8308 5. Hispanic or Latino
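
Because every respondent falls into exactly one response category, the frequencies listed for a variable should add up to the 43093 respondents shown for IDNUM. The sketch below checks this for a few of the entries above as a quick guard against copy/paste errors; the dictionary layout is only one illustrative choice.

```python
N_RESPONDENTS = 43093  # from the IDNUM entry

# Frequencies copied from the codebook entries above, in listed order.
frequencies = {
    "Sex": [18518, 24575],
    "SmokingStatus": [9913, 8078, 22, 25080],
    "SmokingFreq": [14836, 460, 687, 747, 409, 772, 102, 25080],
    "Ethnicity": [24507, 8245, 701, 1332, 8308],
}

# Collect any variable whose frequencies do not sum to the sample size.
mismatches = {
    name: sum(counts)
    for name, counts in frequencies.items()
    if sum(counts) != N_RESPONDENTS
}
print(mismatches)
```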

Depression
MAJORDEPLIFE
7839 1. Yes