Rubric

  1. (1 p) Dataset is specified.

  2. (1 p) Variables from your personal codebook are listed with their admissible values.

  3. (4 p) Set up a text file of edit rules, for example “editrules.txt” (you may choose another name), that lists the admissible values for each variable. Follow my example file on the website; it will save you lots of time!

  4. (1 p) Add a section of code after you rename variables but before you make any substitutions or label the factor levels. This is the same place I put my “data cleaning” code, so that you check the data before making any important modifications. In that section, read your “editrules.txt” file and display it with readLines().

  5. (1 p) Load the package with library(editrules), define the edit rules with editfile([yourfilename]), and display them.

  6. (1 p) Identify your violated edits with violatedEdits() and display a summary of them with summary().

  7. (1 p) Summarize in words what your violated edits indicate (you do not need to fix them in class). For example, in my NESARC subset I see that the number of violations is the same as the number of NAs for each variable that the violations apply to. Therefore, I would summarize that “I think the violations all have to do with missing values, so I would investigate that next.”
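
A minimal sketch of items 3 through 6, to show how the pieces fit together. Everything specific here is an assumption made for illustration: the toy data frame, the variable names age and sex, the bounds 18 and 99, and the temporary file location. In your own work the rules file is one you write by hand from your codebook, and the data frame is your renamed dataset.

```r
library(editrules)

# Hypothetical toy data standing in for your renamed dataset.
dat <- data.frame(
  age = c(25, 130, NA, 40),
  sex = c("male", "female", "other", NA),
  stringsAsFactors = FALSE
)

# Item 3: a small rules file. It is written from code here only so that
# the sketch is self-contained; normally you create it in a text editor.
rules_path <- file.path(tempdir(), "editrules.txt")
writeLines(c(
  "# numerical rules (hypothetical bounds)",
  "age >= 18",
  "age <= 99",
  "# categorical rules (hypothetical levels)",
  "sex %in% c('male', 'female')"
), rules_path)

# Item 4: read the rules file and display it as plain text.
readLines(rules_path)

# Item 5: parse the file into edit rules and display them.
E <- editfile(rules_path)
E

# Item 6: check every record against every rule, then summarize.
ve <- violatedEdits(E, dat)
summary(ve)
```

In this toy data, record 2 is out of range for age and record 3 has an inadmissible value for sex. Records with missing values are expected to be reported as NA rather than as clear passes or failures, which is the kind of pattern item 7 asks you to describe in words.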


In-class: Set up data cleaning edit rules and summarize violated edits, then describe what you would do next to deal with violations.

See “HW 04” as a starting point.