1 NM County-level Poverty Data

In this example we’ll use New Mexico (NM) county-level poverty data to understand how counties differ in living conditions and how those conditions vary together. The data contain 13 measured variables, 10 of which are selected for analysis below; we hope to reduce them to the vital few principal components that explain about 75% of the variability.
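
As a rough sketch of what "components that explain about 75% of the variability" means in practice, the example below runs a PCA on R's built-in `USArrests` data (a stand-in, not the NM data) using base R's `prcomp()`, then finds the smallest number of components whose cumulative proportion of variance reaches 0.75. Standardizing the variables (`scale. = TRUE`) and the names `pca_toy`, `var_explained`, `cum_var`, and `n_keep` are illustrative assumptions, not choices made by this assignment.

```r
# Toy illustration on built-in data; the NM data would take its place.
pca_toy <-
  prcomp(
    USArrests
  , center = TRUE   # subtract each variable's mean
  , scale. = TRUE   # assumption: standardize so no variable dominates
  )

# Each component's variance is its squared standard deviation;
# dividing by the total gives the proportion of variance explained.
var_explained <- pca_toy$sdev^2 / sum(pca_toy$sdev^2)
cum_var       <- cumsum(var_explained)

# Smallest number of components reaching at least 75% cumulative variance
n_keep <- which(cum_var >= 0.75)[1]

round(cum_var, 3)
n_keep
```

`summary(pca_toy)` reports the same cumulative proportions in its last row, which is a convenient cross-check.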

Maps: Labels are easier to read on the left, but the road features on the right make the counties easier to place.

Here is a description of the codebook for this data.

NM county-level poverty data from S16 student:
Nathan Dobie, Student Technical Specialist, Bureau of Business Economic Research, UNM
Thanks, Nathan!

Data combined from:
http://bber.unm.edu/county-profiles                                        (poverty)
http://factfinder.census.gov/bkmk/table/1.0/en/ACS/14_5YR/DP04/0400000US35 (other values)
http://www2.census.gov/geo/docs/reference/codes/files/national_county.txt  (county names)

DATA COLUMNS:
 1 area
 2 county
 3 periodyear (2014)
   -Vacancy Status %
 4   Homeowner vacancy rate
 5   Rental vacancy rate
   -Occupancy Status %
 6   Owner-occupied
 7   Renter-occupied
   -Main source of heating (% of homes)
 8   Utility gas
 9   Electricity
10   Wood
11 Lacking complete plumbing facilities %
12 No telephone service available %
13 rentover35        (% of renter households with gross rent at 35% or more of household income (GRAPI))
   -Poverty
14   est_percent     (Estimated percent of people of all ages in poverty)
15   child_percent   (Estimated percent of people age 0-17 in poverty)
16   fam_percent     (Estimated percent of related children age 5-17 in families in poverty)
library(tidyverse)

# First, download the data to your computer,
#   save in the same folder as this Rmd file.

# read the data
dat_nmcensus <-
  read_csv(
    "ADA2_HW_22_PCA_NMCensusPovertyHousingCharacteristics_DP04.csv"
  , skip = 1
  ) %>%
  rename(
    # Shorter column names
    "Area"     = "area"
  , "County"   = "county"
  , "Year"     = "periodyear"
  , "VacantH"  = "Homeowner vacancy rate"
  , "VacantR"  = "Rental vacancy rate"
  , "Owner"    = "Owner-occupied"
  , "Renter"   = "Renter-occupied"
  , "HeatG"    = "Utility gas"
  , "HeatE"    = "Electricity"
  , "HeatW"    = "Wood"
  , "NoPlumb"  = "Lacking complete plumbing facilities"
  , "NoPhone"  = "No telephone service available"
  , "Rent35"   = "rentover35"
  , "PovAll"   = "est_percent"
  , "PovChild" = "child_percent"
  , "PovFam"   = "fam_percent"
  ) %>%
  filter(
    # remove state average, use county-level
    Area != 0
  )

# remove column attributes from read_csv()
attr(dat_nmcensus, "spec") <- NULL

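# columns to use for analysis:
#   skips columns 1:3 (Area, County, Year identifiers),
#   column 7 (Renter), and columns 15:16 (PovChild, PovFam)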
use_col_ind <- c(4:6, 8:14)
use_col_names <- names(dat_nmcensus)[use_col_ind]
use_col_names
 [1] "VacantH" "VacantR" "Owner"   "HeatG"   "HeatE"   "HeatW"   "NoPlumb" "NoPhone" "Rent35"  "PovAll" 
str(dat_nmcensus)
Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame':    33 obs. of  16 variables:
 $ Area    : num  1 3 5 6 7 9 11 13 15 17 ...
 $ County  : chr  "Bernalillo" "Catron" "Chaves" "Cibola" ...
 $ Year    : num  2014 2014 2014 2014 2014 ...
 $ VacantH : num  1.7 14.8 2.3 1.6 7.4 3.9 11.4 2.1 0.4 3.2 ...
 $ VacantR : num  6.9 7.5 7.8 6.8 20.4 7 8.5 7.4 7.5 8.1 ...
 $ Owner   : num  62.4 87.2 65.4 74.8 67.6 59.4 82.7 64.7 73.5 75.6 ...
 $ Renter  : num  37.6 12.8 34.6 25.2 32.4 40.6 17.3 35.3 26.5 24.4 ...
 $ HeatG   : num  81.7 3.8 50.1 49.1 50.2 47 46.6 70.4 53.5 51.4 ...
 $ HeatE   : num  13 2.8 42.6 10 15.4 46.4 19.1 15.6 38.7 18.6 ...
 $ HeatW   : num  2 51.2 1.9 21.7 13.7 1.1 9 1.6 1 10.6 ...
 $ NoPlumb : num  0.5 0.9 0.5 5.2 0.1 0.1 0 0.7 0.7 1.2 ...
 $ NoPhone : num  3 2.4 2.8 3.5 4.4 3.2 4.5 3.1 2.2 3 ...
 $ Rent35  : num  43.8 51.7 36.7 45.1 38 42 0 46.9 31.4 41.9 ...
 $ PovAll  : num  18.7 22.2 23.4 28.8 20.5 19.2 20.6 27.9 14.1 19.1 ...
 $ PovChild: num  24.5 42.8 32.4 37.6 30.6 27.3 32.1 39.4 18.5 27.8 ...
 $ PovFam  : num  22.6 40.1 28.7 35.9 27.2 26.7 31.6 36 17.3 25.3 ...

Place your code to subset, filter, or transform variables in the code chunk below.

dat_nmcensus <-
  dat_nmcensus %>%
  filter(
    TRUE
  )
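
Before deciding what to subset or transform, it can help to sanity-check the imported values. The sketch below is one possible check, not a required part of the assignment. It rebuilds the first ten rows of four columns from the `str()` output above as a small base-R data frame (`dat_check` is a hypothetical name), tests whether Owner and Renter sum to 100 in every row, and lists rows where Rent35 is exactly 0.

```r
# Hypothetical check data: first ten rows copied from the str() output above.
dat_check <-
  data.frame(
    Area   = c(1, 3, 5, 6, 7, 9, 11, 13, 15, 17)
  , Owner  = c(62.4, 87.2, 65.4, 74.8, 67.6, 59.4, 82.7, 64.7, 73.5, 75.6)
  , Renter = c(37.6, 12.8, 34.6, 25.2, 32.4, 40.6, 17.3, 35.3, 26.5, 24.4)
  , Rent35 = c(43.8, 51.7, 36.7, 45.1, 38, 42, 0, 46.9, 31.4, 41.9)
  )

# If every row sums to 100, Owner and Renter carry the same information.
# Compare with a tolerance to allow for floating-point rounding.
owner_renter_sum_ok <- abs(dat_check$Owner + dat_check$Renter - 100) < 1e-8
all(owner_renter_sum_ok)

# Rows where Rent35 is exactly 0. Whether this is a true zero or a
# placeholder for missing data is not stated in the codebook.
zero_rent35 <- dat_check[dat_check$Rent35 == 0, ]
zero_rent35
```

The same two checks could be run on the full `dat_nmcensus` table; how to handle any flagged county (keep, drop, or recode) is a judgment call to state and justify in your answer.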

1.1 (2 p) Scatterplot matrix of variables of interest

# Scatterplot matrix
library(ggplot2)
library(GGally)
p <-
  ggpairs(
    # all_of() is the supported way to select columns named in a character vector
    dat_nmcensus %>% select(all_of(use_col_names))
  )
print(p)