In this example we’ll use NM county-level poverty data to understand how counties differ by living conditions, and how those living conditions vary together. We hope to reduce our 13-dimensional dataset to the vital few components that explain about 75% of the variability.
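To make the 75% target concrete, here is a minimal sketch of how one might count the components needed to reach a cumulative variance threshold. It assumes base R's `prcomp()` and uses the built-in `USArrests` data as a stand-in, since the county data has not been loaded at this point. The names `pca_ex`, `var_prop`, `cum_var`, and `n_keep` are hypothetical.

```r
# Stand-in data: built-in USArrests (4 numeric columns), not the NM county data
pca_ex <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component, then cumulative
var_prop <- pca_ex$sdev^2 / sum(pca_ex$sdev^2)
cum_var  <- cumsum(var_prop)
cum_var

# Smallest number of components whose cumulative proportion reaches 0.75
n_keep <- which(cum_var >= 0.75)[1]
n_keep
```

The same two steps (cumulative proportion, then first index past the threshold) would apply to the county data once the analysis columns are chosen.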
Maps: Labels are easier to read on the left map, but the road features on the right map make the counties easier to place.
Here is a description of the codebook for this data.
NM county-level poverty data from S16 student:
Nathan Dobie, Student Technical Specialist, Bureau of Business Economic Research, UNM
Thanks, Nathan!
Data combined from:
http://bber.unm.edu/county-profiles (poverty)
http://factfinder.census.gov/bkmk/table/1.0/en/ACS/14_5YR/DP04/0400000US35 (other values)
http://www2.census.gov/geo/docs/reference/codes/files/national_county.txt (county names)
DATA COLUMNS:
1 area
2 county
3 periodyear (2014)
-Vacancy Status %
4 Homeowner vacancy rate
5 Rental vacancy rate
-Occupancy Status %
6 Owner-occupied
7 Renter-occupied
-Main source of heating (% of homes)
8 Utility gas
9 Electricity
10 Wood
11 Lacking complete plumbing facilities %
12 No telephone service available %
13 rentover35 (gross rent as a percentage of household income (grapi))
-Poverty
14 est_percent (Estimated percent of people of all ages in poverty)
15 child_percent (Estimated percent of people age 0-17 in poverty)
16 fam_percent (Estimated percent of related children age 5-17 in families in poverty)
library(tidyverse)
# First, download the data to your computer,
# save in the same folder as this Rmd file.
# read the data
dat_nmcensus <-
  read_csv(
    "ADA2_HW_22_PCA_NMCensusPovertyHousingCharacteristics_DP04.csv"
  , skip = 1
  ) %>%
  rename(
    # Shorter column names
    "Area"     = "area"
  , "County"   = "county"
  , "Year"     = "periodyear"
  , "VacantH"  = "Homeowner vacancy rate"
  , "VacantR"  = "Rental vacancy rate"
  , "Owner"    = "Owner-occupied"
  , "Renter"   = "Renter-occupied"
  , "HeatG"    = "Utility gas"
  , "HeatE"    = "Electricity"
  , "HeatW"    = "Wood"
  , "NoPlumb"  = "Lacking complete plumbing facilities"
  , "NoPhone"  = "No telephone service available"
  , "Rent35"   = "rentover35"
  , "PovAll"   = "est_percent"
  , "PovChild" = "child_percent"
  , "PovFam"   = "fam_percent"
  ) %>%
  filter(
    # remove state average, use county-level
    Area != 0
  )
# remove column attributes from read_csv()
attr(dat_nmcensus, "spec") <- NULL
# columns to use for analysis (identifiers, Renter, PovChild, and PovFam are excluded)
use_col_ind <- c(4:6, 8:14)
use_col_names <- names(dat_nmcensus)[use_col_ind]
use_col_names
[1] "VacantH" "VacantR" "Owner" "HeatG" "HeatE" "HeatW" "NoPlumb" "NoPhone" "Rent35" "PovAll"
str(dat_nmcensus)
Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 33 obs. of 16 variables:
$ Area : num 1 3 5 6 7 9 11 13 15 17 ...
$ County : chr "Bernalillo" "Catron" "Chaves" "Cibola" ...
$ Year : num 2014 2014 2014 2014 2014 ...
$ VacantH : num 1.7 14.8 2.3 1.6 7.4 3.9 11.4 2.1 0.4 3.2 ...
$ VacantR : num 6.9 7.5 7.8 6.8 20.4 7 8.5 7.4 7.5 8.1 ...
$ Owner : num 62.4 87.2 65.4 74.8 67.6 59.4 82.7 64.7 73.5 75.6 ...
$ Renter : num 37.6 12.8 34.6 25.2 32.4 40.6 17.3 35.3 26.5 24.4 ...
$ HeatG : num 81.7 3.8 50.1 49.1 50.2 47 46.6 70.4 53.5 51.4 ...
$ HeatE : num 13 2.8 42.6 10 15.4 46.4 19.1 15.6 38.7 18.6 ...
$ HeatW : num 2 51.2 1.9 21.7 13.7 1.1 9 1.6 1 10.6 ...
$ NoPlumb : num 0.5 0.9 0.5 5.2 0.1 0.1 0 0.7 0.7 1.2 ...
$ NoPhone : num 3 2.4 2.8 3.5 4.4 3.2 4.5 3.1 2.2 3 ...
$ Rent35 : num 43.8 51.7 36.7 45.1 38 42 0 46.9 31.4 41.9 ...
$ PovAll : num 18.7 22.2 23.4 28.8 20.5 19.2 20.6 27.9 14.1 19.1 ...
$ PovChild: num 24.5 42.8 32.4 37.6 30.6 27.3 32.1 39.4 18.5 27.8 ...
$ PovFam : num 22.6 40.1 28.7 35.9 27.2 26.7 31.6 36 17.3 25.3 ...
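The column selection above skips column 7 (Renter). One plausible reason, not stated in the source, is that Owner and Renter are percentages that sum to 100, so either one determines the other. A minimal base-R sketch, using only the first ten values shown in the output above (the names `occ`, `occ_sum`, and `occ_cor` are hypothetical):

```r
# First ten Owner/Renter values copied from the printed output (not the full data)
occ <- data.frame(
    Owner  = c(62.4, 87.2, 65.4, 74.8, 67.6, 59.4, 82.7, 64.7, 73.5, 75.6)
  , Renter = c(37.6, 12.8, 34.6, 25.2, 32.4, 40.6, 17.3, 35.3, 26.5, 24.4)
  )

# Each pair sums to 100 (up to floating-point rounding)
occ_sum <- occ$Owner + occ$Renter
all(abs(occ_sum - 100) < 1e-8)

# Columns that are exact linear functions of each other have correlation -1 or 1
occ_cor <- cor(occ$Owner, occ$Renter)
occ_cor
```

If this holds for all 33 counties, keeping both columns would add no information to a PCA, so dropping one of them is a reasonable choice.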
Place your code to subset, filter, or transform variables in the code chunk below.
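As one illustration of a transformation relevant to PCA, the sketch below standardizes three columns so that a variable with a large spread (such as HeatG) does not dominate one with a small spread (such as NoPlumb). It uses only the first ten values from the printed output above, and the object names `ex` and `ex_z` are hypothetical. It is an example of the idea, not the assignment's required solution.

```r
# First ten values of three columns, copied from the printed output (not the full data)
ex <- data.frame(
    HeatG   = c(81.7, 3.8, 50.1, 49.1, 50.2, 47, 46.6, 70.4, 53.5, 51.4)
  , NoPlumb = c(0.5, 0.9, 0.5, 5.2, 0.1, 0.1, 0, 0.7, 0.7, 1.2)
  , PovAll  = c(18.7, 22.2, 23.4, 28.8, 20.5, 19.2, 20.6, 27.9, 14.1, 19.1)
  )

# Standard deviations differ a lot on the raw percentage scale
sapply(ex, sd)

# scale() centers each column to mean 0 and rescales to standard deviation 1
ex_z <- as.data.frame(scale(ex))
round(colMeans(ex_z), 10)
sapply(ex_z, sd)
```

Calling `prcomp()` with `scale. = TRUE` applies the same standardization internally, so an explicit `scale()` step is mainly useful for inspecting the standardized values.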