# ADA1: Class 10, Logarithmic Transformation

Advanced Data Analysis 1, Stat 427/527, Fall 2023, Prof. Erik Erhardt, UNM

Published: September 17, 2023

# Rubric

Answer the questions in this document, compile to html, print to pdf, and submit to UNM Canvas.

1. (2 p) Read and plot data, with $$x$$ = Avg_Mercury vs $$y$$ = Alkalinity.

2. (1 p) Describe the relationship you see.

3. (4 p) Determine an appropriate transformation of the $$x$$-variable, $$y$$-variable, or both in order to have a straight-line relationship.

• I recommend creating three more plots: $$(\log(x), y)$$, $$(x,\log(y))$$, and $$(\log(x), \log(y))$$. Choose the one that, in your view, is best described by a straight line.

• Describe in a sentence what makes this one the best choice.

4. (3 p) Interpret the slope on the transformed scale. For example, “For each unit increase in [$$x$$-variable], …”
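As a rough illustration of what "slope on the transformed scale" can mean, the sketch below fits a straight line after a base-10 log transformation of both variables. The log-log choice is an assumption made only for this example, not a statement about which transformation is best for item 3. The small data frame `dat_demo` is a hypothetical name holding just the first 10 values of each variable as printed by `str()` in Section 1.

```r
# Illustrative subset only: the first 10 values of each variable as printed by str().
# dat_demo is a hypothetical name; in the assignment, use the full data frame.
dat_demo <- data.frame(
    Avg_Mercury = c(1.23, 1.33, 0.04, 0.44, 1.20, 0.27, 0.48, 0.19, 0.83, 0.81)
  , Alkalinity  = c(5.9, 3.5, 116, 39.4, 2.5, 19.6, 5.2, 71.4, 26.4, 4.8)
  )

# Assumed for illustration: log10 applied to both x and y.
fit <- lm(log10(Alkalinity) ~ log10(Avg_Mercury), data = dat_demo)
b1  <- unname(coef(fit)[2])

# On the transformed scale, a one-unit increase in log10(x) is a tenfold increase in x.
# It is associated with a change of b1 in log10(y),
# which corresponds to multiplying y by 10^b1 on the original scale.
b1
10^b1
```

If only one variable is transformed, the same reasoning applies to that variable alone, and the other is read in its original units.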

## 1. Read and plot data

Save the datafile from the website to your computer. Read the data.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# the "skip = 25" ignores the first 25 lines of the text file (where I put descriptive text)
#   and starts reading at line 26.
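dat_fish <-
  read.csv(
    # Assumption: read.csv() is used here because str() below reports a 'data.frame'.
    # The file name is missing from this page, so the path below is only a placeholder.
    "PATH/TO/DATAFILE.csv"
  , skip = 25
  )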
str(dat_fish)
'data.frame':   53 obs. of  12 variables:
 $ ID                    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Lake                  : chr  "Alligator" "Annie" "Apopka" "BlueCypress" ...
 $ Alkalinity            : num  5.9 3.5 116 39.4 2.5 19.6 5.2 71.4 26.4 4.8 ...
 $ pH                    : num  6.1 5.1 9.1 6.9 4.6 7.3 5.4 8.1 5.8 6.4 ...
 $ Calcium               : num  3 1.9 44.1 16.4 2.9 4.5 2.8 55.2 9.2 4.6 ...
 $ Chlorophyll           : num  0.7 3.2 128.3 3.5 1.8 ...
 $ Avg_Mercury           : num  1.23 1.33 0.04 0.44 1.2 0.27 0.48 0.19 0.83 0.81 ...
 $ No.samples            : int  5 7 6 12 12 14 10 12 24 12 ...
 $ min                   : num  0.85 0.92 0.04 0.13 0.69 0.04 0.3 0.08 0.26 0.41 ...
 $ max                   : num  1.43 1.9 0.06 0.84 1.5 0.48 0.72 0.38 1.4 1.47 ...
 $ X3_yr_Standard_Mercury: num  1.53 1.33 0.04 0.44 1.33 0.25 0.45 0.16 0.72 0.81 ...
 $ age_data              : int  1 0 0 0 1 1 1 1 1 1 ...

Plot $$x$$ = Avg_Mercury vs $$y$$ = Alkalinity on their natural (original) scales.
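A minimal sketch of such a scatterplot is shown below. To keep it self-contained, it uses a hypothetical data frame `dat_demo` built from only the first 10 values of each variable as printed by `str()` above. For the assignment, the full `dat_fish` data frame would take its place.

```r
library(ggplot2)

# Illustrative subset only: the first 10 values of each variable as printed by str().
# dat_demo is a hypothetical name; replace it with dat_fish for the real plot.
dat_demo <- data.frame(
    Avg_Mercury = c(1.23, 1.33, 0.04, 0.44, 1.20, 0.27, 0.48, 0.19, 0.83, 0.81)
  , Alkalinity  = c(5.9, 3.5, 116, 39.4, 2.5, 19.6, 5.2, 71.4, 26.4, 4.8)
  )

# Scatterplot of y = Alkalinity against x = Avg_Mercury on the original scales
p <- ggplot(dat_demo, aes(x = Avg_Mercury, y = Alkalinity))
p <- p + geom_point()
p <- p + labs(title = "Alkalinity vs Avg_Mercury, original scales")
print(p)
```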

## 3. Transform and plot

Note

There are two ways to plot the transformed data in ggplot().

Do either of these, but not both.

1. Transform the variables, then plot the transformed variables on ordinary axes.
2. Plot the original variables on rescaled (log) axes.

Warning

Do not plot transformed variables on scaled axes, since that’s like transforming twice: $$\log(\log(x))$$.

# With ggplot() consider using these "scale_?_log10()" commands
#   to plot the original variables with scaled axes.
#   Compare to plotting the transformed variables directly.
p <- p + scale_x_log10()
p <- p + scale_y_log10()
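
The two approaches from the note can be sketched side by side, here with log10 applied to both variables purely for illustration. The names `dat_demo`, `p1`, `p2`, `pos1`, and `pos2` are hypothetical, and the data are only the first 10 values of each variable as printed by `str()` in Section 1.

```r
library(ggplot2)

# Illustrative subset only: the first 10 values of each variable as printed by str().
dat_demo <- data.frame(
    Avg_Mercury = c(1.23, 1.33, 0.04, 0.44, 1.20, 0.27, 0.48, 0.19, 0.83, 0.81)
  , Alkalinity  = c(5.9, 3.5, 116, 39.4, 2.5, 19.6, 5.2, 71.4, 26.4, 4.8)
  )

# Way 1: transform the variables, then plot the transformed variables on ordinary axes.
dat_demo$log10_Avg_Mercury <- log10(dat_demo$Avg_Mercury)
dat_demo$log10_Alkalinity  <- log10(dat_demo$Alkalinity)
p1 <- ggplot(dat_demo, aes(x = log10_Avg_Mercury, y = log10_Alkalinity))
p1 <- p1 + geom_point()

# Way 2: plot the original variables and let ggplot rescale the axes.
p2 <- ggplot(dat_demo, aes(x = Avg_Mercury, y = Alkalinity))
p2 <- p2 + geom_point()
p2 <- p2 + scale_x_log10()
p2 <- p2 + scale_y_log10()

# Compare the point positions that each plot actually draws.
pos1 <- layer_data(p1, 1)[, c("x", "y")]
pos2 <- layer_data(p2, 1)[, c("x", "y")]
all.equal(as.numeric(pos1$x), as.numeric(pos2$x))
all.equal(as.numeric(pos1$y), as.numeric(pos2$y))
```

The point pattern is the same either way. The difference is in the axis labels: Way 1 labels the axes in log10 units, while Way 2 labels them in the original units. Adding `scale_x_log10()` to `p1` would be the double transformation the warning describes, and because several of the log10 values are negative, it would also produce missing values.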