1 Kangaroos skull measurements: crest width

What effect does sex and species have on the crest width of a kangaroo skull?

The data to be analyzed here are selected skull measurements on 148 kangaroos of known sex and species. There are 11 columns of data, corresponding to the following features. The measurements are in meters/10000 (mm/10).

column Variable name Description
1 * sex sex (1=M, 2=F)
2 * species species (0=M. giganteus, 1=M.f. melanops, 2=M.f. fuliginosus)
3 pow post orbit width
4 rw rostal width
5 sopd supra-occipital - paroccipital depth
6 * cw crest width
7 ifl incisive foramina length
8 ml mandible length
9 mw mandible width
10 md mandible depth
11 arh ascending ramus height

Some of the observations in the data set are missing (not available). These are represented by a period ., which in the read.table() function is specified by the na.strings = "." option.

fn.data <- "http://statacumen.com/teach/ADA2/worksheet/ADA2_WS_09_kang.txt"
kang <- read.table(fn.data, header=TRUE, na.strings = ".")

# subset only our columns of interest
kang <- subset(kang, select = c(sex, species, cw))

# remove observations with missing values
n.start <- nrow(kang)
kang <- na.omit(kang)
n.keep <- nrow(kang)
n.drop <- n.start - n.keep
cat("Removed", n.start, "-", n.keep, "=", n.drop, "observations with missing values.")
Removed 148 - 148 = 0 observations with missing values.
# make dose a factor variable and label the levels
kang$sex     <- factor(kang$sex    , labels = c("M","F"))
kang$species <- factor(kang$species, labels = c("Mg", "Mfm", "Mff"))

# The first few observations
head(kang)
  sex species  cw
1   M      Mg 153
2   M      Mg 141
3   M      Mg 144
4   M      Mg 116
5   M      Mg 120
6   M      Mg 188

1.1 (1 p) Interpret plots of the data, distributional centers and shapes

The side-by-side boxplots of the data compare the crest widths across the 6 combinations of sex and species. Comment on the distributional shapes and compare the typical crest widths across groups.

# Calculate the cell means for each (sex, species) combination
library(plyr)
mean(kang[, "cw"])
[1] 123.4865
kang.mean    <- ddply(kang, .(), summarise, m = mean(cw))
kang.mean
   .id        m
1 <NA> 123.4865
kang.mean.d  <- ddply(kang, .(sex), summarise, m = mean(cw))
kang.mean.d
  sex        m
1   M 111.0959
2   F 135.5467
kang.mean.i  <- ddply(kang, .(species), summarise, m = mean(cw))
kang.mean.i
  species       m
1      Mg 110.120
2     Mfm 115.625
3     Mff 144.400
kang.mean.di <- ddply(kang, .(sex,species), summarise, m = mean(cw))
kang.mean.di
  sex species        m
1   M      Mg 103.0800
2   M     Mfm 101.6522
3   M     Mff 127.8000
4   F      Mg 117.1600
5   F     Mfm 128.4800
6   F     Mff 161.0000
# Interaction plots, ggplot
library(ggplot2)
p1 <- ggplot(kang, aes(x = sex, y = cw, colour = species))
p1 <- p1 + geom_hline(aes(yintercept = 0), colour = "black"
                  , linetype = "solid", size = 0.2, alpha = 0.3)
p1 <- p1 + geom_boxplot(alpha = 0.5, outlier.size=0.1)
p1 <- p1 + geom_point(data = kang.mean.di, aes(y = m), size = 4)
p1 <- p1 + geom_line(data = kang.mean.di, aes(y = m, group = species), size = 1.5)
p1 <- p1 + labs(title = "Kangaroo interaction plot, species by sex")
#print(p1)

p2 <- ggplot(kang, aes(x = species, y = cw, colour = sex))
p2 <- p2 + geom_hline(aes(yintercept = 0), colour = "black"
                  , linetype = "solid", size = 0.2, alpha = 0.3)
p2 <- p2 + geom_boxplot(alpha = 0.5, outlier.size=0.1)
p2 <- p2 + geom_point(data = kang.mean.di, aes(y = m), size = 4)
p2 <- p2 + geom_line(data = kang.mean.di, aes(y = m, group = sex), size = 1.5)
p2 <- p2 + labs(title = "Kangaroo interaction plot, sex by species")
#print(p2)

library(gridExtra)
grid.arrange(grobs = list(p1, p2), nrow=1, top="Kangaroo crestwidth plots")

1.1.1 Solution

[answer]

1.2 (1 p) Do the plots above suggest there is an interaction?

Do the lines for each group seem to be very different from parallel?

1.2.1 Solution

[answer]

1.3 Fit the two-way interaction model

Here it is.

lm.cw.x.s.xs <- lm(cw ~ sex*species, data = kang
                , contrasts=list(sex = contr.sum, species = contr.sum))

1.4 (1 p) Check model assumptions for full model

Recall that we assume that the full model is correct before we perform model reduction by backward selection.

 48  72  74 
  1   2 148 

1.4.1 Solution

[answer]

1.5 (1 p) ANOVA table, test for interaction

Provide your conclusion for the test for interaction.

library(car)
Anova(lm.cw.x.s.xs, type=3)
Anova Table (Type III tests)

Response: cw
             Sum Sq  Df  F value    Pr(>F)    
(Intercept) 2244042   1 1643.795 < 2.2e-16 ***
sex           22556   1   16.523 7.927e-05 ***
species       34195   2   12.524 9.788e-06 ***
sex:species    2367   2    0.867    0.4224    
Residuals    193853 142                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

1.5.1 Solution

[answer]

1.6 (4 p) Reduce to final model, test assumptions

If the model can be simplified (because interaction is not significant), then refit the model with only the main effects. Test whether the main effects are significant, reduce further if sensible. Test model assumptions of your final model.

1.6.1 Solution

[answer]

1.7 (2 p) Summarize the differences

Summarize differences, if any, in sexes and species using relevant multiple comparisons. Give clear interpretations of any significant effects.

This code is here to get you started. Determine which comparisons you plan to make and modify the appropriate code.

library(lsmeans)
# fill in your "lm.object", and only use the lines below that apply to your model
lsmeans(lm.object, list(pairwise ~ sex           ), adjust = "tukey")
lsmeans(lm.object, list(pairwise ~ species       ), adjust = "tukey")
lsmeans(lm.object, list(pairwise ~ sex | species ), adjust = "tukey")
lsmeans(lm.object, list(pairwise ~ species | sex ), adjust = "tukey")

1.7.1 Solution

[answer]