We have looked previously at models with one and two factors.
In this lecture, we generalize to models with arbitrarily many factors.
Suppose that we have a response variable and q factors.
The most complex model will include the qth order interaction term and all lower order interactions and main effects. For example, with q = 3 factors the most complex model is
\[Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\beta\gamma)_{jk} +(\alpha\gamma)_{ik} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}\]
where \((\alpha\beta\gamma)_{ijk}\) is a third order interaction.
All lower order (second order) interactions are also included in the model.
As usual, we will seek the simplest model that fits the data adequately.
However, the models that we consider should satisfy the following constraint: if a model contains a particular interaction, then it must also include all lower order interactions and main effects involving the same factors.
E.g., if a model includes the third order interaction A:B:D, then it should also include the two way interactions A:B, B:D and A:D, as well as the main effects A, B and D.
This is the Principle of Marginality.
We saw it operate earlier with polynomial regression models, where all lower order terms should be present in a pth order model.
If an interaction involving a factor is significant, then we have evidence that the factor is associated with the response even if its main effect is not significant.
In an unbalanced design, the order in which the factors (and their interactions) are listed in the model matters, because it determines what each term is adjusted for when conducting tests.
In an orthogonal design, the ordering of the factors is unimportant.
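The effect of term order can be illustrated with a minimal sketch. The data below are simulated purely for illustration (the factors `A` and `B`, the response `y` and the helper `seq_ss` are hypothetical names, not part of the lecture data). The sketch compares the sequential sum of squares for `A` when it is listed first and when it is listed after `B`, once for balanced data and once after deleting a few rows.

```r
# Hypothetical simulated data: factor A (2 levels) and factor B (3 levels)
# with 4 replicates per cell, so the full data set is balanced.
set.seed(1)
d <- expand.grid(A = factor(1:2), B = factor(1:3), rep = 1:4)
d$y <- as.integer(d$A) + 2 * as.integer(d$B) + rnorm(nrow(d))

# Sequential sums of squares from anova(), named by model term.
seq_ss <- function(formula, data) {
  tab <- anova(lm(formula, data = data))
  setNames(tab[["Sum Sq"]], rownames(tab))
}

# Balanced data: the sum of squares for A does not depend on the order.
balanced_same <- isTRUE(all.equal(seq_ss(y ~ A + B, d)[["A"]],
                                  seq_ss(y ~ B + A, d)[["A"]]))

# Dropping three rows makes the cell counts unequal (unbalanced data),
# and the sum of squares for A now depends on whether B is fitted first.
unb <- d[-(1:3), ]
unbalanced_same <- isTRUE(all.equal(seq_ss(y ~ A + B, unb)[["A"]],
                                    seq_ss(y ~ B + A, unb)[["A"]]))

print(c(balanced = balanced_same, unbalanced = unbalanced_same))
```

In the balanced case the two orderings agree, while in the unbalanced case the sum of squares for `A` changes depending on whether it has been adjusted for `B`.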
On the right hand side of a linear model formula in R, single variable names indicate inclusion of main effects, while colon separated variable names indicate interactions, e.g. A:B.
Variable names ‘multiplied’ together indicate the interaction between the variables that are multiplied, together with all lower order terms.
For example
A*B*C = A + B + C + A:B + B:C + C:A + A:B:C
Brackets can be expanded in a model formula in the expected manner. For example
(A+B)*C = A + B + C + A:C + B:C
Suppose that we have factors A, B, C and D whose effects are represented by Greek characters in the obvious manner.
Consider the model with (mathematical) equation
$$Y_{ijklm} = \mu + \alpha_i + \beta_j + \gamma_k + \delta_l + (\beta\gamma)_{jk}
+(\alpha\gamma)_{ik} + \varepsilon_{ijklm}$$
The R formula for this model is
y ~ (A+B)*C + D
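One way to check expansions like those above is to ask R for the term labels of a formula. This is a small sketch using `terms()` from base R; the helper name `expand_terms` is just an illustrative choice, and no data are needed because only the formula is inspected.

```r
# Term labels that R generates for a formula, in the order R lists them.
expand_terms <- function(f) attr(terms(f), "term.labels")

full_three  <- expand_terms(~ A * B * C)
bracketed   <- expand_terms(~ (A + B) * C)
four_factor <- expand_terms(y ~ (A + B) * C + D)

print(full_three)   # "A" "B" "C" "A:B" "A:C" "B:C" "A:B:C"
print(bracketed)    # "A" "B" "C" "A:C" "B:C"
print(four_factor)  # "A" "B" "C" "D" "A:C" "B:C"
```

Note that R writes the interaction between C and A as `A:C`; the order of the names within an interaction label does not change the model. The last formula contains no `A:B` term, in agreement with the mathematical equation for that model.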
An experiment was done to investigate the effect of various factors on swimming speed. The response is the time taken to swim one 25 metre lap.
The factors are indicator variables (1=yes, 0=no) for wearing a shirt, goggles, and flippers.
The design was a complete and balanced \(2^3\) factorial (i.e. 3 factors each at 2 levels), with three replications at each treatment.
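The structure of such a design can be sketched in R with `expand.grid()`. The object names `treatments` and `design` are illustrative only; the sketch simply enumerates the treatment combinations and replicates them, and does not reproduce the run order of the actual experiment.

```r
# The 2^3 = 8 treatment combinations of three two-level (0/1) factors.
treatments <- expand.grid(Flippers = c(0, 1),
                          Goggles  = c(0, 1),
                          Shirt    = c(0, 1))

# Three replicate swims per treatment: 8 x 3 = 24 runs in total.
design <- treatments[rep(seq_len(nrow(treatments)), each = 3), ]

nrow(treatments)  # 8
nrow(design)      # 24
table(design)     # every cell count is 3, i.e. the design is balanced
```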
Reading in the Data
Swim <- read.csv(file = "swim.csv", header = TRUE)
Swim
Time Shirt Goggles Flippers
1 16.55 1 1 1
2 17.22 1 1 1
3 17.70 1 1 1
4 21.53 1 1 0
5 22.49 1 1 0
6 22.50 1 1 0
7 17.77 1 0 1
8 17.43 1 0 1
9 18.70 1 0 1
10 23.78 1 0 0
11 24.29 1 0 0
12 24.89 1 0 0
13 16.14 0 1 1
14 16.39 0 1 1
15 16.40 0 1 1
16 19.97 0 1 0
17 19.95 0 1 0
18 20.32 0 1 0
19 16.85 0 0 1
20 17.80 0 0 1
21 16.81 0 0 1
22 22.63 0 0 0
23 22.81 0 0 0
24 22.31 0 0 0
Swim.lm <- lm(Time ~ Shirt * Goggles * Flippers, data = Swim)
anova(Swim.lm)
Analysis of Variance Table
Response: Time
Df Sum Sq Mean Sq F value Pr(>F)
Shirt 1 11.303 11.303 49.4595 2.829e-06 ***
Goggles 1 14.900 14.900 65.1998 4.915e-07 ***
Flippers 1 158.672 158.672 694.3430 1.314e-14 ***
Shirt:Goggles 1 0.057 0.057 0.2496 0.624161
Shirt:Flippers 1 1.766 1.766 7.7272 0.013388 *
Goggles:Flippers 1 3.368 3.368 14.7361 0.001449 **
Shirt:Goggles:Flippers 1 0.039 0.039 0.1716 0.684232
Residuals 16 3.656 0.229
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(Swim.lm)
Call:
lm(formula = Time ~ Shirt * Goggles * Flippers, data = Swim)
Residuals:
Min 1Q Median 3Q Max
-0.64333 -0.28083 0.00833 0.25917 0.73333
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.5833 0.2760 81.825 < 2e-16 ***
Shirt 1.7367 0.3903 4.449 0.000404 ***
Goggles -2.5033 0.3903 -6.414 8.58e-06 ***
Flippers -5.4300 0.3903 -13.912 2.35e-10 ***
Shirt:Goggles 0.3567 0.5520 0.646 0.527345
Shirt:Flippers -0.9233 0.5520 -1.673 0.113817
Goggles:Flippers 1.6600 0.5520 3.007 0.008351 **
Shirt:Goggles:Flippers -0.3233 0.7806 -0.414 0.684232
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.478 on 16 degrees of freedom
Multiple R-squared: 0.9811, Adjusted R-squared: 0.9729
F-statistic: 118.8 on 7 and 16 DF, p-value: 1.369e-12
par(mfrow = c(2, 2))
plot(Swim.lm)
hat values (leverages) are all = 0.3333333 and there are no factor predictors; no plot no. 5
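This message is to be expected for these data. As a brief sketch of the reasoning: the leverages of a linear model with p coefficients sum to p, and for a balanced design with the full (saturated) model all n leverages are equal, so with p = 8 coefficients and n = 24 observations

\[h_{ii} = \frac{p}{n} = \frac{8}{24} = \frac{1}{3}.\]

Because Shirt, Goggles and Flippers are stored as numeric 0/1 variables rather than as factors, R has neither varying leverages nor factor levels to plot the residuals against, so the fifth diagnostic plot is omitted.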
An experiment was done to investigate the effect of various factors on pizza delivery time (the response, in minutes).
The factors are:
Crust | Thick crust? No=0, Yes=1
Coke | Coke ordered? No=0, Yes=1
Bread | Garlic bread? No=0, Yes=1
Data source: Mackisack, M. S. (1994). What is the use of experiments conducted by statistics students? Journal of Statistics Education, 2.
Create the appropriate model for this experiment, and use the following commands to answer the questions below.
anova(Pizza.lm)
coef(Pizza.lm)
Does the delivery time depend on whether or not coke is ordered?
What is the estimated difference in mean delivery time between:
Order A: thick crust, coke but not garlic bread.
Order B: coke, but no thick crust or garlic bread.
Comments
Because the design of the swimming experiment is orthogonal, the P-values in the ANOVA table do not depend on the order in which the terms are listed, so each can be interpreted without adjusting for the other factors.
The interactions Goggles:Flippers and Shirt:Flippers are statistically significant. This indicates that the effect of wearing a shirt or goggles on swimming speed depends on whether or not the swimmer is wearing flippers. The other interactions (Shirt:Goggles and Shirt:Goggles:Flippers) are not statistically significant.
The main effects estimates indicate that Goggles and particularly Flippers improve swimming speed (reduce Time), while Shirt slows the swimmer.
The (positive) coefficient for the Goggles:Flippers term indicates that the combination of goggles and flippers does not reduce swimming time by quite as much as the main effects alone would suggest.
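The comment about the Goggles:Flippers coefficient can be checked numerically. The sketch below rebuilds the data frame from the listing shown earlier (so that it runs without the file swim.csv), refits the model, and computes the estimated effect of goggles for a swimmer without a shirt, first without flippers and then with flippers. The names `b`, `goggles_no_flippers` and `goggles_with_flippers` are illustrative.

```r
# Swimming data, typed in from the listing above (rows 1 to 24).
Swim <- data.frame(
  Time = c(16.55, 17.22, 17.70, 21.53, 22.49, 22.50,
           17.77, 17.43, 18.70, 23.78, 24.29, 24.89,
           16.14, 16.39, 16.40, 19.97, 19.95, 20.32,
           16.85, 17.80, 16.81, 22.63, 22.81, 22.31),
  Shirt    = rep(c(1, 0), each = 12),
  Goggles  = rep(rep(c(1, 0), each = 6), times = 2),
  Flippers = rep(rep(c(1, 0), each = 3), times = 4)
)

Swim.lm <- lm(Time ~ Shirt * Goggles * Flippers, data = Swim)
b <- coef(Swim.lm)

# Effect of goggles on Time for a swimmer with no shirt and no flippers:
# just the Goggles main effect coefficient.
goggles_no_flippers <- b[["Goggles"]]

# Effect of goggles for a swimmer with no shirt but wearing flippers:
# the main effect plus the Goggles:Flippers interaction coefficient.
goggles_with_flippers <- b[["Goggles"]] + b[["Goggles:Flippers"]]

round(c(no_flippers = goggles_no_flippers,
        with_flippers = goggles_with_flippers), 2)
# no_flippers = -2.50, with_flippers = -0.84
```

So for a swimmer without a shirt, goggles are estimated to save about 2.5 units of Time without flippers but only about 0.8 with flippers, which is the sense in which the combined benefit is smaller than the main effects alone would suggest.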