Lecture 36 Appendix: Models with Many Factors
We have looked previously at models with one and two factors.
In this lecture, we generalize to models with arbitrarily many factors.
36.1 Models with Many Factors and Their Interactions
Suppose that we have a response variable and q factors.
The most complex model includes the qth order interaction term together with all lower order interactions and main effects. For example, with q = 3 factors the full model is
\[Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\beta\gamma)_{jk} +(\alpha\gamma)_{ik} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}\]
where \((\alpha\beta\gamma)_{ijk}\) is a third order interaction.
All lower order (second order) interactions are also included in the model.
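As a rough illustration of how quickly the full model grows, the following R sketch counts the terms of each order for q = 4 factors. The factor names A to D are placeholders chosen for this example, not variables from these notes.

```r
# Count the terms of each order in the full model for q factors.
# The factor names (A, B, C, D) are hypothetical placeholders.
q <- 4
full <- as.formula(paste("~", paste(LETTERS[1:q], collapse = " * ")))  # ~ A * B * C * D
ord <- attr(terms(full), "order")   # interaction order of each term
n_by_order <- as.vector(table(ord))
n_by_order       # number of terms of order 1, 2, ..., q
choose(q, 1:q)   # the same counts from binomial coefficients
```

The full model therefore has \(2^q - 1\) terms in addition to the intercept.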
36.2 The Principle of Marginality (a.k.a. Hierarchy Principle)
As usual, we will seek the simplest model that fits the data adequately.
However, the models that we consider should satisfy the following constraint: if a model contains a particular interaction, then it must also include all lower order interactions and main effects involving the factors in that interaction.
E.g., if a model includes the third order interaction A:B:C, then it should also include the two way interactions A:B, B:C, and A:C, as well as the main effects A, B, and C.
This is the Principle of Marginality (or Hierarchy Principle).
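One way to check whether a candidate formula respects the principle is sketched below. The helper `is_marginal` is a hypothetical function written for this illustration (it is not part of R), and it assumes plain variable names joined by colons.

```r
# Hypothetical helper: TRUE if every interaction in the formula is
# accompanied by all of its lower order terms.
is_marginal <- function(formula) {
  labs <- attr(terms(formula), "term.labels")
  sets <- lapply(strsplit(labs, ":", fixed = TRUE), sort)
  keys <- vapply(sets, paste, character(1), collapse = ":")
  for (s in sets) {
    if (length(s) < 2) next                        # main effects need nothing below them
    for (k in seq_len(length(s) - 1)) {
      subs <- combn(s, k, paste, collapse = ":")   # all sub-terms of order k
      if (!all(subs %in% keys)) return(FALSE)
    }
  }
  TRUE
}

is_marginal(y ~ A * B * C)              # full model
is_marginal(y ~ A + B + C + A:B:C)      # two way interactions missing
```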
36.2.1 Significance Testing
If an interaction involving a factor is significant, then we have evidence that the factor is associated with the response even if its main effect is not significant.
In an unbalanced design, the order in which the factors (and their interactions) are listed is important, because it determines what each test is adjusted for.
In an orthogonal design, the ordering of the factors is unimportant.
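The effect of ordering can be seen in a small simulation. The sketch below uses made-up data (not from these notes): it compares the sum of squares for a factor A when it is listed before or after a second factor B, first in a balanced design and then after deleting two observations from one cell.

```r
set.seed(1)  # simulated data, for illustration only
bal <- expand.grid(A = factor(0:1), B = factor(0:1), rep = 1:3)
bal$y <- rnorm(nrow(bal)) + (bal$A == "1") + 2 * (bal$B == "1")
unbal <- bal[-c(1, 5), ]   # remove two observations from the A=0, B=0 cell

# Sequential sum of squares for A from the ANOVA table
ss_A <- function(formula, data) {
  tab <- anova(lm(formula, data = data))
  tab["A", "Sum Sq"]
}

c(ss_A(y ~ A + B, bal),   ss_A(y ~ B + A, bal))    # balanced: identical
c(ss_A(y ~ A + B, unbal), ss_A(y ~ B + A, unbal))  # unbalanced: differ
```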
36.3 Model Formulae in R
On the right hand side of a linear model formula in R, single variable names indicate inclusion of main effects, while colon separated variable names indicate interactions, e.g. A:B.
Variable names ‘multiplied’ together with * indicate the interaction between those variables together with all lower order terms.
For example
A*B*C = A + B + C + A:B + B:C + C:A + A:B:C
Brackets can be expanded in a model formula in the expected manner. For example
(A+B)*C = A + B + C + A:C + B:C
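These expansions can be checked in R without any data, since `terms()` expands a formula symbolically. A minimal sketch, where `term_labels` is a small helper defined here for convenience:

```r
# Expand a model formula into its individual terms (no data needed).
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(~ A * B * C)
term_labels(~ (A + B) * C)
```

R lists the terms by order and writes C:A as A:C, but the set of terms is the same as in the expansions above.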
36.3.1 R Formula from Model Equation
Suppose that we have factors A, B, C and D whose effects are represented by Greek characters in the obvious manner.
Consider the model with (mathematical) equation
\[Y_{ijklm} = \mu + \alpha_i + \beta_j + \gamma_k + \delta_l + (\beta\gamma)_{jk} +(\alpha\gamma)_{ik} + \varepsilon_{ijklm}\]
The R formula for this model is
y ~ (A+B)*C + D
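As a quick check, the compact formula can be compared with the version that writes out each term of the model equation. This is only a sketch on the formula objects themselves; no data are involved.

```r
# Terms implied by the compact formula
compact <- attr(terms(y ~ (A + B) * C + D), "term.labels")

# Terms written out one by one from the model equation
written <- attr(terms(y ~ A + B + C + D + B:C + A:C), "term.labels")

compact
setequal(compact, written)   # TRUE when both formulae describe the same model
```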
36.4 Swimming Data Example
An experiment was done to investigate the effect of various factors on swimming speed. The response is the time to swim one 25 metre lap.
Factors are indicator variables (1=yes, 0=no) for wearing shirt, goggles, and flippers.
The design was a complete and balanced \(2^3\) factorial (i.e. 3 factors each at 2 levels), with three replications at each treatment.
36.4.1 R Code for Example
Reading in the Data
Time Shirt Goggles Flippers
1 16.55 1 1 1
2 17.22 1 1 1
3 17.70 1 1 1
4 21.53 1 1 0
5 22.49 1 1 0
6 22.50 1 1 0
7 17.77 1 0 1
8 17.43 1 0 1
9 18.70 1 0 1
10 23.78 1 0 0
11 24.29 1 0 0
12 24.89 1 0 0
13 16.14 0 1 1
14 16.39 0 1 1
15 16.40 0 1 1
16 19.97 0 1 0
17 19.95 0 1 0
18 20.32 0 1 0
19 16.85 0 0 1
20 17.80 0 0 1
21 16.81 0 0 1
22 22.63 0 0 0
23 22.81 0 0 0
24 22.31 0 0 0
36.4.2 ANOVA Table and model summary
Analysis of Variance Table
Response: Time
Df Sum Sq Mean Sq F value Pr(>F)
Shirt 1 11.303 11.303 49.4595 2.829e-06 ***
Goggles 1 14.900 14.900 65.1998 4.915e-07 ***
Flippers 1 158.672 158.672 694.3430 1.314e-14 ***
Shirt:Goggles 1 0.057 0.057 0.2496 0.624161
Shirt:Flippers 1 1.766 1.766 7.7272 0.013388 *
Goggles:Flippers 1 3.368 3.368 14.7361 0.001449 **
Shirt:Goggles:Flippers 1 0.039 0.039 0.1716 0.684232
Residuals 16 3.656 0.229
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = Time ~ Shirt * Goggles * Flippers, data = Swim)
Residuals:
Min 1Q Median 3Q Max
-0.64333 -0.28083 0.00833 0.25917 0.73333
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.5833 0.2760 81.825 < 2e-16 ***
Shirt 1.7367 0.3903 4.449 0.000404 ***
Goggles -2.5033 0.3903 -6.414 8.58e-06 ***
Flippers -5.4300 0.3903 -13.912 2.35e-10 ***
Shirt:Goggles 0.3567 0.5520 0.646 0.527345
Shirt:Flippers -0.9233 0.5520 -1.673 0.113817
Goggles:Flippers 1.6600 0.5520 3.007 0.008351 **
Shirt:Goggles:Flippers -0.3233 0.7806 -0.414 0.684232
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.478 on 16 degrees of freedom
Multiple R-squared: 0.9811, Adjusted R-squared: 0.9729
F-statistic: 118.8 on 7 and 16 DF, p-value: 1.369e-12
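The output above could be reproduced along the following lines. The code that originally read in the data is not shown in these notes, so this sketch simply types in the 24 observations from the listing; the data frame name `Swim` and the model formula are taken from the `Call:` line.

```r
# Enter the data by hand (an assumption: the original read them from a file).
Swim <- data.frame(
  Time = c(16.55, 17.22, 17.70, 21.53, 22.49, 22.50,
           17.77, 17.43, 18.70, 23.78, 24.29, 24.89,
           16.14, 16.39, 16.40, 19.97, 19.95, 20.32,
           16.85, 17.80, 16.81, 22.63, 22.81, 22.31),
  Shirt    = rep(c(1, 0), each = 12),
  Goggles  = rep(rep(c(1, 0), each = 6), times = 2),
  Flippers = rep(rep(c(1, 0), each = 3), times = 4)
)

# Three replications in each of the 8 treatment combinations
table(Swim$Shirt, Swim$Goggles, Swim$Flippers)

fit <- lm(Time ~ Shirt * Goggles * Flippers, data = Swim)
anova(fit)     # ANOVA table
summary(fit)   # coefficient estimates and tests
```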
36.5 Where’s My Pizza?
Students did an experiment to investigate the effect of various factors on pizza delivery time (response, in minutes).
Factors are:
Crust: Thick crust? No=0, Yes=1
Coke: Coke ordered? No=0, Yes=1
Bread: Garlic bread? No=0, Yes=1
Design was complete and balanced \(2^3\) factorial (i.e. 3 factors each at 2 levels) with two replications at each treatment.
Data source: Mackisack, M. S. (1994). What is the use of experiments conducted by statistics students? Journal of Statistics Education, 2.
36.4.3 Comments
Because the experimental design is orthogonal, the P-values in the ANOVA table do not depend on the order in which the terms are listed, so each can be interpreted without worrying about adjustment for the other factors.
The interactions Goggles:Flippers and Shirt:Flippers are statistically significant. This indicates that the effect of wearing a shirt or goggles on swimming speed depends on whether or not the swimmer is wearing flippers.
All other interactions are not so important.
The main effects estimates indicate that Goggles and particularly Flippers improve swimming speed (reduce Time), while Shirt slows the swimmer.
The (positive) coefficient for the Goggles:Flippers term indicates that the combination of goggles and flippers does not reduce swimming time by quite as much as the main effects alone would suggest.
We should check our conclusion by dropping one interaction term at a time, until a simple model is found. For brevity we go to the end of the process.
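The comment about the Goggles:Flippers coefficient can be made concrete with a little arithmetic on the estimates reported in the model summary. Because the factors are coded 0/1 in the full model, the figures in this sketch refer to a swimmer not wearing a shirt (Shirt = 0); the variable names are chosen here for illustration.

```r
# Coefficient estimates copied from the model summary
b_goggles  <- -2.5033
b_flippers <- -5.4300
b_gf       <-  1.6600   # Goggles:Flippers interaction

additive <- b_goggles + b_flippers   # main effects alone: -7.9333 seconds
combined <- additive + b_gf          # with the interaction: -6.2733 seconds
c(additive = additive, combined = combined)
```

So, relative to wearing neither, goggles and flippers together reduce the estimated time by about 6.27 seconds rather than the 7.93 seconds that the main effects alone would suggest.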