Download the R markdown file for this lecture.

We have looked previously at models with one and two factors.

In this lecture, we generalize to models with arbitrarily many factors.

Models with Many Factors and Their Interactions

Suppose that we have a response variable and q factors.

The most complex model includes the qth order interaction term together with all lower order interactions and main effects. For example, with q = 3 factors the full model is

\[Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\beta\gamma)_{jk} +(\alpha\gamma)_{ik} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}\]

where \((\alpha\beta\gamma)_{ijk}\) is a third order interaction.

All lower order (second order) interactions are also included in the model.
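As a rough illustration of how quickly the full model grows, the sketch below builds the full factorial formula for q factors and counts its terms by order. The factor names F1, ..., Fq and the response name y are hypothetical placeholders, not variables from this lecture.

```r
# Full factorial formula for q hypothetical factors F1, ..., Fq
q <- 4
factor_names <- paste0("F", seq_len(q))
full_formula <- as.formula(paste("y ~", paste(factor_names, collapse = " * ")))

# terms() expands the formula without needing any data
full_terms <- terms(full_formula)
term_labels <- attr(full_terms, "term.labels")
term_order <- attr(full_terms, "order")  # 1 = main effect, 2 = two way, ...

# Number of terms of each order: choose(q, 1), choose(q, 2), ..., choose(q, q)
print(table(term_order))

# Total number of terms (excluding the intercept) is 2^q - 1
print(length(term_labels))
```

With q factors there are choose(q, m) interactions of order m, so the full model has 2^q - 1 terms in addition to the intercept.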

The Principle of Marginality

As usual, we will seek the simplest model that fits the data adequately.

However, the models that we consider should satisfy the following constraint: if a model contains a particular interaction, then it must also include all lower order interactions and main effects involving the same factors.

E.g., if a model includes the third order interaction A:B:D, then it should also include the two way interactions A:B, B:D, and A:D, and the main effects A, B, and D.

This is the Principle of Marginality.

We saw it operate earlier with polynomial regression models, where all lower order terms should be present in a pth order model.
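The principle can be checked mechanically. The following is a minimal sketch, assuming all terms are built from plain variable names joined by colons; the helper name respects_marginality is hypothetical and is not part of R.

```r
# Hypothetical helper: TRUE if every interaction in the formula is accompanied
# by all lower order terms formed from the same variables.
respects_marginality <- function(formula) {
  labels <- attr(terms(formula), "term.labels")
  # Represent each term by the sorted set of variables it involves
  term_vars <- lapply(strsplit(labels, ":", fixed = TRUE), sort)
  keys <- vapply(term_vars, paste, character(1), collapse = ":")
  for (vars in term_vars) {
    if (length(vars) < 2) next  # main effects have nothing below them
    for (m in seq_len(length(vars) - 1)) {
      # All subsets of size m of the variables in this interaction
      subsets <- combn(vars, m, FUN = paste, collapse = ":")
      if (!all(subsets %in% keys)) return(FALSE)
    }
  }
  TRUE
}

respects_marginality(y ~ A * B * D)                  # full model
respects_marginality(y ~ A + B + D + A:B + A:B:D)    # A:D and B:D are missing
```

The first call should return TRUE and the second FALSE, since the second formula contains A:B:D without A:D and B:D.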

Significance Testing

If an interaction involving a factor is significant, then we have evidence that the factor is associated with the response even if its main effect is not significant.

In an unbalanced design, the order in which the factors (and their interactions) are listed is important, because it determines what each test is adjusted for.

In an orthogonal design, the ordering of the factors is unimportant.
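To see the effect of ordering, the sketch below compares the sum of squares for a factor A when it is listed before and after a second factor B. Both data sets are small made-up examples (not from this lecture): one with unequal cell counts, one balanced.

```r
# Made-up unbalanced layout: cell counts are 3, 1, 1, 2
unbalanced <- data.frame(
  A = factor(c("a1", "a1", "a1", "a1", "a2", "a2", "a2")),
  B = factor(c("b1", "b1", "b1", "b2", "b1", "b2", "b2")),
  y = c(10, 11, 12, 15, 13, 18, 19)
)

# Made-up balanced layout: two observations in every cell
balanced <- data.frame(
  A = factor(rep(c("a1", "a2"), each = 4)),
  B = factor(rep(c("b1", "b2"), times = 4)),
  y = c(10, 15, 11, 16, 13, 18, 14, 19)
)

# Sum of squares attributed to A in the ANOVA table for a given formula
ss_for_A <- function(data, formula) {
  anova(lm(formula, data = data))["A", "Sum Sq"]
}

# Unbalanced: the two orderings give different sums of squares for A
unbal_first <- ss_for_A(unbalanced, y ~ A + B)
unbal_last  <- ss_for_A(unbalanced, y ~ B + A)
print(c(A_first = unbal_first, A_last = unbal_last))

# Balanced (orthogonal): the ordering makes no difference
bal_first <- ss_for_A(balanced, y ~ A + B)
bal_last  <- ss_for_A(balanced, y ~ B + A)
print(c(A_first = bal_first, A_last = bal_last))
```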

Model Formulae in R

On the right hand side of a linear model formula in R, single variable names indicate inclusion of main effects, while colon separated variable names indicate interactions, e.g. A:B.

Variable names ‘multiplied’ together indicate the interaction between the terms that are multiplied, together with all lower order terms.

For example
A*B*C = A + B + C + A:B + B:C + C:A + A:B:C

Brackets can be expanded in a model formula in the expected manner. For example
(A+B)*C = A + B + C + A:C + B:C
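These expansions can be confirmed in R itself. A small sketch, using the placeholder names A, B and C (no data are needed, because terms() works on the formula alone):

```r
# Return the expanded term labels of a model formula
expand_terms <- function(f) attr(terms(f), "term.labels")

abc_terms <- expand_terms(~ A * B * C)
print(abc_terms)

bracket_terms <- expand_terms(~ (A + B) * C)
print(bracket_terms)
```

Note that R lists terms by order (main effects first) and writes the interaction between C and A as A:C; the order of the names within an interaction label does not change its meaning.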

R Formula from Model Equation

Suppose that we have factors A, B, C and D whose effects are represented by Greek characters in the obvious manner.

Consider the model with (mathematical) equation

\[Y_{ijklm} = \mu + \alpha_i + \beta_j + \gamma_k + \delta_l + (\beta\gamma)_{jk} + (\alpha\gamma)_{ik} + \varepsilon_{ijklm}\]

The R formula for this model is
y ~ (A+B)*C + D
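One way to check a formula against a model equation is to expand it and compare the terms one by one. The sketch below does this for the formula above, and also checks that the fully written out version gives the same set of terms.

```r
compact  <- y ~ (A + B) * C + D
explicit <- y ~ A + B + C + D + A:C + B:C

compact_terms  <- attr(terms(compact), "term.labels")
explicit_terms <- attr(terms(explicit), "term.labels")

# Each label corresponds to one effect in the model equation:
# A, B, C, D are the main effects; A:C and B:C are the two interactions
print(compact_terms)
print(setequal(compact_terms, explicit_terms))
```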

Swimming Data Example

An experiment was done to investigate the effect of various factors on swimming speed. The response is the time to swim one 25 metre lap.

The factors are indicator variables (1 = yes, 0 = no) for wearing a shirt, goggles, and flippers.

The design was a complete and balanced \(2^3\) factorial (i.e. 3 factors each at 2 levels), with three replications at each treatment.
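The layout of such a design can be sketched with expand.grid(). This is only an illustration of the structure described above (8 treatment combinations, 3 replicates each); it does not contain the measured times.

```r
# All 2^3 = 8 combinations of the three indicator variables
treatments <- expand.grid(Flippers = c(1, 0), Goggles = c(1, 0), Shirt = c(1, 0))

# Three replicates of each treatment, with columns in a convenient order
design <- treatments[rep(seq_len(nrow(treatments)), each = 3),
                     c("Shirt", "Goggles", "Flippers")]
rownames(design) <- NULL

# A balanced design has the same number of observations in every cell
cell_counts <- table(design)
print(nrow(design))
print(all(cell_counts == 3))
```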


R Code for Example

Reading in the Data

Download swim.csv

Swim <- read.csv(file = "swim.csv", header = TRUE)
Swim
    Time Shirt Goggles Flippers
1  16.55     1       1        1
2  17.22     1       1        1
3  17.70     1       1        1
4  21.53     1       1        0
5  22.49     1       1        0
6  22.50     1       1        0
7  17.77     1       0        1
8  17.43     1       0        1
9  18.70     1       0        1
10 23.78     1       0        0
11 24.29     1       0        0
12 24.89     1       0        0
13 16.14     0       1        1
14 16.39     0       1        1
15 16.40     0       1        1
16 19.97     0       1        0
17 19.95     0       1        0
18 20.32     0       1        0
19 16.85     0       0        1
20 17.80     0       0        1
21 16.81     0       0        1
22 22.63     0       0        0
23 22.81     0       0        0
24 22.31     0       0        0

ANOVA Table and model summary

Swim.lm <- lm(Time ~ Shirt * Goggles * Flippers, data = Swim)
anova(Swim.lm)
Analysis of Variance Table

Response: Time
                       Df  Sum Sq Mean Sq  F value    Pr(>F)    
Shirt                   1  11.303  11.303  49.4595 2.829e-06 ***
Goggles                 1  14.900  14.900  65.1998 4.915e-07 ***
Flippers                1 158.672 158.672 694.3430 1.314e-14 ***
Shirt:Goggles           1   0.057   0.057   0.2496  0.624161    
Shirt:Flippers          1   1.766   1.766   7.7272  0.013388 *  
Goggles:Flippers        1   3.368   3.368  14.7361  0.001449 ** 
Shirt:Goggles:Flippers  1   0.039   0.039   0.1716  0.684232    
Residuals              16   3.656   0.229                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(Swim.lm)

Call:
lm(formula = Time ~ Shirt * Goggles * Flippers, data = Swim)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.64333 -0.28083  0.00833  0.25917  0.73333 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)             22.5833     0.2760  81.825  < 2e-16 ***
Shirt                    1.7367     0.3903   4.449 0.000404 ***
Goggles                 -2.5033     0.3903  -6.414 8.58e-06 ***
Flippers                -5.4300     0.3903 -13.912 2.35e-10 ***
Shirt:Goggles            0.3567     0.5520   0.646 0.527345    
Shirt:Flippers          -0.9233     0.5520  -1.673 0.113817    
Goggles:Flippers         1.6600     0.5520   3.007 0.008351 ** 
Shirt:Goggles:Flippers  -0.3233     0.7806  -0.414 0.684232    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.478 on 16 degrees of freedom
Multiple R-squared:  0.9811,    Adjusted R-squared:  0.9729 
F-statistic: 118.8 on 7 and 16 DF,  p-value: 1.369e-12

Comments

Because the experimental design is orthogonal, the P-values in the ANOVA table do not depend on the order in which the terms are listed, so each test can be interpreted without worrying about what it has been adjusted for.

The interactions Goggles:Flippers and Shirt:Flippers are statistically significant. This indicates that the effect of wearing a shirt or goggles on swimming speed depends on whether or not the swimmer is wearing flippers.

The other interactions (Shirt:Goggles and Shirt:Goggles:Flippers) are not statistically significant.
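One possible next step, not carried out in the original analysis, is to refit without the two non-significant interactions and compare the models with an F test. The sketch below transcribes the data from the listing above; the name Swim.lm.reduced is a hypothetical choice.

```r
# Swimming data transcribed from the listing in this lecture
Swim <- data.frame(
  Time = c(16.55, 17.22, 17.70, 21.53, 22.49, 22.50,
           17.77, 17.43, 18.70, 23.78, 24.29, 24.89,
           16.14, 16.39, 16.40, 19.97, 19.95, 20.32,
           16.85, 17.80, 16.81, 22.63, 22.81, 22.31),
  Shirt    = rep(c(1, 0), each = 12),
  Goggles  = rep(rep(c(1, 0), each = 6), times = 2),
  Flippers = rep(rep(c(1, 0), each = 3), times = 4)
)

Swim.lm <- lm(Time ~ Shirt * Goggles * Flippers, data = Swim)

# Reduced model: keeps Shirt:Flippers and Goggles:Flippers, drops
# Shirt:Goggles and the three way interaction (marginality still holds)
Swim.lm.reduced <- lm(Time ~ (Shirt + Goggles) * Flippers, data = Swim)

# F test of the reduced model against the full model
reduction_test <- anova(Swim.lm.reduced, Swim.lm)
print(reduction_test)
p_value <- reduction_test[2, "Pr(>F)"]
```

A large P-value here would indicate that the simpler model fits the data adequately.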

The main effects estimates indicate that Goggles and particularly Flippers improve swimming speed (reduce Time), while Shirt slows the swimmer.

The (positive) coefficient for the Goggles:Flippers term indicates that the combination of goggles and flippers does not reduce swimming time by quite as much as the main effects alone would suggest.
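The arithmetic behind this comment can be sketched as follows, for a swimmer with goggles and flippers but no shirt. The data are transcribed from the listing above; the variable names introduced here (both_from_coef, additive_only, cell_mean) are illustrative only.

```r
# Swimming data transcribed from the listing in this lecture
Swim <- data.frame(
  Time = c(16.55, 17.22, 17.70, 21.53, 22.49, 22.50,
           17.77, 17.43, 18.70, 23.78, 24.29, 24.89,
           16.14, 16.39, 16.40, 19.97, 19.95, 20.32,
           16.85, 17.80, 16.81, 22.63, 22.81, 22.31),
  Shirt    = rep(c(1, 0), each = 12),
  Goggles  = rep(rep(c(1, 0), each = 6), times = 2),
  Flippers = rep(rep(c(1, 0), each = 3), times = 4)
)
Swim.lm <- lm(Time ~ Shirt * Goggles * Flippers, data = Swim)
b <- coef(Swim.lm)

# What the main effects alone would suggest for goggles plus flippers
additive_only <- unname(b["(Intercept)"] + b["Goggles"] + b["Flippers"])

# Fitted mean once the Goggles:Flippers interaction is included
both_from_coef <- unname(additive_only + b["Goggles:Flippers"])

# Observed mean time for that treatment (no shirt, goggles, flippers)
cell_mean <- with(Swim, mean(Time[Shirt == 0 & Goggles == 1 & Flippers == 1]))

print(c(additive_only = additive_only,
        with_interaction = both_from_coef,
        observed_mean = cell_mean))
```

Because the model contains every interaction, the fitted value for each treatment equals the observed mean for that treatment, so the last two numbers agree.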

Model Diagnostic Plots

par(mfrow = c(2, 2))
plot(Swim.lm)
hat values (leverages) are all = 0.3333333
 and there are no factor predictors; no plot no. 5
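The constant leverage reported in this message follows from the design alone: the model has 8 coefficients and 24 observations spread evenly over the treatments, so each leverage is 8/24 = 1/3. A minimal sketch, using an arbitrary placeholder response because hat values do not depend on the response:

```r
# Balanced 2^3 design with three replicates per treatment
treatments <- expand.grid(Flippers = c(1, 0), Goggles = c(1, 0), Shirt = c(1, 0))
design <- treatments[rep(seq_len(nrow(treatments)), each = 3), ]

# Placeholder response (assumption): any numeric values give the same hat values
design$y <- seq_len(nrow(design))

fit <- lm(y ~ Shirt * Goggles * Flippers, data = design)
h <- hatvalues(fit)

print(range(h))  # every leverage equals 1/3
print(sum(h))    # leverages sum to the number of coefficients, 8
```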

Where’s My Pizza?

An experiment was done to investigate the effect of various factors on pizza delivery time (the response, in minutes).

Download pizza.csv


Data source: Mackisack, M. S. (1994). What is the use of experiments conducted by statistics students? Journal of Statistics Education, 2.

Your task

Create the appropriate model for this experiment, and use the following commands to answer the questions below.

anova(Pizza.lm)
coef(Pizza.lm)

Questions

  1. Does the delivery time depend on whether or not coke is ordered?

  2. What is the estimated difference in mean delivery time between:

Order A: thick crust, coke but not garlic bread.

Order B: coke, but no thick crust or garlic bread.