Download the R markdown file for this lecture.
In this lecture we introduce linear models with two factors.
We look at the formulation of such models, and consider significance tests for the factors.
Estimation for models with two (or more) factors can be performed by the method of least squares in the usual way.
Fitted values and residuals are also defined as usual.
Suppose that there are two factors, A and B, which may be related to the response variable.
The most straightforward way of modelling these factors is to assume that their effects are additive.
This leads to what can be termed the main effects two-way model.
The rationale for using the phrase main effects will become clear when we look at interactions.
Written in the same way as was done for the one factor model:
\[Y_i = \mu + \alpha_2 z_{Ai2} + \ldots + \alpha_K z_{AiK} + \beta_2 z_{Bi2} + \ldots + \beta_L z_{BiL} + \varepsilon_i~~~~~(i=1,2,\ldots,n)\] where \[z_{Aij} = \left \{ \begin{array}{ll} 1 & \mbox{unit *i* observed at level *j* of *A*}\\ 0 & \mbox{otherwise} \end{array} \right .\] and \[z_{Bij} = \left \{ \begin{array}{ll} 1 & \mbox{unit *i* observed at level *j* of *B*}\\ 0 & \mbox{otherwise} \end{array} \right .\]
No \(\alpha_1 z_{Ai1}\) and \(\beta_1 z_{Bi1}\) as we assume treatment constraints. That is, \(\alpha_1 = 0\) and \(\beta_1 = 0\).
In a two factor model we are usually interested in testing for the statistical significance of both factors.
The importance of a factor may depend upon other factors in the model, in the same way that the importance of a numerical predictor in a multiple linear regression model depends upon the other explanatory variables in the model.
The importance of the factors can be assessed by comparing appropriate nested models using F tests.
There are a number of possible main effects models based on at most two factors.
This involves comparing the nested models MAB and MA.
This can be done by testing H0: \(\beta_2 = \beta_3 = \cdots = \beta_L = 0\) versus H1:
\(\beta_2, \beta_3, \cdots, \beta_L \mbox{ not all zero}\)
The appropriate F test statistic (on \(L-1, n-K-L+1\) degrees of freedom) is
\[F = \frac{(RSS_{A} - RSS_{AB})/(L-1)}{RSS_{AB}/(n-K-L+1)}\]
As usual, large values of this F statistic provide evidence against H0 (hence give small p-values).
The relevant figures for this F test can be displayed in an ANOVA table.
Df | Sum Sq | Mean Sq | F value | P value | |
---|---|---|---|---|---|
Factor A | K-1 | SSA | MSA | fA | PA |
Factor B (adj. A) | L-1 | \(SSB|A\) | \(MSB|A\) | \(f_{B|A}\) | \(P_{B|A}\) |
Residual | n-K-L+1 | RSS | RMS | ||
Total | n-1 | TSS |
The residual row refers to residuals from model MAB.
The row for factor A gives the F statistic for testing the importance of A without adjusting for B.
The second row gives the F statistic for testing the importance of B adjusted for A as just discussed.
We can construct a new ANOVA table with first two rows swapped in order to test for A adjusted for B.
Baby rats fed by foster mothers. Genotype (A, B, I or J) recorded for both rat and foster mother. Weight recorded at a given age. Factors are genotypes of rat and mother. Does weight depend on either?
One Model…
## Rat <- read.csv(file = "ratgene.csv", header = TRUE)
Rat
Rat Mother Weight
1 A A 61.5
2 A A 68.2
3 A A 64.0
4 A A 65.0
5 A A 59.7
6 A B 55.0
7 A B 42.0
8 A B 60.2
9 A I 52.5
10 A I 61.8
11 A I 49.5
12 A I 52.7
13 A J 42.0
14 A J 54.0
15 A J 61.0
16 A J 48.2
17 A J 39.6
18 B A 60.3
19 B A 51.7
20 B A 49.3
21 B A 48.0
22 B B 50.8
23 B B 64.7
24 B B 61.7
25 B B 64.0
26 B B 62.0
27 B I 56.5
28 B I 59.0
29 B I 47.2
30 B I 53.0
31 B J 51.3
32 B J 40.5
33 I A 37.0
34 I A 36.3
35 I A 68.0
36 I B 56.3
37 I B 69.8
38 I B 67.0
39 I I 39.7
40 I I 46.0
41 I I 61.3
42 I I 55.3
43 I I 55.7
44 I J 50.0
45 I J 43.8
46 I J 54.5
47 J A 59.0
48 J A 57.4
49 J A 54.0
50 J A 47.0
51 J B 59.5
52 J B 52.8
53 J B 56.0
54 J I 45.2
55 J I 57.0
56 J I 61.4
57 J J 44.8
58 J J 51.5
59 J J 53.0
60 J J 42.0
61 J J 54.0
.1 <- lm(Weight ~ Rat + Mother, data = Rat)
Rat.lmanova(Rat.lm.1)
Analysis of Variance Table
Response: Weight
Df Sum Sq Mean Sq F value Pr(>F)
Rat 3 60.2 20.052 0.3317 0.802470
Mother 3 775.1 258.360 4.2732 0.008861 **
Residuals 54 3264.9 60.461
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(Rat.lm.1)
Call:
lm(formula = Weight ~ Rat + Mother, data = Rat)
Residuals:
Min 1Q Median 3Q Max
-18.425 -5.584 2.499 5.416 13.745
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 56.909 2.478 22.964 <2e-16 ***
RatB -2.025 2.795 -0.725 0.4719
RatI -2.654 2.827 -0.939 0.3520
RatJ -2.021 2.757 -0.733 0.4668
MotherB 3.516 2.862 1.229 0.2246
MotherI -1.832 2.767 -0.662 0.5107
MotherJ -6.755 2.810 -2.404 0.0197 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.776 on 54 degrees of freedom
Multiple R-squared: 0.2037, Adjusted R-squared: 0.1152
F-statistic: 2.302 on 6 and 54 DF, p-value: 0.04732
And Another Model…
.2 <- lm(Weight ~ Mother + Rat, data = Rat)
Rat.lmanova(Rat.lm.2)
Analysis of Variance Table
Response: Weight
Df Sum Sq Mean Sq F value Pr(>F)
Mother 3 771.6 257.202 4.2540 0.009055 **
Rat 3 63.6 21.211 0.3508 0.788698
Residuals 54 3264.9 60.461
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(Rat.lm.2)
Call:
lm(formula = Weight ~ Mother + Rat, data = Rat)
Residuals:
Min 1Q Median 3Q Max
-18.425 -5.584 2.499 5.416 13.745
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 56.909 2.478 22.964 <2e-16 ***
MotherB 3.516 2.862 1.229 0.2246
MotherI -1.832 2.767 -0.662 0.5107
MotherJ -6.755 2.810 -2.404 0.0197 *
RatB -2.025 2.795 -0.725 0.4719
RatI -2.654 2.827 -0.939 0.3520
RatJ -2.021 2.757 -0.733 0.4668
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.776 on 54 degrees of freedom
Multiple R-squared: 0.2037, Adjusted R-squared: 0.1152
F-statistic: 2.302 on 6 and 54 DF, p-value: 0.04732
Calculate the fitted value for genotype A rat with genotype J foster mother.
Calculate the fitted value for genotype J rat with genotype A foster mother.
Calculate the fitted value for genotype B rat with genotype B foster mother.
Comments and Conclusions
The first model indicates that genotype of rat is unimportant (P=0.802) but that genotype of foster mother adjusted for genotype of rat is statistically significant (P=0.00886).
There is little change in P-values when we alter the order of the terms to get the second model. N.B. zero change only occurs in a special case (next lecture).
Overall, it seems clear that weight is not associated with genotype of rat, so whether we adjust for this term is pretty much immaterial.
Weight is clearly associated with genotype of the foster mother.
Genotype A is reference level (baseline) for both
Mother
andRat
genotypes.The
summary()
command (which gives parameter estimates) suggests that genotype J makes for poor foster mothers.