Lecture 22 Introduction to the Two Factor Model
In this lecture we introduce linear models with two factors.
We look at the formulation of such models, and consider significance tests for the factors.
Estimation for models with two (or more) factors can be performed by the method of least squares in the usual way.
Fitted values and residuals are also defined as usual.
22.1 A Main Effects Model for Two Factors
Suppose that there are two factors, A and B, which may be related to the response variable.
The most straightforward way of modelling these factors is to assume that their effects are additive.
This leads to what can be termed the main effects two-way model.
The rationale for using the phrase main effects will become clear when we look at interactions between factors.
22.1.1 Model for Two Factors as a Multiple Linear Regression
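Suppose that factor A has \(K\) levels and factor B has \(L\) levels. Written as a multiple linear regression, in the same way as was done for the one factor model, the main effects model is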
\[Y_i = \mu + \alpha_2 z_{Ai2} + \ldots + \alpha_K z_{AiK} + \beta_2 z_{Bi2} + \ldots + \beta_L z_{BiL} + \varepsilon_i~~~~~(i=1,2,\ldots,n)\] where \[z_{Aij} = \left \{ \begin{array}{ll} 1 & \mbox{unit } i \mbox{ observed at level } j \mbox{ of } A\\ 0 & \mbox{otherwise} \end{array} \right .\] and \[z_{Bij} = \left \{ \begin{array}{ll} 1 & \mbox{unit } i \mbox{ observed at level } j \mbox{ of } B\\ 0 & \mbox{otherwise} \end{array} \right .\]
There are no \(\alpha_1 z_{Ai1}\) and \(\beta_1 z_{Bi1}\) terms because we assume treatment constraints; that is, \(\alpha_1 = 0\) and \(\beta_1 = 0\), so level 1 of each factor acts as the baseline.
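The indicator coding can be sketched in a few lines of Python. This is an illustration only: the function name `design_row` and the level labels are assumptions made for this sketch, not part of the lecture's own analysis. Each row holds the intercept, then the \(K-1\) indicators for A, then the \(L-1\) indicators for B, with the first level of each factor dropped as the baseline.

```python
def design_row(a, b, levels_a, levels_b):
    """Return one row of the design matrix for the main effects model.

    The row is [1, z_A2, ..., z_AK, z_B2, ..., z_BL]. The first level of
    each factor is the baseline (treatment constraints), so it gets no column.
    """
    row = [1]  # intercept column, multiplies mu
    row += [1 if a == level else 0 for level in levels_a[1:]]
    row += [1 if b == level else 0 for level in levels_b[1:]]
    return row


# Illustrative level labels (assumed for this sketch).
levels_a = ["a1", "a2", "a3"]  # K = 3
levels_b = ["b1", "b2"]        # L = 2

print(design_row("a1", "b1", levels_a, levels_b))  # unit at the baseline of both factors
print(design_row("a3", "b2", levels_a, levels_b))  # unit at level 3 of A, level 2 of B
```

A unit at the baseline level of both factors has only the intercept switched on, so its mean response under the model is \(\mu\).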
22.1.2 Tests for Main Effects in a Two-Way Model
In a two factor model we are usually interested in testing for the statistical significance of both factors.
The importance of a factor may depend upon other factors in the model, in the same way that the importance of a numerical predictor in a multiple linear regression model depends upon the other explanatory variables in the model.
The importance of the factors can be assessed by comparing appropriate nested models using F tests.
22.1.3 Models Under Consideration
There are a number of possible main effects models based on at most two factors A and B:
- Both A and B have an effect on the response: \[M_{AB}:~~Y_i = \mu + \alpha_2 z_{Ai2} + \ldots + \alpha_K z_{AiK} + \beta_2 z_{Bi2} + \ldots + \beta_L z_{BiL} + \varepsilon_i\]
- Only A has an effect on the response: \[M_A:~~Y_i = \mu + \alpha_2 z_{Ai2} + \ldots + \alpha_K z_{AiK} + \varepsilon_i\]
- Only B has an effect on the response: \[M_B:~~Y_i = \mu + \beta_2 z_{Bi2} + \ldots + \beta_L z_{BiL} + \varepsilon_i\]
- Neither A nor B has an effect on the response: \[M_0:~~Y_i = \mu + \varepsilon_i\]
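Counting the regression parameters in each model helps when keeping track of degrees of freedom. With \(K\) levels of A, \(L\) levels of B and treatment constraints, the number of parameters \(p\) and the residual degrees of freedom \(n-p\) are:

\[\begin{array}{lll} \mbox{Model} & p & n-p \\ M_{AB} & 1 + (K-1) + (L-1) = K+L-1 & n-K-L+1 \\ M_A & K & n-K \\ M_B & L & n-L \\ M_0 & 1 & n-1 \end{array}\]

Dropping B from \(M_{AB}\) removes \(L-1\) parameters, and dropping A removes \(K-1\).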
22.1.4 Testing for the Importance of B adjusted for A
This involves comparing the nested models \(M_{AB}\) and \(M_A\).
This can be done by testing \[H_0:~~\beta_2 = \beta_3 = \cdots = \beta_L = 0\] versus
\[H_1:~~\beta_2, \beta_3, \cdots, \beta_L \mbox{ not all zero}\]
The appropriate F test statistic (on \(L-1, n-K-L+1\) degrees of freedom) is
\[F = \frac{(RSS_{A} - RSS_{AB})/(L-1)}{RSS_{AB}/(n-K-L+1)}\]
As usual, large values of this F statistic provide evidence against \(H_0\) (hence give small p-values).
The relevant figures for this F test can be displayed in an ANOVA table.
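The F statistic is simple to compute once the two residual sums of squares are known. The following Python sketch shows the arithmetic; the function name `f_statistic` and the numbers fed to it are made up for illustration and do not come from any dataset in these notes.

```python
def f_statistic(rss_small, rss_big, df_extra, df_resid):
    """F statistic for comparing a smaller model nested in a bigger one.

    rss_small: residual sum of squares of the smaller model (e.g. M_A)
    rss_big:   residual sum of squares of the bigger model (e.g. M_AB)
    df_extra:  number of extra parameters in the bigger model (L - 1)
    df_resid:  residual degrees of freedom of the bigger model (n - K - L + 1)
    """
    if df_extra <= 0 or df_resid <= 0:
        raise ValueError("degrees of freedom must be positive")
    return ((rss_small - rss_big) / df_extra) / (rss_big / df_resid)


# Made-up figures for illustration only: n = 20, K = 3, L = 4.
n, K, L = 20, 3, 4
F = f_statistic(rss_small=150.0, rss_big=90.0, df_extra=L - 1, df_resid=n - K - L + 1)
print(F)
```

The p-value is the upper tail probability of the F distribution on the stated degrees of freedom; in R this could be obtained with `pf(F, df1, df2, lower.tail = FALSE)`.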
22.1.5 ANOVA Tables for Two Factor Main Effects Model
| | Df | Sum Sq | Mean Sq | F value | P value |
|---|---|---|---|---|---|
| Factor A | \(K-1\) | \(SS_A\) | \(MS_A\) | \(f_A\) | \(P_A\) |
| Factor B (adj. A) | \(L-1\) | \(SS_{B \mid A}\) | \(MS_{B \mid A}\) | \(f_{B \mid A}\) | \(P_{B \mid A}\) |
| Residual | \(n-K-L+1\) | RSS | RMS | | |
| Total | \(n-1\) | TSS | | | |
The residual row refers to residuals from model \(M_{AB}\).
The row for factor A gives the F statistic for testing the importance of A without adjusting for B.
The second row gives the F statistic for testing the importance of B adjusted for A as just discussed.
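To test for A adjusted for B, we construct a new ANOVA table with the factors entered in the opposite order: the first row then tests B without adjusting for A, and the second row tests A adjusted for B.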
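In terms of the residual sums of squares of the nested models, the sums of squares for the two orderings can be written as follows. Here \(RSS_0\), \(RSS_A\), \(RSS_B\) and \(RSS_{AB}\) denote the residual sums of squares of \(M_0\), \(M_A\), \(M_B\) and \(M_{AB}\); the labels \(RSS_0\) and \(RSS_B\) are introduced here for convenience, and \(RSS_0\) is the same as TSS.

\[\begin{array}{ll} SS_A = RSS_0 - RSS_A, & SS_{B|A} = RSS_A - RSS_{AB},\\ SS_B = RSS_0 - RSS_B, & SS_{A|B} = RSS_B - RSS_{AB}. \end{array}\]

Both orderings account for the same total,

\[SS_A + SS_{B|A} = SS_B + SS_{A|B} = TSS - RSS_{AB},\]

so changing the order only changes how this sum is split between the two factors.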
22.2 Example: Foster Feeding Rats
In this example, we look at a dataset on baby rats fed by foster mothers. The genotype (A, B, I or J) is recorded for both the rat and the foster mother, and the weight of each baby rat is recorded at a given age. The factors are the genotypes of the rat and of the mother. Does weight depend on either?
22.2.1 Analysis of the Data
| | Rat | Mother | Weight |
|---|---|---|---|
| 1 | A | A | 61.5 |
| 2 | A | A | 68.2 |
| 3 | A | A | 64.0 |
| 4 | A | A | 65.0 |
| 5 | A | A | 59.7 |
| 6 | A | B | 55.0 |
Let’s first fit a two-way model with the `Rat` variable first and the `Mother` variable second.
Analysis of Variance Table
Response: Weight
Df Sum Sq Mean Sq F value Pr(>F)
Rat 3 60.2 20.052 0.3317 0.802470
Mother 3 775.1 258.360 4.2732 0.008861 **
Residuals 54 3264.9 60.461
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = Weight ~ Rat + Mother, data = Rat)
Residuals:
Min 1Q Median 3Q Max
-18.425 -5.584 2.499 5.416 13.745
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 56.909 2.478 22.964 <2e-16 ***
RatB -2.025 2.795 -0.725 0.4719
RatI -2.654 2.827 -0.939 0.3520
RatJ -2.021 2.757 -0.733 0.4668
MotherB 3.516 2.862 1.229 0.2246
MotherI -1.832 2.767 -0.662 0.5107
MotherJ -6.755 2.810 -2.404 0.0197 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.776 on 54 degrees of freedom
Multiple R-squared: 0.2037, Adjusted R-squared: 0.1152
F-statistic: 2.302 on 6 and 54 DF, p-value: 0.04732
This first model indicates that the genotype of the rat is unimportant (P=0.802), but that the genotype of the foster mother adjusted for the genotype of the rat is statistically significant (P=0.00886).
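As a check, the figures in the ANOVA table can be matched to the formulas of Section 22.1.4. Both factors have four levels, so \(K = L = 4\), and the residual degrees of freedom are \(n-K-L+1 = 54\), which gives \(n = 61\). The F statistic for Mother adjusted for Rat is then

\[F = \frac{(RSS_A - RSS_{AB})/(L-1)}{RSS_{AB}/(n-K-L+1)} = \frac{258.360}{60.461} \approx 4.273\]

where the numerator is the Mean Sq entry for Mother (775.1/3, up to rounding) and the denominator is the residual mean square (3264.9/54). This agrees with the F value reported in the table.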
Now we fit the model with the `Mother` variable first and the `Rat` variable second.
Analysis of Variance Table
Response: Weight
Df Sum Sq Mean Sq F value Pr(>F)
Mother 3 771.6 257.202 4.2540 0.009055 **
Rat 3 63.6 21.211 0.3508 0.788698
Residuals 54 3264.9 60.461
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = Weight ~ Mother + Rat, data = Rat)
Residuals:
Min 1Q Median 3Q Max
-18.425 -5.584 2.499 5.416 13.745
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 56.909 2.478 22.964 <2e-16 ***
MotherB 3.516 2.862 1.229 0.2246
MotherI -1.832 2.767 -0.662 0.5107
MotherJ -6.755 2.810 -2.404 0.0197 *
RatB -2.025 2.795 -0.725 0.4719
RatI -2.654 2.827 -0.939 0.3520
RatJ -2.021 2.757 -0.733 0.4668
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.776 on 54 degrees of freedom
Multiple R-squared: 0.2037, Adjusted R-squared: 0.1152
F-statistic: 2.302 on 6 and 54 DF, p-value: 0.04732
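There is little change in the p-values when we alter the order of the terms to get the second model: the test for Mother gives P=0.00886 adjusted for Rat and P=0.00906 unadjusted, while the test for Rat gives P=0.802 unadjusted and P=0.789 adjusted for Mother. The coefficient estimates are identical in the two fits; only the split of the sums of squares in the ANOVA table depends on the order. Note that zero change only occurs in a special case (next lecture).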
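The dependence on the order of the terms can be reproduced on a small scale. The Python sketch below uses a made-up unbalanced dataset with two levels per factor (not the rat data) and a plain normal-equations solver written for this illustration; the helper names `solve` and `rss` are assumptions of the sketch. It fits the four nested models by least squares and forms the sequential sums of squares in both orders.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(rows, y):
    """Residual sum of squares from a least squares fit of y on the design rows."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Made-up unbalanced data: (indicator for level 2 of A, indicator for level 2 of B, response).
data = [(0, 0, 10), (0, 0, 12), (0, 0, 11), (0, 1, 15),
        (1, 0, 13), (1, 1, 18), (1, 1, 20), (1, 1, 19)]
y = [float(obs[2]) for obs in data]

# Residual sums of squares for the four nested models.
rss_0 = rss([[1.0] for a, b, _ in data], y)
rss_a = rss([[1.0, a] for a, b, _ in data], y)
rss_b = rss([[1.0, b] for a, b, _ in data], y)
rss_ab = rss([[1.0, a, b] for a, b, _ in data], y)

ss_a, ss_b_given_a = rss_0 - rss_a, rss_a - rss_ab  # A first, then B
ss_b, ss_a_given_b = rss_0 - rss_b, rss_b - rss_ab  # B first, then A

print(f"A first: SSA = {ss_a:.1f}, SSB|A = {ss_b_given_a:.1f}")
print(f"B first: SSB = {ss_b:.1f}, SSA|B = {ss_a_given_b:.1f}")
```

For this toy dataset the sum of squares for A is 60.5 when A is entered first but 13.5 when it is adjusted for B, while the two orderings share the same total of 98.0 and the same residual sum of squares.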
22.2.2 Comments and Conclusions
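Overall, there is no evidence that weight is associated with the genotype of the rat (P is about 0.8 with or without adjustment for the foster mother), so whether we adjust for this term is pretty much immaterial.
There is strong evidence that weight is associated with the genotype of the foster mother (P is about 0.009 in both tables).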
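Genotype A is the reference level (baseline) for both the `Mother` and `Rat` genotypes.
The `summary()` command (which gives parameter estimates) suggests that genotype J makes for poor foster mothers: the `MotherJ` estimate is -6.755 (P=0.0197), so baby rats with genotype J foster mothers are estimated to weigh 6.755 units less on average than those with genotype A foster mothers, for a given rat genotype.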
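The parameter estimates can also be combined into fitted mean weights under the main effects model. As a worked example using the coefficients in the `summary()` output, a genotype A rat with a genotype J foster mother has fitted mean weight

\[56.909 + (-6.755) = 50.154\]

and a genotype B rat with a genotype J foster mother has fitted mean weight

\[56.909 + (-2.025) + (-6.755) = 48.129\]

Because the model is additive, the estimated difference between J and A foster mothers is the same (6.755) whatever the genotype of the rat.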