Lecture 22 Introduction to the Two Factor Model

In class version

In this lecture we introduce linear models with two factors.

We look at the formulation of such models, and consider significance tests for the factors.

Estimation for models with two (or more) factors can be performed by the method of least squares in the usual way.

Fitted values and residuals are also defined as usual.

22.1 A Main Effects Model for Two Factors

Suppose that there are two factors, A and B, which may be related to the response variable.

The most straightforward way of modelling these factors is to assume that their effects are additive.

This leads to what can be termed the main effects two-way model.

The rationale for using the phrase main effects will become clear when we look at interactions between factors.

22.1.1 Model for Two Factors as a Multiple Linear Regression

The model can be written in the same way as was done for the one factor model:

\[Y_i = \mu + \alpha_2 z_{Ai2} + \ldots + \alpha_K z_{AiK} + \beta_2 z_{Bi2} + \ldots + \beta_L z_{BiL} + \varepsilon_i~~~~~(i=1,2,\ldots,n)\] where \[z_{Aij} = \left \{ \begin{array}{ll} 1 & \mbox{unit } i \mbox{ observed at level } j \mbox{ of } A\\ 0 & \mbox{otherwise} \end{array} \right .\] and \[z_{Bij} = \left \{ \begin{array}{ll} 1 & \mbox{unit } i \mbox{ observed at level } j \mbox{ of } B\\ 0 & \mbox{otherwise} \end{array} \right .\]

There are no \(\alpha_1 z_{Ai1}\) and \(\beta_1 z_{Bi1}\) terms because we assume the treatment constraints, that is, \(\alpha_1 = 0\) and \(\beta_1 = 0\).
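As a small illustration of this coding (using made-up factors rather than data from the module), model.matrix() in R shows the indicator columns created under the default treatment contrasts; note that there are no columns for level 1 of either factor.

# A minimal sketch with hypothetical factors (K = 3 levels of A, L = 2 of B).
A <- factor(c("a1", "a1", "a2", "a2", "a3", "a3"))
B <- factor(c("b1", "b2", "b1", "b2", "b1", "b2"))

# Columns: intercept (mu), z_A2, z_A3, z_B2.  There are no columns for
# level 1 of A or B, reflecting the constraints alpha_1 = 0 and beta_1 = 0.
model.matrix(~ A + B)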

22.1.2 Tests for Main Effects in a Two-Way Model

In a two factor model we are usually interested in testing for the statistical significance of both factors.

The importance of a factor may depend upon other factors in the model, in the same way that the importance of a numerical predictor in a multiple linear regression model depends upon the other explanatory variables in the model.

The importance of the factors can be assessed by comparing appropriate nested models using F tests.

22.1.3 Models Under Consideration

There are a number of possible main effects models based on at most two factors, A and B (see the R sketch after this list):

  • Both A and B have an effect on the response: \[M_{AB}:~~Y_i = \mu + \alpha_2 z_{Ai2} + \ldots + \alpha_K z_{AiK} + \beta_2 z_{Bi2} + \ldots + \beta_L z_{BiL} + \varepsilon_i\]
  • Only A has an effect on the response: \[M_A:~~Y_i = \mu + \alpha_2 z_{Ai2} + \ldots + \alpha_K z_{AiK} + \varepsilon_i\]
  • Only B has an effect on the response: \[M_B:~~Y_i = \mu + \beta_2 z_{Bi2} + \ldots + \beta_L z_{BiL} + \varepsilon_i\]
  • Neither A nor B has an effect on the response: \[M_0:~~Y_i = \mu + \varepsilon_i\]
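In R these four models can be fitted with lm() in the usual way. The sketch below is illustrative only, assuming a hypothetical data frame dat containing a response y and factors A and B.

# Hypothetical data frame `dat` with response y and factors A and B.
M.AB <- lm(y ~ A + B, data = dat)   # both A and B
M.A  <- lm(y ~ A, data = dat)       # A only
M.B  <- lm(y ~ B, data = dat)       # B only
M.0  <- lm(y ~ 1, data = dat)       # neither (intercept only)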

22.1.4 Testing for the Importance of B adjusted for A

This involves comparing the nested models \(M_{AB}\) and \(M_A\).

This can be done by testing \[H_0:~~\beta_2 = \beta_3 = \cdots = \beta_L = 0\] versus

\[H_1:~~\beta_2, \beta_3, \cdots, \beta_L \mbox{ not all zero}\]

The appropriate F test statistic (on \(L-1, n-K-L+1\) degrees of freedom) is

\[F = \frac{(RSS_{A} - RSS_{AB})/(L-1)}{RSS_{AB}/(n-K-L+1)}\]

As usual, large values of this F statistic provide evidence against \(H_0\) (hence give small p-values).
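In R this comparison can be made directly with anova(), or the F statistic can be assembled by hand from the residual sums of squares. The sketch below continues with the hypothetical models M.A and M.AB fitted above.

# F test for B adjusted for A, comparing the nested models directly.
anova(M.A, M.AB)

# The same F statistic assembled by hand from the residual sums of squares.
K <- nlevels(dat$A); L <- nlevels(dat$B); n <- nrow(dat)
RSS.A  <- deviance(M.A)    # RSS for M_A
RSS.AB <- deviance(M.AB)   # RSS for M_AB
F.stat <- ((RSS.A - RSS.AB) / (L - 1)) / (RSS.AB / (n - K - L + 1))
pf(F.stat, df1 = L - 1, df2 = n - K - L + 1, lower.tail = FALSE)   # p-value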

The relevant figures for this F test can be displayed in an ANOVA table.

22.1.5 ANOVA Tables for Two Factor Main Effects Model

                     Df         Sum Sq        Mean Sq       F value        P value
Factor A             K-1        SSA           MSA           fA             PA
Factor B (adj. A)    L-1        \(SSB|A\)     \(MSB|A\)     \(f_{B|A}\)    \(P_{B|A}\)
Residual             n-K-L+1    RSS           RMS
Total                n-1        TSS

The residual row refers to residuals from the model \(M_{AB}\).

The row for factor A gives the F statistic for testing the importance of A without adjusting for B.

The second row gives the F statistic for testing the importance of B adjusted for A as just discussed.

We can construct a new ANOVA table with the first two rows swapped in order to test for A adjusted for B.

22.2 Example: Foster Feeding Rats

In this example, we are looking at a dataset on baby rats fed by foster mothers. The genotype (A, B, I or J) is recorded for both the rat and the foster mother. The weight of each baby rat is recorded at a given age. The factors are the genotypes of rat and mother. Does weight depend on either?


22.2.1 Analysis of the Data

Download ratgene.csv

Rat <- read.csv(file = "ratgene.csv", header = TRUE)
head(Rat)
  Rat Mother Weight
1   A      A   61.5
2   A      A   68.2
3   A      A   64.0
4   A      A   65.0
5   A      A   59.7
6   A      B   55.0

Let’s first fit a two-way main effects model with the Rat variable first and the Mother variable second.

Rat.lm.1 <- lm(Weight ~ Rat + Mother, data = Rat)
anova(Rat.lm.1)
Analysis of Variance Table

Response: Weight
          Df Sum Sq Mean Sq F value   Pr(>F)   
Rat        3   60.2  20.052  0.3317 0.802470   
Mother     3  775.1 258.360  4.2732 0.008861 **
Residuals 54 3264.9  60.461                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(Rat.lm.1)

Call:
lm(formula = Weight ~ Rat + Mother, data = Rat)

Residuals:
    Min      1Q  Median      3Q     Max 
-18.425  -5.584   2.499   5.416  13.745 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   56.909      2.478  22.964   <2e-16 ***
RatB          -2.025      2.795  -0.725   0.4719    
RatI          -2.654      2.827  -0.939   0.3520    
RatJ          -2.021      2.757  -0.733   0.4668    
MotherB        3.516      2.862   1.229   0.2246    
MotherI       -1.832      2.767  -0.662   0.5107    
MotherJ       -6.755      2.810  -2.404   0.0197 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.776 on 54 degrees of freedom
Multiple R-squared:  0.2037,    Adjusted R-squared:  0.1152 
F-statistic: 2.302 on 6 and 54 DF,  p-value: 0.04732

This first model indicates that the genotype of the rat is unimportant (P=0.802), but that the genotype of the foster mother adjusted for the genotype of the rat is statistically significant (P=0.00886).
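The F test for Mother adjusted for Rat can also be obtained by comparing the nested models explicitly. The following check is not part of the original analysis, but it should reproduce the F value and p-value in the Mother row of the table above.

Rat.lm.A <- lm(Weight ~ Rat, data = Rat)   # the model with Rat only
anova(Rat.lm.A, Rat.lm.1)                  # F test for Mother adjusted for Rat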

Now we fit the model with the Mother variable first and the Rat variable second.

Rat.lm.2 <- lm(Weight ~ Mother + Rat, data = Rat)
anova(Rat.lm.2)
Analysis of Variance Table

Response: Weight
          Df Sum Sq Mean Sq F value   Pr(>F)   
Mother     3  771.6 257.202  4.2540 0.009055 **
Rat        3   63.6  21.211  0.3508 0.788698   
Residuals 54 3264.9  60.461                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(Rat.lm.2)

Call:
lm(formula = Weight ~ Mother + Rat, data = Rat)

Residuals:
    Min      1Q  Median      3Q     Max 
-18.425  -5.584   2.499   5.416  13.745 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   56.909      2.478  22.964   <2e-16 ***
MotherB        3.516      2.862   1.229   0.2246    
MotherI       -1.832      2.767  -0.662   0.5107    
MotherJ       -6.755      2.810  -2.404   0.0197 *  
RatB          -2.025      2.795  -0.725   0.4719    
RatI          -2.654      2.827  -0.939   0.3520    
RatJ          -2.021      2.757  -0.733   0.4668    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.776 on 54 degrees of freedom
Multiple R-squared:  0.2037,    Adjusted R-squared:  0.1152 
F-statistic: 2.302 on 6 and 54 DF,  p-value: 0.04732

The p-values change very little when we alter the order of the terms to get the second model. Note that exactly zero change occurs only in a special case (covered in the next lecture).

22.2.2 Comments and Conclusions

Overall, it seems clear that weight is not associated with genotype of the rat, so whether we adjust for this term is pretty much immaterial.

Weight is clearly associated with genotype of the foster mother.

Genotype A is the reference level (baseline) for both the Mother and Rat factors.

The summary() output (which gives the parameter estimates) suggests that genotype J mothers make for poor foster mothers: the MotherJ coefficient is negative and significant.

22.2.3 Fitted Values for the Rats Model

  1. Calculate the fitted value for genotype A rat with genotype J foster mother.

  2. Calculate the fitted value for genotype J rat with genotype A foster mother.

  3. Calculate the fitted value for genotype B rat with genotype B foster mother.
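A quick way to check your answers is to use predict() on the fitted model Rat.lm.1; the new-data frame below is constructed purely for illustration. Each fitted value can also be obtained by adding the relevant coefficient estimates from the summary() output.

# Fitted values for the three cases above (illustrative check).
new.rats <- data.frame(Rat    = c("A", "J", "B"),
                       Mother = c("J", "A", "B"))
predict(Rat.lm.1, newdata = new.rats)

# By hand, e.g. case 1 (genotype A rat, genotype J foster mother):
# fitted value = intercept + (Rat A: 0) + (Mother J coefficient)
#              = 56.909 + (-6.755) = 50.154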