Download the R markdown file for this lecture.

In this lecture we introduce linear models with two factors.

We look at the formulation of such models, and consider significance tests for the factors.

Estimation for models with two (or more) factors can be performed by the method of least squares in the usual way.

Fitted values and residuals are also defined as usual.

A Main Effects Model for Two Factors

Suppose that there are two factors, A and B, which may be related to the response variable.

The most straightforward way of modelling these factors is to assume that their effects are additive.

This leads to what can be termed the main effects two-way model.

The rationale for using the phrase main effects will become clear when we look at interactions.

Model for Two Factors as a Multiple Linear Regression

Written in the same way as was done for the one factor model:

\[Y_i = \mu + \alpha_2 z_{Ai2} + \ldots + \alpha_K z_{AiK} + \beta_2 z_{Bi2} + \ldots + \beta_L z_{BiL} + \varepsilon_i~~~~~(i=1,2,\ldots,n)\] where \[z_{Aij} = \left \{ \begin{array}{ll} 1 & \mbox{unit *i* observed at level *j* of *A*}\\ 0 & \mbox{otherwise} \end{array} \right .\] and \[z_{Bij} = \left \{ \begin{array}{ll} 1 & \mbox{unit *i* observed at level *j* of *B*}\\ 0 & \mbox{otherwise} \end{array} \right .\]

No \(\alpha_1 z_{Ai1}\) and \(\beta_1 z_{Bi1}\) as we assume treatment constraints. That is, \(\alpha_1 = 0\) and \(\beta_1 = 0\).

Tests for Main Effects in a Two-Way Model

In a two factor model we are usually interested in testing for the statistical significance of both factors.

The importance of a factor may depend upon other factors in the model, in the same way that the importance of a numerical predictor in a multiple linear regression model depends upon the other explanatory variables in the model.

The importance of the factors can be assessed by comparing appropriate nested models using F tests.

Models Under Consideration

There are a number of possible main effects models based on at most two factors.

  • MAB \(Y_i = \mu + \alpha_2 z_{Ai2} + \ldots + \alpha_K z_{AiK} + \beta_2 z_{Bi2} + \ldots + \beta_L z_{BiL} + \varepsilon_i\)
  • MA \(Y_i = \mu + \alpha_2 z_{Ai2} + \ldots + \alpha_K z_{AiK} + \varepsilon_i\)
  • MB \(Y_i = \mu + \beta_2 z_{Bi2} + \ldots + \beta_L z_{BiL} + \varepsilon_i\)
  • M0 \(Y_i = \mu + \varepsilon_i\)

Testing for the Importance of B Adjusted for A

This involves comparing the nested models MAB and MA.

This can be done by testing H0: \(\beta_2 = \beta_3 = \cdots = \beta_L = 0\) versus H1:

\(\beta_2, \beta_3, \cdots, \beta_L \mbox{ not all zero}\)

The appropriate F test statistic (on \(L-1, n-K-L+1\) degrees of freedom) is

\[F = \frac{(RSS_{A} - RSS_{AB})/(L-1)}{RSS_{AB}/(n-K-L+1)}\]

As usual, large values of this F statistic provide evidence against H0 (hence give small p-values).

The relevant figures for this F test can be displayed in an ANOVA table.

ANOVA Tables for Two Factor Main Effects Model

Df Sum Sq Mean Sq F value P value
Factor A K-1 SSA MSA fA PA
Factor B (adj. A) L-1 \(SSB|A\) \(MSB|A\) \(f_{B|A}\) \(P_{B|A}\)
Residual n-K-L+1 RSS RMS
Total n-1 TSS

The residual row refers to residuals from model MAB.

The row for factor A gives the F statistic for testing the importance of A without adjusting for B.

The second row gives the F statistic for testing the importance of B adjusted for A as just discussed.

We can construct a new ANOVA table with first two rows swapped in order to test for A adjusted for B.

Foster Feeding Rats

Baby rats fed by foster mothers. Genotype (A, B, I or J) recorded for both rat and foster mother. Weight recorded at a given age. Factors are genotypes of rat and mother. Does weight depend on either?

who likes rats?

Analysis of the Data

One Model…

`Download ratgene.csv

## Rat <- read.csv(file = "ratgene.csv", header = TRUE)
Rat
   Rat Mother Weight
1    A      A   61.5
2    A      A   68.2
3    A      A   64.0
4    A      A   65.0
5    A      A   59.7
6    A      B   55.0
7    A      B   42.0
8    A      B   60.2
9    A      I   52.5
10   A      I   61.8
11   A      I   49.5
12   A      I   52.7
13   A      J   42.0
14   A      J   54.0
15   A      J   61.0
16   A      J   48.2
17   A      J   39.6
18   B      A   60.3
19   B      A   51.7
20   B      A   49.3
21   B      A   48.0
22   B      B   50.8
23   B      B   64.7
24   B      B   61.7
25   B      B   64.0
26   B      B   62.0
27   B      I   56.5
28   B      I   59.0
29   B      I   47.2
30   B      I   53.0
31   B      J   51.3
32   B      J   40.5
33   I      A   37.0
34   I      A   36.3
35   I      A   68.0
36   I      B   56.3
37   I      B   69.8
38   I      B   67.0
39   I      I   39.7
40   I      I   46.0
41   I      I   61.3
42   I      I   55.3
43   I      I   55.7
44   I      J   50.0
45   I      J   43.8
46   I      J   54.5
47   J      A   59.0
48   J      A   57.4
49   J      A   54.0
50   J      A   47.0
51   J      B   59.5
52   J      B   52.8
53   J      B   56.0
54   J      I   45.2
55   J      I   57.0
56   J      I   61.4
57   J      J   44.8
58   J      J   51.5
59   J      J   53.0
60   J      J   42.0
61   J      J   54.0
Rat.lm.1 <- lm(Weight ~ Rat + Mother, data = Rat)
anova(Rat.lm.1)
Analysis of Variance Table

Response: Weight
          Df Sum Sq Mean Sq F value   Pr(>F)   
Rat        3   60.2  20.052  0.3317 0.802470   
Mother     3  775.1 258.360  4.2732 0.008861 **
Residuals 54 3264.9  60.461                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(Rat.lm.1)

Call:
lm(formula = Weight ~ Rat + Mother, data = Rat)

Residuals:
    Min      1Q  Median      3Q     Max 
-18.425  -5.584   2.499   5.416  13.745 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   56.909      2.478  22.964   <2e-16 ***
RatB          -2.025      2.795  -0.725   0.4719    
RatI          -2.654      2.827  -0.939   0.3520    
RatJ          -2.021      2.757  -0.733   0.4668    
MotherB        3.516      2.862   1.229   0.2246    
MotherI       -1.832      2.767  -0.662   0.5107    
MotherJ       -6.755      2.810  -2.404   0.0197 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.776 on 54 degrees of freedom
Multiple R-squared:  0.2037,    Adjusted R-squared:  0.1152 
F-statistic: 2.302 on 6 and 54 DF,  p-value: 0.04732

And Another Model…

Rat.lm.2 <- lm(Weight ~ Mother + Rat, data = Rat)
anova(Rat.lm.2)
Analysis of Variance Table

Response: Weight
          Df Sum Sq Mean Sq F value   Pr(>F)   
Mother     3  771.6 257.202  4.2540 0.009055 **
Rat        3   63.6  21.211  0.3508 0.788698   
Residuals 54 3264.9  60.461                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(Rat.lm.2)

Call:
lm(formula = Weight ~ Mother + Rat, data = Rat)

Residuals:
    Min      1Q  Median      3Q     Max 
-18.425  -5.584   2.499   5.416  13.745 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   56.909      2.478  22.964   <2e-16 ***
MotherB        3.516      2.862   1.229   0.2246    
MotherI       -1.832      2.767  -0.662   0.5107    
MotherJ       -6.755      2.810  -2.404   0.0197 *  
RatB          -2.025      2.795  -0.725   0.4719    
RatI          -2.654      2.827  -0.939   0.3520    
RatJ          -2.021      2.757  -0.733   0.4668    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.776 on 54 degrees of freedom
Multiple R-squared:  0.2037,    Adjusted R-squared:  0.1152 
F-statistic: 2.302 on 6 and 54 DF,  p-value: 0.04732

Comments and Conclusions

The first model indicates that genotype of rat is unimportant (P=0.802) but that genotype of foster mother adjusted for genotype of rat is statistically significant (P=0.00886).

There is little change in P-values when we alter the order of the terms to get the second model. N.B. zero change only occurs in a special case (next lecture).

Overall, it seems clear that weight is not associated with genotype of rat, so whether we adjust for this term is pretty much immaterial.

Weight is clearly associated with genotype of the foster mother.

Genotype A is reference level (baseline) for both Mother and Rat genotypes.

The summary() command (which gives parameter estimates) suggests that genotype J makes for poor foster mothers.

Fitted Values for the Rats Model

  1. Calculate the fitted value for genotype A rat with genotype J foster mother.

  2. Calculate the fitted value for genotype J rat with genotype A foster mother.

  3. Calculate the fitted value for genotype B rat with genotype B foster mother.