We know that the importance of one predictor can depend on whether or not we have adjusted for another. This is also true if the predictors are factors.
This means that we need to look at multiple ANOVA tables in order to perform all the possible tests (with particular patterns of adjustment).
However, for orthogonal factorial designs it turns out that the issue of adjustment is of no concern.
In essence, the pattern of factor levels (the factorial design if this was chosen by an experimenter) is orthogonal if the sum of squares attributable to one factor is the same whether or not the other factor has been included in the model.
In terms of ANOVA tables, this means that SSA will be the same whether A appears on the first or second line of the ANOVA table.
A balanced and complete factorial design is orthogonal.
A factorial design is complete if observations are made at every possible combination of factor levels (that is, at every treatment).
For example, if factor A has 3 levels, and factor B has 4 levels, then a complete design requires that we observe responses at each of the possible \(3\times4 = 12\) treatments.
A factorial design is balanced if the same number of experimental units is observed at each treatment. In other words, \(n_{ij} = r\) for every treatment, where \(r\) is the (constant) number of replications.
Balance and completeness need to be achieved by design (they will not usually be the case in an observational study or survey).
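Completeness and balance can be checked directly from the cell counts. The following is a minimal sketch in R, assuming a hypothetical design with factor A at 3 levels, factor B at 4 levels and r = 2 replicates per treatment; the names `design`, `A`, `B` and the value of `r` are illustrative and are not taken from the lecture data.

```r
# Hypothetical 3 x 4 factorial design with r = 2 replicates per treatment.
r <- 2
design <- expand.grid(A = factor(paste0("a", 1:3)),
                      B = factor(paste0("b", 1:4)),
                      rep = seq_len(r))

# Cell counts n_ij: rows are levels of A, columns are levels of B.
counts <- table(design$A, design$B)
print(counts)

# Complete: every one of the 3 x 4 = 12 treatments is observed at least once.
is_complete <- all(counts > 0)

# Balanced: the same number of units at every treatment.
is_balanced <- length(unique(as.vector(counts))) == 1

# Dropping a single row keeps the design complete but destroys balance.
counts_minus_one <- table(design$A[-1], design$B[-1])
still_balanced <- length(unique(as.vector(counts_minus_one))) == 1

print(c(complete = is_complete, balanced = is_balanced,
        balanced_after_drop = still_balanced))
```

The same two checks on `table()` output apply to any data frame holding two factor columns.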
In a two-way orthogonal design:
The P-values for each factor in the ANOVA table remain precisely the same irrespective of the order in which the factors are listed.
In considering the importance of factor B it does not matter whether or not we have adjusted for A (and vice versa).
The idea of orthogonality can be extended to the three or more factor models that will be covered in a future lecture.
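The order invariance described for a two-way orthogonal design can be illustrated with simulated data. This is only a sketch: the data, effect sizes and the helper `ss_A()` are assumptions made up for the illustration. It compares the sum of squares for A when A is listed first and when it is listed second, for a balanced design and for the same design with a few rows removed from one cell.

```r
set.seed(1)  # simulated data only; not the lecture's data

# Balanced, complete 2 x 3 design with 5 replicates per treatment.
dat <- expand.grid(A = factor(c("a1", "a2")),
                   B = factor(c("b1", "b2", "b3")),
                   rep = 1:5)
dat$y <- 10 + 2 * (dat$A == "a2") + 1.5 * (dat$B == "b3") + rnorm(nrow(dat))

# Sum of squares for A when A is entered first, and when it is entered last.
ss_A <- function(d) {
  first <- anova(lm(y ~ A + B, data = d))["A", "Sum Sq"]
  last  <- anova(lm(y ~ B + A, data = d))["A", "Sum Sq"]
  c(first = first, last = last)
}

ss_balanced <- ss_A(dat)

# Remove some rows from a single cell so the design is no longer balanced.
unbalanced <- dat[!(dat$A == "a1" & dat$B == "b1" & dat$rep > 2), ]
ss_unbalanced <- ss_A(unbalanced)

print(ss_balanced)    # the two values agree
print(ss_unbalanced)  # the two values differ
```

In the unbalanced case the sum of squares for A depends on whether B has already been included, which is exactly the adjustment issue that orthogonality removes.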
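An experiment was performed to investigate the butterfat content of milk (the response variable, measured as a percentage). Factors:
Breed (five levels: Ayrshire, Canadian, Guernsey, Holstein-Fresian and Jersey);
Age (two levels: 2 years old and Mature).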
10 replicates (cows) observed at each treatment (i.e. combination of breed and maturity).
The design is complete and balanced, and therefore orthogonal.
## Cows <- read.csv(file = "cows.csv", header = TRUE)
Cows
Butterfat Breed Age
1 3.74 Ayrshire Mature
2 4.01 Ayrshire 2year
3 3.77 Ayrshire Mature
4 3.78 Ayrshire 2year
5 4.10 Ayrshire Mature
6 4.06 Ayrshire 2year
7 4.27 Ayrshire Mature
8 3.94 Ayrshire 2year
9 4.11 Ayrshire Mature
10 4.25 Ayrshire 2year
11 4.44 Ayrshire Mature
12 4.37 Ayrshire 2year
13 4.25 Ayrshire Mature
14 3.71 Ayrshire 2year
15 4.08 Ayrshire Mature
16 3.90 Ayrshire 2year
17 4.41 Ayrshire Mature
18 4.11 Ayrshire 2year
19 4.37 Ayrshire Mature
20 3.53 Ayrshire 2year
21 3.92 Canadian Mature
22 4.95 Canadian 2year
23 4.47 Canadian Mature
24 4.28 Canadian 2year
25 4.07 Canadian Mature
26 4.10 Canadian 2year
27 4.38 Canadian Mature
28 3.98 Canadian 2year
29 4.46 Canadian Mature
30 5.05 Canadian 2year
31 4.29 Canadian Mature
32 5.24 Canadian 2year
33 4.43 Canadian Mature
34 4.00 Canadian 2year
35 4.62 Canadian Mature
36 4.29 Canadian 2year
37 4.85 Canadian Mature
38 4.66 Canadian 2year
39 4.40 Canadian Mature
40 4.33 Canadian 2year
41 4.54 Guernsey Mature
42 5.18 Guernsey 2year
43 5.75 Guernsey Mature
44 5.04 Guernsey 2year
45 4.64 Guernsey Mature
46 4.79 Guernsey 2year
47 4.72 Guernsey Mature
48 3.88 Guernsey 2year
49 5.28 Guernsey Mature
50 4.66 Guernsey 2year
51 5.30 Guernsey Mature
52 4.50 Guernsey 2year
53 4.59 Guernsey Mature
54 5.04 Guernsey 2year
55 4.83 Guernsey Mature
56 4.55 Guernsey 2year
57 4.97 Guernsey Mature
58 5.38 Guernsey 2year
59 5.39 Guernsey Mature
60 5.97 Guernsey 2year
61 3.40 Holstein-Fresian Mature
62 3.55 Holstein-Fresian 2year
63 3.83 Holstein-Fresian Mature
64 3.95 Holstein-Fresian 2year
65 4.43 Holstein-Fresian Mature
66 3.70 Holstein-Fresian 2year
67 3.30 Holstein-Fresian Mature
68 3.93 Holstein-Fresian 2year
69 3.58 Holstein-Fresian Mature
70 3.54 Holstein-Fresian 2year
71 3.79 Holstein-Fresian Mature
72 3.66 Holstein-Fresian 2year
73 3.58 Holstein-Fresian Mature
74 3.38 Holstein-Fresian 2year
75 3.71 Holstein-Fresian Mature
76 3.94 Holstein-Fresian 2year
77 3.59 Holstein-Fresian Mature
78 3.55 Holstein-Fresian 2year
79 3.55 Holstein-Fresian Mature
80 3.43 Holstein-Fresian 2year
81 4.80 Jersey Mature
82 6.45 Jersey 2year
83 5.18 Jersey Mature
84 4.49 Jersey 2year
85 5.24 Jersey Mature
86 5.70 Jersey 2year
87 5.41 Jersey Mature
88 4.77 Jersey 2year
89 5.18 Jersey Mature
90 5.23 Jersey 2year
91 5.75 Jersey Mature
92 5.14 Jersey 2year
93 5.25 Jersey Mature
94 4.76 Jersey 2year
95 5.18 Jersey Mature
96 4.22 Jersey 2year
97 5.98 Jersey Mature
98 4.85 Jersey 2year
99 6.55 Jersey Mature
100 5.72 Jersey 2year
Cows.lm.1 <- lm(Butterfat ~ Breed + Age, data = Cows)
anova(Cows.lm.1)
Analysis of Variance Table
Response: Butterfat
Df Sum Sq Mean Sq F value Pr(>F)
Breed 4 34.321 8.5803 50.1150 <2e-16 ***
Age 1 0.274 0.2735 1.5976 0.2094
Residuals 94 16.094 0.1712
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Cows.lm.2 <- lm(Butterfat ~ Age + Breed, data = Cows)
anova(Cows.lm.2)
Analysis of Variance Table
Response: Butterfat
Df Sum Sq Mean Sq F value Pr(>F)
Age 1 0.274 0.2735 1.5976 0.2094
Breed 4 34.321 8.5803 50.1150 <2e-16 ***
Residuals 94 16.094 0.1712
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
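The Mean Sq, F value and Pr(>F) columns in these tables follow arithmetically from the Df and Sum Sq columns. The sketch below redoes that arithmetic using the rounded figures printed above, so the results agree with the tables only up to rounding. The variable names are illustrative.

```r
# Values copied from the ANOVA table above (already rounded in the printout).
# Age has 1 degree of freedom, so its Sum Sq equals its Mean Sq (0.2735).
df <- c(Breed = 4, Age = 1, Residuals = 94)
ss <- c(Breed = 34.321, Age = 0.2735, Residuals = 16.094)

ms <- ss / df                                  # Mean Sq = Sum Sq / Df
f  <- ms[c("Breed", "Age")] / ms["Residuals"]  # F = Mean Sq / residual Mean Sq

# P-value: upper tail of the F distribution with (factor Df, residual Df).
p <- pf(f, df[c("Breed", "Age")], df["Residuals"], lower.tail = FALSE)

print(round(ms, 4))
print(round(f, 2))
print(signif(p, 3))
```

Because the design is orthogonal, the same arithmetic applies whichever factor is listed first.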
summary(Cows.lm.1)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.0077 0.10135464 39.541356 2.234024e-60
BreedCanadian 0.3785 0.13084828 2.892663 4.746491e-03
BreedGuernsey 0.8900 0.13084828 6.801771 9.480645e-10
BreedHolstein-Fresian -0.3905 0.13084828 -2.984372 3.621196e-03
BreedJersey 1.2325 0.13084828 9.419306 3.155879e-15
AgeMature 0.1046 0.08275572 1.263961 2.093694e-01
summary(Cows.lm.2)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.0077 0.10135464 39.541356 2.234024e-60
AgeMature 0.1046 0.08275572 1.263961 2.093694e-01
BreedCanadian 0.3785 0.13084828 2.892663 4.746491e-03
BreedGuernsey 0.8900 0.13084828 6.801771 9.480645e-10
BreedHolstein-Fresian -0.3905 0.13084828 -2.984372 3.621196e-03
BreedJersey 1.2325 0.13084828 9.419306 3.155879e-15
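Two things can be read off the coefficient tables. First, since the model is additive, the fitted mean for any treatment is the intercept plus the relevant breed and age estimates. Second, because Age has a single degree of freedom, the square of its t value should reproduce its F value in the ANOVA tables. The sketch below checks both using the printed estimates; the object names are illustrative.

```r
# Estimates copied from the coefficient table above.
intercept <- 4.0077  # fitted mean for the reference treatment: Ayrshire, 2year
breed <- c(Ayrshire = 0, Canadian = 0.3785, Guernsey = 0.8900,
           "Holstein-Fresian" = -0.3905, Jersey = 1.2325)
age <- c("2year" = 0, Mature = 0.1046)

# Fitted mean butterfat (%) for every treatment:
# intercept + breed effect + age effect.
fitted_means <- intercept + outer(breed, age, "+")
print(round(fitted_means, 4))

# Age has one degree of freedom, so its squared t statistic
# matches the F statistic for Age in the ANOVA tables.
t_age <- 1.263961
print(round(t_age^2, 4))
```

For example, the fitted mean for mature Jerseys is 4.0077 + 1.2325 + 0.1046 = 5.3448.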
This task concerns a complete and balanced experiment into rat weight gain.
Two factors: Amount and Protein, each with two levels.
Ten replicates at each treatment.
The following ANOVA table (with certain elements obscured by #) was obtained using R.
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Amount | 1 | 1299.6 | # | # | 0.026 |
| Protein | 1 | 220.9 | # | # | 0.345 |
| Residuals | 37 | 8933.0 | # | | |
Calculate the obscured values.
What can you conclude (if anything) about the effect of Protein ignoring the effects of Amount?
As we move towards more complex factorial models, with more than two factors and interactions, we need to start using the mathematical formulation of these models, which is more concise.
Remember that the one-way model \[Y_i = \mu + \alpha_2 z_{i2} + \ldots + \alpha_K z_{iK} + \varepsilon_i~~~~~(i=1,2,\ldots,n)\]
can be written as \[\boldsymbol{Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}}\] where
\(Y_{ij}\) is the response of the \(j\)th unit at the \(i\)th level of the factor \((i=1,\ldots,K;~j=1,\ldots,n_i)\);
\(K\) denotes the number of levels, and \(n_i\) the number of observations at level \(i\) of the factor.
Assume treatment constraint \(\alpha_1=0\).
\(\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{Kn_K}\) are random errors satisfying assumptions (A1)–(A4).
A two-way model with factors A and B in multiple regression form \[Y_i = \mu + \alpha_2 z_{Ai2} + \ldots + \alpha_K z_{AiK} + \beta_2 z_{Bi2} + \ldots + \beta_L z_{BiL} + \varepsilon_i~~~~~(i=1,2,\ldots,n)\] becomes: \[\boldsymbol{Y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}}\]
where \(Y_{ijk}\) is the response for the \(k\)th unit at level \(i\) of factor A and level \(j\) of factor B \((i=1,\ldots,K;~j=1,\ldots,L;~k=1,\ldots,n_{ij})\).
\(\alpha_1, \ldots, \alpha_K\) and \(\beta_1, \ldots, \beta_L\) are parameters describing the ‘main effects’ of A and B respectively.
Assume treatment constraints, \(\alpha_1 = 0\) and \(\beta_1 = 0\).
\(\varepsilon_{111}, \varepsilon_{112}, \ldots, \varepsilon_{KLn_{KL}}\) are error terms satisfying assumptions (A1)–(A4).
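The link between the multiple regression form and the treatment constraints can be seen in the design matrix that R builds. The sketch below assumes hypothetical factors A (K = 3 levels) and B (L = 2 levels) and R's default treatment contrasts. The first level of each factor gets no indicator column, which corresponds to \(\alpha_1 = 0\) and \(\beta_1 = 0\).

```r
# Hypothetical factors: A with K = 3 levels and B with L = 2 levels,
# one observation per treatment for brevity.
dat <- expand.grid(A = factor(c("a1", "a2", "a3")),
                   B = factor(c("b1", "b2")))

# Design matrix for the additive model Y = mu + alpha_i + beta_j + error.
# Assumes the default contrasts option (treatment contrasts) is in effect.
X <- model.matrix(~ A + B, data = dat)
print(X)

# Number of parameters: 1 + (K - 1) + (L - 1).
print(ncol(X))
```

Each row of `X` holds the indicator variables for one observation, so the row for treatment (a1, b1) contains only the intercept.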
Comments on the Dairy Cattle Data Analysis
By default, level one of Age (2 years old) and level one of Breed (Ayrshire) are set as the reference levels for the treatment constraint.
The figures in the ANOVA tables for models Cows.lm.1 and Cows.lm.2 are identical, despite the difference in the order in which the factors are considered. This occurs because of the orthogonal design.
There is overwhelming evidence of a breed effect (P-value smaller than \(2 \times 10^{-16}\)) on mean butterfat content.
The Jerseys seem to provide the highest butterfat concentration.
There is no evidence of an age effect.
We should look at model diagnostics.