In the examples that we have considered so far it has made sense to consider the main effects of ALL factors.
However, for nested factors the levels of one are defined within the context of the levels of another.
We shall explore the use of nested factors, and consider the analysis of models containing such factors.
Response = concentration of chemical in sample.
Interested to see whether two labs deliver the same results.
Send off samples to each lab, and get three technicians at each lab to analyse the samples.
There are two factors here – (i) the lab and (ii) the technician.
Makes sense to ask whether both labs give equal readings.
Does not make sense to ask whether technician 1 gives greater readings than technician 2.
That is, the concept of level 1 of the technician factor cannot be applied universally.
Technician number 1 at lab A and the technician number 1 at lab B are entirely different people.
It makes no sense to ask whether technician 1 or 2 is better because they are lab specific.
Notice that this is entirely different to the swimming example where level 1 of ‘goggles’ meant the same thing at every level of every other factor.
In this labs and technicians example the second factor – the technician – is said to be ‘nested within’ the labs.
When such nesting occurs one should not include the main effect of the nested factor, but only the effect of the nested factor ‘within’ the factor nesting it.
A two factor model with factor B nested within factor A is written as \[Y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk}.\]
This model contains only a main effect for A and an interaction term, but no main effect for B.
We have a main effect for comparing labs only.
Note that the factor B effects (the \(\beta\)’s) depend on factor A, hence the notation for the subscript of \(\beta\).
That is, we look at differences between technicians in lab A separately from differences between technicians in lab B.
Nesting of factors is the one case where you do not need main effects corresponding to all terms in an interaction.
\[Y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk}\]
As per usual, the parameters will be constrained.
Using the treatment constraint, \(\alpha_i\) is the effect of level i of A relative to the reference group (level 1).
\(\beta_{j(i)}\) is the effect of level j of B within level i of A, relative to level 1 of B within level i of A.
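For example, suppose the technicians at lab A are Bob, Bill and Bea, and those at lab B are Max, Mel and Meg, with Bob as level 1 in lab A and Max as level 1 in lab B. Then the \(\beta\)’s reflect how Bill and Bea compare to Bob, and how Mel and Meg compare to Max.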
The notation `A/B` on the right hand side of an R formula represents the factor B nested within A.
The notation `A/B` can equivalently be written as `A + A:B`.
Going back to the labs and technicians example, we would have factor A as the labs and B as the technicians.
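The equivalence of the two notations can be checked without any data by asking R how it expands each formula. The sketch below uses the placeholder names `y`, `A` and `B`, which are purely illustrative and do not refer to a data set in this lecture.

```r
# Placeholder variable names (y, A, B); no data are needed to inspect a formula.
f_nested   <- y ~ A/B
f_expanded <- y ~ A + A:B

# terms() expands the formula; "term.labels" lists the terms R will fit.
labels_nested   <- attr(terms(f_nested), "term.labels")
labels_expanded <- attr(terms(f_expanded), "term.labels")

print(labels_nested)
print(identical(labels_nested, labels_expanded))
```

Both formulae should produce the same list of terms, with no stand-alone term for `B`.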
A Canadian botanist was interested in the abundance of pine pollen in cores taken from the bottom of several bogs in northern Alberta. The botanist sampled pollen at three depths: `shallow`, `medium`, and `deep` (corresponding to 0.5, 2 and 3 metre depths respectively). She took two samples of peat at each of these depths, and prepared 2 slides from each of the six samples for microscopic examination. The number of pollen grains from the microscope slides is the response.
In this example the factor `Sample` is nested within the factor `Depth`.
To understand why, note that sample A at shallow depth has no connection with sample A at medium depth.
## Pollen <- read.csv(file = "pollen.csv", header = TRUE)
Pollen
Depth Sample Count
1 shallow A 12
2 shallow A 14
3 shallow B 10
4 shallow B 7
5 medium A 16
6 medium A 12
7 medium B 10
8 medium B 19
9 deep A 21
10 deep A 29
11 deep B 33
12 deep B 30
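If `pollen.csv` is not to hand, the data can be typed in directly from the listing above. The following is a minimal sketch under that assumption; the names `slides` and `SampleID` are introduced here for illustration only. It also shows the nesting structure: the labels A and B are reused at each depth, but they identify six distinct peat samples.

```r
# Data typed in from the printed listing (assumed to match pollen.csv).
Pollen <- data.frame(
  Depth  = factor(rep(c("shallow", "medium", "deep"), each = 4)),
  Sample = factor(rep(rep(c("A", "B"), each = 2), times = 3)),
  Count  = c(12, 14, 10, 7, 16, 12, 10, 19, 21, 29, 33, 30)
)

# Number of slides for each Depth-by-Sample combination.
slides <- xtabs(~ Depth + Sample, data = Pollen)
print(slides)

# A unique identifier for each physical sample (illustrative name).
Pollen$SampleID <- interaction(Pollen$Depth, Pollen$Sample)
print(nlevels(Pollen$SampleID))
```

The cross-tabulation should show two slides in every cell, and the identifier should have one level per physical sample.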
Pollen.lm <- lm(Count ~ Depth/Sample, data = Pollen)
anova(Pollen.lm)
Analysis of Variance Table
Response: Count
Df Sum Sq Mean Sq F value Pr(>F)
Depth 2 686.00 343.00 22.4918 0.00163 **
Depth:Sample 3 62.75 20.92 1.3716 0.33847
Residuals 6 91.50 15.25
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(Pollen.lm)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 25.0 2.761340 9.0535746 0.0001018064
Depthmedium -11.0 3.905125 -2.8168114 0.0304817298
Depthshallow -12.0 3.905125 -3.0728851 0.0218611521
Depthdeep:SampleB 6.5 3.905125 1.6644794 0.1470722956
Depthmedium:SampleB 0.5 3.905125 0.1280369 0.9023034428
Depthshallow:SampleB -4.5 3.905125 -1.1523319 0.2930201810
Based on the parameter estimates from the model fitted in this example, compute the fitted values in the following cases:
An observation from Sample A taken at medium depth.
An observation from Sample B taken at deep depth.
An observation from Sample B taken at shallow depth.
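Hand calculations for the three cases can be checked with `predict()`. This is a sketch that assumes the data are typed in from the listing earlier in this example rather than read from `pollen.csv`; `new_cases` and `fitted_vals` are illustrative names.

```r
# Data typed in from the printed listing (assumed to match pollen.csv).
Pollen <- data.frame(
  Depth  = factor(rep(c("shallow", "medium", "deep"), each = 4)),
  Sample = factor(rep(rep(c("A", "B"), each = 2), times = 3)),
  Count  = c(12, 14, 10, 7, 16, 12, 10, 19, 21, 29, 33, 30)
)
Pollen.lm <- lm(Count ~ Depth/Sample, data = Pollen)

# The three cases from the exercise, using the same factor levels as the data.
new_cases <- data.frame(
  Depth  = factor(c("medium", "deep", "shallow"), levels = levels(Pollen$Depth)),
  Sample = factor(c("A", "B", "B"), levels = levels(Pollen$Sample))
)

# medium, A : intercept + Depthmedium
# deep, B   : intercept + Depthdeep:SampleB
# shallow, B: intercept + Depthshallow + Depthshallow:SampleB
fitted_vals <- predict(Pollen.lm, newdata = new_cases)
print(cbind(new_cases, Fitted = fitted_vals))
```

Each fitted value is the intercept plus whichever coefficients apply to that combination of depth and sample, as indicated in the comments.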
Comments
The levels of `Depth` are ordered `deep`, `medium`, and `shallow` by (alphabetical) default, so `deep` is the reference level for the treatment constraint.
There is clear evidence that pollen varies with depth (\(P=0.0016\) from the ANOVA table).
There is no evidence of an effect of `Sample` within `Depth` (\(P=0.34\)). In other words, there do not seem to be systematic differences between the two samples taken at each depth.
The table of parameter estimates indicates (from the fixed effects) that pollen count is highest at the deep level.
The interaction terms estimate the effect of Sample B in comparison to Sample A (baseline) at each depth.
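This interpretation can be checked against the cell means. The sketch below again assumes the data are typed in from the listing earlier in this example; `cell_means` and `by_hand` are illustrative names. Under the treatment constraint the intercept is the mean for Sample A at the reference depth, each `Depth` coefficient compares Sample A at that depth with Sample A at the reference depth, and each `Depth:SampleB` coefficient is the B minus A difference within that depth.

```r
# Data typed in from the printed listing (assumed to match pollen.csv).
Pollen <- data.frame(
  Depth  = factor(rep(c("shallow", "medium", "deep"), each = 4)),
  Sample = factor(rep(rep(c("A", "B"), each = 2), times = 3)),
  Count  = c(12, 14, 10, 7, 16, 12, 10, 19, 21, 29, 33, 30)
)
Pollen.lm <- lm(Count ~ Depth/Sample, data = Pollen)
est <- coef(Pollen.lm)

# Mean count for each Depth-by-Sample cell (rows: Depth, columns: Sample).
cell_means <- tapply(Pollen$Count, list(Pollen$Depth, Pollen$Sample), mean)

# Rebuild each coefficient from the cell means.
by_hand <- c(
  "(Intercept)"           = cell_means["deep", "A"],
  "Depthmedium"           = cell_means["medium", "A"]  - cell_means["deep", "A"],
  "Depthshallow"          = cell_means["shallow", "A"] - cell_means["deep", "A"],
  "Depthdeep:SampleB"     = cell_means["deep", "B"]    - cell_means["deep", "A"],
  "Depthmedium:SampleB"   = cell_means["medium", "B"]  - cell_means["medium", "A"],
  "Depthshallow:SampleB"  = cell_means["shallow", "B"] - cell_means["shallow", "A"]
)

# Side-by-side comparison of lm() estimates and hand calculations.
print(cbind(lm = est, by_hand = by_hand[names(est)]))
```

The two columns should agree, because the model has one parameter per cell and so reproduces the six cell means exactly.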