Lecture 19 Linear Models with One Factor


So far we have focused on regression models, where a continuous random response variable is modelled as a function of one or more numerical explanatory variables.

Another common situation is where the explanatory variables are categorical variables, or what we will often call factors.

In this lecture we will begin to look at such models, and will focus in particular on one-way models.

19.1 The One-Way Model

The one-way (or one factor) model is used when a continuous numerical response Y is dependent on a single factor (categorical explanatory variable).

In such models, the categories defined by a factor are called the levels of the factor.
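For example (a small sketch with made-up data), a categorical variable is stored in R as a factor, and its categories are its levels:

dose <- factor(c("Zero", "Low", "High", "Zero", "Low"))
levels(dose)
[1] "High" "Low"  "Zero"

Note that R orders the levels alphabetically by default; we will see later how to change this.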

19.2 Stimulating Effects of Caffeine

Question: does caffeine stimulation affect the rate at which individuals can tap their fingers?

Thirty male students were randomly allocated to three treatment groups of 10 students each. The groups were treated as follows:

Group 1: zero caffeine dose

Group 2: low caffeine dose

Group 3: high caffeine dose

Allocation to treatment groups was blind (subjects did not know their caffeine dosage).

Two hours after treatment, each subject tapped their fingers as quickly as possible for one minute. The number of taps was recorded.

The number of taps is the response variable, and caffeine dose is the explanatory variable. Here caffeine dose is a factor with 3 levels: zero, low and high. Does the apparent trend in the boxplot below provide convincing evidence that caffeine affects tapping rate?

Download caffeine.csv

Caffeine <- read.csv(file = "caffeine.csv", header = TRUE)
boxplot(Taps ~ Dose, data = Caffeine)

[Figure: side-by-side boxplots of Taps for each Dose group]

19.3 Mathematical Formulation of the One-Way Model

It is common to express the one-way model as

\[y_{ij} = \mu_j + \varepsilon_{ij}~~~~~~~~(j=1,\ldots,K,~~~~~i=1,\ldots,n_j)\]

where: \(y_{ij}\) is the response of the \(i\)th unit at the \(j\)th level of the factor; \(K\) denotes the number of levels; \(\mu_j\) is the (unknown) population mean for level \(j\); \(n_j\) is the number of observations (replications) at level \(j\) of the factor; and the values \(\varepsilon_{11},\ldots,\varepsilon_{Kn_K}\) are random errors satisfying assumptions (A1)-(A4).

The null hypothesis is then expressed as

\[H_0: ~~~ \mu_1 = \mu_2 = ... =\mu_K \] and the alternative hypothesis is

\[H_1: ~~~\mbox{ means not all equal.}\]

Another way of writing this is \[y_{ij} = \mu + \alpha_j + \varepsilon_{ij}~~~~~~~~(j=1,\ldots,K,~~~~~i=1,\ldots,n_j)\]

where \(\alpha_j = \mu_j - \mu\) (the difference between the level-\(j\) mean and \(\mu\)) for \(j=1,\ldots,K\).

The mean response is \[E[Y_{ij}] = \mu + \alpha_j\]

19.4 Parameterisation of the One-Way Model

However, as it stands, the model appears to be overparameterised.

That is, we have \(K\) means but we are trying to estimate \(K+1\) parameters \(\mu,\alpha_1,\ldots,\alpha_K\).

We cannot estimate \(K+1\) numbers from \(K\) pieces of information. For example, with \(K=3\) levels there are 3 level means but 4 parameters \(\mu, \alpha_1, \alpha_2, \alpha_3\).

19.5 Solution to Overparameterisation

We impose a constraint on the parameters \(\alpha_1, \alpha_2, \ldots, \alpha_K\).

The two most popular (and easily interpretable) constraints are:

  • Sum constraint: \(\sum_{j=1}^K \alpha_j = 0\).
    In this case \(\mu\) can be interpreted as a kind of “grand mean”, and \(\alpha_1, \alpha_2, \ldots, \alpha_K\) measure deviations from this grand mean.

  • Treatment constraint: \(\alpha_1 = 0\).
    In this case level 1 of the factor is regarded as the baseline or reference level, and \(\alpha_2, \alpha_3 \ldots, \alpha_K\) measure deviations from this baseline. This is highly appropriate if level 1 corresponds to a control group, for example.

We will work with the treatment constraint. It is also used by default for factors in R.
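We can check this default in R (a quick sketch, not specific to any dataset):

options("contrasts")
$contrasts
        unordered           ordered 
"contr.treatment"      "contr.poly" 

contr.treatment(3)
  2 3
1 0 0
2 1 0
3 0 1

The contr.treatment() coding has no column for level 1, which is exactly the treatment constraint \(\alpha_1 = 0\).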

19.6 Expressing Factor Models as Regression Models

We can express a one-way factor model as a regression model by using dummy (indicator) variables.

Define dummy variables \(z_{11}, z_{12}, \ldots, z_{nK}\) by \[z_{ij} = \left \{ \begin{array}{ll} 1 & \mbox{if unit } i \mbox{ is observed at factor level } j\\ 0 & \mbox{otherwise.} \end{array} \right .\]

19.7 Regression form of one-way factor model

A one-way factor model can be expressed as follows: \[y_i = \mu + \alpha_1 z_{i1} + \alpha_2 z_{i2} + \ldots + \alpha_K z_{iK} + \varepsilon_i~~~~~(i=1,2,\ldots,n)\]

This has the form of a multiple linear regression model.

The parameter \(\mu\) is the regression intercept and, under the treatment constraint (\(\alpha_1 = 0\)), it is the mean response at the baseline (factor level 1).

19.8 Modelling the Caffeine Data

Our one-way factor model, where the factor has three levels, can be expressed as follows: \[y_i = \mu + \alpha_2 z_{i2} + \alpha_{3} z_{i3} + \varepsilon_i\]

where: \(y_i\) is the number of taps recorded for the \(i\)th subject; \(z_{i2} = 1\) if subject \(i\) is on the low dose and \(z_{i2} = 0\) otherwise; and \(z_{i3} = 1\) if subject \(i\) is on the high dose and \(z_{i3} = 0\) otherwise.

Then \(\mu\) is the mean response for a subject on the zero dose; \(\alpha_2\) is the effect of the low dose in contrast to the zero dose; and \(\alpha_3\) is the effect of the high dose in contrast to the zero dose.
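Equivalently, the mean response in each group is \[E[Y_i] = \left \{ \begin{array}{ll} \mu & \mbox{zero dose}\\ \mu + \alpha_2 & \mbox{low dose}\\ \mu + \alpha_3 & \mbox{high dose.} \end{array} \right .\]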

19.9 Expressing Factorial Models using Matrices

Since factorial models can be expressed as linear regression models, they may be described using matrix notation as we saw earlier.

For a one-way factorial model with treatment constraint: \[{\boldsymbol{y}} = X {\boldsymbol{\beta}} + \varepsilon\] where: \[{\boldsymbol{y}} = \left [ \begin{array}{c} Y_1\\ Y_2\\ \vdots\\ Y_n \end{array} \right ] ~~~~X = \left [ \begin{array}{cccc} 1 & z_{12} & \ldots & z_{1K}\\ 1 & z_{22} & \ldots & z_{2K}\\ \vdots & \vdots & \ddots & \vdots\\ 1 & z_{n2} & \ldots & z_{nK} \end{array} \right ] ~~~ {\boldsymbol{\beta}} = \left [ \begin{array}{c} \mu\\ \alpha_2\\ \vdots\\ \alpha_K\\ \end{array} \right ] ~~~ \boldsymbol{\varepsilon} = \left [ \begin{array}{c} \varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n\\ \end{array} \right ]\]

There is no \(\alpha_1\) in the parameter vector, and no column for \(z_{i1}\) in the design matrix, because of the treatment constraint.

The following shows the design matrix for the one-way model for the caffeine data, with the zero dose as baseline: the first 10 rows correspond to the zero-dose subjects, the next 10 to the low-dose subjects, and the last 10 to the high-dose subjects.

(Intercept) DoseLow DoseHigh
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 1 0
1 1 0
1 1 0
1 1 0
1 1 0
1 1 0
1 1 0
1 1 0
1 1 0
1 1 0
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
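In R, this design matrix can be obtained with the model.matrix() function; a sketch (assuming the Dose levels have been reordered with Zero as baseline, as done in Section 19.11):

model.matrix(~ Dose, data = Caffeine)

The same matrix can also be extracted from a fitted model, e.g. model.matrix(Caffeine.lm).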

19.10 Factor Models as Regression Models

Since factor models can be regarded as regression models, all ideas about parameter estimation, fitted values etc. follow in the natural manner.

Parameter estimation can be done by the method of least squares, giving the vector of estimates \(\hat {\boldsymbol{\beta}} = (\hat \mu, \hat \alpha_2, \ldots, \hat \alpha_K)^T\).

Fitted values and residuals are defined in the usual way.
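In matrix notation these take exactly the same form as for regression: \[\hat{\boldsymbol{\beta}} = (X^T X)^{-1} X^T {\boldsymbol{y}}, ~~~~~~ \hat{{\boldsymbol{y}}} = X \hat{\boldsymbol{\beta}}, ~~~~~~ \hat{\boldsymbol{\varepsilon}} = {\boldsymbol{y}} - \hat{{\boldsymbol{y}}}.\]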

19.11 Back to the Caffeine Data

19.11.1 R Code

The factor levels of Dose have a natural ordering. In particular, it is intuitive to set the zero level as baseline (factor level 1).

Caffeine <- read.csv(file = "caffeine.csv", header = TRUE)
str(Caffeine)
'data.frame':   30 obs. of  2 variables:
 $ Taps: int  242 245 244 248 247 248 242 244 246 242 ...
 $ Dose: chr  "Zero" "Zero" "Zero" "Zero" ...
Caffeine
   Taps Dose
1   242 Zero
2   245 Zero
3   244 Zero
4   248 Zero
5   247 Zero
6   248 Zero
7   242 Zero
8   244 Zero
9   246 Zero
10  242 Zero
11  248  Low
12  246  Low
13  245  Low
14  247  Low
15  248  Low
16  250  Low
17  247  Low
18  246  Low
19  243  Low
20  244  Low
21  246 High
22  248 High
23  250 High
24  252 High
25  248 High
26  250 High
27  246 High
28  248 High
29  245 High
30  250 High

We fix up the order of the factor levels (the default ordering is alphabetical). Note that read.csv() imported Dose as a character vector, so levels() initially returns NULL:

levels(Caffeine$Dose)
NULL
Caffeine$Dose <- factor(Caffeine$Dose, levels = c("Zero", "Low", "High"))
levels(Caffeine$Dose)
[1] "Zero" "Low"  "High"

The syntax using the factor() command (with specified levels) reorders the factor levels so that zero, low and high dose are interpreted as levels 1, 2 and 3 respectively.
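An alternative is relevel(), though note (a sketch) that it only moves the chosen baseline to the front, leaving the remaining levels in their existing alphabetical order:

dose.alt <- relevel(factor(c("Zero", "Low", "High")), ref = "Zero")
levels(dose.alt)
[1] "Zero" "High" "Low"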

19.11.2 Fitting the Model

We can use the lm() function in R to fit the model in the same manner as for regression models.

Caffeine.lm <- lm(Taps ~ Dose, data = Caffeine)
summary(Caffeine.lm)

Call:
lm(formula = Taps ~ Dose, data = Caffeine)

Residuals:
   Min     1Q Median     3Q    Max 
-3.400 -2.075 -0.300  1.675  3.700 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 244.8000     0.7047 347.359  < 2e-16 ***
DoseLow       1.6000     0.9967   1.605  0.12005    
DoseHigh      3.5000     0.9967   3.512  0.00158 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.229 on 27 degrees of freedom
Multiple R-squared:  0.3141,    Adjusted R-squared:  0.2633 
F-statistic: 6.181 on 2 and 27 DF,  p-value: 0.006163

Using the notation introduced earlier, the parameter estimates are \(\hat \mu = 244.8\), \(\hat \alpha_2 = 1.6\) and \(\hat \alpha_3 = 3.5\).
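As a quick check, these estimates match the sample means of the three dose groups: \(\hat \mu\) is the zero-dose mean, \(\hat \mu + \hat \alpha_2\) the low-dose mean, and \(\hat \mu + \hat \alpha_3\) the high-dose mean.

with(Caffeine, tapply(Taps, Dose, mean))
 Zero   Low  High 
244.8 246.4 248.3 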

19.12 Fitted Values

Dose (level)    Mean response          Fitted value
Zero (1)        \(\mu\)                244.8
Low (2)         \(\mu + \alpha_2\)     246.4
High (3)        \(\mu + \alpha_3\)     248.3

19.13 Hypothesis tests

There are three ways we can think of the hypothesis test.

The first formulation of the model (with \(H_0: ~~~ \mu_1 = \mu_2 = ... =\mu_K\)) can be tested by looking at the \(F\) test at the bottom of the summary(lm) output, or by applying the anova() command to the fitted model.

anova(Caffeine.lm)
Analysis of Variance Table

Response: Taps
          Df Sum Sq Mean Sq F value   Pr(>F)   
Dose       2   61.4 30.7000  6.1812 0.006163 **
Residuals 27  134.1  4.9667                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

On the other hand, since a factor model (under the treatment constraint) can be written as a multiple linear regression model using dummy variables:

\[Y_i = \mu + \alpha_2 z_{i2} + \ldots + \alpha_K z_{iK} + \varepsilon_i \quad(i=1,2,\ldots,n)\]

we can regard the F test as testing the hypothesis that all the slopes are zero, \(H_0: \alpha_2 = \cdots = \alpha_K = 0\), which gives the same omnibus F test.

z2 <- Caffeine$Dose == "Low"
z3 <- Caffeine$Dose == "High"
Caffeine.lm2 <- lm(Caffeine$Taps ~ z2 + z3)
summary(Caffeine.lm2)

Call:
lm(formula = Caffeine$Taps ~ z2 + z3)

Residuals:
   Min     1Q Median     3Q    Max 
-3.400 -2.075 -0.300  1.675  3.700 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 244.8000     0.7047 347.359  < 2e-16 ***
z2TRUE        1.6000     0.9967   1.605  0.12005    
z3TRUE        3.5000     0.9967   3.512  0.00158 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.229 on 27 degrees of freedom
Multiple R-squared:  0.3141,    Adjusted R-squared:  0.2633 
F-statistic: 6.181 on 2 and 27 DF,  p-value: 0.006163

Finally, we can consider the problem as a comparison of two nested models

\[M_0:~~~~ Y_i = \mu + \varepsilon_i \quad (i=1,2,\ldots,n)\] and \[M_1:~~~~ Y_i = \mu + \alpha_2 z_{i2} + \ldots + \alpha_K z_{iK} + \varepsilon_i \quad (i=1,2,\ldots,n)\]

and compare them directly with anova():

Caffeine.M0 <- lm(Caffeine$Taps ~ 1)
Caffeine.M1 <- lm(Caffeine$Taps ~ z2 + z3)
anova(Caffeine.M0, Caffeine.M1)
Analysis of Variance Table

Model 1: Caffeine$Taps ~ 1
Model 2: Caffeine$Taps ~ z2 + z3
  Res.Df   RSS Df Sum of Sq      F   Pr(>F)   
1     29 195.5                                
2     27 134.1  2      61.4 6.1812 0.006163 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Important:

  • In a multiple linear regression model it makes sense to add or remove a single explanatory variable.
  • In a factor model it (usually) only makes sense to add or remove a whole factor, not single levels of a factor.
  • This will usually require F tests (not just t-tests for single parameters); see the drop1() sketch below.
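For a fitted one-way model, the whole-factor F test can be obtained directly with drop1(); a brief sketch:

drop1(Caffeine.lm, test = "F")

This reports the F statistic (6.1812) and P-value (0.006163) for removing the Dose factor as a whole, matching the ANOVA table above.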

19.13.0.1 Conclusions from the ANOVA table

The observed F-statistic for testing the significance of caffeine dose is \(F = 6.181\) on \(K-1 = 2\) and \(n-K = 27\) degrees of freedom.

The corresponding P-value is \(P = 0.0062\); we therefore have very strong evidence against \(H_0\); i.e. finger tapping speed does seem to be related to caffeine dose.

The table of coefficients in the R output shows that the mean response difference between the high dose and control groups (\(\hat \alpha_3 = 3.5\)) is a little more than twice as big as the difference between the low dose and control groups (\(\hat \alpha_2 = 1.6\)).

19.13.1 Interpretation of the anova output

Use the anova output produced by R to answer the following questions on the caffeine data.

  1. What is the residual sum of squares for the full model M1?

  2. What is the mean sum of squares for the model M1?

  3. What is the residual sum of squares for the null model M0, given below? \[M0: Y_i = \mu + \varepsilon_i~~~~~~~(i=1,2,\ldots,30)\]

  4. What is the mean sum of squares for the model M0?