Download the R markdown file for this lecture.

We have now considered two types of testing problems in multiple linear regression:

  1. Testing whether the response is related to at least one explanatory variable;
  2. Testing the effect of a single explanatory variable after adjusting for the other variables.

In this lecture, we investigate the importance of a group of covariates simultaneously.

Nested Models

A linear model M0 is said to be nested within another model M1 if M0 can be recovered as a special case of M1 by fixing some of the parameters of M1 at particular values (usually zero).

For the paramo data, define model M1 by

\[E[\mbox{N}] = \beta_0 + \beta_1 \mbox{AR} + \beta_2 \mbox{EL} + \beta_3 \mbox{DEc} + \beta_4 \mbox{DNI}\]

Then the model M0 defined by

\[E[\mbox{N}] = \beta_0 + \beta_1 \mbox{AR} + \beta_3 \mbox{DEc}\] is nested within M1 because we can obtain M0 by setting \(\beta_2 = \beta_4 = 0\) in M1.

F Tests for Nested Models

Selecting between nested models is equivalent to testing hypotheses about the parameters of M1 (the more complex model).

Comparison of the models can be achieved by testing

H0: \(\beta_j = 0\) for all \(j \in J\) (i.e. M0 adequate); versus

H1: \(\beta_j \ne 0\) for at least one \(j \in J\) (i.e. M1 better)

where \(J\) indexes the \(p-q\) variables that appear in M1 but not in M0 (so M1 has \(p\) covariates and M0 has \(q\)).

The F-statistic to test these hypotheses is

\[F = \frac{[RSS_{M0} - RSS_{M1}]/(p-q)}{RSS_{M1}/(n-p-1)}\]

As before, large values of F provide evidence against H0 (and hence evidence that we should prefer model M1 to M0).

If H0 is correct then F has an F distribution with \(p-q\) and \(n-p-1\) degrees of freedom.

Hence if \(f_{obs}\) is the observed value of the test statistic, and X is a random variable with an \(F_{p-q,\,n-p-1}\) distribution, then the P-value is given by \[P = P(X \ge f_{obs})\]
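In R, this tail probability is computed with pf(). A minimal sketch, where the numbers are placeholders rather than values from any fitted model:

```r
# Hypothetical nested-model F test: the values below are assumed for
# illustration only, not results from the lecture's data.
fobs <- 4.2   # observed F statistic (assumed)
df1  <- 2     # p - q: number of extra covariates in M1 (assumed)
df2  <- 9     # n - p - 1: residual degrees of freedom for M1 (assumed)

# Upper-tail probability P(X >= fobs) for X ~ F(df1, df2)
pf(fobs, df1, df2, lower.tail = FALSE)
```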

Interpretation of Test Results

Model Comparison for the Paramo Data

Download paramo.csv

Paramo <- read.csv(file = "https://r-resources.massey.ac.nz/161221/data/paramo.csv", 
    header = TRUE, row.names = 1)
Paramo.lm0 <- lm(N ~ AR + DEc, data = Paramo)
anova(Paramo.lm0)
Analysis of Variance Table

Response: N
          Df Sum Sq Mean Sq F value   Pr(>F)   
AR         1 508.92  508.92  12.937 0.004193 **
DEc        1 557.23  557.23  14.165 0.003134 **
Residuals 11 432.71   39.34                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Paramo.lm1 <- lm(N ~ AR + EL + DEc + DNI, data = Paramo)
anova(Paramo.lm1)
Analysis of Variance Table

Response: N
          Df Sum Sq Mean Sq F value   Pr(>F)   
AR         1 508.92  508.92 11.3208 0.008328 **
EL         1  45.90   45.90  1.0211 0.338661   
DEc        1 537.39  537.39 11.9541 0.007189 **
DNI        1   2.06    2.06  0.0457 0.835412   
Residuals  9 404.59   44.95                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model Comparison Using R

anova(Paramo.lm0, Paramo.lm1)
Analysis of Variance Table

Model 1: N ~ AR + DEc
Model 2: N ~ AR + EL + DEc + DNI
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     11 432.71                           
2      9 404.59  2     28.12 0.3128 0.7391
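As a check, the F statistic and P-value in this table can be reproduced by hand from the two residual sums of squares:

```r
# Reproduce the F test from the anova() comparison above, using the
# residual sums of squares printed in the table.
RSS0 <- 432.71   # RSS for Paramo.lm0 (q = 2 covariates)
RSS1 <- 404.59   # RSS for Paramo.lm1 (p = 4 covariates)
df1  <- 2        # p - q: EL and DNI are dropped
df2  <- 9        # n - p - 1 = 14 - 4 - 1

Fobs <- ((RSS0 - RSS1) / df1) / (RSS1 / df2)
Fobs                                    # 0.3128
pf(Fobs, df1, df2, lower.tail = FALSE)  # 0.7391
```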

Connection with the Omnibus F Test
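The omnibus F test is the special case of the nested-model test in which M0 is the intercept-only model, so every slope is set to zero under H0. A sketch of this check in R (assuming the Paramo data frame has been loaded as above):

```r
# The omnibus F test as a nested-model comparison: intercept-only
# model versus the full four-covariate model.
Paramo <- read.csv(file = "https://r-resources.massey.ac.nz/161221/data/paramo.csv",
    header = TRUE, row.names = 1)
Paramo.lm.null <- lm(N ~ 1, data = Paramo)
Paramo.lm1 <- lm(N ~ AR + EL + DEc + DNI, data = Paramo)

# The F statistic from this comparison...
anova(Paramo.lm.null, Paramo.lm1)

# ...matches the omnibus F statistic reported by summary()
summary(Paramo.lm1)$fstatistic
```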

Connection with T Tests for Single Variables

More Testing for the Paramo Data

Paramo.lm2 <- lm(N ~ AR, data = Paramo)
Paramo.lm3 <- lm(N ~ AR + EL, data = Paramo)
anova(Paramo.lm2, Paramo.lm3)
Analysis of Variance Table

Model 1: N ~ AR
Model 2: N ~ AR + EL
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     12 989.94                           
2     11 944.04  1    45.901 0.5348 0.4799
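When only one variable is added, the nested-model F test is equivalent to the t test for that variable: the F statistic equals the square of the t statistic, and the two P-values agree. A quick check for EL (refitting the models above):

```r
# For a single added variable, F = t^2 and the P-values coincide.
Paramo <- read.csv(file = "https://r-resources.massey.ac.nz/161221/data/paramo.csv",
    header = TRUE, row.names = 1)
Paramo.lm3 <- lm(N ~ AR + EL, data = Paramo)

tval <- summary(Paramo.lm3)$coefficients["EL", "t value"]
tval^2                                             # 0.5348, the F value above
summary(Paramo.lm3)$coefficients["EL", "Pr(>|t|)"] # 0.4799, the P-value above
```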