Download the R markdown file for this lecture.

We saw previously that when linear models are fitted to time series data, the residuals can display autocorrelation.

When this occurs, it suggests that the model errors are not independent.

In this lecture we shall examine:

Consequences of Ignoring Autocorrelation in the Errors

Usually we fit a linear model to data by the method of least squares.

Under standard assumptions (A1)-(A4), the least squares estimators are unbiased for the regression parameters. In other words, \(E[\hat \beta_i] = \beta_i\) for i=1,…,p.

This unbiasedness property holds even if the error terms are autocorrelated.

However, standard errors given by standard least squares theory (and hence by standard R output) will usually be smaller than the true standard deviations of the parameter estimates.

The result is that t-test statistics will be inflated, and confidence intervals will be too narrow.

Accounting for Autocorrelation in the Errors

The previous discussion indicates that we need to account for autocorrelation in the errors when it is present.

This will require two steps:

We will turn first to modelling for autocorrelated errors using autoregressive models.

Autoregressive Models

In a time series, the errors \(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n\) can be thought of as a realization of a random process (in time).

We can model serial correlation between these errors by making each one explicitly dependent on the previous one: \[\varepsilon_t = \phi \varepsilon_{t-1} + \eta_t\] where \(\eta_t~(t=1,2,\ldots,n)\) is white noise, that is, a sequence of independent and identically distributed (normal) random variables with zero mean.

ACF Plots for AR1 Processes

To illustrate an ACF plot for a function demonstrating autocorrelation, we create two new variables from white noise

Z1 = rnorm(1000)
X1 = Z1[-1] + 0.85 * Z1[-1000]
X2 = Z1[-1] - 0.85 * Z1[-1000]
par(mfrow = c(2, 2))
plot(X1, type = "l")
acf(X1)
plot(X2, type = "l")
acf(X2)

The X1 series has correlation of 0.51 at lag1, while the X2 series has autocorrelation of -0.47 at lag1

Autoregressive Models for Errors in Linear Modelling

If residuals in a linear model for time series data display autocorrelation then we change the model to allow for an AR1 correlation structure for the residuals in our model to account for this.

N.B. More complex models are also possible. For example:

Application to the Tourism Data

Download Motel.csv

## Tourism <- read.csv(file = "Motel.csv", header = TRUE)
Tourism$Yr = Tourism$Year - 1979 + Tourism$Month/12
Tourism$Month <- factor(Tourism$Month)
Tourism.lm <- lm(RoomNights ~ Yr + Month, data = Tourism)
acf(resid(Tourism.lm))

There is a suggestion of exponential decay in autocorrelations. - AR1 model (with positive \(\phi\)) for errors seems reasonable.

The model incorporating the AR1 error process, and written in regression format, is as follows:

\[\begin{aligned} Y_{t} &=& \beta_0 + \beta_1 \mbox{Yr} + \beta_2 z_{t,2} + \beta_3 z_{t,3} + \ldots + \beta_{12} z_{t,12} + \varepsilon_t \\ \varepsilon_t &=& \phi \varepsilon_{t-1} + \eta_{t}\end{aligned}\]

where:

Model Fitting with Autoregressive Errors

To fit a model with AR1 errors, we must:

The AR1 parameter, \(\phi\), can be estimated by the method of (restricted) maximum likelihood.

The details are beyond the scope of this paper, but you will meet maximum likelihood estimation if you take 161.200: Statistical Modelling.

Once \(\phi\) is estimated, we can account for the AR1 nature of the errors by fitting the linear model using generalized least squares.

Generalized Least Squares in R

The mathematical details of generalized least squares will not be covered in this paper.

In R, linear models can be fitted using this technique using the gls() command found in the nlme add-on package. Call this package using library(nlme) before using the gls() command.

Generalized Least Squares and the Tourism Data

Download Motel.csv

## Tourism <- read.csv(file = "Motel.csv", header = TRUE)
Tourism$Yr = Tourism$Year - 1979 + Tourism$Month/12
Tourism$Month <- factor(Tourism$Month)
Tourism.gls <- gls(RoomNights ~ Yr + Month, correlation = corAR1(), data = Tourism)
anova(Tourism.gls)
Denom. DF: 167 
            numDF  F-value p-value
(Intercept)     1 45239.15  <.0001
Yr              1  1092.85  <.0001
Month          11    81.94  <.0001
summary(Tourism.gls)
Generalized least squares fit by REML
  Model: RoomNights ~ Yr + Month 
  Data: Tourism 
       AIC      BIC    logLik
  3722.941 3769.711 -1846.471

Correlation Structure: AR(1)
 Formula: ~1 
 Parameter estimate(s):
     Phi 
0.392721 

Coefficients:
                Value Std.Error   t-value p-value
(Intercept) 261682.96  4946.534  52.90229  0.0000
Yr           12747.00   382.435  33.33115  0.0000
Month2      -31194.09  4205.044  -7.41825  0.0000
Month3       23482.73  4959.531   4.73487  0.0000
Month4       -2170.47  5225.494  -0.41536  0.6784
Month5      -19419.12  5325.710  -3.64630  0.0004
Month6      -65827.11  5362.987 -12.27434  0.0000
Month7      -44522.34  5373.207  -8.28599  0.0000
Month8      -29768.74  5365.559  -5.54812  0.0000
Month9      -12039.27  5331.948  -2.25795  0.0252
Month10      26393.95  5238.620   5.03834  0.0000
Month11      16471.67  4988.022   3.30224  0.0012
Month12     -58631.94  4272.401 -13.72342  0.0000

 Correlation: 
        (Intr) Yr     Month2 Month3 Month4 Month5 Month6 Month7 Month8 Month9
Yr      -0.640                                                               
Month2  -0.424  0.003                                                        
Month3  -0.498 -0.001  0.589                                                 
Month4  -0.522 -0.005  0.462  0.659                                          
Month5  -0.529 -0.011  0.416  0.535  0.682                                   
Month6  -0.529 -0.016  0.398  0.487  0.560  0.690                            
Month7  -0.526 -0.022  0.391  0.468  0.512  0.568  0.692                     
Month8  -0.521 -0.028  0.387  0.460  0.492  0.520  0.571  0.692              
Month9  -0.514 -0.034  0.383  0.454  0.481  0.498  0.520  0.569  0.690       
Month10 -0.501 -0.039  0.376  0.445  0.470  0.482  0.493  0.513  0.561  0.683
Month11 -0.473 -0.044  0.358  0.423  0.446  0.456  0.463  0.471  0.491  0.538
Month12 -0.400 -0.048  0.308  0.364  0.384  0.392  0.396  0.400  0.407  0.425
        Mnth10 Mnth11
Yr                   
Month2               
Month3               
Month4               
Month5               
Month6               
Month7               
Month8               
Month9               
Month10              
Month11  0.662       
Month12  0.471  0.597

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-2.5656600 -0.6285951 -0.1307054  0.5424996  3.2823396 

Residual standard error: 14799.38 
Degrees of freedom: 180 total; 167 residual

Comments on the Analysis

The use of an AR1 error process is obtained by setting
correlation=corAR1()
in the gls() command.

Error variance and correlation estimated by REML (restricted maximum likelihood).

Estimated value of AR1 correlation parameter is \(\hat \phi = 0.393\).

Both Yr and Month are associated with RoomNights (P-value less than \(0.0001\) in both cases).

Comparison of the output from summary(Tourism.gls) with the corresponding output from the earlier analysis (assuming independent errors) shows that the standard errors are usually larger here, particularly for the intercept and the coefficient of Yr.

The only remaining question is, was the use of gls() successful?