Download the R markdown file for this lecture.

Linear regression models can be conveniently expressed using matrix notation.

In this lecture we will see that results for linear models are much more easily derived and understood using matrix notation than without it.

Note also that the matrix approach is what good statistical software uses behind the scenes.

Matrix Formulation of the Linear Model

In matrix notation, the linear model is \[\boldsymbol{y} = X {\boldsymbol{\beta}} + {\boldsymbol{\varepsilon}} \label{eq:matrixLM}\]

where \(\boldsymbol{y}\) is the response vector, \(X\) is the design matrix, \({\boldsymbol{\beta}}\) is the vector of \(p+1\) regression parameters, and \({\boldsymbol{\varepsilon}}\) is the vector of \(n\) error terms:

\(\boldsymbol{y} = \left [ \begin{array}{c} y_1\\ y_2\\ \vdots\\ y_n \end{array} \right ]\), \(X = \left [ \begin{array}{cccc} 1 & x_{11} & \ldots & x_{1p}\\ 1 & x_{21} & \ldots & x_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ 1 & x_{n1} & \ldots & x_{np} \end{array} \right ]\), \(\boldsymbol{\beta} = \left [ \begin{array}{c} \beta_0\\ \beta_1\\ \vdots\\ \beta_p\\ \end{array} \right ]\), and \(\boldsymbol{\varepsilon} = \left [ \begin{array}{c} \varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n\\ \end{array} \right ]\)
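As a quick illustration, `model.matrix()` in R constructs exactly this design matrix, including the leading column of ones for the intercept. The small data frame below is hypothetical, used only to show the structure of \(X\).

```r
# Hypothetical data frame, used only to illustrate the structure of X
dat <- data.frame(x1 = c(0.2, 1.4, 2.7), x2 = c(5, 3, 8))
model.matrix(~ x1 + x2, data = dat)  # first column is the column of ones
```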

The mean (expected) value of the random vector \(\boldsymbol{y}\) is \[\begin{aligned} \boldsymbol{\mu} &= E[\boldsymbol{y}]\\ &= \left [ \begin{array}{c} E[y_1] \\ E[y_2] \\ \vdots \\ E[y_n] \end{array} \right ] \\ &= E[X {\boldsymbol{\beta}} + {\boldsymbol{\varepsilon}}]\\ &= X {\boldsymbol{\beta}} + E[{\boldsymbol{\varepsilon}}]\\ &= X {\boldsymbol{\beta}},\end{aligned}\] since the error terms have mean zero, \(E[{\boldsymbol{\varepsilon}}] = \boldsymbol{0}\).

Least Squares Estimation by Matrices

For observed responses \(\boldsymbol{y}\), the sum of squared errors can be written as \[SS({\boldsymbol{\beta}}) = (\boldsymbol{y} - X {\boldsymbol{\beta}})^T (\boldsymbol{y} - X {\boldsymbol{\beta}}).\]

  • This can be minimised using multivariate (vector) calculus, provided \(X^TX\) is invertible, to give the least squares estimates as \[\hat{{\boldsymbol{\beta}}} = (X^TX)^{-1} X^T \boldsymbol{y}.\] A numerical check appears in the sketch below.
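As a minimal numerical sketch (the simulated data and coefficient values are assumptions for illustration, not from the lecture), we can apply the formula directly and compare with the coefficients returned by `lm()`:

```r
set.seed(1)
n  <- 50
x1 <- runif(n)
x2 <- runif(n)
y  <- 2 + 3 * x1 - 1.5 * x2 + rnorm(n)  # true beta = (2, 3, -1.5)

X <- cbind(1, x1, x2)                      # design matrix with intercept column
beta_hat <- solve(t(X) %*% X, t(X) %*% y)  # solves (X^T X) b = X^T y

cbind(matrix_form = drop(beta_hat),
      lm_fit      = coef(lm(y ~ x1 + x2)))  # the two columns should agree
```

Using `solve(A, b)` rather than explicitly inverting \(X^TX\) is the numerically preferable way to evaluate \((X^TX)^{-1} X^T \boldsymbol{y}\).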

Simple Linear Regression in Matrix Form

For a simple linear regression model the design matrix is \[X = \left [ \begin{array}{cc} 1 & x_{1} \\ 1 & x_{2} \\ \vdots & \vdots \\ 1 & x_{n} \end{array} \right ].\] If we observe responses \(\boldsymbol{y}\), the least squares estimate of \({\boldsymbol{\beta}}\) is \[\begin{aligned} \hat{\boldsymbol{\beta}} &= (X^T X)^{-1} X^T \boldsymbol{y} = \left [ \begin{array}{cc} n & n \bar{x}\\ n\bar{x} & \sum_i x_i^2 \end{array} \right ]^{-1} \left [ \begin{array}{cccc} 1 & 1 & \ldots & 1 \\ x_1 & x_2 & \ldots & x_n \end{array} \right ] \left [ \begin{array}{c} y_1\\ y_2\\ \vdots\\ y_n \end{array} \right ] \\ &= \frac{1}{n s_{xx}} \left [ \begin{array}{cc} \sum_i x_i^2 & - n \bar{x}\\ - n\bar{x} & n \end{array} \right ] \left [ \begin{array}{c} n \bar{y}\\ \sum_i x_i y_i \end{array} \right ] = \frac{1}{s_{xx}} \left [ \begin{array}{c} \bar{y} \sum_i x_i^2 - \bar{x} \sum_i{x_i y_i} \\ s_{xy} \end{array} \right ],\end{aligned}\] where \(s_{xx} = \sum_i x_i^2 - n\bar{x}^2\) and \(s_{xy} = \sum_i x_i y_i - n\bar{x}\bar{y}\). The second component is the familiar slope estimate \(\hat{\beta}_1 = s_{xy}/s_{xx}\), and the first simplifies to \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\).
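The sketch below checks these closed-form estimates against `lm()` on simulated data (the data-generating values are illustrative assumptions):

```r
set.seed(2)
n <- 30
x <- runif(n, 0, 10)
y <- 1 + 0.5 * x + rnorm(n)

sxx <- sum(x^2) - n * mean(x)^2            # s_xx as defined above
sxy <- sum(x * y) - n * mean(x) * mean(y)  # s_xy as defined above
b1  <- sxy / sxx                           # slope estimate
b0  <- mean(y) - b1 * mean(x)              # intercept estimate

rbind(closed_form = c(b0, b1),
      lm_fit      = coef(lm(y ~ x)))  # rows should match
```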

Prediction and the Hat Matrix

The vector of fitted values is given by \(\hat{\boldsymbol{\mu}} = X \hat{\boldsymbol{\beta}} = X (X^TX)^{-1} X^T \boldsymbol{y}\) and the vector of residuals by \(\boldsymbol{e} = \boldsymbol{y} - \hat{\boldsymbol{\mu}}\).

The equation for the fitted values just given can be rewritten as \(\hat{\boldsymbol{\mu}} = H \boldsymbol{y} = \hat{\boldsymbol{y}}\), where \(H = X (X^TX)^{-1} X^T\) is often called the hat matrix because it “puts hats on things”!

We have the equality \(\hat{\boldsymbol{y}} = \hat{\boldsymbol{\mu}}\) because the same quantity \(X\hat{\boldsymbol{\beta}}\) serves both as the estimated mean and as the prediction at the observed covariate values.
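A short sketch on hypothetical simulated data that forms \(H\) explicitly, recovers the fitted values as \(H\boldsymbol{y}\), and checks that \(H\) is idempotent:

```r
set.seed(3)
n <- 20
x <- runif(n)
y <- 2 + 4 * x + rnorm(n)
X <- cbind(1, x)

H     <- X %*% solve(t(X) %*% X) %*% t(X)  # hat matrix
y_hat <- H %*% y                           # H "puts the hat on" y
e     <- y - y_hat                         # residuals

all.equal(drop(y_hat), unname(fitted(lm(y ~ x))))  # should be TRUE
all.equal(H %*% H, H)                              # idempotent: HH = H
```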

Covariance Matrices

The Covariance Matrix for \(\hat{\boldsymbol{\beta}}\)

The covariance matrix of \(\hat{\boldsymbol{\beta}}\) is

\[\begin{aligned} \mbox{Var}(\hat{\boldsymbol{\beta}}) = \mbox{Var}[ (X^TX)^{-1} X^T \boldsymbol{y} ] &= (X^TX)^{-1} X^T \mbox{Var}(\boldsymbol{y}) [(X^TX)^{-1} X^T]^T\\ &= (X^TX)^{-1} X^T \sigma^2 I [(X^TX)^{-1} X^T]^T\\ &= \sigma^2 (X^TX)^{-1} X^TX (X^T X)^{-1}\\ &= \sigma^2 (X^TX)^{-1}\end{aligned}\]

We can use \(\mbox{Var}(\boldsymbol{y}) = \sigma^2 I\) since the responses are independent (hence uncorrelated, so all off-diagonal covariances are zero) and share the common error variance \(\sigma^2\).
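As a final sketch (simulated data again assumed for illustration), we can estimate \(\sigma^2\) from the residuals and check that \(\hat{\sigma}^2 (X^TX)^{-1}\) matches the covariance matrix that `vcov()` reports for an `lm` fit:

```r
set.seed(4)
n <- 40
x <- runif(n)
y <- 1 + 2 * x + rnorm(n)
X <- cbind(1, x)

fit    <- lm(y ~ x)
sigma2 <- sum(resid(fit)^2) / (n - 2)  # residual df = n - (p + 1) = n - 2 here
V      <- sigma2 * solve(t(X) %*% X)   # sigma^2 (X^T X)^{-1}, sigma^2 estimated

all.equal(unname(V), unname(vcov(fit)))  # should be TRUE
```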