Linear regression models can be conveniently expressed using matrix
notation.
In this lecture, we will see how results for linear models are much
more easily derived and understood using matrix notation than without
it.
The matrix approach is also what statistical software, including R,
uses in the background to fit these models.
Prediction and the Hat Matrix
The vector of fitted values is given by \(\hat{\boldsymbol{\mu}} = X
\hat{\boldsymbol{\beta}} = X (X^TX)^{-1} X^T \boldsymbol{y}\) and
the vector of residuals by \(\boldsymbol{e} =
\boldsymbol{y} - \hat{\boldsymbol{\mu}}\).
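To make these formulas concrete, here is a minimal sketch in Python using NumPy (an assumption: the lecture itself works in R). The data are hypothetical toy values for a straight-line model. The sketch computes \(\hat{\boldsymbol{\beta}}\) by solving the normal equations \((X^TX)\boldsymbol{\beta} = X^T\boldsymbol{y}\), then the fitted values and residuals.

```python
import numpy as np

# Hypothetical toy data for a straight-line model y = b0 + b1*x + error.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix X: a column of ones (intercept) next to the predictor.
X = np.column_stack([np.ones_like(x), x])

# beta_hat = (X^T X)^{-1} X^T y, computed by solving the normal equations
# (X^T X) beta = X^T y instead of forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

mu_hat = X @ beta_hat   # fitted values
e = y - mu_hat          # residuals

print("beta_hat:", beta_hat)
print("fitted:  ", mu_hat)
print("resid:   ", e)
```

Solving the linear system is numerically preferable to inverting \(X^TX\). The estimates should agree with those from R's `lm(y ~ x)` on the same data.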
The fitted-value equation can be rewritten as
\(\hat{\boldsymbol{\mu}} = H \boldsymbol{y} =
\hat{\boldsymbol{y}}\), where \(H = X (X^TX)^{-1} X^T\) is often called the
hat matrix because it “puts hats on things”!
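A short NumPy sketch (an assumption: Python rather than R, with the same hypothetical toy data) forms \(H\) explicitly and checks that \(H\boldsymbol{y}\) reproduces \(X\hat{\boldsymbol{\beta}}\). Note that \(H\) depends only on \(X\), not on \(\boldsymbol{y}\).

```python
import numpy as np

# Hypothetical toy data for a straight-line fit.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])

# Hat matrix H = X (X^T X)^{-1} X^T, an n x n matrix built from X alone.
H = X @ np.linalg.inv(X.T @ X) @ X.T

# H "puts the hat on" y: H y equals the fitted values X beta_hat.
y_hat = H @ y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(y_hat, X @ beta_hat))  # True
```

Forming \(H\) is fine for a small illustration. Because it is \(n \times n\), software usually avoids building it explicitly for large data sets.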
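The equality \(\hat{\boldsymbol{y}}
= \hat{\boldsymbol{\mu}}\) holds because the same quantity,
\(X\hat{\boldsymbol{\beta}}\), serves both as the estimate of the mean
response and as the prediction of a new response at the observed values
of the predictors.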
Covariance Matrices
The variance-covariance matrix (or simply the covariance matrix, or
dispersion matrix) of a random vector has the variances of its
components down the diagonal and the covariances between pairs of
components off the diagonal.
It can be shown that for a constant matrix \(M\) and a random vector
\(\boldsymbol{z}\) of compatible dimensions \[\mbox{Var} (M\boldsymbol{z}) =
M \mbox{Var} (\boldsymbol{z}) M^T.\]
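The rule can be checked numerically. The sketch below uses NumPy (an assumption). It draws many realisations of a 3-dimensional normal vector \(\boldsymbol{z}\) with a hypothetical covariance matrix `Sigma`, applies a hypothetical constant \(2 \times 3\) matrix `M`, and compares the sample covariance of \(M\boldsymbol{z}\) with \(M\,\mbox{Var}(\boldsymbol{z})\,M^T\).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical covariance matrix for z and a constant 2 x 3 matrix M.
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
M = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 2.0]])

# Draw many realisations of z, one per column, so Z has shape (3, n).
Z = rng.multivariate_normal(np.zeros(3), Sigma, size=100_000).T

# Sample covariance of Mz (np.cov treats rows as variables)...
print(np.cov(M @ Z))
# ...should be close to the theoretical M Var(z) M^T.
print(M @ Sigma @ M.T)
```

The same identity also holds exactly for sample covariance matrices: `np.cov(M @ Z)` equals `M @ np.cov(Z) @ M.T` up to rounding, because centring commutes with a linear transformation.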
The Covariance Matrix for \(\hat{\boldsymbol{\beta}}\)
The covariance matrix of \(\hat{\boldsymbol{\beta}}\) is
\[\begin{aligned}
\mbox{Var}(\hat{\boldsymbol{\beta}}) &= \mbox{Var}[ (X^TX)^{-1} X^T
\boldsymbol{y} ]\\
&= (X^TX)^{-1} X^T \mbox{Var}(\boldsymbol{y})
[(X^TX)^{-1} X^T]^T\\
&= (X^TX)^{-1} X^T \sigma^2 I [(X^TX)^{-1} X^T]^T\\
&= \sigma^2 (X^TX)^{-1} X^TX (X^T X)^{-1}\\
&= \sigma^2 (X^TX)^{-1}\end{aligned}\]
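We can use \(\mbox{Var}(\boldsymbol{y}) =
\sigma^2 I\) because the responses are independent (and hence
uncorrelated) and share the common variance \(\sigma^2\). The
simplification also uses the symmetry of \((X^TX)^{-1}\), which gives
\([(X^TX)^{-1} X^T]^T = X (X^TX)^{-1}\).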
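A NumPy sketch (an assumption, as is the error variance `sigma2 = 0.25` and the true coefficients `beta_true`) confirms the derivation in two ways. First, it checks that the general rule \(A\,\mbox{Var}(\boldsymbol{y})\,A^T\), with \(A = (X^TX)^{-1}X^T\), collapses to \(\sigma^2 (X^TX)^{-1}\). Second, it runs a small simulation of many response vectors and takes the empirical covariance of the resulting estimates.

```python
import numpy as np

# Hypothetical straight-line design with 5 observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
n = X.shape[0]
sigma2 = 0.25  # assumed known error variance, for illustration only

XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T  # beta_hat = A y

# General rule Var(A y) = A Var(y) A^T with Var(y) = sigma^2 I ...
var_general = A @ (sigma2 * np.eye(n)) @ A.T
# ... simplifies to sigma^2 (X^T X)^{-1}.
var_simplified = sigma2 * XtX_inv
print(np.allclose(var_general, var_simplified))  # True

# Monte Carlo check: simulate many y vectors around a hypothetical true beta.
rng = np.random.default_rng(0)
beta_true = np.array([0.0, 2.0])
Y = X @ beta_true[:, None] + rng.normal(0.0, np.sqrt(sigma2), size=(n, 50_000))
B = A @ Y  # each column is one simulated beta_hat
print(np.cov(B))        # empirical covariance of beta_hat
print(var_simplified)   # theoretical sigma^2 (X^T X)^{-1}
```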
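The leading diagonal of this matrix is the basis for the standard
errors of the parameter estimates seen in our regression output. Each
standard error is the square root of the corresponding diagonal
element, with the unknown \(\sigma^2\) replaced by its estimate
\(s^2 = \boldsymbol{e}^T\boldsymbol{e}/(n-p)\), where \(n\) is the
number of observations and \(p\) the number of parameters.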
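The sketch below uses NumPy (an assumption) and the same hypothetical toy data. It estimates \(\sigma^2\) by the residual mean square \(s^2 = \boldsymbol{e}^T\boldsymbol{e}/(n-p)\), then takes square roots of the diagonal of \(s^2 (X^TX)^{-1}\). For a straight-line model these should match the familiar closed forms \(s/\sqrt{S_{xx}}\) for the slope and \(s\sqrt{1/n + \bar{x}^2/S_{xx}}\) for the intercept. They should also agree with the `Std. Error` column of R's `summary(lm(y ~ x))`.

```python
import numpy as np

# Hypothetical toy data for a straight-line fit.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat

# sigma^2 is unknown, so estimate it by the residual mean square.
s2 = e @ e / (n - p)

# Standard errors: square roots of the diagonal of s^2 (X^T X)^{-1}.
se = np.sqrt(np.diag(s2 * XtX_inv))
print("estimates:      ", beta_hat)
print("standard errors:", se)
```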
