Download the R markdown file for this lecture.

This lecture is an overview of regression and linear models.

Terminology

This course deals with models that are used to explain how one random variable, \(Y\), depends on one or more other variables \(x_1, x_2, \ldots, x_p\).

Here:

\(Y\) is called the response variable;

\(x_1, x_2, \ldots, x_p\) are called the explanatory variables, or regressors, or predictors, or covariates.

Regression Versus Correlation

Suppose you observe paired (x,y) data.

Class discussion: How is a correlation analysis of such data different from a regression analysis?

Normally Distributed Responses

We shall typically assume that \(Y\) follows a normal distribution for any given values of \(x_1, x_2, \ldots, x_p\).

We shall typically assume that:

The model is then:

\[Y \sim N \left ( g(x_1, x_2, \ldots, x_p), \, \sigma^2 \right )\]

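where \(g\) is a function of the explanatory variables and \(\sigma^2\) is the variance of \(Y\), which does not depend on \(x_1, x_2, \ldots, x_p\).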

The model \(Y \sim N \left ( g(x_1, x_2, \ldots, x_p),\, \sigma^2 \right )\) can be expressed equivalently by

\[Y = g(x_1, x_2, \ldots, x_p) + \varepsilon\]

where the error term \(\varepsilon \sim N(0,\, \sigma^2)\).

Notice that the mean (or expected) value of \(Y\) for this model is \(E[Y] = g(x_1, x_2, \ldots, x_p)\), because \(E[\varepsilon] = 0\).
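The two forms of the model, and the result \(E[Y] = g(x_1, x_2, \ldots, x_p)\), can be checked by simulation. The Python sketch below fixes the explanatory variables at one setting and draws many values of \(Y = g(x_1, x_2) + \varepsilon\). It then compares the sample mean and standard deviation of \(Y\) with \(g(x_1, x_2)\) and \(\sigma\). The function `g`, the value of `sigma` and the chosen \(x\) values are hypothetical, picked only for illustration.

```python
import random
import statistics

# Hypothetical mean function g and error standard deviation sigma,
# chosen only for illustration.
def g(x1: float, x2: float) -> float:
    return 1.0 + 2.0 * x1 - 0.5 * x2

sigma = 3.0
random.seed(1)

x1, x2 = 4.0, 2.0  # one fixed setting of the explanatory variables

# Y = g(x1, x2) + epsilon, with epsilon ~ N(0, sigma^2).
# Note: random.gauss takes the standard deviation, not the variance.
ys = [g(x1, x2) + random.gauss(0.0, sigma) for _ in range(100_000)]

print(f"g(x1, x2)        = {g(x1, x2):.3f}")
print(f"mean of sample Y = {statistics.fmean(ys):.3f}")
print(f"sd of sample Y   = {statistics.stdev(ys):.3f} (sigma = {sigma})")
```

With many draws, the sample mean should sit close to \(g(x_1, x_2) = 8\) and the sample standard deviation close to \(\sigma = 3\).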

Linear Models

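Usually we will assume that \(g\) is a parametric function, that is, a function of known form that depends on a finite number of unknown parameters (for example, \(g(x) = \beta_0 + \beta_1 x\)).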

Suppose you have data on a response variable y (e.g. blood pressure) and an explanatory variable x (e.g. a measurement of cholesterol).

What’s So Special About Linear Models in Statistics?

Linear or Non-Linear? That is the Question

Which of the following are linear models?

  1. \(Y \sim N( \beta_0 + \beta_1 x^{\beta_2}, \, \sigma^2)\)

  2. \(Y \sim N( \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3, \, \sigma^2)\)

  3. \(Y \sim N( \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 \log(x_3), \, \sigma^2)\)
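Whether a model is linear depends on how \(g\) depends on the parameters, not on the explanatory variables: \(g\) must be a linear function of \(\beta_0, \beta_1, \ldots\). A rough numerical check uses parameter vectors \(\beta\) and \(\gamma\) and a constant \(c\). Linearity requires \(g(x; \beta + \gamma) = g(x; \beta) + g(x; \gamma)\) and \(g(x; c\beta) = c\, g(x; \beta)\). The Python sketch below applies this check to the three candidate models at one arbitrary point. The helper `passes_linearity_check` and the parameter values are hypothetical choices for illustration.

```python
import math

# Mean functions g(x; b) for the three candidate models.
def g1(x, b):
    return b[0] + b[1] * x ** b[2]

def g2(x, b):
    return b[0] + b[1] * x + b[2] * x ** 2 + b[3] * x ** 3

def g3(x, b):  # here x is a tuple (x1, x2, x3)
    return b[0] + b[1] * x[0] + b[2] * x[1] + b[3] * math.log(x[2])

def passes_linearity_check(g, x, beta, gamma, c=2.5):
    """Test additivity and scaling in the parameters at a single x.

    A failure proves g is not linear in the parameters; passing at a
    few points is only evidence, not a proof.
    """
    summed = [b + d for b, d in zip(beta, gamma)]
    scaled = [c * b for b in beta]
    additive = math.isclose(g(x, summed), g(x, beta) + g(x, gamma))
    homogeneous = math.isclose(g(x, scaled), c * g(x, beta))
    return additive and homogeneous

beta, gamma = [1.0, 2.0, 0.5, -1.0], [0.5, 1.0, 3.0, 2.0]
ok1 = passes_linearity_check(g1, 2.0, beta[:3], gamma[:3])
ok2 = passes_linearity_check(g2, 2.0, beta, gamma)
ok3 = passes_linearity_check(g3, (1.0, 2.0, 5.0), beta, gamma)
print("model 1:", ok1)
print("model 2:", ok2)
print("model 3:", ok3)
```

Model 1 fails the check because \(\beta_2\) enters as an exponent. Models 2 and 3 pass: powers and logarithms of the explanatory variables are allowed, as long as each parameter enters linearly.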

Uses of Regression Models

Descriptive modelling: the aim is simply to better understand the problem under study.

Prediction: predict the value of \(Y\) that will result from particular values of the explanatory variables.

Parameter estimation: estimate interpretable model parameters.

Variable screening: investigate which explanatory variables have an effect on the response.

Regression and Causation

Regression analyses can be used to examine the association between response and predictor variables.

Possible interpretations of association:

Summary

Regression models seek to represent the dependence of a response on explanatory variables.

This course focuses (primarily) on models with a particular linear form.

Typically we will assume that the response is normally distributed.

Linear regression models can be used for description, prediction, parameter estimation and variable screening.