This lecture is an overview of regression.

Terminology

This course deals with models that are used to explain how one random variable, Y, is affected by one or more other variables \(x_1, x_2, \ldots, x_p\).

Here:

Y is called the response variable;

\(x_1, x_2, \ldots, x_p\) are called the explanatory variables, or regressors, or predictors, or covariates.

A statistical regression model specifies how the distribution of Y depends on the values \(x_1, x_2, \ldots, x_p\) (which are assumed to be fixed for the purposes of our analyses).

Regression models express the response distribution in terms of these values and also one or more unknown parameters which determine the relationship.

In addition to expressing the line relating y to x, regression models also express the distribution (i.e. pattern, spread) of y values around that line.

Regression Versus Correlation

Suppose you observe paired (x,y) data.

You could examine the relationship between x and y by calculating the correlation coefficient r.

Class discussion: How is this different to a regression analysis?
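
As one possible starting point for the discussion, the sketch below computes both a correlation coefficient and a least-squares slope for the same paired data. The data values and the helper names (`correlation`, `slope`) are invented for illustration and are not part of the course material. The correlation is the same whichever variable is listed first, whereas the regression slope changes when the roles of response and explanatory variable are swapped.

```python
import math

def mean(values):
    return sum(values) / len(values)

def correlation(x, y):
    """Pearson correlation coefficient r for paired data."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def slope(x, y):
    """Least-squares slope from regressing y (response) on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical paired data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy = correlation(x, y)   # correlation treats x and y symmetrically
r_yx = correlation(y, x)
b_y_on_x = slope(x, y)     # regression distinguishes response from predictor
b_x_on_y = slope(y, x)

print(f"r(x, y) = {r_xy:.4f}, r(y, x) = {r_yx:.4f}")
print(f"slope of y on x = {b_y_on_x:.4f}, slope of x on y = {b_x_on_y:.4f}")
```

The product of the two slopes equals the squared correlation, which is one way to see that the two analyses are related but not interchangeable.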

Normally Distributed Responses

We shall typically assume that Y follows a normal distribution for any given values of \(x_1, x_2, \ldots, x_p\).

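We shall typically assume that:

the mean of the normal distribution of Y depends on \(x_1, x_2, \ldots, x_p\);

the variance of the normal distribution does not depend on the values of \(x_1, x_2, \ldots, x_p\).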

The model is then:

\[Y \sim N \left ( g(x_1, x_2, \ldots, x_p), \, \sigma^2 \right )\]

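where

\(g\) is some function giving the mean, \(E[Y] = g(x_1, x_2, \ldots, x_p)\); note that \(g\) will usually depend on some parameters \(\beta_0, \beta_1, \ldots, \beta_p\);

\(\mathrm{Var}(Y) = \sigma^2\) is the response variance.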

The model \(Y \sim N \left ( g(x_1, x_2, \ldots, x_p),\, \sigma^2 \right )\) can be expressed equivalently by

\[Y = g(x_1, x_2, \ldots, x_p) + \varepsilon\]

where \(\varepsilon \sim N(0,\, \sigma^2)\).

Notice that the mean (or expected) value of Y for this model is given by \(E[Y] = g(x_1, x_2, \ldots, x_p)\).
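
To make the additive form concrete, here is a minimal simulation sketch. The mean function `g`, the value of `sigma`, the chosen `x_value` and the seed are all assumptions made up for this illustration. It draws many responses as g(x) plus a normal error and checks that their sample mean and standard deviation sit close to g(x) and sigma.

```python
import random
import statistics

def g(x):
    """Hypothetical mean function, chosen only for illustration."""
    return 1.0 + 2.0 * x

def simulate_response(x, sigma, rng):
    """Draw one Y = g(x) + epsilon, with epsilon ~ N(0, sigma^2)."""
    return g(x) + rng.gauss(0.0, sigma)

rng = random.Random(1)   # fixed seed so the run is reproducible
sigma = 0.5              # assumed error standard deviation
x_value = 3.0            # the explanatory variable is held fixed

draws = [simulate_response(x_value, sigma, rng) for _ in range(20000)]
sample_mean = statistics.fmean(draws)
sample_sd = statistics.stdev(draws)

print(f"g(x) = {g(x_value):.3f}, sample mean of Y = {sample_mean:.3f}")
print(f"sigma = {sigma:.3f}, sample sd of Y = {sample_sd:.3f}")
```

Changing `x_value` shifts the centre of the simulated responses but, under the constant-variance assumption, leaves their spread unchanged.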

Linear Models

Usually we will assume that g is a parametric function.

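Suppose you have data on a response variable y (e.g. blood pressure) and an explanatory variable x (e.g. a measurement of cholesterol).

We want to model the relationship between the mean value of y and x.

We might use a simple linear regression model,

\[E[Y] = \beta_0 + \beta_1 x\]

where \(\beta_0\) and \(\beta_1\) are model parameters.

This is a linear regression model because \(E[Y]\) is linearly related to the parameters \(\beta_0\) and \(\beta_1\) (not because it is linearly related to x).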
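
As a rough sketch of what fitting such data might look like, the code below assumes a straight-line mean, E[Y] = b0 + b1 x, and estimates the two parameters by ordinary least squares. Least squares is an estimation method this section does not itself introduce, and the cholesterol and blood pressure values are invented purely for illustration.

```python
def fit_simple_linear(x, y):
    """Ordinary least-squares estimates (b0, b1) for E[Y] = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical measurements, invented for illustration only.
cholesterol = [4.2, 4.8, 5.1, 5.6, 6.0, 6.4]
blood_pressure = [118.0, 121.0, 125.0, 127.0, 133.0, 134.0]

b0, b1 = fit_simple_linear(cholesterol, blood_pressure)

# Fitted means and residuals (observed minus fitted).
fitted = [b0 + b1 * c for c in cholesterol]
residuals = [bp - f for bp, f in zip(blood_pressure, fitted)]

print(f"estimated intercept b0 = {b0:.2f}")
print(f"estimated slope b1 = {b1:.2f}")
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on any implementation.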

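What’s So Special About Linear Models in Statistics?

Linear regression models are easy to apply and interpret.

The mathematical theory underlying linear regression models is very well understood.

We can investigate the relationship between a response and lots of explanatory variables in a straightforward manner.

A linear regression model will often (but not always) provide an adequate approximation to reality.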

Linear or Non-Linear? That is the Question

Which of the following are linear models?

  1. \(Y \sim N( \beta_0 + \beta_1 x^{\beta_2}, \, \sigma^2)\)

  2. \(Y \sim N( \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3, \, \sigma^2)\)

  3. \(Y \sim N( \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 \log(x_3), \, \sigma^2)\)

The distinction between linear and non-linear models has a major practical impact. If your exploratory data analysis (EDA) shows that a relationship of some kind exists, then you can try to transform (re-scale) your variables to make the relationship linear.
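
The sketch below illustrates one such transformation under made-up assumptions: the response is generated without noise as 2 + 3 log(x), which curves when plotted against x but is a straight line against u = log(x). Regressing on the transformed variable with ordinary least squares then recovers the two parameters. The data, the parameter values and the helper `fit_simple_linear` are all hypothetical.

```python
import math

def fit_simple_linear(u, y):
    """Ordinary least-squares estimates (b0, b1) for E[Y] = b0 + b1 * u."""
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(y) / n
    s_uy = sum((a - mean_u) * (b - mean_y) for a, b in zip(u, y))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    b1 = s_uy / s_uu
    b0 = mean_y - b1 * mean_u
    return b0, b1

# Hypothetical, noise-free data: y = 2 + 3 * log(x).
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 + 3.0 * math.log(v) for v in x]

# Transform the explanatory variable; the model stays linear in b0 and b1.
u = [math.log(v) for v in x]
b0, b1 = fit_simple_linear(u, y)

print(f"b0 = {b0:.6f}, b1 = {b1:.6f}")
```

The model is curved in x yet still counts as linear, because the mean is a linear combination of the parameters once log(x) is treated as the explanatory variable.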

Uses of Regression Models

The reason for fitting a model matters. It can determine how we gauge the usefulness of that model.

Descriptive modelling: we are simply interested in better understanding the problem under study.

Prediction: we want to predict the value of Y that will result from particular values of the explanatory variables.

Parameter estimation: we want to estimate interpretable model parameters.

Variable screening: we want to investigate which explanatory variables have an effect on the response.

Regression and Causation

Regression analyses can be used to examine the association between response and predictor variables.

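Possible interpretations of association:

Causation: y depends causally on x;

Common response: y does not depend causally on x; both y and x are related (perhaps causally) to a lurking variable z;

Confounding: x is (strongly) associated with a lurking variable z, so it is then impossible to tell whether y depends causally on x or on z.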
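
An association need not reflect a causal link. As a minimal sketch under invented assumptions, the simulation below lets an unobserved variable `z` drive both `x` and `y`, with no direct effect of `x` on `y`. The noise level, sample size and seed are arbitrary choices for illustration. The two observed variables still come out strongly correlated.

```python
import math
import random

def correlation(x, y):
    """Pearson correlation coefficient r for paired data."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)   # fixed seed so the run is reproducible
n = 5000

# z is a variable we never observe; x and y each depend on z plus
# their own independent noise. x has no direct effect on y.
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [zi + rng.gauss(0.0, 0.5) for zi in z]
y = [zi + rng.gauss(0.0, 0.5) for zi in z]

r_xy = correlation(x, y)
print(f"correlation between x and y: {r_xy:.3f}")
```

Under these assumptions the theoretical correlation is 1 / 1.25 = 0.8, so a regression of y on x would show a clear relationship even though intervening on x would leave y unchanged.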

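Establishing a Causative Link...

...is not easy

  1. Use a carefully designed experiment. This is the basis of the course 161.222 taught in Semester 2.

  2. If it is not possible to conduct an experiment, then the following questions should help:

Is the association between the variables strong?

Is the association consistent?

Are higher doses associated with stronger responses?

Do the alleged causes precede the effect in time?

Is the alleged cause plausible?

If the answer is yes to all of these, then a causal link seems probable. For example, a causal link between lung cancer and smoking is now fully accepted by scientists because the above criteria are satisfied.

Summary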

Regression models seek to represent dependence of a response on explanatory variables.

This course focuses (primarily) on models with a particular linear form.

Typically we will assume that the response is normally distributed.

Linear regression models can be used for description, prediction, parameter estimation and variable screening.
