In this lecture we will look at simple linear regression modelling in
R.
We shall use R to fit models (i.e. estimate the unknown parameters in
models).
We shall discuss the interpretation of output from R.
Analysis of the Pulse Data:
PulseData <- read.csv(file = "https://r-resources.massey.ac.nz/data/161251/pulse.csv",
    header = TRUE)
summary(PulseData)
Height Pulse
Min. :145.0 Min. : 64.0
1st Qu.:162.2 1st Qu.: 80.0
Median :169.0 Median : 80.0
Mean :168.7 Mean : 82.3
3rd Qu.:175.0 3rd Qu.: 84.0
Max. :185.0 Max. :116.0
It is common to use names for your models that make sense, especially
to the person you will work with the most in future (you, yourself!).
Remember, your future self will be pretty annoyed that your current self
won't be around to answer questions about your funny choices of names,
so go easy on yourself by being clear with your work.
The most common convention is to use a name that shows what data was
used and what type of model was created. We will see <data>.lm lots in
this course!
Pulse.lm <- lm(Pulse ~ 1 + Height, data = PulseData)
summary(Pulse.lm)
Call:
lm(formula = Pulse ~ 1 + Height, data = PulseData)
Residuals:
Min 1Q Median 3Q Max
-16.666 -4.876 -1.520 3.424 33.012
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 46.9069 22.8793 2.050 0.0458 *
Height 0.2098 0.1354 1.549 0.1279
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.811 on 48 degrees of freedom
Multiple R-squared: 0.04762, Adjusted R-squared: 0.02778
F-statistic: 2.4 on 1 and 48 DF, p-value: 0.1279
The most commonly sought numbers in that output are the coefficients:
coef(Pulse.lm)
(Intercept)      Height
  46.906927    0.209774
But in simple regression, we often want to see how these coefficients
create a line, and how that line looks alongside our data. We can do
this in a simple plot() or a fancy ggplot().
N.B. Other visualisations also exist! Make an active choice.
plot(Pulse ~ Height, data = PulseData, ylab = "Resting pulse rate (beats per minute)",
xlab = "Height (in centimeters) ")
abline(Pulse.lm)
library(ggplot2)
PulseData |>
    ggplot(mapping = aes(x = Height, y = Pulse)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    ylab("Resting pulse rate (beats per minute)") +
    xlab("Height (in centimeters) ")
`geom_smooth()` using formula = 'y ~ x'
Interpretation of Model for Pulse Data
The summary table of coefficients contains much relevant information.
We can get this part of the summary using the tidy() command from the
broom package.
library(broom)  # provides tidy()
library(knitr)  # provides kable()
tidy(Pulse.lm) |>
    kable()
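|term        |  estimate|  std.error| statistic|   p.value|
|:-----------|---------:|----------:|---------:|---------:|
|(Intercept) | 46.906927| 22.8793281|  2.050188| 0.0458292|
|Height      |  0.209774|  0.1354041|  1.549245| 0.1278917|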
The kable() command here makes the pretty table for presentation; you
wouldn’t use it in an interactive situation.
The fitted model (i.e. the model with parameters replaced by estimates) is
\[E[\mbox{Pulse}] = 46.91 + 0.21~\mbox{Height}\]
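As a quick illustration of how the fitted equation is used, the sketch below plugs a height into it. The coefficient values are copied from the coef() output earlier; the height of 170 cm is a hypothetical value chosen only for this example.

```r
# Coefficient estimates copied from the coef(Pulse.lm) output
b0 <- 46.906927  # intercept
b1 <- 0.209774   # slope for Height

# Hypothetical height (cm), chosen purely for illustration
height <- 170

# Fitted (expected) resting pulse at that height
fitted_pulse <- b0 + b1 * height
fitted_pulse
```

With the model object available, predict(Pulse.lm, newdata = data.frame(Height = 170)) should give the same value, up to rounding of the coefficients.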
Residual standard error
The error standard deviation, \(\sigma\), is estimated by \(s = 8.811\) (the residual standard error) according to the output.
The statistic \(s\) is very useful and highly interpretable. This is because about 95% of residuals will lie within about (formula explained later) \(\pm t_{(n-2)}(0.025)~s \approx \pm 2~s\) of zero.
Equivalently, about 95% of data values \(y_i\) will be within \(2~s\) above or below the line.
In the present case \(s \approx 8.8\), so most pulses will be within about 17.6 beats per minute of the predicted value on the line. This simple mental calculation gives us an idea of how accurate and useful our regression is likely to be for prediction. (In this example, not very accurate or useful at all!)
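The "about 2" multiplier can be checked directly. This sketch assumes only the values printed in the summary() output (residual standard error 8.811 on 48 degrees of freedom) and uses qt() to get the t quantile; the variable names are made up for this example.

```r
s        <- 8.811  # residual standard error from the summary() output
resid_df <- 48     # residual degrees of freedom, n - 2

# Upper 2.5% point of the t distribution with n - 2 degrees of freedom
t_mult <- qt(0.975, df = resid_df)
t_mult

# Half-width of the band that should hold about 95% of the residuals
half_width <- t_mult * s
half_width
```

The multiplier is a little over 2, so the rule of thumb of 2s (17.6 here) slightly understates the exact half-width.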
It also means we can draw a simple graph to help us find \(y\) values
that are unusual. (We will refine this graph later.)
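[Figure: A fitted line added to a scatterplot of resting pulse rate against height. The grey area shows a plausible region within which 95% of observations should lie.]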
The graph illustrates that the model is really pretty poor, and also
that there are two individuals whose pulse is much higher than is
typical.
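The code for this graph is not shown above. One way it could be drawn is sketched below, assuming the data are read from the course URL used earlier and that the band is the fitted line plus or minus 17.6. The column names lower and upper, and the fill and transparency settings, are choices made for this example.

```r
library(ggplot2)

# Read the data and refit the model so this block runs on its own
PulseData <- read.csv(file = "https://r-resources.massey.ac.nz/data/161251/pulse.csv",
    header = TRUE)
Pulse.lm <- lm(Pulse ~ 1 + Height, data = PulseData)

# Band of roughly +/- 2s around the fitted line (2 * 8.8 = 17.6)
PulseData$lower <- fitted(Pulse.lm) - 17.6
PulseData$upper <- fitted(Pulse.lm) + 17.6

p <- ggplot(PulseData, mapping = aes(x = Height, y = Pulse)) +
    geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.3) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    ylab("Resting pulse rate (beats per minute)") +
    xlab("Height (in centimeters)")
p

# List the observations lying outside the band
subset(PulseData, Pulse < lower | Pulse > upper)
```

The final line lists the same unusual individuals as rows of the data frame, which can be easier to follow up than reading points off the plot.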
\(R^2\) for the Pulse Data
The (multiple) R-squared statistic, \(R^2\), is the square of the correlation between the observed and fitted responses.
It also has another name, the coefficient of determination.
It can be interpreted as the proportion of variation in the response that is explained by the predictor in the model. (In other words, how much the \(y\) values are determined (fixed) by the regression model.)
Hence Height explains just \(R^2 = 4.8\%\) of the variation in the response according to the fitted model.
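Both descriptions of \(R^2\) can be checked numerically. The sketch below assumes the data are read from the course URL used earlier, and refits the model so that it runs on its own; the names r2_cor, r2_ss, rss and tss are made up for this example.

```r
# Read the data and refit the model so this block runs on its own
PulseData <- read.csv(file = "https://r-resources.massey.ac.nz/data/161251/pulse.csv",
    header = TRUE)
Pulse.lm <- lm(Pulse ~ 1 + Height, data = PulseData)

# Square of the correlation between observed and fitted responses
r2_cor <- cor(PulseData$Pulse, fitted(Pulse.lm))^2

# Proportion of the variation in Pulse explained by the model
rss <- sum(residuals(Pulse.lm)^2)                        # unexplained variation
tss <- sum((PulseData$Pulse - mean(PulseData$Pulse))^2)  # total variation
r2_ss <- 1 - rss / tss

# All three values should agree
c(r2_cor, r2_ss, summary(Pulse.lm)$r.squared)
```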
R hint:
You can find what different parts of the output are called by applying
the names() function to the summary() of a linear model you’ve fitted.
names(summary(Pulse.lm))
 [1] "call"          "terms"         "residuals"     "coefficients"
 [5] "aliased"       "sigma"         "df"            "r.squared"
 [9] "adj.r.squared" "fstatistic"    "cov.unscaled"
summary(Pulse.lm)$r.squared
[1] 0.04762205
summary(Pulse.lm)$sigma
[1] 8.810716
These “names” don’t change from model to model.
The estimated slope
The slope estimate, \(\hat{\beta}_1 = 0.209774\), has associated standard error \(SE(\hat{\beta}_1) = 0.1354041\).
For testing \(H_0: \beta_1 = 0\) versus \(H_1: \beta_1 \ne 0\), the t-test statistic is
\[t = \hat{\beta}_1 / SE(\hat{\beta}_1) = 0.209774/0.1354041 = 1.549245\]
The corresponding P-value is \(P = 0.1279\), just like when we did the analysis by hand.
We conclude that the data provide no evidence of a (linear)
relationship between resting pulse rate and height.
A nice presentation of this can be obtained using:
Pulse.lm |>
tidy() |>
kable()
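|term        |  estimate|  std.error| statistic|   p.value|
|:-----------|---------:|----------:|---------:|---------:|
|(Intercept) | 46.906927| 22.8793281|  2.050188| 0.0458292|
|Height      |  0.209774|  0.1354041|  1.549245| 0.1278917|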
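The test statistic and P-value for Height can be reproduced from the estimate and its standard error. This sketch assumes only the values printed in the output, with 48 residual degrees of freedom; the variable names are made up for this example, and small differences in the last digits come from rounding of the inputs.

```r
est      <- 0.209774   # slope estimate
se       <- 0.1354041  # its standard error
resid_df <- 48         # residual degrees of freedom, n - 2

# t statistic for H0: slope = 0
t_stat <- est / se
t_stat

# Two-sided P-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df = resid_df, lower.tail = FALSE)
p_value
```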
Confidence intervals for parameters
Generating confidence intervals for the parameters of our regression
model is pretty simple and is achieved using the confint() command.
confint(Pulse.lm)
                  2.5 %     97.5 %
(Intercept)  0.90495433 92.9088990
Height      -0.06247409  0.4820221
This basic call to confint() has found 95% confidence intervals for
both the slope and intercept parameters.
In this context, the interval for \(\beta_0\) is irrelevant, because
\(\beta_0\) is the expected value of y = Pulse when a person’s
x = Height is zero, and there is no point trying to estimate that.
Asking for only the slope’s interval and adjusting the level of
confidence are both fairly easy to do.
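For example, a 90% interval for the slope alone can be requested with the parm and level arguments of confint(). Since that call needs the fitted model, the sketch below rebuilds the same kind of interval from the printed estimate and standard error; the helper slope_ci() and the choice of 90% are made up for this example.

```r
# With the model object available, the direct call would be:
#   confint(Pulse.lm, parm = "Height", level = 0.90)

est      <- 0.209774   # slope estimate
se       <- 0.1354041  # its standard error
resid_df <- 48         # residual degrees of freedom, n - 2

# Hypothetical helper: t-based interval for the slope at a given level
slope_ci <- function(level) {
    t_mult <- qt(1 - (1 - level) / 2, df = resid_df)
    est + c(-1, 1) * t_mult * se
}

slope_ci(0.95)  # matches the Height row of confint(Pulse.lm)
slope_ci(0.90)  # narrower, at a lower level of confidence
```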
Even so, the interval for the slope parameter includes zero. Many
people prefer to use confidence intervals over hypothesis tests,
especially when communicating their findings.
Comments on the R code for the Pulse Data
The read.csv() command reads (multivariate) data from a text file with commas separating the values.
The option header=TRUE indicates that the first line of the text file contains column headings (not data points).
The summary() command will typically give sensible output when applied to a variety of types of object. We saw it used to summarise the raw data, and to summarise the simple regression model we fitted.
The formula Pulse ~ 1 + Height specifies that Pulse is the response and Height the explanatory variable in the model. The 1 explicitly indicates inclusion of an intercept. (Most people leave the 1 out, because R includes an intercept by default.)
The geom_smooth() command adds a line fitted using the specified method for fitting the model. Note that we used "lm" here so that this line matches the results from fitting a model using lm() to this data.
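Whether a formula includes an intercept can be checked without fitting anything, by inspecting its terms() object. This is only a sketch to illustrate the point about the 1; the third formula, with 0 in place of 1, is added here purely for contrast.

```r
# The "intercept" attribute of terms() is 1 when an intercept is included
attr(terms(Pulse ~ 1 + Height), "intercept")  # explicit intercept
attr(terms(Pulse ~ Height), "intercept")      # intercept included by default
attr(terms(Pulse ~ 0 + Height), "intercept")  # intercept removed
```

The first two formulas therefore describe the same model, which is why the 1 is usually omitted.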