In this exercise you will:

- fit the regression models used in the example in Chapter 1 of ELMER to the `TreeDiams` data;
- plot the fitted models on the original data scale;
- use `predict()` to tabulate fitted values for new diameters;
- refit the models after omitting the largest tree and decide which fit you prefer.
You need to have installed R, RStudio, and the necessary packages for the course, including the ELMER package. See how to get set up for this course.
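The code below also uses functions from a few packages besides ELMER. As a minimal sketch (the exact package list is my assumption; the course setup page is authoritative), attaching these first should cover everything in this exercise:

```r
library(ELMER)      # course package; supplies the TreeDiams data used below
library(tidyverse)  # ggplot2 (ggplot, geom_*), dplyr (filter, mutate, glimpse)
library(knitr)      # kable() for formatted tables
```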
```r
data(TreeDiams, package="ELMER")
str(TreeDiams)
```

```
'data.frame': 12 obs. of 2 variables:
 $ Diameter: num 0.9 1.2 2.9 3.1 3.3 3.9 4.3 6.2 9.6 12.6 ...
 $ Height  : num 18 26 32 36 44.5 35.6 40.5 57.5 67.3 84 ...
```
Fit the models used in the example in Chapter 1 of ELMER.
```r
TreeDiams.lm1 = lm(Height~Diameter, data=TreeDiams)
TreeDiams.lm2 = lm(Height~log(Diameter), data=TreeDiams)
TreeDiams.lm3 = lm(log(Height)~log(Diameter), data=TreeDiams)
```
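The exercise itself doesn't ask for it, but if you want to look at the estimated intercepts and slopes before plotting, a quick sketch:

```r
# Estimated coefficients and their standard errors for each model
summary(TreeDiams.lm1)$coefficients
summary(TreeDiams.lm2)$coefficients
summary(TreeDiams.lm3)$coefficients
```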
Create just the graph that shows the fitted models on the original data scale, since this is the one that matters.
N.B. You should try to do this using `ggplot()` if you can. To get the curves of your models on your scatter plot, you might make use of `geom_function()`.
```r
TreeDiams |> ggplot(aes(y=Height, x=Diameter)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE) +
  geom_function(fun = function(x) coef(TreeDiams.lm2)[1] + coef(TreeDiams.lm2)[2]*log(x), lty=2) +
  geom_function(fun = function(x) exp(coef(TreeDiams.lm3)[1] + coef(TreeDiams.lm3)[2]*log(x)), lty=3)
```

```
`geom_smooth()` using formula 'y ~ x'
```
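If `geom_function()` feels awkward, an equivalent sketch is to evaluate each fitted model on a grid of diameters with `predict()` and draw the curves with `geom_line()`. The `grid` data frame and its column names here are just illustrative:

```r
# Grid of diameters spanning the observed range
grid <- data.frame(Diameter = seq(min(TreeDiams$Diameter),
                                  max(TreeDiams$Diameter), length.out = 200))
grid$fit2 <- predict(TreeDiams.lm2, grid)       # Height ~ log(Diameter)
grid$fit3 <- exp(predict(TreeDiams.lm3, grid))  # back-transform log(Height) to Height

ggplot(TreeDiams, aes(x = Diameter, y = Height)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +      # straight-line fit (TreeDiams.lm1)
  geom_line(data = grid, aes(y = fit2), lty = 2) +
  geom_line(data = grid, aes(y = fit3), lty = 3)
```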
Use `predict()` to find the fitted values from the three models used so far. Use `kable()` to put them into a nice table.
```r
PredData <- data.frame(Diameter=c(5, 10, 25))
Fits1 <- predict(TreeDiams.lm1, PredData, se.fit=TRUE)
Fits2 <- predict(TreeDiams.lm2, PredData, se.fit=TRUE)
Fits3 <- predict(TreeDiams.lm3, PredData, se.fit=TRUE)
TreeDiamsTable <- cbind(Fits1$fit, Fits1$fit-Fits1$se.fit, Fits1$fit+Fits1$se.fit,
                        Fits2$fit, Fits2$fit-Fits2$se.fit, Fits2$fit+Fits2$se.fit,
                        exp(Fits3$fit), exp(Fits3$fit-Fits3$se.fit), exp(Fits3$fit+Fits3$se.fit))
rownames(TreeDiamsTable) <- c("5 inch", "10 inch", "25 inch")
colnames(TreeDiamsTable) <- rep(c("Mean", "-SE", "+SE"), 3)
TreeDiamsTable |> kable()
```
The first three columns come from `TreeDiams.lm1`, the middle three from `TreeDiams.lm2`, and the last three from `TreeDiams.lm3` (back-transformed to the original height scale); each group shows the fitted mean and the mean plus or minus one standard error.

|         | Mean | -SE | +SE | Mean | -SE | +SE | Mean | -SE | +SE |
|---------|------|-----|-----|------|-----|-----|------|-----|-----|
| 5 inch  | 42.89175 | 39.55850 | 46.22499 | 50.33222 | 48.20356 | 52.46088 | 45.49154 | 43.75472 | 47.29731 |
| 10 inch | 56.47018 | 53.13448 | 59.80588 | 65.18187 | 62.51958 | 67.84416 | 62.77474 | 59.79172 | 65.90658 |
| 25 inch | 97.20547 | 88.83945 | 105.57149 | 84.81204 | 80.60945 | 89.01463 | 96.08644 | 88.97853 | 103.76216 |
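A side note, going beyond what the exercise asks: instead of building the ±1 SE limits by hand, `predict()` can return 95% confidence intervals directly via `interval = "confidence"`. A sketch for the first and third models:

```r
CI1 <- predict(TreeDiams.lm1, PredData, interval = "confidence")
CI3 <- exp(predict(TreeDiams.lm3, PredData, interval = "confidence"))  # back to height scale
cbind(CI1, CI3) |> kable(digits = 2)
```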
Calculate a regression of height on diameter and height on `log(Diameter)`, omitting the largest tree in the `TreeDiams` data.
```r
TreeDiams2 <- TreeDiams |> filter(Height<max(Height)) |> mutate(LogDiameter=log(Diameter)) |> glimpse()
```

```
Rows: 11
Columns: 3
$ Diameter    <dbl> 0.9, 1.2, 2.9, 3.1, 3.3, 3.9, 4.3, 6.2, 9.6, 12.6, 16.1
$ Height      <dbl> 18.0, 26.0, 32.0, 36.0, 44.5, 35.6, 40.5, 57.5, 67.3, 84.0…
$ LogDiameter <dbl> -0.1053605, 0.1823216, 1.0647107, 1.1314021, 1.1939225, 1.…
```
```r
TreeDiams2.lm1 = lm(Height~Diameter, data=TreeDiams2)
TreeDiams2.lm2 = lm(Height~LogDiameter, data=TreeDiams2)
```
Replot the scatter plot using the reduced data, with the fitted lines added.
```r
TreeDiams2 |> ggplot(aes(y=Height, x=Diameter)) +
  geom_point() +
  geom_smooth(method="lm") +
  geom_function(fun = function(x) coef(TreeDiams2.lm2)[1] + coef(TreeDiams2.lm2)[2]*log(x), lty=2)
```

```
`geom_smooth()` using formula 'y ~ x'
```
Which of the two regressions would you choose, and why?
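One way to ground your answer (a sketch of some checks you might use, not the only defensible approach) is to compare the two fits numerically and look at their residuals:

```r
# R-squared and AIC for the two reduced-data models
summary(TreeDiams2.lm1)$r.squared
summary(TreeDiams2.lm2)$r.squared
AIC(TreeDiams2.lm1, TreeDiams2.lm2)

# Residuals-vs-fitted plots: look for curvature or uneven spread
par(mfrow = c(1, 2))
plot(TreeDiams2.lm1, which = 1)
plot(TreeDiams2.lm2, which = 1)
```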