In this exercise you will:
You need to have installed R, RStudio, and the necessary packages for
the course, including the ELMER
package. See how to
get set up for this course
data(TreeVols, package="ELMER")
str(TreeVols)
'data.frame': 31 obs. of 3 variables:
$ Volume: num 0.29 0.29 0.28 0.46 0.53 0.55 0.44 0.63 0.56 0.68 ...
$ Height: num 21.3 19.8 19.2 21.9 24.6 25.2 20.1 24.3 22.8 24 ...
$ Girth : int 210 218 223 266 271 274 279 281 284 287 ...
<-lm(Volume~Height+Girth, data=TreeVols)
TreeVols.lm1summary(TreeVols.lm1)
Call:
lm(formula = Volume ~ Height + Girth, data = TreeVols)
Residuals:
Min 1Q Median 3Q Max
-0.188055 -0.080665 -0.007786 0.054791 0.250009
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.6800875 0.2460665 -6.828 2.04e-07 ***
Height 0.0341887 0.0120645 2.834 0.00844 **
Girth 0.0051703 0.0002923 17.688 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1115 on 28 degrees of freedom
Multiple R-squared: 0.9458, Adjusted R-squared: 0.942
F-statistic: 244.5 on 2 and 28 DF, p-value: < 2.2e-16
<-lm(log(Volume)~log(Height)+log(Girth), data=TreeVols)
TreeVols.lm2 summary(TreeVols.lm2)
Call:
lm(formula = log(Volume) ~ log(Height) + log(Girth), data = TreeVols)
Residuals:
Min 1Q Median 3Q Max
-0.169079 -0.051906 0.003382 0.063355 0.124822
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -15.34678 0.55008 -27.90 < 2e-16 ***
log(Height) 1.13587 0.19926 5.70 4.11e-06 ***
log(Girth) 1.98365 0.07247 27.37 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.08071 on 28 degrees of freedom
Multiple R-squared: 0.9782, Adjusted R-squared: 0.9767
F-statistic: 629.3 on 2 and 28 DF, p-value: < 2.2e-16
The closeness of the predicted values to the actual observations is high for both models (R2) so we should probably be looking at the residuals (left as an additional exercise) to make a final decision. The AIC for the transformed model is lower.
AIC(TreeVols.lm1)
[1] -43.19093
AIC(TreeVols.lm2)
[1] -63.23121
Hint: Use the boxcox()
function in the MASS
package.
library(MASS)
= boxcox(Volume ~ Height + Girth, data =TreeVols) TreeVols.bc
# to get exact value of lambda:
= TreeVols.bc$"x"[which.max(TreeVols.bc$"y")]
Lambda Lambda
[1] 0.3030303
= lm(Volume^Lambda ~ Height + Girth, data =TreeVols)
TreeVols.lm3summary(TreeVols.lm3)
Call:
lm(formula = Volume^Lambda ~ Height + Girth, data = TreeVols)
Residuals:
Min 1Q Median 3Q Max
-0.044891 -0.015377 -0.001617 0.018610 0.036496
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.369e-02 5.067e-02 1.060 0.298
Height 1.359e-02 2.484e-03 5.471 7.67e-06 ***
Girth 1.659e-03 6.019e-05 27.561 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.02296 on 28 degrees of freedom
Multiple R-squared: 0.9779, Adjusted R-squared: 0.9763
F-statistic: 619.3 on 2 and 28 DF, p-value: < 2.2e-16
AIC(TreeVols.lm3)
[1] -141.1733
The point here is to be able to find the right value of \(\lambda\). It might be sensible to use a smarter value that is easier to interpret such as 1/3, but I’ve used the optimal value for thoroughness.
From all models fitted here, Height
is necessary. We
might find otherwise if we put more effort into search for a better
model, but given my comments below, I suspect not.
Finally, which model would be expected to fit best, based on a physical model?
If a tree is a cylinder, I’d expect the girth (linearly related to diameter) squared to be multiplied by the height somehow.