Practical Computing Exercise for Week 3 :The Tree Volumes exercise Solutions

Aims of this practical exercise

In this exercise you will:

  • get some more practice using R markdown
  • fit a variety of models to the Tree Volumes data involving a selection of transformations.

Before you undertake this exercise…

You need to have installed R, RStudio, and the necessary packages for the course, including the ELMER package. See how to get set up for this course

Get the data

data(TreeVols, package="ELMER")
str(TreeVols)
'data.frame':   31 obs. of  3 variables:
 $ Volume: num  0.29 0.29 0.28 0.46 0.53 0.55 0.44 0.63 0.56 0.68 ...
 $ Height: num  21.3 19.8 19.2 21.9 24.6 25.2 20.1 24.3 22.8 24 ...
 $ Girth : int  210 218 223 266 271 274 279 281 284 287 ...

The Exercise

  1. Fit y to x1 and x2.
TreeVols.lm1<-lm(Volume~Height+Girth, data=TreeVols)
    summary(TreeVols.lm1)

Call:
lm(formula = Volume ~ Height + Girth, data = TreeVols)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.188055 -0.080665 -0.007786  0.054791  0.250009 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.6800875  0.2460665  -6.828 2.04e-07 ***
Height       0.0341887  0.0120645   2.834  0.00844 ** 
Girth        0.0051703  0.0002923  17.688  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1115 on 28 degrees of freedom
Multiple R-squared:  0.9458,    Adjusted R-squared:  0.942 
F-statistic: 244.5 on 2 and 28 DF,  p-value: < 2.2e-16
  1. Fit \(\ln y\) to \(\ln x_1\) and \(\ln x_2\). Is this model definitely better than the first model?
TreeVols.lm2 <-lm(log(Volume)~log(Height)+log(Girth), data=TreeVols)
    summary(TreeVols.lm2)

Call:
lm(formula = log(Volume) ~ log(Height) + log(Girth), data = TreeVols)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.169079 -0.051906  0.003382  0.063355  0.124822 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -15.34678    0.55008  -27.90  < 2e-16 ***
log(Height)   1.13587    0.19926    5.70 4.11e-06 ***
log(Girth)    1.98365    0.07247   27.37  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.08071 on 28 degrees of freedom
Multiple R-squared:  0.9782,    Adjusted R-squared:  0.9767 
F-statistic: 629.3 on 2 and 28 DF,  p-value: < 2.2e-16

The closeness of the predicted values to the actual observations is high for both models (R2) so we should probably be looking at the residuals (left as an additional exercise) to make a final decision. The AIC for the transformed model is lower.

AIC(TreeVols.lm1)
[1] -43.19093
AIC(TreeVols.lm2)
[1] -63.23121
  1. Fit \(y^*(\lambda)\) to \(x_1\) and \(x_2\). Estimate the value of \(\lambda\) by maximum likelihood. Is this model definitely better than the first model?

Hint: Use the boxcox() function in the MASS package.

library(MASS)
TreeVols.bc = boxcox(Volume ~ Height + Girth, data =TreeVols)

# to get exact value of lambda:
Lambda = TreeVols.bc$"x"[which.max(TreeVols.bc$"y")]
Lambda
[1] 0.3030303
TreeVols.lm3= lm(Volume^Lambda ~ Height + Girth, data =TreeVols)
summary(TreeVols.lm3)

Call:
lm(formula = Volume^Lambda ~ Height + Girth, data = TreeVols)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.044891 -0.015377 -0.001617  0.018610  0.036496 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 5.369e-02  5.067e-02   1.060    0.298    
Height      1.359e-02  2.484e-03   5.471 7.67e-06 ***
Girth       1.659e-03  6.019e-05  27.561  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.02296 on 28 degrees of freedom
Multiple R-squared:  0.9779,    Adjusted R-squared:  0.9763 
F-statistic: 619.3 on 2 and 28 DF,  p-value: < 2.2e-16
AIC(TreeVols.lm3)
[1] -141.1733

The point here is to be able to find the right value of \(\lambda\). It might be sensible to use a smarter value that is easier to interpret such as 1/3, but I’ve used the optimal value for thoroughness.

  1. Girth is much easier to measure than height. Is there evidence that the height variable is necessary?

From all models fitted here, Height is necessary. We might find otherwise if we put more effort into search for a better model, but given my comments below, I suspect not.

Finally, which model would be expected to fit best, based on a physical model?

If a tree is a cylinder, I’d expect the girth (linearly related to diameter) squared to be multiplied by the height somehow.