Practical Computing Exercise for Week 3: The quine Exercise: Absenteeism from School in Rural New South Wales (Solutions)

Aims of this practical exercise

In this exercise you will:

  • get some practice using R Markdown
  • fit a range of models to the quine data, using different transformations of the response.

Before you undertake this exercise…

You need to have installed R, RStudio, and the necessary packages for the course, including the ELMER package. See the instructions on how to get set up for this course.

Get the data

The quine data are in the MASS package. Look at its help page, ?quine, for a description.

data(quine, package="MASS")
str(quine)
'data.frame':   146 obs. of  5 variables:
 $ Eth : Factor w/ 2 levels "A","N": 1 1 1 1 1 1 1 1 1 1 ...
 $ Sex : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
 $ Age : Factor w/ 4 levels "F0","F1","F2",..: 1 1 1 1 1 1 1 1 2 2 ...
 $ Lrn : Factor w/ 2 levels "AL","SL": 2 2 2 1 1 1 1 1 2 2 ...
 $ Days: int  2 11 14 5 5 13 20 22 6 6 ...

The problem

There are many different approaches to finding a transformation. One major problem with the log transformation is that it cannot handle response values of zero, and the response here, Days, includes zeros. A common tweak is to add a small increment to the zero values only; another is to add a constant to all response values. The logtrans() function in the MASS package helps find a suitable \(\alpha\) for the transformed response \(y^\prime=\log(y+\alpha)\) by computing the profile log-likelihood of \(\alpha\). We will investigate the benefits of this transformation using the example from its help page.
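The idea behind logtrans() can be sketched by hand. The code below is a minimal sketch, not the MASS implementation: it fits a normal linear model to \(\log(y+\alpha)\) for each \(\alpha\) on a grid and computes, up to an additive constant, the profile log-likelihood \(-\tfrac{n}{2}\log \mathrm{RSS}(\alpha) - \sum_i \log(y_i+\alpha)\). The second term is the log-Jacobian of the transformation, which is what makes values for different \(\alpha\) comparable. The helper name profile_loglik is hypothetical. Its curve should have the same shape as the one logtrans() plots, although we have not checked that the two agree exactly.

```r
library(MASS)   # provides the quine data
data(quine, package = "MASS")

y <- quine$Days
sum(y == 0)     # number of children with zero days absent
min(log(y))     # log(0) is -Inf, so a plain log transform breaks down

# Full-interaction design matrix, as in example(logtrans)
X <- model.matrix(~ Age * Sex * Eth * Lrn, data = quine)
qrX <- qr(X)    # qr.resid() below respects the numerical rank of X
n <- length(y)

# Hypothetical helper: profile log-likelihood of alpha, up to a constant,
#   -n/2 * log(RSS(alpha)) - sum(log(y + alpha))
# The final term is the log-Jacobian of y -> log(y + alpha).
profile_loglik <- function(alpha) {
  z <- log(y + alpha)
  rss <- sum(qr.resid(qrX, z)^2)
  -n / 2 * log(rss) - sum(z)
}

alpha <- seq(0.75, 6.5, length.out = 20)
ll <- sapply(alpha, profile_loglik)
alpha[which.max(ll)]   # grid value with the largest log-likelihood
```

Working with the QR decomposition of a single design matrix means each \(\alpha\) only costs one residual computation, rather than a full refit with lm().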

library(MASS)

Attaching package: 'MASS'
The following object is masked from 'package:dplyr':

    select

example(logtrans)

Here is the code from the Examples section of the help page, for convenience:

logtrans(Days ~ Age*Sex*Eth*Lrn, data = quine,
         alpha = seq(0.75, 6.5, len=20))
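Rather than reading the preferred \(\alpha\) off the plot, the same call can return the computed values. This sketch assumes, as the logtrans() help page describes, that the result is a list with components x (the \(\alpha\) values) and y (the log-likelihood values); plotit = FALSE suppresses the plot. The finer grid of 200 points and the chi-squared cut-off for an approximate 95% likelihood-ratio interval are our own choices, not part of the MASS example, and the names alpha_hat and keep are illustrative.

```r
library(MASS)
data(quine, package = "MASS")

# With plotit = FALSE nothing is drawn; the result is a list with
# x (the alpha values) and y (the profile log-likelihood values)
lt <- logtrans(Days ~ Age * Sex * Eth * Lrn, data = quine,
               alpha = seq(0.75, 6.5, length.out = 200), plotit = FALSE)

# Grid value of alpha with the largest log-likelihood
alpha_hat <- lt$x[which.max(lt$y)]
alpha_hat

# Approximate 95% likelihood-ratio interval for alpha: grid values whose
# log-likelihood is within qchisq(0.95, 1) / 2 of the maximum
keep <- lt$y >= max(lt$y) - qchisq(0.95, df = 1) / 2
range(lt$x[keep])
```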

The question

Q: Confirm that this transformation is sensible under the Box-Cox paradigm. That is, make a transformed response variable (\(y+\alpha\)) using the preferred \(\alpha\) and check that the Box-Cox methodology does suggest the log transformation is appropriate.

A: Judging by the logtrans() plot, \(\alpha = 2.5\) is about right. You can create the shifted response inside the model formula (as below) or, if you prefer, create a new variable first.

boxcox((Days+2.5) ~ Age*Sex*Eth*Lrn, data = quine)
Figure: Box-Cox transformation evaluation for the multiplicative model fitted to the transformed response

A: The 95% confidence interval for \(\lambda\) includes zero, so the log transformation is a sensible choice. Note that the interval includes neither \(\lambda = 1\) (no transformation) nor \(\lambda = 0.5\) (the commonly used square-root transformation).
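The interval can also be checked numerically, and this is a natural place to try the "new variable" route. The sketch below stores the shifted response as Days2.5 (an illustrative name), asks boxcox() for its values with plotit = FALSE, and computes an approximate 95% interval using the usual likelihood-ratio cut-off. We assume this cut-off matches the 95% line boxcox() draws on its plot, and that the profile curve is unimodal so the interval is a single range; the lambda grid is also our own choice.

```r
library(MASS)
data(quine, package = "MASS")

# Store the shifted response as a new variable (illustrative name Days2.5)
quine$Days2.5 <- quine$Days + 2.5

# plotit = FALSE returns list(x = lambda values, y = profile log-likelihood)
bc <- boxcox(Days2.5 ~ Age * Sex * Eth * Lrn, data = quine,
             lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambda values within qchisq(0.95, 1) / 2 of the
# maximum log-likelihood (assumes the profile curve is unimodal)
cutoff <- max(bc$y) - qchisq(0.95, df = 1) / 2
ci <- range(bc$x[bc$y >= cutoff])
ci

# Which of the usual transformations lie inside the interval?
inside <- function(value) ci[1] <= value && value <= ci[2]
c(log = inside(0), sqrt = inside(0.5), none = inside(1))
```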

Q: The model used in this example is multiplicative. Determine whether the suggested transformation is still appropriate if an additive model is used instead. That is, remove all interactions and check that the choice of \(\alpha\) remains appropriate.

logtrans(Days ~ Age+Sex+Eth+Lrn, data = quine,
         alpha = seq(0.75, 6.5, len=20))

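A: The confidence interval for \(\alpha\) is wide enough to include the value 2.5 chosen above, so \(\alpha = 2.5\) is a reasonable choice for both models. Using the same \(\alpha\) also puts the multiplicative and additive models on the same response scale, which makes them directly comparable.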
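Both parts of that answer can be checked in code. The sketch below reads an approximate 95% interval for \(\alpha\) under the additive model from logtrans() (assuming, as above, that plotit = FALSE returns a list with x and y), then fits both models to the common response \(\log(\text{Days}+2.5)\). Because the additive model is nested within the multiplicative one, an F test via anova() is one way to compare them; that test is our illustration rather than a step the exercise prescribes, and the names ci_alpha, fit_mult and fit_add are hypothetical.

```r
library(MASS)
data(quine, package = "MASS")

# Profile log-likelihood for alpha under the additive (main-effects) model
lt_add <- logtrans(Days ~ Age + Sex + Eth + Lrn, data = quine,
                   alpha = seq(0.75, 6.5, length.out = 200), plotit = FALSE)

# Approximate 95% interval for alpha, restricted to the grid above
keep <- lt_add$y >= max(lt_add$y) - qchisq(0.95, df = 1) / 2
ci_alpha <- range(lt_add$x[keep])
ci_alpha
ci_alpha[1] <= 2.5 && 2.5 <= ci_alpha[2]   # is alpha = 2.5 inside?

# With the same alpha, both models share the response log(Days + 2.5),
# and the additive model is nested within the multiplicative one
fit_mult <- lm(log(Days + 2.5) ~ Age * Sex * Eth * Lrn, data = quine)
fit_add  <- lm(log(Days + 2.5) ~ Age + Sex + Eth + Lrn, data = quine)
anova(fit_add, fit_mult)   # F test of the dropped interaction terms
```

If the two models used different values of \(\alpha\), their responses would differ and this nested-model comparison would no longer be valid.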