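In this exercise you will use logtrans() to choose the offset for a log transformation of a response that contains zeros, and check that choice with boxcox().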
You need to have installed R, RStudio, and the necessary packages for
the course, including the ELMER package. See how to get set up for this course.
The quine data is in the MASS package. Look at its help page for a description using ?quine.
data(quine, package="MASS")
str(quine)
'data.frame': 146 obs. of 5 variables:
$ Eth : Factor w/ 2 levels "A","N": 1 1 1 1 1 1 1 1 1 1 ...
$ Sex : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
$ Age : Factor w/ 4 levels "F0","F1","F2",..: 1 1 1 1 1 1 1 1 2 2 ...
$ Lrn : Factor w/ 2 levels "AL","SL": 2 2 2 1 1 1 1 1 2 2 ...
$ Days: int 2 11 14 5 5 13 20 22 6 6 ...
There are many different approaches for finding a transformation. One
major problem with the log transformation is that it cannot handle
response values of zero. A common tweak is to add a small increment to
the zero values in the data; another is to add a constant to all
response values. The logtrans() function in the MASS package will help
find a suitable \(\alpha\) for the expression \(y^\prime=\log(y+\alpha)\) for the
transformed response variable. We will investigate the benefits of this
transformation using the example for this function.
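The problem with zeros can be seen directly in the quine data. The following is a minimal sketch; the offset `alpha <- 1` is an arbitrary value chosen for illustration only, not a recommended choice.

```r
# Load the data without attaching MASS
data(quine, package = "MASS")

# Count the zero responses that a plain log transformation cannot handle
n_zero <- sum(quine$Days == 0)

# log(0) is -Inf in R, so the untransformed log has infinite values
log_raw <- log(quine$Days)
any(is.infinite(log_raw))

# Adding a constant to every response keeps all values finite.
# alpha = 1 is an illustrative assumption, not a tuned value.
alpha <- 1
log_shifted <- log(quine$Days + alpha)
all(is.finite(log_shifted))
```

Any positive constant removes the infinite values; which constant to use is the question logtrans() addresses.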
library(MASS)
Attaching package: 'MASS'
The following object is masked from 'package:dplyr':
select
example(logtrans)
Here is the text from the help page for your convenience…
logtrans(Days ~ Age*Sex*Eth*Lrn, data = quine,
alpha = seq(0.75, 6.5, len=20))
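The value of \(\alpha\) can also be read off numerically rather than from the plot. This sketch assumes the documented return value of logtrans(), a list with components `x` (the \(\alpha\) values) and `y` (the profile log-likelihood), and sets `plotit = FALSE` so that only the supplied grid is evaluated. The names `prof` and `alpha_hat` are introduced here for illustration.

```r
library(MASS)

# Evaluate the profile log-likelihood on the same grid as the help page example,
# without drawing the plot
prof <- logtrans(Days ~ Age*Sex*Eth*Lrn, data = quine,
                 alpha = seq(0.75, 6.5, length.out = 20),
                 plotit = FALSE)

# Grid value of alpha with the highest profile log-likelihood
alpha_hat <- prof$x[which.max(prof$y)]
alpha_hat
```

Because the grid has only 20 points, `alpha_hat` is accurate only to the grid spacing; a rounded value nearby is usually adequate in practice.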
Q: Confirm that this transformation is sensible under the Box-Cox paradigm. That is, make a transformed response variable (\(y+\alpha\)) using the preferred \(\alpha\) and check that the Box-Cox methodology does suggest the log transformation is appropriate.
A: Setting alpha=2.5 looks close enough to the optimum. Either transform the response inside the model formula (as below) or create a new variable first if you prefer.
boxcox((Days+2.5) ~ Age*Sex*Eth*Lrn, data = quine)
Box-Cox transformation evaluation for the multiplicative model fitted to the transformed response
A: The confidence interval includes \(\lambda=0\), so the log transformation is supported. N.B. the interval excludes both \(\lambda=1\) (no transformation) and \(\lambda=0.5\) (the commonly used square root transformation).
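The interval can be computed as well as read from the plot. The sketch below assumes the usual likelihood-ratio construction: keep the values of \(\lambda\) whose profile log-likelihood lies within `qchisq(0.95, 1)/2` of the maximum. The finer `lambda` grid and the names `bc`, `cutoff` and `ci` are choices made here for illustration.

```r
library(MASS)

# Profile log-likelihood over lambda for the shifted response, without plotting
bc <- boxcox((Days + 2.5) ~ Age*Sex*Eth*Lrn, data = quine,
             lambda = seq(-1, 1.5, by = 0.01), plotit = FALSE)

# Approximate 95% interval: lambda values within the chi-squared cutoff of the maximum
cutoff <- max(bc$y) - qchisq(0.95, df = 1) / 2
ci <- range(bc$x[bc$y > cutoff])
ci

# Which candidate transformations fall inside the interval?
candidates <- c(log = 0, sqrt = 0.5, none = 1)
candidates >= ci[1] & candidates <= ci[2]
```

The last line returns a named logical vector, which makes the comparison with the log, square root and identity transformations explicit.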
Q: The model used in this example is multiplicative. Determine if the transformation suggested is appropriate if an additive model is to be used instead. That is, remove all interactions and check that the selection of \(\alpha\) remains appropriate.
logtrans(Days ~ Age+Sex+Eth+Lrn, data = quine,
alpha = seq(0.75, 6.5, len=20))
A: The confidence interval is wide enough to include the value 2.5 from above, so this becomes a reasonable option for comparing the multiplicative and additive models.
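The same check can be made numerically for the additive model. This is a sketch under the assumption that an approximate 95% interval for \(\alpha\) is formed with the same chi-squared cutoff on the profile log-likelihood; `prof_add`, `alpha_in` and `inside` are names introduced here.

```r
library(MASS)

# Profile log-likelihood for alpha under the additive (main effects only) model
prof_add <- logtrans(Days ~ Age + Sex + Eth + Lrn, data = quine,
                     alpha = seq(0.75, 6.5, length.out = 20),
                     plotit = FALSE)

# Grid values of alpha inside the approximate 95% interval
cutoff <- max(prof_add$y) - qchisq(0.95, df = 1) / 2
alpha_in <- prof_add$x[prof_add$y > cutoff]
range(alpha_in)

# Does the value chosen for the multiplicative model fall inside that range?
inside <- 2.5 >= min(alpha_in) && 2.5 <= max(alpha_in)
inside
```

The grid is coarse and may not cover the whole interval, so the reported range is only a lower bound on its width.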