Practical Computing Exercise for Week 7: — Solutions

Aims of this practical exercise

In this exercise you will:

  • Compare links for the Gamma family.

predict patient survival time in a Drug trial based on age and type of drug. Obtain the data using:

    data(Cancer, package="ELMER")
str(Cancer)
'data.frame':   48 obs. of  4 variables:
 $ StudyTime: int  1 1 2 3 4 4 5 5 8 8 ...
 $ Died     : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 1 ...
 $ Drug     : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ Age      : int  61 65 59 52 56 67 63 58 56 58 ...

N.B. The StudyTime is the response for this analysis. The indicator of death is used if we advance this analysis to use “survival analysis” techniques, which are beyond the scope of this course.

Fit a GLM and compare the use of link=log to link=identity.

The solution

Starting with the identity link and a full model:

Cancer.glm1 = glm(StudyTime~Drug*Age, data=Cancer, family=Gamma(link="identity"))
anova(Cancer.glm1, test="Chisq")
Analysis of Deviance Table

Model: Gamma, link: identity

Response: StudyTime

Terms added sequentially (first to last)

         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                        47     27.922              
Drug      2   9.0142        45     18.908 3.603e-07 ***
Age       1   3.8141        44     15.094 0.0003951 ***
Drug:Age  2   0.1470        42     14.947 0.7851410    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

and removing unnecessary terms gives us the main effects only model:

Cancer.glm1a = glm(StudyTime~Drug+Age, data=Cancer, family=Gamma(link="identity"))
summary(Cancer.glm1a)

Call:
glm(formula = StudyTime ~ Drug + Age, family = Gamma(link = "identity"), 
    data = Cancer)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  38.9662     8.1924   4.756 2.15e-05 ***
Drug2         6.4730     2.1350   3.032 0.004064 ** 
Drug3        16.2034     3.8741   4.182 0.000135 ***
Age          -0.5370     0.1334  -4.025 0.000221 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Gamma family taken to be 0.2951212)

    Null deviance: 27.922  on 47  degrees of freedom
Residual deviance: 15.094  on 44  degrees of freedom
AIC: 328.39

Number of Fisher Scoring iterations: 7

Turning to the log link model:

Cancer.glm2 = glm(StudyTime~Drug*Age, data=Cancer, family=Gamma(link="log"))
anova(Cancer.glm2, test="Chisq")
Analysis of Deviance Table

Model: Gamma, link: log

Response: StudyTime

Terms added sequentially (first to last)

         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                        47     27.922              
Drug      2   9.0142        45     18.908 8.839e-07 ***
Age       1   2.7332        44     16.175  0.003645 ** 
Drug:Age  2   1.1493        42     15.025  0.169103    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

and removing unnecessary terms gives us the main effects only model:

Cancer.glm2a = glm(StudyTime~Drug+Age, data=Cancer, family=Gamma(link="log"))
anova(Cancer.glm2a, test="Chisq")
Analysis of Deviance Table

Model: Gamma, link: log

Response: StudyTime

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                    47     27.922              
Drug  2   9.0142        45     18.908 7.009e-07 ***
Age   1   2.7332        44     16.175  0.003373 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(Cancer.glm2a)

Call:
glm(formula = StudyTime ~ Drug + Age, family = Gamma(link = "log"), 
    data = Cancer)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.64604    0.83535   5.562 1.48e-06 ***
Drug2        0.57437    0.19695   2.916  0.00556 ** 
Drug3        1.05211    0.19773   5.321 3.32e-06 ***
Age         -0.04478    0.01473  -3.039  0.00398 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Gamma family taken to be 0.3180527)

    Null deviance: 27.922  on 47  degrees of freedom
Residual deviance: 16.175  on 44  degrees of freedom
AIC: 331.89

Number of Fisher Scoring iterations: 5

The AIC values could be compared, but they are practically the same. You really ought to choose the link based on the relationship between the numeric predictor and the response.