Practical Computing Exercise for Week 7: — Solutions

Aims of this practical exercise

In this exercise you will:

  • Compare links for the Gamma family.

predict patient survival in a Drug trial based on age and type of drug. Obtain the data using:

    data(Cancer, package="ELMER")
str(Cancer)
'data.frame':   48 obs. of  4 variables:
 $ StudyTime: int  1 1 2 3 4 4 5 5 8 8 ...
 $ Died     : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 1 ...
 $ Drug     : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ Age      : int  61 65 59 52 56 67 63 58 56 58 ...

Fit a GLM and compare the use of link=log to link=identity.

The solution

Starting with the identity link and a full model:

Cancer.glm1 = glm(StudyTime~Drug*Age, data=Cancer, family=Gamma(link="identity"))
anova(Cancer.glm1, test="Chisq")
Analysis of Deviance Table

Model: Gamma, link: identity

Response: StudyTime

Terms added sequentially (first to last)

         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                        47     27.922              
Drug      2   9.0142        45     18.908 3.603e-07 ***
Age       1   3.8141        44     15.094 0.0003951 ***
Drug:Age  2   0.1470        42     14.947 0.7851410    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

and removing unnecessary terms gives us the main effects only model:

Cancer.glm1a = glm(StudyTime~Drug+Age, data=Cancer, family=Gamma(link="identity"))
summary(Cancer.glm1a)

Call:
glm(formula = StudyTime ~ Drug + Age, family = Gamma(link = "identity"), 
    data = Cancer)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  38.9662     8.1924   4.756 2.15e-05 ***
Drug2         6.4730     2.1350   3.032 0.004064 ** 
Drug3        16.2034     3.8741   4.182 0.000135 ***
Age          -0.5370     0.1334  -4.025 0.000221 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Gamma family taken to be 0.2951212)

    Null deviance: 27.922  on 47  degrees of freedom
Residual deviance: 15.094  on 44  degrees of freedom
AIC: 328.39

Number of Fisher Scoring iterations: 7

Turning to the log link model:

Cancer.glm2 = glm(StudyTime~Drug*Age, data=Cancer, family=Gamma(link="log"))
anova(Cancer.glm2, test="Chisq")
Analysis of Deviance Table

Model: Gamma, link: log

Response: StudyTime

Terms added sequentially (first to last)

         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                        47     27.922              
Drug      2   9.0142        45     18.908 8.839e-07 ***
Age       1   2.7332        44     16.175  0.003645 ** 
Drug:Age  2   1.1493        42     15.025  0.169103    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

and removing unnecessary terms gives us the main effects only model:

Cancer.glm2a = glm(StudyTime~Drug+Age, data=Cancer, family=Gamma(link="log"))
anova(Cancer.glm2a, test="Chisq")
Analysis of Deviance Table

Model: Gamma, link: log

Response: StudyTime

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                    47     27.922              
Drug  2   9.0142        45     18.908 7.009e-07 ***
Age   1   2.7332        44     16.175  0.003373 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(Cancer.glm2a)

Call:
glm(formula = StudyTime ~ Drug + Age, family = Gamma(link = "log"), 
    data = Cancer)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.64604    0.83535   5.562 1.48e-06 ***
Drug2        0.57437    0.19695   2.916  0.00556 ** 
Drug3        1.05211    0.19773   5.321 3.32e-06 ***
Age         -0.04478    0.01473  -3.039  0.00398 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Gamma family taken to be 0.3180527)

    Null deviance: 27.922  on 47  degrees of freedom
Residual deviance: 16.175  on 44  degrees of freedom
AIC: 331.89

Number of Fisher Scoring iterations: 5

The AIC values could be compared, but they are practically the same. You really ought to choose the link based on the relationship between the numeric predictor and the response.