In this exercise you will:
predict patient survival in a Drug trial based on age and type of drug. Obtain the data using:
'data.frame': 48 obs. of 4 variables:
$ StudyTime: int 1 1 2 3 4 4 5 5 8 8 ...
$ Died : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 1 ...
$ Drug : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
$ Age : int 61 65 59 52 56 67 63 58 56 58 ...
Fit a GLM and compare the use of link=log to link=identity.
Starting with the identity link and a full model:
Cancer.glm1 = glm(StudyTime~Drug*Age, data=Cancer, family=Gamma(link="identity"))
anova(Cancer.glm1, test="Chisq")
Analysis of Deviance Table
Model: Gamma, link: identity
Response: StudyTime
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 47 27.922
Drug 2 9.0142 45 18.908 3.603e-07 ***
Age 1 3.8141 44 15.094 0.0003951 ***
Drug:Age 2 0.1470 42 14.947 0.7851410
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
and removing unnecessary terms gives us the main effects only model:
Cancer.glm1a = glm(StudyTime~Drug+Age, data=Cancer, family=Gamma(link="identity"))
summary(Cancer.glm1a)
Call:
glm(formula = StudyTime ~ Drug + Age, family = Gamma(link = "identity"),
data = Cancer)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.9662 8.1924 4.756 2.15e-05 ***
Drug2 6.4730 2.1350 3.032 0.004064 **
Drug3 16.2034 3.8741 4.182 0.000135 ***
Age -0.5370 0.1334 -4.025 0.000221 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Gamma family taken to be 0.2951212)
Null deviance: 27.922 on 47 degrees of freedom
Residual deviance: 15.094 on 44 degrees of freedom
AIC: 328.39
Number of Fisher Scoring iterations: 7
Turning to the log link model:
Cancer.glm2 = glm(StudyTime~Drug*Age, data=Cancer, family=Gamma(link="log"))
anova(Cancer.glm2, test="Chisq")
Analysis of Deviance Table
Model: Gamma, link: log
Response: StudyTime
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 47 27.922
Drug 2 9.0142 45 18.908 8.839e-07 ***
Age 1 2.7332 44 16.175 0.003645 **
Drug:Age 2 1.1493 42 15.025 0.169103
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
and removing unnecessary terms gives us the main effects only model:
Cancer.glm2a = glm(StudyTime~Drug+Age, data=Cancer, family=Gamma(link="log"))
anova(Cancer.glm2a, test="Chisq")
Analysis of Deviance Table
Model: Gamma, link: log
Response: StudyTime
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 47 27.922
Drug 2 9.0142 45 18.908 7.009e-07 ***
Age 1 2.7332 44 16.175 0.003373 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
glm(formula = StudyTime ~ Drug + Age, family = Gamma(link = "log"),
data = Cancer)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.64604 0.83535 5.562 1.48e-06 ***
Drug2 0.57437 0.19695 2.916 0.00556 **
Drug3 1.05211 0.19773 5.321 3.32e-06 ***
Age -0.04478 0.01473 -3.039 0.00398 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Gamma family taken to be 0.3180527)
Null deviance: 27.922 on 47 degrees of freedom
Residual deviance: 16.175 on 44 degrees of freedom
AIC: 331.89
Number of Fisher Scoring iterations: 5
The AIC values could be compared, but they are practically the same. You really ought to choose the link based on the relationship between the numeric predictor and the response.