In this exercise you will:
predict patient survival time in a Drug trial based on age and type of drug. Obtain the data using:
'data.frame': 48 obs. of 4 variables:
$ StudyTime: int 1 1 2 3 4 4 5 5 8 8 ...
$ Died : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 1 ...
$ Drug : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
$ Age : int 61 65 59 52 56 67 63 58 56 58 ...
N.B. The StudyTime
is the response for this analysis.
The indicator of death is used if we advance this analysis to use
“survival analysis” techniques, which are beyond the scope of this
course.
Fit a GLM and compare the use of link=log to link=identity.
Starting with the identity link and a full model:
Cancer.glm1 = glm(StudyTime~Drug*Age, data=Cancer, family=Gamma(link="identity"))
anova(Cancer.glm1, test="Chisq")
Analysis of Deviance Table
Model: Gamma, link: identity
Response: StudyTime
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 47 27.922
Drug 2 9.0142 45 18.908 3.603e-07 ***
Age 1 3.8141 44 15.094 0.0003951 ***
Drug:Age 2 0.1470 42 14.947 0.7851410
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
and removing unnecessary terms gives us the main effects only model:
Cancer.glm1a = glm(StudyTime~Drug+Age, data=Cancer, family=Gamma(link="identity"))
summary(Cancer.glm1a)
Call:
glm(formula = StudyTime ~ Drug + Age, family = Gamma(link = "identity"),
data = Cancer)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.9662 8.1924 4.756 2.15e-05 ***
Drug2 6.4730 2.1350 3.032 0.004064 **
Drug3 16.2034 3.8741 4.182 0.000135 ***
Age -0.5370 0.1334 -4.025 0.000221 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Gamma family taken to be 0.2951212)
Null deviance: 27.922 on 47 degrees of freedom
Residual deviance: 15.094 on 44 degrees of freedom
AIC: 328.39
Number of Fisher Scoring iterations: 7
Turning to the log link model:
Cancer.glm2 = glm(StudyTime~Drug*Age, data=Cancer, family=Gamma(link="log"))
anova(Cancer.glm2, test="Chisq")
Analysis of Deviance Table
Model: Gamma, link: log
Response: StudyTime
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 47 27.922
Drug 2 9.0142 45 18.908 8.839e-07 ***
Age 1 2.7332 44 16.175 0.003645 **
Drug:Age 2 1.1493 42 15.025 0.169103
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
and removing unnecessary terms gives us the main effects only model:
Cancer.glm2a = glm(StudyTime~Drug+Age, data=Cancer, family=Gamma(link="log"))
anova(Cancer.glm2a, test="Chisq")
Analysis of Deviance Table
Model: Gamma, link: log
Response: StudyTime
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 47 27.922
Drug 2 9.0142 45 18.908 7.009e-07 ***
Age 1 2.7332 44 16.175 0.003373 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
glm(formula = StudyTime ~ Drug + Age, family = Gamma(link = "log"),
data = Cancer)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.64604 0.83535 5.562 1.48e-06 ***
Drug2 0.57437 0.19695 2.916 0.00556 **
Drug3 1.05211 0.19773 5.321 3.32e-06 ***
Age -0.04478 0.01473 -3.039 0.00398 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Gamma family taken to be 0.3180527)
Null deviance: 27.922 on 47 degrees of freedom
Residual deviance: 16.175 on 44 degrees of freedom
AIC: 331.89
Number of Fisher Scoring iterations: 5
The AIC values could be compared, but they are practically the same. You really ought to choose the link based on the relationship between the numeric predictor and the response.