In this exercise you will investigate the car insurance claims data used by McCullagh & Nelder (1989). Obtain the data using:
Your task is to model the average claims (AveClaim) for damage to an owner’s car on the basis of the policy holder’s age group (AgeGroup, levels 1-8), the vehicle age group (VehicleAge, levels 1-4) and the car group (CarGroup, levels 1-4). Be sure to check whether any of the two-way interactions are significant.
Hint: use glm() with family = Gamma(link = "inverse").
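With the inverse link from the hint, the starting model (all main effects plus all two-way interactions) can be written as below. Here mu_ijk is the mean of AveClaim for AgeGroup i = 1..8, CarGroup j = 1..4 and VehicleAge k = 1..4; the symbols are introduced for illustration only. Under R's default treatment contrasts, every effect involving a first level is fixed at zero, so counting the free parameters gives a quick check on the degrees of freedom reported in the analysis-of-deviance tables.

```latex
\frac{1}{\mu_{ijk}} = \beta_0 + \alpha_i + \gamma_j + \delta_k
  + (\alpha\gamma)_{ij} + (\alpha\delta)_{ik} + (\gamma\delta)_{jk}

p = 1 + 7 + 3 + 3 + (7 \times 3) + (7 \times 3) + (3 \times 3) = 65,
\qquad 123 - 65 = 58 \ \text{residual df}
```

The null model has 122 df, so 123 of the 128 rows are used (5 are dropped for missing values), and 58 residual df is what the Claims.glm1 table reports. The same count for the reduced model Claims.glm1a, 1 + 3 + 7 + 3 + 21 = 35 parameters, gives 123 - 35 = 88 residual df.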
First, check that the variables are stored as factors:
'data.frame': 128 obs. of 8 variables:
$ AgeGroup : Factor w/ 8 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ CarGroup : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 2 2 2 2 3 3 ...
$ VehicleAge: Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ...
$ AveClaim : int 289 282 133 160 372 249 288 11 189 288 ...
$ Freq : int 8 8 4 1 10 28 1 1 9 13 ...
$ pacg : int 1 1 1 1 2 2 2 2 3 3 ...
$ pava : int 1 2 3 4 1 2 3 4 1 2 ...
$ cgva : int 1 2 3 4 2 4 6 8 3 6 ...
The three grouping variables of interest (AgeGroup, CarGroup and VehicleAge) are already stored as factors, so no conversion is needed.
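Had the grouping variables been read in as integers, glm() would treat each one as a numeric covariate with a single slope rather than fitting one coefficient per level. A minimal sketch of the conversion, on a hypothetical toy data frame (`toy` and `group_cols` are illustrative names, not part of the exercise):

```r
# Hypothetical toy data with integer-coded groups (not the Claims data)
toy <- data.frame(AgeGroup   = c(1, 2, 1, 2),
                  CarGroup   = c(1, 1, 2, 2),
                  VehicleAge = c(1, 2, 3, 4),
                  AveClaim   = c(250, 300, 180, 220))
group_cols <- c("AgeGroup", "CarGroup", "VehicleAge")
# Convert each grouping column to a factor; the response stays numeric
toy[group_cols] <- lapply(toy[group_cols], factor)
str(toy)
```

The same lapply() line, with `toy` replaced by `Claims`, would apply to the real data if its columns were not already factors.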
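# Main effects plus all two-way interactions (three-way term removed)
Claims.glm1 = glm(AveClaim ~ AgeGroup*CarGroup*VehicleAge - AgeGroup:CarGroup:VehicleAge, data = Claims, family = Gamma(link = "inverse"))
# Sequential (type I) analysis of deviance: terms are added in the order listed
anova(Claims.glm1, test = "Chisq")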
Analysis of Deviance Table
Model: Gamma, link: inverse
Response: AveClaim
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 122 27.8394
AgeGroup 7 2.5991 115 25.2403 1.320e-05 ***
CarGroup 3 5.5329 112 19.7074 6.915e-16 ***
VehicleAge 3 8.1945 109 11.5130 < 2.2e-16 ***
AgeGroup:CarGroup 21 1.0994 88 10.4135 0.8404234
AgeGroup:VehicleAge 21 3.8879 67 6.5257 0.0002051 ***
CarGroup:VehicleAge 9 0.6067 58 5.9190 0.5260723
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
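The formula for Claims.glm1 subtracts the three-way interaction from the full factorial, which leaves exactly the main effects and the three two-way interactions. A small check with base R's terms(), which expands a formula without needing any data, shows that the more compact `(AgeGroup + CarGroup + VehicleAge)^2` specifies the same terms in the same order:

```r
# Term labels for the formula used for Claims.glm1 (three-way interaction removed)
full_minus_3way <- attr(terms(AveClaim ~ AgeGroup * CarGroup * VehicleAge -
                                AgeGroup:CarGroup:VehicleAge), "term.labels")
# Compact alternative: all main effects and all two-way interactions
all_2way <- attr(terms(AveClaim ~ (AgeGroup + CarGroup + VehicleAge)^2), "term.labels")
print(full_minus_3way)
identical(full_minus_3way, all_2way)
```

Because the term order is identical, the sequential table above would be unchanged by switching to the `^2` form.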
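Only the interaction between the two age variables (policy holder age and vehicle age) is significant (p = 0.0002); AgeGroup:CarGroup (p = 0.84) and CarGroup:VehicleAge (p = 0.53) are not. The model can therefore be reduced to: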
# Keep CarGroup as a main effect and only the AgeGroup:VehicleAge interaction
Claims.glm1a = glm(AveClaim ~ CarGroup + AgeGroup*VehicleAge, data = Claims, family = Gamma(link = "inverse"))
anova(Claims.glm1a, test = "Chisq")
Analysis of Deviance Table
Model: Gamma, link: inverse
Response: AveClaim
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 122 27.839
CarGroup 3 5.1916 119 22.648 < 2.2e-16 ***
AgeGroup 7 2.9403 112 19.707 2.479e-07 ***
VehicleAge 3 8.1945 109 11.513 < 2.2e-16 ***
AgeGroup:VehicleAge 21 3.9020 88 7.611 2.597e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
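The sequential tables test each interaction on its own. The two dropped interactions could also be tested jointly by comparing the nested fits directly, for example with anova(Claims.glm1a, Claims.glm1, test = "F"); the F test is the usual choice when, as here, the Gamma dispersion is estimated. Since the Claims data are not reproduced here, the sketch below shows the mechanics on simulated Gamma data; the factors `A` and `B`, the mean structure and the shape value are all assumptions made for illustration.

```r
set.seed(1)
# Simulated design: two factors, 5 replicates per cell (hypothetical, not Claims)
d <- expand.grid(A = factor(1:4), B = factor(1:3), rep = 1:5)
# Mean is linear on the inverse scale in A only, so there is no true A:B interaction
mu <- 1 / (0.004 + 0.001 * as.integer(d$A))
shape <- 10
d$y <- rgamma(nrow(d), shape = shape, rate = shape / mu)  # E[y] = mu
m_main <- glm(y ~ A + B, data = d, family = Gamma(link = "inverse"))
m_int  <- glm(y ~ A * B, data = d, family = Gamma(link = "inverse"))
# Joint F test of the 3 x 2 = 6 interaction parameters
cmp <- anova(m_main, m_int, test = "F")
print(cmp)
```

For the exercise, the analogous comparison would test the 21 + 9 = 30 interaction parameters removed when moving from Claims.glm1 to Claims.glm1a.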
The coefficient estimates of the reduced model are:
summary(Claims.glm1a)
Call:
glm(formula = AveClaim ~ CarGroup + AgeGroup * VehicleAge, family = Gamma(link = "inverse"),
data = Claims)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.340e-03 3.790e-04 8.814 9.91e-14 ***
CarGroup2 -2.753e-04 3.169e-04 -0.869 0.387252
CarGroup3 -5.905e-04 3.065e-04 -1.927 0.057260 .
CarGroup4 -1.737e-03 2.738e-04 -6.342 9.48e-09 ***
AgeGroup2 3.526e-04 4.716e-04 0.748 0.456651
AgeGroup3 6.446e-04 5.042e-04 1.278 0.204512
AgeGroup4 7.053e-04 5.111e-04 1.380 0.171103
AgeGroup5 1.870e-03 6.472e-04 2.889 0.004866 **
AgeGroup6 1.090e-03 5.553e-04 1.963 0.052748 .
AgeGroup7 1.065e-03 5.524e-04 1.928 0.057058 .
AgeGroup8 9.561e-04 5.398e-04 1.771 0.079991 .
VehicleAge2 -7.518e-05 4.254e-04 -0.177 0.860110
VehicleAge3 1.960e-03 8.152e-04 2.405 0.018285 *
VehicleAge4 8.495e-03 2.173e-03 3.910 0.000181 ***
AgeGroup2:VehicleAge2 8.066e-04 7.231e-04 1.116 0.267661
AgeGroup3:VehicleAge2 4.926e-04 7.427e-04 0.663 0.508900
AgeGroup4:VehicleAge2 4.381e-04 7.479e-04 0.586 0.559515
AgeGroup5:VehicleAge2 7.037e-05 9.103e-04 0.077 0.938557
AgeGroup6:VehicleAge2 5.343e-04 8.195e-04 0.652 0.516136
AgeGroup7:VehicleAge2 7.177e-04 8.316e-04 0.863 0.390448
AgeGroup8:VehicleAge2 6.604e-04 8.084e-04 0.817 0.416215
AgeGroup2:VehicleAge3 -7.669e-05 1.083e-03 -0.071 0.943683
AgeGroup3:VehicleAge3 6.353e-04 1.178e-03 0.539 0.591068
AgeGroup4:VehicleAge3 8.114e-04 1.201e-03 0.675 0.501176
AgeGroup5:VehicleAge3 -8.548e-04 1.226e-03 -0.697 0.487441
AgeGroup6:VehicleAge3 -7.627e-04 1.126e-03 -0.677 0.500009
AgeGroup7:VehicleAge3 -9.432e-04 1.110e-03 -0.850 0.397645
AgeGroup8:VehicleAge3 -2.783e-04 1.145e-03 -0.243 0.808596
AgeGroup2:VehicleAge4 -4.484e-03 2.463e-03 -1.821 0.072080 .
AgeGroup3:VehicleAge4 -8.235e-03 2.252e-03 -3.658 0.000433 ***
AgeGroup4:VehicleAge4 -1.483e-03 2.585e-03 -0.574 0.567610
AgeGroup5:VehicleAge4 -5.736e-03 2.521e-03 -2.276 0.025289 *
AgeGroup6:VehicleAge4 -3.781e-03 2.474e-03 -1.528 0.129977
AgeGroup7:VehicleAge4 -4.899e-03 2.411e-03 -2.032 0.045131 *
AgeGroup8:VehicleAge4 -3.335e-03 2.489e-03 -1.340 0.183653
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Gamma family taken to be 0.06733543)
Null deviance: 27.839 on 122 degrees of freedom
Residual deviance: 7.611 on 88 degrees of freedom
(5 observations deleted due to missingness)
AIC: 1391.2
Number of Fisher Scoring iterations: 6
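Because the link is the inverse, the coefficients act on 1/mean: a negative estimate such as CarGroup4 means a larger average claim, not a smaller one. The sketch below back-transforms a few cells by hand, using estimates copied from the coefficient table above and the inverse-link function from R's Gamma() family object. The names in `b` are illustrative; for the full set of fitted cell means, predict(Claims.glm1a, newdata, type = "response") on a grid of factor levels would be the more direct route.

```r
# Selected estimates copied from the coefficient table above (inverse-link scale)
b <- c(Intercept   = 3.340e-03,
       CarGroup4   = -1.737e-03,
       AgeGroup3   = 6.446e-04,
       VehicleAge4 = 8.495e-03,
       AgeGroup3_VehicleAge4 = -8.235e-03)  # the AgeGroup3:VehicleAge4 term
linkinv <- Gamma(link = "inverse")$linkinv  # mu = 1 / eta

# Baseline cell: AgeGroup 1, CarGroup 1, VehicleAge 1
baseline <- linkinv(b[["Intercept"]])
# Only CarGroup changed to 4: negative coefficient, so a larger mean claim
car4 <- linkinv(b[["Intercept"]] + b[["CarGroup4"]])
# VehicleAge 4 for AgeGroup 1 versus AgeGroup 3 (interaction term included)
age1_veh4 <- linkinv(b[["Intercept"]] + b[["VehicleAge4"]])
age3_veh4 <- linkinv(b[["Intercept"]] + b[["AgeGroup3"]] + b[["VehicleAge4"]] +
                     b[["AgeGroup3_VehicleAge4"]])
round(c(baseline = baseline, car4 = car4, age1_veh4 = age1_veh4, age3_veh4 = age3_veh4), 1)
```

These are point estimates of fitted means only; they carry no standard errors, and the last two values show how strongly the AgeGroup:VehicleAge interaction changes the effect of vehicle age between age groups.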
The results can be tabulated to show the car group averages and the averages for each combination of policy holder age and vehicle age. We should check the residual analysis plots first, though: