Practical Computing Exercise for Week 7: Insurance Claims — Solutions

Aims of this practical exercise

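In this exercise you will fit a gamma GLM to insurance claims data and test whether any two-way interactions between the explanatory variables are significant.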

The Exercise

Investigate the car insurance claims data, used by McCullagh & Nelder (1989). Obtain the data using:

    data(Claims, package="ELMER")

Your task is to model the average claims (AveClaim) for damage to an owner’s car on the basis of the policy holder’s age group (AgeGroup 1-8), the vehicle age group (VehicleAge 1-4), and the car group (CarGroup 1-4). Be sure to check whether any of the two-way interactions are significant.

Hint: use glm with family=Gamma(link="inverse").

The Solution

First, check that the explanatory variables are stored as factors:

str(Claims)
'data.frame':   128 obs. of  8 variables:
 $ AgeGroup  : Factor w/ 8 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ CarGroup  : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 2 2 2 2 3 3 ...
 $ VehicleAge: Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ...
 $ AveClaim  : int  289 282 133 160 372 249 288 11 189 288 ...
 $ Freq      : int  8 8 4 1 10 28 1 1 9 13 ...
 $ pacg      : int  1 1 1 1 2 2 2 2 3 3 ...
 $ pava      : int  1 2 3 4 1 2 3 4 1 2 ...
 $ cgva      : int  1 2 3 4 2 4 6 8 3 6 ...

The three explanatory variables (AgeGroup, CarGroup and VehicleAge) are already stored as factors, so no conversion is needed.
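The exercise asks for all two-way interactions, so it can help to confirm which terms a model formula expands to before fitting anything. The sketch below uses terms() on the formula alone (no data needed) and compares the formula used for Claims.glm1 in this solution with the shorthand (a + b + c)^2. The names f_long and f_short are illustrative only and do not appear in the original solution.

```r
# Formula used in the solution: full factorial minus the three-way term
f_long <- AveClaim ~ AgeGroup * CarGroup * VehicleAge - AgeGroup:CarGroup:VehicleAge

# Shorthand: all main effects plus interactions up to second order
f_short <- AveClaim ~ (AgeGroup + CarGroup + VehicleAge)^2

# terms() expands a formula without needing the data
labels_long  <- attr(terms(f_long),  "term.labels")
labels_short <- attr(terms(f_short), "term.labels")

print(labels_long)

# Do both formulas give the same terms in the same order?
same_terms <- identical(labels_long, labels_short)
print(same_terms)
```

Either form can be passed to glm; the longer one makes the dropped three-way term explicit.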

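Next, fit the model with all main effects and all two-way interactions, dropping only the three-way term, and test the terms sequentially:

Claims.glm1 = glm(AveClaim~AgeGroup*CarGroup*VehicleAge-AgeGroup:CarGroup:VehicleAge, data=Claims, family=Gamma(link="inverse"))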
anova(Claims.glm1, test="Chisq")
Analysis of Deviance Table

Model: Gamma, link: inverse

Response: AveClaim

Terms added sequentially (first to last)

                    Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                                  122    27.8394              
AgeGroup             7   2.5991       115    25.2403 1.320e-05 ***
CarGroup             3   5.5329       112    19.7074 6.915e-16 ***
VehicleAge           3   8.1945       109    11.5130 < 2.2e-16 ***
AgeGroup:CarGroup   21   1.0994        88    10.4135 0.8404234    
AgeGroup:VehicleAge 21   3.8879        67     6.5257 0.0002051 ***
CarGroup:VehicleAge  9   0.6067        58     5.9190 0.5260723    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

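Only the interaction between policy holder age and vehicle age (AgeGroup:VehicleAge) is significant (p = 0.0002). The AgeGroup:CarGroup (p = 0.84) and CarGroup:VehicleAge (p = 0.53) interactions are not, so the model can be reduced to: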

Claims.glm1a = glm(AveClaim ~ CarGroup + AgeGroup*VehicleAge, data=Claims, family=Gamma(link="inverse"))
anova(Claims.glm1a, test="Chisq")
Analysis of Deviance Table

Model: Gamma, link: inverse

Response: AveClaim

Terms added sequentially (first to last)

                    Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                                  122     27.839              
CarGroup             3   5.1916       119     22.648 < 2.2e-16 ***
AgeGroup             7   2.9403       112     19.707 2.479e-07 ***
VehicleAge           3   8.1945       109     11.513 < 2.2e-16 ***
AgeGroup:VehicleAge 21   3.9020        88      7.611 2.597e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
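For a Gamma model the dispersion is estimated, so the Pr(>Chi) column is not based on the raw deviances. The sketch below assumes each deviance is divided by the dispersion estimate and compared with a chi-squared distribution on the term's degrees of freedom. The deviances and degrees of freedom are copied from the table above, and the dispersion (0.06733543) is the value reported by summary(Claims.glm1a). Because the inputs are rounded, the results should agree with the table only approximately.

```r
# Dispersion estimate reported by summary(Claims.glm1a)
dispersion <- 0.06733543

# Sequential deviances and degrees of freedom, copied from the anova table
deviance_drop <- c(CarGroup = 5.1916, AgeGroup = 2.9403,
                   VehicleAge = 8.1945, "AgeGroup:VehicleAge" = 3.9020)
df_term <- c(3, 7, 3, 21)

# Scale each deviance by the dispersion, then take the upper chi-squared tail
scaled <- deviance_drop / dispersion
p_value <- pchisq(scaled, df = df_term, lower.tail = FALSE)

print(signif(p_value, 3))
```

The larger the estimated dispersion, the smaller each scaled statistic, so the same deviance reduction counts as weaker evidence.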

The coefficient estimates for the reduced model are:

summary(Claims.glm1a)

Call:
glm(formula = AveClaim ~ CarGroup + AgeGroup * VehicleAge, family = Gamma(link = "inverse"), 
    data = Claims)

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)            3.340e-03  3.790e-04   8.814 9.91e-14 ***
CarGroup2             -2.753e-04  3.169e-04  -0.869 0.387252    
CarGroup3             -5.905e-04  3.065e-04  -1.927 0.057260 .  
CarGroup4             -1.737e-03  2.738e-04  -6.342 9.48e-09 ***
AgeGroup2              3.526e-04  4.716e-04   0.748 0.456651    
AgeGroup3              6.446e-04  5.042e-04   1.278 0.204512    
AgeGroup4              7.053e-04  5.111e-04   1.380 0.171103    
AgeGroup5              1.870e-03  6.472e-04   2.889 0.004866 ** 
AgeGroup6              1.090e-03  5.553e-04   1.963 0.052748 .  
AgeGroup7              1.065e-03  5.524e-04   1.928 0.057058 .  
AgeGroup8              9.561e-04  5.398e-04   1.771 0.079991 .  
VehicleAge2           -7.518e-05  4.254e-04  -0.177 0.860110    
VehicleAge3            1.960e-03  8.152e-04   2.405 0.018285 *  
VehicleAge4            8.495e-03  2.173e-03   3.910 0.000181 ***
AgeGroup2:VehicleAge2  8.066e-04  7.231e-04   1.116 0.267661    
AgeGroup3:VehicleAge2  4.926e-04  7.427e-04   0.663 0.508900    
AgeGroup4:VehicleAge2  4.381e-04  7.479e-04   0.586 0.559515    
AgeGroup5:VehicleAge2  7.037e-05  9.103e-04   0.077 0.938557    
AgeGroup6:VehicleAge2  5.343e-04  8.195e-04   0.652 0.516136    
AgeGroup7:VehicleAge2  7.177e-04  8.316e-04   0.863 0.390448    
AgeGroup8:VehicleAge2  6.604e-04  8.084e-04   0.817 0.416215    
AgeGroup2:VehicleAge3 -7.669e-05  1.083e-03  -0.071 0.943683    
AgeGroup3:VehicleAge3  6.353e-04  1.178e-03   0.539 0.591068    
AgeGroup4:VehicleAge3  8.114e-04  1.201e-03   0.675 0.501176    
AgeGroup5:VehicleAge3 -8.548e-04  1.226e-03  -0.697 0.487441    
AgeGroup6:VehicleAge3 -7.627e-04  1.126e-03  -0.677 0.500009    
AgeGroup7:VehicleAge3 -9.432e-04  1.110e-03  -0.850 0.397645    
AgeGroup8:VehicleAge3 -2.783e-04  1.145e-03  -0.243 0.808596    
AgeGroup2:VehicleAge4 -4.484e-03  2.463e-03  -1.821 0.072080 .  
AgeGroup3:VehicleAge4 -8.235e-03  2.252e-03  -3.658 0.000433 ***
AgeGroup4:VehicleAge4 -1.483e-03  2.585e-03  -0.574 0.567610    
AgeGroup5:VehicleAge4 -5.736e-03  2.521e-03  -2.276 0.025289 *  
AgeGroup6:VehicleAge4 -3.781e-03  2.474e-03  -1.528 0.129977    
AgeGroup7:VehicleAge4 -4.899e-03  2.411e-03  -2.032 0.045131 *  
AgeGroup8:VehicleAge4 -3.335e-03  2.489e-03  -1.340 0.183653    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Gamma family taken to be 0.06733543)

    Null deviance: 27.839  on 122  degrees of freedom
Residual deviance:  7.611  on  88  degrees of freedom
  (5 observations deleted due to missingness)
AIC: 1391.2

Number of Fisher Scoring iterations: 6
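With the inverse link the coefficients are on the scale of 1/mean, so a negative coefficient corresponds to a larger expected claim. As a rough worked example, the sketch below converts the printed CarGroup coefficients into expected average claims for the reference cell (AgeGroup 1, VehicleAge 1). The numbers are copied from the summary output above; the variable names are illustrative only.

```r
# Coefficients copied from the summary(Claims.glm1a) output
intercept  <- 3.340e-03
car_effect <- c("1" = 0, "2" = -2.753e-04, "3" = -5.905e-04, "4" = -1.737e-03)

# Linear predictor for each car group at AgeGroup 1, VehicleAge 1
eta <- intercept + car_effect

# Inverse link: expected claim = 1 / linear predictor
mean_claim <- 1 / eta
print(round(mean_claim, 1))
```

On these figures the expected claim rises from about 299 in car group 1 to about 624 in car group 4, roughly double, for policy holders in AgeGroup 1 with vehicles in VehicleAge 1. For other cells the relevant AgeGroup, VehicleAge and interaction coefficients would have to be added to the linear predictor before inverting.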

The results can be tabulated to show car group averages and averages for each combination of policy holder age and vehicle age. First, though, we should check the residual diagnostic plots:

plot(Claims.glm1a)
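Calling plot on a fitted glm produces several diagnostic plots in sequence. One possible way to see them side by side is to set a 2 x 2 plotting layout first. The helper below, plot_diagnostics, is a hypothetical convenience function and not part of the original solution; it assumes the default set of four plots.

```r
# Hypothetical helper (not in the original solution):
# draw the default diagnostic plots for a fitted model in a 2 x 2 grid
plot_diagnostics <- function(model) {
  old_par <- par(mfrow = c(2, 2))  # switch to a 2 x 2 layout
  on.exit(par(old_par))            # restore the previous layout on exit
  plot(model)                      # residual and leverage diagnostics
  invisible(model)
}

# Usage, assuming Claims.glm1a has been fitted as above:
# plot_diagnostics(Claims.glm1a)
```

In the resulting plots, look for curvature in the residuals against fitted values, a trend in the scale-location plot, and points with high leverage or large Cook's distance.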