Practical Computing Exercise for Week 7: Snails — Solutions

Aims of this practical exercise

In this exercise you will:

  • fit binomial and quasibinomial based models
  • compare the impact of going “quasi”

The MASS package includes a dataset detailing the death of snails under experimental conditions. Obtain it using:

data(snails, package="MASS")
str(snails)
'data.frame':   96 obs. of  6 variables:
 $ Species : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
 $ Exposure: int  1 1 1 1 1 1 1 1 1 1 ...
 $ Rel.Hum : num  60 60 60 65.8 65.8 65.8 70.5 70.5 70.5 75.8 ...
 $ Temp    : int  10 15 20 10 15 20 10 15 20 10 ...
 $ Deaths  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ N       : int  20 20 20 20 20 20 20 20 20 20 ...
  1. Convert the relevant variables into factors.

Comment: Three variables that should be factors were stored as numeric.

Snails = snails |> mutate(Exposure=as.factor(Exposure),  Rel.Hum =as.factor(Rel.Hum),  Temp=as.factor(Temp)) |> glimpse()
Rows: 96
Columns: 6
$ Species  <fct> A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A…
$ Exposure <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ Rel.Hum  <fct> 60, 60, 60, 65.8, 65.8, 65.8, 70.5, 70.5, 70.5, 75.8, 75.8, 7…
$ Temp     <fct> 10, 15, 20, 10, 15, 20, 10, 15, 20, 10, 15, 20, 10, 15, 20, 1…
$ Deaths   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0…
$ N        <int> 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 2…
  1. Choose a sensible model for the data. N.B. Under most experiments, we fit a model that includes all interactions of factors.
Snails.glm1 = glm(cbind(Deaths, N-Deaths) ~ Species*Exposure*Rel.Hum*Temp, data=Snails, family="binomial")
anova(Snails.glm1, test="Chisq")
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Analysis of Deviance Table

Model: binomial, link: logit

Response: cbind(Deaths, N - Deaths)

Terms added sequentially (first to last)

                              Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                                             95     539.72              
Species                        1    53.60        94     486.12 2.454e-13 ***
Exposure                       3   367.57        91     118.54 < 2.2e-16 ***
Rel.Hum                        3    62.92        88      55.62 1.394e-13 ***
Temp                           2    24.81        86      30.81 4.091e-06 ***
Species:Exposure               3     0.26        83      30.55    0.9676    
Species:Rel.Hum                3     1.51        80      29.04    0.6797    
Exposure:Rel.Hum               9     1.54        71      27.50    0.9968    
Species:Temp                   2     0.01        69      27.48    0.9934    
Exposure:Temp                  6     6.86        63      20.63    0.3343    
Rel.Hum:Temp                   6     1.27        57      19.36    0.9732    
Species:Exposure:Rel.Hum       9     2.42        48      16.94    0.9831    
Species:Exposure:Temp          6     4.76        42      12.18    0.5753    
Species:Rel.Hum:Temp           6     1.45        36      10.73    0.9626    
Exposure:Rel.Hum:Temp         18     8.14        18       2.59    0.9765    
Species:Exposure:Rel.Hum:Temp 18     2.59         0       0.00    1.0000    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comment: This is a saturated model and it has zeroes n some cells. Effects are orthogonal so we can remove more than one at a time, ultimately reducing back to a main effects only model.

Snails.glm2 = glm(cbind(Deaths, N-Deaths) ~ Species+Exposure+Rel.Hum+Temp, data=Snails, family="binomial")
anova(Snails.glm2, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: cbind(Deaths, N - Deaths)

Terms added sequentially (first to last)

         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                        95     539.72              
Species   1    53.60        94     486.12 2.454e-13 ***
Exposure  3   367.57        91     118.54 < 2.2e-16 ***
Rel.Hum   3    62.92        88      55.62 1.394e-13 ***
Temp      2    24.81        86      30.81 4.091e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(Snails.glm2)

Call:
glm(formula = cbind(Deaths, N - Deaths) ~ Species + Exposure + 
    Rel.Hum + Temp, family = "binomial", data = Snails)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.70126  -0.36953  -0.00008   0.30349   1.23318  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -23.1558  2040.6813  -0.011  0.99095    
SpeciesB       1.2895     0.1618   7.969 1.59e-15 ***
Exposure2     18.9411  2040.6813   0.009  0.99259    
Exposure3     21.1800  2040.6813   0.010  0.99172    
Exposure4     22.1251  2040.6813   0.011  0.99135    
Rel.Hum65.8   -0.6140     0.1979  -3.103  0.00192 ** 
Rel.Hum70.5   -1.2561     0.2156  -5.825 5.72e-09 ***
Rel.Hum75.8   -1.5872     0.2290  -6.930 4.19e-12 ***
Temp15         0.5756     0.1981   2.906  0.00366 ** 
Temp20         0.9444     0.1942   4.863 1.16e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 539.721  on 95  degrees of freedom
Residual deviance:  30.807  on 86  degrees of freedom
AIC: 209.67

Number of Fisher Scoring iterations: 19
  1. What happens if you alter the family to quasibinomial?
Snails.glm3 = glm(cbind(Deaths, N-Deaths) ~ Species+Exposure+Rel.Hum+Temp, data=Snails, family="quasibinomial")
summary(Snails.glm3)

Call:
glm(formula = cbind(Deaths, N - Deaths) ~ Species + Exposure + 
    Rel.Hum + Temp, family = "quasibinomial", data = Snails)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.70126  -0.36953  -0.00008   0.30349   1.23318  

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -23.15583 1132.19943  -0.020    0.984    
SpeciesB       1.28951    0.08977  14.364  < 2e-16 ***
Exposure2     18.94108 1132.19944   0.017    0.987    
Exposure3     21.18004 1132.19943   0.019    0.985    
Exposure4     22.12515 1132.19943   0.020    0.984    
Rel.Hum65.8   -0.61395    0.10979  -5.592 2.61e-07 ***
Rel.Hum70.5   -1.25606    0.11964 -10.498  < 2e-16 ***
Rel.Hum75.8   -1.58719    0.12706 -12.492  < 2e-16 ***
Temp15         0.57559    0.10989   5.238 1.14e-06 ***
Temp20         0.94436    0.10774   8.765 1.48e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasibinomial family taken to be 0.3078191)

    Null deviance: 539.721  on 95  degrees of freedom
Residual deviance:  30.807  on 86  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 19

Comment: There is obviously some underdispersion here. It won’t make all that much difference to the conclusions made, but it is better to use the quasi model when reporting findings.

  1. Let’s remove a significant factor from this model so that we can see how this impacts on our findings above. Fit both binomial and `quasibinomial versions, and compare.

Removing Temp from the model gives:

Snails.glm4 = glm(cbind(Deaths, N-Deaths) ~ Species+Exposure+Rel.Hum, data=Snails, family="quasibinomial")
summary(Snails.glm4)

Call:
glm(formula = cbind(Deaths, N - Deaths) ~ Species + Exposure + 
    Rel.Hum, family = "quasibinomial", data = Snails)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.22685  -0.52710  -0.00008   0.50877   1.31354  

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -22.6325  1546.5591  -0.015    0.988    
SpeciesB       1.2572     0.1168  10.766  < 2e-16 ***
Exposure2     19.0005  1546.5591   0.012    0.990    
Exposure3     21.2049  1546.5591   0.014    0.989    
Exposure4     22.1249  1546.5591   0.014    0.989    
Rel.Hum65.8   -0.5974     0.1430  -4.178 6.91e-05 ***
Rel.Hum70.5   -1.2233     0.1557  -7.854 9.19e-12 ***
Rel.Hum75.8   -1.5469     0.1654  -9.351 7.76e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasibinomial family taken to be 0.5370571)

    Null deviance: 539.72  on 95  degrees of freedom
Residual deviance:  55.62  on 88  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 19

Comment: The dispersion has increased because we are not using hte right model here. It shows us that not including all significant terms leads to an increase in dispersion.

Extra note

N.B. The large number of iterations required to fit the above models ought to be a concern. This arises because the fitted values are close to zero.