In this exercise you will:
The MASS
package includes a dataset detailing the death
of snails under experimental conditions. Obtain it using:
data(snails, package="MASS")
str(snails)
'data.frame': 96 obs. of 6 variables:
$ Species : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
$ Exposure: int 1 1 1 1 1 1 1 1 1 1 ...
$ Rel.Hum : num 60 60 60 65.8 65.8 65.8 70.5 70.5 70.5 75.8 ...
$ Temp : int 10 15 20 10 15 20 10 15 20 10 ...
$ Deaths : int 0 0 0 0 0 0 0 0 0 0 ...
$ N : int 20 20 20 20 20 20 20 20 20 20 ...
Comment: Three variables that should be factors were stored as numeric.
= snails |> mutate(Exposure=as.factor(Exposure), Rel.Hum =as.factor(Rel.Hum), Temp=as.factor(Temp)) |> glimpse() Snails
Rows: 96
Columns: 6
$ Species <fct> A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A…
$ Exposure <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ Rel.Hum <fct> 60, 60, 60, 65.8, 65.8, 65.8, 70.5, 70.5, 70.5, 75.8, 75.8, 7…
$ Temp <fct> 10, 15, 20, 10, 15, 20, 10, 15, 20, 10, 15, 20, 10, 15, 20, 1…
$ Deaths <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0…
$ N <int> 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 2…
= glm(cbind(Deaths, N-Deaths) ~ Species*Exposure*Rel.Hum*Temp, data=Snails, family="binomial")
Snails.glm1 anova(Snails.glm1, test="Chisq")
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Analysis of Deviance Table
Model: binomial, link: logit
Response: cbind(Deaths, N - Deaths)
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 95 539.72
Species 1 53.60 94 486.12 2.454e-13 ***
Exposure 3 367.57 91 118.54 < 2.2e-16 ***
Rel.Hum 3 62.92 88 55.62 1.394e-13 ***
Temp 2 24.81 86 30.81 4.091e-06 ***
Species:Exposure 3 0.26 83 30.55 0.9676
Species:Rel.Hum 3 1.51 80 29.04 0.6797
Exposure:Rel.Hum 9 1.54 71 27.50 0.9968
Species:Temp 2 0.01 69 27.48 0.9934
Exposure:Temp 6 6.86 63 20.63 0.3343
Rel.Hum:Temp 6 1.27 57 19.36 0.9732
Species:Exposure:Rel.Hum 9 2.42 48 16.94 0.9831
Species:Exposure:Temp 6 4.76 42 12.18 0.5753
Species:Rel.Hum:Temp 6 1.45 36 10.73 0.9626
Exposure:Rel.Hum:Temp 18 8.14 18 2.59 0.9765
Species:Exposure:Rel.Hum:Temp 18 2.59 0 0.00 1.0000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Comment: This is a saturated model and it has zeroes n some cells. Effects are orthogonal so we can remove more than one at a time, ultimately reducing back to a main effects only model.
= glm(cbind(Deaths, N-Deaths) ~ Species+Exposure+Rel.Hum+Temp, data=Snails, family="binomial")
Snails.glm2 anova(Snails.glm2, test="Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: cbind(Deaths, N - Deaths)
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 95 539.72
Species 1 53.60 94 486.12 2.454e-13 ***
Exposure 3 367.57 91 118.54 < 2.2e-16 ***
Rel.Hum 3 62.92 88 55.62 1.394e-13 ***
Temp 2 24.81 86 30.81 4.091e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(Snails.glm2)
Call:
glm(formula = cbind(Deaths, N - Deaths) ~ Species + Exposure +
Rel.Hum + Temp, family = "binomial", data = Snails)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.70126 -0.36953 -0.00008 0.30349 1.23318
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -23.1558 2040.6813 -0.011 0.99095
SpeciesB 1.2895 0.1618 7.969 1.59e-15 ***
Exposure2 18.9411 2040.6813 0.009 0.99259
Exposure3 21.1800 2040.6813 0.010 0.99172
Exposure4 22.1251 2040.6813 0.011 0.99135
Rel.Hum65.8 -0.6140 0.1979 -3.103 0.00192 **
Rel.Hum70.5 -1.2561 0.2156 -5.825 5.72e-09 ***
Rel.Hum75.8 -1.5872 0.2290 -6.930 4.19e-12 ***
Temp15 0.5756 0.1981 2.906 0.00366 **
Temp20 0.9444 0.1942 4.863 1.16e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 539.721 on 95 degrees of freedom
Residual deviance: 30.807 on 86 degrees of freedom
AIC: 209.67
Number of Fisher Scoring iterations: 19
quasibinomial
?= glm(cbind(Deaths, N-Deaths) ~ Species+Exposure+Rel.Hum+Temp, data=Snails, family="quasibinomial")
Snails.glm3 summary(Snails.glm3)
Call:
glm(formula = cbind(Deaths, N - Deaths) ~ Species + Exposure +
Rel.Hum + Temp, family = "quasibinomial", data = Snails)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.70126 -0.36953 -0.00008 0.30349 1.23318
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -23.15583 1132.19943 -0.020 0.984
SpeciesB 1.28951 0.08977 14.364 < 2e-16 ***
Exposure2 18.94108 1132.19944 0.017 0.987
Exposure3 21.18004 1132.19943 0.019 0.985
Exposure4 22.12515 1132.19943 0.020 0.984
Rel.Hum65.8 -0.61395 0.10979 -5.592 2.61e-07 ***
Rel.Hum70.5 -1.25606 0.11964 -10.498 < 2e-16 ***
Rel.Hum75.8 -1.58719 0.12706 -12.492 < 2e-16 ***
Temp15 0.57559 0.10989 5.238 1.14e-06 ***
Temp20 0.94436 0.10774 8.765 1.48e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasibinomial family taken to be 0.3078191)
Null deviance: 539.721 on 95 degrees of freedom
Residual deviance: 30.807 on 86 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 19
Comment: There is obviously some underdispersion here. It won’t make all that much difference to the conclusions made, but it is better to use the quasi model when reporting findings.
binomial
and `quasibinomial versions, and compare.Removing Temp
from the model gives:
= glm(cbind(Deaths, N-Deaths) ~ Species+Exposure+Rel.Hum, data=Snails, family="quasibinomial")
Snails.glm4 summary(Snails.glm4)
Call:
glm(formula = cbind(Deaths, N - Deaths) ~ Species + Exposure +
Rel.Hum, family = "quasibinomial", data = Snails)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.22685 -0.52710 -0.00008 0.50877 1.31354
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -22.6325 1546.5591 -0.015 0.988
SpeciesB 1.2572 0.1168 10.766 < 2e-16 ***
Exposure2 19.0005 1546.5591 0.012 0.990
Exposure3 21.2049 1546.5591 0.014 0.989
Exposure4 22.1249 1546.5591 0.014 0.989
Rel.Hum65.8 -0.5974 0.1430 -4.178 6.91e-05 ***
Rel.Hum70.5 -1.2233 0.1557 -7.854 9.19e-12 ***
Rel.Hum75.8 -1.5469 0.1654 -9.351 7.76e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasibinomial family taken to be 0.5370571)
Null deviance: 539.72 on 95 degrees of freedom
Residual deviance: 55.62 on 88 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 19
Comment: The dispersion has increased because we are not using hte right model here. It shows us that not including all significant terms leads to an increase in dispersion.
N.B. The large number of iterations required to fit the above models ought to be a concern. This arises because the fitted values are close to zero.