In this exercise you will:
family="binomial"
versus
family="quasibinomial"
Wedderburn’s leaf blotch data represents the proportion of leaf blotch on ten varieties of barley grown at nine sites in 1965. The amount of leaf blotch is recorded as a percentage. Obtain the data using:
and answer the following questions.
Rows: 90
Columns: 3
$ Prop <dbl> 0.0005, 0.0000, 0.0125, 0.0250, 0.0550, 0.0100, 0.0500, 0.0500…
$ Site <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3,…
$ Variety <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3,…
Both predictors are stored as factors.
Leaf.glm = glm(Prop~Site+Variety, family=binomial(), data=LeafBlotch, weights=rep(100, nrow(LeafBlotch)))
Warning in eval(family$initialize): non-integer #successes in a binomial glm!
Warning in eval(family$initialize): non-integer #successes in a binomial glm!
Analysis of Deviance Table
Model: binomial, link: logit
Response: Prop
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 89 4080.3
Site 8 1857.7 81 2222.7 < 2.2e-16 ***
Variety 9 1610.1 72 612.6 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There are insufficient degrees of freedom for the interaction term. Fitting the interaction saturates the model. Strictly speaking we could have fitted the interaction which gives us a p-value for the confounded interaction and residual (it is quite large!).
Leaf.glm2 = glm(Prop~Site*Variety, family=binomial(), data=LeafBlotch, weights=rep(100, nrow(LeafBlotch)))
Warning in eval(family$initialize): non-integer #successes in a binomial glm!
Warning in eval(family$initialize): non-integer #successes in a binomial glm!
Warning in eval(family$initialize): non-integer #successes in a binomial glm!
Analysis of Deviance Table
Model: binomial, link: logit
Response: Prop
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 89 4080.3
Site 8 1857.7 81 2222.7 < 2.2e-16 ***
Variety 9 1610.1 72 612.6 < 2.2e-16 ***
Site:Variety 72 612.6 0 0.0 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
:\{r plotLeaf.glm} plot(Leaf.glm,which=1)
Leaf.glm.quasi = glm(Prop~Site+Variety, family=quasibinomial(), data=LeafBlotch, weights=rep(100, nrow(LeafBlotch)))
The differences only appear in the summary()
output.
Call:
glm(formula = Prop ~ Site + Variety, family = quasibinomial(),
data = LeafBlotch, weights = rep(100, nrow(LeafBlotch)))
Deviance Residuals:
Min 1Q Median 3Q Max
-6.4435 -1.3537 -0.2028 0.9500 8.1003
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.0546 1.4220 -5.665 2.84e-07 ***
Site2 1.6391 1.4433 1.136 0.259871
Site3 3.3265 1.3492 2.466 0.016066 *
Site4 3.5822 1.3444 2.664 0.009510 **
Site5 3.5831 1.3444 2.665 0.009494 **
Site6 3.8933 1.3402 2.905 0.004875 **
Site7 4.7300 1.3348 3.544 0.000697 ***
Site8 5.5227 1.3346 4.138 9.38e-05 ***
Site9 6.7946 1.3407 5.068 3.00e-06 ***
Variety2 0.1501 0.7237 0.207 0.836289
Variety3 0.6895 0.6724 1.025 0.308587
Variety4 1.0482 0.6494 1.614 0.110910
Variety5 1.6147 0.6257 2.581 0.011895 *
Variety6 2.3712 0.6090 3.893 0.000219 ***
Variety7 2.5705 0.6065 4.238 6.58e-05 ***
Variety8 3.3420 0.6015 5.556 4.39e-07 ***
Variety9 3.5000 0.6013 5.820 1.51e-07 ***
Variety10 4.2530 0.6042 7.039 9.38e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasibinomial family taken to be 8.87778)
Null deviance: 4080.3 on 89 degrees of freedom
Residual deviance: 612.6 on 72 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
The standard errors for the coefficients are larger in the quasi model than in the binomial model and the dispersion parameter is larger, importantly so. The quasi model is a better option.
:\{r plotLeaf.glm.quasi} plot(Leaf.glm.quasi, which=1)
Reading off the summary we see dispersion is 8.87778 which is very large and indicates serious over-dispersion.
Both site and variety have a noticeable impact on the amount of leaf blotch.
The agridat
package includes this dataset and calls it
wedderburn.barley
. The help page for it presents several
other ways to model the proportion that are beyond the expected learning
for the 161.331 course. You might investigate if you have lots of time
on your hands, or after the conclusion of the semester.