Practical Computing Exercise for Week 6: Apple Blemishes — Solutions

Aims of this practical exercise

In this exercise you will:

  • fit logistic regression models using glm().
  • understand the parameter estimates.

The exercise

In an experiment to judge the effect of background colour on a grader’s assessment of whether an apple is blemished, 400 blemished apples were randomly divided into four groups of 100, and each group was mixed with another 100 unblemished apples. Each group of 200 apples was then assessed against a different background colour. The table below shows, for each background, the percentage of blemished apples that were correctly identified as blemished. Test the hypothesis that background colour has no effect on the proportion of blemished apples correctly graded, using a generalised linear model (GLM). Also show how the parameter estimates given by R relate to the proportions in the table.

Background colour            Black   Blue   Green   White
% classified as blemished       71     79      80      71

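# One row per background colour: number correctly graded as blemished out of n = 100
Apples <- data.frame(Colour = c("Black", "Blue", "Green", "White"), Blemished = c(71, 79, 80, 71), n = rep(100, 4))
# Response is the proportion graded blemished; weights give the number of apples behind each proportion
Apples.glm <- glm((Blemished/100) ~ Colour, family = binomial, weights = rep(100, 4), data = Apples)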
Apples.glm |> summary()

Call:
glm(formula = (Blemished/100) ~ Colour, family = binomial, data = Apples, 
    weights = rep(100, 4))

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  8.954e-01  2.204e-01   4.063 4.85e-05 ***
ColourBlue   4.295e-01  3.299e-01   1.302    0.193    
ColourGreen  4.909e-01  3.333e-01   1.473    0.141    
ColourWhite -1.557e-16  3.117e-01   0.000    1.000    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3.9250e+00  on 3  degrees of freedom
Residual deviance: 1.2212e-14  on 0  degrees of freedom
AIC: 27.012

Number of Fisher Scoring iterations: 3
Apples.glm |> fitted()
   1    2    3    4 
0.71 0.79 0.80 0.71 
Apples.glm |> anova(test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: (Blemished/100)

Terms added sequentially (first to last)

       Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                       3      3.925         
Colour  3    3.925         0      0.000   0.2697

Notes:

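  • You could check that the coefficients lead to the fitted proportions. This requires undoing the logit transformation, which is the default link function for the binomial family. The intercept is the log-odds for Black (the baseline level), and each remaining coefficient is the difference in log-odds between that colour and Black. The White coefficient (-1.557e-16) is zero up to rounding error, because White and Black have the same observed proportion.
  • Summarised data often lead to a saturated model, as here: there are four observed proportions and four parameters, so the fitted values equal the observed proportions and the residual deviance is zero on 0 degrees of freedom. This is especially common in experiments with categorical predictors.
  • The analysis of deviance tests the terms as they are added to the model. Here, adding Colour reduces the deviance by 3.925 on 3 degrees of freedom, which gives p = 0.2697 when referred to a chi-squared distribution.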
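
The check described in the first note can be sketched as follows. This is a minimal illustration, not part of the original solution: the coefficients are typed in by hand from the summary() output (rounded to four decimal places) so that the block runs on its own, and the names b, eta and p are chosen here purely for illustration.

```r
# Coefficients copied (rounded) from the summary() output above.
# White is entered as 0 because the reported -1.557e-16 is numerical noise.
b <- c(Intercept = 0.8954, Blue = 0.4295, Green = 0.4909, White = 0)

# Linear predictor (log-odds) for each background colour:
# the intercept is the Black baseline; every other colour adds its own coefficient.
eta <- c(Black = b[["Intercept"]],
         Blue  = b[["Intercept"]] + b[["Blue"]],
         Green = b[["Intercept"]] + b[["Green"]],
         White = b[["Intercept"]] + b[["White"]])

# Undo the logit link: plogis(x) = exp(x) / (1 + exp(x))
p <- plogis(eta)
round(p, 2)

# Going the other way, the logit of the Black proportion gives the intercept
qlogis(0.71)
```

The rounded probabilities should agree with the fitted values 0.71, 0.79, 0.80 and 0.71 shown earlier. With the fitted object available, coef(Apples.glm) could be used in place of the hand-typed vector b.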

Conclusion: The data provide no convincing evidence that background colour affects the proportion of blemished apples correctly identified as blemished (change in deviance 3.925 on 3 degrees of freedom, p = 0.27).
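
As a rough cross-check of the test behind this conclusion, the null deviance and its p-value can be recomputed directly from the counts. This is a sketch under the assumption that the null model fits one common proportion to all four groups; the variable names (y, n, mu, dev_null, p_value) are illustrative and do not appear in the original solution.

```r
# Observed counts of blemished apples correctly graded, out of n = 100 per group
y <- c(Black = 71, Blue = 79, Green = 80, White = 71)
n <- 100

# Under the null model every group shares the pooled proportion 301/400
mu <- n * sum(y) / (n * length(y))

# Binomial deviance of the null model relative to the saturated model
dev_null <- 2 * sum(y * log(y / mu) + (n - y) * log((n - y) / (n - mu)))

# The Colour model is saturated (residual deviance 0), so the drop in deviance
# equals the null deviance; refer it to chi-squared on 4 - 1 = 3 degrees of freedom
p_value <- pchisq(dev_null, df = length(y) - 1, lower.tail = FALSE)

round(dev_null, 3)
round(p_value, 4)
```

The two printed values should agree with the Deviance and Pr(>Chi) entries in the analysis of deviance table.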