Practical Computing Exercise for Week 6: Anther Development Solutions

Aims of this practical exercise

In this exercise you will:

  • use categorical and numeric predictors in a binary logistic regression.

The exercise

The data in the table below are the numbers of embryogenic anthers of the plant species Datura innoxia obtained when batches of anthers were prepared under several different conditions. There is one qualitative factor, storage (under special conditions or under standard conditions), and one quantitative factor, the centrifuging force. In the table, y is the number of anthers that were embryogenic and n is the total number of anthers tested at each combination of storage and force.

                        Centrifuging force
Storage                  40     150     350
Standard      y          55      52      57
              n         102      99     108
Special       y          55      50      50
              n          76      81      90

Fit a logistic model to these data. Consider whether there is an interaction between storage and centrifuging force and whether the effect of force is linear on a logistic scale. Write a report on the analysis for the benefit of a biologist, explaining what the coefficients mean with the logit link.
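With the logit link, each coefficient is an additive change on the log-odds scale. A short sketch of the algebra, writing $\pi$ for the probability that an anther is embryogenic and $x$ for a 0/1 indicator (for example, 1 for standard storage and 0 for special storage):

```latex
\operatorname{logit}(\pi) = \log\!\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x
\quad\Longrightarrow\quad
\frac{\pi_{x=1}/(1-\pi_{x=1})}{\pi_{x=0}/(1-\pi_{x=0})} = e^{\beta_1},
\qquad
\pi = \frac{e^{\eta}}{1+e^{\eta}} \ \text{ for a linear predictor } \eta .
```

So $\beta_0$ is the log odds in the baseline group, $\beta_1$ is a log odds ratio, and $e^{\beta_1}$ is the corresponding odds ratio. Interaction terms are differences between such log odds ratios.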

The solution

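# One row per storage x force combination; Force is treated as a 3-level factor
Plant <- data.frame(Storage=rep(c("Standard","Special"), c(3,3)), Force=factor(rep(c(40, 150, 350), 2)),
                    y=c(55, 52, 57, 55, 50, 50), n=c(102, 99, 108, 76, 81, 90))
# Model the proportion embryogenic, weighted by the number of anthers tested (saturated model)
Plant.glm <- glm((y/n)~Storage*Force, data=Plant, weights=n, family=binomial)
Plant.glm |> summary()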

Call:
glm(formula = (y/n) ~ Storage * Force, family = binomial, data = Plant, 
    weights = n)

Deviance Residuals: 
[1]  0  0  0  0  0  0

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)                0.9628     0.2565   3.753 0.000174 ***
StorageStandard           -0.8056     0.3244  -2.483 0.013023 *  
Force150                  -0.4848     0.3436  -1.411 0.158279    
Force350                  -0.7397     0.3329  -2.222 0.026276 *  
StorageStandard:Force150   0.4287     0.4450   0.963 0.335377    
StorageStandard:Force350   0.6937     0.4329   1.602 0.109061    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1.0452e+01  on 5  degrees of freedom
Residual deviance: 9.5479e-15  on 0  degrees of freedom
AIC: 41.568

Number of Fisher Scoring iterations: 3
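Because the interaction model has as many parameters as there are storage-by-force cells, it reproduces the observed proportions exactly (residual deviance 0 on 0 degrees of freedom), so each estimate can be checked by hand from the data. The sketch below does that arithmetic directly; the helper `logit` and the object names are illustrative only, not part of the original solution.

```r
# Logit of a proportion; base R's qlogis() computes the same thing
logit <- function(p) log(p / (1 - p))

# Observed proportions of embryogenic anthers in each cell
p.special  <- c("40" = 55/76,  "150" = 50/81, "350" = 50/90)
p.standard <- c("40" = 55/102, "150" = 52/99, "350" = 57/108)

b0   <- logit(p.special["40"])           # (Intercept): Special storage, force 40
bSt  <- logit(p.standard["40"]) - b0     # StorageStandard: Standard vs Special at force 40
b150 <- logit(p.special["150"]) - b0     # Force150: change from force 40 under Special storage
b350 <- logit(p.special["350"]) - b0     # Force350: change from force 40 under Special storage

# Interactions: how much the force effect under Standard storage differs from Special
i150 <- (logit(p.standard["150"]) - logit(p.standard["40"])) - b150
i350 <- (logit(p.standard["350"]) - logit(p.standard["40"])) - b350

round(unname(c(b0, bSt, b150, b350, i150, i350)), 4)
```

The printed values should agree with the Estimate column of the summary above, up to rounding.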

Plant.glm |> anova(test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: (y/n)

Terms added sequentially (first to last)

              Df Deviance Resid. Df Resid. Dev Pr(>Chi)  
NULL                              5    10.4520           
Storage        1   5.2790         4     5.1730  0.02158 *
Force          2   2.5670         2     2.6059  0.27706  
Storage:Force  2   2.6059         0     0.0000  0.27173  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
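The exercise also asks whether the effect of force is linear on the logistic scale, which the factor-based table above does not test directly. A minimal sketch, assuming force enters the linear predictor as its raw value (a log(force) scale would be another reasonable choice; the column ForceNum and the model names are illustrative): a model that is linear in force is a special case of the model treating force as a three-level factor, so the two are nested and can be compared with a 1-df deviance test.

```r
Plant <- data.frame(Storage = rep(c("Standard", "Special"), c(3, 3)),
                    Force = factor(rep(c(40, 150, 350), 2)),
                    y = c(55, 52, 57, 55, 50, 50), n = c(102, 99, 108, 76, 81, 90))
# Numeric copy of force; as.character() first so we get 40/150/350, not level codes 1/2/3
Plant$ForceNum <- as.numeric(as.character(Plant$Force))

# Additive models: force as a factor (2 df) vs force linear on the logit scale (1 df)
fac.glm <- glm((y/n) ~ Storage + Force,    data = Plant, weights = n, family = binomial)
lin.glm <- glm((y/n) ~ Storage + ForceNum, data = Plant, weights = n, family = binomial)
anova(lin.glm, fac.glm, test = "Chisq")   # test of departure from linearity

# Separate linear force slopes for each storage condition (interaction on 1 df)
int.glm <- glm((y/n) ~ Storage * ForceNum, data = Plant, weights = n, family = binomial)
anova(int.glm, test = "Chisq")
```

Treating force as numeric uses fewer parameters, so if the linear model fits about as well as the factor model, its single slope is the simpler summary to report.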

N.B. Take note of the following:

  • There are a multitude of ways to create the data for this exercise. As long as you get the right values into a data.frame, it doesn’t matter how.
  • There are several ways to specify a binary logistic regression. Here we model the proportion y/n with the sample size n as weights, but we could just as easily have used a two-column matrix of successes and failures, cbind(y, n - y).
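  • Next time we might make standard storage the baseline. R orders factor levels alphabetically by default, so Special is the baseline here and the StorageStandard coefficient compares standard storage with special storage.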
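The two alternatives in the notes above can be sketched as follows: a successes/failures response and a different baseline level. The object names Plant.cbind and Plant.std are illustrative only.

```r
Plant <- data.frame(Storage = rep(c("Standard", "Special"), c(3, 3)),
                    Force = factor(rep(c(40, 150, 350), 2)),
                    y = c(55, 52, 57, 55, 50, 50), n = c(102, 99, 108, 76, 81, 90))

# (1) Two-column response: embryogenic (successes) and not embryogenic (failures)
Plant.cbind <- glm(cbind(y, n - y) ~ Storage * Force, data = Plant, family = binomial)
coef(Plant.cbind)   # same estimates as the weighted-proportion fit above

# (2) Make standard storage the reference level instead of the alphabetical default
Plant$Storage <- relevel(factor(Plant$Storage), ref = "Standard")
Plant.std <- glm(cbind(y, n - y) ~ Storage * Force, data = Plant, family = binomial)
coef(Plant.std)     # intercept is now the log odds for standard storage at force 40
```

Changing the baseline does not change the fitted proportions or the deviance table; it only changes which comparisons the coefficients describe.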

Report to researcher

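There is no evidence that centrifuging force affects the proportion of anthers that are embryogenic, either on its own (p = 0.28) or through an interaction with storage (p = 0.27), but the storage condition does matter (p = 0.022). The proportion of embryogenic anthers is lower under standard storage (164/309, about 53%) than under special storage (155/247, about 63%). On the logit scale this is a difference in log odds of about -0.40: the odds of an anther being embryogenic under standard storage are about 0.67 times the odds under special storage.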

Since force has no detectable effect, you only need to give one proportion for each kind of storage in your report.
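The figures for the report can be computed as sketched below, assuming force is dropped from the model because it showed no detectable effect. The names tot and storage.glm are illustrative. With storage as the only predictor, the fitted proportions equal the pooled observed proportions, and the exponentiated StorageStandard coefficient is the standard-versus-special odds ratio.

```r
Plant <- data.frame(Storage = rep(c("Standard", "Special"), c(3, 3)),
                    Force = factor(rep(c(40, 150, 350), 2)),
                    y = c(55, 52, 57, 55, 50, 50), n = c(102, 99, 108, 76, 81, 90))

# Pool over force: total embryogenic anthers and total tested, per storage condition
tot <- aggregate(cbind(y, n) ~ Storage, data = Plant, FUN = sum)
tot$prop <- tot$y / tot$n
tot

# Storage-only logistic model; Special is the baseline (alphabetical default)
storage.glm <- glm(cbind(y, n - y) ~ Storage, data = Plant, family = binomial)
exp(coef(storage.glm))   # baseline odds (Special) and odds ratio Standard vs Special
```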