Practical Computing Exercise for Week 8: Helpfulness of Strangers — Solutions

Aims of this practical exercise

In this exercise you will:

  • fit a selection of log-linear models.

The exercise

A sociological experiment examined the way racial descent and gender influenced people’s helpfulness towards a stranger. The data, a \(2\times2\times2\times2\) array, is shown in the table below.

                 Respondents
                 Female               Male                 Total
Requestor        Help Refuse Total    Help Refuse Total    Help Refuse Total
English females    23      0    23      24      3    27      47      3    50
English males      20      4    24      21      5    26      41      9    50
Asian females      25      2    27      17     11    28      42     13    55
Asian males         9     15    24      21      5    26      30     20    50

Students of similar age, dressed alike, approached strangers in a busy shopping precinct and requested change for a phone call. If the stranger provided or looked for change, the response was counted as helpful; not replying or not looking was counted as unhelpful. The stranger’s gender was also noted. The data can be obtained using:

    data(Helpful, package="ELMER")

The students were either Asian or English, males or females.
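
If the ELMER package is not to hand, the same 16 counts can be typed in from the table. The sketch below is a stand-in, not the packaged data: the column names are taken from the model formulas used in the solution, the level labels are inferred from the coefficient names in the printed output, and the row order of the real Helpful data frame may differ.

```r
# Stand-in for data(Helpful, package = "ELMER"), typed in from the table above.
# Assumption: level labels are inferred from coefficient names such as
# QRaceEnglish, QGendermale, AGendermale and AHelpRefuse.
Helpful <- expand.grid(
  AHelp   = c("Help", "Refuse"),   # respondent's answer (varies fastest)
  AGender = c("female", "male"),   # respondent's gender
  QGender = c("female", "male"),   # requestor's gender
  QRace   = c("Asian", "English")  # requestor's ethnicity (varies slowest)
)
Helpful$Count <- c(25,  2, 17, 11,   # Asian females asking
                    9, 15, 21,  5,   # Asian males asking
                   23,  0, 24,  3,   # English females asking
                   20,  4, 21,  5)   # English males asking

# Lay the counts out as in the printed table to check the typing.
ftable(xtabs(Count ~ QRace + QGender + AGender + AHelp, data = Helpful),
       row.vars = c("QRace", "QGender"))
```

Listing the reference level first in each factor (Asian, female, Help) keeps the coefficient names in line with those shown in the solution.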

  1. What are the explanatory and response variables?

  2. What is the minimal model for a Poisson/log analysis?

  3. Starting with the minimal model, add interactions until the deviance drops to a value consistent with random variation. Give an interpretation of this model.

  4. Does your model make sense when you look just at the proportions in the table? In other words, how well could you have predicted the model without formal analysis?

The solution

  1. The requestor's ethnicity and gender, and the respondent's gender, are explanatory variables; helpfulness is the only response variable. In glm() all four factors appear on the right-hand side and the cell count is the response, but that is simply how a log-linear model is fitted.
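
One way to see why this treatment is harmless: a Poisson log-linear model that contains every interaction among the three explanatory factors gives the same estimates for its AHelp terms as a logistic regression of the response on those factors. The sketch below illustrates this with requestor race and gender as predictors, a choice made here purely for illustration. The counts are typed in from the table as a stand-in for the ELMER data, with level labels assumed; Refused, loglin and logit are names introduced for this example.

```r
# Stand-in data typed in from the table (assumed level labels and row order).
Helpful <- expand.grid(AHelp = c("Help", "Refuse"), AGender = c("female", "male"),
                       QGender = c("female", "male"), QRace = c("Asian", "English"))
Helpful$Count <- c(25, 2, 17, 11, 9, 15, 21, 5, 23, 0, 24, 3, 20, 4, 21, 5)

# Log-linear form: QRace * QGender * AGender reproduces the explanatory margin
# exactly, so only the AHelp terms describe the response.
loglin <- glm(Count ~ QRace * QGender * AGender + AHelp +
                QRace:AHelp + QGender:AHelp,
              data = Helpful, family = poisson)

# Logistic form: refusal is the response, each cell weighted by its count.
Helpful$Refused <- as.numeric(Helpful$AHelp == "Refuse")
logit <- glm(Refused ~ QRace + QGender, weights = Count,
             data = Helpful, family = binomial)

# The AHelp terms of the log-linear fit line up with the logistic coefficients.
cbind(loglinear = coef(loglin)[c("AHelpRefuse", "QRaceEnglish:AHelpRefuse",
                                 "QGendermale:AHelpRefuse")],
      logistic  = coef(logit))
```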

  2. The starting point is the model with the four main effects.

        Helpful.min <- glm(Count ~ QRace + QGender + AGender + AHelp, data=Helpful, family=poisson)
        summary(Helpful.min)

Call:
glm(formula = Count ~ QRace + QGender + AGender + AHelp, family = poisson, 
    data = Helpful)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   2.99903    0.14446  20.761  < 2e-16 ***
QRaceEnglish -0.04879    0.13973  -0.349    0.727    
QGendermale  -0.04879    0.13973  -0.349    0.727    
AGendermale   0.08786    0.13982   0.628    0.530    
AHelpRefuse  -1.26851    0.16874  -7.518 5.58e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 110.137  on 15  degrees of freedom
Residual deviance:  41.087  on 11  degrees of freedom
AIC: 114.15

Number of Fisher Scoring iterations: 5

  3. Using step() is an efficient way to find a decent model.

        Helpful.step <- step(Helpful.min, scope = . ~ QRace * QGender * AGender * AHelp, test = "Chisq")

Start:  AIC=114.15
Count ~ QRace + QGender + AGender + AHelp

                  Df Deviance    AIC    LRT  Pr(>Chi)    
+ QRace:AHelp      1   29.415 104.48 11.672 0.0006346 ***
+ QGender:AHelp    1   35.370 110.43  5.717 0.0168019 *  
- QGender          1   41.209 112.27  0.122 0.7269148    
- QRace            1   41.209 112.27  0.122 0.7269148    
- AGender          1   41.482 112.55  0.395 0.5295531    
<none>                 41.087 114.15                     
+ QRace:QGender    1   40.971 116.03  0.116 0.7331687    
+ QRace:AGender    1   41.036 116.10  0.051 0.8218613    
+ AGender:AHelp    1   41.057 116.12  0.030 0.8625987    
+ QGender:AGender  1   41.084 116.15  0.003 0.9564728    
- AHelp            1  109.498 180.56 68.411 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Step:  AIC=104.48
Count ~ QRace + QGender + AGender + AHelp + QRace:AHelp

                  Df Deviance    AIC     LRT  Pr(>Chi)    
+ QGender:AHelp    1   23.698 100.76  5.7169 0.0168019 *  
- QGender          1   29.537 102.60  0.1220 0.7269148    
- AGender          1   29.810 102.88  0.3952 0.5295531    
<none>                 29.415 104.48                      
+ QRace:QGender    1   29.299 106.36  0.1162 0.7331687    
+ QRace:AGender    1   29.364 106.43  0.0507 0.8218613    
+ AGender:AHelp    1   29.385 106.45  0.0300 0.8625987    
+ QGender:AGender  1   29.412 106.48  0.0030 0.9564728    
- QRace:AHelp      1   41.087 114.15 11.6716 0.0006346 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Step:  AIC=100.76
Count ~ QRace + QGender + AGender + AHelp + QRace:AHelp + QGender:AHelp

                  Df Deviance     AIC     LRT  Pr(>Chi)    
- AGender          1   24.093  99.158  0.3952 0.5295531    
<none>                 23.698 100.763                      
+ QRace:QGender    1   22.817 101.882  0.8809 0.3479476    
+ QRace:AGender    1   23.648 102.712  0.0507 0.8218613    
+ AGender:AHelp    1   23.668 102.733  0.0300 0.8625987    
+ QGender:AGender  1   23.695 102.760  0.0030 0.9564728    
- QGender:AHelp    1   29.415 104.480  5.7169 0.0168019 *  
- QRace:AHelp      1   35.370 110.434 11.6716 0.0006346 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Step:  AIC=99.16
Count ~ QRace + QGender + AHelp + QRace:AHelp + QGender:AHelp

                Df Deviance     AIC     LRT  Pr(>Chi)    
<none>               24.093  99.158                      
+ QRace:QGender  1   23.213 100.277  0.8809 0.3479476    
+ AGender        1   23.698 100.763  0.3952 0.5295531    
- QGender:AHelp  1   29.810 102.875  5.7169 0.0168019 *  
- QRace:AHelp    1   35.765 108.830 11.6716 0.0006346 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

        Helpful.step |> summary()

Call:
glm(formula = Count ~ QRace + QGender + AHelp + QRace:AHelp + 
    QGender:AHelp, family = poisson, data = Helpful)

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)                2.9970     0.1374  21.814  < 2e-16 ***
QRaceEnglish               0.2007     0.1589   1.263  0.20666    
QGendermale               -0.2260     0.1591  -1.420  0.15561    
AHelpRefuse               -1.2277     0.2991  -4.105 4.05e-05 ***
QRaceEnglish:AHelpRefuse  -1.2123     0.3727  -3.253  0.00114 ** 
QGendermale:AHelpRefuse    0.8207     0.3497   2.347  0.01894 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 110.137  on 15  degrees of freedom
Residual deviance:  24.093  on 10  degrees of freedom
AIC: 99.158

Number of Fisher Scoring iterations: 5
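
For the interpretation asked for in question 3, the two interaction coefficients can be put on the odds scale. In the sketch below, exp() of an AHelp interaction is read as an odds ratio for refusing. The counts are typed in from the table as a stand-in for the ELMER data (level labels assumed), and the model is refitted directly with the formula that step() selected; final and odds.ratios are names introduced for this example.

```r
# Stand-in data typed in from the table (assumed level labels and row order).
Helpful <- expand.grid(AHelp = c("Help", "Refuse"), AGender = c("female", "male"),
                       QGender = c("female", "male"), QRace = c("Asian", "English"))
Helpful$Count <- c(25, 2, 17, 11, 9, 15, 21, 5, 23, 0, 24, 3, 20, 4, 21, 5)

# Refit the model chosen by step().
final <- glm(Count ~ QRace + QGender + AHelp + QRace:AHelp + QGender:AHelp,
             data = Helpful, family = poisson)

# Odds ratios for refusing: English vs Asian requestor, male vs female requestor.
odds.ratios <- exp(coef(final)[c("QRaceEnglish:AHelpRefuse",
                                 "QGendermale:AHelpRefuse")])
round(odds.ratios, 2)
```

From the estimates printed above, exp(-1.2123) is roughly 0.30 and exp(0.8207) is roughly 2.3. The odds of a refusal are therefore about 70% lower when the requestor is English rather than Asian, and a little more than twice as high when the requestor is male rather than female. The respondent's gender does not appear in the selected model.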

        Helpful.step |> anova(test="Chisq")

Analysis of Deviance Table

Model: poisson, link: log

Response: Count

Terms added sequentially (first to last)

              Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                             15    110.137              
QRace          1    0.122        14    110.015 0.7269148    
QGender        1    0.122        13    109.894 0.7269148    
AHelp          1   68.411        12     41.482 < 2.2e-16 ***
QRace:AHelp    1   11.672        11     29.810 0.0006346 ***
QGender:AHelp  1    5.717        10     24.093 0.0168019 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
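
Question 3 asks for a deviance consistent with random variation, which can be checked by referring the residual deviance to a chi-squared distribution on the residual degrees of freedom. The sketch below does this for the main-effects model and for the model selected by step(). It is a rough check only: the chi-squared approximation is an assumption that may be strained here, since several cells are small and one is zero. The counts are typed in from the table as a stand-in for the ELMER data (level labels assumed), and gof is a helper name introduced for this example.

```r
# Stand-in data typed in from the table (assumed level labels and row order).
Helpful <- expand.grid(AHelp = c("Help", "Refuse"), AGender = c("female", "male"),
                       QGender = c("female", "male"), QRace = c("Asian", "English"))
Helpful$Count <- c(25, 2, 17, 11, 9, 15, 21, 5, 23, 0, 24, 3, 20, 4, 21, 5)

minimal  <- glm(Count ~ QRace + QGender + AGender + AHelp,
                data = Helpful, family = poisson)
selected <- glm(Count ~ QRace + QGender + AHelp + QRace:AHelp + QGender:AHelp,
                data = Helpful, family = poisson)

# Upper-tail probability of the residual deviance under a chi-squared reference.
gof <- function(fit) {
  c(deviance = deviance(fit),
    df       = df.residual(fit),
    p.value  = pchisq(deviance(fit), df.residual(fit), lower.tail = FALSE))
}
round(rbind(minimal = gof(minimal), selected = gof(selected)), 4)
```

For the selected model the deviance of 24.093 on 10 degrees of freedom lies above the upper 1% point of that distribution (about 23.2), so on this rough check the fit is not yet comfortable, even though step() found no single term whose addition lowers the AIC.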

  4. You might, but not all of the associations are obvious, and how easily they are seen depends on how the four factors are laid out in a two-dimensional table.
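
The informal comparison can be made concrete by collapsing the table to the proportion of helpful responses for each kind of requestor. A sketch, using counts typed in from the table as a stand-in for the ELMER data (level labels assumed); by.race, by.gender and cells are names introduced here:

```r
# Stand-in data typed in from the table (assumed level labels and row order).
Helpful <- expand.grid(AHelp = c("Help", "Refuse"), AGender = c("female", "male"),
                       QGender = c("female", "male"), QRace = c("Asian", "English"))
Helpful$Count <- c(25, 2, 17, 11, 9, 15, 21, 5, 23, 0, 24, 3, 20, 4, 21, 5)

# Row proportions of Help/Refuse by requestor race, then by requestor gender.
by.race   <- prop.table(xtabs(Count ~ QRace + AHelp,   data = Helpful), margin = 1)
by.gender <- prop.table(xtabs(Count ~ QGender + AHelp, data = Helpful), margin = 1)
round(by.race, 2)
round(by.gender, 2)

# Proportion helping in each requestor-by-respondent combination.
cells <- xtabs(Count ~ QRace + QGender + AGender + AHelp, data = Helpful)
round(prop.table(cells, margin = 1:3)[, , , "Help"], 2)
```

The marginal proportions helping are 88/100 for English and 72/105 for Asian requestors, and 89/105 for female and 71/100 for male requestors, which point the same way as the two interactions in the selected model. In the full breakdown the cell for Asian males asking females (9 of 24 helping) stands out, which the margins alone do not show.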