Practical Computing Exercise for Week 5 :The second River Problems exercise using all of the data Solutions

Aims of this practical exercise

In this exercise you will:

Complete the exercise given in ELMER
get some practice using R markdown
fit a variety of models to the entire RiverProbAll data.

Before you undertake this exercise…

You should only attempt this (second) exercise using this dataset after completing the first exercise.)

Get the data

data(RiverProbAll, package="ELMER")
str(RiverProbAll)

'data.frame':   91 obs. of  3 variables:
 $ Flow    : int  24 24 26 26 50 48 72 72 25 23 ...
 $ Type    : Factor w/ 3 levels "Canadian","Kayak",..: 1 3 2 1 1 1 1 2 1 2 ...
 $ Problems: int  1 0 0 0 1 0 1 2 1 3 ...

You’ll also want the subset used previously.

data(RiverProb, package="ELMER")
str(RiverProb)

'data.frame':   21 obs. of  2 variables:
 $ Flow    : int  24 36 24 25 72 37 46 24 37 46 ...
 $ Problems: int  0 0 0 0 0 0 0 0 0 0 ...

The Exercise

The dataset RiverProbAll gives the number of problems experienced by travelers on the Whanganui river. This is the full data set including the craft type as a categorical variable.

In the previous exercise, you fitted a generalised linear model to this data using a Poisson distribution for the number of problems, using a log link function.

The example in Chapter 3 of ELMER used a subset of the Whanganui river data for Other craft. An identity link was used in that analysis instead of a log link.

Calculate the predicted number of problems for a range of river flows using this model, and compare with the predictions for the same flow and Other craft type for the model in the previous exercise .

Compare the models using a graph.

The solution

We need the model from the last exercise and the coefficients from the Chapter 3 example which used the identity link. It makes sense to just use the subset data given we are just comparing fitted values but we should make sure the coefficients are the same for the Other craft.

River.glm1 <- glm(Problems~Flow*Type, family=poisson(link="log"),data=RiverProbAll)
River.glm1 |> coef()

   (Intercept)           Flow      TypeKayak      TypeOther Flow:TypeKayak 
   0.081954697   -0.012241681    0.016486317    3.788587331    0.003830624 
Flow:TypeOther 
  -0.153219124

River.glm1a <- glm(Problems~Flow, family=poisson(link="log"), data=RiverProb)
River.glm1a |> coef()

(Intercept)        Flow 
  3.8705420  -0.1654608

Beta0 = coef(River.glm1a)[1]; Beta1 = coef(River.glm1a)[2]

Hint: the values of the final parameters after 8 iterations were 1.22329 and -0.01703 for the weighted model fitted in Chapter 3.

Create the graph, add the straight line for the model using the identity link (dark blue), and then the back-transformed line created on the log scale (green).

Graph0 = RiverProb |> ggplot(aes(x=Flow, y=Problems)) + geom_point() +
xlab("Flow") + ylab("Number of problems") 
Graph0 + 
geom_abline(intercept =  1.22329, slope = -0.01703, col="darkblue") +
geom_function(fun=function(x) {exp(Beta0 + Beta1*x)}, col="green")

scatter plot

Massey University — School of Mathematical and Computational Sciences

161.331 Biostatistics

Staff member responsible for this workshop: Jonathan Godfrey a.j.godfrey@massey.ac.nz