Practical Computing Exercise for Week 3: The Salmon Exercise (Solutions)

Aims of this practical exercise

In this exercise you will:

  • rework part of the example given in ELMER
  • complete the corresponding exercise in ELMER

Before you undertake this exercise…

You need to have installed R, RStudio, and the necessary packages for the course, including the ELMER package. See how to get set up for this course.

Get the data

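library(dplyr)    # glimpse(), mutate(), filter()
library(ggplot2)  # ggplot(), geom_point(), geom_smooth()
data(Salmon, package = "ELMER")
glimpse(Salmon)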
Rows: 28
Columns: 3
$ Year     <int> 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1…
$ Spawners <int> 963, 572, 305, 272, 824, 940, 486, 307, 1066, 480, 393, 176, …
$ Recruits <int> 2215, 1334, 800, 438, 3071, 957, 934, 971, 2257, 1451, 686, 1…
Salmon.Graph0 = Salmon |> ggplot(aes(y=Recruits, x=Spawners)) + geom_point() 
Salmon.Graph0+geom_smooth(method="lm")
`geom_smooth()` using formula 'y ~ x'
Plot of the Salmon data

Salmon = Salmon |>
  mutate(Ratio = Spawners/Recruits,
         InvR = 1/Recruits,
         InvS = 1/Spawners,
         LnRatio = log(Recruits/Spawners)) |>
  glimpse()
Rows: 28
Columns: 7
$ Year     <int> 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1…
$ Spawners <int> 963, 572, 305, 272, 824, 940, 486, 307, 1066, 480, 393, 176, …
$ Recruits <int> 2215, 1334, 800, 438, 3071, 957, 934, 971, 2257, 1451, 686, 1…
$ Ratio    <dbl> 0.4347630, 0.4287856, 0.3812500, 0.6210046, 0.2683165, 0.9822…
$ InvR     <dbl> 0.0004514673, 0.0007496252, 0.0012500000, 0.0022831050, 0.000…
$ InvS     <dbl> 0.0010384216, 0.0017482517, 0.0032786885, 0.0036764706, 0.001…
$ LnRatio  <dbl> 0.83295427, 0.84679824, 0.96429995, 0.47641684, 1.31558799, 0…
# Keep years where either count exceeds 200; this drops one row, the 1951 observation
NewSalmon <- Salmon |> filter(Spawners > 200 | Recruits > 200) |> glimpse()
Rows: 27
Columns: 7
$ Year     <int> 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1…
$ Spawners <int> 963, 572, 305, 272, 824, 940, 486, 307, 1066, 480, 393, 237, …
$ Recruits <int> 2215, 1334, 800, 438, 3071, 957, 934, 971, 2257, 1451, 686, 7…
$ Ratio    <dbl> 0.4347630, 0.4287856, 0.3812500, 0.6210046, 0.2683165, 0.9822…
$ InvR     <dbl> 0.0004514673, 0.0007496252, 0.0012500000, 0.0022831050, 0.000…
$ InvS     <dbl> 0.0010384216, 0.0017482517, 0.0032786885, 0.0036764706, 0.001…
$ LnRatio  <dbl> 0.83295427, 0.84679824, 0.96429995, 0.47641684, 1.31558799, 0…
NewSalmon.lm1 <- lm(Ratio ~ Spawners, data = NewSalmon)
NewSalmon.lm2 <- lm(InvR ~ InvS, data = NewSalmon)
NewSalmon.lm3 <- lm(LnRatio ~ Spawners, data = NewSalmon)

The question

Show that the 1951 observation in the Salmon example is in fact influential, and that removing it from the presented analysis was justified.

N.B. There are several possible approaches:

  1. Fit the models to both the original and the reduced data and see how much the coefficients, and therefore the fitted values, change. Compare the changes with the standard errors of the slope coefficients. Depending on the model used for this comparison, the difference in the slope coefficients may not seem all that dramatic.

  2. Fit the model to the full data and look at the Cook’s distances and other measures of influence. The standard deviation of the residuals is almost certainly reduced if the outlier is removed, because its standardised residual is huge (at least for one of the candidate models).

  3. Plot the raw (or, even better, the standardised) residuals against the deletion residuals, obtained using rstudent(), for the model fitted to the full data.
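
The three approaches above could be sketched as follows for one of the candidate models. This is a minimal illustration rather than the definitive solution: the choice of the LnRatio ~ Spawners model and the object names Full.lm3, Reduced.lm3 and Influence are assumptions made here, and the derived variable and the filter are repeated from earlier so that the code stands alone.

```r
library(dplyr)
library(ggplot2)

data(Salmon, package = "ELMER")

# Recreate the derived variable and the reduced data used earlier
Salmon <- Salmon |> mutate(LnRatio = log(Recruits / Spawners))
NewSalmon <- Salmon |> filter(Spawners > 200 | Recruits > 200)

# Approach 1: fit the same model to the full and the reduced data
Full.lm3 <- lm(LnRatio ~ Spawners, data = Salmon)        # assumed name
Reduced.lm3 <- lm(LnRatio ~ Spawners, data = NewSalmon)  # assumed name

# Coefficients with their standard errors, then the residual standard deviations
summary(Full.lm3)$coefficients
summary(Reduced.lm3)$coefficients
c(Full = sigma(Full.lm3), Reduced = sigma(Reduced.lm3))

# Approach 2: influence measures for the model fitted to the full data
Influence <- Salmon |>
  mutate(Cooks = cooks.distance(Full.lm3),
         Standardised = rstandard(Full.lm3),
         Deletion = rstudent(Full.lm3))

# The three observations with the largest Cook's distances
Influence |>
  arrange(desc(Cooks)) |>
  select(Year, Cooks, Standardised, Deletion) |>
  head(3)

# Approach 3: standardised residuals against deletion residuals;
# points far from the dashed line of equality deserve attention
Influence |>
  ggplot(aes(x = Deletion, y = Standardised)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed")
```

An observation that has little effect on the fit has similar standardised and deletion residuals, so it sits close to the dashed line in the final plot.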

N.B. The outcome may differ depending on which model you choose to use for this task.
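
One way to see how much the conclusion depends on the model is to compute the same influence summary for each candidate model fitted to the full data. The sketch below assumes the three model formulas used earlier; the helper name influence_1951 and the list name Formulas are hypothetical and are introduced only for this illustration.

```r
library(dplyr)

data(Salmon, package = "ELMER")

# Recreate the derived variables used earlier
Salmon <- Salmon |>
  mutate(Ratio = Spawners / Recruits,
         InvR = 1 / Recruits,
         InvS = 1 / Spawners,
         LnRatio = log(Recruits / Spawners))

# The three candidate model formulas used earlier
Formulas <- list(lm1 = Ratio ~ Spawners,
                 lm2 = InvR ~ InvS,
                 lm3 = LnRatio ~ Spawners)

# Hypothetical helper: fit one model to the full data and return the
# influence measures for the 1951 observation, together with the largest
# Cook's distance among all the other observations for comparison
influence_1951 <- function(Formula) {
  Fit <- lm(Formula, data = Salmon)
  Row <- which(Salmon$Year == 1951)
  c(Cooks = unname(cooks.distance(Fit)[Row]),
    Standardised = unname(rstandard(Fit)[Row]),
    Deletion = unname(rstudent(Fit)[Row]),
    MaxCooksOthers = max(cooks.distance(Fit)[-Row]))
}

# One row per model, one column per measure
InfluenceByModel <- t(sapply(Formulas, influence_1951))
InfluenceByModel
```

Comparing the Cooks column with MaxCooksOthers shows, model by model, whether 1951 stands apart from the rest of the data.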