In this exercise you will:
You need to have installed R, RStudio, and the necessary packages for the course, including the ELMER package. See how to get set up for this course.
library(tidyverse)  # assumed here: supplies glimpse(), mutate(), filter() and ggplot()
data(Salmon, package="ELMER")
glimpse(Salmon)
Rows: 28
Columns: 3
$ Year <int> 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1…
$ Spawners <int> 963, 572, 305, 272, 824, 940, 486, 307, 1066, 480, 393, 176, …
$ Recruits <int> 2215, 1334, 800, 438, 3071, 957, 934, 971, 2257, 1451, 686, 1…
Salmon.Graph0 = Salmon |> ggplot(aes(y=Recruits, x=Spawners)) + geom_point()
Salmon.Graph0 + geom_smooth(method="lm")
`geom_smooth()` using formula 'y ~ x'
Plot of the Salmon data
Salmon = Salmon |> mutate(Ratio = Spawners/Recruits, InvR = 1/Recruits, InvS = 1/Spawners, LnRatio = log(Recruits/Spawners)) |> glimpse()
Rows: 28
Columns: 7
$ Year <int> 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1…
$ Spawners <int> 963, 572, 305, 272, 824, 940, 486, 307, 1066, 480, 393, 176, …
$ Recruits <int> 2215, 1334, 800, 438, 3071, 957, 934, 971, 2257, 1451, 686, 1…
$ Ratio <dbl> 0.4347630, 0.4287856, 0.3812500, 0.6210046, 0.2683165, 0.9822…
$ InvR <dbl> 0.0004514673, 0.0007496252, 0.0012500000, 0.0022831050, 0.000…
$ InvS <dbl> 0.0010384216, 0.0017482517, 0.0032786885, 0.0036764706, 0.001…
$ LnRatio <dbl> 0.83295427, 0.84679824, 0.96429995, 0.47641684, 1.31558799, 0…
NewSalmon <- Salmon |> filter(Spawners>200 | Recruits>200) |> glimpse()
Rows: 27
Columns: 7
$ Year <int> 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1…
$ Spawners <int> 963, 572, 305, 272, 824, 940, 486, 307, 1066, 480, 393, 237, …
$ Recruits <int> 2215, 1334, 800, 438, 3071, 957, 934, 971, 2257, 1451, 686, 7…
$ Ratio <dbl> 0.4347630, 0.4287856, 0.3812500, 0.6210046, 0.2683165, 0.9822…
$ InvR <dbl> 0.0004514673, 0.0007496252, 0.0012500000, 0.0022831050, 0.000…
$ InvS <dbl> 0.0010384216, 0.0017482517, 0.0032786885, 0.0036764706, 0.001…
$ LnRatio <dbl> 0.83295427, 0.84679824, 0.96429995, 0.47641684, 1.31558799, 0…
NewSalmon.lm1 <- lm(Ratio~Spawners, data=NewSalmon)
NewSalmon.lm2 <- lm(InvR~InvS, data=NewSalmon)
NewSalmon.lm3 <- lm(LnRatio~Spawners, data=NewSalmon)
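It may help to write out what each of the three lm() calls above is fitting. Reading the formulas off the code and the variable definitions in the mutate() step, and writing S for Spawners and R for Recruits, the models appear to be the following. The symbols S, R, the numbered coefficients and the error terms are notation introduced here for illustration; they are not taken from the course material.

```latex
\begin{align*}
\texttt{NewSalmon.lm1}: \quad & \frac{S}{R} = \alpha_1 + \beta_1 S + \varepsilon_1 \\
\texttt{NewSalmon.lm2}: \quad & \frac{1}{R} = \alpha_2 + \beta_2 \frac{1}{S} + \varepsilon_2 \\
\texttt{NewSalmon.lm3}: \quad & \ln\frac{R}{S} = \alpha_3 + \beta_3 S + \varepsilon_3
\end{align*}
```

Because the responses are on different scales, the coefficients and residual standard deviations of the three fits are not directly comparable with one another.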
Show that the observation for 1951 in the Salmon example is in fact influential, and that it was correct to remove it from the analysis presented.
N.B. There are multiple approaches here:
Use the original and the reduced data to fit the models and see how much the coefficients, and therefore the fitted values, change. Note the standard errors of the slope coefficients. Depending on the model used for this comparison, the difference in the slope coefficients may not seem all that dramatic.
Fit the model to the full data and look at the Cook’s distances and other measures of influence. The standard deviation of the residuals is almost certainly reduced when the outlier is removed, because its standardised residual is huge (at least for one of the candidate models).
Plot the raw (or even better the standardised) residuals against the deletion residuals, obtained with rstudent(), for the model fitted to the full data.
N.B. the outcome may differ depending on which model you choose to use for this task.
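The three approaches listed above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not a worked solution: it uses a small made-up data frame (`toy`, with one deliberately planted aberrant point) rather than the Salmon data, and the object names `toy`, `odd`, `full.lm` and `reduced.lm` are hypothetical. To apply it to the exercise, you would swap in one of the candidate models fitted to `Salmon` and to `NewSalmon`.

```r
# Made-up illustration data (NOT the Salmon data): a linear trend plus one
# planted point that sits far out in x and well away from the trend.
set.seed(1)
toy <- data.frame(
  x = c(1:19, 40),
  y = c(2 + 0.5 * (1:19) + rnorm(19, sd = 0.5), 5)
)
odd <- 20  # row number of the planted point

# Approach 1: fit to the full and the reduced data, then compare
# coefficients, their standard errors, and the residual standard deviation.
full.lm    <- lm(y ~ x, data = toy)
reduced.lm <- lm(y ~ x, data = toy[-odd, ])

coefs <- rbind(full = coef(full.lm), reduced = coef(reduced.lm))
ses   <- rbind(full    = summary(full.lm)$coefficients[, "Std. Error"],
               reduced = summary(reduced.lm)$coefficients[, "Std. Error"])
sigmas <- c(full = sigma(full.lm), reduced = sigma(reduced.lm))
print(coefs)
print(ses)
print(sigmas)

# Approach 2: Cook's distances for the model fitted to the full data.
cd <- cooks.distance(full.lm)
print(round(cd, 3))
print(which.max(cd))  # which observation has the largest Cook's distance

# Approach 3: standardised residuals against deletion residuals.
# Points far from the dashed y = x line are those whose removal changes
# the estimated residual standard deviation the most.
r.std <- rstandard(full.lm)
r.del <- rstudent(full.lm)
plot(r.del, r.std,
     xlab = "Deletion residuals (rstudent)",
     ylab = "Standardised residuals (rstandard)")
abline(0, 1, lty = 2)
```

Other influence measures are available through `influence.measures(full.lm)`, which tabulates DFBETAS, DFFITS, covariance ratios, Cook's distances and hat values in one call.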