4.9 Adding a Confidence Region
4.9.2 Solution
Use geom_ribbon() and map values to ymin and ymax.
In the climate data set, Anomaly10y is a 10-year running average of the deviation (in Celsius) from the average 1950–1980 temperature, and Unc10y is the 95% confidence interval. We’ll set ymax and ymin to Anomaly10y plus or minus Unc10y (Figure 4.24):
library(gcookbook) # Load gcookbook for the climate data set
library(ggplot2)   # Needed for ggplot(), geom_ribbon(), and geom_line()
library(dplyr)
# Grab a subset of the climate data
climate_mod <- climate %>%
  filter(Source == "Berkeley") %>%
  select(Year, Anomaly10y, Unc10y)
climate_mod
#>     Year Anomaly10y Unc10y
#> 1   1800     -0.435  0.505
#> 2   1801     -0.453  0.493
#> 3   1802     -0.460  0.486
#>  ...<199 more rows>...
#> 203 2002      0.856  0.028
#> 204 2003      0.869  0.028
#> 205 2004      0.884  0.029
# Shaded region
ggplot(climate_mod, aes(x = Year, y = Anomaly10y)) +
  geom_ribbon(aes(ymin = Anomaly10y - Unc10y, ymax = Anomaly10y + Unc10y), alpha = 0.2) +
  geom_line()
[Figure 4.24: line graph of Anomaly10y against Year, with the uncertainty band drawn as a shaded ribbon behind the line]
The shaded region is actually a very dark grey, but it is mostly transparent. The transparency is set with alpha = 0.2, which makes it 80% transparent.
4.9.3 Discussion
Notice that geom_ribbon() comes before geom_line(), so that the line is drawn on top of the shaded region. If the order were reversed, the shaded region could obscure the line. In this particular case that wouldn’t be a problem, since the shaded region is mostly transparent, but it would be a problem if the shaded region were opaque.
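To see the effect of layer order, here is a minimal sketch with an opaque ribbon. The data frame df and its columns x, y, and err are made up for illustration and are not part of the climate data set. Only the first version should show the line clearly.

```r
library(ggplot2)

# Hypothetical toy data (not from gcookbook)
df <- data.frame(x = 1:10, y = (1:10)^2 / 10, err = 1)

# Ribbon first, then line: the line is drawn on top of the opaque ribbon
p_ribbon_first <- ggplot(df, aes(x = x, y = y)) +
  geom_ribbon(aes(ymin = y - err, ymax = y + err), fill = "grey70") +
  geom_line()

# Line first, then ribbon: the opaque ribbon is drawn over the line
p_line_first <- ggplot(df, aes(x = x, y = y)) +
  geom_line() +
  geom_ribbon(aes(ymin = y - err, ymax = y + err), fill = "grey70")

# Layers are stored, and drawn, in the order in which they were added
layer_order <- function(p) sapply(p$layers, function(l) class(l$geom)[1])
layer_order(p_ribbon_first)
layer_order(p_line_first)
```

Printing p_ribbon_first and p_line_first side by side makes the difference visible; layer_order() is just a small helper defined here to list the geoms in drawing order.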
Instead of a shaded region, you can also use dotted lines to represent the upper and lower bounds (Figure 4.25):
# With a dotted line for upper and lower bounds
ggplot(climate_mod, aes(x = Year, y = Anomaly10y)) +
  geom_line(aes(y = Anomaly10y - Unc10y), colour = "grey50", linetype = "dotted") +
  geom_line(aes(y = Anomaly10y + Unc10y), colour = "grey50", linetype = "dotted") +
  geom_line()
[Figure 4.25: line graph of Anomaly10y against Year, with the upper and lower bounds drawn as dotted grey lines]
Shaded regions can also represent things other than confidence regions, such as the difference between two values.
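As a rough illustration of shading the difference between two values, the sketch below fills the gap between two series. The data frame df and its columns low and high are invented for this example, and the sketch assumes that low never exceeds high.

```r
library(ggplot2)

# Hypothetical data: two made-up series measured at the same x values
df <- data.frame(
  x    = 1:8,
  low  = c(2, 3, 3, 4, 6, 5, 7, 8),
  high = c(4, 4, 6, 7, 7, 9, 9, 11)
)

# The ribbon spans from the lower series to the upper series at each x
p_diff <- ggplot(df, aes(x = x)) +
  geom_ribbon(aes(ymin = low, ymax = high), alpha = 0.2) +
  geom_line(aes(y = low), linetype = "dashed") +
  geom_line(aes(y = high))
p_diff
```

If the two series can cross, one option is to map ymin = pmin(low, high) and ymax = pmax(low, high) so that the lower bound never exceeds the upper bound.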
In the area graphs in Recipe 4.7, the y range of the shaded area goes from 0 to y. Here, it goes from ymin to ymax.
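The relationship between the two geoms can be sketched by drawing the same shape both ways. This example assumes a small made-up data frame df with positive y values; with a single series like this, the two plots should look the same.

```r
library(ggplot2)

# Hypothetical toy data with positive y values
df <- data.frame(x = 1:6, y = c(1, 3, 2, 5, 4, 6))

# Area graph: the shaded region runs from 0 up to y
p_area <- ggplot(df, aes(x = x, y = y)) +
  geom_area(alpha = 0.2)

# Ribbon with the lower bound fixed at 0 and the upper bound mapped to y
p_ribbon <- ggplot(df, aes(x = x)) +
  geom_ribbon(aes(ymin = 0, ymax = y), alpha = 0.2)

# Inspect the bounds that the ribbon layer will actually draw
ribbon_data <- layer_data(p_ribbon)
ribbon_data[, c("x", "ymin", "ymax")]
```

Replacing the constant 0 with another column or expression is what turns the area into a general ribbon.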