2.5 Creating a Box Plot

2.5.1 Problem

You want to create a box plot for comparing distributions.

2.5.2 Solution

To make a box plot (Figure 2.10, left), use plot() and pass it a factor of x values and a numeric vector of y values. When x is a factor (as opposed to a numeric vector), plot() automatically creates a box plot:

plot(ToothGrowth$supp, ToothGrowth$len)

Figure 2.10: Box plot with base graphics (left); With multiple grouping variables (right)
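As a quick check of the rule that a factor on the x-axis produces a box plot, the following sketch inspects two ToothGrowth columns and then converts the numeric dose column to a factor so that plot() draws a box plot for it too. Using dose here is only an illustration and is not part of the original recipe.

```r
# supp is a factor, so plot(supp, len) draws a box plot
is.factor(ToothGrowth$supp)

# dose is numeric, so plot(dose, len) would draw a scatter plot instead
is.numeric(ToothGrowth$dose)

# Wrapping dose in factor() gives one box per dose level
plot(factor(ToothGrowth$dose), ToothGrowth$len)
```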

If the two vectors are in the same data frame, you can also use the boxplot() function with formula syntax. With this syntax, you can combine two variables on the x-axis, as in Figure 2.10 (right):

# Formula syntax
boxplot(len ~ supp, data = ToothGrowth)

# Put interaction of two variables on x-axis
boxplot(len ~ supp + dose, data = ToothGrowth)
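If you want to confirm how the formula groups the data, one option (a sketch, not part of the original recipe) is to call boxplot() with plot = FALSE, which returns the computed statistics instead of drawing. The variable name b is arbitrary.

```r
# Compute the box plot statistics without drawing anything
b <- boxplot(len ~ supp + dose, data = ToothGrowth, plot = FALSE)

# One group per supp/dose combination
b$names

# Number of observations in each group
b$n

# Summary statistics per group (one column per box)
b$stats
```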

With the ggplot2 package, you can get a similar result (Figure 2.11, left) with geom_boxplot():

library(ggplot2)
ggplot(ToothGrowth, aes(x = supp, y = len)) +
  geom_boxplot()

Figure 2.11: Box plot with ggplot() (left); With multiple grouping variables (right)

It’s also possible to make box plots for multiple grouping variables by combining them with interaction(), as in Figure 2.11 (right):

ggplot(ToothGrowth, aes(x = interaction(supp, dose), y = len)) +
  geom_boxplot()
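To see what interaction() hands to the x-axis, it can help to build the combined factor on its own. This is a small base R sketch; the variable name groups is arbitrary and not from the original recipe.

```r
# interaction() builds a single factor from the two grouping variables
groups <- interaction(ToothGrowth$supp, ToothGrowth$dose)

# Each level corresponds to one box on the x-axis
levels(groups)

# Number of observations that fall into each box
table(groups)
```

interaction() also accepts a sep argument if you prefer a separator other than the default ".".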

Note

You may have noticed that the box plots from base graphics are ever-so-slightly different from those from ggplot2. This is because they use slightly different methods for calculating quantiles. See ?geom_boxplot and ?boxplot.stats for more information on how they differ.
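The difference can be explored directly. The sketch below assumes, as the help pages referenced above describe, that boxplot.stats() relies on the hinges from fivenum() while geom_boxplot() relies on quantile() with its default settings. It compares the two for the OJ group only; the variable names are arbitrary.

```r
# Tooth lengths for the OJ supplement group
len_oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
hinges <- fivenum(len_oj)

# Quartiles from quantile() with its default method
quartiles <- quantile(len_oj, probs = c(0.25, 0.5, 0.75))

# Compare the box edges produced by each approach
hinges[c(2, 4)]
quartiles[c("25%", "75%")]
```

The medians agree under both approaches; any discrepancy shows up only in the box edges.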

2.5.3 See Also

For more on making basic box plots, see Recipe ??.