2.4 Creating a Histogram

2.4.1 Problem

You want to view the distribution of one-dimensional data with a histogram.

2.4.2 Solution

To make a histogram (Figure 2.8), use hist() and pass it a vector of values:

hist(mtcars$mpg)

# Specify approximate number of bins with breaks
hist(mtcars$mpg, breaks = 10)

Figure 2.8: Histogram with base graphics (left); with more bins (right). Notice that because the bins are narrower, there are fewer items in each bin.
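Because a single number passed to breaks is only a suggestion, it can help to inspect the bins that hist() actually chose. The following is a minimal sketch, not part of the original recipe: it uses plot = FALSE to get the computed bins back without drawing anything, and then passes a vector of bin edges to fix the bins exactly. The names h, h2, and edges and the particular edge values are arbitrary choices for illustration.

```r
# A single number for breaks is a suggestion; hist() rounds to "pretty" edges.
# plot = FALSE returns the computed histogram object without drawing it.
h <- hist(mtcars$mpg, breaks = 10, plot = FALSE)
h$breaks   # the bin edges that were actually used
h$counts   # the number of observations in each bin

# To control the bins exactly, pass a vector of edges instead.
# The edges must span the full range of the data (illustrative values).
edges <- seq(10, 35, by = 5)
h2 <- hist(mtcars$mpg, breaks = edges, plot = FALSE)
h2$counts
```

Dropping plot = FALSE from either call draws the corresponding histogram.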

With ggplot2, you can get a similar result using geom_histogram() (Figure 2.9):

library(ggplot2)
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# With wider bins
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 4)

Figure 2.9: ggplot2 histogram with default bin width (left); with wider bins (right)

When you create a histogram without specifying the bin width, ggplot2 prints a message telling you that it’s defaulting to 30 bins and that you should pick a better value with binwidth. The message is there because it’s important to explore your data using different bin widths; the default of 30 bins may or may not show you something useful about your data.
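One way to explore different bin widths is to build the same plot several times with different settings. The sketch below is an illustration rather than part of the original recipe: the bin widths 1, 2, and 5 and the names p and plots are arbitrary. It also shows the bins argument, which sets the number of bins directly instead of their width.

```r
library(ggplot2)

# Base plot, reused for every variant
p <- ggplot(mtcars, aes(x = mpg))

# Set the number of bins instead of the bin width
p_bins <- p + geom_histogram(bins = 10)

# Build one histogram per candidate bin width (illustrative values)
plots <- lapply(c(1, 2, 5), function(bw) {
  p + geom_histogram(binwidth = bw) + ggtitle(paste("binwidth =", bw))
})

# Print each plot in turn to compare them
for (pl in plots) print(pl)
```

Setting either bins or binwidth explicitly also suppresses the message about the default of 30 bins.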

2.4.3 See Also

For more in-depth information about creating histograms, see Recipe 6.1 and Recipe ??.