3.1 Making a Basic Bar Graph

3.1.1 Problem

You have a data frame where one column represents the x position of each bar, and another column represents the vertical (y) height of each bar.

3.1.2 Solution

Use ggplot() with geom_col() and specify which variables you want on the x- and y-axes (Figure 3.1):

library(ggplot2)    # Load ggplot2 for ggplot() and geom_col()
library(gcookbook)  # Load gcookbook for the pg_mean data set
ggplot(pg_mean, aes(x = group, y = weight)) +
  geom_col()
#> This is an untitled chart with no subtitle or caption.
#> It has x-axis 'group' with labels ctrl, trt1 and trt2.
#> It has y-axis 'weight' with labels 0, 2 and 4.
#> The chart is a bar chart with 3 vertical bars.
#> Bar 1 is centered horizontally at ctrl, and spans vertically from 0 to 5.03.
#> Bar 2 is centered horizontally at trt1, and spans vertically from 0 to 4.66.
#> Bar 3 is centered horizontally at trt2, and spans vertically from 0 to 5.53.

Figure 3.1: Bar graph of values with a discrete x-axis

Note

In versions of ggplot2 before 2.2.0, the recommended way to create a bar graph of values was to use geom_bar(stat = "identity"). Since ggplot2 2.2.0, the geom_col() function does the same thing.
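
As a quick check of that equivalence, the following sketch builds the same plot both ways and compares the bar heights that ggplot2 computes, using layer_data(). The pg data frame is a hypothetical stand-in for pg_mean, filled with the rounded values shown in Figure 3.1 so that the example runs without gcookbook; the other variable names are likewise only illustrative.

```r
library(ggplot2)

# Hypothetical stand-in for gcookbook::pg_mean (rounded values from Figure 3.1)
pg <- data.frame(
  group  = c("ctrl", "trt1", "trt2"),
  weight = c(5.03, 4.66, 5.53)
)

p_col <- ggplot(pg, aes(x = group, y = weight)) +
  geom_col()
p_bar <- ggplot(pg, aes(x = group, y = weight)) +
  geom_bar(stat = "identity")

# layer_data() returns the data frame that ggplot2 computed for a layer
heights_col <- layer_data(p_col)$y
heights_bar <- layer_data(p_bar)$y

# Compare the bar heights produced by the two approaches
all.equal(heights_col, heights_bar)
```

If the two geoms really are interchangeable here, the comparison should report that the heights match.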

3.1.3 Discussion

When x is a continuous (or numeric) variable, the bars behave a little differently. Instead of the axis having one slot for each x value that appears in the data, it has space for every possible x value between the minimum and the maximum, as in Figure 3.2. You can convert the continuous variable to a discrete variable by using factor().

# There's no entry for Time == 6
BOD
#>   Time demand
#> 1    1    8.3
#> 2    2   10.3
#> 3    3   19.0
#> 4    4   16.0
#> 5    5   15.6
#> 6    7   19.8

# Time is numeric (continuous)
str(BOD)
#> 'data.frame':    6 obs. of  2 variables:
#>  $ Time  : num  1 2 3 4 5 7
#>  $ demand: num  8.3 10.3 19 16 15.6 19.8
#>  - attr(*, "reference")= chr "A1.4, p. 270"

ggplot(BOD, aes(x = Time, y = demand)) +
  geom_col()
#> This is an untitled chart with no subtitle or caption.
#> It has x-axis 'Time' with labels 2, 4 and 6.
#> It has y-axis 'demand' with labels 0, 5, 10, 15 and 20.
#> The chart is a bar chart with 6 vertical bars.
#> Bar 1 is centered horizontally at 1, and spans vertically from 0 to 8.3.
#> Bar 2 is centered horizontally at 2, and spans vertically from 0 to 10.3.
#> Bar 3 is centered horizontally at 3, and spans vertically from 0 to 19.
#> Bar 4 is centered horizontally at 4, and spans vertically from 0 to 16.
#> Bar 5 is centered horizontally at 5, and spans vertically from 0 to 15.6.
#> Bar 6 is centered horizontally at 7, and spans vertically from 0 to 19.8.

# Convert Time to a discrete (categorical) variable with factor()
ggplot(BOD, aes(x = factor(Time), y = demand)) +
  geom_col()
#> This is an untitled chart with no subtitle or caption.
#> It has x-axis 'factor(Time)' with labels 1, 2, 3, 4, 5 and 7.
#> It has y-axis 'demand' with labels 0, 5, 10, 15 and 20.
#> The chart is a bar chart with 6 vertical bars.
#> Bar 1 is centered horizontally at 1, and spans vertically from 0 to 8.3.
#> Bar 2 is centered horizontally at 2, and spans vertically from 0 to 10.3.
#> Bar 3 is centered horizontally at 3, and spans vertically from 0 to 19.
#> Bar 4 is centered horizontally at 4, and spans vertically from 0 to 16.
#> Bar 5 is centered horizontally at 5, and spans vertically from 0 to 15.6.
#> Bar 6 is centered horizontally at 7, and spans vertically from 0 to 19.8.

Figure 3.2: Bar graph of values with a continuous x-axis (left); With x variable converted to a factor (notice that the space for 6 is gone; right)

Notice that there is no row in BOD for Time == 6. When the x variable is continuous, ggplot2 uses a numeric axis, which has space for all numeric values within the range; hence the empty space for 6 in the plot. When Time is converted to a factor, ggplot2 treats it as a discrete variable whose values are arbitrary labels rather than numbers, so it does not allocate space on the x-axis for every possible numeric value between the minimum and maximum.
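
One way to see this difference without looking at the plots is to inspect the x positions that ggplot2 computes for the bars. The sketch below is only an illustration: it uses layer_data() on the two BOD plots from above, and the variable names (p_cont, p_disc, x_cont, x_disc) are made up for this example.

```r
library(ggplot2)

# Same two plots as above: continuous x versus x converted to a factor
p_cont <- ggplot(BOD, aes(x = Time, y = demand)) +
  geom_col()
p_disc <- ggplot(BOD, aes(x = factor(Time), y = demand)) +
  geom_col()

# x positions that ggplot2 computed for the bars in each plot
x_cont <- as.numeric(layer_data(p_cont)$x)
x_disc <- as.numeric(layer_data(p_disc)$x)

x_cont  # numeric positions taken directly from Time
x_disc  # positions of the factor levels along the discrete axis

# The factor levels are used only as axis labels
levels(factor(BOD$Time))
```

With a continuous x, the bars should keep their original positions (1 through 5, then 7), which leaves a slot at 6. With a factor, the six levels should be placed at consecutive positions, so the level labeled 7 sits in the sixth slot and no gap remains.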

In these examples, the data has a column for x values and another for y values. If you instead want the height of the bars to represent the count of cases in each group, see Recipe 3.3.

By default, bar graphs use a dark grey for the bars. To use a different fill color, set fill. Also by default, the bars have no outline; to add one, set colour. For Figure 3.3, we use a light blue fill and a black outline:

ggplot(pg_mean, aes(x = group, y = weight)) +
  geom_col(fill = "lightblue", colour = "black")
#> This is an untitled chart with no subtitle or caption.
#> It has x-axis 'group' with labels ctrl, trt1 and trt2.
#> It has y-axis 'weight' with labels 0, 2 and 4.
#> The chart is a bar chart with 3 vertical bars.
#> Bar 1 is centered horizontally at ctrl, and spans vertically from 0 to 5.03.
#> Bar 2 is centered horizontally at trt1, and spans vertically from 0 to 4.66.
#> Bar 3 is centered horizontally at trt2, and spans vertically from 0 to 5.53.
#> It has fill set to very light greenish blue.
#> It has colour set to black.

Figure 3.3: A single fill and outline color for all bars

Note

In ggplot2, the default is to use the British spelling, colour, instead of the American spelling, color. Internally, American spellings are remapped to the British ones, so if you use the American spelling it will still work.
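
The following sketch illustrates the remapping by setting the outline with each spelling and comparing the result. As before, pg is a hypothetical stand-in for pg_mean built from the rounded values shown in Figure 3.3, and the other variable names are only illustrative.

```r
library(ggplot2)

# Hypothetical stand-in for gcookbook::pg_mean (rounded values from Figure 3.3)
pg <- data.frame(
  group  = c("ctrl", "trt1", "trt2"),
  weight = c(5.03, 4.66, 5.53)
)

# British spelling
p_uk <- ggplot(pg, aes(x = group, y = weight)) +
  geom_col(fill = "lightblue", colour = "black")

# American spelling
p_us <- ggplot(pg, aes(x = group, y = weight)) +
  geom_col(fill = "lightblue", color = "black")

# In the computed layer data, the outline is stored under the name "colour"
outline_uk <- layer_data(p_uk)$colour
outline_us <- layer_data(p_us)$colour
identical(outline_uk, outline_us)
```

Both plots should end up with the same outline setting, stored under the British name.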

3.1.4 See Also

If you want the height of the bars to represent the count of cases in each group, see Recipe 3.3.

To reorder the levels of a factor based on the values of another variable, see Recipe ??. To manually change the order of factor levels, see Recipe ??.

For more information about using colors, see Chapter ??.