3.1 Making a Basic Bar Graph
3.1.1 Problem
You have a data frame where one column represents the x position of each bar, and another column represents the vertical (y) height of each bar.
3.1.2 Solution
Use ggplot() with geom_col() and specify what variables you want on the x- and y-axes (Figure 3.1):
library(ggplot2)   # Load ggplot2 for ggplot() and geom_col()
library(gcookbook) # Load gcookbook for the pg_mean data set
ggplot(pg_mean, aes(x = group, y = weight)) +
  geom_col()
#> This is an untitled chart with no subtitle or caption.
#> It has x-axis 'group' with labels ctrl, trt1 and trt2.
#> It has y-axis 'weight' with labels 0, 2 and 4.
#> The chart is a bar chart with 3 vertical bars.
#> Bar 1 is centered horizontally at ctrl, and spans vertically from 0 to 5.03.
#> Bar 2 is centered horizontally at trt1, and spans vertically from 0 to 4.66.
#> Bar 3 is centered horizontally at trt2, and spans vertically from 0 to 5.53.
Note
In previous versions of ggplot2, the recommended way to create a bar graph of values was to use geom_bar(stat = "identity"). As of ggplot2 2.2.0, there is a geom_col() function which does the same thing.
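To check the equivalence described in the note, here is a minimal sketch that builds the same plot both ways and compares the computed bar geometry. The data frame means is a hypothetical stand-in shaped like pg_mean (values rounded), so the example runs without gcookbook; layer_data() is the ggplot2 helper that returns the data a layer actually draws.

```r
library(ggplot2)

# Hypothetical stand-in for pg_mean (rounded values), not the real data set
means <- data.frame(
  group  = c("ctrl", "trt1", "trt2"),
  weight = c(5.03, 4.66, 5.53)
)

# The same bar graph written both ways
p_col <- ggplot(means, aes(x = group, y = weight)) +
  geom_col()
p_bar <- ggplot(means, aes(x = group, y = weight)) +
  geom_bar(stat = "identity")

# Extract the computed geometry of each layer and compare the bar extents
cols  <- c("x", "ymin", "ymax")
d_col <- layer_data(p_col)
d_bar <- layer_data(p_bar)
same_bars <- isTRUE(all.equal(d_col[, cols], d_bar[, cols]))
same_bars
```

If same_bars is TRUE, the two layers draw bars at the same positions and heights, which is what the note claims.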
3.1.3 Discussion
When x is a continuous (or numeric) variable, the bars behave a little differently. Instead of allotting one position for each actual x value, the axis has space for every possible x value between the minimum and the maximum, as in Figure 3.2. You can convert the continuous variable to a discrete variable by using factor().
# There's no entry for Time == 6
BOD
#>   Time demand
#> 1    1    8.3
#> 2    2   10.3
#> 3    3   19.0
#> 4    4   16.0
#> 5    5   15.6
#> 6    7   19.8
# Time is numeric (continuous)
str(BOD)
#> 'data.frame': 6 obs. of 2 variables:
#> $ Time : num 1 2 3 4 5 7
#> $ demand: num 8.3 10.3 19 16 15.6 19.8
#> - attr(*, "reference")= chr "A1.4, p. 270"
ggplot(BOD, aes(x = Time, y = demand)) +
  geom_col()
#> This is an untitled chart with no subtitle or caption.
#> It has x-axis 'Time' with labels 2, 4 and 6.
#> It has y-axis 'demand' with labels 0, 5, 10, 15 and 20.
#> The chart is a bar chart with 6 vertical bars.
#> Bar 1 is centered horizontally at 1, and spans vertically from 0 to 8.3.
#> Bar 2 is centered horizontally at 2, and spans vertically from 0 to 10.3.
#> Bar 3 is centered horizontally at 3, and spans vertically from 0 to 19.
#> Bar 4 is centered horizontally at 4, and spans vertically from 0 to 16.
#> Bar 5 is centered horizontally at 5, and spans vertically from 0 to 15.6.
#> Bar 6 is centered horizontally at 7, and spans vertically from 0 to 19.8.
# Convert Time to a discrete (categorical) variable with factor()
ggplot(BOD, aes(x = factor(Time), y = demand)) +
  geom_col()
#> This is an untitled chart with no subtitle or caption.
#> It has x-axis 'factor(Time)' with labels 1, 2, 3, 4, 5 and 7.
#> It has y-axis 'demand' with labels 0, 5, 10, 15 and 20.
#> The chart is a bar chart with 6 vertical bars.
#> Bar 1 is centered horizontally at 1, and spans vertically from 0 to 8.3.
#> Bar 2 is centered horizontally at 2, and spans vertically from 0 to 10.3.
#> Bar 3 is centered horizontally at 3, and spans vertically from 0 to 19.
#> Bar 4 is centered horizontally at 4, and spans vertically from 0 to 16.
#> Bar 5 is centered horizontally at 5, and spans vertically from 0 to 15.6.
#> Bar 6 is centered horizontally at 7, and spans vertically from 0 to 19.8.
Notice that there was no row in BOD for Time = 6. When the x variable is continuous, ggplot2 uses a numeric axis, which has space for all numeric values within the range; hence the empty space for 6 in the plot. When Time is converted to a factor, ggplot2 treats it as a discrete variable, where the values are arbitrary labels rather than numbers, so it won't allocate space on the x-axis for all possible numeric values between the minimum and maximum.
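The effect of factor() can be inspected directly in base R. The sketch below (the variable name time_f is just for illustration) shows that the factor keeps the original values only as labels, while the underlying integer codes run consecutively, which is why the bar for Time 7 sits right next to the bar for Time 5.

```r
# BOD ships with base R (datasets package), so no extra packages are needed
time_f <- factor(BOD$Time)

# The levels are character labels; there is no level for the missing value 6
levels(time_f)
#> [1] "1" "2" "3" "4" "5" "7"

# The underlying codes are consecutive positions, so "7" is in slot 6
as.integer(time_f)
#> [1] 1 2 3 4 5 6

# To recover the original numbers, convert through character first
as.numeric(as.character(time_f))
#> [1] 1 2 3 4 5 7
```

Calling as.numeric() directly on the factor would return the codes, not the original Time values, so the detour through as.character() matters.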
In these examples, the data has a column for x values and another for y values. If you instead want the height of the bars to represent the count of cases in each group, see Recipe 3.3.
By default, bar graphs use a dark grey for the bars. To change the fill color, set fill. Also, by default, there is no outline around the fill. To add an outline, set colour. For Figure 3.3, we use a light blue fill and a black outline:
ggplot(pg_mean, aes(x = group, y = weight)) +
  geom_col(fill = "lightblue", colour = "black")
#> This is an untitled chart with no subtitle or caption.
#> It has x-axis 'group' with labels ctrl, trt1 and trt2.
#> It has y-axis 'weight' with labels 0, 2 and 4.
#> The chart is a bar chart with 3 vertical bars.
#> Bar 1 is centered horizontally at ctrl, and spans vertically from 0 to 5.03.
#> Bar 2 is centered horizontally at trt1, and spans vertically from 0 to 4.66.
#> Bar 3 is centered horizontally at trt2, and spans vertically from 0 to 5.53.
#> It has fill set to very light greenish blue.
#> It has colour set to black.
Note
In ggplot2, the default is to use the British spelling, colour, instead of the American spelling, color. Internally, American spellings are remapped to the British ones, so if you use the American spelling it will still work.
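As a quick check of the spelling note, the following sketch builds the same plot with colour and with color and compares the outline value stored in each built layer. The means data frame is again a hypothetical stand-in shaped like pg_mean, and the variable names are illustrative only.

```r
library(ggplot2)

# Hypothetical stand-in for pg_mean (rounded values), not the real data set
means <- data.frame(
  group  = c("ctrl", "trt1", "trt2"),
  weight = c(5.03, 4.66, 5.53)
)

# British spelling
p_colour <- ggplot(means, aes(x = group, y = weight)) +
  geom_col(fill = "lightblue", colour = "black")

# American spelling
p_color <- ggplot(means, aes(x = group, y = weight)) +
  geom_col(fill = "lightblue", color = "black")

# In both cases the built layer data stores the outline under 'colour'
outline_uk <- layer_data(p_colour)$colour
outline_us <- layer_data(p_color)$colour
identical(outline_uk, outline_us)
```

Either spelling is fine in practice; picking one and using it consistently keeps code easier to search.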
3.1.4 See Also
If you want the height of the bars to represent the count of cases in each group, see Recipe 3.3.
To reorder the levels of a factor based on the values of another variable, see Recipe ??. To manually change the order of factor levels, see Recipe ??.
For more information about using colors, see Chapter ??.