7.7 Converting Data from Long to Wide

7.7.1 Problem

You want to convert a data frame from “long” format to “wide” format.

7.7.2 Solution

Use the spread() function from the tidyr package. In this example, we’ll use the plum data set, which is in a long format:

library(gcookbook) # For the data set
plum
#>   length      time survival count
#> 1   long   at_once     dead    84
#> 2   long in_spring     dead   156
#>  ...<4 more rows>...
#> 7  short   at_once    alive   107
#> 8  short in_spring    alive    31

The conversion to wide format takes each unique value in one column and uses those values as headers for new columns, then uses another column for source values. For example, we can “move” values in the survival column to the top and fill them with values from count:

library(tidyr)
spread(plum, survival, count)
#>   length      time dead alive
#> 1   long   at_once   84   156
#> 2   long in_spring  156    84
#> 3  short   at_once  133   107
#> 4  short in_spring  209    31

7.7.3 Discussion

The spread() function requires you to specify a key column which is used for header names, and a value column which is used to fill the values in the output data frame. It’s assumed that you want to use all the other columns as ID variables.

In the preceding example, there are two ID columns, length and time, one key column, survival, and one value column, count. What if we want to use two of the columns as keys? Suppose, for example, that we want to use length and survival as keys. This would leave us with time as the ID column.

The way to do this is to combine the length and survival columns together and put it in a new column, then use that new column as a key.

# Create a new column, length_survival, from length and survival.
plum %>%
  unite(length_survival, length, survival)
#>   length_survival      time count
#> 1       long_dead   at_once    84
#> 2       long_dead in_spring   156
#>  ...<4 more rows>...
#> 7     short_alive   at_once   107
#> 8     short_alive in_spring    31

# Now pass it to spread() and use length_survival as a key
plum %>%
  unite(length_survival, length, survival) %>%
  spread(length_survival, count)
#>        time long_alive long_dead short_alive short_dead
#> 1   at_once        156        84         107        133
#> 2 in_spring         84       156          31        209

7.7.4 See Also

See Recipe 7.6 to do conversions in the other direction, from wide to long.

See the unstack() function for another way of converting from long to wide.