7.2 Calculating New Columns From Existing Columns

7.2.1 Problem

You want to calculate a new column of values in a data frame.

7.2.2 Solution

Use mutate() from the dplyr package.

library(gcookbook) # Load gcookbook for the heightweight data set
heightweight
#>     sex ageYear ageMonth heightIn weightLb
#> 1     f   11.92      143     56.3     85.0
#> 2     f   12.92      155     62.3    105.0
#>  ...<232 more rows>...
#> 236   m   13.92      167     62.0    107.5
#> 237   m   12.58      151     59.3     87.0

This will convert heightIn to centimeters and store it in a new column, heightCm:

library(dplyr)
heightweight %>%
  mutate(heightCm = heightIn * 2.54)
#>     sex ageYear ageMonth heightIn weightLb heightCm
#> 1     f   11.92      143     56.3     85.0  143.002
#> 2     f   12.92      155     62.3    105.0  158.242
#>  ...<232 more rows>...
#> 236   m   13.92      167     62.0    107.5  157.480
#> 237   m   12.58      151     59.3     87.0  150.622

This returns a new data frame and leaves the original unchanged, so if you want to keep the new column, you will need to assign the result back to the original variable.
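As a minimal sketch of saving the result, the following builds a small data frame from the four rows printed above and assigns the output of mutate() back to the same name. The name hw is hypothetical; it is used here only so the example runs without gcookbook.

```r
library(dplyr)

# Hypothetical stand-in for heightweight: the four rows printed above
hw <- data.frame(
  sex      = c("f", "f", "m", "m"),
  ageYear  = c(11.92, 12.92, 13.92, 12.58),
  ageMonth = c(143, 155, 167, 151),
  heightIn = c(56.3, 62.3, 62.0, 59.3),
  weightLb = c(85.0, 105.0, 107.5, 87.0)
)

# Calling mutate() without assignment prints a new data frame
# and leaves hw unchanged
hw %>% mutate(heightCm = heightIn * 2.54)
names(hw)   # no heightCm column yet

# Assign the result back to keep the new column
hw <- hw %>% mutate(heightCm = heightIn * 2.54)
names(hw)   # now includes heightCm
```

The same pattern applies to the full data set: replace hw with heightweight.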

7.2.3 Discussion

You can use mutate() to create multiple columns in a single call:

heightweight %>%
  mutate(
    heightCm = heightIn * 2.54,
    weightKg = weightLb / 2.204
  )
#>     sex ageYear ageMonth heightIn weightLb heightCm weightKg
#> 1     f   11.92      143     56.3     85.0  143.002 38.56624
#> 2     f   12.92      155     62.3    105.0  158.242 47.64065
#>  ...<232 more rows>...
#> 236   m   13.92      167     62.0    107.5  157.480 48.77495
#> 237   m   12.58      151     59.3     87.0  150.622 39.47368

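It is also possible to calculate a new column based on multiple columns. This example applies the unit conversions inline, since heightCm and weightKg have not been saved to heightweight:

heightweight %>%
  mutate(bmi = (weightLb / 2.204) / (heightIn * 2.54 / 100)^2)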

With mutate(), the columns are added sequentially. That means that a later expression can reference a column created earlier in the same call:

heightweight %>%
  mutate(
    heightCm = heightIn * 2.54,
    weightKg = weightLb / 2.204,
    bmi = weightKg / (heightCm / 100)^2
  )
#>     sex ageYear ageMonth heightIn weightLb heightCm weightKg      bmi
#> 1     f   11.92      143     56.3     85.0  143.002 38.56624 18.85919
#> 2     f   12.92      155     62.3    105.0  158.242 47.64065 19.02542
#>  ...<232 more rows>...
#> 236   m   13.92      167     62.0    107.5  157.480 48.77495 19.66736
#> 237   m   12.58      151     59.3     87.0  150.622 39.47368 17.39926

With base R, you can calculate a new column by referencing it with the $ operator and assigning values to it:

heightweight$heightCm <- heightweight$heightIn * 2.54
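Base R also provides transform(), which adds columns and returns a new data frame. The sketch below is one possible way to use it, not part of the recipe itself; hw is a hypothetical stand-in built from the heightIn and weightLb values of the four rows printed earlier. Unlike mutate(), transform() evaluates its expressions against the columns that already exist, so a column created in one call is only available in a subsequent call:

```r
# Hypothetical stand-in for heightweight (two columns, four rows)
hw <- data.frame(
  heightIn = c(56.3, 62.3, 62.0, 59.3),
  weightLb = c(85.0, 105.0, 107.5, 87.0)
)

# First call: add the converted columns
hw <- transform(hw,
  heightCm = heightIn * 2.54,
  weightKg = weightLb / 2.204
)

# Second call: the columns created above can now be referenced
hw <- transform(hw, bmi = weightKg / (heightCm / 100)^2)
hw
```

Splitting the work into two calls is what makes the bmi calculation possible here; with mutate() the same three columns can be created in one call.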

7.2.4 See Also

See Recipe 7.3 for how to perform group-wise transformations on data.