1.4 Loading a Delimited Text Data File
1.4.2 Solution
The most common format for data files is comma-separated values (CSV), which you can load with read.csv():
data <- read.csv("datafile.csv")
Alternatively, you can use the read_csv() function (note the underscore instead of a period) from the readr package. This function is significantly faster than read.csv().
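The sketch below shows both approaches in a self-contained way. It writes a small CSV file with made-up contents to a temporary path, reads it with read.csv(), and then reads it again with readr::read_csv() only if the readr package happens to be installed. The file contents and variable names are illustrative assumptions. The readr call is wrapped in suppressMessages() because read_csv() reports the column types it guessed.

```r
# Create a small CSV file in a temporary location (contents are made up for illustration)
path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,score",
  "alpha,10",
  "beta,20"
), path)

# Base R: the first row supplies the column names (header = TRUE is the default)
data <- read.csv(path)
str(data)

# readr alternative; skipped when readr is not installed
if (requireNamespace("readr", quietly = TRUE)) {
  # read_csv() prints a column-specification message; suppress it here
  data_tbl <- suppressMessages(readr::read_csv(path))
  print(data_tbl)
}
```

Note that read_csv() returns a tibble, a variant of a data frame, rather than a plain data.frame.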
1.4.3 Discussion
Since data files have many different formats, there are many options for loading them. For example, if the data file does not have headers in the first row:
data <- read.csv("datafile.csv", header = FALSE)
The resulting data frame will have columns named V1, V2, and so on, and you will probably want to rename them manually:
# Manually assign the header names
names(data) <- c("Column1", "Column2", "Column3")
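Here is a self-contained sketch of that workflow. It writes a headerless CSV file with made-up contents to a temporary path, shows the default names, and then renames the columns. It also uses the col.names argument, which read.csv() passes on to read.table(), to supply the names at read time instead.

```r
# A headerless CSV file with made-up contents
path <- tempfile(fileext = ".csv")
writeLines(c("1,2,3", "4,5,6"), path)

data <- read.csv(path, header = FALSE)
names(data)  # default names: "V1" "V2" "V3"

# Rename the columns after reading
names(data) <- c("Column1", "Column2", "Column3")

# Or supply the names while reading
data2 <- read.csv(path, header = FALSE,
                  col.names = c("Column1", "Column2", "Column3"))
```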
You can set the delimiter with sep. If the file is space-delimited, use sep = " "; if it is tab-delimited, use sep = "\t", as in:
data <- read.csv("datafile.csv", sep = "\t")
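The following sketch uses a made-up tab-delimited file written to a temporary path. Base R also provides read.delim(), which uses sep = "\t" as its default. Its other defaults match read.csv()'s, so the two calls below should give the same result.

```r
# A tab-delimited file with made-up contents
path <- tempfile(fileext = ".tsv")
writeLines(c("x\ty", "1\tfoo", "2\tbar"), path)

# read.csv() with an explicit tab separator
data <- read.csv(path, sep = "\t")

# read.delim() uses sep = "\t" by default
data_delim <- read.delim(path)
str(data)
```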
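In versions of R before 4.0.0, read.csv() treated strings in the data as factors by default; since R 4.0.0, the default is stringsAsFactors = FALSE. Suppose this is your data file, and you read it in using read.csv() with strings as factors: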
"First","Last","Sex","Number"
"Currer","Bell","F",2
"Dr.","Seuss","M",49
"","Student",NA,21
The resulting data frame will store First and Last as factors, though in this case it makes more sense to treat them as strings (or character vectors, in R terminology). To keep them as strings, use stringsAsFactors = FALSE. Setting it explicitly also makes the code behave the same across R versions. If any columns should be treated as factors, you can then convert them individually:
data <- read.csv("datafile.csv", stringsAsFactors = FALSE)

# Convert to factor
data$Sex <- factor(data$Sex)

str(data)
#> 'data.frame': 3 obs. of 4 variables:
#> $ First : chr "Currer" "Dr." ""
#> $ Last : chr "Bell" "Seuss" "Student"
#> $ Sex : Factor w/ 2 levels "F","M": 1 2 NA
#> $ Number: int 2 49 21
Alternatively, you could load the file with strings as factors, and then convert individual columns from factors to characters.
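The sketch below recreates the sample file above at a temporary path, reads it with stringsAsFactors = TRUE set explicitly so that it behaves the same in any R version, and converts the two name columns back to character vectors with as.character(). Which columns to convert is a choice made for this sample data.

```r
# Recreate the sample data file in a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c(
  '"First","Last","Sex","Number"',
  '"Currer","Bell","F",2',
  '"Dr.","Seuss","M",49',
  '"","Student",NA,21'
), path)

# Read every string column as a factor
data <- read.csv(path, stringsAsFactors = TRUE)

# Convert the name columns from factors back to character vectors
data$First <- as.character(data$First)
data$Last  <- as.character(data$Last)
str(data)
```

The resulting structure should match the str() output shown earlier: First and Last as characters, Sex as a two-level factor, and Number as integers.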