Lecture 6
College of Idaho
CSCI 2025 - Winter 2026
Goals: import data with the readr package, and understand how readr parses files and how to control it.
The readr package
Part of the tidyverse, readr provides functions for reading rectangular data from delimited files (like CSVs).
It reads data into tibbles.
We’ll use functions from readr, which is loaded with the tidyverse.
The most common function is read_csv() for comma-separated values.
read_csv() prints the column specification, which is its guess for each column’s type.
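A minimal sketch of that behavior, assuming a small made-up file (the file name, column names, and values here are hypothetical and written to a temporary folder, not taken from the course data):

```r
library(readr)

# Hypothetical data: a tiny CSV written to a temporary location.
csv_path <- file.path(tempdir(), "example.csv")
writeLines(c("name,count,flag", "a,1,TRUE", "b,2,FALSE"), csv_path)

# read_csv() returns a tibble and prints the column specification it guessed.
example <- read_csv(csv_path)

# spec() retrieves the full column specification after the fact.
spec(example)
```

With this input, readr should guess a character column, a double column, and a logical column.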
project-name/
├── project-name.code-workspace
├── README.md <- Overview of the project and how to run it.
├── data/
│ ├── raw/ <- Original data (never edit these files).
│ └── processed/ <- Cleaned data ready for analysis (RDS or CSV).
├── R/ <- Folder for reusable functions and source code.
├── scripts/ <- Analysis scripts (e.g., 01_cleanup.R, 02_model.R).
├── output/ <- Plots, tables, and exported results.
└── docs/ <- Quarto or RMarkdown reports.
In a file path, add a / every time you go down a folder.
Example: data/raw/MIRION_spec_1.csv
In the loading_the_data.R script, let’s use read_csv to load the three data sets into R.
readr provides functions for other common delimiters:
read_csv2(): For semicolon-separated files, where the comma is the decimal mark. Common in European countries.
read_tsv(): For tab-separated files.
read_delim(): For files with any delimiter.
Each column is read as a specific type, such as logical (TRUE or FALSE).
readr tries to be clever and guess column types by looking at the first 1000 rows. Sometimes, it guesses wrong.
You can override readr’s guesses using the col_types argument. More on this later in the course.
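As a rough illustration of the delimiter functions and of overriding a guess with col_types, here is a sketch that uses small hypothetical files written to temporary paths (the contents and the object names are assumptions made for this example):

```r
library(readr)

# Hypothetical files, one per delimiter style.
semi_path <- tempfile(fileext = ".csv")
writeLines(c("x;y", "1,5;a", "2,5;b"), semi_path)

tab_path <- tempfile(fileext = ".tsv")
writeLines(c("x\ty", "1\ta", "2\tb"), tab_path)

pipe_path <- tempfile(fileext = ".txt")
writeLines(c("x|y", "1|a", "2|b"), pipe_path)

# read_csv2(): ";" separates fields and "," is the decimal mark.
semi <- read_csv2(semi_path, show_col_types = FALSE)

# read_tsv(): tab-separated fields.
tabs <- read_tsv(tab_path, show_col_types = FALSE)

# read_delim(): any delimiter, given explicitly.
pipes <- read_delim(pipe_path, delim = "|", show_col_types = FALSE)

# Override the guess for x: read it as character instead of a number.
# Columns not named in cols() are still guessed.
pipes_chr <- read_delim(
  pipe_path,
  delim = "|",
  col_types = cols(x = col_character())
)
```

Here x is guessed as a number in pipes but kept as text in pipes_chr, which is the kind of fix col_types is meant for.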
skip = n: Skip the first n lines (useful for files with metadata at the top).
comment = "#": Drop all lines that start with #.
col_names = FALSE: If your file has no column headers. readr will label them X1, X2, etc.
na = "...": Specify which strings should be treated as missing values (e.g., na = "N/A").
Example: skip
Rows: 1 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): x, y
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 1 × 2
      x     y
  <dbl> <dbl>
1     1     2
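The call that produced the output above is not shown here. One way to get a tibble of the same shape, along with a quick look at the other arguments, is sketched below; the file contents are hypothetical and written to temporary files.

```r
library(readr)

# Small helper (hypothetical, for this example only): write lines to a temp file.
write_temp <- function(lines) {
  path <- tempfile(fileext = ".csv")
  writeLines(lines, path)
  path
}

# skip: ignore two lines of metadata above the header.
meta_path <- write_temp(c("The first line of metadata",
                          "The second line of metadata",
                          "x,y",
                          "1,2"))
skipped <- read_csv(meta_path, skip = 2, show_col_types = FALSE)

# comment: drop lines that start with "#".
comment_path <- write_temp(c("# a comment to ignore", "x,y", "1,2"))
commented <- read_csv(comment_path, comment = "#", show_col_types = FALSE)

# col_names = FALSE: no header row, so columns are labelled X1, X2.
bare_path <- write_temp(c("1,2", "3,4"))
unnamed <- read_csv(bare_path, col_names = FALSE, show_col_types = FALSE)

# na: treat the string "N/A" as a missing value.
na_path <- write_temp(c("x,y", "1,N/A"))
with_na <- read_csv(na_path, na = "N/A", show_col_types = FALSE)
```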
To save a data frame back to a file, use write_csv().
Exercise: load MIRION_spec_1.csv into R, then save the result to data/processed/MIRION_spec_tidy.csv.
For saving intermediate R objects, write_csv() is not ideal because it loses type information (e.g., factors become characters).
It’s better to use write_rds() and read_rds() to save and load a single R object in R’s native RDS format.
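A small round trip makes the difference visible. This is a sketch with a made-up data frame and temporary file paths, not the course data:

```r
library(readr)
library(tibble)

# Hypothetical data frame with a factor column.
df <- tibble(
  id = c(1, 2, 3),
  group = factor(c("a", "b", "a"))
)

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# Save the same object both ways.
write_csv(df, csv_path)
write_rds(df, rds_path)

# Read both back in.
from_csv <- read_csv(csv_path, show_col_types = FALSE)
from_rds <- read_rds(rds_path)

class(from_csv$group)  # the factor comes back as character
class(from_rds$group)  # the factor is preserved
```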
The tidyverse ecosystem has packages for many other data types:
readxl: For Excel files (.xls, .xlsx).
haven: For other statistical formats (SPSS, Stata, SAS).
googlesheets4: For Google Sheets.

read_csv() is your go-to for reading delimited text files.
Check the column types that readr prints. If they’re wrong, fix them with col_types.
Use write_csv() to save your data frames to CSV files for sharing.
Use write_rds() to save R objects for later use in R.