Importing Data

Lecture 6

Dr. Eric Friedlander

College of Idaho
CSCI 2025 - Winter 2026

This lesson

  • Learn how to import data into R using the readr package.
  • Understand how readr parses files and how to control it.
  • Learn how to write data from R back to a file.
  • Become aware of other packages for importing different file types.

The readr package

  • Part of the tidyverse, readr provides functions for reading rectangular data from delimited files (like CSVs).
  • It’s fast, consistent, and produces tibbles.
  • It provides detailed information about how files were parsed.

Setup

We’ll use functions from readr, which is loaded with the tidyverse.

library(tidyverse)

Reading delimited data

The most common function is read_csv() for comma-separated values.

students <- read_csv("path-to-file.csv")
students

read_csv() prints the column specification, which is its guess for each column’s type.
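
As a minimal sketch of what this looks like in practice, the example below writes a tiny made-up file to a temporary location and reads it back. The file contents and the name students_demo are invented for illustration; they are not part of the course data.

```r
library(readr)

# Write a tiny, made-up CSV to a temporary file so the example runs anywhere.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age,enrolled",
             "Ana,19,TRUE",
             "Ben,21,FALSE"), csv_path)

# read_csv() returns a tibble and reports the type it guessed for each column.
students_demo <- read_csv(csv_path)
students_demo

# spec() retrieves the column specification again later.
spec(students_demo)
```

Here readr should guess character for name, double for age, and logical for enrolled.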

Paths and Organizing Your Computer

  • Think of your computer like a file cabinet
  • When working on a project in R, put it in a folder and make that folder a “workspace” or “project” in Positron, RStudio, or VSCode

Example

project-name/
├── project-name.code-workspace
├── README.md             <- Overview of the project and how to run it.
├── data/
│   ├── raw/              <- Original data (never edit these files).
│   └── processed/        <- Cleaned data ready for analysis (RDS or CSV).
├── R/                    <- Folder for reusable functions and source code.
├── scripts/              <- Analysis scripts (e.g., 01_cleanup.R, 02_model.R).
├── output/               <- Plots, tables, and exported results.
└── docs/                 <- Quarto or RMarkdown reports.

Where to put this

  • Good practice: use GitHub
  • DON’T PUT FOLDER IN DOWNLOADS!
  • Organize your computer
  • Let’s look through Dr. F’s organization
  • Avoid putting the project inside a Dropbox, OneDrive, or Google Drive folder if you are using GitHub and writing code

Practice

  • Download Star Formation data from Teams
  • Create a new folder on your computer called StarFormation (don’t put it in your Downloads folder)
  • Create the following folder structure
StarFormation/
├── project-name.code-workspace 
├── README.md             
├── data/
│   ├── raw/              
│   │   ├── MIRION_meta_all_1.csv
│   │   ├── MIRION_meta_din_1.csv
│   │   └── MIRION_spec_1.csv              
│   └── processed/        
├── scripts/              
│   └── loading_the_data.R
├── output/               
└── docs/                 
    └── Tables-for-CSCI2040.txt

Paths

  • Root directory: Highest level folder of your project or computer (sometimes the same)
  • Working directory: The folder R is currently “looking at”
  • Making a project in RStudio/Positron automatically sets the working directory to the project folder
  • Two types of Paths:
    • Absolute paths: Full path from the root of your computer
    • Relative paths: Path from your working directory
  • Specifying a path in R:
    • Use forward slashes / every time you go down a folder
    • Example: data/raw/MIRION_spec_1.csv
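
The difference between relative and absolute paths can be sketched with a few base R functions. This is only an illustration: the output of getwd() depends on your own computer, and file.exists() returns TRUE only if you have already built the StarFormation folder structure and opened it as your project.

```r
# Where is R currently "looking"? This prints an absolute path.
getwd()

# A relative path is read starting from the working directory.
# file.path() joins folder names with forward slashes.
rel_path <- file.path("data", "raw", "MIRION_spec_1.csv")
rel_path

# Gluing the working directory onto the front gives the absolute path.
abs_path <- file.path(getwd(), rel_path)
abs_path

# A quick check that the path points at a real file.
file.exists(rel_path)
```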

Practice

  • In our loading_the_data.R script, let’s use read_csv to load the three data sets into R

Other delimiters

  • readr provides functions for other common delimiters:
    • read_csv2(): For semicolon-separated files. Common in European countries, where the comma is used as the decimal mark.
    • read_tsv(): For tab-separated files.
    • read_delim(): For files with any delimiter.
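
A short sketch of the three functions, using small made-up strings in place of files (readr treats a string that contains a new line as literal data). The values and object names are invented for illustration.

```r
library(readr)

# Tab-separated: "\t" is a tab and "\n" is a new line.
tsv_demo <- read_tsv("x\ty\n1\t2\n", show_col_types = FALSE)
tsv_demo

# Semicolon-separated, with a comma as the decimal mark (so 1,5 means 1.5).
csv2_demo <- read_csv2("x;y\n1,5;2\n", show_col_types = FALSE)
csv2_demo

# Any other delimiter: name it with the delim argument.
delim_demo <- read_delim("x|y\n1|2\n", delim = "|", show_col_types = FALSE)
delim_demo
```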

Data Types in R

  • There are a variety of data types in R. Some common ones include:
    • Character: Text data (e.g., names, categories).
    • Numeric: Decimal numbers (e.g., 3.14, -2.5).
    • Integer: Whole numbers (e.g., 1, 42).
    • Logical: Boolean values (TRUE or FALSE).
    • Factor: Categorical data with a fixed set of levels (e.g., “low”, “medium”, “high”).
  • More on this later in the course!
  • For now: Understanding data types is crucial for data analysis and manipulation in R.
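
To check the type of a value yourself, base R's class() function is one option. The values below are made up for illustration.

```r
# class() reports the type of a value.
class("Idaho")   # "character"
class(3.14)      # "numeric"
class(42L)       # "integer" (the L suffix asks for an integer)
class(42)        # "numeric" (without the L, R stores whole numbers as decimals)
class(TRUE)      # "logical"

# A factor stores categories together with a fixed set of levels.
size <- factor(c("low", "high", "medium"),
               levels = c("low", "medium", "high"))
class(size)      # "factor"
levels(size)     # "low" "medium" "high"
```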

Parsing & Column Specification

  • readr tries to be clever and guesses each column’s type from a sample of up to 1000 rows (the guess_max argument controls this number). Sometimes, it guesses wrong.
  • Common issues:
    • Leading zeros in numeric columns (R reads them as characters).
    • Mixed types in a column (e.g., numbers and text).
    • Units included in numeric columns (e.g., “10 kg”).

Manually specifying column types

You can override readr’s guesses using the col_types argument. More on this later in the course.
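
As a preview, here is a minimal sketch of col_types with made-up data. The column names zip and weight are hypothetical; the point is that each column's type is stated explicitly instead of guessed.

```r
library(readr)

# Made-up data: ZIP codes look like numbers but should stay as text.
csv_text <- "zip,weight\n02134,10\n83605,12\n"

typed <- read_csv(
  csv_text,
  col_types = cols(
    zip = col_character(),  # keep the leading zero
    weight = col_double()   # read as a decimal number
  )
)
typed
```

Because the types are supplied, readr does not need to print its guesses.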

Other common import options

  • skip = n: Skip the first n lines (useful for files with metadata at the top).
  • comment = "#": Drop all lines that start with #.
  • col_names = FALSE: If your file has no column headers. readr will label them X1, X2, etc.
  • na = "...": Specify which strings should be treated as missing values (e.g., na = "N/A").
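
The comment, col_names, and na options can be combined in one call. The sketch below uses a small made-up string in place of a file.

```r
library(readr)

# Made-up data: a comment line, no header row, and "N/A" for a missing value.
raw_text <- "# made-up sensor log\n1,N/A\n2,5\n"

readings <- read_csv(
  raw_text,
  comment = "#",           # drop the line that starts with #
  col_names = FALSE,       # no header row, so readr names the columns X1, X2
  na = "N/A",              # treat the string N/A as a missing value
  show_col_types = FALSE   # quiet the column specification message
)
readings
```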

Example with skip

read_csv("
  This is metadata
  That we want to skip
  x,y
  1,2
", skip = 3)
Rows: 1 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): x, y

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 1 × 2
      x     y
  <dbl> <dbl>
1     1     2

Writing to a file

To save a data frame back to a file, use write_csv().

write_csv(students, "path-to-file.csv")
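
The first argument is the data frame and the second is the path to write to. A minimal round trip, using a made-up tibble called scores and a temporary file so nothing in your project is overwritten:

```r
library(readr)

# A small made-up data frame.
scores <- tibble::tibble(name = c("Ana", "Ben"), score = c(90.5, 82))

# Write it to a temporary CSV file.
out_path <- tempfile(fileext = ".csv")
write_csv(scores, out_path)

# The result is plain text: a header line, then one line per row.
readLines(out_path)

# Reading it back gives a tibble with the same rows and columns.
scores_back <- read_csv(out_path, show_col_types = FALSE)
scores_back
```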

Practice with reading and writing

  • Read MIRION_spec_1.csv into R
  • Put data in tidy format
  • Save the tidy data to data/processed/MIRION_spec_tidy.csv

Saving and loading R objects

For saving intermediate R objects, write_csv() is not ideal because it loses type information (e.g., factors become characters).

It’s better to use write_rds() and read_rds() to save and load a single R object in R’s native RDS format.

write_rds(students, "path-to-file.rds")
restored_students <- read_rds("path-to-file.rds")
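
To see the difference, the sketch below saves the same made-up tibble both ways and compares what comes back. The object and column names are invented for illustration.

```r
library(readr)

# A made-up tibble with one factor column.
sizes <- tibble::tibble(
  size = factor(c("low", "high"), levels = c("low", "medium", "high"))
)

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write_csv(sizes, csv_path)
write_rds(sizes, rds_path)

from_csv <- read_csv(csv_path, show_col_types = FALSE)
from_rds <- read_rds(rds_path)

class(from_csv$size)   # "character": the factor and its levels are gone
class(from_rds$size)   # "factor": restored as it was saved
levels(from_rds$size)  # "low" "medium" "high"
```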

Other data sources

The tidyverse ecosystem has packages for many other file formats and data sources:

  • readxl: For Excel files (.xls, .xlsx).
  • haven: For other statistical formats (SPSS, Stata, SAS).
  • googlesheets4: For Google Sheets.
  • Many more!
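
As one example, here is a minimal sketch of reading an Excel file with readxl. It assumes readxl is installed (it comes with install.packages("tidyverse") but must be loaded separately) and uses a sample workbook that ships with the package, not course data.

```r
library(readxl)

# readxl_example() returns the path to a sample workbook bundled with readxl.
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheets in the workbook.
excel_sheets(xlsx_path)

# read_excel() reads the first sheet by default and returns a tibble.
first_sheet <- read_excel(xlsx_path)
first_sheet
```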

Summary

  • read_csv() is your go-to for reading comma-separated files; read_tsv(), read_csv2(), and read_delim() cover other delimiters.
  • Always check the column specifications that readr prints. If they’re wrong, fix them with col_types.
  • Use write_csv() to save your data frames to CSV files for sharing.
  • Use write_rds() to save R objects for later use in R.

Wrap Up

Do Next

  1. Read Chapter 7: Data import from r4ds.
  2. No need to practice this one and that’s all for today!