Importing Data

Lecture 6

Dr. Eric Friedlander

College of Idaho
CSCI 2025 - Winter 2026

This lesson

Learn how to import data into R using the readr package.
Understand how readr parses files and how to control it.
Learn how to write data from R back to a file.
Become aware of other packages for importing different file types.

The `readr` package

Part of the tidyverse, readr provides functions for reading rectangular data from delimited files (like CSVs).
It’s fast, consistent, and produces tibbles.
It provides detailed information about how files were parsed.

Setup

We’ll use functions from readr, which is loaded with the tidyverse.

library(tidyverse)

Reading delimited data

The most common function is read_csv() for comma-separated values.

students <- read_csv("path-to-file.csv")
students

read_csv() prints the column specification, which is its guess for each column’s type.
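
As a minimal sketch of what this looks like in practice, the example below writes a tiny CSV to a temporary file and reads it back. The file, its columns, and its values are made up for illustration; they are not the course data.

```r
library(readr)

# Create a small, made-up CSV in a temporary location (illustration only)
path <- tempfile(fileext = ".csv")
writeLines(c("name,age,enrolled", "Ada,20,TRUE", "Grace,22,FALSE"), path)

# Read it; readr prints the column specification as a message
students <- read_csv(path)
students

# Retrieve the full column specification again later
spec(students)
```

Here readr should guess character for name, double for age, and logical for enrolled.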

Paths and Organizing Your Computer

Think of your computer like a file cabinet
When working on a project in R, put it in its own folder and make that folder a “workspace” or “project” in Positron, RStudio, or VS Code

Example

project-name/
├── project-name.code-workspace
├── README.md             <- Overview of the project and how to run it.
├── data/
│   ├── raw/              <- Original data (never edit these files).
│   └── processed/        <- Cleaned data ready for analysis (RDS or CSV).
├── R/                    <- Folder for reusable functions and source code.
├── scripts/              <- Analysis scripts (e.g., 01_cleanup.R, 02_model.R).
├── output/               <- Plots, tables, and exported results.
└── docs/                 <- Quarto or RMarkdown reports.

Where to put this

Good practice: use GitHub
DON’T PUT THE FOLDER IN DOWNLOADS!
Organize your computer
Let’s look through Dr. F’s organization
Avoid putting the project inside a Dropbox, OneDrive, or Google Drive folder if you are using GitHub and writing code

Practice

Download Star Formation data from Teams
Create a new folder on your computer called StarFormation (don’t put it in your Downloads folder)
Create the following folder structure

StarFormation/
├── project-name.code-workspace 
├── README.md             
├── data/
│   ├── raw/              
│   │   ├── MIRION_meta_all_1.csv
│   │   ├── MIRION_meta_din_1.csv
│   │   └── MIRION_spec_1.csv              
│   └── processed/        
├── scripts/              
│   └── loading_the_data.R
├── output/               
└── docs/                 
    └── Tables-for-CSCI2040.txt

Paths

Root directory: Highest level folder of your project or computer (sometimes the same)
Working directory: The folder R is currently “looking at”
Making a project in RStudio/Positron automatically sets the working directory to the project folder
Two types of Paths:
- Absolute paths: Full path from the root of your computer
- Relative paths: Path from your working directory
Specifying a path in R:
- Use forward slashes / every time you go down a folder
- Example: data/raw/MIRION_spec_1.csv
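
A short sketch of these ideas using base R functions. It assumes your working directory is the StarFormation project folder, so the file check may return FALSE on a different setup.

```r
# Where is R currently "looking"? (the working directory)
getwd()

# A relative path is interpreted starting from the working directory
rel_path <- "data/raw/MIRION_spec_1.csv"

# file.path() joins folder names with forward slashes for you
same_path <- file.path("data", "raw", "MIRION_spec_1.csv")
identical(rel_path, same_path)

# Check that the file is where you think it is before reading it
file.exists(rel_path)

# The absolute version of the same path depends on your computer
file.path(getwd(), rel_path)
```

Relative paths keep a script working when the whole project folder is moved or shared, which is why they are usually preferred over absolute paths.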

Practice

In our loading_the_data.R script, let’s use read_csv to load the three data sets into R

Other delimiters

readr provides functions for other common delimiters:
- read_csv2(): For semicolon-separated files that use a comma as the decimal mark. Common in European countries.
- read_tsv(): For tab-separated files.
- read_delim(): For files with any delimiter.
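
A minimal sketch of the three functions on made-up inline data. Wrapping a string in I() tells readr to treat it as literal data instead of a file path (this assumes readr 2.0 or later).

```r
library(readr)

# Semicolon-separated, with a comma as the decimal mark
semi <- read_csv2(I("x;y\n1,5;2\n"), show_col_types = FALSE)

# Tab-separated
tabs <- read_tsv(I("x\ty\n1\t2\n"), show_col_types = FALSE)

# Any other delimiter, here a pipe character
piped <- read_delim(I("x|y\n1|2\n"), delim = "|", show_col_types = FALSE)

semi$x  # the text "1,5" is read as the number 1.5
```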

Data Types in R

There are a variety of data types in R. Some common ones include:
Character: Text data (e.g., names, categories).
Numeric: Decimal numbers (e.g., 3.14, -2.5).
Integer: Whole numbers (e.g., 1, 42).
Logical: Boolean values (TRUE or FALSE).
Factor: Categorical data with a fixed set of levels (e.g., “low”, “medium”, “high”).
More on this later in the course!
For now: Understanding data types is crucial for data analysis and manipulation in R.
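
You can ask R what type a value has with class(). A quick sketch with made-up values:

```r
class("Idaho")  # "character"
class(3.14)     # "numeric"
class(42L)      # "integer" (the L suffix asks for an integer)
class(42)       # "numeric": plain numbers are stored as decimals by default
class(TRUE)     # "logical"

# A factor stores categories together with a fixed set of levels
sizes <- factor(c("low", "high", "medium"), levels = c("low", "medium", "high"))
class(sizes)    # "factor"
levels(sizes)   # "low" "medium" "high"
```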

Parsing & Column Specification

readr tries to be clever and guesses each column’s type from a sample of rows (up to 1,000 by default; the guess_max argument controls this). Sometimes, it guesses wrong.
Common issues:
Leading zeros in numeric columns (R reads them as characters).
Mixed types in a column (e.g., numbers and text).
Units included in numeric columns (e.g., “10 kg”).
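
A small sketch of the last two issues on made-up inline data (the column names and values are invented for illustration; I() assumes readr 2.0 or later):

```r
library(readr)

# `weight` includes units; `count` mixes numbers and text
messy <- read_csv(I("weight,count\n10 kg,1\n12 kg,two\n"), show_col_types = FALSE)

# Both columns come back as character because not every value is a clean number
spec(messy)

# One possible fix: parse_number() pulls the number out of each string
parse_number(messy$weight)
```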

Manually specifying column types

You can override readr’s guesses using the col_types argument. More on this later in the course.
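
As a preview, here is a minimal sketch on made-up data where the id column has leading zeros we want to keep. The column names are hypothetical, and I() assumes readr 2.0 or later.

```r
library(readr)

# Made-up data: the ids have leading zeros that should be preserved
csv_text <- I("id,score\n007,9.5\n042,8\n")

agents <- read_csv(
  csv_text,
  col_types = cols(
    id = col_character(),  # keep "007" as text
    score = col_double()   # read scores as decimal numbers
  )
)

agents$id
```

Because the types are stated explicitly, readr does not need to guess and does not print the column specification message.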

Other common import options

skip = n: Skip the first n lines (useful for files with metadata at the top).
comment = "#": Drop all lines that start with #.
col_names = FALSE: If your file has no column headers. readr will label them X1, X2, etc.
na = "...": Specify which strings should be treated as missing values (e.g., na = "N/A").
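
A short sketch of the comment, col_names, and na options, again on made-up inline data (I() assumes readr 2.0 or later):

```r
library(readr)

# comment = "#": the note on the first line is dropped
with_comment <- read_csv(
  I("# a note about the file\nx,y\n1,2\n"),
  comment = "#", show_col_types = FALSE
)

# col_names = FALSE: the first row is data, not headers
no_header <- read_csv(I("1,2\n3,4\n"), col_names = FALSE, show_col_types = FALSE)
names(no_header)

# na = "N/A": treat the string N/A as a missing value
with_na <- read_csv(I("x,y\n1,N/A\n2,5\n"), na = "N/A", show_col_types = FALSE)
with_na$y
```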

Example with `skip`

read_csv("
  This is metadata
  That we want to skip
  x,y
  1,2
", skip = 3)

Rows: 1 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): x, y

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# A tibble: 1 × 2
      x     y
  <dbl> <dbl>
1     1     2

Writing to a file

To save a data frame back to a file, use write_csv().

write_csv(students, "path-to-file.csv")
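
The first argument is the data frame and the second is the path to write to. A minimal round-trip sketch, using a made-up data frame and a temporary file so nothing in your project is overwritten:

```r
library(readr)

# A small, made-up data frame (illustration only)
scores <- data.frame(name = c("Ada", "Grace"), score = c(9.5, 8))

# Write it to a temporary CSV file
out_path <- tempfile(fileext = ".csv")
write_csv(scores, out_path)

# Look at the raw text that was written, then read it back in
readLines(out_path)
read_csv(out_path, show_col_types = FALSE)
```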

Practice with reading and writing

Read MIRION_spec_1.csv into R
Put data in tidy format
Save the tidy data to data/processed/MIRION_spec_tidy.csv

Saving and loading R objects

For saving intermediate R objects, write_csv() is not ideal because it loses type information (e.g., factors become characters).

It’s better to use write_rds() and read_rds() to save and load a single R object in R’s native RDS format.

write_rds(students, "path-to-file.rds")
restored_students <- read_rds("path-to-file.rds")
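
To see the difference, here is a small sketch that saves the same made-up data frame both ways and compares what comes back. The data and temporary files are for illustration only.

```r
library(readr)

# A made-up data frame with a factor column
survey <- data.frame(
  level = factor(c("low", "high"), levels = c("low", "medium", "high"))
)

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

write_csv(survey, csv_path)
write_rds(survey, rds_path)

from_csv <- read_csv(csv_path, show_col_types = FALSE)
from_rds <- read_rds(rds_path)

class(from_csv$level)   # "character": the factor information was lost
class(from_rds$level)   # "factor": the type is preserved
levels(from_rds$level)  # "low" "medium" "high"
```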

Other data sources

The tidyverse ecosystem has packages for many other data sources:

readxl: For Excel files (.xls, .xlsx).
haven: For other statistical formats (SPSS, Stata, SAS).
googlesheets4: For Google Sheets.
Many more!
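
As one example, here is a minimal readxl sketch. It uses an example workbook that ships with the readxl package, so no file of your own is needed. Note that readxl is installed with the tidyverse but is not loaded by library(tidyverse), so it needs its own library() call.

```r
library(readxl)

# readxl_example() returns the path to a workbook bundled with the package
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheets in the workbook, then read one of them
excel_sheets(xlsx_path)
mtcars_xl <- read_excel(xlsx_path, sheet = "mtcars")
mtcars_xl
```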

Summary

read_csv() is your go-to for reading delimited text files.
Always check the column specifications that readr prints. If they’re wrong, fix them with col_types.
Use write_csv() to save your data frames to CSV files for sharing.
Use write_rds() to save R objects for later use in R.

Wrap Up

Do Next

Read Chapter 7, “Data import,” in R for Data Science (r4ds).
No need to practice this one. That’s all for today!
