Lecture 5
College of Idaho
CSCI 2025 - Winter 2026
tidyr to make data tidy
pivot_longer() for lengthening data
pivot_wider() for widening data
Tidy data is a consistent way of structuring datasets that makes them easier to work with. A dataset is tidy if it follows three rules:
1. Each variable is a column.
2. Each observation is a row.
3. Each value is a cell.
This structure is a standard in the tidyverse.
We will be using functions from the tidyr package, which is part of the tidyverse.
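To make the three rules concrete, here is a small sketch that stores the same information in an untidy and a tidy layout. The `scores_wide` and `scores_tidy` tibbles and their values are made up for illustration and are not part of the lecture data.

```r
library(tibble)

# Hypothetical data, invented for this sketch: exam scores for two students.
# Untidy layout: the years are stored in the column names.
scores_wide <- tibble(
  student = c("Ana", "Ben"),
  `2024` = c(88, 75),
  `2025` = c(92, 81)
)

# Tidy layout: one column per variable (student, year, score),
# one row per observation (one student in one year), one value per cell.
scores_tidy <- tibble(
  student = c("Ana", "Ana", "Ben", "Ben"),
  year = c(2024, 2025, 2024, 2025),
  score = c(88, 92, 75, 81)
)

scores_wide
scores_tidy
```

Both tibbles hold the same four scores. Only the tidy one has a year column that can be filtered, grouped, or mapped to a plot axis.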
pivot_longer()
# A tibble: 6 × 11
  religion  `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` `$75-100k`
  <chr>       <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>      <dbl>
1 Agnostic       27        34        60        81        76       137        122
2 Atheist        12        27        37        52        35        70         73
3 Buddhist       27        21        30        34        33        58         62
4 Catholic      418       617       732       670       638      1116        949
5 Don’t kn…      15        14        15        11        10        35         21
6 Evangeli…     575       869      1064       982       881      1486        949
# ℹ 3 more variables: `$100-150k` <dbl>, `>150k` <dbl>,
#   `Don't know/refused` <dbl>
This is not tidy. The income brackets are stored as column names, but they are really the values of a single variable (income).
pivot_longer()
We want three columns: religion, income, and count.
Use pivot_longer() to do this.
tidy_relig_income <- relig_income |>
  pivot_longer(
    cols = !religion,
    names_to = "income",
    values_to = "count"
  )
tidy_relig_income |> head()
# A tibble: 6 × 3
  religion income  count
  <chr>    <chr>   <dbl>
1 Agnostic <$10k      27
2 Agnostic $10-20k    34
3 Agnostic $20-30k    60
4 Agnostic $30-40k    81
5 Agnostic $40-50k    76
6 Agnostic $50-75k   137
cols: The columns to pivot into longer format. !religion means all columns except religion.
names_to: The name of the new column that will contain the names of the original columns.
values_to: The name of the new column that will contain the values from the original columns.
Let’s practice with another dataset: billboard.
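One possible way to approach the billboard exercise is sketched below. It assumes the goal is one row per song per week; the new column names week and rank, and the choice to drop missing weeks with values_drop_na, are assumptions rather than the lecture's stated solution.

```r
library(tidyr)

# billboard ships with tidyr: one row per song, with the weekly chart
# positions spread across the columns wk1, wk2, ..., wk76.
billboard_long <- billboard |>
  pivot_longer(
    cols = starts_with("wk"),  # every weekly rank column
    names_to = "week",         # old column names ("wk1", "wk2", ...) go here
    values_to = "rank",        # chart positions go here
    values_drop_na = TRUE      # drop weeks in which the song was not on the chart
  )

billboard_long |> head()
```

Here cols uses starts_with("wk") instead of !religion because the columns to pivot share a prefix, which is shorter than listing the three identifier columns to exclude.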
pivot_wider()
pivot_wider() is the opposite of pivot_longer(). It’s used when an observation is scattered across multiple rows.
# A tibble: 6 × 3
  fish  station  seen
  <fct> <fct>   <int>
1 4842  Release     1
2 4842  I80_1       1
3 4842  Lisbon      1
4 4842  Rstr        1
5 4842  Base_TD     1
6 4842  BCE         1
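fish_encounters has one row for each station where a fish was seen; a fish that was not detected at a station has no row for it.
The station column contains values that should become variable names.
The seen column contains the values for those variables (always 1 in this form of the data).
pivot_wider()
Goal: one row per fish, with one column per station indicating whether the fish was seen there.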
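The call that produced the wide table below was not preserved on the slide. A reconstruction along these lines should give the same shape; treat it as a sketch rather than the original code. The name fish_wide is introduced here only for illustration.

```r
library(tidyr)

# fish_encounters ships with tidyr.
fish_wide <- fish_encounters |>
  pivot_wider(
    names_from = station,  # one new column per station
    values_from = seen     # cells take the value of seen for that fish and station
  )

# Fish/station pairs with no row in the long data become NA here.
fish_wide |> head()
```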
# A tibble: 6 × 12
  fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE   MAW
  <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int> <int>
1 4842        1     1      1     1       1     1     1     1     1     1     1
2 4843        1     1      1     1       1     1     1     1     1     1     1
3 4844        1     1      1     1       1     1     1     1     1     1     1
4 4845        1     1      1     1       1    NA    NA    NA    NA    NA    NA
5 4847        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA
6 4848        1     1      1     1      NA    NA    NA    NA    NA    NA    NA
Problem: stations where a fish was never seen show NA, but they should be 0. We can fix this with values_fill.
pivot_wider() with values_fill
fish_encounters |>
  pivot_wider(
    names_from = station,
    values_from = seen,
    values_fill = 0
  ) |>
  head()
# A tibble: 6 × 12
  fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE   MAW
  <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int> <int>
1 4842        1     1      1     1       1     1     1     1     1     1     1
2 4843        1     1      1     1       1     1     1     1     1     1     1
3 4844        1     1      1     1       1     1     1     1     1     1     1
4 4845        1     1      1     1       1     0     0     0     0     0     0
5 4847        1     1      1     0       0     0     0     0     0     0     0
6 4848        1     1      1     1       0     0     0     0     0     0     0
names_from: The column to get the new column names from.
values_from: The column to get the cell values from.
values_fill: A value to replace NAs with.
Let’s practice with cms_patient_care.
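A possible starting point for the cms_patient_care exercise is sketched below. It assumes the aim is to place the two kinds of score recorded in the type column side by side for each facility and measure; the lecture does not state the intended target shape, so other choices for names_from (such as measure_abbr) are equally reasonable. The name cms_wide is introduced here only for illustration.

```r
library(tidyr)

# cms_patient_care ships with tidyr. Each facility/measure pair appears
# on several rows, one per value of type.
cms_wide <- cms_patient_care |>
  pivot_wider(
    names_from = type,    # one new column per kind of score
    values_from = score   # filled with the corresponding score
  )

cms_wide |> head()
```

The columns not mentioned in names_from or values_from (the facility identifiers and measure_abbr) are kept as the row identifiers.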
Use pivot_longer() when your column names are actually values of a variable (to make data longer and narrower).
Use pivot_wider() when an observation is scattered across multiple rows (to make data wider and shorter).
You will often need pivot_longer() and pivot_wider() when preparing data for plotting.
These two functions are the foundation of data tidying in R.
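Because pivot_wider() undoes pivot_longer(), a round trip should give back a table with the same shape as the original. The sketch below checks this on relig_income; the names long and wide_again are introduced here only for illustration.

```r
library(tidyr)

# Lengthen: income brackets move from column names into an income column.
long <- relig_income |>
  pivot_longer(cols = !religion, names_to = "income", values_to = "count")

# Widen again: the income values go back to being column names.
wide_again <- long |>
  pivot_wider(names_from = income, values_from = count)

# Compare the round-tripped table with the original.
identical(names(wide_again), names(relig_income))
identical(dim(wide_again), dim(relig_income))
all.equal(as.data.frame(wide_again), as.data.frame(relig_income))
```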