Strings

Lecture 12

Dr. Eric Friedlander

College of Idaho
CSCI 2025 - Winter 2026

Introduction

Working with Strings

  • Text data, or “strings”, are very common.
  • The stringr package, part of the tidyverse, provides a modern and consistent interface for working with strings.
  • All functions in stringr start with str_.

Data: Idaho Naturalization

  • We’re going to use Naturalization data from the Idaho State Archives.
  • This dataset was assembled by Dr. Rachel Miller.
  • Link to Google Sheet
  • Let’s load the data

Creating Strings

Basics

  • You can create strings with single (') or double (") quotes.
  • To include a literal quote of the same kind that delimits your string, you need to “escape” it with a \ (or delimit the string with the other kind of quote).
  • Let’s work through a few examples…
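A few such examples might look like the sketch below. The variable names and sample strings are made up for illustration; only base R is used.

```r
# Single and double quotes create the same kind of string
single <- 'made with single quotes'
double <- "made with double quotes"

# A double quote inside a double-quoted string must be escaped with \
escaped <- "She said \"hello\""

# Switching the outer quote type avoids the escape entirely
mixed <- 'She said "hello"'

# A literal backslash must itself be escaped
backslash <- "C:\\temp"

# print() shows the escapes; writeLines() shows the raw contents
print(escaped)
writeLines(escaped)

# Both spellings produce exactly the same string
identical(escaped, mixed)
```

Note that the escape character is part of how the string is written, not part of the string itself, which is why the printed form and the raw contents can look different.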

Combining Strings

  • str_c() combines multiple vectors into a single character vector.
  • str_glue(), stringr’s wrapper around glue::glue(), is great for embedding R code inside a string.
  • Wrap your R code in {}.
  • Let’s create a full_name and greeting column using both functions.
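As a minimal sketch of that idea, the code below builds both columns on a small made-up tibble. The people data frame and its first_name and last_name columns are hypothetical stand-ins, not the Naturalization data.

```r
library(tibble)
library(dplyr)
library(stringr)

# Hypothetical data: these names and columns are made up for illustration
people <- tibble(
  first_name = c("Ada", "Grace"),
  last_name  = c("Lovelace", "Hopper")
)

people <- people |>
  mutate(
    # str_c() pastes its arguments together element by element
    full_name = str_c(first_name, " ", last_name),
    # str_glue() evaluates the R code inside {} and inserts the result
    greeting = str_glue("Hello, {full_name}!")
  )

people
```

One difference worth knowing: str_c() returns NA whenever any of its inputs is NA, so missing names propagate to the combined column.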

Extracting Data

Separating into Columns

  • tidyr::separate_wider_delim() splits a column into multiple new columns based on a delimiter.
  • You must provide names for the new columns.
df <- tibble(x = c("a_b_1", "c_d_2", "e_f_3"))
df |> separate_wider_delim(
  x,
  delim = "_",
  names = c("first", "second", "third")
)
# A tibble: 3 × 3
  first second third
  <chr> <chr>  <chr>
1 a     b      1    
2 c     d      2    
3 e     f      3    

Separating into Rows

  • tidyr::separate_longer_delim() splits a column into multiple new rows.
  • Useful when each cell contains a varying number of items.
df <- tibble(x = 1:2, y = c("a,b", "c,d,e"))
df |> separate_longer_delim(y, delim = ",")
# A tibble: 5 × 2
      x y    
  <int> <chr>
1     1 a    
2     1 b    
3     2 c    
4     2 d    
5     2 e    

Practice

Let’s extract the AR codes from the Naturalization data!

Working with Letters

Length

  • str_length() gives you the number of characters in a string.
str_length(c("a", "R for data science", NA))
[1]  1 18 NA

Subsetting

  • str_sub() extracts a part of a string.
  • You provide the start and end positions (inclusive).
  • Negative numbers count from the end of the string.
x <- c("Apple", "Banana", "Pear")
str_sub(x, 1, 3)
[1] "App" "Ban" "Pea"
str_sub(x, -2, -1)
[1] "le" "na" "ar"

Case

  • str_to_lower(): convert to lowercase.
  • str_to_upper(): convert to uppercase.
  • str_to_title(): convert to title case.
str_to_lower("I am shouting.")
[1] "i am shouting."
str_to_title("a tale of two cities")
[1] "A Tale Of Two Cities"

Whitespace

  • str_trim() removes whitespace from the start and end of a string.
  • str_squish() also removes whitespace from the start and end, and reduces any internal whitespace to a single space.
text <- "  this   has  a lot of   whitespace   "
str_trim(text)
[1] "this   has  a lot of   whitespace"
str_squish(text)
[1] "this has a lot of whitespace"

Pattern Matching

  • The real power of stringr comes from pattern matching with regular expressions.
  • We’ll cover these in the next lecture!
  • Functions include:
    • str_detect(): find if a pattern exists.
    • str_count(): count the number of matches.
    • str_replace(): replace the first match with a new string (str_replace_all() replaces every match).
    • str_extract(): pull out the matching text.
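Before regular expressions come into play, these functions can be tried with plain literal patterns. The sketch below uses a made-up fruit vector and simple substrings as patterns.

```r
library(stringr)

# Hypothetical example vector
fruit <- c("apple", "banana", "pear")

# Is "an" present in each string?
detected <- str_detect(fruit, "an")

# How many times does "a" appear in each string?
counts <- str_count(fruit, "a")

# str_replace() changes only the first match in each string ...
first_replaced <- str_replace(fruit, "a", "A")

# ... while str_replace_all() changes every match
all_replaced <- str_replace_all(fruit, "a", "A")

# Pull out the matching text; strings with no match give NA
extracted <- str_extract(fruit, "an")

detected
counts
first_replaced
all_replaced
extracted
```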

Practice

Let’s practice working with strings!

Wrap-Up

Strings

  • The stringr package provides a consistent set of tools for working with strings.
  • Create strings with str_c() and str_glue().
  • Extract data with tidyr::separate_*() functions.
  • Manipulate letters with str_length(), str_sub(), str_to_*(), and str_trim()/str_squish().
  • The next step is to master pattern matching with regular expressions.

Do Next

  1. Read Chapter 14: Strings from r4ds.
  2. Open the Recitation Gem and say “Provide me practice problems for Chapter 14” or work through some of the exercises in the text.
  3. Move on to Lecture 13.