Regular Expressions

Lecture 13

Dr. Eric Friedlander

College of Idaho
CSCI 2025 - Winter 2026

Introduction

Regular Expressions (Regex)

A powerful tool for describing patterns in strings.
Used for finding, extracting, and replacing text.
stringr provides a consistent interface for working with regex.

Key Functions

`str_detect()`

str_detect() returns TRUE if a pattern is found in a string, FALSE otherwise.

library(tidyverse)  # loads stringr, tidyr, and tibble, used throughout these slides
x <- c("apple", "banana", "pear", "pineapple", "naartjie")
str_detect(x, "an")

[1] FALSE  TRUE FALSE FALSE FALSE

`str_count()`

str_count() counts the number of matches in a string.

str_count("banana", "a")

[1] 3
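
As a small sketch of two details that are easy to miss, assuming stringr is loaded: str_count() is vectorized over its input, and matches never overlap. The vector `fruits` is made up for illustration.

```r
library(stringr)

# Hypothetical example vector
fruits <- c("apple", "banana", "pear")

# One count per element of the input
counts <- str_count(fruits, "a")
counts   # 1 3 1

# Matches do not overlap: "banana" contains "ana" at positions 2 and 4,
# but the second overlaps the first, so only one match is counted
overlap <- str_count("banana", "ana")
overlap  # 1
```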

`str_extract()` and `str_extract_all()`

str_extract() extracts the first match.
str_extract_all() extracts all matches.

str_extract("banana", "an")

[1] "an"

str_extract_all("banana", "an")

[[1]]
[1] "an" "an"
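
The difference in return types shows up more clearly on a vector. A minimal sketch, assuming stringr is loaded and using a made-up vector `fruits`: str_extract() gives one value per string (NA when nothing matches), while str_extract_all() gives a list with one character vector per string.

```r
library(stringr)

# Hypothetical example vector
fruits <- c("banana", "pear")

# Character vector, same length as the input; NA where there is no match
first_match <- str_extract(fruits, "an")
first_match          # "an" NA

# List with one element per input string; empty vector where there is no match
all_matches <- str_extract_all(fruits, "an")
lengths(all_matches) # 2 0
```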

`str_replace()` and `str_replace_all()`

str_replace() replaces the first match with a new string.
str_replace_all() replaces every match.

x <- c("Dr_Eric_Friedlander", "Dr_Brandy_Wiegers", "Anthony_Campitelli")
str_replace(x, "_", " ")

[1] "Dr Eric_Friedlander" "Dr Brandy_Wiegers"   "Anthony Campitelli"

str_replace_all(x, "_", " ")

[1] "Dr Eric Friedlander" "Dr Brandy Wiegers"   "Anthony Campitelli"

Regex

Instead of inputting literal strings, you can use regex patterns to describe more complex matches.

x <- c("Dr_Eric_Friedlander", "Dr_Brandy_Wiegers", "Anthony_Campitelli")
# remove all vowels
str_replace_all(x, "[aeiouAEIOU]", "")

[1] "Dr_rc_Frdlndr" "Dr_Brndy_Wgrs" "nthny_Cmptll"

Pattern Components

Anchors

^ matches the start of the string.
$ matches the end of the string.

x <- c("apple", "banana", "pear")
str_detect(x, "^a")

[1]  TRUE FALSE FALSE

str_detect(x, "a$")

[1] FALSE  TRUE FALSE
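
Using both anchors together forces the pattern to match the whole string rather than just part of it. A short sketch, assuming stringr is loaded; the vector `y` is made up for illustration.

```r
library(stringr)

# Hypothetical example vector
y <- c("apple", "pineapple", "apple pie")

# Without anchors, "apple" is found anywhere inside the string
anywhere <- str_detect(y, "apple")
anywhere  # TRUE TRUE TRUE

# With both anchors, the entire string must be exactly "apple"
whole <- str_detect(y, "^apple$")
whole     # TRUE FALSE FALSE
```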

Character Classes

. matches any character except a newline.
\d matches any digit; in an R string, write it as "\\d".
\s matches any whitespace; in an R string, write it as "\\s".
[abc] matches a, b, or c.
[^abc] matches anything except a, b, or c.
| matches either the expression before it or the expression after it (alternation).
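
The components above can be tried out as follows. This is a minimal sketch, assuming stringr is loaded; the vector `y` is made up for illustration. Note that each backslash is doubled because the regex is written inside an R string.

```r
library(stringr)

# Hypothetical example strings
y <- c("room 101", "cab", "xyz", "a b")

has_digit  <- str_detect(y, "\\d")      # any digit
has_space  <- str_detect(y, "\\s")      # any whitespace
has_abc    <- str_detect(y, "[abc]")    # contains a, b, or c
first_non  <- str_extract(y, "[^abc]")  # first character that is not a, b, or c
cab_or_xyz <- str_detect(y, "cab|xyz")  # contains "cab" or "xyz"

has_digit   # TRUE FALSE FALSE FALSE
has_space   # TRUE FALSE FALSE TRUE
has_abc     # FALSE TRUE FALSE TRUE
first_non   # "r" NA "x" " "
cab_or_xyz  # FALSE TRUE TRUE FALSE
```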

Repetition

?: 0 or 1 time.
+: 1 or more times.
*: 0 or more times.
{n}: exactly n times.
{n,}: n or more times.
{n,m}: between n and m times.
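
Each quantifier applies to the item immediately before it. A small sketch, assuming stringr is loaded; the vector `z` is made up for illustration, and in every pattern the quantifier controls how many times the letter u may appear.

```r
library(stringr)

# Hypothetical example strings
z <- c("color", "colour", "colouur", "clr")

zero_or_one  <- str_detect(z, "colou?r")      # u 0 or 1 time
one_or_more  <- str_detect(z, "colou+r")      # u 1 or more times
zero_or_more <- str_detect(z, "colou*r")      # u 0 or more times
exactly_two  <- str_detect(z, "colou{2}r")    # u exactly 2 times
two_or_more  <- str_detect(z, "colou{2,}r")   # u 2 or more times
one_to_two   <- str_detect(z, "colou{1,2}r")  # u between 1 and 2 times

zero_or_one   # TRUE TRUE FALSE FALSE
one_or_more   # FALSE TRUE TRUE FALSE
zero_or_more  # TRUE TRUE TRUE FALSE
exactly_two   # FALSE FALSE TRUE FALSE
two_or_more   # FALSE FALSE TRUE FALSE
one_to_two    # FALSE TRUE TRUE FALSE
```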

Practice!

What does each of these do?

"^b.*a$"
"^.{5}$"
"[aeiou]"
Create a regular expression that will match telephone numbers as commonly written in your country.

Grouping and Back References

`()` for Grouping

Parentheses create a “capturing group” to extract parts of a match.

str_match("apple, banana, pear", "([a-z]+), ([a-z]+)")

     [,1]            [,2]    [,3]    
[1,] "apple, banana" "apple" "banana"
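
str_match() also works on a vector, returning one row per string: column 1 holds the full match and each later column holds one group. A minimal sketch, assuming stringr is loaded; the pattern and the variable names `m`, `first_name`, and `last_name` are chosen here for illustration.

```r
library(stringr)

x <- c("Dr_Eric_Friedlander", "Anthony_Campitelli")

# Two runs of letters separated by "_", anchored to the end of the string,
# so the optional "Dr_" prefix is skipped
m <- str_match(x, "([A-Za-z]+)_([A-Za-z]+)$")

first_name <- m[, 2]  # first capturing group
last_name  <- m[, 3]  # second capturing group

first_name  # "Eric" "Anthony"
last_name   # "Friedlander" "Campitelli"
```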

Back References

\1, \2, etc. refer to previously captured groups; in an R string, write them as "\\1", "\\2".

str_replace("abab", "(a)(b)", "\\2\\1")

[1] "baab"
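
Back references can also appear inside the pattern itself, where they match whatever text the group captured. A short sketch, assuming stringr is loaded; the vector `words` is made up for illustration.

```r
library(stringr)

# Hypothetical example vector
words <- c("banana", "coconut", "pear")

# (..) captures any two characters; \\1 requires the same two characters again
has_repeat <- str_detect(words, "(..)\\1")
repeated   <- str_extract(words, "(..)\\1")

has_repeat  # TRUE TRUE FALSE
repeated    # "anan" "coco" NA
```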

Practice!

Exercise 6 from Section 15.4.7 of r4ds.

Other Tools

`tidyr::separate_wider_regex()`

Separates a column into multiple columns using a vector of regex patterns: named patterns become new columns, and unnamed patterns are matched but dropped.

df <- tibble(x = "123-abc")
df |> separate_wider_regex(x, c(num = "\\d+", "-", chr = "[a-z]+"))

# A tibble: 1 × 2
  num   chr  
  <chr> <chr>
1 123   abc
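
The same idea extends to several rows. The sketch below assumes tidyr 1.3.0 or later (where separate_wider_regex() is available) and uses a made-up tibble `df2`; the new columns come back as character, so any numeric conversion is a separate step.

```r
library(tidyr)
library(tibble)

# Hypothetical data: letters, a dash, then a four-digit year
df2 <- tibble(id = c("ab-2021", "xyz-1999"))

out <- df2 |>
  separate_wider_regex(
    id,
    c(code = "[a-z]+",   # named: becomes column `code`
      "-",               # unnamed: matched, then dropped
      year = "\\d{4}")   # named: becomes column `year`
  )

# Both new columns are character; convert explicitly if a number is needed
out$year <- as.integer(out$year)
```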

`fixed()`

Use fixed() to match a literal string without interpreting it as a regex.

str_detect("a.b", fixed("."))

[1] TRUE
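
To see why fixed() matters, compare it with the plain regex on a string that has no dot. A minimal sketch, assuming stringr is loaded; the vector `s` is made up for illustration.

```r
library(stringr)

# Hypothetical example vector
s <- c("a.b", "axb")

# As a regex, "." matches any character, so both strings match
as_regex <- str_detect(s, ".")
as_regex    # TRUE TRUE

# fixed(".") looks for a literal dot
as_fixed <- str_detect(s, fixed("."))
as_fixed    # TRUE FALSE

# Escaping the dot in the regex has the same effect
as_escaped <- str_detect(s, "\\.")
as_escaped  # TRUE FALSE
```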

Practice!

Let’s extract the AR codes and the county names from the Naturalization data!

Wrap-Up

Do Next

Read Chapter 15: Regular Expressions from r4ds.
Open the Recitation Gem and say “Provide me practice problems for Chapter 15” or work through some of the exercises in the text.
That's it for tonight! See you tomorrow.
