Lecture 4
College of Idaho
CSCI 2025 - Winter 2026
dplyr

nycflights13

Install the nycflights13 package if you haven't already, then load tidyverse and nycflights13.
nycflights13 contains on-time data for all flights that departed NYC in 2013.
dplyr is a package in the tidyverse, designed for data manipulation.

dplyr verbs:
- Use the pipe (|>) to chain together multiple verbs
- The pipe (|>) takes the output of one expression and “pipes” it as the first argument to the next expression
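A minimal sketch of what the pipe does: the nested call and the piped chain below should give the same result, because `x |> f(y)` is rewritten as `f(x, y)`. It assumes dplyr and nycflights13 are installed and R 4.1 or later (needed for the native `|>`); the object names `nested` and `piped` are only for illustration. Loading the whole tidyverse would attach dplyr as well.

```r
# Assumes dplyr and nycflights13 are installed; library(tidyverse) also attaches dplyr.
# The native pipe |> requires R 4.1 or later.
library(dplyr)
library(nycflights13)

# Nested calls: read from the inside out
nested <- arrange(filter(flights, month == 1), dep_delay)

# Piped calls: read top to bottom; each result becomes the first argument of the next verb
piped <- flights |>
  filter(month == 1) |>
  arrange(dep_delay)

identical(nested, piped)
#> [1] TRUE
```

The piped version lists the steps in the order they happen, which is why it is the usual way to chain several verbs.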
dplyr verbs can be grouped into categories based on which part of the data frame they act on: rows, columns, or groups.
Rows

- filter(): keep rows that meet certain criteria
  - Use >, <, ==, !=, >=, <=, and %in% for comparisons
  - Use & (and), | (or), and ! (not) to combine multiple conditions
- arrange(): reorder rows
- distinct(): keep only unique rows
- slice_ functions: slice_head(), slice_tail(), slice_sample(), slice_min(), slice_max()

Let’s do the following:
- Use filter to keep only flights in January
- Use arrange to sort by dep_delay (departure delay)
- Use distinct to keep only unique carrier values

Columns

- mutate(): create new columns or modify and combine existing columns
- select(): keep (or drop) specified columns
- rename(): rename columns
- relocate(): move columns to a new position

Let’s do the following:
- Use mutate to create a new column speed (distance / air_time * 60, in miles per hour)
- Use select to drop the tailnum column
- Use select to keep only flight, carrier, and speed
- Use rename to rename speed to avg_speed
- Use relocate to move avg_speed to be the first column

Groups

- group_by(): specify one or more columns to group by
  - Use ungroup() to remove grouping
- summarize(): compute summary statistics for each group

Let’s do the following:

- Use group_by to group by carrier
- Use summarize to compute the average dep_delay for each carrier
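One possible solution to the row-verb exercises (filter, arrange, distinct), plus one line each showing combined conditions and a slice_ function. This is a sketch, not the only answer. It assumes dplyr and nycflights13 are installed, and the object names (`jan_flights`, `jfk_jan_feb`, and so on) and the JFK/carrier filter are hypothetical examples chosen for illustration.

```r
# A sketch of the row-verb exercises; assumes dplyr and nycflights13 are installed.
library(dplyr)
library(nycflights13)

# filter(): keep only flights in January
jan_flights <- flights |>
  filter(month == 1)

# arrange(): sort by departure delay, smallest first (missing values are placed last)
by_delay <- flights |>
  arrange(dep_delay)
# Wrap the column in desc() for largest first: arrange(desc(dep_delay))

# distinct(): one row per unique carrier value
carriers <- flights |>
  distinct(carrier)

# Combining conditions (hypothetical example): comma-separated conditions are combined with &;
# | means "or" and ! negates, so this keeps January/February JFK flights not flown by UA or AA
jfk_jan_feb <- flights |>
  filter(month %in% c(1, 2), origin == "JFK", !(carrier == "UA" | carrier == "AA"))

# slice_max(): the rows with the three largest departure delays (ties may add extra rows)
longest_delays <- flights |>
  slice_max(dep_delay, n = 3)
```

Note that `distinct(carrier)` returns only the carrier column; pass `.keep_all = TRUE` to keep the other columns from the first matching row.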
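A sketch of the column-verb exercises, again assuming dplyr and nycflights13 are installed. In nycflights13, `distance` is in miles and `air_time` in minutes, so multiplying by 60 gives miles per hour. The names `flights_speed`, `no_tailnum`, and `speed_summary` are illustrative, and chaining the last three steps together is one choice among several.

```r
# A sketch of the column-verb exercises; assumes dplyr and nycflights13 are installed.
library(dplyr)
library(nycflights13)

# mutate(): add speed in miles per hour (distance in miles, air_time in minutes);
# the new column is appended at the end, and rows with a missing air_time get NA
flights_speed <- flights |>
  mutate(speed = distance / air_time * 60)

# select() with a minus sign drops a column
no_tailnum <- flights_speed |>
  select(-tailnum)

# select() keeps only the named columns; rename() uses new_name = old_name;
# relocate() with no .before/.after moves the column to the front
speed_summary <- flights_speed |>
  select(flight, carrier, speed) |>
  rename(avg_speed = speed) |>
  relocate(avg_speed)

names(speed_summary)
```

None of these verbs modify `flights` in place; each returns a new data frame, which is why the results are assigned to new names.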
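A sketch of the grouping exercise, plus a short example of when ungroup() matters. It assumes dplyr and nycflights13 are installed; the names `delay_by_carrier`, `avg_dep_delay`, and `carrier_avg_delay` are hypothetical, and the final arrange() is an optional extra step for readability.

```r
# A sketch of the group-verb exercise; assumes dplyr and nycflights13 are installed.
library(dplyr)
library(nycflights13)

# group_by() + summarize(): one row per carrier with its mean departure delay.
# na.rm = TRUE skips flights with a missing dep_delay (e.g., cancelled flights);
# without it, mean() returns NA for any carrier that has a missing value.
delay_by_carrier <- flights |>
  group_by(carrier) |>
  summarize(avg_dep_delay = mean(dep_delay, na.rm = TRUE)) |>
  arrange(desc(avg_dep_delay))

# group_by() + mutate() keeps every row and keeps the grouping,
# so ungroup() removes it once the per-carrier calculation is done
flights_with_carrier_avg <- flights |>
  group_by(carrier) |>
  mutate(carrier_avg_delay = mean(dep_delay, na.rm = TRUE)) |>
  ungroup()
```

With a single grouping column, summarize() drops that grouping level, so `delay_by_carrier` is already ungrouped; the explicit ungroup() is needed after mutate(), where the grouping would otherwise carry into later steps.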