Numbers

Lecture 11

Dr. Eric Friedlander

College of Idaho
CSCI 2025 - Winter 2026

Introduction

Working with Numbers

  • Numeric data is one of the most common types of data you’ll work with.
  • This lecture covers creating, transforming, and summarizing numeric vectors.
  • We’ll use functions from base R and dplyr.

Creating Numbers

Parsing Numbers

  • Sometimes numbers are stored as text (strings).
  • readr::parse_double() converts strings that contain only a number (such as 1.5) into numbers.
  • readr::parse_number() is more flexible: it ignores non-numeric text before and after the first number, such as currency symbols, percent signs, or units.
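A quick comparison of the two parsers on small made-up strings (not the parks data). parse_double() only succeeds on plain numbers, while parse_number() strips the surrounding text, including the grouping comma under readr’s default locale.

```r
library(readr)

# Hypothetical toy strings
plain <- c("1.5", "20", "300")
messy <- c("$1,234", "45%", "about 7 acres")

parse_double(plain)      # 1.5 20 300
parse_number(messy)      # 1234 45 7: text around the number is dropped
parse_double("$1,234")   # fails: NA, with a parsing-problem warning
```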

Example

Let’s work with the parks dataset.

Counts

count()

  • dplyr::count() is a quick way to count the number of rows for each unique value of a variable.
  • It’s a shortcut for group_by() and summarize(n = n()).
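A small sketch on a hypothetical parks-like tibble (the real parks dataset has more columns) showing that count() and the group_by() + summarize() version give the same counts. The sort = TRUE argument puts the most frequent values first.

```r
library(dplyr)

# Hypothetical toy data standing in for the parks dataset
parks_toy <- tibble(
  city = c("Boise", "Denver", "Boise", "Seattle", "Denver", "Boise"),
  year = c(2019, 2019, 2020, 2020, 2020, 2021)
)

# One step with count(); sort = TRUE orders by n, largest first
by_count <- parks_toy |> count(city, sort = TRUE)

# The same result spelled out with group_by() + summarize()
by_summary <- parks_toy |>
  group_by(city) |>
  summarize(n = n()) |>
  arrange(desc(n))

by_count   # Boise 3, Denver 2, Seattle 1
```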

Example

What cities appear in the parks dataset most frequently?

Weighted Counts

  • Pass a variable to the wt argument of count() to sum that variable within each group instead of just counting rows.
  • This is useful for summarizing pre-counted data.
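A minimal sketch with hypothetical pre-counted data (one row per city per year, with a made-up visitors column). Without wt, count() counts rows; with wt, it adds up the weight column within each city.

```r
library(dplyr)

# Hypothetical pre-counted data
visits <- tibble(
  city     = c("Boise", "Boise", "Denver", "Denver", "Seattle"),
  visitors = c(100, 150, 300, 50, 120)
)

visits |> count(city)                  # rows per city: 2, 2, 1
visits |> count(city, wt = visitors)   # visitors per city: 250, 350, 120
```

The weighted call gives the same numbers as group_by(city) followed by summarize(n = sum(visitors)).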

Example

Let’s create a new metric: the best city for parks across every year since the rankings began.

Practice

Let’s do some practice!

Arithmetic

Basic Operators

  • R provides the standard arithmetic operators: +, -, *, /, ^.
  • %% (remainder) and %/% (integer division) are useful for modular arithmetic.

Example

  • From the flights dataset, create new columns for the departure hour and minute.
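A sketch on a toy tibble, assuming (as in nycflights13::flights) that dep_time is stored as an HHMM number, so 517 means 5:17. Integer division by 100 keeps the hour; the remainder keeps the minute.

```r
library(dplyr)

# Toy stand-in for nycflights13::flights (dep_time in HHMM format)
flights_toy <- tibble(dep_time = c(517, 1545, 2359, 5))

dep_parts <- flights_toy |>
  mutate(
    hour   = dep_time %/% 100,  # integer division drops the last two digits
    minute = dep_time %% 100    # remainder keeps only the last two digits
  )

dep_parts   # hours 5, 15, 23, 0; minutes 17, 45, 59, 5
```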

Transformations

Logarithms

  • Logarithms (log() for the natural log, log2(), log10()) are useful for data that spans several orders of magnitude.
  • They turn exponential growth into linear growth, which is often easier to model.
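Two small illustrations on made-up values: log10() turns each factor of 10 into a step of 1, and log2() turns a quantity that doubles every period into a straight line that rises by 1 per period.

```r
# Values spanning several orders of magnitude (hypothetical)
x <- c(10, 100, 1000, 10000)
log10(x)                  # 1 2 3 4

# Exponential growth: doubles every period
doubling <- 5 * 2^(0:4)   # 5 10 20 40 80
diff(log2(doubling))      # 1 1 1 1: constant steps, i.e. linear on the log scale

log(exp(2))               # log() is base e, so this is 2
```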

Rounding

  • round(x, digits): rounds to the nearest integer, or to digits decimal places; exact halves go to the even digit, so round(2.5) is 2.
  • floor(x): always rounds down.
  • ceiling(x): always rounds up.
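The three rounding functions side by side on a few hypothetical values. Note that “down” for floor() means toward negative infinity, which matters for negative numbers.

```r
x <- c(1.234, 1.567, -1.5)

round(x)                 # 1  2 -2
round(x, digits = 2)     # 1.23 1.57 -1.50
floor(x)                 # 1  1 -2
ceiling(x)               # 2  2 -1

round(c(0.5, 1.5, 2.5))  # 0 2 2: halves go to the even neighbour
```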

Cutting

  • cut() divides a numeric vector into a set of discrete bins (a factor).
  • The breaks argument sets the bin boundaries; by default each bin is closed on the right, e.g. (0,18].
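A minimal sketch binning some hypothetical ages. Using Inf as the last break catches everything above 65, and the labels argument replaces the default interval names with readable ones.

```r
ages <- c(3, 17, 25, 42, 68)   # hypothetical values

age_group <- cut(ages, breaks = c(0, 18, 40, 65, Inf))
levels(age_group)   # "(0,18]" "(18,40]" "(40,65]" "(65,Inf]"

# Same bins with readable labels
labeled <- cut(
  ages,
  breaks = c(0, 18, 40, 65, Inf),
  labels = c("child", "young adult", "adult", "senior")
)
labeled
```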

Practice!

Compute the average rank of each city, rounding to 2 decimal places.

Ranks & Offsets

Ranks

  • dplyr::min_rank() gives ranks; tied values share the same (smallest) rank, and the following rank is skipped.
  • Wrap the variable in desc() so the largest value gets rank 1.
  • row_number() is similar but gives each row a unique rank, breaking ties by order of appearance.
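The ranking functions on a small hypothetical vector with one tie. min_rank() ranks the way sports standings do (1, 2, 2, 4), while row_number() never repeats a rank.

```r
library(dplyr)

scores <- c(10, 20, 20, 30)   # hypothetical values with a tie

min_rank(scores)        # 1 2 2 4: rank 3 is skipped after the tie
min_rank(desc(scores))  # 4 2 2 1: highest value gets rank 1
row_number(scores)      # 1 2 3 4: tie broken by position
```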

Offsets

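  • dplyr::lag() gets the previous value in a vector (the first position becomes NA).
  • dplyr::lead() gets the next value (the last position becomes NA).
  • Useful for computing differences between consecutive values or detecting when a value changes.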
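A short sketch on a hypothetical vector: subtracting lag(x) from x gives the change from one value to the next.

```r
library(dplyr)

x <- c(1, 3, 6, 10)   # hypothetical values

lag(x)       # NA  1  3  6
lead(x)      #  3  6 10 NA
x - lag(x)   # NA  2  3  4: step-to-step differences
```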

Numeric Summaries

Center

  • mean(): the average value; a single outlier can pull it far from the typical value.
  • median(): the middle value; more robust to outliers.
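Two hypothetical vectors that differ only in their last value show the difference: the outlier moves the mean a lot but leaves the median alone.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, 3, 4, 100)   # one extreme outlier

mean(x); median(x)   # 3 and 3
mean(y); median(y)   # 22 and 3
```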

Spread

  • sd(): standard deviation, measures how spread out the data is around the mean.
  • IQR(): interquartile range (Q3 - Q1), measures the spread of the middle 50% of the data.
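The same idea for spread, on hypothetical data: one outlier inflates the standard deviation, but the middle 50% of the data (and so the IQR, using R’s default quantile method) does not change.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, 3, 4, 100)   # one extreme outlier

sd(x)    # about 1.58
sd(y)    # about 43.6
IQR(x)   # 2
IQR(y)   # 2
```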

Position

  • min() and max(): the minimum and maximum values.
  • quantile(x, p): finds the value below which a proportion p of the data falls (p is between 0 and 1, not a percentage).
    • quantile(x, 0.25) is the 25th percentile (Q1).
    • median(x) is a shortcut for quantile(x, 0.5).
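Position summaries on a small hypothetical vector. quantile() accepts several probabilities at once and returns a named vector (names like “25%”), so unname() is handy when comparing results.

```r
x <- c(1, 2, 3, 4, 5)

min(x); max(x)                     # 1 and 5
quantile(x, 0.25)                  # 2 (Q1), printed with the name "25%"
quantile(x, c(0.25, 0.5, 0.75))    # 2 3 4
median(x) == unname(quantile(x, 0.5))   # TRUE
```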

Summary

Numbers

  • Create numbers from strings with parse_double() and parse_number().
  • Transform with arithmetic, logs, rounding, and cut().
  • count() is a powerful tool for quick exploration.
  • Use ranks (min_rank) and offsets (lag, lead) for more complex analysis.
  • Summarize with measures of center (mean, median), spread (sd, IQR), and position (min, max, quantile).

Do Next

  1. Read Chapter 13: Numbers from r4ds.
  2. Open the Recitation Gem and say “Provide me practice problems for Chapter 13” or work through some of the exercises in the text.
  3. That’s it! See you Monday!