Numbers

Lecture 11

Dr. Eric Friedlander

College of Idaho
CSCI 2025 - Winter 2026

Introduction

Working with Numbers

Numeric data is the most common type of data you’ll work with.
This lecture covers creating, transforming, and summarizing numeric vectors.
We’ll use functions from base R, readr, and dplyr.

Creating Numbers

Parsing Numbers

Sometimes numbers are stored as text (strings).
readr::parse_double() converts strings to numbers when each string contains nothing but a number (for example "1.2" or "1e3").
readr::parse_number() is more flexible and can extract numbers from strings with other text.
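A minimal sketch of the difference between the two parsers, using made-up strings rather than anything from the parks dataset:

```r
library(readr)

# parse_double() expects strings that hold only a number
clean <- parse_double(c("1.2", "5.6", "1e3"))
clean
# 1.2 5.6 1000.0

# parse_number() skips surrounding text such as currency symbols,
# grouping commas, and percent signs
messy <- parse_number(c("$1,234", "USD 3,513", "59%"))
messy
# 1234 3513 59
```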

Example

Let’s work with the parks dataset.

Counts

`count()`

dplyr::count() is a quick way to count the number of rows for each unique value of a variable.
It’s a shortcut for group_by() and summarize(n = n()).
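To see the shortcut in action, here is a small sketch on a hypothetical toy table (parks_toy and its values are invented for illustration, not taken from the parks dataset):

```r
library(dplyr)

# Made-up stand-in data: one row per city per year
parks_toy <- tibble(
  city = c("Boise", "Boise", "Denver", "Boise", "Denver", "Austin"),
  year = c(2018, 2019, 2018, 2020, 2019, 2020)
)

# count() with sort = TRUE puts the most frequent values first
by_count <- parks_toy |>
  count(city, sort = TRUE)

# The longer equivalent written out by hand
by_group <- parks_toy |>
  group_by(city) |>
  summarize(n = n()) |>
  arrange(desc(n))

by_count
# Boise 3, Denver 2, Austin 1
```

Both pipelines give the same counts; count() just saves typing.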

Example

What cities appear in the parks dataset most frequently?

Weighted Counts

You can provide a weight to count() to sum up a variable instead of just counting rows.
This is useful for summarizing pre-counted data.
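A short sketch of the wt argument, assuming a hypothetical pre-counted table (visits and n_visits are invented names and values):

```r
library(dplyr)

# Made-up pre-counted data: each row already stores a total
visits <- tibble(
  city = c("Boise", "Boise", "Denver"),
  n_visits = c(10, 5, 7)
)

# Without wt, count() counts rows per city
plain <- visits |> count(city)
plain$n
# 2 1

# With wt, count() sums n_visits within each city
weighted <- visits |> count(city, wt = n_visits)
weighted$n
# 15 7
```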

Example

Let’s create a new metric: the best city for parks over the whole period the dataset covers.

Practice

Let’s do some practice!

Arithmetic

Basic Operators

R provides the standard arithmetic operators: +, -, *, /, ^.
%% (remainder) and %/% (integer division) are useful for modular arithmetic.
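As a sketch of how the two operators split a number apart, take clock times stored as HHMM integers (the vector below is made up; the flights data stores departure times in this same HHMM form):

```r
# Made-up times in HHMM form: 5:17, 13:59, 24:00
dep_time <- c(517, 1359, 2400)

# Integer division by 100 drops the last two digits, leaving the hour
hour <- dep_time %/% 100
hour
# 5 13 24

# The remainder after dividing by 100 keeps the last two digits, the minute
minute <- dep_time %% 100
minute
# 17 59 0
```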

Example

From the flights dataset, create new columns for the departure hour and minute.

Transformations

Logarithms

Logarithms (log(), log2(), log10()) are useful for data that spans multiple orders of magnitude.
They turn exponential growth into linear growth, which is often easier to model.
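A quick illustration with an invented series that doubles at every step: on the raw scale the jumps keep growing, but on the log2 scale every step is the same size.

```r
# Made-up exponential series: starts at 3 and doubles each step
t <- 0:5
y <- 3 * 2^t
y
# 3 6 12 24 48 96

# Differences on the raw scale keep growing
raw_steps <- diff(y)
raw_steps
# 3 6 12 24 48

# Differences on the log2 scale are constant: one unit per doubling
log_steps <- diff(log2(y))
log_steps
# 1 1 1 1 1
```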

Rounding

round(x, digits): rounds to the nearest integer by default; digits sets the number of decimal places.
floor(x): always rounds down (toward negative infinity).
ceiling(x): always rounds up (toward positive infinity).
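A few made-up values show how the three functions differ, including for negative numbers and for values exactly halfway between two integers:

```r
# digits controls the decimal places; negative digits round to tens, hundreds, ...
two_places <- round(123.456, 2)
two_places
# 123.46
to_tens <- round(123.456, -1)
to_tens
# 120

# floor() and ceiling() always move in one direction, even for negatives
down <- floor(c(1.5, -1.5))
down
# 1 -2
up <- ceiling(c(1.5, -1.5))
up
# 2 -1

# round() sends halves to the nearest even integer
halves <- round(c(1.5, 2.5))
halves
# 2 2
```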

Cutting

cut() divides a numeric vector into a set of discrete bins (a factor).
You can specify the breaks for the bins.
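A minimal sketch of cut() with explicit breaks, using invented values; the labels in the second call are hypothetical names chosen for illustration:

```r
x <- c(1, 2, 5, 10, 15, 20)

# By default each bin excludes its left edge and includes its right edge
bins <- cut(x, breaks = c(0, 5, 10, 15, 20))
as.character(bins)
# "(0,5]" "(0,5]" "(0,5]" "(5,10]" "(10,15]" "(15,20]"

# labels replaces the interval notation with custom names
named <- cut(
  x,
  breaks = c(0, 5, 10, 15, 20),
  labels = c("sm", "md", "lg", "xl")
)
as.character(named)
# "sm" "sm" "sm" "md" "lg" "xl"
```

Values that fall outside the breaks become NA, so the breaks should cover the full range of the data.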

Practice!

Compute the average rank of each city, rounding to 2 decimal places.

Ranks & Offsets

Ranks

dplyr::min_rank() ranks values from smallest to largest; tied values share the lowest rank, which leaves gaps (1, 2, 2, 4).
Use desc() to rank from highest to lowest.
row_number() is similar but gives each row a unique rank, breaking ties by order of appearance.
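A small sketch comparing the ranking functions on a made-up vector with one tie:

```r
library(dplyr)

x <- c(10, 20, 20, 40)

# Ties share the lowest rank, and the next rank is skipped
smallest_first <- min_rank(x)
smallest_first
# 1 2 2 4

# desc() flips the order so the largest value gets rank 1
largest_first <- min_rank(desc(x))
largest_first
# 4 2 2 1

# row_number() breaks the tie by position
unique_ranks <- row_number(x)
unique_ranks
# 1 2 3 4
```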

Offsets

dplyr::lag() gets the previous value in a vector.
dplyr::lead() gets the next value.
Both are useful for computing differences between consecutive values or finding where a value changes.
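A short sketch on an invented vector. The calls are written as dplyr::lag() and dplyr::lead() so there is no confusion with base R's stats::lag():

```r
library(dplyr)

x <- c(2, 5, 11, 11, 19)

# Shift values one position; the vacated end is filled with NA
previous <- dplyr::lag(x)
previous
# NA 2 5 11 11
following <- dplyr::lead(x)
following
# 5 11 11 19 NA

# Difference from the previous value
change <- x - dplyr::lag(x)
change
# NA 3 6 0 8

# TRUE where the value is the same as the one before it
unchanged <- x == dplyr::lag(x)
unchanged
# NA FALSE FALSE TRUE FALSE
```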

Numeric Summaries

Center

mean(): the average value. Can be sensitive to outliers.
median(): the middle value. More robust to outliers.
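To make the outlier point concrete, here is a sketch with a made-up vector where one value is far from the rest:

```r
# Four small values and one outlier
x <- c(1, 2, 3, 4, 100)

# The mean is pulled toward the outlier
avg <- mean(x)
avg
# 22

# The median stays at the middle value
mid <- median(x)
mid
# 3
```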

Spread

sd(): standard deviation, measures how spread out the data is around the mean.
IQR(): interquartile range (Q3 - Q1), measures the spread of the middle 50% of the data.
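The same kind of comparison works for spread. In this sketch with invented data, swapping one value for an outlier changes sd() a lot but leaves IQR() untouched:

```r
no_outlier <- c(1, 2, 3, 4, 5)
with_outlier <- c(1, 2, 3, 4, 100)

# Standard deviation reacts strongly to the outlier
sd_plain <- sd(no_outlier)      # about 1.58
sd_outlier <- sd(with_outlier)  # about 43.6

# IQR only looks at the middle 50% of the data
iqr_plain <- IQR(no_outlier)
iqr_plain
# 2
iqr_outlier <- IQR(with_outlier)
iqr_outlier
# 2
```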

Position

min() and max(): the minimum and maximum values.
quantile(x, p): finds the value that is greater than a proportion p of the data, where p is between 0 and 1.
- quantile(x, 0.25) is the 25th percentile (Q1).
- median(x) is a shortcut for quantile(x, 0.5).
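A minimal sketch of the position summaries on a made-up vector. Note that quantile() takes a proportion (0.25), not a percentage (25), and returns a named vector, so unname() is used here to keep just the numbers:

```r
x <- 1:9

lowest <- min(x)
lowest
# 1
highest <- max(x)
highest
# 9

# Several quantiles can be requested at once
quartiles <- unname(quantile(x, c(0.25, 0.5, 0.75)))
quartiles
# 3 5 7

# The 0.5 quantile matches the median
same_as_median <- unname(quantile(x, 0.5)) == median(x)
same_as_median
# TRUE
```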

Summary

Numbers

Create with parse_double() and parse_number().
Transform with arithmetic, logs, rounding, and cut().
count() is a powerful tool for quick exploration.
Use ranks (min_rank) and offsets (lag, lead) for more complex analysis.
Summarize with measures of center (mean, median), spread (sd, IQR), and position (min, max, quantile).

Do Next

Read Chapter 13: Numbers from r4ds.
Open the Recitation Gem and say “Provide me practice problems for Chapter 13” or work through some of the exercises in the text.
That’s it! See you Monday!