Lecture 31
College of Idaho
CSCI 2025 - Winter 2026
recipes package: A tidy interface for data preprocessing.

recipe Concept
- recipe(): Define the formula and data.
- step_*(): Add processing steps.
- prep(): Train the recipe (calculate means, SDs, levels, etc.).
- bake(): Apply the recipe to new data.

- Start with recipe(formula, data).
- The formula assigns roles: outcome (LHS) vs predictor (RHS).
- Steps are chained with the pipe (|>).

prep(): Estimating Parameters
- prep() executes the recipe on the training data.

bake(): Applying to Data
- bake() applies the transformations to data.
- Use new_data = NULL to get the processed data you used to prep the recipe.

# A tibble: 6 × 9
  bill_length_mm bill_depth_mm flipper_length_mm body_mass_g  year species
           <dbl>         <dbl>             <dbl>       <dbl> <dbl> <fct>
1         -0.895         0.780            -1.42       -0.568 -1.28 Adelie
2         -0.822         0.119            -1.07       -0.506 -1.28 Adelie
3         -0.675         0.424            -0.426      -1.19  -1.28 Adelie
4         -1.33          1.08             -0.568      -0.940 -1.28 Adelie
5         -0.858         1.74             -0.782      -0.692 -1.28 Adelie
6         -0.931         0.323            -1.42       -0.723 -1.28 Adelie
# ℹ 3 more variables: island_Dream <dbl>, island_Torgersen <dbl>,
#   sex_male <dbl>
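
The recipe(), step_*(), prep(), bake() sequence described above can be sketched end to end as follows. This is a minimal sketch, not necessarily the code behind the printed output: it assumes the data come from the palmerpenguins package, that incomplete rows are dropped up front, and that the formula is species ~ . (the column names in the tibble suggest this, but the slides do not state it). The train/test split, the seed, and all object names are hypothetical.

```r
library(recipes)         # recipe(), step_*(), prep(), bake()
library(palmerpenguins)  # assumed source of the `penguins` data

# Assumption: keep only complete rows so the sketch needs no imputation step.
penguins_complete <- penguins[complete.cases(penguins), ]

# Hypothetical split: 250 shuffled rows for training, the rest held out.
set.seed(2025)
shuffled <- penguins_complete[sample(nrow(penguins_complete)), ]
train <- shuffled[1:250, ]
test  <- shuffled[251:nrow(shuffled), ]

# Define roles with a formula, then chain steps with the pipe.
penguin_recipe <- recipe(species ~ ., data = train) |>
  step_normalize(all_numeric_predictors()) |>  # center and scale
  step_dummy(all_nominal_predictors())         # island, sex -> indicator columns

# prep(): means, SDs, and factor levels are estimated from `train` only.
trained <- prep(penguin_recipe, training = train)

# bake(): new_data = NULL returns the processed training data;
# passing `test` reuses the training-set parameters on unseen rows.
baked_train <- bake(trained, new_data = NULL)
baked_test  <- bake(trained, new_data = test)

head(baked_train)
```

Because the means and SDs come from the training rows only, the normalized columns of baked_train have mean zero, while those of baked_test generally do not. That is the point of keeping prep() and bake() as separate calls.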
Why prep and bake?

- step_naomit(): Remove rows with NA (simple).
- step_impute_*(): Impute missing values (mean, median, knn).
- step_log(): Log transform skewed variables.
- step_mutate(): General mutations (similar to dplyr::mutate).
- step_normalize() (does both center and scale).
- step_dummy(all_nominal_predictors()).

- step_normalize() (Variable scale influences distance).
- step_dummy() (Must be numeric).

- step_dummy(): Categorical -> Numeric.
- step_normalize(): Center and Scale.

1. recipe().
2. step_*().
3. prep() on training data.
4. bake() to get the result.

recipes from Tidy Modeling with R.
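
The individual steps listed above can be combined in a single recipe. The sketch below is illustrative only: the toy data frame, its column names (price, income, rooms, region), and the derived column income_per_room are all hypothetical, chosen so that each step has something to do. step_impute_median() stands in for the step_impute_*() family.

```r
library(recipes)

# Hypothetical toy data: a right-skewed numeric column with one missing value,
# a second numeric column, and a categorical predictor.
toy <- data.frame(
  price  = c(120, 95, 310, 150, 1000, 80),
  income = c(20, 35, 400, NA, 2500, 50),
  rooms  = c(2, 3, 4, 3, 5, 1),
  region = factor(c("north", "south", "south", "north", "east", "east"))
)

toy_recipe <- recipe(price ~ ., data = toy) |>
  step_impute_median(income) |>                     # fill NA with the training median
  step_mutate(income_per_room = income / rooms) |>  # dplyr::mutate-style derived column
  step_log(income) |>                               # natural log to reduce right skew
  step_normalize(all_numeric_predictors()) |>       # center and scale
  step_dummy(all_nominal_predictors())              # region -> numeric indicator columns

# prep() with no `training` argument uses the data given to recipe().
processed <- toy_recipe |>
  prep() |>
  bake(new_data = NULL)

processed
```

The order matters here: imputation runs first so that later steps never see NA, and normalization runs before step_dummy() so that the indicator columns stay as 0/1 values rather than being rescaled.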