R Basics

Lecture 1

Dr. Eric Friedlander

College of Idaho
CSCI 2025 - Winter 2026

Before we start…

  • Make sure you have done the following:
    • Downloaded and installed R
    • Downloaded and installed Positron (or RStudio)
    • Created a Gemini Pro AI account (free for students)

Introduction to R

What is R?

  • R is a statistical programming language
    • Inspired by the S programming language developed at Bell Labs in the 1970s
    • Created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand in the early 1990s for teaching statistics
    • Now one of the most popular programming languages for data science and statistics
  • R is free and open source
    • Anyone can download and use R for free
    • Anyone can contribute to the development of R and its packages

R is not the same as RStudio, R Markdown, or Positron

  • R is the programming language
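  • RStudio and Positron are integrated development environments (IDEs) for R
    • Each provides a user-friendly interface for writing and running R code
    • Positron is a newer IDE from Posit, the company that makes RStudio; it is not a renamed RStudio
  • R Markdown is a file format for documents that combine text with R code and its output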

Working with R

Writing Scripts in R

  • Let’s open up a script and run some R code!
    • Running R from the console
    • Running R from a script file
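
A minimal sketch of what a first script might contain. The variable names and the file name first_script.R are made up for illustration. Each line can be typed into the console one at a time, or the whole thing can be saved in a script file and run from there:

```r
# Arithmetic works like a calculator
1 + 2

# Store a value in a variable with the assignment operator <-
body_mass_g <- 3750   # illustrative value

# Use the variable in a later calculation
body_mass_kg <- body_mass_g / 1000

# Print the result
print(body_mass_kg)
```

Code typed into the console runs immediately and is not saved. Code in a script file stays on disk, so it can be rerun and shared later.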

R Packages

  • R has a rich ecosystem of packages that extend its functionality
    • Install a package using install.packages("package_name") (only needed once per computer, or again after updating R)
    • Load a package into your R session using library(package_name) (do this every time you start a new R session)
  • An important package for this class:
    • tidyverse: A collection of packages for data manipulation, visualization, and modeling
      • ggplot2: A package for creating graphics using the Grammar of Graphics
      • dplyr: A package for data manipulation
  • Let’s install and load tidyverse
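
The install-once, load-every-session pattern described above might look like this in a script. The install line is commented out on the assumption that it has already been run once from the console:

```r
# Install the tidyverse from CRAN. Run this once per computer,
# then leave it commented out so it does not rerun every time.
# install.packages("tidyverse")

# Load the tidyverse. Run this at the start of every new R session.
# This attaches ggplot2, dplyr, and the other core tidyverse packages.
library(tidyverse)
```

Note the quotation marks: install.packages() needs the package name in quotes, while library() accepts it without them.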

Palmer Penguins Dataset

  • Install the palmerpenguins package
  • Contains a data set on the famous Palmer penguins: species, island in the Palmer Archipelago, size measurements (flipper length, body mass, bill dimensions), and sex
  • Explore more with ?penguins
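
A short sketch of these steps, assuming the install line has already been run once from the console:

```r
# Install once per computer (commented out here on that assumption)
# install.packages("palmerpenguins")

# Load the package so the penguins data frame is available
library(palmerpenguins)

# Typing the name of a data frame prints it
penguins

# Open the help page describing the data set and its variables
?penguins
```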

Important Vocabulary

  • Data frame: A table-like structure in R that holds data (penguins)
    • In the tidyverse, data frames are referred to as tibbles
  • Variable: a quantity, quality, or property that you can measure (e.g. species, bill_length_mm, island)
  • Observation or Data point: a single object that has been measured (e.g. one of the penguins in the data set)
  • Value: the entry for a particular variable and observation (e.g. for the first penguin, the value of species is “Adelie”, the value of bill_length_mm is 39.1)
  • Tabular Data: data organized in rows and columns (like a spreadsheet or data frame)
  • Tidy Data: tabular data where each observation is a row, each variable is a column, and each type of observational unit forms a table
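
To make the vocabulary concrete, here is a minimal sketch using a tiny hand-typed data frame. The name toy_penguins and its three rows are illustrative stand-ins, not the full penguins data set:

```r
# A tiny hand-typed data frame (illustrative values only)
toy_penguins <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo"),
  island = c("Torgersen", "Torgersen", "Biscoe"),
  bill_length_mm = c(39.1, 39.5, 46.1)
)

# Variables are the columns
names(toy_penguins)

# Observations are the rows; this counts them
nrow(toy_penguins)

# One observation: everything recorded for the first penguin
toy_penguins[1, ]

# One value: bill_length_mm for the first observation
toy_penguins$bill_length_mm[1]
```

This toy table is also tidy: each row is one penguin and each column is one variable.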

Taking a quick look at penguins

  • glimpse gives a quick overview of the data frame
  • head shows the first few rows of the data frame
  • tail shows the last few rows of the data frame
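
The three functions above can be tried as follows. This sketch assumes dplyr (part of the tidyverse) and palmerpenguins are already installed:

```r
library(dplyr)           # provides glimpse()
library(palmerpenguins)  # provides the penguins data frame

# One line per variable: its name, type, and first few values
glimpse(penguins)

# First six rows (the default)
head(penguins)

# Last six rows (the default)
tail(penguins)

# Ask for a different number of rows with the n argument
head(penguins, n = 10)
```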

Wrap up

Do Next

  1. Read the Introduction chapter of R for Data Science
  2. Post any questions you have to the HW channel in our Team.
  3. Move on to Lecture 02.