A Menagerie Of Tools

Dr. Eric Friedlander

College of Idaho
CSCI 2025 - Winter 2026

Today

Quarto
Jupyter Notebooks
Git and GitHub
Environment Management
AI (Maybe)

What is Quarto?

Quarto Basics

Quarto is a unified authoring framework for data science
Combines your code, its results, and your prose
Quarto documents are fully reproducible
Supports dozens of output formats:
- PDFs
- Word files
- Presentations (like this one!)
- Websites
- Books

Quarto vs R Markdown

If you’ve used R Markdown, Quarto will feel very familiar!
Quarto unifies the functionality of many R Markdown packages (rmarkdown, bookdown, distill, xaringan, etc.)
Native support for multiple languages:
- R
- Python
- Julia
- Observable
Quarto is the “next generation” of R Markdown

Quarto Files

The `.qmd` File

Quarto files are plain text files with the extension .qmd
They contain three types of content:
1. YAML header: Metadata and settings (surrounded by ---)
2. Code chunks: Blocks of code to execute
3. Markdown text: Prose with formatting
Let’s do these things together. Open a new project in Positron.

1. YAML Header

Located at the very top of the file
Surrounded by three dashes ---
Defines document metadata and options
Example:

---
title: "Diamond sizes"
date: 2024-01-14
format: html
---

2. Markdown Text

Use markdown for text formatting:
- Bold: **text**
- Italic: *text*
- Code: `code`
- Lists: - Item or 1. Item
- Headers: # Header 1, ## Header 2
- Links: [text](url)
- Images: ![caption](path/to/image)

3. Code Chunks

Chunks are where your code lives and runs
Insert a chunk:
- Keyboard shortcut: Cmd/Ctrl + Alt + I (or Cmd/Ctrl + Shift + I in some setups)
- Manually type: ```{r} and ```
Run code:
- Cmd/Ctrl + Enter (run line)
- Cmd/Ctrl + Shift + Enter (run entire chunk)

Chunk Options

Customizing Code Execution

You can control how code is executed and displayed using chunk options
Options are added at the top of the chunk with #|

Common Options

| Option | Effect |
|---|---|
| `eval: false` | Code is not run (no results generated). Useful for examples. |
| `include: false` | Code runs, but code AND results are hidden. Good for setup chunks. |
| `echo: false` | Code is hidden, but results are shown. Great for reports for non-coders. |
| `message: false` | Hides messages (like when loading packages). |
| `warning: false` | Hides warnings. |

Inline Code

You can embed code directly into your text
Syntax: `r code`
Useful for reporting results dynamically
Example:
- Source: We have `r nrow(diamonds)` diamonds.
- Output: “We have 53940 diamonds.”

Output Formats

Documents

HTML (format: html): The default. Great for interactivity.
PDF (format: pdf): Requires a LaTeX installation. Professional look.
Word (format: docx): Useful for collaborating with non-data scientists.

Presentations

Quarto makes creating slides easy!
revealjs (format: revealjs): HTML presentations (code, interactivity).
PowerPoint (format: pptx): Standard office slides.
Beamer (format: beamer): PDF slides using LaTeX.

Slide Structure

## (Level 2 header) starts a new slide
# (Level 1 header) starts a new section (title slide)

Rendering

To create your output file, you need to Render the .qmd file
In Positron:
- Click the Preview button (often an icon), or use the Render command
- Or run quarto preview in the terminal to see live updates
This executes all code and converts the markdown to your target format
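As a minimal sketch of rendering from the terminal, the script below writes a tiny `.qmd` file (YAML header plus Markdown, with no code chunks so that no R or Python installation is needed) and renders it only if the `quarto` command is available. The file name `diamonds.qmd` and the temporary directory are assumptions made for illustration.

```shell
set -eu
demo="$(mktemp -d)"   # throwaway directory for the demo
cd "$demo"

# A minimal Quarto document: YAML header between --- lines, then Markdown text.
cat > diamonds.qmd <<'EOF'
---
title: "Diamond sizes"
format: html
---

## Overview

This report uses **Markdown** for prose and a YAML header for settings.
EOF

# Render only if the quarto CLI is installed on this machine.
if command -v quarto >/dev/null 2>&1; then
  quarto render diamonds.qmd   # writes diamonds.html next to the source
else
  echo "quarto not found; skipping render"
fi
```

Running `quarto preview diamonds.qmd` instead of `quarto render` keeps a live preview open and re-renders when the file changes.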

Summary

Quarto is a powerful tool for combining code and prose.
Use YAML for settings, Markdown for text, and Chunks for code.
Customize chunks with options like echo, eval, and message.
Render to many formats: HTML, PDF, Slides.
Check the Quarto Documentation for more advanced features.

Jupyter Notebooks

What is Jupyter?

An open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.
Supports many programming languages, including Python, R, and Julia.
Widely used in data science, machine learning, and scientific computing.
Consists of notebooks that are organized into cells.
Each cell can contain either code or markdown text.
Notebooks can be run interactively, allowing you to see results immediately.

Jupyter vs. Quarto

Both are tools for combining code and prose.
Jupyter is more interactive and web-based, while Quarto is more document-focused.
Jupyter is great for exploratory data analysis, while Quarto is better for creating polished reports and presentations.
Quarto supports multiple output formats, while Jupyter primarily focuses on notebooks.

Creating a Jupyter Notebook

In Positron:
- Go to File > New File > Jupyter Notebook.
- Choose the kernel (e.g., Python, R).

Practice!

Let’s make a Jupyter notebook together!

Introduction

What is Version Control?

A system that records changes to a file or set of files over time so that you can recall specific versions later.
It allows you to revert individual files or the entire project to a previous state, compare changes over time, and see who last modified a file, who introduced an issue, and when.
Think of it like “track changes” in a word processor, but for all your project files (code, data, text, etc.).

Why is it useful?

Collaboration: Many people can work on the same project without overwriting each other’s changes.
Organization: Keeps a history of your project. No more final_report_v2_final_final.docx!
Backup: Your project history is stored in a remote, safe place (like GitHub). If your computer crashes, your work is not lost.
Experimentation: You can try out new ideas in a “branch” without messing up the main version of your project.

Git and GitHub

What is Git?

Git is a free and open source distributed version control system.
It’s a command-line tool that you run on your computer to track changes in your project files.
It’s the most popular version control system in the world.

What is GitHub?

GitHub is a web-based hosting service for Git repositories.
It’s a place to store your projects and collaborate with others.
It provides a graphical interface to Git, plus other features like issue tracking, project management, and more.
Git is the tool, GitHub is the service.

Authentication

No More Passwords

GitHub removed support for password authentication for Git operations in August 2021.
You must use a Personal Access Token (PAT) or SSH keys to push code.
If you try to use your password, you will get an authentication error.

Setting up a PAT (R/Positron Users)

The easiest way to set this up is using R helper packages:

Install helpers: install.packages(c("usethis", "gitcreds"))
Create token: usethis::create_github_token()
- This opens a browser.
- Click “Generate token”.
- Copy the token (it starts with ghp_).
Store credentials: gitcreds::gitcreds_set()
- Paste your token when asked for a password.

That’s it! You only need to do this once per computer. Your credentials are stored system-wide and will work for all your projects.

Getting Started

Cloning a Repository

“Cloning” is how you get a copy of a project from GitHub onto your computer.
You only need to do this once per project.
You’ll get a URL from the GitHub page of the repository.
This creates a new directory on your computer with all the project files.
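A minimal sketch of cloning, assuming a small local repository (`seed`, a hypothetical name) stands in for GitHub so the example runs offline. With a real project, the argument to `git clone` would be the URL copied from the repository's GitHub page.

```shell
set -eu
demo="$(mktemp -d)"   # throwaway directory for the demo
cd "$demo"

# Scaffolding: build a tiny repository to clone from (stand-in for GitHub).
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main       # name the first branch "main"
git config user.name "Demo Student"
git config user.email "student@example.com"
echo "# Demo project" > README.md
git add README.md
git commit -q -m "Add README"
cd ..

# The actual step: clone once, then work inside the new directory.
git clone -q seed my-project
cd my-project
git log --oneline    # the project history came along with the files
```

The clone remembers where it came from under the remote name `origin`, which is what later `git push` and `git pull` commands talk to.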

The Core Workflow

The Add-Commit-Push Cycle

This is the most common workflow you’ll use.

Make changes: Edit your files, create new files, etc.
Add files: Tell Git which files you want to save in the next “snapshot” (commit).
Commit files: Save the snapshot with a descriptive message.
Push changes: Send your committed changes to GitHub.

`git add`

Use git add to stage your changes. Staging is the step before committing.
You can add specific files or all files with changes.

`git commit`

A “commit” is a snapshot of your staged changes.
Each commit has a unique ID and a message.
The message is important! It should be a short, descriptive summary of the changes you made.

`git push`

git push sends your committed changes from your local computer to the remote repository on GitHub.
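The add-commit-push cycle can be sketched as follows. This is an illustration under stated assumptions: an empty bare repository on disk (`remote.git`) stands in for GitHub, and the file name and commit message are made up.

```shell
set -eu
demo="$(mktemp -d)"   # throwaway directory for the demo
cd "$demo"

# Scaffolding: an empty bare repository plays the role of GitHub.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

git init -q work
cd work
git symbolic-ref HEAD refs/heads/main       # name the first branch "main"
git config user.name "Demo Student"
git config user.email "student@example.com"
git remote add origin "$demo/remote.git"

# 1. Make changes
echo "Notes for HW1" > notes.txt

# 2. Add: stage the file for the next snapshot (git add . would stage everything)
git add notes.txt
git status --short

# 3. Commit: save the snapshot with a descriptive message
git commit -q -m "Add HW1 notes"

# 4. Push: send the commit to the remote repository
git push -q -u origin main
```

After the first push with `-u`, a plain `git push` from this branch is enough.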

Good Commit Messages

Keep them short and descriptive.
Use the imperative mood (e.g., “Add feature” not “Added feature”).
Explain what the change is and why you made it.

Good:

Fix typo in course syllabus
Update plot styling in HW1
Feat: Add enrollment module skeleton
Docs: Explain data cleaning process

Bad:

stuff
fixed it
aaaaaaaa

Staying in Sync

`git pull`

If you’re working with others, they will merge changes into the main branch.
git pull fetches changes from the remote repository and merges them into your local copy.
It’s a critical habit to pull from main before you start working to make sure you have the latest version.
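A small sketch of why pulling first matters, assuming two local clones (`teammate` and `me`, hypothetical names) of a bare repository that stands in for GitHub. The teammate pushes a change after you cloned, and `git pull` brings your copy up to date.

```shell
set -eu
demo="$(mktemp -d)"   # throwaway directory for the demo
cd "$demo"

# Scaffolding: a stand-in remote, plus a first commit pushed by a teammate.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main
git clone -q "$demo/remote.git" teammate 2>/dev/null   # hide the "empty repository" warning
cd teammate
git symbolic-ref HEAD refs/heads/main
git config user.name "Teammate"
git config user.email "teammate@example.com"
echo "first draft" > report.txt
git add report.txt
git commit -q -m "Add report draft"
git push -q -u origin main
cd ..

# You clone the project once...
git clone -q "$demo/remote.git" me

# ...and meanwhile your teammate pushes another change.
cd teammate
echo "second draft" > report.txt
git commit -q -am "Revise report draft"
git push -q
cd ..

# Before starting work, pull so your copy includes the latest commits.
cd me
git checkout -q main
git pull -q
cat report.txt
```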

Practice!

Let’s break up into groups. Have one person create a new repository on GitHub and add the others as collaborators:

(Person A) Create a new public repository on GitHub. Add your teammates as collaborators.
(Person A) Initialize it with a README.md file.
(Person B/C) Accept the collaborator invitation (check your email!).
(Everyone) Clone it to your computer.
(Person A ONLY) Create a Quarto file, load a package or dataset, and make a very plain/vanilla plot, then add, commit, and push your changes.
(Person B) Pull the changes from GitHub to see Person A’s work, add a change to the plot, then add, commit, and push your changes.
(Person C) Pull the changes from GitHub to see Person B’s work, add a change to the plot, then add, commit, and push your changes.
(Person A) Pull the changes from GitHub to see Person C’s work.

What is a Branch?

A branch is a parallel version of your repository.
It allows you to work on new features or fixes without affecting the main codebase (the main branch).
This is essential for teamwork.

A Collaborative Workflow: The Feature Branch

For our group project, we will use a workflow that prevents accidentally breaking the main version of our app. You will not push directly to main.

The core idea is:

Create a copy of the project (a “branch”) where you can work safely.
Make all your changes on that branch.
When you’re ready, you’ll open a “Pull Request” to have your changes reviewed and merged into the main branch.

The Branching Workflow: A Full Example

Sync your local main branch: git checkout main, then git pull.
Create a new branch for your work: git checkout -b feature/team-name. For the project, your team leader (the “GitHub Sentinel”) will do this. Branch names should be descriptive, like feature/team-name.
Work on your branch: Now you do the familiar add-commit cycle.
Push your branch to GitHub: git push -u origin feature/team-name. The -u flag sets the upstream branch so you can just git push next time.
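The four steps above can be sketched end to end. As before, a local bare repository stands in for GitHub, and the file name and commit messages are assumptions for illustration; only the commands after the scaffolding are the workflow itself.

```shell
set -eu
demo="$(mktemp -d)"   # throwaway directory for the demo
cd "$demo"

# Scaffolding: a stand-in remote with one commit on main, and a working clone.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Student"
git config user.email "student@example.com"
echo "# Group project" > README.md
git add README.md
git commit -q -m "Add README"
cd ..
git clone -q --bare seed remote.git
git clone -q "$demo/remote.git" work
cd work
git config user.name "Demo Student"
git config user.email "student@example.com"

# 1. Sync your local main branch
git checkout -q main
git pull -q

# 2. Create a new branch and switch to it
git checkout -q -b feature/team-name

# 3. Work on your branch: the familiar add-commit cycle
echo "Enrollment module notes" > enrollment.txt
git add enrollment.txt
git commit -q -m "Add enrollment module notes"

# 4. Push the branch; -u sets the upstream so later pushes are just "git push"
git push -q -u origin feature/team-name
```

Note that `main` on the remote is untouched: the new commit exists only on `feature/team-name` until a pull request is merged.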

Pull Requests: Proposing Changes

Now that your branch is on GitHub, you need to ask for it to be merged into main.

What is a Pull Request?

A Pull Request (or PR) is a formal request to merge your changes.
You’re asking the project maintainer (the “Lead Architect” for your project) to “pull” your changes from your branch into the main branch.
It’s a place for code review and discussion before the changes are integrated.

How to open a Pull Request

After pushing the branch, go to the repository on GitHub.
GitHub will usually show a banner prompting you to “Compare & pull request”. Click it!
If not, go to the “Pull requests” tab and click “New pull request”.
Select your team’s branch as the “compare” branch and main as the “base” branch.
Write a clear title and description for your changes. Explain what you did.
Click “Create pull request”.

Review and Merging

Once the PR is open, the “Lead Architect” and other team members can review your code, add comments, and request changes.
If changes are requested, your team makes them on the local branch, then commits and pushes again. The PR will update automatically.
Once approved, the Lead Architect will merge your branch into main. Your work is now part of the official project!

A Note on Merge Conflicts

If you and another person edit the same lines in the same file on different branches, Git won’t know which version to keep when merging.
This is a merge conflict.
The person merging the PR (or the Sentinel, before opening it) will need to manually resolve the conflict by choosing which code to keep.
The best way to avoid conflicts: Pull from main often and communicate with your team!
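To see what a conflict looks like in practice, here is a small local sketch (no GitHub involved). Two branches change the same line of a hypothetical file `plot-settings.txt`; the merge stops, and the conflict is resolved by choosing one version, staging it, and committing.

```shell
set -eu
demo="$(mktemp -d)"   # throwaway directory for the demo
cd "$demo"

git init -q conflict-demo
cd conflict-demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Student"
git config user.email "student@example.com"
echo 'color = "black"' > plot-settings.txt
git add plot-settings.txt
git commit -q -m "Add plot settings"

# One person changes the line on a feature branch...
git checkout -q -b feature/blue
echo 'color = "blue"' > plot-settings.txt
git commit -q -am "Use blue points"

# ...while someone else changes the same line on main.
git checkout -q main
echo 'color = "red"' > plot-settings.txt
git commit -q -am "Use red points"

# The merge cannot pick a winner, so Git stops and marks the conflict.
git merge feature/blue || echo "Merge stopped: resolve the conflict by hand"
cat plot-settings.txt   # both versions appear between <<<<<<< and >>>>>>> markers

# Resolve: write the version to keep, stage it, and finish the merge.
echo 'color = "blue"' > plot-settings.txt
git add plot-settings.txt
git commit -q -m "Merge feature/blue, keep blue points"
```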

After a PR is Merged

Once your team’s branch is merged by the Lead Architect, everyone should update their local repository.

Switch back to your main branch: git checkout main
Pull the latest changes (which now include the merged work): git pull
Now your main is up to date, and you can create a new branch for the next feature!

Practice!

In the repo you just created:

Ignore PRs for now; just practice branching and merging directly to main.
(Everyone) Create a new branch for your work.
(Everyone) Checkout your new branch.
(Everyone) Create a new plot in your Quarto file. Don’t edit any of the existing code.
(A then B then C) Merge your changes into the main branch and push to GitHub.
(Everyone) Pull the changes. Check them out to a new branch and change one of your partners’ plots! Don’t simply add to it; make sure you delete some things too…
(A then B then C) Merge your changes into the main branch and push to GitHub.

Environment Management

The Problem: “It works on my machine”

You share code with a colleague, but it fails on their machine.
You revisit an old project, but the code no longer runs because packages have updated.
You update a package for Project A, but it breaks Project B.

The Solution: Dependency Management

Dependency management ensures your project uses the correct package versions.
In R, renv is the standard tool for this.
Key features:
- Isolation: Each project has its own library of packages.
- Reproducibility: Records exact package versions in a lockfile.
- Portability: Easy to install the same environment on another machine.

How `renv` Works

Key Components

Project Library: A per-project library (renv/library) that is used instead of the system library.
Lockfile (renv.lock): A JSON file recording the exact version and source of every package.
Activation Script (renv/activate.R): Automatically loads the project environment when you open the project.

The Workflow

Initialize: renv::init() (do this once)
Work: install.packages() and get your code working
Snapshot: renv::snapshot() (each time you finish installing packages and get your code working)
Restore: renv::restore() (any time you download the project)

0. Initialize

Run renv::init() to start using renv in a project.
This:
- Sets up the project library.
- Generates a lockfile (renv.lock) with currently installed packages.
- Creates an .Rprofile to activate renv on startup.
Restart R to ensure the environment is active.

1. Work

Install packages as usual:
- install.packages("dplyr")
- install.packages("ggplot2")
These go into your project library, not your global library.
Other projects are unaffected!

2. Snapshot

When your code works, save the state of your library:
- renv::snapshot()
This updates renv.lock with the versions you are currently using.
Commit renv.lock to Git!

3. Restore

When you (or a collaborator) download the project:
- Open the project (R will auto-activate renv).
- Run renv::restore().
This installs all packages exactly as specified in renv.lock.

Checking Status

Run renv::status() to see if your library matches your lockfile.
It tells you if:
- You installed packages but haven’t snapshotted them.
- Your lockfile has packages you haven’t installed yet.

Best Practices

What to Commit to Git

Do Commit:
- renv.lock
- .Rprofile
- renv/activate.R
- renv/settings.json (if it exists)
Do NOT Commit:
- renv/library (these are the installed files, which are large and platform-specific)
- renv/python (if using Python)
- renv/staging
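One way to enforce the list above is a `.gitignore` file, sketched below with empty placeholder files standing in for what renv would create (the placeholders are hypothetical, not real renv output). Recent renv versions generally write their own `renv/.gitignore` during `renv::init()`, so treat this as an illustration of the idea rather than a required step.

```shell
set -eu
demo="$(mktemp -d)"   # throwaway directory for the demo
cd "$demo"

git init -q renv-demo
cd renv-demo

# Placeholders standing in for the files and folders renv would create.
mkdir -p renv/library renv/staging
touch renv.lock .Rprofile renv/activate.R
touch renv/library/placeholder renv/staging/placeholder

# Ignore the installed packages and scratch folders; keep the lockfile and scripts.
cat > .gitignore <<'EOF'
renv/library/
renv/python/
renv/staging/
EOF

git add .
git status --short   # staged: .gitignore, .Rprofile, renv.lock, renv/activate.R
```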

Summary

Use renv for every serious project.
init() to start.
snapshot() to save.
restore() to load.
Enjoy reproducible data science!

Practice!

In the repository you just created:

Have one person initialize renv for the project.
Practice installing new packages, adding them to the lockfile, pushing them to GitHub, and then pulling/restoring.

AI Tools

Overview

GitHub Copilot
Gemini Code Assist
Gemini CLI
