Roadmap

Last time, we learned:

The tidyverse
Basics of ggplot2
Advanced features of ggplot2
Extensions of ggplot2

Today, we will cover:

Code Style
Workflow
Reproducibility
Useful Base R

Code Style

Disclaimer

There are honestly no hard-and-fast rules about what counts as correct code. Code written in many different styles will run and get you your desired results.

However, the following recommendations are made with these truths in mind:

Using a consistent style cuts down on the decisions you have to make, so you can focus your attention on the substance of your code.
- This makes code easier to write and your overall process more efficient!
Easy-to-read code is faster-to-read code, both for your collaborators and for future you.
- This is particularly helpful when scanning code to troubleshoot an error.

You can read more about the specific style we’re using in this class here.

Naming Objects

It’s good practice to name objects (and oftentimes variables) using only lowercase letters, numbers, and _ (to separate words).

Remember to give them descriptive names, even if that means they’re longer.

If you have many related variables, try to be consistent with your naming convention.

A common prefix is preferable to a common suffix because of RStudio's autocomplete: type the shared prefix and RStudio lists all the related names.

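library(dplyr)         # provides filter(), mutate(), summarize()
library(nycflights13)  # provides the flights data set

# Code goal: 
short_flights <- flights |> 
  filter(air_time < 60)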

# Code foul: 
SHORTFLIGHTS <- flights |> 
  filter(air_time < 60)
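To make the prefix idea concrete, the sketch below uses three hypothetical objects that share the prefix `flights_` and base R's `ls()`, whose `pattern` argument takes a regular expression. It only mimics what autocomplete does when you type a shared prefix; it is not how RStudio implements autocomplete.

```r
# Hypothetical related objects that share the prefix "flights_"
flights_short <- c(30, 45)
flights_long  <- c(600, 720)
flights_late  <- c(120, 95)

# List every object whose name starts with "flights_"
# (^ anchors the match at the start of the name)
related <- ls(pattern = "^flights_")
related
```

`related` holds the three names in sorted order, much like the group RStudio offers after you type `flights_`.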

Spacing

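For readability, put spaces around mathematical and comparison operators (e.g. +, -, ==, <), with the exception of ^, as well as around the assignment operator (<-). Put a space after a comma but never before one, and no space between a function name and its opening parenthesis.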

# Code goals: 
z <- (a + b)^2 / d
mean(x, na.rm = TRUE)

# Code foul: 
z<-( a+b ) ^ 2/d
mean (x ,na.rm=TRUE)
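Spacing only changes how code reads, not what it does: R parses the "goal" and "foul" versions identically. A quick base R check, using made-up values for `a`, `b`, `d`, and `x` (the snippet above leaves them undefined):

```r
# Made-up inputs so the snippet runs on its own
a <- 1
b <- 2
d <- 3
x <- c(1, NA, 3)

# Well-spaced and cramped versions of the same expressions
z_goal <- (a + b)^2 / d
z_foul<-( a+b ) ^ 2/d
m_goal <- mean(x, na.rm = TRUE)
m_foul <- mean (x ,na.rm=TRUE)

identical(z_goal, z_foul)  # TRUE: both are 3
identical(m_goal, m_foul)  # TRUE: both are 2
```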

To make code easier to skim, it's fine to add extra spaces so that related parts line up.

flights |> 
  mutate(
    speed      = distance / air_time,
    dep_hour   = dep_time %/% 100,
    dep_minute = dep_time %%  100
  )
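The aligned `mutate()` call relies on `%/%` (integer division) and `%%` (remainder). In `flights`, `dep_time` stores times as HHMM numbers (e.g. 517 means 5:17), so these two operators split it into an hour and a minute. A base R sketch with a few made-up times:

```r
# Made-up departure times in HHMM format (517 = 5:17, 2359 = 23:59)
dep_time <- c(517, 1342, 2359)

# Aligned assignments, in the same spirit as the mutate() call above
dep_hour   <- dep_time %/% 100  # integer division drops the last two digits
dep_minute <- dep_time %%  100  # the remainder keeps only the last two digits

dep_hour    # 5 13 23
dep_minute  # 17 42 59
```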

Pipes

As you start applying more functions in sequence, it can become unclear what is happening when, and to what.

median(sqrt(log(mean(gapminder$pop))))

With nested functions, like those above, you need to read the order of operations inside out, which is a bit awkward. It becomes even more confusing the more function calls you have, especially when they have multiple arguments each.

Enter the pipe: |>

Pipes read left to right, which is much more intuitive!

gapminder$pop |> mean() |> log() |> sqrt() |> median()

The above code takes what’s on the left-hand side of |> and gives it as the first unnamed argument to the first function (mean()).
The result of that function call is then “piped” to the first unnamed argument of the second function (log())…
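Because the pipe only changes how the call is written, the nested and piped versions return exactly the same value. A self-contained check, with a small made-up vector standing in for `gapminder$pop` so it runs without the gapminder package (the native pipe needs R 4.1.0 or later):

```r
# Made-up population figures standing in for gapminder$pop
pop <- c(1e6, 5e6, 2e7)

# Inside out: mean first, then log, then sqrt, then median
nested <- median(sqrt(log(mean(pop))))

# Left to right: the same steps, written in the order they happen
piped <- pop |> mean() |> log() |> sqrt() |> median()

identical(nested, piped)  # TRUE
```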

Pipes

As you can see, pipes allow us to “chain” many function calls together easily.

The so-called "native pipe" (i.e. built into base R) is relatively new: it was added in R 4.1.0. Before that, the usual pipe was %>%, a function from the magrittr package.

The magrittr pipe still works, but it does not behave exactly like the native pipe.

Most importantly, both pipes pass the LHS (left-hand side) to the RHS (right-hand side), but they differ in how you explicitly specify which argument of the RHS receives the LHS.

library(magrittr)  # provides %>%

a <- c("Z", NA, "C", "G", "A")

# magrittr pipe: . is the placeholder
a %>% gsub('A', '-', x = .)

# native pipe
# _ is the placeholder for |> (R >= 4.2.0); it must be passed to a named argument
a |> gsub('A', '-', x = _)
# leaving the "piped" argument as the only unnamed argument also works
a |> gsub(pattern = 'A', replacement = '-')
# an anonymous function lets you be explicit while choosing your own placeholder name
a |> (\(placeholder) gsub('A', '-', x = placeholder))()
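One way to see what the native pipe actually does is to `quote()` a piped call: the parser rewrites it into an ordinary function call before anything runs. A base R sketch (it needs R 4.2.0 or later for the `_` placeholder):

```r
a <- c("Z", NA, "C", "G", "A")

# quote() captures an expression without evaluating it, so it shows the
# call that the parser produced from each piped line
quote(a |> gsub('A', '-', x = _))
#> gsub("A", "-", x = a)
quote(a |> gsub(pattern = 'A', replacement = '-'))
#> gsub(a, pattern = "A", replacement = "-")

# Evaluating the piped call and the plain call gives the same result
identical(a |> gsub('A', '-', x = _), gsub('A', '-', x = a))  # TRUE
```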

Pipes

Some good syntax practices:

You should always put a space before |> and it should usually be the last thing on a line.
Each new function call should start on a new line, indented 2 spaces (RStudio does this automatically if you press Return after a pipe)
Named arguments within a function should also get their own line

# code goals
flights |>  
  group_by(tailnum) |> 
  summarize(
    delay = mean(arr_delay, na.rm = TRUE),
    n = n()
  )

# code fouls
flights |> group_by(tailnum) |> 
  summarize(delay = mean(arr_delay, na.rm = TRUE), n = n())

Selecting the native pipe

The |> is recommended over %>% because it's simpler to use and always available in R 4.1.0 and later, whereas %>% comes from the magrittr package and only works once magrittr (or a package that re-exports it, such as dplyr) is loaded.
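A minimal base R check of the "always available" point: the native pipe is part of R's syntax, so it works in a fresh session with no `library()` call.

```r
# TRUE if this R version has the native pipe (added in R 4.1.0)
getRversion() >= "4.1.0"

# Works with no packages attached
c(1, 4, 9) |> sqrt()  # 1 2 3

# exists() reports whether R can currently find %>%;
# in a fresh session with no packages attached it returns FALSE
exists("%>%")
```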

The native pipe works without any setup, but RStudio's pipe keyboard shortcut may still insert %>%. To make it insert |>, go to Tools > Global Options > Code, open the "Editing" tab, and check "Use native pipe operator, |>".

Keyboard Shortcut

To insert a pipe (with spaces) quickly: Ctrl+Shift+M (Windows and Linux) or Cmd+Shift+M (Mac)