Homework 9 Key

knitr::opts_chunk$set(comment = ">")

library(tidyverse)
library(ggthemes)

Answer 1:

Compute the number of unique values in each column of palmerpenguins::penguins¹.

library(palmerpenguins)
data(penguins)

penguins |> summarise(across(everything(), n_distinct))

> # A tibble: 1 × 8
>   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
>     <int>  <int>          <int>         <int>             <int>       <int>
> 1       3      3            165            81                56          95
> # ℹ 2 more variables: sex <int>, year <int>
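Note that n_distinct() counts NA as a distinct value by default, so a column with missing values reports one more value than it has observed levels. A minimal sketch of the difference, using a hypothetical toy data frame (toy, g, and v are made-up names, not part of penguins):

```r
library(dplyr)

# Toy column with one missing value (hypothetical data, not from penguins)
x <- c("a", "b", NA)

n_distinct(x)                # NA is counted as its own value
n_distinct(x, na.rm = TRUE)  # NA is dropped before counting

# The same idea applied column-wise to a small data frame
toy <- data.frame(g = x, v = c(1, 1, NA))
toy_counts <- toy |>
  summarise(across(everything(), \(col) n_distinct(col, na.rm = TRUE)))
toy_counts
```

Replacing toy with penguins in the last pipeline would give counts that ignore missing values.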

Answer 2:

Compute the mean of every column in mtcars.

mtcars |> summarise(across(everything(), mean))

>        mpg    cyl     disp       hp     drat      wt     qsec     vs      am
> 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375 0.40625
>     gear   carb
> 1 3.6875 2.8125
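As a quick sanity check, the same means can be computed with base R's colMeans() and compared with the across() result. This is only a sketch of one way to cross-check the answer; tidy_means and base_means are illustrative names.

```r
library(dplyr)

# Column means via across()
tidy_means <- mtcars |> summarise(across(everything(), mean))

# Column means via base R
base_means <- colMeans(mtcars)

# Flatten the one-row data frame to a named vector and compare
all.equal(unlist(tidy_means), base_means)
```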

Answer 3:

Group diamonds by cut, clarity, and color, then count the number of observations and compute the mean of each numeric column.

diamonds |> summarise(n = n(),
                      across(where(is.numeric), mean),
                      .by = c(cut, clarity, color))

> # A tibble: 276 × 11
>    cut       clarity color     n carat depth table price     x     y     z
>    <ord>     <ord>   <ord> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
>  1 Ideal     SI2     E       469 0.874  61.7  56.1 3891.  6.02  6.02  3.71
>  2 Premium   SI1     E       614 0.726  61.2  58.8 3363.  5.64  5.61  3.44
>  3 Good      VS1     E        89 0.681  61.6  59.2 3713.  5.49  5.52  3.39
>  4 Premium   VS2     I       315 1.24   61.3  58.9 7156.  6.70  6.67  4.09
>  5 Good      SI2     J        53 1.32   62.4  59.1 5306.  6.85  6.86  4.27
>  6 Very Good VVS2    J        29 1.10   62.4  58.3 5960.  6.34  6.37  3.96
>  7 Very Good VVS1    I        69 0.571  62.2  58.0 2056.  5.17  5.20  3.22
>  8 Very Good SI1     H       547 0.974  62.0  58.0 4934.  6.15  6.17  3.82
>  9 Fair      VS2     E        42 0.690  64.5  59.4 3042.  5.50  5.45  3.53
> 10 Very Good VS1     H       257 0.772  62.0  57.7 3750.  5.68  5.70  3.53
> # ℹ 266 more rows
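In the output above, n is printed as <dbl> rather than <int>. This happens because n is created before across() runs, so where(is.numeric) selects it too and takes its mean. The values are unchanged here, but with a different summary function (sd, for example) the count would be overwritten. A sketch of the usual workaround, which is to compute n last (by_group is an illustrative name):

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

by_group <- diamonds |>
  summarise(
    across(where(is.numeric), mean),  # summarise the numeric columns first
    n = n(),                          # then count, so across() never sees n
    .by = c(cut, clarity, color)
  )

by_group
```

With this ordering n stays an integer column and appears after the means.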

Answer 4:

What happens if you use a list of functions in across(), but don’t name them? How is the output named?

airquality |> 
  summarize(
    across(Ozone:Day, list(
      \(x) median(x, na.rm = TRUE),
      \(x) sum(is.na(x))
    )),
    n = n()
  )

>   Ozone_1 Ozone_2 Solar.R_1 Solar.R_2 Wind_1 Wind_2 Temp_1 Temp_2 Month_1
> 1    31.5      37       205         7    9.7      0     79      0       7
>   Month_2 Day_1 Day_2   n
> 1       0    16     0 153

When the functions in the list are not named, across() falls back on each function's position in the list. The default names follow the pattern {.col}_{.fn}, so the first function's output is named {.col}_1, the second function's output is named {.col}_2, and so on.
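For comparison, here is a sketch of the same summary with the functions named, which gives more readable column names. The names median and n_miss are arbitrary choices for illustration.

```r
library(dplyr)

named <- airquality |>
  summarize(
    across(Ozone:Day, list(
      median = \(x) median(x, na.rm = TRUE),  # name replaces the "_1" suffix
      n_miss = \(x) sum(is.na(x))             # name replaces the "_2" suffix
    )),
    n = n()
  )

names(named)
```

The columns are now named Ozone_median, Ozone_n_miss, and so on, following the same {.col}_{.fn} pattern.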

Answer 5:

Explain what each step of the following pipeline does. If you haven’t seen the function before, look up its help page to learn the specifics of what it does.

diamonds |>
  split(diamonds$cut) |>
  map(\(df) lm(price ~ carat, data = df)) |>
  map(summary) |>
  map_dbl("r.squared")

1: split() is a base R function, so it does not use tidy evaluation and the grouping column must be supplied with $ indexing (diamonds$cut). This step splits the diamonds dataset into a list of data frames, one for each level of cut.
2: This map() call applies an anonymous function to each element of the list created in the previous step, fitting a linear model that regresses price on carat.
3: This map() call applies summary() to each model produced in the previous step.
4: This map_dbl() call plucks the "r.squared" element from each summary (see the .f argument in the map() documentation) and returns the values as a named double vector.

>      Fair      Good Very Good   Premium     Ideal 
> 0.7383940 0.8509539 0.8581622 0.8556336 0.8670887
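The string shortcut in the last step can be checked against an explicit extraction function. The sketch below rebuilds the list of model summaries and confirms that both forms return the same vector; summaries, by_name, and by_function are illustrative names.

```r
library(purrr)
library(ggplot2)  # provides the diamonds dataset

# Steps 1 to 3 of the pipeline: split, fit, summarise
summaries <- diamonds |>
  split(diamonds$cut) |>
  map(\(df) lm(price ~ carat, data = df)) |>
  map(summary)

# Step 4 written two ways
by_name <- map_dbl(summaries, "r.squared")          # string shortcut
by_function <- map_dbl(summaries, \(s) s$r.squared) # explicit function

identical(by_name, by_function)
```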

Footnotes

¹ You’ll need to install the palmerpenguins package in order to use the penguins dataset.