knitr::opts_chunk$set(comment = ">")
library(tidyverse)
library(ggthemes)
Compute the number of unique values in each column of palmerpenguins::penguins.[1]
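One possible way to approach this, offered as a sketch rather than the definitive solution, is to apply n_distinct() to every column with across(everything(), ...). Note that n_distinct() counts NA as a distinct value by default, so a column with missing values reports one more than its number of observed values. The object name unique_counts is illustrative only.

```r
library(dplyr)

# penguins comes from the palmerpenguins package, which must be installed
penguins <- palmerpenguins::penguins

# Count distinct values (NA included) in every column
unique_counts <- penguins |>
  summarize(across(everything(), n_distinct))

unique_counts
```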
Compute the mean of every column in mtcars.
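A minimal sketch of one way to do this: because every column of mtcars is numeric, everything() can be passed to across() without a type check. The name column_means is an assumption made for illustration.

```r
library(dplyr)

# mtcars ships with R; all 11 columns are numeric
column_means <- mtcars |>
  summarize(across(everything(), mean))

column_means
```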
Group diamonds by cut, clarity, and color, then count the number of observations and compute the mean of each numeric column.
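The following is a sketch of one possible approach, not necessarily the code that produced the printed result. It assumes dplyr 1.1.0 or later for the .by argument, and the name diamond_means is illustrative. The count is computed after across() on purpose: if n = n() came first, where(is.numeric) would also select the new n column and average it. With this ordering n is the last column, so the column order may differ from the printed result.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

diamond_means <- diamonds |>
  summarize(
    across(where(is.numeric), mean),  # mean of each numeric column per group
    n = n(),                          # computed last so across() does not pick it up
    .by = c(cut, clarity, color)      # per-operation grouping (dplyr >= 1.1.0)
  )

diamond_means
```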
> # A tibble: 276 × 11
>    cut       clarity color     n carat depth table price     x     y     z
>    <ord>     <ord>   <ord> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
>  1 Ideal     SI2     E       469 0.874  61.7  56.1 3891.  6.02  6.02  3.71
>  2 Premium   SI1     E       614 0.726  61.2  58.8 3363.  5.64  5.61  3.44
>  3 Good      VS1     E        89 0.681  61.6  59.2 3713.  5.49  5.52  3.39
>  4 Premium   VS2     I       315 1.24   61.3  58.9 7156.  6.70  6.67  4.09
>  5 Good      SI2     J        53 1.32   62.4  59.1 5306.  6.85  6.86  4.27
>  6 Very Good VVS2    J        29 1.10   62.4  58.3 5960.  6.34  6.37  3.96
>  7 Very Good VVS1    I        69 0.571  62.2  58.0 2056.  5.17  5.20  3.22
>  8 Very Good SI1     H       547 0.974  62.0  58.0 4934.  6.15  6.17  3.82
>  9 Fair      VS2     E        42 0.690  64.5  59.4 3042.  5.50  5.45  3.53
> 10 Very Good VS1     H       257 0.772  62.0  57.7 3750.  5.68  5.70  3.53
> # ℹ 266 more rows
What happens if you use a list of functions in across(), but don’t name them? How is the output named?
airquality |>
  summarize(
    across(Ozone:Day, list(
      \(x) median(x, na.rm = TRUE),
      \(x) sum(is.na(x))
    )),
    n = n()
  )
> Ozone_1 Ozone_2 Solar.R_1 Solar.R_2 Wind_1 Wind_2 Temp_1 Temp_2 Month_1
> 1 31.5 37 205 7 9.7 0 79 0 7
> Month_2 Day_1 Day_2 n
> 1 0 16 0 153
If the functions in the list are not named, across() falls back to its default naming scheme, {.col}_{.fn}, and uses each function's position in the list in place of a name: the output of the first function is named {.col}_1, the output of the second {.col}_2, and so on.
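For contrast, here is a small sketch of the same summary with the functions named. The names median and n_miss, and the object name named_summary, are arbitrary choices made for illustration; across() then uses them as suffixes instead of the positions.

```r
library(dplyr)

named_summary <- airquality |>
  summarize(
    across(Ozone:Day, list(
      median = \(x) median(x, na.rm = TRUE),  # suffix becomes "_median"
      n_miss = \(x) sum(is.na(x))             # suffix becomes "_n_miss"
    )),
    n = n()
  )

# Columns are now Ozone_median, Ozone_n_miss, Solar.R_median, and so on
names(named_summary)
```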
Explain what each step of the following pipeline does. If you haven’t seen the function before, look up its help page to learn the specifics of what it does.
diamonds |>
  split(diamonds$cut) |>
  map(\(df) lm(price ~ carat, data = df)) |>
  map(summary) |>
  map_dbl("r.squared")
split() is a base R function that does not use tidy evaluation and therefore requires base indexing with $. This step splits the diamonds dataset into a list with one element per level of the cut variable.
The first map() call applies an anonymous function to each element of the list created in the previous step, fitting a linear model that regresses price on carat.
The second map() call applies the summary() function to each model produced in the previous step.
The map_dbl() call extracts the "r.squared" element from each summary produced in the previous step (a string passed as the .f argument is converted to an extractor function; see the map() documentation) and returns the values as a named double vector.
> Fair Good Very Good Premium Ideal
> 0.7383940 0.8509539 0.8581622 0.8556336 0.8670887
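To make the intermediate objects easier to inspect, the pipeline can also be unrolled into separate steps. This is only a sketch; the names by_cut, models, summaries, and r_squared are hypothetical and do not appear in the exercise.

```r
library(purrr)
library(ggplot2)  # provides the diamonds dataset

# Step 1: a list of data frames, one per level of cut
by_cut <- split(diamonds, diamonds$cut)

# Step 2: a list of lm objects, one model per cut
models <- map(by_cut, \(df) lm(price ~ carat, data = df))

# Step 3: a list of summary.lm objects
summaries <- map(models, summary)

# Step 4: a named double vector of R-squared values
r_squared <- map_dbl(summaries, "r.squared")

r_squared
```

Checking names(by_cut) or class(models[[1]]) after each step is a quick way to confirm what every stage returns.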
1. You’ll need to install the palmerpenguins package in order to use the penguins dataset.