knitr::opts_chunk$set(comment = ">")
library(tidyverse)
library(ggrepel)
library(ggthemes)
Read in the data, clean up the names, and pivot it so that the first few rows look like this:
> # A tibble: 5,307 × 6
> artist track time date_entered week rank
> <chr> <chr> <time> <date> <int> <dbl>
> 1 2 Pac Baby Don't Cry (Keep... 04:22 2000-02-26 1 87
> 2 2 Pac Baby Don't Cry (Keep... 04:22 2000-02-26 2 82
> 3 2 Pac Baby Don't Cry (Keep... 04:22 2000-02-26 3 72
> 4 2 Pac Baby Don't Cry (Keep... 04:22 2000-02-26 4 77
> 5 2 Pac Baby Don't Cry (Keep... 04:22 2000-02-26 5 87
> 6 2 Pac Baby Don't Cry (Keep... 04:22 2000-02-26 6 94
> 7 2 Pac Baby Don't Cry (Keep... 04:22 2000-02-26 7 99
> 8 2Ge+her The Hardest Part Of ... 03:15 2000-09-02 1 91
> 9 2Ge+her The Hardest Part Of ... 03:15 2000-09-02 2 87
> 10 2Ge+her The Hardest Part Of ... 03:15 2000-09-02 3 92
> # ℹ 5,297 more rows
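The code for this step is not shown, so here is a minimal sketch of one way to get a similar shape. It assumes the `billboard` dataset bundled with tidyr rather than whatever file the original reads in, which means the `time` column seen in the output will be absent. The object name `billboard_tidy` is also an assumption.

```r
library(tidyverse)

# Assumption: tidyr's built-in billboard data stands in for the original file
billboard_tidy <- tidyr::billboard |>
  # clean up the one name that is not snake_case
  rename(date_entered = date.entered) |>
  # one row per song per charted week
  pivot_longer(
    cols = starts_with("wk"),
    names_to = "week",
    names_prefix = "wk",
    names_transform = list(week = as.integer),
    values_to = "rank",
    values_drop_na = TRUE   # drop weeks after a song left the chart
  )

billboard_tidy
```

Dropping the missing ranks with `values_drop_na = TRUE` is what keeps only the weeks in which a song actually charted.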
Create a variable named date that corresponds to the week based on the date_entered. if_else basically says, “if week is equal to 1, then assign the date_entered value to date. Otherwise, add the number of weeks minus 1 to the date_entered value” (i.e., for the second week you would add 2 - 1, or 1, week to the date_entered value).
> # A tibble: 5,307 × 7
> artist track time date_entered week rank date
> <chr> <chr> <time> <date> <int> <dbl> <date>
> 1 2 Pac Baby Don't Cry (Keep... 04:22 2000-02-26 1 87 2000-02-26
> 2 2 Pac Baby Don't Cry (Keep... 04:22 2000-02-26 2 82 2000-03-04
> 3 2 Pac Baby Don't Cry (Keep... 04:22 2000-02-26 3 72 2000-03-11
> 4 2 Pac Baby Don't Cry (Keep... 04:22 2000-02-26 4 77 2000-03-18
> 5 2 Pac Baby Don't Cry (Keep... 04:22 2000-02-26 5 87 2000-03-25
> 6 2 Pac Baby Don't Cry (Keep... 04:22 2000-02-26 6 94 2000-04-01
> 7 2 Pac Baby Don't Cry (Keep... 04:22 2000-02-26 7 99 2000-04-08
> 8 2Ge+her The Hardest Part Of ... 03:15 2000-09-02 1 91 2000-09-02
> 9 2Ge+her The Hardest Part Of ... 03:15 2000-09-02 2 87 2000-09-09
> 10 2Ge+her The Hardest Part Of ... 03:15 2000-09-02 3 92 2000-09-16
> # ℹ 5,297 more rows
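The date calculation described above can be sketched on a three-row stand-in for the first song. The names `toy` and `toy_date` are hypothetical, and plain day arithmetic (7 days per week) is used here as an assumption about how the weeks are added. Applying the same `mutate()` to the full tidy data would presumably give the `billboard_tidy_date` object used in the later steps.

```r
library(dplyr)

# Hypothetical stand-in for the first three rows of the tidy data
toy <- tibble(
  artist = "2 Pac",
  track = "Baby Don't Cry (Keep...",
  date_entered = as.Date("2000-02-26"),
  week = 1:3,
  rank = c(87, 82, 72)
)

toy_date <- toy |>
  mutate(
    # week 1 keeps date_entered; later weeks add (week - 1) * 7 days
    date = if_else(week == 1,
                   date_entered,
                   date_entered + 7 * (week - 1))
  )

toy_date$date
```

Because `week - 1` is 0 for the first week, the `if_else()` branch is not strictly necessary; it is kept here to mirror the description.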
Create a dataset of the song(s) with the most weeks in the top 3 by month of 2000.
billboard_top3_month <- billboard_tidy_date |>
  mutate(month = month(date),
         year = year(date),
         top3 = if_else(rank <= 3 & year == 2000, 1, 0)) |>
  mutate(peak_weeks = sum(top3),
         .by = c(month, artist, track)) |>
  slice_max(peak_weeks,
            by = month) |>
  distinct(month, artist, track, peak_weeks) |>
  arrange(month)

billboard_top3_month
The month variable is derived from our date variable. For the top3 indicator we provide two conditions: the rank for that observation is less than or equal to 3 AND the year is 2000. If both hold, top3 is assigned a value of 1; otherwise it gets a value of 0.
> # A tibble: 19 × 4
> month artist track peak_weeks
> <dbl> <chr> <chr> <dbl>
> 1 1 Aguilera, Christina What A Girl Wants 3
> 2 2 Savage Garden I Knew I Loved You 4
> 3 3 Lonestar Amazed 4
> 4 4 Hill, Faith Breathe 5
> 5 4 Santana Maria, Maria 5
> 6 5 Hill, Faith Breathe 4
> 7 5 Santana Maria, Maria 4
> 8 6 Aaliyah Try Again 2
> 9 6 Anthony, Marc You Sang To Me 2
> 10 6 Hill, Faith Breathe 2
> 11 6 Santana Maria, Maria 2
> 12 6 Vertical Horizon Everything You Want 2
> 13 7 Aaliyah Try Again 4
> 14 8 Sisqo Incomplete 4
> 15 8 matchbox twenty Bent 4
> 16 9 Janet Doesn't Really Matte... 5
> 17 10 Madonna Music 4
> 18 11 Creed With Arms Wide Open 4
> 19 12 Destiny's Child Independent Women Pa... 5
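To see how the indicator, the grouped sum, and `slice_max()` fit together, the same pattern can be run on a toy tibble. The data and the names `toy` and `toy_peak` are made up for illustration, and the sketch assumes dplyr 1.1.0 or later for the `.by` and `by` arguments.

```r
library(dplyr)

# Hypothetical data: track A has two top-3 weeks in April, track B has one
toy <- tibble(
  month = c(4, 4, 4, 4, 5),
  track = c("A", "A", "B", "B", "A"),
  rank  = c(1, 2, 3, 10, 2),
  year  = 2000
)

toy_peak <- toy |>
  # 1 if the week was in the top 3 during 2000, otherwise 0
  mutate(top3 = if_else(rank <= 3 & year == 2000, 1, 0)) |>
  # count top-3 weeks per song within each month
  mutate(peak_weeks = sum(top3), .by = c(month, track)) |>
  # keep the song(s) with the highest count in each month (ties are kept)
  slice_max(peak_weeks, by = month) |>
  distinct(month, track, peak_weeks) |>
  arrange(month)

toy_peak
```

Because `slice_max()` keeps ties by default, a month can return several songs, which is why some months appear more than once in the output above.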
Pick one month of 2000 and visualize the entire charting trajectory of the songs that spent at least 1 week in the top 3 during that month.
billboard_top3_month_viz <- billboard_tidy_date |>
  mutate(month = month(date),
         year = year(date),
         top3 = if_else(rank <= 3 & year == 2000, 1, 0)) |>
  mutate(month_peak = ifelse(top3 == 1, month, NA),
         .by = c(month, artist, track)) |>
  # na.rm = TRUE so songs with no top-3 week in April yield FALSE rather than NA
  filter(any(month_peak == 4, na.rm = TRUE),
         .by = c(track, artist))

date_order <- billboard_top3_month_viz |>
  filter(date == ymd("2000-05-06")) |>
  arrange(desc(rank))
ggplot(billboard_top3_month_viz, aes(date, rank, group = track, color = artist)) +
  annotate(geom = "rect", xmin = ymd("2000-04-01"), xmax = ymd("2000-05-01"), ymin = 0, ymax = 85,
           fill = "#59a14f", alpha = 0.25) +
  geom_line(show.legend = TRUE) +
  geom_label_repel(data = billboard_top3_month_viz |> slice_max(date, by = track),
                   aes(label = track),
                   show.legend = FALSE) +
  scale_color_manual("Artist", values = c("#4e79a7","#f28e2c","#e15759","#76b7b2")) +
  labs(title = "Billboard Top 100 Trajectory for Songs that Hit Top 3 During April 2000",
       x = "Date",
       y = "Rank",
       caption = "Note: April shaded in green") +
  theme_tufte(base_size = 14) +
  theme(legend.position = c(0.85, 0.85),
        legend.title.align = 0.5,
        legend.box.background = element_rect(colour = "black", fill = "#f6f7f9"),
        plot.background = element_rect(fill = "#f6f7f9", color = "#f6f7f9"))
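The step that keeps a song's entire trajectory, rather than only its April rows, is the grouped `filter(any(...))`. Here is a minimal sketch of that idea on made-up data; the names `toy` and `kept` are hypothetical, and dplyr 1.1.0 or later is assumed for `.by`.

```r
library(dplyr)

# Hypothetical data: A reaches the top 3 in month 4, B only in month 5
toy <- tibble(
  track = c("A", "A", "A", "B", "B"),
  month = c(3, 4, 5, 4, 5),
  rank  = c(10, 2, 6, 8, 1)
)

kept <- toy |>
  # record the month only for weeks spent in the top 3
  mutate(month_peak = if_else(rank <= 3, month, NA_real_)) |>
  # any() is evaluated once per track, so a single April top-3 week
  # keeps every row for that track
  filter(any(month_peak == 4, na.rm = TRUE), .by = track)

kept
```

Track B is dropped even though it charted in month 4, because its only top-3 week falls in month 5.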