Homework 5

Key

Click link above for answers to homework 5.

Data

billboard_top100.csv

Instructions

Answer each of the following questions. Be sure to display all your code in the rendered version (use echo: true throughout; see footnote 1).

Exercises

  1. Download the billboard data set introduced in lecture (above) to the same folder where you’re saving your qmd for this homework (see footnote 2).
  2. Read in the data, clean up the names, and pivot it in a way so the first few rows look like this:
> # A tibble: 5,307 × 6
>    artist  track                   time   date_entered  week  rank
>    <chr>   <chr>                   <time> <date>       <int> <dbl>
>  1 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       1    87
>  2 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       2    82
>  3 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       3    72
>  4 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       4    77
>  5 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       5    87
>  6 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       6    94
>  7 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       7    99
>  8 2Ge+her The Hardest Part Of ... 03:15  2000-09-02       1    91
>  9 2Ge+her The Hardest Part Of ... 03:15  2000-09-02       2    87
> 10 2Ge+her The Hardest Part Of ... 03:15  2000-09-02       3    92
> # ℹ 5,297 more rows
  3. Create a variable named date that gives the calendar date of each charting week, based on date_entered (see footnote 3). Read the footnote for a hint; if you need help visualizing what the final dataset might look like, you can reveal another hint below.
billboard_tidy_date
> # A tibble: 5,307 × 7
>    artist  track                   time   date_entered  week  rank date      
>    <chr>   <chr>                   <time> <date>       <int> <dbl> <date>    
>  1 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       1    87 2000-02-26
>  2 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       2    82 2000-03-04
>  3 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       3    72 2000-03-11
>  4 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       4    77 2000-03-18
>  5 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       5    87 2000-03-25
>  6 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       6    94 2000-04-01
>  7 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       7    99 2000-04-08
>  8 2Ge+her The Hardest Part Of ... 03:15  2000-09-02       1    91 2000-09-02
>  9 2Ge+her The Hardest Part Of ... 03:15  2000-09-02       2    87 2000-09-09
> 10 2Ge+her The Hardest Part Of ... 03:15  2000-09-02       3    92 2000-09-16
> # ℹ 5,297 more rows
  4. Create a dataset of the song(s) with the most weeks in the top 3 by month of 2000. The final dataset should look like this:
> # A tibble: 19 × 4
>    month artist              track                   peak_weeks
>    <dbl> <chr>               <chr>                        <dbl>
>  1     1 Aguilera, Christina What A Girl Wants                3
>  2     2 Savage Garden       I Knew I Loved You               4
>  3     3 Lonestar            Amazed                           4
>  4     4 Hill, Faith         Breathe                          5
>  5     4 Santana             Maria, Maria                     5
>  6     5 Hill, Faith         Breathe                          4
>  7     5 Santana             Maria, Maria                     4
>  8     6 Aaliyah             Try Again                        2
>  9     6 Anthony, Marc       You Sang To Me                   2
> 10     6 Hill, Faith         Breathe                          2
> 11     6 Santana             Maria, Maria                     2
> 12     6 Vertical Horizon    Everything You Want              2
> 13     7 Aaliyah             Try Again                        4
> 14     8 Sisqo               Incomplete                       4
> 15     8 matchbox twenty     Bent                             4
> 16     9 Janet               Doesn't Really Matte...          5
> 17    10 Madonna             Music                            4
> 18    11 Creed               With Arms Wide Open              4
> 19    12 Destiny's Child     Independent Women Pa...          5
  5. Pick one month of 2000 and visualize the entire charting trajectory of the songs that spent at least 1 week in the top 3 during that month. Hint: start with the data set created in question 3. An example of what this could look like for April is provided below.

Note: This example plot is a highly customized and polished version of what this can look like. There are many ways it can look, and you should neither be graded lower nor grade others lower for aesthetic features that many of you are still learning. The takeaway here is to challenge yourself to figure out the code that plots the entire trajectory of the songs that reached the top 3 in whatever month you choose. This will require a combination of the skills you’ve learned in this class thus far. Try this with the content available from lecture first. If you get stuck, there is a hint you can reveal below.

Note: This is how I did this problem but there are many approaches to every coding puzzle in R. If this skeleton code is useful, use it. If not, I’m happy to chat through your approach in office hours 🤓

Replace all instances of function, variable, and value with what you think the correct answer should be. Additional hints are given in the numbered annotations listed below the code.

billboard_top3_month_viz <- billboard_tidy_date |> 
  mutate(month = function(variable),                                      # <1>
         year = function(variable),                                       # <2>
         top3 = if_else(variable <= value & variable == value, 1, 0)) |>  # <3>
  mutate(month_peak = ifelse(variable > 0, variable, NA),                 # <4>
         .by = c(month, artist, track)) |> 
  filter(function(month_peak == "value"),                                 # <5>
         .by = c(track, artist)) 

library(ggrepel)                                                          # <6>
ggplot(billboard_top3_month_viz, 
       aes(variable, variable, group = variable, color = variable)) +     # <7>
  function +                                                              # <8>
  geom_label_repel(data = billboard_top3_month_viz |> function(variable, by = track),  # <9>
                   mapping = aes(label = variable))                       # <10>
  1. What month is associated with each row’s chart position?
  2. Are there multiple years in this dataset?! Given that we’re interested in the Billboard Top 100 for 2000 it might be useful to have a variable that allows us to discriminate between years.
  3. To create some indicator of top 3 status you’ll need to provide two conditions (one variable needs to be less than or equal to a certain value and another needs to equal a certain value).
  4. Need to create a variable that reflects when a particular song charted in the top 3 and NA otherwise.
  5. In order to get the entire trajectory of a song we can’t simply filter for the month when it peaked. Then we’d only be able to plot that snippet of its trajectory. Hint: Which function returns TRUE if even 1 element it’s given is TRUE?
  6. Load if you want to add labels.
  7. You want to visualize the ranking trajectory of a song over time. Hint: group is the variable you want to visualize.
  8. What geometry would be appropriate here?
  9. To properly label this plot you’ll need to subset the data, otherwise it will try to plot a label for every date available.
  10. Which variable’s text are you trying to label?

Before you submit:

Have you remembered to add embed-resources: true to your YAML?
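
As a rough sketch of where that option can go, assuming you are rendering to HTML (the html format and the placeholder title are assumptions here, reused from the footnote 1 example rather than required values), the YAML header might look something like this:

```yaml
---
title: "My Document"
format:
  html:
    embed-resources: true   # bundle images, CSS, and JS into the single rendered file
execute:
  echo: true                # show all code in the rendered version
---
```

With this in place, the rendered HTML file should be self-contained, which makes it easier to share for peer review.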

Due Dates

#   Homework Due   Peer Review Due
1   2 April        7 April
2   9 April        14 April
3   16 April       21 April
4   23 April       28 April
5   30 April       5 May
6   7 May          12 May
7   14 May         19 May
8   21 May         26 May
9   28 May         2 June

Footnotes

  1. You can make this a global option for your whole document by putting it directly in the YAML of your qmd:

    ---
    title: "My Document"
    execute:
      echo: true
    ---
  2. If your project directory is different from the directory where you’ll be saving these two files, you should run setwd() in the console and give it a string of the file path to the folder where these two files are saved. This will allow you to run code interactively without error.

  3. For instance, if the date_entered is 1-13-2000 and week is 1, then when week is 2 date will have a value of 1-20-2000. Hint: Try using if_else() here.
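
To check the arithmetic in footnote 3, here is a minimal base R sketch. It uses only the toy values from the footnote (not the billboard data) and relies on the fact that adding a number to a Date shifts it by that many days. It is one possible way to think about the problem, using plain arithmetic rather than the if_else() approach the hint suggests, and the object names are purely illustrative.

```r
# Toy values from the footnote's example (assumption: not taken from the data set)
date_entered <- as.Date("2000-01-13")
week <- 1:3

# Adding a number to a Date moves it forward by that many days,
# so week 1 stays on the entry date and each later week adds 7 more days.
date <- date_entered + 7 * (week - 1)

print(date)
```

Week 1 keeps the entry date and week 2 lands on 2000-01-20, matching the footnote.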