Exercise 03

Modified

November 20, 2024

Exercise due on 2024-09-20

1 Overview

This week’s exercise comes directly from the data transformation chapter of R for Data Science. Our exercises will typically include spatial data, but I wanted to use a more tried and tested exercise for this week’s material.

2 Setup

If you don’t already have the {nycflights13} package installed, go ahead and install it, then restart your R session before continuing with the exercise.

pak::pkg_install("nycflights13")

In addition to {nycflights13}, you will need {dplyr} and {ggplot2}. Both are part of the tidyverse, so load the tidyverse to make sure you have everything you need:

library(tidyverse)
library(nycflights13)

3 Exercises

3.1 Working with rows

In a single pipeline for each condition, find all flights that meet the condition:

  • Had an arrival delay of two or more hours
flights |> 
  ____
  • Flew to Houston (IAH or HOU)
flights |> 
  ____
  • Were operated by United, American, or Delta
flights |> 
  ____
  • Departed in summer (July, August, and September)
flights |> 
  ____
  • Arrived more than two hours late, but didn’t leave late
flights |> 
  ____
  • Were delayed by at least an hour, but made up over 30 minutes in flight
flights |> 
  ____
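As a reminder of the general pattern (not a solution), here is a minimal sketch of filter() on a made-up table. The orders data frame and its columns are hypothetical and exist only for illustration; the same operators apply to flights.

```r
library(dplyr)

# Hypothetical toy data, not part of the exercise
orders <- tibble(
  id      = 1:6,
  region  = c("north", "south", "east", "west", "north", "east"),
  amount  = c(120, 35, 80, 200, 15, 95),
  shipped = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

# Conditions separated by commas are combined with AND
big_shipped <- orders |>
  filter(amount >= 90, shipped)

# %in% keeps rows whose value matches any element of a set
north_or_east <- orders |>
  filter(region %in% c("north", "east"))
```

Each condition in the list above can be built from these pieces: comparison operators, commas (or &) for "and", | for "or", and %in% for membership in a set.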

Sort flights to find the flights with the longest departure delays. Find the flights that left earliest in the morning.

flights |> 
  arrange(____)

Sort flights to find the fastest flights. (Hint: Try including a math calculation inside your function call.)

flights |> 
  ____
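To illustrate the hint without giving the answer away, here is a small sketch showing that arrange() accepts an expression, not just a column name. The trips table is hypothetical and unrelated to flights.

```r
library(dplyr)

# Hypothetical toy data, not part of the exercise
trips <- tibble(
  trip  = c("a", "b", "c"),
  km    = c(10, 42, 5),
  hours = c(1, 3, 0.25)
)

# Sort by a value computed on the fly (km per hour), largest first
fastest_first <- trips |>
  arrange(desc(km / hours))
```

The computed value is used only for ordering; it is not added as a column. If you want to see it in the output, compute it with mutate() first.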

Answer the following questions, including code blocks that show the code you used to determine each answer.

Was there a flight on every day of 2013? ____

Which flights traveled the farthest distance? ____

Which traveled the least distance? ____

Does it matter in which order you use filter() and arrange() if you’re using both? Why/why not? Think about the results and how much work the functions would have to do.

____

Now is a good time to render, commit, and push your changes to GitHub with an informative commit message.

Make sure to commit and push all changed files so that your Git pane is empty afterwards.

3.2 Working with columns

Compare dep_time, sched_dep_time, and dep_delay. How would you expect those three numbers to be related?

____

Brainstorm as many ways as possible to select dep_time, dep_delay, arr_time, and arr_delay from flights.

select(flights, ____)

What happens if you specify the name of the same variable multiple times in a select() call?

select(flights, ____)

What does the any_of() function do? Why might it be helpful in conjunction with this vector?

variables <- c("year", "month", "day", "dep_delay", "arr_delay")

Does the result of running the following code surprise you? How do the select helpers deal with upper and lower case by default? How can you change that default?

flights |> select(contains("TIME"))

Rename air_time to air_time_min to indicate units of measurement and move it to the beginning of the data frame.

flights |> 
  rename(____)

Why doesn’t the following work, and what does the error mean?

flights |> 
  select(tailnum) |> 
  arrange(arr_delay)

Don’t forget to render, commit, and push your changes to GitHub with an informative commit message.

3.3 Working with groups

Which carrier has the worst average delays? Challenge: can you disentangle the effects of bad airports vs. bad carriers? Why/why not? (Hint: think about flights |> group_by(carrier, dest) |> summarize(n()))

flights |> 
  ____
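If grouping by two variables is unfamiliar, the following sketch shows the mechanics on a made-up table. The scores data and its column names are hypothetical; it is meant only to show what a two-variable summary looks like, not how to answer the question.

```r
library(dplyr)

# Hypothetical toy data, not part of the exercise
scores <- tibble(
  teacher = c("x", "x", "y", "y", "y"),
  school  = c("p", "q", "p", "p", "q"),
  score   = c(70, 90, 60, 80, 100)
)

# One row per teacher/school pair, with the mean and the number of rows behind it
by_pair <- scores |>
  group_by(teacher, school) |>
  summarize(mean_score = mean(score), n = n(), .groups = "drop")
```

Keeping n next to the mean shows how many observations each average rests on, which is worth checking before comparing groups.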

Find the flights that are most delayed upon departure from each destination.

flights |> 
  ____

How do delays vary over the course of the day? Illustrate your answer with a plot.

What happens if you supply a negative n to slice_min() and friends?

slice_min(flights, ____)

Explain what count() does in terms of the dplyr verbs you just learned. What does the sort argument to count() do?

count(flights, ____)

count(flights, ____, sort = ____)

Render, commit, and push your final changes to GitHub with a meaningful commit message.

Make sure to commit and push all changed files so that your Git pane is empty afterwards.