Weekly Updates

2024-09-04

Week 13

  • Final project feedback
    • Use formatting thoughtfully to organize your writing and code (including removing the instructions from the template)
    • Make sure to test your code (especially if you use ChatGPT or another LLM)
    • Don’t assume your data is good and tidy (even if it comes from a reputable source)
    • A little bit of research is a good idea!
  • More tips
    • Try using mapview::mapview()
    • Good folder organization! Use a data folder and an R folder
    • Think about the spatial units and entity model
  • Combining multiple datasets
    • Make sure they share the same CRS
    • Make sure they share valid join columns
  • ggplot2 (part 2) and ggplot2 extension packages
  • Interactive mapping packages
data <- sf::st_read("path to too big dataset - email file link in Drive to Eli")
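The two checks listed under "Combining multiple datasets" could look something like the sketch below. The data, column names, and coordinates are made up for illustration; only the sf functions are real.

```r
library(sf)

# Hypothetical example data: two points stored in WGS 84 (EPSG:4326)
points <- st_as_sf(
  data.frame(id = c("a", "b"), lon = c(-76.61, -76.50), lat = c(39.29, 39.40)),
  coords = c("lon", "lat"),
  crs = 4326
)

# Hypothetical polygon, deliberately stored in a different CRS (EPSG:3857)
area <- st_sf(
  area_name = "study area",
  geometry = st_sfc(
    st_polygon(list(rbind(
      c(-76.70, 39.20), c(-76.55, 39.20), c(-76.55, 39.35),
      c(-76.70, 39.35), c(-76.70, 39.20)
    ))),
    crs = 4326
  )
) |>
  st_transform(3857)

# Check 1: do both layers share the same CRS?
same_crs <- st_crs(points) == st_crs(area)

# If not, transform one layer to match the other before a spatial join
if (!same_crs) {
  points <- st_transform(points, st_crs(area))
}
joined <- st_join(points, area)

# Check 2: for an attribute join, look for key values with no match
attributes_tbl <- data.frame(id = c("a", "c"), visitors = c(120, 45))
unmatched_ids <- setdiff(points$id, attributes_tbl$id)
```

Here `joined$area_name` is `NA` for the point that falls outside the polygon, and `unmatched_ids` lists the ids that would be left without attributes after a join.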

Week 12

  • Baltimore City GIS Day (see the Discord for details!)
  • Final Project Proposal Extension: Friday, Nov. 15
  • Questions
  • Lecture: Geospatial metadata and documentation
  • Practice: Labelling variables and making data dictionaries with {labelled}
  • Practice: Is this data FAIR?
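As a rough sketch of the labelling practice above, variable labels can be attached with {labelled} and then listed as a simple data dictionary. The `parks` data frame and its labels are hypothetical.

```r
library(labelled)

# Hypothetical example data
parks <- data.frame(
  park_id = c(1L, 2L),
  acres = c(12.5, 3.2)
)

# Attach a descriptive label to each variable
var_label(parks$park_id) <- "Unique park identifier"
var_label(parks$acres) <- "Park area in acres"

# var_label() on a data frame returns a named list of labels
labels <- var_label(parks)

# look_for() with no keywords lists every variable with its label,
# which can serve as a starting point for a data dictionary
dictionary <- look_for(parks)
```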

Questions

Are there any tips for transforming spreadsheet data structures that might not be tidy when importing into R?

See the Spreadsheet Munging Strategies book by Duncan Garmonsway and the related packages tidyxl and unpivotr.

This blog post by Sophie Bennet is a helpful resource with tips on common issues: column names in multiple rows, variables in multiple columns, data in multiple sheets, and inconsistencies between sheets.

Helpful functions for cleaning messy spreadsheet data include dplyr::coalesce() for combining values from multiple columns into one, tidyr::fill() for copying values down across rows, and tidyr::separate_wider_delim() for splitting columns.
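A minimal sketch of those three functions working together, using a made-up table that mimics common spreadsheet problems (blank cells under a merged header, one value split across two columns, and two variables packed into one column):

```r
library(dplyr)
library(tidyr)

# Hypothetical messy data, as it might look after reading a spreadsheet
messy <- tibble(
  region = c("North", NA, "South", NA),
  site = c("A-1", "A-2", "B-1", "B-2"),
  count_2023 = c(10, NA, 7, NA),
  count_old = c(NA, 4, NA, 9)
)

tidy <- messy |>
  # Copy each region name down into the blank rows below it
  fill(region) |>
  # Take the first non-missing value from the two count columns
  mutate(count = coalesce(count_2023, count_old)) |>
  # Split "A-1" into a group ("A") and a number ("1")
  separate_wider_delim(
    site,
    delim = "-",
    names = c("site_group", "site_number")
  ) |>
  select(region, site_group, site_number, count)
```

Note that `separate_wider_delim()` returns character columns, so `site_number` would still need converting if it should be numeric.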

Is there any way to label a subset of items (instead of everything) with ggplot2?

You can pass a function in place of the data argument for any geom_ function to subset your data. You can also create manual annotations with the annotate() function or use a dedicated package like gghighlight. ggrepel is a more advanced labelling package that doesn’t have dedicated sf-friendly functions, but it can still work with sf data (for example, by setting stat = "sf_coordinates").
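For example, here is a small sketch of the function-as-data approach using the built-in mtcars data. The threshold of 30 mpg and the `model` column are choices made for this illustration.

```r
library(ggplot2)

# mtcars ships with R; store the row names as a column so they can be labels
cars <- data.frame(model = rownames(mtcars), mtcars)

p <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_text(
    # The function receives the plot data and returns the rows to label
    data = function(d) d[d$mpg > 30, ],
    aes(label = model),
    vjust = -0.5
  )
```

Every car is drawn as a point, but only the cars above the threshold get a text label. The same pattern should work with geom_sf_text() or geom_sf_label() when the plot data is an sf object.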

How can a team decide which documents are most critical to maintain if they have limited resources for documentation?

Week 11

Questions

What is the source() function and how does it work?

You can use source() to load and execute R code from a file. It is helpful for combining code from multiple files as part of a single script or document.
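A minimal sketch of how this works. The helper file is written to a temporary location here so the example is self-contained; in a project it would more likely be a file such as R/helpers.R (a hypothetical name).

```r
# Write a small helper script to a temporary file
helper_path <- tempfile(fileext = ".R")
writeLines(
  c(
    "# Hypothetical helper function",
    "miles_to_km <- function(miles) miles * 1.609344"
  ),
  helper_path
)

# source() runs the file, so the function it defines becomes available
source(helper_path)

distance_km <- miles_to_km(10)
```

By default, source() evaluates the file in the global environment, so anything the file defines is available to the rest of your script.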

How is osmextract different from osmdata, on a technical level?

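You can use osmdata to send a query to the Overpass API, which returns only the features that match. Use osmextract to download a prepared extract file from a provider of processed OSM data (such as Geofabrik) and read layers from that file locally.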
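The difference shows up in how each package is called. The sketch below builds an Overpass query with osmdata without sending it. The calls that need a network connection are wrapped in a flag that is off by default. The bounding box values, the "highway" filter, and the "Maryland" place name are assumptions chosen for illustration.

```r
library(osmdata)

# Bounding box as xmin, ymin, xmax, ymax (approximate, assumed values)
bbox <- c(-76.72, 39.19, -76.52, 39.38)

# osmdata: describe the features you want as an Overpass query
query <- opq(bbox = bbox) |>
  add_osm_feature(key = "highway", value = "primary")

# The query is plain text that gets sent to the Overpass API
query_text <- opq_string(query)

# Set to TRUE to run the network calls (the extract download can be large)
run_network_calls <- FALSE

if (run_network_calls) {
  # osmdata: send the query and get back only the matching features
  roads_overpass <- osmdata_sf(query)

  # osmextract: download a prepared regional extract, then read one layer
  roads_extract <- osmextract::oe_get("Maryland", layer = "lines")
}
```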

Are there cheat-sheets for reading in data as well, or is it simply something you have to remember for yourself depending on what data formats you are using?

Yes! See this Posit cheatsheet on data import.

Week 10

  • Reminders
    • Exercise 5 due next Friday, Nov. 1
    • Final project proposal due Wednesday, Nov. 13
  • tidycensus

Week 9

  • Reminders
    • Exercise 5 due next Friday, Nov. 1
    • Office hours on Tuesday, Oct. 29
    • Final project proposal due Wednesday, Nov. 13
  • Lessons from McConchie and Boeing
  • OpenStreetMap Lecture & Practice
  • Questions

Lessons from McConchie (2016)

McConchie introduces the concept of map-gardening: “the editing tasks that keep OSM going… the things that happen after that fun trailblazing phase of mapping all the streets in your neighborhood.”

McConchie asks: “How do we make sure that OSM is a healthy community that has gardening, has people who enjoy maintenance?”

Lessons from Boeing (2020)

Why did Boeing make OSMnx open source? Making the tool open source:

  • “makes empirical work easier to review and reproduce.”
  • “allows anyone else to contribute to the tool’s ongoing development.”
  • “empowers others working in urban science and planning to advance their empirical research on real-world spatial networks with a reusable, accessible, theoretically-sound tool.”

Why did Boeing use OpenStreetMap data?

  • Google Maps data is unavailable (or unaffordable)
  • TIGER/Line roads shapefiles don’t include topological details
  • OpenStreetMap is free and has global coverage

Week 4

  • Exercise 2 solutions
  • Data transformation with dplyr
  • Parsons Problems

Parsons Problems with {dplyr}

  wind > 130
storms |>
filter(
library(dplyr)
  year == 2010,
)
slice_head(n = 10)
library(dplyr)
arrange(desc(wind_load)) |>
distinct(year, name, .keep_all = TRUE) |>
storms |>
mutate(wind_load = pressure * wind^2) |>
ggplot(aes(x = category, y = avg_wind)) +
storms |>
summarise(avg_wind = mean(wind)) |>
library(tidyverse)
filter(!is.na(category)) |>
geom_point()
group_by(category) |>
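The scrambled lines above appear to form three separate pipelines, each starting with a library() call. One possible ordering is sketched below. The object names are added here for illustration, and dplyr and ggplot2 are loaded directly in place of library(tidyverse).

```r
library(dplyr)
library(ggplot2)

# Problem 1: keep storm observations from 2010 with wind above 130
strong_2010 <- storms |>
  filter(
    year == 2010,
    wind > 130
  )

# Problem 2: compute a wind load, keep one row per storm and year,
# then take the ten rows with the highest wind load
top_wind_load <- storms |>
  mutate(wind_load = pressure * wind^2) |>
  distinct(year, name, .keep_all = TRUE) |>
  arrange(desc(wind_load)) |>
  slice_head(n = 10)

# Problem 3: average wind by category, drawn as points
avg_wind_plot <- storms |>
  filter(!is.na(category)) |>
  group_by(category) |>
  summarise(avg_wind = mean(wind)) |>
  ggplot(aes(x = category, y = avg_wind)) +
  geom_point()
```

Other orderings can also run. In problem 2, for instance, moving distinct() before mutate() still works, although the two orders are only guaranteed to give the same result because distinct() keeps the first row for each storm either way.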

Week 3

  • Exercise submission process
  • Exercise 1 solutions
  • Mapping with ggplot2
  • Data transformation with dplyr
  • Questions

Week 2

  • Syllabus updates
  • Cheat sheets
  • Check-in on exercise 1 and GitHub Classroom
  • Week 2 in-class quiz
  • Visualizing spatial data with ggplot2

Syllabus updates

  • In-class exercises
  • No class on November 27

Cheat sheets

  • RStudio
  • sf
  • ggplot2