Areal data and creating high-quality graphics

Prof Ron Yurko

2024-09-30

Reminders, previously, and today…

  • Infographic draft is due Wednesday night! (more details later today…)

  • Your EDA report is due Friday Oct 4th by 11:59 PM ET (1 per group)

  • No lecture on Wednesday! But I will have virtual office hours during class time

  • Wrapped up basics of time series data

  • Introduction to spatial data and the different types

  • Walked through visualizing point-reference data

TODAY:

  • Visualizations for areal data

  • Discuss making high-quality graphics

Thinking about areal data

  • Areal Data: Geographic regions associated with one or more variables specific to those regions

  • Areal data will have the following form (example US states data from 1970s):

state_data |> dplyr::slice(1:3)
# A tibble: 3 × 9
  Population Income Illiteracy `Life Exp` Murder `HS Grad` Frost   Area state  
       <dbl>  <dbl>      <dbl>      <dbl>  <dbl>     <dbl> <dbl>  <dbl> <chr>  
1       3615   3624        2.1       69.0   15.1      41.3    20  50708 alabama
2        365   6315        1.5       69.3   11.3      66.7   152 566432 alaska 
3       2212   4530        1.8       70.6    7.8      58.1    15 113417 arizona
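
The slides do not show how state_data was created. Its values match R's built-in state.x77 matrix, so one plausible construction (an assumption, not necessarily the code used in lecture) is sketched here. The state names are lowercased so they can later be matched against boundary data.

```r
library(dplyr)

# state.x77 is a built-in matrix with one row per state; the state names
# live in its row names, which as_tibble() drops
state_data <- as_tibble(state.x77) |>
  # Recover the names as a regular column, lowercased for later joins
  mutate(state = tolower(rownames(state.x77)))

state_data |> slice(1:3)
```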

High-level overview of steps

  • Need to match the region with the actual geographic boundaries

  • Many geographic boundaries/features are stored as “shapefiles”

    • i.e., complicated polygons
  • Can contain the lines, points, etc. to represent any geographic feature

  • Shapefiles are readily available for countries, states, counties, etc.

Access shapefiles using map_data()

library(ggplot2) # provides map_data()
library(maps)    # provides the boundary data that map_data() reads
state_borders <- map_data("state") 
head(state_borders)
       long      lat group order  region subregion
1 -87.46201 30.38968     1     1 alabama      <NA>
2 -87.48493 30.37249     1     2 alabama      <NA>
3 -87.52503 30.37249     1     3 alabama      <NA>
4 -87.53076 30.33239     1     4 alabama      <NA>
5 -87.57087 30.32665     1     5 alabama      <NA>
6 -87.58806 30.32665     1     6 alabama      <NA>
  • Other options: map_data("world"), map_data("state"), map_data("county"); map_data() comes from ggplot2 and needs the maps package installed

  • Contains lat/lon coordinates to draw geographic boundaries
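
To see how the boundary table is organized, the following sketch counts the polygons and boundary points stored for each region. Each value of group is one closed polygon, and a region can be made up of more than one. The names polygon_counts, n_polygons, and n_points are illustrative choices, not from the lecture.

```r
library(ggplot2)  # map_data()
library(dplyr)

# Requires the maps package to be installed
state_borders <- map_data("state")

# One row per region: how many separate polygons and boundary points it has
polygon_counts <- state_borders |>
  group_by(region) |>
  summarize(n_polygons = n_distinct(group), n_points = n()) |>
  arrange(desc(n_polygons))

head(polygon_counts)
```

Regions with several polygons are the reason the group aesthetic is needed when drawing boundaries: without it, separate pieces of a region would be connected by stray lines.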

Typical workflow for plotting areal data

  1. Get state-specific data

  2. Get state boundaries

  3. Merge state-specific data with state boundaries (using left_join())

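library(dplyr) # provides left_join()
# Keep every boundary point and attach the matching state's variables
state_plot_data <- state_borders |>
  left_join(state_data, by = c("region" = "state"))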
head(state_plot_data)
       long      lat group order  region subregion Population Income Illiteracy
1 -87.46201 30.38968     1     1 alabama      <NA>       3615   3624        2.1
2 -87.48493 30.37249     1     2 alabama      <NA>       3615   3624        2.1
3 -87.52503 30.37249     1     3 alabama      <NA>       3615   3624        2.1
4 -87.53076 30.33239     1     4 alabama      <NA>       3615   3624        2.1
5 -87.57087 30.32665     1     5 alabama      <NA>       3615   3624        2.1
6 -87.58806 30.32665     1     6 alabama      <NA>       3615   3624        2.1
  Life Exp Murder HS Grad Frost  Area
1    69.05   15.1    41.3    20 50708
2    69.05   15.1    41.3    20 50708
3    69.05   15.1    41.3    20 50708
4    69.05   15.1    41.3    20 50708
5    69.05   15.1    41.3    20 50708
6    69.05   15.1    41.3    20 50708
  4. Plot the data
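
Before plotting, it can help to check which regions fail to match in the merge step, since left_join() keeps unmatched boundary rows with NA values and silently drops data rows that have no boundaries. The sketch below uses small toy tables with made-up coordinates (toy_borders and toy_data are hypothetical stand-ins for state_borders and state_data) and anti_join() to list mismatches in both directions.

```r
library(dplyr)

# Toy stand-ins for the boundary and state tables (hypothetical values)
toy_borders <- tibble(region = c("alabama", "alabama", "district of columbia"),
                      long = c(-87.5, -87.6, -77.0),
                      lat = c(30.4, 30.3, 38.9))
toy_data <- tibble(state = c("alabama", "alaska"),
                   Illiteracy = c(2.1, 1.5))

# Regions with boundaries but no data: their polygons would get an NA fill
unmatched_borders <- toy_borders |>
  distinct(region) |>
  anti_join(toy_data, by = c("region" = "state"))

# States with data but no boundaries: these never appear on the map
unmatched_data <- toy_data |>
  anti_join(toy_borders, by = c("state" = "region"))

unmatched_borders
unmatched_data
```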

Create a choropleth map with geom_polygon()

state_plot_data |>
  ggplot() + 
  geom_polygon(aes(x = long, y = lat, group = group, fill = Illiteracy), 
               color = "black") + 
  scale_fill_gradient2(low = "darkgreen", mid = "lightgrey", 
                       high = "darkorchid4", midpoint = 0.95) +
  theme_void() +
  coord_map("polyconic") + 
  labs(fill = "Illiteracy %") + 
  theme(legend.position = "bottom")
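
A diverging scale such as scale_fill_gradient2() needs a meaningful midpoint. Assuming the data come from R's built-in state.x77 matrix, the value 0.95 used above equals the median illiteracy rate, so a sketch that computes the midpoint instead of hard-coding it could look like this (illiteracy_median and fill_scale are illustrative names):

```r
library(ggplot2)

# Median illiteracy rate across the 50 states in the built-in data
illiteracy_median <- median(state.x77[, "Illiteracy"])

# Same colors as above, with the midpoint taken from the data
fill_scale <- scale_fill_gradient2(low = "darkgreen", mid = "lightgrey",
                                   high = "darkorchid4",
                                   midpoint = illiteracy_median)
```

Adding fill_scale to the plot in place of the scale_fill_gradient2() call gives the same map, and the midpoint updates automatically if the fill variable changes.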

Uniform size with statebins

library(statebins)
library(stringr) # provides str_to_title()
state_data$new_state <- str_to_title(state_data$state)
statebins(state_data = state_data, 
          state_col = "new_state", value_col = "Illiteracy") +
  theme_statebins()

Many choices for displaying maps…

Visual randomization test
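
One common way to set up a visual randomization test for a map is to shuffle the observed values across regions several times and hide the real data among the shuffled versions. The sketch below prepares such data under that assumption; it is not necessarily the procedure shown in lecture, and the seed, the number of panels, and the names state_values, real_panel, and lineup_data are illustrative choices.

```r
library(dplyr)

set.seed(2024)  # arbitrary seed so the shuffles are reproducible

state_values <- tibble(state = tolower(rownames(state.x77)),
                       Illiteracy = unname(state.x77[, "Illiteracy"]))

n_panels <- 9
real_panel <- sample(n_panels, 1)  # position where the real data is hidden

# One copy of the data per panel; every panel except real_panel has the
# values randomly reassigned to states
lineup_data <- lapply(seq_len(n_panels), function(i) {
  panel_values <- if (i == real_panel) {
    state_values$Illiteracy
  } else {
    sample(state_values$Illiteracy)
  }
  state_values |> mutate(Illiteracy = panel_values, panel = i)
}) |>
  bind_rows()
```

Joining lineup_data to the state boundaries and adding facet_wrap(~ panel) to the choropleth code would display the panels side by side. If viewers can pick out the real panel, that is evidence of spatial structure in the variable.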

Infographics draft and feedback assignment

  • Turn in a single-page PDF draft of your infographic via Gradescope, and email it to your assigned partner with me cc’ed, by 11:59 PM Wednesday night (no code is necessary for this draft)

  • For this draft submission only, you are allowed to use something like Google Slides or PowerPoint to create your draft PDF

  • Detailed grading rubric for your final submission (due Oct 11th by 11:59 PM ET) is posted on Canvas

  • You must email feedback to your assigned infographics partner (see the emails I sent this morning) by 11:59 PM ET Saturday night, and also turn it in via Gradescope

  • Feedback template is available on Canvas (the feedback is 10% of your grade!)

Creating compound figures

Two different scenarios we may face:

  1. Creating the same type of plot many times
    • e.g., using facet_wrap() or facet_grid()
  2. Combining several distinct plots into one cohesive display

Creating the same type of plot many times

library(palmerpenguins) # provides the penguins data
penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~species) +
  theme_light()

Creating the same type of plot many times

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(alpha = 0.5) +
  facet_grid(island ~ species) +
  theme_light()

Creating a single cohesive display of multiple plots

plot1 <- penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(alpha = 0.5)
plot1

Creating a single cohesive display of multiple plots

plot2 <- penguins |>
  ggplot(aes(x = species, y = bill_depth_mm)) +
  geom_violin(alpha = 0.5)
plot2

Using cowplot to arrange plots together

library(cowplot)
plot_grid(plot1, plot2)

Using cowplot to arrange plots together

library(cowplot)
plot_grid(plot1, plot2, labels = c('A', 'B'), label_size = 12)

Using patchwork to arrange plots together

library(patchwork)
plot1 + plot2

Using patchwork to arrange plots together

plot1 / plot2

Using patchwork to arrange plots together

plot1 / plot2 + plot_annotation(tag_levels = "A")

Using patchwork to arrange plots together

plot3 <- penguins |>
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm,
             color = species)) +
  geom_point(alpha = 0.5)
plot4 <- penguins |>
  ggplot(aes(x = bill_length_mm, y = body_mass_g,
             color = species)) +
  geom_point(alpha = 0.5)
(plot1 + plot2) / (plot3 + plot4) + plot_layout(guides = 'collect')

Using patchwork to arrange plots together

(plot1 + plot2) / (plot3 + plot4) + plot_layout(guides = 'collect') +
  plot_annotation(tag_levels = "A")

Using patchwork to arrange plots together

(plot1 + plot2) / (plot3 + plot4) + plot_layout(guides = 'collect') +
  plot_annotation(tag_levels = "A", title = "A plot about penguins",
                  subtitle = "With subtitle...", caption = "...and caption")

Infographics vs figures in papers/reports

  • Infographics should stand alone, so they must have a title along with a relevant subtitle and caption (located within the plot)
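
For a single plot, the title, subtitle, and caption can all be set with labs(). The sketch below uses the built-in mtcars data so it runs without extra packages; the label text and the name standalone_plot are only illustrations.

```r
library(ggplot2)

standalone_plot <- mtcars |>
  ggplot(aes(x = wt, y = mpg)) +
  geom_point(alpha = 0.5) +
  # Title states the takeaway, subtitle gives context, caption cites the data
  labs(title = "Heavier cars get fewer miles per gallon",
       subtitle = "Fuel efficiency versus weight for 32 cars",
       caption = "Data: mtcars (1974 Motor Trend)",
       x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_light()

standalone_plot
```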

Infographics vs figures in papers/reports

  • Figures in papers/reports instead carry the information from the standalone title/subtitle/caption in the figure caption, for example:

Figure 1. Corruption and human development. The most developed countries experience the least corruption. Data sources: Transparency International & UN Human Development Report.

Thinking about themes…

See posted demo walking through color scales and customizing themes

Default choices tend to give every element equal weight: axes stand out as much as the data, and background elements look the same as the points of emphasis

You want to design your plot with the visual hierarchy in mind:

  • Make elements of your plot that are more important look more important!

  • i.e., customize your plot so that the data is the focus, not the axes and grid lines!

  • Match visual weight to the focus of the graphic, i.e., what you want to communicate

I tend to use theme_bw() or theme_light(), but there are other options from various packages such as ggthemes
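
As a minimal sketch of adjusting visual hierarchy, the example below starts from theme_light() and tones down the non-data elements with theme(). The specific colors and the name hierarchy_plot are illustrative choices rather than recommendations from the posted demo.

```r
library(ggplot2)

hierarchy_plot <- mtcars |>
  ggplot(aes(x = wt, y = mpg)) +
  geom_point(size = 2) +  # slightly larger points so the data draws the eye
  theme_light() +
  theme(panel.grid.minor = element_blank(),                 # drop minor grid lines
        panel.grid.major = element_line(color = "grey92"),  # keep major lines faint
        axis.title = element_text(color = "grey30"),        # soften the axis titles
        axis.ticks = element_blank())                       # remove tick marks

hierarchy_plot
```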

Using patchwork to arrange plots together

(plot1 + plot2) / (plot3 + plot4) + plot_layout(guides = 'collect') +
  plot_annotation(tag_levels = "A", title = "A plot about penguins",
                  subtitle = "With subtitle...", caption = "...and caption") & 
  theme_minimal_grid()

Annotation

  • Using text can be a great way to highlight and explain aspects of a visualization when you’re not there to explain it

  • annotate() is an easy way to add text to ggplot objects or add rectangle layers for highlighting displays

mtcars |>
  ggplot(aes(x = wt, y = mpg)) + 
  geom_point() + 
  annotate("text", x = 4, y = 25, label = "Some text") +
  annotate("rect", xmin = 3, xmax = 4.2, ymin = 12, ymax = 21, alpha = .2)

Annotation tools

library(ggforce)
ggplot(iris, aes(Petal.Length, Petal.Width)) +
  geom_mark_rect(aes(fill = Species, label = Species)) +
  geom_point()

Saving plots

  • ggsave() is the default function for saving plots; unless told otherwise, it saves the last ggplot you displayed

  • I tend to use the save_plot() function from cowplot since it has easier customization for handling panels of multiple figures
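
A short sketch of both approaches follows. It writes PDF files to a temporary directory; the file names, sizes, and plot objects are illustrative choices.

```r
library(ggplot2)
library(cowplot)

scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
histogram <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 10)

# ggsave() defaults to the last plot displayed; passing `plot` is more explicit
single_file <- file.path(tempdir(), "scatter.pdf")
ggsave(single_file, plot = scatter, width = 6, height = 4)

# save_plot() sizes the output per panel: two columns, each panel 4 inches tall
panel_file <- file.path(tempdir(), "panels.pdf")
save_plot(panel_file,
          plot_grid(scatter, histogram, labels = c("A", "B")),
          ncol = 2, base_height = 4)
```

With save_plot(), the total width grows with ncol, so each panel keeps the same proportions whether the figure has one panel or several.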

Recap and next steps