library(tidyverse) # ggplot, dplyr, %>%, and friends
library(janitor) # data cleaning tools
library(brms) # Bayesian models
library(sf) # For importing GIS shapefiles and plotting maps
library(plotly) # Interactive plots
library(tidycensus) # Downloading Census Data
library(naniar) # Visualize missing data
library(geomander) # Matching geographic areas of different sizes
library(spdep) # Identifying neighbors for the CAR models
library(tidybayes) # For Bayesian helper functions
# Custom ggplot theme to make pretty plots
# Get the News Cycle font at https://fonts.google.com/specimen/News+Cycle
theme_clean <- function() {
  theme_minimal(base_family = "News Cycle") +
    theme(panel.grid.minor = element_blank(),
          plot.title = element_text(face = "bold"),
          axis.title = element_text(face = "bold"),
          strip.text = element_text(face = "bold", size = rel(1), hjust = 0),
          strip.background = element_rect(fill = "grey80", color = NA),
          legend.title = element_text(face = "bold"))
}
Now that the 2024 U.S. presidential election is a few months behind us and we are feeling the brunt of the new administration’s actions, it is time to revisit the results and see what we can learn from them.
Hawaiʻi is a solidly blue state, but as some local news sources have reported (see for example this Civil Beat article), the vote share for Trump has risen substantially over time. In this post, I use brms to fit Bayesian Conditional Autoregressive (CAR) models of the extent of support for Trump among the electorate across the state.
Why use Bayesian CAR models?
Elections are in part spatial phenomena. We know that, for a variety of social and historical reasons, one’s geographic community in the U.S. tends to be correlated with one’s race/ethnicity, socioeconomic status, education level, occupation, income, quality of life, access to government services, health care, and high-quality food, among many other factors. Because of that, one’s geographic community is also correlated with one’s political views. We also know that while each community may tend to have specific political views, those views both influence and are influenced by neighboring communities.
An election, then, is a measure of a given community’s underlying political views at a specific time. Since we want to know what those underlying views are, we need a model that accounts for the spatial (aka community-level) correlation of political views, including how nearby communities may influence them.
To account for the spatial correlations, we’ll use a version of CAR models called the Besag-York-Mollié (BYM) model, which is frequently used to model spatial incidence rates for diseases. This model includes two sets of random effects. The first is spatially structured: it models the correlation of a geographic area with all of the areas that border it. The second is an unstructured random effect for each geographic unit, which captures the variation unique to that unit.
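For reference, the Besag-York-Mollié (BYM) structure described above can be written out as follows. This is a sketch of the standard textbook formulation, not necessarily the exact parameterization that brms fits. The notation is introduced here for illustration: $\eta_i$ is the linear predictor for area $i$, $\phi_i$ is the spatially structured effect, $\theta_i$ is the unstructured effect, $d_i$ is the number of areas bordering area $i$, and $j \sim i$ means that area $j$ borders area $i$.

```latex
\begin{aligned}
\eta_i &= \beta_0 + \phi_i + \theta_i \\
\phi_i \mid \phi_{-i} &\sim \mathrm{Normal}\!\left( \frac{1}{d_i} \sum_{j \sim i} \phi_j,\; \frac{\sigma_\phi^2}{d_i} \right) \\
\theta_i &\sim \mathrm{Normal}\!\left( 0,\; \sigma_\theta^2 \right)
\end{aligned}
```

The conditional mean of $\phi_i$ is the average of its neighbors’ effects, which is what pulls bordering areas toward one another, while $\theta_i$ lets an individual area deviate from its neighbors.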
But before we get too deep into the model, we need to gather the data first.
To get started, we will use the packages loaded at the top of this post.
Prepping HI Election Data
The official precinct-level election results are available from the Hawaiʻi Office of Elections. The election results file contains results for all elections that happened in the state, so we want to import the data and then limit them to only the presidential results.
# import the data directly from the elections site
# need to change the encoding otherwise there will be an error
election_results <- read_csv("https://elections.hawaii.gov/wp-content/results/media.txt",
                             locale = locale(encoding = "UTF-16LE"),
                             skip = 1) %>%
  # change columns to easier to use names
  clean_names() %>%
  # remove the final comma in the last column
  mutate(in_person_votes = as.numeric(str_replace(in_person_votes, ",", ""))) %>%
  # keep only the presidential results
  filter(str_detect(contest_title, "President"))
head(election_results)
## # A tibble: 6 × 15
## number_precinct_name split_name precinct_split_id reg_voters ballots reporting contest_id contest_title contest_party choice_id candidate_name choice_party candidate_type mail_votes in_person_votes
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <lgl> <dbl> <chr> <lgl> <chr> <dbl> <dbl>
## 1 01-01 <NA> 1 6628 4074 1 283 President and Vice President NA 7 "(SL) DE LA CRUZ, Claudia \r\nFor PRESIDENT\r\nGARCIA, Karina \r\nFor VICE PRESIDENT" NA C 18 0
## 2 01-01 <NA> 1 6628 4074 1 283 President and Vice President NA 1 "(D) HARRIS, Kamala D. \r\nFor PRESIDENT\r\nWALZ, Tim \r\nFor VICE PRESIDENT" NA C 2748 0
## 3 01-01 <NA> 1 6628 4074 1 283 President and Vice President NA 5 "(L) OLIVER, Chase \r\nFor PRESIDENT\r\nTER MAAT, Mike \r\nFor VICE PRESIDENT" NA C 35 0
## 4 01-01 <NA> 1 6628 4074 1 283 President and Vice President NA 6 "(S) SONSKI, Peter \r\nFor PRESIDENT\r\nONAK, Lauren\r\nFor VICE PRESIDENT" NA C 4 0
## 5 01-01 <NA> 1 6628 4074 1 283 President and Vice President NA 3 "(G) STEIN, Jill \r\nFor PRESIDENT\r\nWARE, Rudolph\r\nFor VICE PRESIDENT" NA C 63 0
## 6 01-01 <NA> 1 6628 4074 1 283 President and Vice President NA 2 "(R) TRUMP, Donald J. \r\nFor PRESIDENT\r\nVANCE, JD \r\nFor VICE PRESIDENT" NA C 1160 0
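The trailing-comma cleanup is easy to check in isolation. The sketch below uses made-up strings (the vector `raw_votes` is hypothetical, not taken from the file) to show what the `str_replace()` call does, along with two slightly more defensive variants that might be worth considering if the raw values ever contained more than one comma.

```r
library(stringr)

# Hypothetical raw values with a trailing comma, as described in the comment above
raw_votes <- c("0,", "12,", "345,")

# str_replace() removes only the first comma it finds, which is enough here
first_match <- as.numeric(str_replace(raw_votes, ",", ""))

# Anchoring the pattern to the end of the string makes the intent explicit
anchored <- as.numeric(str_remove(raw_votes, ",$"))

# If a value also had a thousands separator (assumed, not observed in the data),
# removing every comma would still parse cleanly
with_separator <- as.numeric(str_remove_all("1,234,", ","))
```

All three approaches agree on the simple case; they differ only when a value contains more than one comma.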
Now that the data are imported, we need to remove unnecessary columns and reshape the data from one row per candidate per precinct to one row per precinct, with a column of votes for each party.
election_results_2024 <- election_results %>%
  rename(precinct_name = number_precinct_name) %>%
  # total each candidate's votes within a precinct (summing over any precinct splits)
  group_by(precinct_name, contest_id, contest_title, choice_id, candidate_name) %>%
  summarise(total_reg_voters = sum(reg_voters),
            total_ballots = sum(ballots),
            total_votes = sum(mail_votes) + sum(in_person_votes)) %>%
  ungroup() %>%
  # pull the party code out of the parentheses, e.g. "(R)" becomes "R"
  mutate(party = str_extract(candidate_name, "\\((\\w+)\\)", group = 1)) %>%
  select(precinct_name, contest_title, total_reg_voters:party) %>%
  # one row per precinct, one column of votes per party
  pivot_wider(names_from = party,
              values_from = total_votes,
              values_fill = 0) %>%
  # total presidential votes across all parties
  mutate(total_votes = D + R + L + G + S + SL) %>%
  select(precinct_name, R, total_votes) %>%
  # remove precincts with 0 votes
  filter(total_votes > 0) %>%
  rename(total_votes_2024 = total_votes,
         R_2024 = R)
head(election_results_2024)
## # A tibble: 6 × 3
## precinct_name R_2024 total_votes_2024
## <chr> <dbl> <dbl>
## 1 01-01 1289 4240
## 2 01-02 839 2815
## 3 01-03 1144 3903
## 4 02-01 1736 6210
## 5 02-02 67 152
## 6 02-03 1578 5040
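To see the reshaping step on its own, here is a minimal sketch on toy data. The precinct labels, candidate names, and vote counts are all made up for illustration, and only two parties are included. It assumes stringr 1.5.0 or later, since that is where the `group` argument to `str_extract()` was added.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy long-format results: one row per candidate per precinct (made-up numbers)
toy_long <- tibble(
  precinct_name  = c("A", "A", "B", "B"),
  candidate_name = c("(D) DOE, Jane", "(R) ROE, Rick",
                     "(D) DOE, Jane", "(R) ROE, Rick"),
  total_votes    = c(60, 40, 30, 70)
)

toy_wide <- toy_long %>%
  # pull the party code out of the parentheses
  mutate(party = str_extract(candidate_name, "\\((\\w+)\\)", group = 1)) %>%
  select(precinct_name, party, total_votes) %>%
  # one row per precinct, one column of votes per party
  pivot_wider(names_from = party,
              values_from = total_votes,
              values_fill = 0) %>%
  mutate(total = D + R)

toy_wide
```

The `values_fill = 0` argument matters when a party has no row in some precinct: without it, that cell would be `NA` and the row total would be `NA` as well.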
Let’s quickly look at the results for Trump as a percentage of the total votes.
election_results_2024 %>%
  mutate(perc_trump = R_2024 / total_votes_2024) %>%
  ggplot(aes(x = reorder(precinct_name, -perc_trump),
             y = perc_trump)) +
  geom_col() +
  theme_clean() +
  theme(axis.text.x = element_blank(),
        panel.grid = element_blank()) +
  geom_hline(yintercept = 0.5,
             linetype = "dotted",
             color = "#009E73") +
  ylab("Percent Voted for Trump") +
  scale_y_continuous(labels = scales::percent) +
  xlab("Precinct")
We see that Trump won 50% or more of the vote in only 31 out of 235 precincts, but received at least 25% of the vote in almost all of them.
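Counts like these can be computed directly rather than read off the chart. The sketch below uses only the six precincts printed in the `head()` output above so that it runs on its own; the object name `threshold_counts` is introduced here for illustration. Applying the same `summarise()` call to the full `election_results_2024` should give the statewide counts.

```r
library(dplyr)

# The six precincts shown in the head() output earlier in the post
first_six <- tibble(
  precinct_name    = c("01-01", "01-02", "01-03", "02-01", "02-02", "02-03"),
  R_2024           = c(1289, 839, 1144, 1736, 67, 1578),
  total_votes_2024 = c(4240, 2815, 3903, 6210, 152, 5040)
)

threshold_counts <- first_six %>%
  mutate(perc_trump = R_2024 / total_votes_2024) %>%
  summarise(n_precincts    = n(),
            n_majority     = sum(perc_trump >= 0.5),
            n_quarter_plus = sum(perc_trump >= 0.25))

threshold_counts
```

Among these six precincts, none reaches a Trump majority and all six exceed 25%.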
To dig into this more, we need to download the election precinct shapefile from the Hawaiʻi Statewide GIS Program and join the election results to it by precinct name. Then we can create a map to show how each precinct voted.
precincts_shape_2024 <- st_read("data/Election_Precincts.shp", quiet = TRUE)
# join the election results to the shape file
precincts_results_2024 <- precincts_shape_2024 %>%
  left_join(election_results_2024, by = c("dp" = "precinct_name")) %>%
  # drop precincts with no matching results or no votes
  filter(total_votes_2024 > 0)
election_plot1 <- precincts_results_2024 %>%
  mutate(`Percent Trump` = R_2024 / total_votes_2024) %>%
  ggplot() +
  geom_sf(aes(fill = `Percent Trump`)) +
  theme_void() +
  # diverging scale centered on a 50% vote share
  scale_fill_gradient2(labels = scales::label_percent(scale = 100),
                       low = "#313695",
                       high = "#a50026",
                       midpoint = 0.5,
                       mid = "#ffffbf")
ggplotly(election_plot1)
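Because the map depends on the precinct IDs matching across the two sources, it can be worth checking for unmatched keys before or after the join. This is a minimal sketch on plain tibbles, not the actual shapefile: `toy_shapes` and `toy_results` are hypothetical stand-ins, and the ID "99-99" is made up to force a mismatch.

```r
library(dplyr)

# Hypothetical stand-ins: `dp` mimics the precinct ID column in the shapefile,
# `precinct_name` mimics the ID column in the election results
toy_shapes  <- tibble(dp = c("01-01", "01-02", "01-03"))
toy_results <- tibble(precinct_name = c("01-01", "01-02", "99-99"),
                      R_2024 = c(1289, 839, 10))

# Shapes with no matching results (these get NA vote counts after left_join())
shapes_without_results <- anti_join(toy_shapes, toy_results,
                                    by = c("dp" = "precinct_name"))

# Results with no matching shape (these never appear on the map)
results_without_shapes <- anti_join(toy_results, toy_shapes,
                                    by = c("precinct_name" = "dp"))
```

If both anti-joins return zero rows on the real data, every precinct with results has a polygon and vice versa.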