-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathnc_example.Rmd
126 lines (94 loc) · 6.61 KB
/
nc_example.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
---
title: "redist 101 - NC Example"
author: "Melissa Wu"
date: "2022-09-12"
output:
html_document: default
pdf_document: default
---
# Setup
First, we load the required packages for analysis. If the packages are not installed yet, be sure to do so beforehand.
```{r setup, include=TRUE}
knitr::opts_chunk$set(echo = FALSE)
library(tidyverse)
library(alarmdata)
library(redist)
library(geomander)
library(here)
library(sf)
```
# Download Simulation Data
We can access data from all 50 states in the US using the `alarmdata` package. Here, we use North Carolina as an example: `alarm_census_vest()` downloads geographic, demographic, and election data, `alarm_50state_map()` loads the `redist_map` object that contains precinct geometries and adjacencies, and `alarm_50state_plans()` downloads the `redist_plans` object that contains 5000 pre-simulated redistricting plans from the [50-State Redistricting Simulations](https://doi.org/10.48550/arXiv.2206.10763).
```{r}
data_nc <- alarm_census_vest("NC")
map_nc <- alarm_50state_map("NC")
plans_nc <- alarm_50state_plans("NC")
```
The code for how these plans were generated is available on the [50-State GitHub repository](https://github.com/alarm-redist/fifty-states/tree/main/analyses/NC_cd_2020).
Specifically in North Carolina, districts must: 1) be contiguous, 2) have equal populations, 3) be geographically compact, 4) preserve county boundaries as much as possible. Each of these redistricting requirements, in addition to compliance with the 1965 Voting Rights Act, were closely followed when conducting the simulations.
# Visualization & Analysis
We can plot some of the simulated plans to visualize the geographical boundaries of the simulated districts.
```{r}
redist.plot.plans(plans_nc, draws=1:4, shp=map_nc)
```
We can also plot the VI distance of each plan, which measures how different the simulated plans are from one another. In the following plot, the VI distances are mostly within the range 0.5-1.0, which means that the simulated plans include a diverse set of geographical arrangements.
```{r}
plan_div <- plans_diversity(plans_nc, n_max = 150)
qplot(plan_div, bins = I(40), xlab = "VI distance", main = "Plan diversity") + theme_bw()
```
We can also visualize the compactness of entire plans. The following plot shows the compactness of the simulated plans at the plan level, using the Fraction of Edges Kept measure. The compactness of the NC 2020 enacted plan is within the mid-upper range of the simulated plans, suggesting that the plans were constructed with appropriate compactness considerations.
```{r}
redist.plot.hist(plans_nc, qty = comp_edge) +
labs(x = "Fraction of Edges Kept", y = "Percentage of Plans") +
theme_bw()
```
The following plot displays the distribution of compactness across the simulated plans at the district level, using the Polsby-Popper compactness measure.
```{r}
redist.plot.distr_qtys(plans_nc, qty = comp_polsby, geom = "boxplot") +
labs(y = "Compactness: Polsby-Popper") +
theme_bw()
```
To evaluate the performance of the simulations, we can use `summary(plans)` to check for any bottlenecks or efficiency losses. Be sure to check that all R-hat values are below 1.05 and that the SMC runs have converged. For NC, 2 runs of 10,000 simulations is the typical size needed for convergence.
```{r}
summary(plans_nc)
```
We can also run `validate_analysis(plans, map)` to generate a set of useful simulation visualizations.
```{r}
# need to load
validate_analysis(plans_nc, map_nc)
```
The 50-State `redist_plans` objects include data for the 2020 enacted districting plans. However, analysts might want to compare other custom plans to the 50-State simulations. This can be done with `alarm_add_plan()`. The following example adds one of the previously proposed 2020 congressional maps for North Carolina as a reference plan to the `plans_nc` object. Many submitted maps are publicly available online on sites such as [Redistricting Data Hub](https://redistrictingdatahub.org/data/download-data/#state-menu), [All About Redistricting](https://redistricting.lls.edu/mapdownload/), or state government websites. The North Carolina enacted and proposed maps came from the [North Carolina General Assembly](https://www.ncleg.gov/Redistricting).
```{r}
# Download plan from NC General Assembly
dir.create("plans")
path_enacted <- here("plans", "SL 2021-174 Congress.shp")
if (!file.exists(path_enacted)) {
url <- "https://s3.amazonaws.com/dl.ncsbe.gov/ShapeFiles/USCongress/2021-11-04%20US_Congress_SL_2021-174.zip"
download.file(url, paste0(dirname(path_enacted), "/nc.zip"))
unzip(paste0(dirname(path_enacted), "/nc.zip"), exdir = dirname(path_enacted))
}
# Add plan to redist_map
dists <- read_sf(path_enacted)
dists <- st_transform(dists, st_crs(map_nc))
map_nc$cd_2020_ex <- as.integer(dists$DISTRICT)[geo_match(from = map_nc, to = dists, method = "area")]
# Add plan as reference using `alarm_add_plan()`
plans_nc <- plans_nc %>%
alarm_add_plan(ref_plan=map_nc$cd_2020_ex, map_nc, name = "cd_2020_ex")
```
After the old reference plan has been added, we can compare the simulation statistics to both the previously proposed plan, labeled `cd_2020_ex`, and the currently enacted plan, labeled `cd_2020`. The following code plots the number of county splits, and we can see that the current plan actually splits more counties than the previous plan.
```{r}
hist(plans_nc, county_splits) + labs(title = "County splits") + theme_bw()
```
We can compare distributions of different summary statistics across districts. The following boxplots display the distribution of Democratic vote percentage across each district. This plot shows that there are more majority-Democratic districts in the current plan than in the previously ratified plan, due to less packing of voters.
```{r}
redist.plot.distr_qtys(plans_nc, qty = pre_20_dem_bid / (pre_20_dem_bid + pre_20_rep_tru), geom = "boxplot") +
labs(y = "2020 Presidential Democratic Vote Percentage") +
theme_bw()
```
Similarly, we can visualize the minority voting age population (VAP) percentage in each district. This plot shows that voters are less packed in the highest MVAP districts, leading to more opportunity districts.
```{r}
redist.plot.distr_qtys(plans_nc, qty = (total_vap - vap_white) / (total_vap), geom = "jitter") +
labs(y = "2020 Minority VAP Percentage") +
theme_bw()
```
Overall, although the newly enacted plan split more counties than the previously proposed plan, the partisan and demographic metrics of the current plan are not among the far outliers of the simulation diagnostics, suggesting that the current plan is a more representative configuration given the specific redistricting requirements in North Carolina.