---
title: "13_spatial regression solutions"
author: "Brenna Kelly"
date: "2024-09-10"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r}
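library(sf)            # st_read(), st_centroid(), st_coordinates()
library(tmap)          # thematic maps
library(spdep)         # neighbors, spatial weights, Moran's I
library(spatialreg)    # spatial lag and error regression models
library(RColorBrewer)  # color palettes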
```
Basic spatial regression workflow:

- **Inspect** the data
- Define the **spatial structure** (neighbors) and create the **weight matrix**
- Check for spatial autocorrelation (SA) in the outcome (Moran's I)
- **Model 0:** Create an aspatial null model
    - Check for SA in the M0 residuals
    - Interpret
- **Model 1:** Create an aspatial model with covariates
    - Check for SA in the M1 residuals
    - Interpret
- **Model 2:** Choose a spatial regression method
    - Lag vs. error vs. hierarchical
    - Check for SA in the M2 residuals
    - Interpret
#### Inspect the data
Read in the shapefile for Chicago communities (location: `data/comarea/ComArea_ACS14_f.shp`). We want to estimate the effect of poverty (`Pov200P`) on the preterm birth rate (`PretBrth`). Plot the preterm birth rate using a color scale from the `RColorBrewer` library (use `display.brewer.all()` to choose one).
```{r}
com <- st_read("data/comarea/ComArea_ACS14_f.shp")
hist(com$PretBrth)
summary(com$PretBrth)
cor.test(com$PretBrth, com$Pov200P)
tm_shape(com) +
tm_polygons(col = "PretBrth", palette = "RdPu",
style = "cont", lwd = 0)
```
#### Spatial structure
Does it look like there is spatial autocorrelation in the outcome? To confirm this statistically using Moran's I, we first need to define the **neighbor structure** and create the **spatial weight matrix**.
```{r}
nb <- poly2nb(com)
com_nb_listw <- nb2listw(nb)
```
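As a rough illustration of what these two calls produce, the sketch below uses a hypothetical 3 x 3 lattice built with `cell2nb` in place of the Chicago polygons. It assumes the defaults used above: `poly2nb` treats polygons that share any boundary point as neighbors (queen contiguity), and `nb2listw` row-standardises the weights (`style = "W"`). The `toy_` objects are illustrative and not part of the exercise.

```{r}
library(spdep)

# Hypothetical 3 x 3 lattice standing in for the community polygons
toy_rook  <- cell2nb(3, 3, type = "rook")   # neighbors share an edge
toy_queen <- cell2nb(3, 3, type = "queen")  # neighbors share an edge or a corner

# Number of neighbors per cell; the centre cell is the fifth one
card(toy_rook)
card(toy_queen)

# Row-standardised weights: each row of the weight matrix sums to 1
toy_listw <- nb2listw(toy_queen, style = "W")
toy_W <- listw2mat(toy_listw)
rowSums(toy_W)
```

Row-standardisation means the spatially lagged value of a unit is the average of its neighbors' values, whatever the number of neighbors.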
Visualize the spatial structure.
```{r}
coords <- st_coordinates(st_centroid(com))
plot(com$geometry)
plot(nb, coords, add = TRUE, col = "red")
```
#### SA in outcome
Calculate **Moran's I** using `moran.test` with the arguments `alternative = "two.sided"` and `randomisation = TRUE`. Compare the results against `moran.mc`, which estimates significance by randomly permuting the observed values across locations rather than relying on an analytical approximation.
```{r}
moran.test(com$PretBrth, com_nb_listw,
alternative = "two.sided",
randomisation = TRUE)
moran.mc(com$PretBrth, com_nb_listw, nsim = 999,
alternative = 'greater')
moran.plot(com$PretBrth, com_nb_listw, plot = TRUE,
xlab = "Preterm Birth Rate",
ylab = "Spatially Lagged Rate")
```
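To make the two tests less of a black box, here is a minimal base-R sketch of the same ideas on a hypothetical 5 x 5 lattice. It computes $I = \frac{n}{S_0} \frac{z^\top W z}{z^\top z}$, where $z$ is the centred variable and $S_0$ is the sum of all weights, and then builds a permutation p-value in the spirit of `moran.mc`. The lattice, the variable `toy_x`, and the helper `morans_i` are assumptions made for illustration.

```{r}
# Hypothetical 5 x 5 lattice; rook adjacency built by hand
lat <- expand.grid(row = 1:5, col = 1:5)
n_lat <- nrow(lat)

# Two cells are rook neighbors when their Manhattan distance is 1
adj <- as.matrix(dist(lat, method = "manhattan")) == 1
W_lat <- adj / rowSums(adj)   # row-standardise the weights

# Moran's I = (n / S0) * (z' W z) / (z' z), with z the centred variable
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

toy_x <- lat$row + lat$col    # smooth gradient, so neighbors are alike
i_obs <- morans_i(toy_x, W_lat)

# Permutation test: shuffle the values over the cells and recompute I
set.seed(1)
nsim <- 999
i_perm <- replicate(nsim, morans_i(sample(toy_x), W_lat))
p_greater <- (sum(i_perm >= i_obs) + 1) / (nsim + 1)

i_obs
p_greater
```

With `nsim = 999` the smallest p-value this procedure can report is 1/1000, which is worth keeping in mind when comparing it with the analytical test.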
#### Null aspatial model
- Create a linear model of the preterm birth rate.
- Plot the spatial structure of the residuals.
- Check for SA in the residuals.
```{r}
m0 <- lm(PretBrth ~ 1, data = com)
summary(m0)
com$res_m0 <- m0$residuals
tm_shape(com) +
tm_polygons(col = "res_m0", palette = "Spectral",
style = "cont", lwd = 0)
moran.mc(com$res_m0, com_nb_listw, nsim = 999,
alternative = 'greater')
```
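One point that helps with interpretation: an intercept-only model estimates just the mean, so its residuals are the outcome centred on that mean. Moran's I is computed from deviations from the mean, so the test on the M0 residuals should reproduce the statistic for the raw outcome. A minimal sketch with simulated values (`toy_y` and `toy_m0` are hypothetical):

```{r}
set.seed(42)
toy_y <- rnorm(20, mean = 10, sd = 2)   # hypothetical outcome values

toy_m0 <- lm(toy_y ~ 1)

# The only coefficient is the sample mean ...
coef(toy_m0)
mean(toy_y)

# ... so the residuals are the outcome centred on that mean
all.equal(unname(residuals(toy_m0)), toy_y - mean(toy_y))
```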
#### Aspatial model with covariate
- Scale the poverty variable for the regression.
- Create a linear model of the preterm birth rate with poverty as a covariate.
- Check for structure in the residuals.
- Plot the spatial structure of the residuals.
- Check for SA in the residuals.
```{r}
com$poverty_scaled <- com$Pov200P / 10
m1 <- lm(PretBrth ~ poverty_scaled, data = com)
summary(m1)
com$res_m1 <- m1$residuals
plot(m1$fitted.values, com$res_m1)  # residuals (y) against fitted values (x)
tm_shape(com) +
tm_polygons(col = "res_m1", palette = "Spectral",
style = "cont", lwd = 0)
moran.mc(com$res_m1, com_nb_listw, nsim = 999)
```
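Dividing `Pov200P` by 10 changes only the units of the coefficient: the slope then describes the change in the outcome per 10-unit increase in poverty, while the fit itself is unchanged. The sketch below shows this on simulated data; the data frame `toy` and its columns are hypothetical.

```{r}
set.seed(1)
toy <- data.frame(pov = runif(50, min = 0, max = 80))   # hypothetical covariate
toy$y <- 5 + 0.1 * toy$pov + rnorm(50)                  # hypothetical outcome

fit_raw    <- lm(y ~ pov, data = toy)
fit_scaled <- lm(y ~ I(pov / 10), data = toy)

# The slope per 10 units is 10 times the slope per unit
coef(fit_raw)["pov"] * 10
coef(fit_scaled)[2]

# Rescaling the covariate leaves the fit unchanged
summary(fit_raw)$r.squared
summary(fit_scaled)$r.squared
```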
#### Spatial econometric models
- **error:** unexplained spatial variation; error which is correlated across spatial units
- **lag:** influence of neighboring values on unit values; a diffusion process
- **combination:** lag and error processes acting together
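
In matrix form, the three structures are commonly written as follows. The notation is an assumption rather than something defined in this exercise: $y$ is the outcome, $X$ the covariates, $W$ the spatial weight matrix, $\rho$ the lag coefficient, $\lambda$ the error coefficient, and $\varepsilon$ independent noise.

$$
\begin{aligned}
\text{error:} \quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon \\
\text{lag:} \quad & y = \rho W y + X\beta + \varepsilon \\
\text{combination:} \quad & y = \rho W y + X\beta + u, \qquad u = \lambda W u + \varepsilon
\end{aligned}
$$
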
#### Spatial error model
- Create a spatial error model of the preterm birth rate with poverty as a covariate.
- Plot the spatial structure of the residuals.
- Check for SA in the residuals.
```{r}
m2 <- errorsarlm(PretBrth ~ poverty_scaled,
data = com,
com_nb_listw)
summary(m2)
com$res_m2 <- m2$residuals
tm_shape(com) +
tm_polygons(col = "res_m2", palette = "Spectral", style = "cont", lwd = 0)
moran.mc(com$res_m2, com_nb_listw, nsim = 999)
```
#### Spatial lag model
- Create a spatial lag model of the preterm birth rate with poverty as a covariate.
- Plot the spatial structure of the residuals.
- Check for SA in the residuals.
```{r}
m3 <- lagsarlm(PretBrth ~ poverty_scaled,
data = com,
com_nb_listw)
summary(m3)
com$res_m3 <- m3$residuals
tm_shape(com) +
tm_polygons(col = "res_m3", palette = "Spectral", style = "cont", lwd = 0)
moran.mc(com$res_m3, com_nb_listw, nsim = 999)
```
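When interpreting a lag model, the coefficient on a covariate is not the full effect: a change in one unit's covariate feeds back through its neighbors' outcomes. Under the usual lag specification (an assumption here), the effects are given by $(I - \rho W)^{-1}\beta$, and with row-standardised weights the average total effect simplifies to $\beta / (1 - \rho)$. The sketch below checks this on a hypothetical 4 x 4 lattice with made-up values of `rho` and `beta`; for a fitted model, `spatialreg::impacts()` reports direct, indirect, and total effects.

```{r}
# Hypothetical 4 x 4 lattice with rook neighbors, row-standardised
lat4 <- expand.grid(row = 1:4, col = 1:4)
adj4 <- as.matrix(dist(lat4, method = "manhattan")) == 1
W4 <- adj4 / rowSums(adj4)
n4 <- nrow(W4)

rho  <- 0.4   # made-up spatial lag coefficient
beta <- 2     # made-up covariate coefficient

# Effect matrix: entry [i, j] is the change in y_i per unit change in x_j
effect_mat <- solve(diag(n4) - rho * W4) * beta

direct   <- mean(diag(effect_mat))      # average effect on a unit's own outcome
total    <- mean(rowSums(effect_mat))   # average effect summed over all units
indirect <- total - direct              # average spillover between units

c(direct = direct, indirect = indirect, total = total)
beta / (1 - rho)                        # closed form for the average total effect
```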
#### Error *and* lag
- Using `sacsarlm`, create a model which uses both lag and error structures.
- Plot the spatial structure of the residuals.
- Check for SA in the residuals.
```{r}
m4 <- sacsarlm(PretBrth ~ poverty_scaled,
data = com,
com_nb_listw)
summary(m4)
com$res_m4 <- m4$residuals
tm_shape(com) +
tm_polygons(col = "res_m4", palette = "Spectral", style = "cont", lwd = 0)
moran.mc(com$res_m4, com_nb_listw, nsim = 999)
```
#### Spatial Durbin error model
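The spatial Durbin error model extends the error model with spatially lagged covariates: among neighbors, the errors are correlated, and a unit's outcome also depends on its neighbors' covariate values.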
```{r}
m5 <- errorsarlm(PretBrth ~ poverty_scaled,
data = com,
com_nb_listw,
etype = "emixed")
summary(m5)
com$res_m5 <- m5$residuals
tm_shape(com) +
tm_polygons(col = "res_m5", palette = "Spectral", style = "cont", lwd = 0)
moran.mc(com$res_m5, com_nb_listw, nsim = 999)
```
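
In equation form, the spatial Durbin error model is commonly written as below. The symbols are assumed notation, not defined in this exercise: $W$ is the spatial weight matrix, $\theta$ holds the coefficients on the neighbors' covariates $WX$, and $\lambda$ is the spatial error coefficient.

$$
y = X\beta + W X \theta + u, \qquad u = \lambda W u + \varepsilon
$$
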
#### Spatial Durbin lag model
The spatial Durbin lag model extends the lag model with spatially lagged covariates: a unit's outcome depends on its neighbors' outcomes (diffusion) and on its neighbors' covariate values.
Modify the `lagsarlm` code to include the argument `type = "mixed"`. Perform a visual check and a statistical test of spatial structure in the residuals.
```{r}
m6 <- lagsarlm(PretBrth ~ poverty_scaled,
data = com,
com_nb_listw,
type = "mixed")
summary(m6)
com$res_m6 <- m6$residuals
tm_shape(com) +
tm_polygons(col = "res_m6", palette = "Spectral", style = "cont", lwd = 0)
moran.mc(com$res_m6, com_nb_listw, nsim = 999)
```
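
In equation form, the spatial Durbin lag model is commonly written as below. The symbols are assumed notation, not defined in this exercise: $W$ is the spatial weight matrix, $\rho$ is the spatial lag coefficient, and $\theta$ holds the coefficients on the neighbors' covariates $WX$.

$$
y = \rho W y + X\beta + W X \theta + \varepsilon
$$
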
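Which model to interpret? Compare the AIC of each model; lower values indicate a better trade-off between fit and complexity.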
```{r}
AIC(m0)
AIC(m1)
AIC(m2)
AIC(m3)
AIC(m4)
AIC(m5)
AIC(m6)
```
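The separate calls above can also be collected into one sorted table. The sketch below uses two toy `lm` fits on simulated data so that it runs on its own; the same pattern should apply to a named list holding `m0` to `m6`, since `AIC()` is already called on those objects above. The names `toy`, `toy_models`, and `toy_aic` are hypothetical.

```{r}
set.seed(7)
toy <- data.frame(x = rnorm(50))
toy$y <- 2 + 3 * toy$x + rnorm(50)   # simulated data with a real covariate effect

# Keep the fitted models in a named list so they can be compared together
toy_models <- list(
  null      = lm(y ~ 1, data = toy),
  covariate = lm(y ~ x, data = toy)
)

# One AIC per model, sorted so the best-supported model comes first
toy_aic <- sort(sapply(toy_models, AIC))
toy_aic

# Difference from the lowest AIC
toy_aic - min(toy_aic)
```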