---
title: "Unit level models in forest inventories"
author: "Bryce Frank and Francisco Mauro"
editor:
  markdown:
    wrap: 80
---
```{r, echo=FALSE, warning=FALSE, results="hide"}
# Load all required packages quietly
packs <- c("sf", "terra", "sae", "nlme",
           "dplyr", "tidyr", "ggplot2", "gridExtra")
invisible(capture.output(
  lapply(packs, library, character.only = TRUE, verbose = FALSE)
))
```
To work with unit level models we need field plots with coordinates, which we
use to extract the auxiliary information for each plot. In this case, we have
already extracted two auxiliary variables from lidar data: the 95th percentile
of the heights of the lidar returns and the standard deviation of those heights.
The lidar data was collected in 2009 and the plots were measured in 2010.

The field P95 is the 95th percentile, the field SD_H is the standard deviation
of the lidar heights and ID_SMA is the ID of the stand in which the plot was
measured. On average there are 3-4 plots per stand.
```{r}
plots <- st_read("Field_plots.gpkg")
colnames(plots)[5] <- "QMD"
```
The object plots is an sf object. We create a non-spatial copy, called plots_df,
by removing the geometry (geom) field.
```{r}
plots_df <- plots
plots_df$geom <- NULL
head(plots_df[-c(1:3),c("Plot_ID","ID_SMA","QMD","V","P95","SD_H")])
```
Our small areas are the management units in the study area (stands). The field
ID_SMA is the identifier of each stand and matches the ID_SMA field of the plots.
```{r}
stands <- st_read("Stands.gpkg")
```
We compute the sample size of each stand and the average sample size by stand.
```{r}
n_by_stand <- plots_df |> group_by(ID_SMA) |> summarize(ni=n())
head(n_by_stand)
```
```{r}
mean(n_by_stand$ni)
```
We will start by creating models for the quadratic mean diameter (QMD) in the
stands, beginning with a small exploratory analysis. P95 and QMD have a strong
linear relationship.
```{r}
pairs(data.frame(plots)[, c("QMD", "V","P95", "SD_H")])
```
We can fit the basic unit level model using, for example, the lme function from
nlme. To specify that we have stand random effects we use random = \~1\|ID_SMA.
```{r}
model <- lme(QMD ~ P95, random = ~ 1 | ID_SMA, data = plots)
```
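
For reference, the basic unit level model fitted above can be written as
follows. The notation is an assumption introduced here for illustration: $i$
indexes stands, $j$ indexes plots within a stand, $y_{ij}$ is the QMD of the
plot and $x_{ij}$ is its P95.

$$
y_{ij} = \beta_0 + \beta_1 x_{ij} + v_i + e_{ij}, \qquad
v_i \sim N(0, \sigma_v^2), \qquad e_{ij} \sim N(0, \sigma_e^2)
$$

Here $v_i$ is the stand random effect requested with random = \~1\|ID_SMA and
$e_{ij}$ is the plot level residual; both are assumed independent of each other.
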
A quick look at the model summary:
```{r}
summary(model)
```
and at the residuals:
```{r}
plot(model)
```
```{r}
par(mfrow = c(1, 2))
qqnorm(residuals(model, level = 1), main = "Normal Q-Q plot residuals")
qqline(residuals(model, level = 1))
qqnorm(model$coefficients$random$ID_SMA, main = expression("Normal Q-Q plot" ~ hat(v)[i]))
qqline(model$coefficients$random$ID_SMA)
```
The lme model can be used to obtain pixel level predictions.
```{r}
population_aux_info <- rast("Examples_SAE_IUFRO_2023.tif")
names(population_aux_info) <- c("P95", "SD_H", "ID_SMA")
preds <- predict(population_aux_info, model, level=1)
plot(preds)
```
We could aggregate these predictions using zonal statistics. However, this
approach does not let us get unbiased estimates of the mean square error (mse)
of the stand level estimates. To get point estimates of QMD for stands and their
associated uncertainty we can use the pbmseBHF function in the sae package. This
function needs five inputs:

1.  The model formula for the fixed effects.
2.  The dom argument, the field that stores the stand identifier.
3.  The meanxpop argument, a data frame with the averages of the predictors
    within stands. The first field is the stand identifier.
4.  The popnsize argument, a data frame with the stand ids in the first column
    and the population sizes in the second column.
5.  The B argument, the number of replicates of the parametric bootstrap that
    pbmseBHF uses to estimate the mse.
**IMPORTANT** To get 3) and 4) we need access to the auxiliary information for
the entire population. That information is in the raster file
"Examples_SAE_IUFRO_2023.tif". The first band contains P95, the second contains
SD_H and the third contains the stand IDs. We renamed the bands when loading the
raster so that the names match the fields of the plots data frame.

To get 3) we will use the zonal function of the terra package with the mean
function. Zonal statistics tools in ArcGIS or QGIS would also work.
```{r,echo = FALSE}
X_mean <- zonal(population_aux_info[[c("P95", "SD_H")]],
                population_aux_info[["ID_SMA"]],
                fun = mean, na.rm = TRUE)
```
Some stands were not sampled. We will only keep the ones that were sampled.
```{r, echo = FALSE}
X_mean <- X_mean[X_mean$ID_SMA %in% plots$ID_SMA, c(1, 2)]
head(X_mean)
```
To get 4) we will use the zonal function of the terra package with the length
function, which counts the pixels in each stand.
```{r}
Popn <- zonal(population_aux_info[["ID_SMA"]],
              population_aux_info[["ID_SMA"]],
              fun = length)
Popn <- Popn[Popn$ID_SMA %in% plots$ID_SMA, ]
head(Popn)
```
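
To make the structure of inputs 3) and 4) concrete, here is a minimal sketch
that builds the same two tables from a toy pixel table with base R. The object
names (pixels_toy, X_mean_toy, Popn_toy), the column name N and all values are
made up for illustration; they are not part of the course data.

```{r}
# Toy pixel table (hypothetical values): one row per pixel
pixels_toy <- data.frame(
  ID_SMA = c(1, 1, 1, 2, 2),
  P95    = c(20, 22, 24, 30, 34)
)

# Stand means of the predictor, the shape expected for meanxpop:
# stand identifier first, then the mean of each predictor
X_mean_toy <- aggregate(P95 ~ ID_SMA, data = pixels_toy, FUN = mean)

# Number of pixels per stand, the shape expected for popnsize:
# stand identifier first, then the population size
Popn_toy <- aggregate(P95 ~ ID_SMA, data = pixels_toy, FUN = length)
names(Popn_toy)[2] <- "N"

X_mean_toy
Popn_toy
```

Both tables have one row per stand and the stand identifier in the first column,
which is the layout described in 3) and 4).
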
Once we have 3) and 4) we can use the pbmseBHF function in the sae package. We
attach the plots_df object so that pbmseBHF can find the information it needs.
pbmseBHF obtains point estimates using the basic unit level model and estimates
their mean square errors using parametric bootstrap. In this case we will do
only 100 replicates to speed up the example.
```{r, echo = FALSE,results= 'hide', message=FALSE}
attach(plots_df)
invisible(capture.output(
  result <- pbmseBHF(QMD ~ P95, dom = ID_SMA,
                     meanxpop = X_mean, popnsize = Popn, B = 100)
))
detach(plots_df)
```
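
To see why estimates based on the unit level model help most in stands with few
plots, a minimal sketch of the textbook shrinkage factor
$\gamma_i = \sigma_v^2 / (\sigma_v^2 + \sigma_e^2 / n_i)$ may help. The variance
values below are made up and the function name shrinkage_toy is hypothetical.
This illustrates the general idea only and is not a description of the internals
of pbmseBHF.

```{r}
# Hypothetical variance components (made-up values, not from the fitted model)
sigma2_v_toy <- 4   # between-stand variance
sigma2_e_toy <- 9   # residual (within-stand) variance

# Textbook shrinkage factor for the basic unit level model
shrinkage_toy <- function(n_i, sigma2_v, sigma2_e) {
  sigma2_v / (sigma2_v + sigma2_e / n_i)
}

# Shrinkage factor for stands with 1, 2, 4 and 8 plots
n_toy <- c(1, 2, 4, 8)
gamma_toy <- shrinkage_toy(n_toy, sigma2_v_toy, sigma2_e_toy)
data.frame(n_i = n_toy, gamma_i = round(gamma_toy, 3))
```

The factor grows towards 1 as the stand sample size grows, so the predicted
stand effect relies more on the stand's own plots. With few plots it stays closer
to 0 and the estimate leans on the fixed effects prediction.
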
The model fit and the point estimates for stands are stored in the elements fit
and eblup of the element est of result, as a list and a data.frame respectively.
Model fit:
```{r}
result$est$fit$summary
```
The column eblup stores the stand level estimates.
```{r}
head(result$est$eblup)
```
Estimated mean square errors are stored as a data.frame in the mse element of
result. Estimates and mses can be merged using the field domain, which holds the
stand identifier (ID_SMA). We add a column with the rmses so that, once
estimates and mses/rmses are merged, we can compute coefficients of variation
(CVs) and relative errors.
```{r, echo = TRUE}
eblups <- data.frame(result$est$eblup)
```
To generate outputs that we can share with GIS users we are going to merge all
results in a data.frame.
```{r}
mses <- result$mse
mses$rmse <- sqrt(mses$mse)
eblups_mse <- merge(eblups, mses, by = "domain")
eblups_mse$CV <- eblups_mse$rmse / eblups_mse$eblup
eblups_mse$RE <- 1.96*eblups_mse$CV
```
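
As a quick check of the arithmetic in the chunk above, here is the same
calculation for a single stand with made-up values. The object names ending in
_toy are hypothetical and used only for illustration.

```{r}
# Made-up values for one stand (for illustration only)
eblup_toy <- 25   # estimated QMD (cm)
mse_toy   <- 4    # estimated mean square error (cm^2)

rmse_toy <- sqrt(mse_toy)          # root mean square error (cm)
cv_toy   <- rmse_toy / eblup_toy   # coefficient of variation (unitless)
re_toy   <- 1.96 * cv_toy          # relative error at roughly 95% confidence
c(rmse = rmse_toy, CV = cv_toy, RE = re_toy)
```

With these values the rmse is 2 cm, the CV is 0.08 and the relative error is
0.1568, that is, about 15.7% of the estimate.
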
We can further merge these results with the stands and plot stand level QMD
estimates as maps.
```{r}
eblups_stands <- merge(stands, eblups_mse, by.x = "ID_SMA", by.y = "domain")
estimates_plot <- ggplot() +
  geom_sf(data = eblups_stands, aes(fill = eblup), lwd = 0.5, color = "black") +
  scale_fill_gradient("QMD (cm)", low = "white", high = "darkgreen")
RE_plot <- ggplot() +
  geom_sf(data = eblups_stands, aes(fill = RE), lwd = 0.5, color = "black") +
  scale_fill_gradient("Rel error(%)", low = "white", high = "red",
                      labels = scales::label_percent())
grid.arrange(estimates_plot, RE_plot, ncol = 1)
```
We can also compare the point estimates and uncertainties of the direct
estimators and the eblups. For that we combine direct estimates and eblups in a
data frame and create some helper columns.
```{r,fig.dim=c(5,5)}
# Direct estimates: stand mean of QMD and its standard error
direct_estimates <- group_by(plots_df, ID_SMA) |>
  summarize(QMD_direct = mean(QMD), se_direct = sd(QMD) / sqrt(n()))
eblups_stands <- merge(eblups_stands, direct_estimates, by = "ID_SMA")
# Approximate 95% intervals for eblups and direct estimates
eblups_stands$unit_lower <- eblups_stands$eblup - 1.96 * eblups_stands$rmse
eblups_stands$unit_upper <- eblups_stands$eblup + 1.96 * eblups_stands$rmse
eblups_stands$direct_lower <- eblups_stands$QMD_direct - 1.96 * eblups_stands$se_direct
eblups_stands$direct_upper <- eblups_stands$QMD_direct + 1.96 * eblups_stands$se_direct
scatter_with_whiskers <- ggplot(eblups_stands, aes(x = eblup, y = QMD_direct)) +
  geom_point() +
  geom_errorbar(aes(ymin = direct_lower, ymax = direct_upper)) +
  geom_abline(intercept = 0, slope = 1) +
  xlim(10, 40) + ylim(10, 40) +
  xlab(hat(mu)["U,i"]~(cm)) + ylab(hat(mu)["D,i"]~(cm))
scatter_with_whiskers
```
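
The direct estimates and intervals computed in the chunk above correspond to the
following expressions. The symbols $y_{ij}$, $n_i$ and $s_i$ are notation
introduced here for illustration: $y_{ij}$ is the QMD of plot $j$ in stand $i$,
$n_i$ is the number of plots in the stand and $s_i$ is the sample standard
deviation of QMD in the stand.

$$
\hat{\mu}_{D,i} = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}, \qquad
\widehat{se}(\hat{\mu}_{D,i}) = \frac{s_i}{\sqrt{n_i}}, \qquad
\hat{\mu}_{D,i} \pm 1.96\,\widehat{se}(\hat{\mu}_{D,i})
$$

As in the code, no finite population correction is applied. For stands with a
single plot $s_i$ is undefined (sd returns NA in R), so those stands have no
standard error or interval.
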
We can compare the standard errors of the direct estimates and the rmses of the
eblups as a function of the small area sample size.
```{r, echo = FALSE, warning=FALSE, fig.dim=c(5,5)}
# Long format: one row per stand and error type (se_direct or rmse)
error_by_n <- pivot_longer(eblups_stands, cols = c("se_direct", "rmse"))
error_by_n$Method <- ifelse(error_by_n$name == "rmse", "EBLUP", "Direct")
# Stands with a single plot have no direct standard error, so they are dropped
ggplot(error_by_n[error_by_n$sampsize > 1, ], aes(x = sampsize, y = value)) +
  geom_point(aes(shape = Method), color = "black") +
  geom_smooth(aes(lty = Method), color = "black") +
  theme(legend.position = "bottom") +
  xlab(expression(n[i])) + ylab(expression(rmse~(cm)))
```