Unit_level.qmd

---
title: "Unit level models in forest inventories"
autor: "Bryce Frank and Francisco Mauro"
editor: 
  markdown: 
    wrap: 80
---

```{r, echo=FALSE, warnings=FALSE,results="hide"}
packs<-c("sf","terra","sae","nlme",
        "dplyr","tidyr","ggplot2","gridExtra")
invisible(capture.output(
  lapply(packs,library,character.only=TRUE,verbose=FALSE)
))

```

To work with unit level models we need plots with coordinates that we use to
extract their auxiliary information. In this case, we have already extracted two
auxiliary variables from lidar data. The lidar data was collected in 2009 and
plots were measured in 2010. The auxiliary variables are the 95th percentile of
the heights of lidar returns and the standard deviation of those heights.

The field P95 is the 95th percetile, the field SD_H is the standard deviation of
lidar heights and ID_SMA is the ID, of the stand in which the plot was
collected. On average there are 3-4 plots per stand.

```{r}
plots <- st_read("Field_plots.gpkg")
colnames(plots)[5] <- "QMD"

```

The object plots is an sf object, we create non spatial copy, called plots_df,
by removing the geometry (geom) field.

```{r}
plots_df <- plots
plots_df$geom <- NULL
head(plots_df[-c(1:3),c("Plot_ID","ID_SMA","QMD","V","P95","SD_H")])
```

Our small areas are the management units in the study area (stands). The field
ID_SMA is the identifier of the stand and matches the plots.

```{r}
stands <- st_read("Stands.gpkg")
```

Stand sample size and average sample size by stand

```{r}
n_by_stand  <- plots_df |> group_by(ID_SMA) |> summarize(ni=n())
head(n_by_stand)
```

```{r}
mean(n_by_stand$ni)
```

We will start creating models for the quadratic mean diameter in the stands
(QMD). For that we will start with a small exploratory analysis. P95 and QMD
have a strong linear relationship.

```{r}
pairs(data.frame(plots)[, c("QMD", "V","P95", "SD_H")])

```

We can fit the basic unit level model using for example, the lme function from
nlme. To specify that we have stand random effects we use ,random = \~1\|ID_SMA.

```{r}
model <- lme(QMD ~ P95, random = ~ 1 | ID_SMA, data = plots)
```

A quick look at the model summary

```{r}
summary(model)
```

and to the residuals

```{r}
plot(model)
```

```{r}
par(mfrow = c(1, 2))

qqnorm(residuals(model, level = 1), main = "Normal Q-Q plot residuals")
qqline(residuals(model, level = 1))

qqnorm(model$coefficients$random$ID_SMA, main = expression("Normal Q-Q plot" ~ hat(v)[i]))
qqline(model$coefficients$random$ID_SMA)
```

The lme model can be used to obtain pixel level predictions.

```{r}
population_aux_info <- rast("Examples_SAE_IUFRO_2023.tif")
names(population_aux_info) <- c("P95", "SD_H", "ID_SMA")
preds <- predict(population_aux_info, model, level=1)
plot(preds)
```

We could aggregate these predictions using zonal stats, however, this approach
does not let us get unbiased estimates of the mse. To get point estimates of QMD
for stands and their associated uncertainty we can use the pbmseBHF function in
the sae package. This function needs five inputs

1.  The model formula for the fixed effects,

2.  The dom argument, the field that stores the stand identifier,

3.  The meanxpop argument, a data frame with the average of the predictors
    within stands. The first field is the stand identifier

4.  The popnsize argument, a data frame with the stand ids in the first column
    and the population sizes in the second column

5.  pbmseBHF estimates the mse using parametric bootstrap, B is the number
    replicates for the parametric bootstrap.

**IMPORTANT** To get 3) and 4) we need access to the auxiliary information for
the entire population. That information is in the raster file
"Examples_SAE_IUFRO_2023.tif". The first band contains P95, the second contains
SD_H and the third the stand IDs. we rename the bands so the names match the
plots data frame.

To get 3) we will use the zonal function of the terra package with the mean
function or zonal statistics (arc GIS or Q GIS).

```{r,echo = FALSE}
X_mean <- zonal(population_aux_info[[c("P95", "SD_H")]],
  population_aux_info[["ID_SMA"]],
  fun = mean, na.rm = TRUE
)
```

There are some unsample stands, we will only keep the ones that are sampled

```{r, echo = FALSE}
X_mean <- X_mean[X_mean$ID_SMA %in% plots$ID_SMA, c(1, 2)]
head(X_mean)
```

To get 4) we will use the zonal function of the terra package with the length
function

```{r}
Popn <- zonal(population_aux_info[["ID_SMA"]],
  population_aux_info[["ID_SMA"]],
  fun = length
)

Popn <- Popn[Popn$ID_SMA %in% plots$ID_SMA, ]
head(Popn)
```

Once we have 3) and 4) we can use the pbmseBHF in the sae package, attach the
plots object so pbmseBHF will find the information it needs. pnmseBHF obtain
point estimates using the basic unit level model and estimates their mean square
error using parametric bootstrap. In this case we will only do only 100 reps to
speed up the example.

```{r, echo = FALSE,results= 'hide', message=FALSE}
attach(plots_df)
invisible(capture.output(result <- pbmseBHF(QMD ~ P95, dom = ID_SMA,
  meanxpop = X_mean, popnsize = Popn, B = 100)))
detach(plots_df)
```

Model fit and point estimates for stands are in the elements fit and eblup
(estimates) of the element est of result. They are stored as a list and a
data.frame respectively. Model fit:

```{r}
result$est$fit$summary
```

The column eblup stores the stand level estimates.

```{r}
head(result$est$eblup)
```

Estimated mean square errors are stored as a data.frame in the mse element of
the result. Both, estimates and mse can be merged. We can create a column with
the rmses to compute coefficients of variation. Once we merge estimates and
mses\\rmses we can get the CVs and relative errors. The field ID_SMA is renamed
"domain".

```{r, echo = TRUE}
eblups <- data.frame(result$est$eblup)
```

To generate outputs that we can share with gis users we are going to merge all
results in a data.frame

```{r}
mses <- result$mse
mses$rmse <- sqrt(mses$mse)
eblups_mse <- merge(eblups, mses, by = "domain")
eblups_mse$CV <- eblups_mse$rmse / eblups_mse$eblup
eblups_mse$RE <- 1.96*eblups_mse$CV
```

We can further merge these results with the stands and plot stand level QMD
estimates as maps.

```{r}
eblups_stands <- merge(stands, eblups_mse, by.x = "ID_SMA", by.y = "domain")
estimates_plot <- ggplot() +
  geom_sf(data = eblups_stands, aes(fill = eblup), lwd = 0.5, color = "black") +
  scale_fill_gradient("QMD (cm)    ", low = "white", high = "darkgreen")

RE_plot <- ggplot() +
  geom_sf(data = eblups_stands, aes(fill = RE), lwd = 0.5, color = "black") +
  scale_fill_gradient("Rel error(%)", low = "white", high = "red", labels = scales::label_percent())

grid.arrange(estimates_plot, RE_plot, ncol = 1)
```

Or compare point estimates and uncertainties of direct estimators and eblups.
For that we combine in a data frame direct estimates and eblups and create some
helper columns.

```{r,fig.dim=c(5,5)}
direct_estimates <- group_by(plots_df,ID_SMA)|>
  summarize(QMD_direct=mean(QMD),se_direct = sd(QMD)/sqrt(n()))

eblups_stands <- merge(eblups_stands,direct_estimates,by="ID_SMA")
eblups_stands$unit_lower <- eblups_stands$eblup - 1.96*eblups_stands$rmse
eblups_stands$unit_upper <- eblups_stands$eblup + 1.96*eblups_stands$rmse

eblups_stands$direct_lower <- eblups_stands$QMD_direct - 1.96*eblups_stands$se_direct
eblups_stands$direct_upper <- eblups_stands$QMD_direct + 1.96*eblups_stands$se_direct


scatter_with_whiskers <- ggplot(eblups_stands, aes(x = eblup,y = QMD_direct)) +
  geom_point() + geom_errorbar(aes(ymin=direct_lower,ymax=direct_upper))+
  geom_abline(intercept=0,slope=1)+xlim(10,40)+ylim(10,40)+
  xlab(hat(mu)["U,i"]~(cm))+ylab(hat(mu)["D,i"]~(cm))
scatter_with_whiskers

```

We can compare mses of direct estimates and eblups as a function of the small
area sample size.

```{r, echo = FALSE, warnings=FALSE,fig.dim=c(5,5)}

error_by_n <- pivot_longer(eblups_stands,cols=c("se_direct","rmse"))
error_by_n$Method <- ifelse(error_by_n$name=="rmse","EBLUP","Direct")
ggplot(error_by_n[error_by_n$sampsize>1,], aes(x = sampsize,y = value)) +
  geom_point(aes(shape=Method),color="black") + 
  geom_smooth(aes(lty=Method),color="black")+ theme(legend.position = "bottom")+
  xlab(expression(n[i]))+ylab(expression(rmse~(cm)))


```