---
title: "Spatially Continuous Data IV"
output: html_notebook
---
# Spatially Continuous Data IV
*NOTE*: You can download the source files for this book from [here](https://github.com/paezha/Spatial-Statistics-Course). The source files are in the format of R Notebooks. Notebooks are pretty neat, because they allow you to execute code within the notebook, so that you can work interactively with the notes.
If you wish to work interactively with this chapter you will need the following:
* An R markdown notebook version of this document (the source file).
* A package called `geog4ga3`.
## Learning objectives
In the previous practice you were introduced to the concept of variographic analysis for fields/spatially continuous data. In this practice, we will learn:
1. How to use residual spatial pattern to estimate prediction errors.
2. Kriging: a method for optimal predictions.
## Suggested reading
- Bailey TC and Gatrell AC [-@Bailey1995] Interactive Spatial Data Analysis, Chapters 5 and 6. Longman: Essex.
- Bivand RS, Pebesma E, and Gomez-Rubio V [-@Bivand2008] Applied Spatial Data Analysis with R, Chapter 8. Springer: New York.
- Brunsdon C and Comber L [-@Brunsdon2015R] An Introduction to R for Spatial Analysis and Mapping, Chapter 6, Sections 6.7 and 6.8. Sage: Los Angeles.
- Isaaks EH and Srivastava RM [-@Isaaks1989applied] An Introduction to Applied Geostatistics, **CHAPTERS**. Oxford University Press: Oxford.
- O'Sullivan D and Unwin D [-@Osullivan2010] Geographic Information Analysis, 2nd Edition, Chapters 9 and 10. John Wiley & Sons: New Jersey.
## Preliminaries
As usual, it is good practice to clear the workspace to make sure that you do not have extraneous items there when you begin your work. The command in `R` to clear the workspace is `rm` (for "remove"), followed by a list of items to be removed. To clear the workspace of _all_ objects, do the following:
```{r}
rm(list = ls())
```
Note that `ls()` lists all objects currently in the workspace.
Load the libraries you will use in this activity:
```{r message=FALSE, warning=FALSE}
library(tidyverse)
library(spdep)
library(plotly)
library(sf) # needed for `st_as_sf()`, which is used below
library(gstat)
library(geog4ga3)
```
Begin by loading the data file:
```{r}
data("Walker_Lake")
```
You can verify the contents of the dataframe:
```{r}
summary(Walker_Lake)
```
## Using residual spatial pattern to estimate prediction errors
Previously, in Chapter \@ref(spatially-continuous-data-ii) we discussed how to interpolate a field using trend surface analysis; we also saw how that method may lead to residuals that are not spatially independent.
The implication of non-random residuals is that there is a systematic residual pattern that the model did not capture; this, in turn, means that there is at least _some_ information that can still be extracted from the residuals. Again, we will use the case of Walker Lake to explore one way to do this.
As before, we first calculate the polynomial terms of the coordinates to fit a trend surface to the data:
```{r}
Walker_Lake <- mutate(Walker_Lake,
X3 = X^3, X2Y = X^2 * Y, X2 = X^2,
XY = X * Y,
Y2 = Y^2, XY2 = X * Y^2, Y3 = Y^3)
```
Given the polynomial expansion, we can proceed to estimate the following cubic trend surface model, which we already know provided the best fit to the data:
```{r}
WL.trend3 <- lm(formula = V ~ X3 + X2Y + X2 + X + XY + Y + Y2 + XY2 + Y3,
data = Walker_Lake)
summary(WL.trend3)
```
We can next visualize the residuals, which, as you can see, do not appear to be random:
```{r}
plot_ly(x = ~Walker_Lake$X, y = ~Walker_Lake$Y, z = ~WL.trend3$residuals,
        color = ~WL.trend3$residuals < 0, colors = c("blue", "red"),
        type = "scatter3d")
```
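Beyond the visual impression, one way to check this formally is a Moran's I test on the residuals. This is a quick sketch, not part of the original workflow, using `spdep` (already loaded); the choice of 5 nearest neighbors for the spatial weights is arbitrary:
```{r}
# Build a spatial weights list from the 5 nearest neighbors of each observation
WL.coords <- cbind(Walker_Lake$X, Walker_Lake$Y)
WL.listw <- nb2listw(knn2nb(knearneigh(WL.coords, k = 5)))
# Moran's I test of the trend surface residuals; a small p-value indicates that
# the residuals are spatially autocorrelated rather than independent
moran.test(WL.trend3$residuals, WL.listw)
```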
Now we will create an interpolation grid:
```{r}
# The function `seq()` creates a sequence of values from `from` to `to`, using `by` as the
# step increment. In this case, we generate a grid with points that are 2.5 m apart.
X.p <- seq(from = 0.1, to = 255.1, by = 2.5)
Y.p <- seq(from = 0.1, to = 295.1, by = 2.5)
df.p <- expand.grid(X = X.p, Y = Y.p)
```
We can add the polynomial terms to this grid. Since our trend surface model was estimated using the cubic polynomial, we add those terms to the dataframe:
```{r}
df.p <- mutate(df.p, X3 = X^3, X2Y = X^2 * Y, X2 = X^2,
XY = X * Y,
Y2 = Y^2, XY2 = X * Y^2, Y3 = Y^3)
```
The interpolated cubic surface is obtained by using the model and the interpolation grid as `newdata`:
```{r}
# The function `predict()` is used to make predictions given a model and a possibly new dataset, different from the one used for estimation of the model.
WL.preds3 <- predict(WL.trend3, newdata = df.p, se.fit = TRUE, interval = "prediction", level = 0.95)
```
The surface is converted into a matrix for 3D plotting (the first column of `WL.preds3$fit` contains the point predictions; the other two columns are the bounds of the prediction intervals):
```{r}
z.p3 <- matrix(data = WL.preds3$fit[,1], nrow = length(Y.p), ncol = length(X.p), byrow = TRUE)
```
And plot:
```{r}
WL.plot3 <- plot_ly(x = ~X.p, y = ~Y.p, z = ~z.p3,
type = "surface", colors = "YlOrRd") %>%
layout(scene = list(
aspectmode = "manual", aspectratio = list(x = 1, y = 1, z = 1)))
WL.plot3
```
The trend surface provides a smooth estimate of the field. However, it is not sufficient to capture all systematic variation, and fails to produce random residuals.
A possible way of enhancing this approach to interpolation is to _exploit_ the information that remains in the residuals, for instance by the use of $k$-point means.
We can illustrate this as follows. To interpolate the _residuals_, we first need the set of _target_ points (the points for the interpolation), as well as the _source_ (the observations):
```{r}
# We will use the prediction grid we used above to interpolate the residuals
target_xy <- expand.grid(x = X.p, y = Y.p)
source_xy <- cbind(x = Walker_Lake$X, y = Walker_Lake$Y)
```
It is possible now to use the `kpointmean` function to interpolate the residuals, for instance using $k=5$ neighbors:
```{r}
kpoint.5 <- kpointmean(source_xy = source_xy, z = WL.trend3$residuals, target_xy = target_xy, k = 5)
```
Given the interpolated residuals, we can add them to the cubic trend surface, as follows:
```{r}
z.p3 <- matrix(data = WL.preds3$fit[,1] + kpoint.5$z,
nrow = length(Y.p), ncol = length(X.p), byrow = TRUE)
```
This is now the interpolated field that combines the trend surface and the estimated residuals:
```{r}
WL.plot3 <- plot_ly(x = ~X.p, y = ~Y.p, z = ~z.p3,
type = "surface", colors = "YlOrRd") %>%
layout(scene = list(
aspectmode = "manual", aspectratio = list(x = 1, y = 1, z = 1)))
WL.plot3
```
Of all the approaches that we have seen so far, this is the first that provides a genuine estimate of the following:
$$
\hat{z}_p + \hat{\epsilon}_p
$$
With trend surface analysis providing a smooth estimator of the underlying field:
$$
\hat{z}_p = f(x_p, y_p)
$$
And $k$-point means providing an estimator of:
$$
\hat{\epsilon}_p
$$
A question is how to decide the number of neighbors to use in the calculation of the $k$-point means. As previously discussed, $k = 1$ becomes identical to Voronoi polygons, and $k = n$ becomes the global mean.
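To get a feel for the effect of $k$, we can interpolate the residuals with a very small and a fairly large neighborhood and compare the spread of the results. This is just a sketch (it reuses the `source_xy` and `target_xy` objects created above, and the value $k = 30$ is arbitrary):
```{r}
# With k = 1 each target point takes the value of its nearest observation
# (a Voronoi-like surface); with a larger k the surface is much smoother
kpoint.1 <- kpointmean(source_xy = source_xy, z = WL.trend3$residuals,
                       target_xy = target_xy, k = 1)
kpoint.30 <- kpointmean(source_xy = source_xy, z = WL.trend3$residuals,
                        target_xy = target_xy, k = 30)
# As k grows towards n, the interpolated values shrink towards the global mean
# of the residuals, so their spread becomes smaller
sd(kpoint.1$z)
sd(kpoint.30$z)
```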
A second question concerns the way the average is calculated. As variographic analysis demonstrates, it is possible to estimate the way in which spatial dependence weakens with distance, so why should more distant points be weighted the same as nearby ones? The answer is that there is no reason why they should, and in fact variographic analysis elegantly solves this, as well as the question of how many points to use: all of them, with varying weights.
Next, we will introduce kriging, a method for optimal prediction that is based on the use of variographic analysis.
## Kriging: a method for optimal prediction
To introduce the method known as kriging, we will begin by positing a situation as follows:
$$
\hat{z}_p + \hat{\epsilon}_p = \hat{f}(x_p, y_p) + \hat{\epsilon}_p
$$
where $\hat{f}(x_p, y_p)$ is a smooth estimator of an underlying field.
We aim to obtain a prediction $\hat{\epsilon}_p$ of the residual at $p$ based on the observed residuals. We use an expression similar to the one used for IDW and $k$-point means in Chapter \@ref(spatially-continuous-data-i) (we will use $\lambda$ for the weights, to avoid confusing them with the weights used in variographic analysis):
$$
\hat{\epsilon}_p = \sum_{i=1}^n {\lambda_{pi}\epsilon_i}
$$
That is, $\hat{\epsilon}_p$ is a linear combination of the prediction residuals from the trend:
$$
\epsilon_i = z_i - \hat{f}(x_i, y_i)
$$
It is possible to define the following _expected mean squared error_, or _prediction variance_:
$$
\sigma_{\epsilon}^2 = E[(\hat{\epsilon}_p - \epsilon_p)^2]
$$
The prediction variance measures how close, on average, the predicted residual $\hat{\epsilon}_p$ is to the true (but unobserved) residual $\epsilon_p$ at the prediction location.
By expanding the square, the prediction variance can be decomposed as follows:
$$
\sigma_{\epsilon}^2 = E[\hat{\epsilon}_p^2] + E[\epsilon_p^2] - 2E[\hat{\epsilon}_p\epsilon_p]
$$
It turns out (we will not show the detailed derivation, but it can be consulted [here](https://msu.edu/~ashton/classes/866/papers/gatrell_ordkrige.pdf)), that the expression for the prediction variance depends on the weights:
$$
\sigma_{\epsilon}^2 = \sum_{i=1}^n \sum_{j=1}^n{\lambda_{ip}\lambda_{jp}C_{ij}} + \sigma^2 - 2\sum_{i=1}^{n}{\lambda_{ip}C_{ip}}
$$
where $C_{ij}$ is the autocovariance between observations at $i$ and $j$, and $C_{ip}$ is the autocovariance between the observation at $i$ and prediction location $p$.
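To see how the weights enter this expression, substitute $\hat{\epsilon}_p = \sum_{i=1}^n \lambda_{ip}\epsilon_i$ into the decomposition above and take expectations term by term (a sketch of the correspondence, assuming zero-mean residuals; the detailed derivation is in the notes linked above):
$$
E[\hat{\epsilon}_p^2] = \sum_{i=1}^n\sum_{j=1}^n{\lambda_{ip}\lambda_{jp}C_{ij}}, \qquad E[\epsilon_p^2] = \sigma^2, \qquad E[\hat{\epsilon}_p\epsilon_p] = \sum_{i=1}^n{\lambda_{ip}C_{ip}}
$$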
Fortunately for us, the relationship between the semivariogram and the autocovariance is straightforward:
$$
C_{z}(h) =\sigma^2 - \hat{\gamma}_{z}(h)
$$
This means that, given the distance $h$ between $i$ and $j$, and $i$ and $p$, we can use a semivariogram to obtain the autocovariances needed to calculate the prediction variance. We are still missing, however, the weights $\lambda$, which are not known a priori.
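Before turning to the weights, here is a small numerical illustration of this relationship, using a hypothetical exponential semivariogram (not the one fitted to the Walker Lake data below): the autocovariance at any distance is simply the sill minus the semivariance.
```{r}
# A hypothetical semivariogram: exponential model, partial sill of 90000, range of 30
v.hypo <- vgm(psill = 90000, model = "Exp", range = 30)
# Evaluate the semivariogram function at a handful of distances
gamma.h <- variogramLine(v.hypo, maxdist = 100, n = 5)
# The autocovariance is the sill minus the semivariance at each distance
gamma.h$C <- sum(v.hypo$psill) - gamma.h$gamma
gamma.h
```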
These weights can be obtained if we use the following rules:
> The expectation of the prediction errors is zero (unbiasedness).
> Find the weights $\lambda$ that minimize the prediction variance (optimal estimator).
This makes sense, since we would like our predictions to be unbiased (i.e., accurate) and as precise as possible, that is, to have the smallest variance (recall the discussion about accuracy and precision in Chapter \@ref(spatially-continuous-data-iii)).
Again, solving the minimization problem is beyond the scope of our presentation, but it suffices to say that the result is as follows:
$$
\mathbf{\lambda}_p = \mathbf{C}^{-1}\mathit{c}_{p}
$$
where $\mathbf{C}$ is the covariance matrix, and $\mathit{c}_{p}$ is the covariance vector for location $p$.
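To make the matrix expression concrete, here is a toy example with entirely hypothetical numbers: three observations, one prediction location, and the weights obtained by solving the corresponding linear system.
```{r}
# Hypothetical autocovariances among three observations (C) and between
# each observation and the prediction location p (c_p)
C <- matrix(c(1.00, 0.60, 0.30,
              0.60, 1.00, 0.45,
              0.30, 0.45, 1.00),
            nrow = 3, byrow = TRUE)
c_p <- c(0.70, 0.55, 0.25)
# The kriging weights solve the linear system C %*% lambda_p = c_p
lambda_p <- solve(C, c_p)
lambda_p
# The predicted residual at p is then the weighted sum of the observed residuals
e_obs <- c(2.1, -0.8, 1.3) # hypothetical residuals at the three observations
sum(lambda_p * e_obs)
```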
In summary, kriging is a method to optimally estimate the value of a variable at $p$ as a weighted sum of the observations of the same variable at locations $i$. This method is known to be the **B**est (in the sense that it minimizes the prediction variance) **L**inear (because predictions are a linear combination of the observations) **U**nbiased (since the expectation of the prediction errors is zero) **P**redictor, or **BLUP**.
Kriging is implemented in the package `gstat` as follows.
To put kriging to work we must first conduct variographic analysis of the residuals. The function `variogram` uses as an argument a simple features object that we can create as follows:
```{r}
Walker_Lake.sf <- Walker_Lake %>%
st_as_sf(coords = c("X", "Y"),
           remove = FALSE) # `remove = FALSE` retains the X and Y columns in the dataframe after they are used to create the geometry
```
The variogram of the residuals can be obtained by specifying a trend surface in the formula:
```{r}
variogram_v <- variogram(V ~ X3 + X2Y + X2 + X + XY + Y + Y2 + XY2 + Y3,
data = Walker_Lake.sf)
ggplot(data = variogram_v, aes(x = dist, y = gamma)) +
geom_point() +
geom_text(aes(label = np), nudge_y = -1500) + # Nudge the labels away from the points
xlab("Distance") + ylab("Semivariance")
```
You can verify that the semivariogram above corresponds to the residuals by repeating the analysis directly on the residuals. First join the residuals to the simple features object:
```{r}
Walker_Lake.sf$e <- WL.trend3$residuals
```
And then calculate the semivariogram and plot:
```{r}
variogram_e <- variogram(e ~ 1,
data = Walker_Lake.sf)
ggplot(data = variogram_e, aes(x = dist, y = gamma)) +
geom_point() +
geom_text(aes(label = np), nudge_y = -1500) +
xlab("Distance") + ylab("Semivariance")
```
The empirical semivariogram is used to estimate a semivariogram function:
```{r}
# Candidate models are passed as a vector; `fit.variogram()` retains the best fitting one
variogram_v.t <- fit.variogram(variogram_v, model = vgm(c("Exp", "Sph", "Gau")))
variogram_v.t
```
The fitted variogram function can be plotted over the empirical semivariogram as follows:
```{r}
gamma.t <- variogramLine(variogram_v.t, maxdist = 130)
ggplot(data = variogram_v, aes(x = dist, y = gamma)) +
geom_point(size = 3) +
geom_line(data = gamma.t, aes(x = dist, y = gamma)) +
xlab("Distance") + ylab("Semivariance")
```
We will convert the prediction grid to a simple features object:
```{r}
df.sf <- df.p %>%
st_as_sf(coords = c("X", "Y"),
remove = FALSE)
```
Then, we can krige the field as follows (ensure that packages `sf` and `stars` are installed):
```{r}
V.kriged <- krige(V ~ X3 + X2Y + X2 + X + XY + Y + Y2 + XY2 + Y3,
Walker_Lake.sf, df.sf, variogram_v.t)
```
Extract the predictions and prediction variance from the object `V.kriged`:
```{r}
V.km <- matrix(data = V.kriged$var1.pred,
               nrow = length(Y.p), ncol = length(X.p), byrow = TRUE)
V.sm <- matrix(data = V.kriged$var1.var,
               nrow = length(Y.p), ncol = length(X.p), byrow = TRUE)
```
We can now plot the interpolated field:
```{r}
V.km.plot <- plot_ly(x = ~X.p, y = ~Y.p, z = ~V.km,
type = "surface", colors = "YlOrRd") %>%
layout(scene = list(
aspectmode = "manual", aspectratio = list(x = 1, y = 1, z = 1)))
V.km.plot
```
Also, we can plot the kriging standard errors (the square root of the prediction variance). This gives an estimate of the uncertainty in the predictions:
```{r}
V.sm.plot <- plot_ly(x = ~X.p, y = ~Y.p, z = ~sqrt(V.sm),
type = "surface", colors = "YlOrRd") %>%
layout(scene = list(
aspectmode = "manual", aspectratio = list(x = 1, y = 1, z = 1)))
V.sm.plot
```
Where are predictions more/less precise?