02-spatial-operations.Rmd

```{r, echo=FALSE, purl=FALSE, message=FALSE}
knitr::opts_chunk$set(results='hide', comment = "#>", purl = FALSE)

## libraries needed for R code examples
library(sf)
library(terra)
library(tidyverse)

philly_sf <- st_read("data/Philly/")
HARV <- rast("data/HARV_RGB_Ortho.tif")
philly_homicides_sf <- st_read("data/PhillyHomicides/")
```

# Spatial data manipulation in R {#spatialops}

> Learning Objectives
>
> * Join attribute data to a polygon vector file
> * Reproject a vector file
> * Select polygons of a vector by location
> * Reproject a raster
> * Perform a raster calculation


------------

There are a wide variety of spatial, topological, and attribute data operations you can perform with R. [Lovelace et al's recent publication](https://r.geocompx.org/)[^11] goes into great depth about this and is highly recommended.

[^11]: Lovelace, R., Nowosad, J., & Muenchow, J. (2024). Geocomputation with R. CRC Press.

In this section we will look at a few examples for libraries and commands that allow us to process spatial data in R and perform a few commonly used operations.

## Attribute Join

An attribute join on vector data brings tabular data into a geographic context. It refers to the process of joining data in tabular format to data in a format that holds the geometries (polygon, line, or point). 

If you have done attribute joins of shapefiles in GIS software like _ArcGIS_ or _QGis_ you know that you need a __unique identifier__ in both the attribute table of the shapefile and the table to be joined. 

First we will load the CSV table `PhiladelphiaEduAttain.csv` into a dataframe in R and name it `ph_edu`.

```{r load-edu-csv}
ph_edu <- read_csv("data/PhiladelphiaEduAttain.csv")
names(ph_edu)
```


If you don't have the object still loaded read the the `PhillyTotalPopHHinc` shapefile into an object named `philly_sf`. Check out the column names of `philly_sf` and of `ph_edu` to determine which one might contain the unique identifier for the join.

```{r load-philly-sf}
# if you need to read in again:
# philly_sf <- st_read("data/Philly/")
names(philly_sf)
```

To join the `ph_edu` data frame with `philly_sf` we can use `merge` like this: 

```{r sf-attr-merge, results='show'}
philly_sf_merged <- left_join(philly_sf, ph_edu, by = c("GEOID10" = "GEOID"))
names(philly_sf_merged) 
```
We see the new attribute columns added, as well as the geometry column.


## Topological Subsetting: Select Polygons by Location

For the next example our goal is to select all Philadelphia census tracts that are approximately within a range of 2 kilometers from the city center.

> Think about this for a moment -- what might be the steps you'd follow?

```{r eval=FALSE}
## How about:

# 1. Get the census tract polygons.
# 2. Find the Philadelphia city center coordinates.
# 3. Create a buffer around the city center point.
# 4. Select all census tract polygons that intersect with the center buffer

```

We will use `philly_sf` for the census tract polygons.

In addition, we need to create a `sf` Point object with the Philadelphia city center coordinates:

$$x = 1750160$$ 
$$y = 467499.9$$ 

These coordinates are also in the _USA Contiguous Albers Equal Area Conic_ projected CRS, which is the same as CRS as `philly_sf`.

With this information, we create a object that holds the coordinates of the city center. Since we don’t have attributes we will just create it as a simple feature collection, `scf`.

```{r sf-intersect-point, results='show'}
# if you need to read in again:
# philly_sf <- st_read("data/Philly/", quiet = T)

# make a simple feature point with CRS
philly_ctr <- 
  st_point(c(1750160, 467499.9)) %>% # point coordinates
  st_sfc(crs = st_crs(philly_sf))  # create feature collection

```


For the spatial operations we can recur to the suite of geometric operations that come with the `sf` package.

We create a 2km buffer around the city center point:

```{r sf-buffer-point, results='show'}
philly_buf <- st_buffer(philly_ctr, 2000)
```

Ok. Now we can use that buffer to select all census tract polygons that intersect with the center buffer. The `sf` package has its own `st_filter` function that we can use here. 

```{r sf-intersects-subset, results='show'}
philly_sel <- st_filter(philly_sf, philly_buf)
```

Then we can plot what we created like this:

```{r sf-intersects-plot, results='show'}

plot(st_geometry(philly_sf), border="#aaaaaa", main="Census tracts that overlap with 2km buffer\naround city center")
plot(st_geometry(philly_sel), add=T, col="red")
plot(st_geometry(philly_buf), add=T, lwd = 2)
```

Note the difference to `st_intersection`, which performs an actual clippin operation and creates an `sfc` object that cuts out the area of the buffer from the census polygons:

```{r sf-intersection, results='show'}
philly_intersection <- st_intersection(philly_buf, philly_sf)
philly_intersection

plot(st_geometry(philly_sf), border="#aaaaaa", main="Census tracts around city center,\nclipped by 2km buffer ")
plot(philly_intersection, add=T, lwd = 2, border = "red")
```


## Reprojecting 

Occasionally you may have to change the coordinates of your spatial object into a new Coordinate Reference System (CRS). Functions to transform, or _reproject_  spatial objects typically take the following two arguments:

  * the spatial object to reproject
  * a CRS object with the new projection definition

You can reproject

  * a `sf` object with `st_transform()`  
  * a `SpatRaster` object with `project()`  

The perhaps trickiest part here is to determine the definition of the projection, which needs to be a character string in [proj4](http://trac.osgeo.org/proj/) format. You can [look it up online](http://www.spatialreference.org). For example for [UTM zone 33N (EPSG:32633)](http://spatialreference.org/ref/epsg/wgs-84-utm-zone-33n/) the string would be:

[`+proj=utm +zone=33 +ellps=WGS84 +datum=WGS84 +units=m +no_defs`](http://spatialreference.org/ref/epsg/wgs-84-utm-zone-33n/proj4js/)

You can retrieve the CRS: 

  - from an `sf` object with `st_crs()`
  - from a `SpatRaster` object with `crs()`  

Let us go back to the `"PhillyHomicides"` shapefile we exported earlier. Let's read it back in and reproject it so it matches the projection of the Philadelphia Census tracts.

Now let us check the CRS for both files. 

```{r vector-check-proj-sf, results='show'}
#If you need to read the file back in:
#philly_homicides_sf <- st_read("data/PhillyHomicides/")

st_crs(philly_sf)$proj4string
st_crs(philly_homicides_sf)$proj4string
```

We see that the CRS are different: we have `+proj=aea...` and `+proj=longlat...`. AEA refers to *USA Contiguous Albers Equal Area Conic* which is a projected coordinate system with numeric units. We will need this below for our spatial operations, so we will make sure both files are in that same CRS. 

We use `st_transform` and assign the result to a new object. Note how we also use `str_crs` to extract the projection definition from `philly_sf`, so we don't have to type it out.

```{r vector-reproject-sf, results='show'}
philly_homicides_sf_aea <- st_transform(philly_homicides_sf, st_crs(philly_sf))
```

We can use the `range()` command from the R base package to compare the coordinates before and after reprojection and confirm that we actually have transformed them. `range()` returns the _min_ and _max_ value of a vector of numbers.

```{r, compare-coords-range-sf, results='show'}
range(st_coordinates(philly_homicides_sf))
range(st_coordinates(philly_homicides_sf_aea))
```

We can also compare them visually with:  

```{r compare-reproj-plots-sf, results='show'}
par(mfrow=c(1,2)) 
plot(st_geometry(philly_homicides_sf), axes=TRUE, main = "before transform - latlon")
plot(st_geometry(philly_homicides_sf_aea), axes=TRUE, main = "after transform - aea")
```

Lastly, let us save the reprojected file as `PhillyHomicides_aea` shapefile, as we will use it later on.

```{r write-reproj-sf, eval=FALSE}
st_write(philly_homicides_sf_aea, "data/PhillyHomicides_aea", driver = "ESRI Shapefile")
```


### Raster reprojection

Here is what it would look like to reproject the HARV raster used earlier to a WGS84 projection. We see that see that the original projection is in UTM.

```{r raster-reproject, tidy=FALSE, warning=FALSE, results='show'}
# if you need to load again:
#HARV <- raster("data/HARV_RGB_Ortho.tif")
crs(HARV, proj = TRUE)
HARV_WGS84 <- project(HARV, "+init=epsg:4326")
crs(HARV_WGS84, proj = TRUE)
```

Let's look at the coordinates to see the effect:

```{r raster-reproject-ext, tidy=FALSE, warning=FALSE, results='show'}
ext(HARV)
ext(HARV_WGS84)
```

Due to the reprojection the number of cells has also changed:

```{r raster-reproject-cells, tidy=FALSE, warning=FALSE, results='show'}
ncell(HARV)
ncell(HARV_WGS84)
```

And here is the visual proof:

```{r raster-reproject-plot1, tidy=FALSE, warning=FALSE, results='show'}
plot(HARV, main = "before transform - UTM")
```

```{r raster-reproject-plot2, tidy=FALSE, warning=FALSE, results='show'}
plot(HARV_WGS84, main = "after transform - WGS84")
```


## Spatial Aggregation: Points in Polygons

Now that we have both homicides and census tracts in the same projection we will forge ahead and ask for the density of homicides for **each census tract** in Philadelphia: $\frac{{homicides}}{area}$

To achieve this this we join the points of homicide incidence to the census tract polygon and count them up for each polygon. You might be familiar with this operation from other GIS packages.


We will use piping and build up our object in the following way. First we calculate the area for each tract. We use the `st_area` function on the geometry column and add the result.


```{r sf-hom-area, eval=FALSE}
philly_sf %>% 
  mutate(tract_area = st_area(geometry)) %>% 
  head()
```      
      
Next, we use st_join to perform a spatial join with the points:      
      
```{r sf-hom-join, eval=FALSE}
philly_sf %>% 
  mutate(tract_area = st_area(geometry)) %>% 
  st_join(philly_homicides_sf_aea) %>%
  head()
```      

Now we can group by a variable that uiquely identifies the census tracts, (we choose _GEOID10_) and use `summarize` to count the points for each tract and calculate the homicide rate. Since our units are in sq meter we multiply by by 1000000 to get sq km. We also need to carry over the area, which I do using `unique`.     

We also assign the output to a new object `philly_crimes_sf`.
      
```{r sf-hom-ratio, results='show'}

philly_crimes_sf <- philly_sf %>%
      mutate(tract_area = st_area(geometry)) %>%
      st_join(philly_homicides_sf_aea) %>%
      group_by(GEOID10) %>%
      summarize(n_homic = n(),
                tract_area = unique(tract_area),
                homic_rate = as.numeric(1e6 * (n_homic/tract_area))) 
```


Finally, we write this out for later:

```{r sf-homiciderate-write, eval=FALSE}
st_write(philly_crimes_sf, "data/PhillyCrimerate", driver = "ESRI Shapefile")
```

## Raster calculations with `terra`

We often want to perform calculations on two or more rasters to create a new output raster. For example, if we are interested in mapping the heights of trees across an entire field site, we might want to calculate the difference between the Digital Surface Model (DSM, tops of trees) and the Digital Terrain Model (DTM, ground level). The resulting dataset is referred to as a Canopy Height Model (CHM) and represents the actual height of trees, buildings, etc. with the influence of ground elevation removed.

First let's read in the two datasets.

```{r read-dsm-dtm, results = 'show'}
HARV_DTM <- rast("data/HARV_dtmCrop.tif")
HARV_DSM <- rast("data/HARV_dsmCrop.tif")
```

Now we can subtract the DTM from the DSM to create a Canopy Height Model. It will for each CHM pixel calculate the difference of the respective DTM and DSM pixels.

```{r chm, results = 'show'}
HARV_CHM <- HARV_DSM - HARV_DTM
par(mfrow = c(1, 2))
plot(HARV_CHM)
hist(HARV_CHM)
```


This works fine for the small rasters in this tutorial. However, the calculation above becomes less efficient when computations are more complex or file sizes become large.

Thet `terra` package contains a function called `lapp() `function to make processing more efficient. It takes two or more rasters and applies a function to them. The generic syntax is:

    outputRaster <- lapp(x, fun)

where *x* is a `SpatRasterDataset` and *fun* is a custom function for the operation we want to perform.

```{r lapp, results = 'show'}
CHM_ov_HARV <- lapp(sds(list(HARV_DSM, HARV_DTM)), # use sds to create a SpatRasterDataset
                    fun = function(r1, r2) { 
                      return( r1 - r2) 
                      })
```

As arguments for our `lapp` operation we use the `sds()` function and provide it with the list of rasters that we want to operate on. As custom function we provide the function with two arguments (`r1` and `r1`) that subtracts the second (`r2`) from the first (`r1`) and returns the difference. The output of `lapp` is a `SpatRaster` and we assign it to a new variable `CHM_ov_HARV.`

The two rasters should be the same.

```{r rast-ident, results = 'show'}
identical(HARV_CHM, CHM_ov_HARV)
```