# Software installation and first steps {#software}
*Edited by: T. Hengl*
This section contains instructions on how to install and use the software needed to run predictive soil mapping and to export results to GIS or web applications. It has been written (as has most of the book) for Linux users, but the steps should be straightforward to adapt to Microsoft Windows and/or macOS.
## List of software in use
```{r software-triangle, echo=FALSE, fig.cap="Software combination used in this book.", out.width="55%"}
knitr::include_graphics("figures/software_triangle.png")
```
For processing the covariates we used a combination of Open Source GIS
software, primarily SAGA GIS [@gmd-8-1991-2015], packages raster [@raster],
sp [@pebesma2005classes], and GDAL [@mitchell2014geospatial] for reprojecting,
mosaicking and merging tiles. GDAL and parallel packages in R are highly suitable for
processing large volumes of data.
Software required to run all exercises in the book includes:
* [R](http://cran.r-project.org/bin/windows/base/) or [MRO](https://mran.microsoft.com/download/);
* [RStudio](http://www.rstudio.com/products/RStudio/);
* R packages: GSIF, plotKML, aqp, ranger, caret, xgboost, plyr, raster, gstat, randomForest, ggplot2, e1071 (see: [how to install R package](http://www.r-bloggers.com/installing-r-packages/));
* [SAGA GIS](http://sourceforge.net/projects/saga-gis/) (on Windows machines run windows installer);
* Google Earth or Google Earth Pro;
* [QGIS](https://qgis.org/en/site/forusers/download.html);
* [GDAL v2.x](https://trac.osgeo.org/gdal/wiki/DownloadingGdalBinaries); on Windows machines use e.g. ["gdal-*-1800-x64-core.msi"](http://download.gisinternals.com/sdk/downloads/).
The R scripts used in this tutorial can be downloaded from **[github](https://github.com/envirometrix/PredictiveSoilMapping)**. As a gentle introduction to soil classes in R we recommend section \@ref(variables-importing) on importing and using soil data. More examples of combined SAGA GIS + R usage can be found in the soil covariates chapter. To visualize spatial predictions in a web browser or Google Earth you can use the plotKML package [@Hengl2014plotKML]. As a gentle introduction to the R programming language and to spatial classes in R we recommend [the Geocomputation with R book](https://geocompr.robinlovelace.net/). Obtaining the [R reference card](https://cran.r-project.org/doc/contrib/Baggott-refcard-v2.pdf) is also highly recommended.
## Installing software on Ubuntu OS
On Ubuntu (often the preferred standard for the GIS community) the main required software can be installed within 10--20 minutes. We start with installing GDAL, proj4 and some packages that you might need later on:
```{bash, eval=FALSE}
sudo apt-get install libgdal-dev libproj-dev
sudo apt-get install gdal-bin python-gdal
```
Next, we install R and RStudio. For R you can use either the CRAN distribution or the optimized distribution (Microsoft R Open) provided by the former REvolution company (now owned by Microsoft):
```{bash, eval=FALSE}
wget https://mran.blob.core.windows.net/install/mro/3.4.3/microsoft-r-open-3.4.3.tar.gz
tar -xf microsoft-r-open-3.4.3.tar.gz
cd microsoft-r-open/
sudo ./install.sh
```
Note that R versions are constantly being updated so you will need to replace the URL above based on the most current information provided on the home page (http://mran.microsoft.com). Once you run ```install.sh``` you will have to accept the license terms twice before the installation can be completed. If everything completes successfully, you can get the session info by typing:
```{r}
sessionInfo()
```
```{r, eval=FALSE}
system("gdalinfo --version")
```
This shows, for example, that this installation of R runs on the Ubuntu 16.* LTS operating system
and that the version of GDAL is up to date. Using an optimized distribution of R
(read more about ["The Benefits of Multithreaded Performance with Microsoft R Open"](https://mran.microsoft.com/documents/rro/multithread)) is especially
important if you plan to use R for production purposes, i.e. to speed up computing
and the generation of soil maps for large numbers of pixels.
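Because later sections call several external programs from R, it can help to confirm up front that they are on the search path. The following is a minimal sketch using base R only; the helper name `check_tools()` is hypothetical and not part of any package used in this book:

```r
# Report which external command-line tools R can find on the system PATH.
# `check_tools()` is an illustrative helper, not a function from any package.
check_tools <- function(tools) {
  paths <- Sys.which(tools)   # returns "" for tools that are not found
  found <- nzchar(paths)
  names(found) <- tools
  found
}

check_tools(c("gdalinfo", "gdalwarp", "saga_cmd"))
```

A `FALSE` entry usually means that the program is either not installed or not on the `PATH` seen by the R session.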
To install RStudio we can run:
```{bash, eval=FALSE}
sudo apt-get install gdebi-core
wget https://download1.rstudio.org/rstudio-1.1.447-amd64.deb
sudo gdebi rstudio-1.1.447-amd64.deb
sudo rm rstudio-1.1.447-amd64.deb
```
Again, RStudio is constantly updated so you might have to obtain the most recent RStudio version and distribution.
To learn more about the first steps in R and RStudio, and to improve your scripting skills more efficiently, consider studying the following tutorials:
* Grolemund, G., (2014) [**Hands-On Programming with R**](https://rstudio-education.github.io/hopr/). O’Reilly, 236 pages.
* Gillespie, C., Lovelace, R., (2016) [**Efficient R programming**](https://csgillespie.github.io/efficientR/). O’Reilly, 222 pages.
* Wilke, C.O., (2019) [**Fundamentals of Data Visualization**](https://serialmentor.com/dataviz/). O’Reilly, in press.
## Installing GIS software
Predictive soil mapping is about making maps, and working with maps requires GIS software to open, view, overlay and analyze the data spatially. The GIS software recommended in this book for soil mapping consists of SAGA GIS, QGIS, GRASS GIS and Google Earth. QGIS comes with [extensive documentation](https://www.qgis.org/en/docs/) and can be used to publish maps and combine layers served by various organizations.
SAGA GIS, being implemented in C++, is highly suited for running geoprocessing on large data sets.
To install SAGA GIS on Ubuntu we can use:
```{bash, eval=FALSE}
sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
sudo apt-get update
sudo apt-get install saga
```
If the installation is successful, you should also be able to access the SAGA command line from R by using:
```{r, eval=FALSE}
system("saga_cmd --version")
```
To install QGIS (https://download.qgis.org/) you might first have to add the location of the Debian repositories:
```{bash, eval=FALSE}
sudo sh -c 'echo "deb http://qgis.org/debian xenial main" >> /etc/apt/sources.list'
sudo sh -c 'echo "deb-src http://qgis.org/debian xenial main " >> /etc/apt/sources.list'
sudo apt-get update
sudo apt-get install qgis python-qgis qgis-plugin-grass
```
Other utility software that you might need includes the `htop` and `iotop` programs, which allow you to track processing progress:
```{bash, eval=FALSE}
sudo apt-get install htop iotop
```
and some additional system libraries required by `devtools`, `geoR` and similar packages, which can be installed via:
```{bash, eval=FALSE}
sudo apt-get install build-essential automake \
  libcurl4-openssl-dev pkg-config libxml2-dev \
  libfuse-dev mtools libpng-dev libudunits2-dev
```
You might also need the `7z` software for easier compression and `pigz` for parallelized compression:
```{bash, eval=FALSE}
sudo apt-get install pigz zip unzip p7zip-full
```
## WhiteboxTools {#Whitebox}
WhiteboxTools (http://www.uoguelph.ca/~hydrogeo/WhiteboxTools/), contributed by John Lindsay, is an extensive suite of functions and tools for DEM analysis which is especially useful for extending the hydrological and morphometric analysis tools available in SAGA GIS and GRASS GIS [@lindsay2016whitebox]. Probably the easiest way to use WhiteboxTools is to install a QGIS plugin (kindly maintained by Alexander Bruy: https://plugins.bruy.me/) and then learn and extend the WhiteboxTools scripting language by testing things out in QGIS (see below).
```{r whiteboxtools-preview, echo=FALSE, fig.cap="Calling WhiteboxTools from QGIS via the WhiteboxTools plugin.", out.width="100%"}
knitr::include_graphics("figures/whiteboxtools-preview.jpg")
```
The function `FlowAccumulationFullWorkflow` is, for example, a wrapper function that filters out all spurious sinks and derives a hydrological flow accumulation map in one step. To run it from the command line we can use:
```{r, eval=FALSE}
system(paste0('"/home/tomislav/software/WBT/whitebox_tools" ',
'--run=FlowAccumulationFullWorkflow --dem="./extdata/DEMTOPx.tif" ',
'--out_type="Specific Contributing Area" --log="False" --clip="False" ',
'--esri_pntr="False" ',
'--out_dem="./extdata/DEMTOPx_out.tif" ',
'--out_pntr="./extdata/DEMTOPx_pntr.tif" ',
'--out_accum="./extdata/DEMTOPx_accum.tif" -v'))
```
```{r eberg-hydroflow-preview-3d, echo=FALSE, fig.cap="Hydrological flow accumulation map based on the Ebergotzen DEM derived using WhiteboxTools.", out.width="100%"}
knitr::include_graphics("figures/eberg_hydroflow_preview_3d.jpg")
```
This produces a number of maps, of which the hydrological flow accumulation map is usually the most useful. Before running an analysis on large DEMs using WhiteboxTools and/or SAGA GIS, it is highly recommended that you test the functionality on smaller data sets, i.e. either a subset of the original data or a DEM at very coarse resolution (so that the width and height of the DEM are only a few hundred pixels). Also note that WhiteboxTools does not presently work with GeoTIFFs that use the `COMPRESS=DEFLATE` creation option.
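The call above hard-codes the location of the `whitebox_tools` binary, which will differ between machines. A minimal sketch of assembling the same command from a configurable path and a named list of arguments is given below; `wbt_cmd()` and `wbt_path` are hypothetical names introduced only for illustration, and the flags are the ones used in the example above:

```r
# Assemble a WhiteboxTools command line from a tool name and named arguments.
# `wbt_cmd()` and `wbt_path` are illustrative names; adjust the path to your system.
wbt_cmd <- function(wbt_path, tool, args, verbose = TRUE) {
  # each list element becomes --name='value', quoted for the shell
  flags <- paste0("--", names(args), "=", shQuote(unlist(args)))
  paste(c(shQuote(wbt_path), paste0("--run=", tool), flags,
          if (verbose) "-v"), collapse = " ")
}

wbt_path <- "/path/to/WBT/whitebox_tools"  # assumption: location of the binary
cmd <- wbt_cmd(wbt_path, "FlowAccumulationFullWorkflow",
               list(dem = "./extdata/DEMTOPx.tif",
                    out_type = "Specific Contributing Area",
                    log = "False", clip = "False", esri_pntr = "False",
                    out_dem = "./extdata/DEMTOPx_out.tif",
                    out_pntr = "./extdata/DEMTOPx_pntr.tif",
                    out_accum = "./extdata/DEMTOPx_accum.tif"))
cat(cmd, "\n")
# system(cmd)  # uncomment to run once the path points to a real binary
```

Printing the command before running it makes it easier to spot quoting or path problems, especially when testing on a small DEM first.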
## RStudio {#Rstudio}
RStudio is, in principle, the main R scripting environment and can be used to control all other software used in this tutorial. A more detailed RStudio tutorial is available at: [RStudio — Online Learning](http://www.rstudio.com/resources/training/online-learning/). Consider also following some spatial data tutorials e.g. by James Cheshire (http://spatial.ly/r/). Below is an example of an RStudio session with R editor on right and R console on left.
```{r rstudio-example, echo=FALSE, fig.cap="RStudio is a commonly used R editor written in C++.", out.width="100%"}
knitr::include_graphics("figures/rstudio_example.png")
```
To install all required R packages used in this tutorial at once, you can use:
```{r, eval=FALSE}
# Packages used throughout the tutorial
pkgs <- c("reshape", "Hmisc", "rgdal", "raster", "sf", "GSIF", "plotKML",
          "nnet", "plyr", "ROCR", "randomForest", "quantregForest",
          "psych", "mda", "h2o", "h2oEnsemble", "dismo", "grDevices",
          "snowfall", "hexbin", "lattice", "ranger",
          "soiltexture", "aqp", "colorspace", "Cubist",
          "randomForestSRC", "ggRandomForests", "scales",
          "xgboost", "parallel", "doParallel", "caret",
          "gam", "glmnet", "matrixStats", "SuperLearner",
          "intamap", "fasterize", "viridis")
# Install only the packages that are not yet present
new.packages <- pkgs[!(pkgs %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
```
This checks which packages are already installed and installs only those that are missing. You can put these lines at the top of each R script that you share so that anybody using the script will automatically obtain all required packages.
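If you would rather report missing packages than install them automatically (for example on a shared server), a variant along the following lines may be useful. This is only a sketch: `missing_pkgs()` is a hypothetical helper, and unlike the `installed.packages()` lookup it tests whether each package can actually be loaded:

```r
# List packages from a character vector that cannot be loaded, without
# installing anything. `missing_pkgs()` is an illustrative helper.
missing_pkgs <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

missing_pkgs(c("raster", "ranger", "caret"))
```

An empty result means that all listed packages are available in the current library paths.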
Note that the h2o package requires Java libraries, so you will also have to install Java by using e.g.:
```{bash, eval=FALSE}
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
java -version
```
## plotKML and GSIF packages
Many examples in this tutorial rely on the five most commonly used packages for spatial data: (1) [sp and rgdal](https://cran.r-project.org/web/views/Spatial.html), (2) [raster](https://cran.r-project.org/web/packages/raster/), (3) [plotKML](http://plotkml.r-forge.r-project.org/) and (4) [GSIF](http://gsif.r-forge.r-project.org/). To install the most up-to-date version of plotKML/GSIF, you can also use the R-Forge versions of the packages:
```{r}
if(!require(GSIF)){
install.packages("GSIF", repos=c("http://R-Forge.R-project.org"),
type = "source", dependencies = TRUE)
}
```
A copy of the most up-to-date stable versions of plotKML and GSIF is also available on [github](https://github.com/cran/GSIF). To run only a specific function from the GSIF package you can use e.g.:
```{r, eval=FALSE}
source_https <- function(url, ...) {
# load package
require(RCurl)
# download:
cat(getURL(url, followlocation = TRUE,
cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")),
file = basename(url))
source(basename(url))
}
source_https("https://raw.githubusercontent.com/cran/GSIF/master/R/OCSKGM.R")
```
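Where RCurl is not available, roughly the same effect can be obtained with base R alone. The sketch below assumes that R's built-in download methods can reach the URL; `source_url()` is a hypothetical name used only for illustration:

```r
# Download an R script to a temporary file and source it, using base R only.
# `source_url()` is an illustrative name; it assumes R can reach the URL
# through its built-in download methods.
source_url <- function(url, ...) {
  tmp <- tempfile(fileext = ".R")
  on.exit(unlink(tmp))   # remove the temporary copy afterwards
  utils::download.file(url, destfile = tmp, quiet = TRUE)
  source(tmp, ...)
}

# source_url("https://raw.githubusercontent.com/cran/GSIF/master/R/OCSKGM.R")
```

Unlike the RCurl version, this does not leave a copy of the script in the working directory.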
To test if these packages work properly, create soil maps and visualize them in Google Earth by running the following lines of code (see also function: [fit.gstatModel](http://gsif.r-forge.r-project.org/fit.gstatModel.html)):
```{r}
library(GSIF)
library(sp)
library(boot)
library(aqp)
library(plyr)
library(rpart)
library(splines)
library(gstat)
library(quantregForest)
library(plotKML)
demo(meuse, echo=FALSE)
omm <- fit.gstatModel(meuse, om~dist+ffreq, meuse.grid, method="quantregForest")
om.rk <- predict(omm, meuse.grid)
om.rk
#plotKML(om.rk)
```
```{r ge-preview, echo=FALSE, fig.cap="Example of a plotKML output for geostatistical model and prediction.", out.width="90%"}
knitr::include_graphics("figures/ge_preview.jpg")
```
## Connecting R and SAGA GIS
SAGA GIS provides comprehensive GIS geoprocessing software with over [600 functions](http://www.saga-gis.org/saga_tool_doc/index.html).
SAGA GIS cannot be installed from RStudio (it is not an R package).
Instead, you need to install SAGA GIS using the installation instructions from the [software homepage](https://sourceforge.net/projects/saga-gis/).
After you have installed SAGA GIS, you can send processes from
R to SAGA GIS by using the ```saga_cmd``` command line interface:
```{r}
if(Sys.info()['sysname']=="Windows"){
saga_cmd = "C:/Progra~1/SAGA-GIS/saga_cmd.exe"
} else {
saga_cmd = "saga_cmd"
}
system(paste(saga_cmd, "-v"))
```
To use a SAGA GIS function you need to follow carefully the syntax described in
the [SAGA GIS command line arguments](http://www.saga-gis.org/saga_tool_doc/index.html). For example,
```{r}
library(plotKML)
library(rgdal)
library(raster)
data("eberg_grid")
gridded(eberg_grid) <- ~x+y
proj4string(eberg_grid) <- CRS("+init=epsg:31467")
writeGDAL(eberg_grid["DEMSRT6"], "./extdata/DEMSRT6.sdat", "SAGA")
system(paste(saga_cmd, 'ta_lighting 0 -ELEVATION "./extdata/DEMSRT6.sgrd"',
             '-SHADE "./extdata/hillshade.sgrd" -EXAGGERATION 2'))
```
```{r rstudio-saga-gis, echo=FALSE, fig.cap="Deriving hillshading using SAGA GIS and then visualizing the result in R.", out.width="100%"}
knitr::include_graphics("figures/rstudio_saga_gis.png")
```
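Note that `system()` returns the exit status of the command without raising an R error, so a failed SAGA GIS or GDAL call can go unnoticed in a longer script. A minimal sketch of a wrapper that stops on failure is shown below; `run_cmd()` is a hypothetical helper, not part of any package used here:

```r
# Run a shell command and stop with an error if it exits with a non-zero status.
# `run_cmd()` is an illustrative wrapper around base R's system().
run_cmd <- function(cmd) {
  status <- system(cmd)
  if (status != 0) {
    stop("Command failed with exit status ", status, ": ", cmd, call. = FALSE)
  }
  invisible(status)
}

# e.g. run_cmd(paste(saga_cmd, "-v"))
```

Replacing `system()` with such a wrapper makes batch processing of many tiles fail early instead of silently producing incomplete outputs.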
## Connecting R and GDAL
GDAL is another very important software tool for handling spatial data (and especially for exchanging / converting spatial data). GDAL also needs to be installed separately (for Windows machines use e.g. ["gdal-201-1800-x64-core.msi"](http://download.gisinternals.com/sdk/downloads/)) and can then be called from the command line:
```{r}
if(.Platform$OS.type == "windows"){
gdal.dir <- shortPathName("C:/Program files/GDAL")
gdal_translate <- paste0(gdal.dir, "/gdal_translate.exe")
gdalwarp <- paste0(gdal.dir, "/gdalwarp.exe")
} else {
gdal_translate = "gdal_translate"
gdalwarp = "gdalwarp"
}
system(paste(gdalwarp, "--help"))
```
We can use GDAL to reproject the grid from the previous example:
```{r plot-eberg-ll, fig.width=7, fig.cap="Ebergotzen DEM reprojected in geographical coordinates.", out.width="80%"}
system(paste(gdalwarp, './extdata/DEMSRT6.sdat ./extdata/DEMSRT6_ll.tif',
             '-t_srs \"+proj=longlat +datum=WGS84\"'))
library(raster)
plot(raster("./extdata/DEMSRT6_ll.tif"))
```
The following books are highly recommended for improving programming skills in R, especially for the purpose of geographical computing:
* Bivand, R., Pebesma, E., Rubio, V., (2013) [**Applied Spatial Data Analysis with R**](http://www.asdar-book.org/). Use R Series, Springer, Heidelberg, 2nd Ed. 400 pages.
* Lovelace, R., Nowosad, J., Muenchow, J., (2018) [**Geocomputation with R**](https://geocompr.robinlovelace.net/). R Series, CRC Press, 338 pages.