# Statistical theory for predictive soil mapping {#statistical-theory}
*Edited by: Hengl T., Heuvelink G.B.M and MacMillan R. A.*
## Aspects of spatial variability of soil variables {#aspects-variability}
In this chapter we review the statistical theory for soil
mapping. We focus on models considered most suitable for practical
implementation and use with soil profile data and gridded covariates,
and we provide the mathematical-statistical details of the selected
models. We start by revisiting some basic statistical aspects of soil
mapping, and conclude by illustrating a proposed framework for reproducible,
semi-automated mapping of soil variables using simple, real-world
examples.
The code and examples are provided only for illustration. More complex predictive modeling
is described in chapter \@ref(soilmapping-using-mla). To install and optimize
all packages used in this chapter please refer to section \@ref(Rstudio).
### Modelling soil variability
Soils vary spatially in a way that is often only partially understood.
The main (deterministic) causes of soil spatial variation are the
well-known causal factors — climate, organisms, relief, parent
material and time — but how these factors jointly shape the soil over time is a
very complex process that is (still) extremely difficult to model
mechanistically. Moreover, mechanistic modelling approaches require
large sets of input data that are realistically not available in practice. Some
initial steps have been made, notably for mechanistic modelling of
vertical soil variation (see e.g. @Finke2008462,
@Sommer2008480, @Minasny2008140, and @vanwalleghem2010spatial), but
existing approaches are still rudimentary and cannot be used for
operational soil mapping. Mainstream soil mapping therefore takes an
empirical approach in which the relationship between the soil variable
of interest and causal factors (or their proxies) is modelled
statistically, using various types of regression models. The explanatory
variables used in regression are also known as *covariates* (a list of
common covariates used in soil mapping is provided in chapter \@ref(soil-covs-chapter)).
Regression models explain only part of the variation (i.e. variance) of
the soil variable of interest, because:
- *The structure of the regression model does not represent the true
mechanistic relationship between the soil and its causal factors*.
- *The regression model includes only a few of the many causal factors
that formed the soil*.
- *The covariates used in regression are often only incomplete proxies of the
true soil forming factors*.
- *The covariates often contain measurement errors and/or are measured
at a much coarser scale (i.e. support) than that of the soil that
needs to be mapped*.
As a result, soil spatial regression models will often display a
substantial amount of residual variance, which may well be larger than
the amount of variance explained by the regression itself. The residual
variation can subsequently be analysed for spatial structure through a
variogram analysis. If there is spatial structure, then kriging the
residuals and incorporating the result in the final map can improve the
accuracy of soil predictions [@hengl2007regression].
### Universal model of soil variation {#umsv}
From a statistical point of view, it is convenient to distinguish
between three major components of soil variation: (1) deterministic
component (trend), (2) spatially correlated component and (3) pure
noise. This is the basis of the *universal model of soil variation*
[@Burrough1998OUP; @Webster2001Wiley p.133]:
\begin{equation}
Z({s}) = m({s}) + \varepsilon '({s}) + \varepsilon ''({s})
(\#eq:ukm)
\end{equation}
where $s$ is a two-dimensional location, $m({s})$ is the
deterministic component, $\varepsilon '({s})$ is the spatially
correlated stochastic component and $\varepsilon ''({s})$ is the
pure noise (micro-scale variation and measurement error). This model was
probably first introduced by @Matheron1969PhD, and has been used as a
general framework for spatial prediction of quantities in a variety of
environmental research disciplines.
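To make the decomposition of Eq.\@ref(eq:ukm) more concrete, the three components can be simulated in a few lines of base R. This is only an illustrative sketch: the transect, trend coefficients, covariance parameters and noise level below are all hypothetical.

```{r}
## simulate Z(s) = m(s) + eps'(s) + eps''(s) on a hypothetical 100 m transect
set.seed(1)
s <- seq(0, 100, by = 1)             # locations in meters
m <- 20 + 0.05 * s                   # deterministic component (linear trend)
## spatially correlated component: exponential covariance, sill 4, range 15 m
C <- 4 * exp(-as.matrix(dist(s)) / 15)
eps1 <- drop(t(chol(C + diag(1e-8, length(s)))) %*% rnorm(length(s)))
eps2 <- rnorm(length(s), sd = 0.5)   # pure noise (nugget / measurement error)
Z <- m + eps1 + eps2
```

Plotting `Z` against `s` shows a noisy but spatially structured signal around the trend; geostatistical analysis essentially tries to recover these three components from such data.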
```{block2, type="rmdnote"}
The *universal model of soil variation* assumes that there are three
major components of soil variation: (1) the deterministic component
(function of covariates), (2) spatially correlated component (treated as
stochastic) and (3) pure noise.
```
The universal model of soil variation (Eq.\@ref(eq:ukm)) can be
further generalised to three-dimensional space and the spatio-temporal
domain (3D+T) by letting the variables also depend on depth and time:
\begin{equation}
Z({s}, d, t) = m({s}, d, t) + \varepsilon '({s}, d, t) + \varepsilon ''({s}, d, t)
(\#eq:ukm3DT)
\end{equation}
where $d$ is depth expressed in meters downward from the land surface
and $t$ is time. The deterministic component $m$ may be further
decomposed into parts that are purely spatial, purely temporal, purely
depth-related or mixtures of all three. Space-time statistical soil
models are discussed by @Grunwald2005CRCPress, but this area of soil
mapping is still rather experimental.
In this chapter, we mainly focus on purely 2D models but also present
some theory for 3D models, while 2D+T and 3D+T models of soil variation
are significantly more complex (Fig. \@ref(fig:scheme-2D-3D-maps)).
```{r scheme-2D-3D-maps, echo=FALSE, fig.cap="Number of variogram parameters assuming an exponential model, minimum number of samples and corresponding increase in number of prediction locations for 2D, 3D, 2D+T and 3D+T models of soil variation. Here “altitude” refers to vertical distance from the land surface, which in the case of soil mapping is often expressed as a negative vertical distance.", out.width="60%"}
knitr::include_graphics("figures/Fig_2D_3DT_maps.png")
```
One of the reasons why 2D+T and 3D+T models of soil variation are rare
is that there are very few point data sets that satisfy the
requirements for analysis. One national soil data set that could be
analyzed using space-time geostatistics is, for example, the Swiss
soil-monitoring network (NABO) data set [@JPLN:JPLN200900269], but even
this data set does not contain complete profile descriptions following
international standards. At regional and global scales it would be even
more difficult to find enough data to fit space-time models (and to fit
3D+T variogram models could be even more difficult). For catchments and
plots, space-time datasets of soil moisture have been recorded and used
in space-time geostatistical modelling (see e.g. @snepvangers2003soil and
@jost2005analysing).
Statistical modelling of the spatial distribution of soils requires field
observations because most statistical methods are data-driven. The
minimum recommended number of points required to fit 2D geostatistical
models, for example, is in the range 50–100 points, but this number
increases with any increase in spatial or temporal dimension
(Fig. \@ref(fig:scheme-2D-3D-maps)). The Cookfarm data set for example
contains hundreds of thousands of observations, although the study area
is relatively small and there are only ca. 50 station locations
[@Gasch2015SPASTA].
The deterministic and stochastic components of soil spatial variation
are separately described in more detail in subsequent sections, but
before we do this, we first address soil vertical variability and how it
can be modelled statistically.
### Modelling the variation of soil with depth {#soil-depth-models}
Soil properties vary with depth, in some cases much more than in the
horizontal direction. There is an increasing awareness that the vertical
dimension is important and needs to be incorporated in soil mapping. For
example, many spatial prediction models are built using ambiguous
vertical reference frames such as predicted soil property for
*“top-soil”* or *“A-horizon”*. Top-soil can refer to different depths or
thicknesses, and the A-horizon can range from a few centimeters to
over one meter. Hence, before fitting a 2D spatial model to soil profile
data, it is a good idea to standardize values to standard depths;
otherwise soil observation depth becomes an additional source of
uncertainty. For example, soil organic carbon content is strongly
controlled by depth, so combining values from two A horizons, one
thick and the other thin, would increase the complexity of 2D soil mapping
because a fraction of the variance is controlled by depth, which would be
ignored.
The concept of perfectly homogeneous soil horizons is often too
restrictive and can be better replaced with continuous representations
of soil vertical variation i.e. *soil-depth functions* or curves.
Variation of soil properties with depth is typically modelled using one
of two approaches (Fig. \@ref(fig:soil-depth-examples)):
1. *Continuous vertical variation* — This assumes that soil variables
change continuously with depth. The soil-depth relationship is
modelled using either:
1. *Parametric model* — The relationship is modelled using
mathematical functions such as logarithmic or exponential
decay functions.
2. *Non-parametric model* — The soil property changes continuously
but without obvious regularity with depth. Changes in values are
modelled using locally fitted functions such as piecewise linear
functions or splines.
2. *Abrupt or stratified vertical variation* — This assumes that soil
horizons are distinct and homogeneous bodies of soil material and
that soil properties are constant within horizons and change
abruptly at boundaries between horizons.
Combinations of the two approaches are also possible, such as the use of
exponential decay functions per soil horizon [@Kempen2011Geoderma].
Parametric continuous models are chosen to reflect pedological knowledge
e.g. knowledge of soil forming processes. For example, organic carbon
usually originates from plant production i.e. litter or roots. Generally,
the upper layers of the soil tend to have greater organic carbon
content, which decreases continuously with depth, so that the soil-depth
relationship can be modelled with a negative-exponential function:
\begin{equation}
{\texttt{ORC}} (d) = {\texttt{ORC}} (d_0) \cdot \exp(-\tau \cdot d)
(\#eq:SOMdepth)
\end{equation}
where $\texttt{ORC}(d)$ is the soil organic carbon content at depth
($d$), ${\texttt{ORC}} (d_0)$ is the organic carbon content at the soil
surface and $\tau$ is the rate of decrease with depth. This model has
only two parameters that must be chosen such that model averages over
sampling horizons match those of the observations as closely as
possible. Once the model parameters have been estimated, we can easily
predict concentrations for any depth interval.
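For illustration, Eq.\@ref(eq:SOMdepth) can be fitted with `nls`, using the horizon midpoints and organic carbon contents of the sample profile introduced below. This is a minimal sketch: the starting values are simply taken from a log-linear fit.

```{r}
## horizon midpoints (cm) and organic carbon (g/kg) of the profile below:
mdepth <- c(9, 27, 50.5, 76, 107, 154)
ORCDRC <- c(18.4, 4.4, 3.6, 3.6, 3.2, 1.2)
ini <- lm(log(ORCDRC) ~ mdepth)      # log-linear fit gives starting values
d.exp <- nls(ORCDRC ~ ORC0 * exp(-tau * mdepth),
             start = list(ORC0 = exp(coef(ini)[1]), tau = -coef(ini)[2]))
coef(d.exp)
## mean predicted content over e.g. the 0--30 cm interval:
f <- function(d) predict(d.exp, newdata = data.frame(mdepth = d))
integrate(f, lower = 0, upper = 30)$value / 30
```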
Consider for example this sample profile from Nigeria:
```{r}
lon = 3.90; lat = 7.50; id = "ISRIC:NG0017"; FAO1988 = "LXp"
top = c(0, 18, 36, 65, 87, 127)
bottom = c(18, 36, 65, 87, 127, 181)
ORCDRC = c(18.4, 4.4, 3.6, 3.6, 3.2, 1.2)
munsell = c("7.5YR3/2", "7.5YR4/4", "2.5YR5/6", "5YR5/8", "5YR5/4", "10YR7/3")
## combine horizon and site data (a SoilProfileCollection is built later):
prof1 <- plyr::join(data.frame(id, top, bottom, ORCDRC, munsell),
data.frame(id, lon, lat, FAO1988), type='inner')
prof1$mdepth <- prof1$top+(prof1$bottom-prof1$top)/2
```
we can fit a log-log model by using e.g.:
```{r}
d.lm <- glm(ORCDRC ~ log(mdepth), data=prof1, family=gaussian(log))
options(list(scipen=3, digits=2))
d.lm$fitted.values
```
which shows that the log-log fit comes relatively close to the actual values.
Another possibility would be to fit a power-law model:
\begin{equation}
{\texttt{ORC}} (d) = a \cdot d^b
(\#eq:loglog)
\end{equation}
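This model can likewise be fitted with `nls`, using the horizon midpoints and organic carbon values of the profile above and a log–log regression for starting values (an illustrative sketch only):

```{r}
## fit ORC(d) = a * d^b to the horizon-midpoint data of the profile above:
mdepth <- c(9, 27, 50.5, 76, 107, 154)
ORCDRC <- c(18.4, 4.4, 3.6, 3.6, 3.2, 1.2)
ini <- lm(log(ORCDRC) ~ log(mdepth))   # log-log fit gives starting values
p.nls <- nls(ORCDRC ~ a * mdepth^b,
             start = list(a = exp(coef(ini)[1]), b = coef(ini)[2]))
coef(p.nls)
```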
A disadvantage of a single parametric soil property-depth model fitted along
the entire soil profile is that it completely ignores stratigraphy and
abrupt changes at the boundaries between soil horizons. For example,
@Kempen2011Geoderma show that there are many cases where highly
contrasting layers of peat are found buried below the surface due to
cultivation practices or Holocene drift sand. The model given by
Eq.\@ref(eq:loglog) and illustrated in Fig. \@ref(fig:soil-depth-examples)
(left) cannot represent such abrupt changes.
```{block2, type="rmdnote"}
Before fitting a 2D spatial prediction model to soil profile data, it is
important to standardize values to standard depths, otherwise soil
observation depth can be an additional source of uncertainty.
```
Non-parametric soil-depth functions are more flexible and can represent
observations of soil property averages for sampling layers or horizons
more accurately. One such technique that is particularly interesting is
*equal-area or mass-preserving splines*
[@Bishop1999Geoderma; @Malone2009Geoderma] because it ensures that, for
each sampling layer (usually a soil horizon), the average of the spline
function equals the measured value for the horizon. Disadvantages of the
spline model are that it may not fit well if there are few observations
along the soil profile and that it may create unrealistic values
(through overshoots or extrapolation) in some instances, for example
near the surface. Also, mass-preserving splines cannot accommodate
discontinuities unless, of course, separate spline functions are fitted
above and below the discontinuity.
```{r soil-depth-examples, echo=FALSE, fig.cap="Vertical variation in soil carbon modelled using a logarithmic function (left) and a mass-preserving spline (right), with abrupt changes by horizon illustrated with solid lines.", out.width="100%"}
knitr::include_graphics("figures/Fig_soil_depth_examples.png")
```
To fit mass-preserving splines we can use:
```{r}
library(aqp)
library(rgdal)
library(GSIF)
prof1.spc <- prof1
depths(prof1.spc) <- id ~ top + bottom
site(prof1.spc) <- ~ lon + lat + FAO1988
coordinates(prof1.spc) <- ~ lon + lat
proj4string(prof1.spc) <- CRS("+proj=longlat +datum=WGS84")
## fit a spline:
ORCDRC.s <- mpspline(prof1.spc, var.name="ORCDRC", show.progress=FALSE)
ORCDRC.s$var.std
```
where `var.std` shows average fitted values for standard depth intervals
(i.e. those given in the *GlobalSoilMap* specifications), and `var.1cm`
are the values fitted at 1 cm increments
(Fig. \@ref(fig:soil-depth-examples)).
A disadvantage of using mathematical functions to convert soil
observations at specific depth intervals to continuous values along the
whole profile is that these values are only estimates with associated
estimation errors. If estimates are treated as if they were
observations then an important source of error is ignored, which may
jeopardize the quality of the final soil predictions and in particular
the associated uncertainty (see further
section \@ref(accuracy-assessment)). This problem can be avoided
by taking, for example, a 3D modelling approach
[@poggio2014national; @Hengl2015AfSoilGrids250m], in which model
calibration and spatial interpolation are based on the original soil
observations directly (although proper use of this requires that the
differences in vertical support between measurements are taken into
account also). We will address this also in later sections of this
chapter, among others in section \@ref(prediction-3D).
```{block2, type="rmdnote"}
Soil property-depth relationships are commonly modelled using various
types of mathematical functions. Mass-preserving splines, which ensure
that the average of the spline function equals the measured value for
each sampling layer or horizon, can be used to convert measurements per
layer to point values along the profile. Because soils can show both
abrupt and continuous transitions within the same profile, no simple
spline model is universally valid and case-dependent adjustments often
need to be made.
```
### Vertical aggregation of soil properties {#vertical-aggregation}
As mentioned previously, soil variables refer to aggregate values over
specific depth intervals (see Fig. \@ref(fig:soil-depth-examples)).
For example, the organic carbon content is typically observed per soil
horizon with values in e.g. g/kg or permilles
[@Conant2010; @Rainer2010; @Panagos2013439]. The *Soil Organic Carbon
Storage* (or *Soil Organic Carbon Stock*) in the whole profile can be
calculated by using Eq. \@ref(eq:ocs). Once we have determined soil
organic carbon storage ($\mathtt{OCS}$) per horizon, we can derive the
total organic carbon in the soil by summing over all ($H$) horizons:
\begin{equation}
\mathtt{OCS} = \sum\limits_{h = 1}^H { \mathtt{OCS}_h }
(\#eq:ORGCsum)
\end{equation}
Obviously, the horizon-specific soil organic carbon content
($\mathtt{ORC}_h$) and total soil organic carbon content
($\mathtt{OCS}$) are NOT the same variables and need to be analysed and
mapped separately.
In the case of pH ($\mathtt{PHI}$) we usually do not aim to estimate
the actual mass or quantity of hydrogen ions. To represent a soil
profile with a single number, we may take a weighted mean of the
measured pH values per horizon:
\begin{equation}
\mathtt{PHI} = \sum\limits_{h = 1}^H { w_h \cdot \mathtt{PHI}_h }; \qquad \sum\limits_{h = 1}^H{w_h} = 1
(\#eq:pHmean)
\end{equation}
where the weights can be chosen proportional to the horizon thickness:
\begin{equation}
w _h = \frac{{\mathtt{HSIZE}_h}}{\sum\limits_{h = 1}^H {{\mathtt{HSIZE}}_h}}
\end{equation}
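These aggregation rules (Eq.\@ref(eq:ORGCsum) and Eq.\@ref(eq:pHmean)) are simple to implement; a base-R sketch, with all horizon values below hypothetical:

```{r}
## hypothetical horizon thicknesses (cm) and per-horizon pH values:
HSIZE <- c(18, 18, 29, 22, 40, 54)
PHI   <- c(6.2, 5.9, 5.6, 5.5, 5.4, 5.3)
w <- HSIZE / sum(HSIZE)       # thickness-proportional weights, sum to 1
sum(w * PHI)                  # profile-representative pH (weighted mean)
## for a quantity such as OCS we instead sum over horizons (Eq. eq:ORGCsum):
OCS_h <- c(4.1, 1.2, 0.9, 0.7, 0.8, 0.4)   # hypothetical OCS per horizon
sum(OCS_h)                    # total profile OCS
```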
Thus, it is important to be aware that all soil variables (A) can be
expressed as relative (percentages) or absolute (mass / quantities)
values, and (B) can refer to specific horizons or depth intervals or to the
whole soil profile.
Similar *spatial support* effects show up in the horizontal direction,
because soil observations at *point* locations are not the same as
average or *bulk soil samples* taken by averaging a large number of
point observations on a site or plot [@Webster2001Wiley].
```{block2, type="rmdnote"}
Soil variables can refer to a specific depth interval or to the whole
profile. The differences in spatial patterns between variables
representing fundamentally the same feature (e.g. soil organic carbon
of a specific soil horizon or soil layer and total organic carbon stock
of the whole profile), but at different spatial and vertical support,
can be significant.
```
In order to avoid misinterpretation of the results of mapping, we
recommend that any delivered map of soil properties should specify the
support size in the vertical and lateral directions, the analysis method
(detection limit) and measurement units. Such information can be
included in the metadata and/or in any key visualization or plot.
Likewise, any end-user of soil data should specify whether estimates of
the relative or total organic carbon, aggregated or at 2D/3D point
support are required.
## Spatial prediction of soil variables
### Main principles
*“Pragmatically, the goal of a model is to predict, and at the same time
scientists want to incorporate their understanding of how the world
works into their models”* [@cressie2011statistics]. In general terms,
spatial prediction consists of the following seven steps
(Fig. \@ref(fig:general-sp-process)):
1. *Select the target variable, scale (spatial resolution) and
associated geographical region of interest*;
2. *Define a model of spatial variation for the target variable*;
3. *Prepare a sampling plan and collect samples and relevant
explanatory variables*;
4. *Estimate the model parameters using the collected data*;
5. *Derive and apply the spatial prediction method associated with the
selected model*;
6. *Evaluate the spatial prediction outputs and collect new data / run
alternative models if necessary*;
7. *Use the outputs of the spatial prediction process for decision
making and scenario testing*.
```{r general-sp-process, echo=FALSE, fig.cap="From data to knowledge and back: the general spatial prediction scheme applicable to many environmental sciences.", out.width="85%"}
knitr::include_graphics("figures/Fig_general_SP_process.png")
```
The spatial prediction process is repeated at all nodes of a grid
covering $D$ (or a space-time domain in case of spatiotemporal
prediction) and produces three main outputs:
1. Estimates of the model parameters (e.g., regression coefficients and
variogram parameters), i.e. the **model**;
2. Predictions at new locations, i.e. a **prediction map**;
3. Estimate of uncertainty associated with the predictions, i.e. a
**prediction error map**.
It is clear from Fig. \@ref(fig:general-sp-process) that
the key steps in the mapping procedure are: (a) *choice of the sampling scheme* (e.g. @Ng2018 and @BRUS2019464),
(b) *choice of the model of spatial variation* (e.g. @Diggle2007Springer), and
(c) *choice of the parameter estimation technique* (e.g. @lark2006spatial). When the sampling scheme is
given and cannot be changed, the focus of optimization of the spatial
prediction process is then on selecting and fine-tuning the best
performing spatial prediction method.
In a geostatistical framework, spatial prediction is estimation of
values of some target variable $Z$ at a new location (${s}_0$)
given the input data:
\begin{equation}
\hat Z({s}_0) = E\left\{ Z({s}_0)|z({s}_i), \; {{X}}({s}_0), \; i=1,...,n \right\}
(\#eq:sp)
\end{equation}
where the $z({s}_i)$ are the input set of observations of the
target variable, ${s}_i$ is a geographical location, $n$ is the
number of observations and ${{X}}({s}_0)$ is a list of
*covariates* or explanatory variables, available at all prediction
locations within the study area of interest (${s} \in \mathbb{A}$).
To emphasise that the model parameters also influence the outcome of the
prediction process, this can be made explicit by writing
[@cressie2011statistics]:
\begin{equation}
[Z|Y,{{\theta}} ]
(\#eq:datamodel)
\end{equation}
where $Z$ is the data, $Y$ is the (hidden) process that we are
predicting, and ${{\theta}}$ is a list of model parameters e.g. trend
coefficients and variogram parameters.
There are many spatial prediction methods for generating spatial
predictions from soil samples and covariate information. All differ in
the underlying statistical model of spatial variation, although this
model is not always made explicit and different methods may use the same
statistical model. A review of currently used digital soil mapping
methods is given, for example, in @McBratney2011HSS, while the most
extensive review can be found in @McBratney2003Geoderma and @mcbratney2018pedometrics. @LiHeap2010EI list 40+ spatial prediction / spatial interpolation
techniques. Many spatial prediction methods are essentially
different names for the same thing. Conversely, what is known under a single
name in the statistical or mathematical literature can be implemented through
different computational frameworks and lead to different outputs
(mainly because many models are not written out in full detail and leave
flexibility to the actual implementation).
### Soil sampling
A *soil sample* is a collection of field observations, usually
represented as points. Statistical aspects of sampling methods and
approaches are discussed in detail by @schabenberger2005statistical and
@deGruijter2006sampling, while some more practical suggestions for soil
sampling can be found in @pansu2001soil
@Webster2001Wiley, @tan2005soil, @Legros2006SP and @BRUS2019464. Some general
recommendations for soil sampling are:
1. *Points need to cover the entire geographical area of interest and
not overrepresent specific subareas with characteristics that differ
markedly from those of the main area.*
2. *Soil observations at point locations should be made using
consistent measurement methods. Replicates should ideally be taken
to quantify the measurement error.*
3. *Bulk sampling is recommended when short-distance spatial variation
is expected to be large and not of interest to the map user.*
4. *If a variogram is to be estimated then the sample size should be
>50 and there should be sufficient point pairs with small
separation distances.*
5. *If trend coefficients are to be estimated then the covariates at
sampling points should cover the entire feature space of
each covariate.*
The sampling design or rationale used to decide where to locate soil
profile observations, or sampling points, is often not clear and may vary
from case to case. Therefore, there is no guarantee that available legacy
point data used as input to geostatistical modelling will satisfy the
recommendations listed above. Many of the legacy profile data locations in the
world were selected using convenience sampling. In fact, many
points in traditional soil surveys may have been selected and sampled to
capture information about unusual conditions or to locate boundaries at
points of transition and maximum confusion about soil properties
[@Legros2006SP]. Once a soil becomes recognized as being widely
distributed and dominant in the landscape, field surveyors often choose
not to record observations when that soil is encountered, preferring to
focus instead on recording unusual sites or areas where soil transition
occurs. Thus the population of available soil point observations may not
be representative of the true population of soils, with some soils being
either over or under-represented.
```{r eberg-sampling-locs, echo=FALSE, fig.cap="Occurrence probabilities derived for the actual sampling locations (left), and for a purely random sample design with exactly the same number of points (right). Probabilities derived using the `spsample.prob` function from the GSIF package. The shaded area on the left indicates which parts of the environmental space have been systematically represented, while the white colour indicates parts which have been systematically omitted (and this is not by chance).", out.width="100%"}
knitr::include_graphics("figures/Fig_eberg_sampling_locs.png")
```
Fig. \@ref(fig:eberg-sampling-locs) (the Ebergötzen study area)
illustrates a problem of dealing with clustered samples and omission of
environmental features. Using the actual samples shown in the plot on
the left of Fig. \@ref(fig:eberg-sampling-locs) we would like to map the
whole area inside the rectangle. This is technically possible, but the
user should be aware that the actual Ebergötzen points systematically
miss sampling some environmental features: in this case natural forests / rolling
hills that were not of interest to the survey project. This does not
mean that the Ebergötzen point data are not applicable for
geostatistical analyses. It simply means that the sampling bias and
under-representation of specific environmental conditions will lead to
spatial predictions that may be biased and highly uncertain under these
conditions [@Brus2007Geoderma].
### Knowledge-driven soil mapping {#sec:expertsystems}
As mentioned previously in section \@ref(tacit-knowledge), knowledge-driven
mapping is often based on unstated and unformalized rules and
understanding that exists mainly in the minds and memories of the
individual soil surveyors who conducted field studies and mapping.
Expert, or knowledge-based, information can be converted to mapping
algorithms by applying conceptual rules to decision trees and/or
statistical models [@MacMillan2005CJSS; @Walter2006DSS; @Liu2009].
For example, a surveyor can define the classification rules
subjectively, i.e. based on his/her knowledge of the area, then
iteratively adjust the model until the output maps fit his/her
expectation of the distribution of soils [@MacMillan2010DSM].
In areas where few, or no, field observations of soil properties are
available, the most common way to produce estimates is to rely on expert
knowledge, or to base estimates on data from other, similar areas. This
is a kind of *‘knowledge transfer’* system. The best example of a knowledge
transfer system is the concept of *soil series* in the USA
[@Simonson1968AA]. Soil series (+phases) are the lowest (most detailed) level classes of
soil types typically mapped. Each soil series should consist of pedons
having soil horizons that are similar in colour, texture, structure, pH,
consistence, mineral and chemical composition, and arrangement in the
soil profile.
If one finds the same type of soil series repeatedly at similar
locations, then there is little need to sample the soil again at additional,
similar, locations and, consequently, soil survey field costs can be
reduced. This sounds like an attractive approach because one can
minimize the survey costs by focusing on delineating the distribution of
soil series only. The problem is that there are >15,000 soil series in the
USA [@Smith1986SMSS], which obviously means that it is not easy to
recognize the same soil series just by doing rapid field observations.
In addition, the accuracy with which one can consistently recognize a soil series may
well fail on standard kappa statistics tests, indicating that there may
be substantial confusion between soil series (e.g. large measurement
error).
Large parts of the world contain very few (sparse) field records and hence
one will need to *improvise* to be able to produce soil predictions. One
idea to map such areas is to build attribute tables for representative
soil types, then map the distribution of these soil types in areas
without using local field samples. @Mallavan2010PSS refer to soil
classes that can be predicted far away from the actual sampling
locations as *homosoils*. The homosoils concept is based on the
assumption that locations that share similar environments (e.g. soil-forming factors) are
likely to exhibit similar soils and soil properties also.
```{r cross-section-catena, echo=FALSE, fig.cap="Landform positions and location of a prediction point for the Maungawhau data set.", out.width="100%"}
knitr::include_graphics("figures/Fig_cross_section_catena.png")
```
Expert-based systems also rely on using standard mapping paradigms such
as the concept of relating soil series occurrence to landscape position along a toposequence, or catena.
Fig. \@ref(fig:cross-section-catena), for example, shows a cross-section
derived using the elevation data in Fig. \@ref(fig:catena-maungawhau-3d). An
experienced soil surveyor would visit the area and attempt to produce a diagram
showing a sequence of soil types positioned along this cross-section. This expert
knowledge can be subsequently utilized as manual mapping rules, provided that it
is representative of the area, that it can be formalized through
repeatable procedures and that it can be tested using real observations.
```{r catena-maungawhau-3d, echo=FALSE, fig.cap="A cross-section for the Maungawhau volcano dataset commonly used in R to illustrate DEM and image analysis techniques.", out.width="60%"}
knitr::include_graphics("figures/Fig_catena_Maungawhau_A.jpg")
```
If relevant auxiliary information, such as a Digital Elevation Model (DEM), is
available for the study area, one can derive a number of DEM parameters
that can help to quantify landforms and geomorphological processes.
Landforms can also be classified automatically by computing various DEM
parameters per pixel, or by using knowledge from
Fig. \@ref(fig:catena-maungawhau) (a sample of the study area) to
objectively extract landforms and associated soils in an area. Such
auxiliary landform information can be informative about the spatial
distribution of the soil, which is the key principle of, for example,
the SOTER methodology [@VanEngelen2012].
The mapping process of knowledge-driven soil mapping can be summarized
as follows [@MacMillan2005CJSS; @MacMillan2010DSM]:
1. *Sample the study area using transects oriented along topographic cross-sections*;
2. *Assign soil types to each landform position and at each sample location*;
3. *Derive DEM parameters and other auxiliary data sets*;
4. *Develop (fuzzy) rules relating the distribution of soil classes to the auxiliary (mainly topographic) variables*;
5. *Implement (fuzzy) rules to allocate soil classes (or compute class probabilities) for each grid location*;
6. *Generate soil property values for each soil class using representative observations (class centers)*;
7. *Estimate values of the target soil variable at each grid location using a weighted average of allocated soil class or membership values and central soil property values for each soil class*.
```{r catena-maungawhau, echo=FALSE, fig.cap="Associated values of DEM-based covariates: TWI — Topographic Wetness Index and Valley depth for the cross-section from the previous figure.", out.width="90%"}
knitr::include_graphics("figures/Fig_catena_Maungawhau_B.png")
```
In mathematical terms, soil property prediction based on fuzzy soil
classification values using the SOLIM approach @Zhu2001
[@Zhu2010Geoderma] works as follows:
\begin{equation}
\begin{aligned}
\hat z({s}_0) = \sum\limits_{c_j = 1}^{c_p} {\nu _{c_j} ({s}_0) \cdot z_{c_j} }; & \hspace{.6cm}
\sum\limits_{c_j = 1}^{c_p} {\nu _{c_j} ({s}_0)} = 1\end{aligned}
(\#eq:solim)
\end{equation}
where $\hat z({s}_0)$ is the predicted soil attribute at
${s}_0$, $\nu _{c_j} ({s}_0)$ is the membership value of class
$c_j$ at location ${s}_0$, and $z_{c_j}$ is the modal (or best
representative) value of the inferred soil attribute of the $c_j$-th
category. The predicted soil attribute is mapped directly from
membership maps using a linear additive weighting function. Consider the
example of six soil classes `A`, `B`, `C`, `D`, `E` and `F`. The
attribute table indicates that soil type `A` has 10%, `B` 10%, `C` 30%,
`D` 40%, `E` 25%, and `F` 35% of clay. If the membership values at a
grid position are 0.6, 0.2, 0.1, 0.05, 0.05 and 0.00, then
Eq.\@ref(eq:solim) predicts the clay content as $0.6 \cdot 10 + 0.2 \cdot 10 + 0.1 \cdot 30 + 0.05 \cdot 40 + 0.05 \cdot 25 = 14.25$%.
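The weighted average of Eq.\@ref(eq:solim) can be reproduced in a few lines of R (the modal clay values and membership values below are hypothetical illustration values only):

```{r solim-sketch}
# Modal clay contents (%) for six hypothetical soil classes:
clay_modal <- c(A = 10, B = 10, C = 30, D = 40, E = 25, F = 35)
# Membership values at one grid position (these must sum to 1):
memb <- c(0.6, 0.2, 0.1, 0.05, 0.05, 0.00)
# Linear additive weighting (Eq. (eq:solim)):
sum(memb * clay_modal)
```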
It is obvious from this work flow that the critical aspects that
determine the accuracy of the final predictions are the selection of
where we locate the cross-sections and the *representative soil
profiles* and the strength of the relationship between the resulting
soil classes and target soil properties. @Qi2006Geoderma, for example,
recommended that the most representative values for soil classes can be
identified, if many soil profiles are available, by finding the sampling
location that occurs at the grid cell with highest similarity value for a
particular soil class. Soil mappers are now increasingly looking for
ways to combine expert systems with statistical data mining and
regression modelling techniques.
One problem of using a supervised mapping system, as described above, is
that it is difficult to get an objective estimate of the prediction
error (or at least a robust statistical theory for this has not yet been
developed). The only possibility to assess the accuracy of such maps would
be to collect independent validation samples and estimate the mapping
accuracy following the methods described in
section \@ref(accuracy-assessment). So, in fact, expert-based
systems also depend on statistical sampling and inference for evaluation of
the accuracy of the resulting map.
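In practice, such a validation-based assessment reduces to summary statistics of the prediction errors at the independent validation points, e.g. the mean error (bias) and root mean square error (a minimal sketch; the observed and predicted values below are hypothetical):

```{r validation-sketch}
# Hypothetical observed vs predicted values at independent validation points:
observed  <- c(12, 25, 30, 18, 22)
predicted <- c(15, 22, 28, 20, 25)
# Mean error (bias) and root mean square error:
ME   <- mean(predicted - observed)
RMSE <- sqrt(mean((predicted - observed)^2))
c(ME = ME, RMSE = RMSE)
```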
### Geostatistics-driven soil mapping (pedometric mapping) {#regression-kriging}
Pedometric mapping is based on using statistical models to predict soil
properties, which leads us to the field of geostatistics. Geostatistics
treats the soil as a realization of a *random process*
[@Webster2001Wiley]. It uses the point observations and gridded covariates to predict
the random process at unobserved locations, which yields conditional
probability distributions, whose spread (i.e. standard deviation, width
of prediction intervals) explicitly characterizes the uncertainty
associated with the predictions. As mentioned previously in
section \@ref(pedometric-mapping), geostatistics is a data-driven approach to
soil mapping in which georeferenced point samples are the key input to
map production.
Traditional geostatistics has basically been identified with various
ways of variogram modeling and kriging [@Haining2010GEAN780].
Contemporary geostatistics extends linear models and plain kriging
techniques to non-linear and hybrid models; it also extends purely
spatial models (2D) to 3D and space-time models
[@schabenberger2005statistical; @Bivand2008Springer; @Diggle2007Springer; @cressie2011statistics].
Implementation of more sophisticated geostatistical models for soil
mapping is an ongoing activity and is quite challenging
(computationally), especially in the case of fine-resolution mapping of
large areas [@Hengl2017SoilGrids250m].
Note also that geostatistical mapping is often restricted to
quantitative soil properties. Soil prediction models that predict
categorical soil variables such as soil type or soil colour class are
often quite complex (see e.g. @Hengl2007Geoderma and @Kempen2009Geoderma
for a discussion). Most large scale soil mapping projects also require
predictions in 3D, or at least 2D predictions (layers) for several depth
intervals. This can be done by treating each layer separately in a 2D
analysis, possibly by taking vertical correlations into account, but
also by direct 3D geostatistical modelling. Both approaches are reviewed
in the following sections.
Over the last decade statisticians have recommended using
*model-based geostatistics* as the most reliable framework for spatial predictions.
The essence of model-based geostatistics is that *“the statistical methods
are derived by applying general principles of statistical inference based
on an explicitly declared stochastic model of the data generating mechanism”*
[@Diggle2007Springer; @Brown2014JSS]. This avoids *ad hoc*, heuristic
solution methods and has the advantage that it yields generic and
portable solutions. Some examples of diverse geostatistical models are
given in @Brown2014JSS.
The basic geostatistical model treats the soil property of interest as
the sum of a deterministic trend and a stochastic residual:
\begin{equation}
Z({s}) = m({s}) + \varepsilon({s})
(\#eq:ukm-gstat)
\end{equation}
where $\varepsilon$ and hence $Z$ are normally distributed stochastic
processes. This is the same model as that presented in Eq.\@ref(eq:ukm),
with in this case $\varepsilon = \varepsilon ' + \varepsilon ''$ being
the sum of the spatially correlated and spatially uncorrelated
stochastic components. The mean of $\varepsilon$ is taken to be zero.
Note that we use capital letter $Z$ because we use a probabilistic
model, i.e. we treat the soil property as an outcome of a stochastic
process and define a model of that stochastic process.
Ideally, the spatial variation of the stochastic residual
of Eq.\@ref(eq:ukm-gstat) is much less than that of
the dependent variable.
When the assumption of normality is not realistic, such as when the
frequency distribution of the residuals at observation locations is very
skewed, the easiest solution is to take a Transformed Gaussian approach
[@Diggle2007Springer ch3.8] in which the Gaussian geostatistical model
is formulated for a transformation of the dependent variable (e.g. logarithmic,
logit, square root, Box-Cox transform). A more advanced
approach would drop the normal distribution approach entirely and assume
a *Generalized Linear Geostatistical Model*
[@Diggle2007Springer; @Brown2014JSS] but this complicates the
statistical analysis and prediction process dramatically. The
Transformed Gaussian approach is nearly as simple as the Gaussian
approach although the back-transformation requires attention, especially
when the spatial prediction includes a change of support (leading to block kriging).
If this is the case, it may be necessary to
use a stochastic simulation approach and derive the predictions and
associated uncertainty (i.e. the conditional probability distribution)
using numerical simulations.
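For a logarithmic transformation, for example, back-transforming the predicted mean requires a bias correction with half the kriging variance, because a naive exponentiation estimates the median rather than the mean of the lognormal distribution (a minimal sketch with hypothetical values):

```{r backtransform-sketch}
# Kriging prediction and kriging variance on the log scale (hypothetical):
pred_log <- 2.1   # predicted mean of log(Z) at s0
var_log  <- 0.25  # kriging variance at s0
# Naive back-transform (median of the lognormal distribution):
exp(pred_log)
# Back-transform of the mean under the lognormal assumption:
exp(pred_log + 0.5 * var_log)
```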
```{block2, type="rmdnote"}
Model-based geostatistics is based on using an explicitly declared
stochastic model of the data generating mechanism. One basic
geostatistical model of soil variation is to treat the soil property of
interest as the sum of a deterministic trend (modelled via some
regression function) and a zero-mean stochastic residual.
```
The trend part of Eq.\@ref(eq:ukm-gstat) (i.e. $m$) can take many forms.
In the simplest case it would be a constant but usually it is taken as
some function of known, exhaustively available covariates. This is where
soil mapping can benefit from other sources of information and can
implement Jenny’s *State Factor Model of soil formation*
[@Jenny1968; @jenny1994factors; @Heuvelink2001Geoderma; @McBratney2011HSS],
which has been known from the time of Dokuchaev [@Florinsky2012Dokuchaev].
The covariates are often maps of environmental properties that are known
to be related to the soil property of interest (e.g. elevation, land
cover, geology) but could also be the outcome of a mechanistic soil
process model (such as a soil acidification model, a soil nutrient
leaching model or a soil genesis model). In the case of the latter one
might opt for taking $m$ equal to the output of the deterministic model,
but when the covariates are related environmental properties one must
define a structure for $m$ and introduce parameters to be estimated from
paired observations of the soil property and covariates. One of the
simplest approaches is to use *multiple linear regression* to predict
values at some new location ${s}_0$ [@kutner2005applied]:
\begin{equation}
m({s}_0 ) = \sum\limits_{j = 0}^p { \beta _j \cdot X_j ({s}_0 )}
(\#eq:MRK2D)
\end{equation}
where $\beta _j$ are the regression model coefficients, $\beta _0$ is
the intercept, $X_j$ ($j=1,\ldots,p$) are *covariates* or explanatory variables
(available at all locations within the study area of interest
$\mathbb{A}$), and $p$ is the number of covariates. Eq.\@ref(eq:MRK2D)
can also include categorical covariates (e.g. maps of land cover,
geology, soil type) by representing these by as many binary dummy
variables as there are categories (minus one, to be precise, since an
intercept is included in the model). In addition, transformed covariates
may also be included or interactions between covariates. The latter is
achieved by extending the set of covariates with products or other
mixtures of covariates. However, note that this will dramatically
increase the number of covariates. The risk of considering a large number
of covariates is that it may become difficult to obtain reliable
estimates of the regression coefficients. One may also run the risk of
*multicollinearity*, i.e. covariates that are mutually strongly
correlated, a problem already pointed out by @Jenny1968.
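In R, a model of the form of Eq.\@ref(eq:MRK2D) with both continuous and categorical covariates can be fitted with `lm`, which expands a factor into dummy variables automatically (the data frame below is a hypothetical illustration):

```{r mlr-sketch}
# Hypothetical data: clay content with one continuous (dem) and one
# categorical (landuse) covariate; lm() creates k-1 dummy variables
# for a factor with k levels, since an intercept is included:
d <- data.frame(
  clay    = c(12, 15, 22, 30, 18, 25),
  dem     = c(100, 120, 200, 250, 150, 210),
  landuse = factor(c("crop", "crop", "forest", "forest", "grass", "grass"))
)
m <- lm(clay ~ dem + landuse, data = d)
# Estimated coefficients: intercept, dem slope, and two land-use dummies:
coef(m)
```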
The advantage of Eq.\@ref(eq:MRK2D) is that it is linear in the unknown
coefficients, which makes their estimation relatively straightforward
and also permits derivation of the uncertainty about the regression
coefficients ($\beta$). However, in many practical cases, the linear
formulation may be too restrictive and that is why alternative
structures have been extensively developed to establish the relationship
between the dependent and covariates. Examples of these so-called
*‘statistical learning’* and/or *‘machine learning’* approaches are:
- *artificial neural networks* [@yegnanarayana2004artificial],
- *classification and regression trees* [@breiman1993classification],
- *support vector machines* [@hearst1998support],
- *computer-based expert systems*,
- *random forests* [@breiman2001random; @meinshausen2006quantile].
Statistical treatment of many of these methods is given in @hastie2009elements and @kuhn2013applied.
Care needs to be taken when using machine learning techniques, such as random forest,
because such techniques are more sensitive to noise and blunders in the data.
Most methods listed above require appropriate levels of expertise to
avoid pitfalls and incorrect use but, when feasible and used properly,
these methods should extract maximal information about the target
variable from the covariates [@Statnikov2008; @kanevski2009machine].
The trend ($m$) relates covariates to soil properties and for this it
uses a soil-environment correlation model — the so-called *CLORPT
model*, which was formulated by Jenny in 1941 (a reprint of that book
is also available [-@jenny1994factors]). @McBratney2003Geoderma
further formulated an extension of the CLORPT model
known as the *“SCORPAN”* model.
The CLORPT model may be written as [@jenny1994factors; @Florinsky2012Dokuchaev]:
\begin{equation}
S = f (cl, o, r, p, t)
(\#eq:clorpt2)
\end{equation}
where $S$ stands for soil (properties and classes), $cl$ for climate,
$o$ for organisms (including humans), $r$ is relief, $p$ is parent
material or geology and $t$ is time. In other words, we can assume that
the distribution of both soil and vegetation (at least in a natural system)
can be at least partially explained by environmental conditions.
Eq.\@ref(eq:clorpt2) suggests that soil is a result of environmental
factors, while in reality there are many feedbacks and soil, in turn, influences
many of the factors on the right-hand side of Eq.\@ref(eq:clorpt2), such
as $cl$, $o$ and $r$.
Uncertainty about the estimation errors of model coefficients can fairly
easily be taken into account in the subsequent prediction analysis if
the model is linear in the coefficients, such as in Eq.\@ref(eq:MRK2D).
In this book we therefore restrict ourselves to this case but allow that
the $X_j$’s in Eq.\@ref(eq:MRK2D) are derived in various ways.
Since the stochastic residual of Eq.\@ref(eq:ukm-gstat) is normally
distributed and has zero mean, only its variance-covariance remains to
be specified:
\begin{equation}
C\left[Z({s}),Z({s}+{h})\right] = \sigma({s}) \cdot \sigma({s}+{h}) \cdot \rho({h})
\end{equation}
where ${{h}}$ is the separation distance between two locations. Note
that here we assumed that the correlation function $\rho$ is invariant
to geographic translation (i.e., it only depends on the distance
${h}$ between locations and not on the locations themselves). If in
addition the standard deviation $\sigma$ would be spatially invariant
then $C$ would be *second-order stationary*. These types of simplifying
assumptions are needed to be able to estimate the variance-covariance
structure of $C$ from the observations. If the standard deviation is
allowed to vary with location, then it could be defined in a similar way
as in Eq.\@ref(eq:MRK2D). The correlation function $\rho$ would be
parameterised to a common form (e.g. exponential, spherical, Matérn),
thus ensuring that the model is statistically valid and
*positive-definite*. It is also quite common to assume isotropy, meaning
that two-dimensional geographic distance ${{h}}$ can be reduced to
one-dimensional Euclidean distance $h$.
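A second-order stationary, isotropic covariance matrix with an exponential correlation function can be sketched in a few lines of base R (the point coordinates and parameter values below are hypothetical):

```{r cov-sketch}
# Pairwise Euclidean distances between n = 4 hypothetical point locations:
xy <- cbind(x = c(0, 1, 0, 2), y = c(0, 0, 1, 1))
h  <- as.matrix(dist(xy))
# Second-order stationarity: constant sd, exponential correlation function:
sigma <- 2            # standard deviation (spatially invariant)
a     <- 1.5          # range parameter of the exponential correlogram
C <- sigma^2 * exp(-h / a)
# The diagonal equals the variance; C is symmetric and positive-definite:
diag(C)
```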
Once the model has been defined, its parameters must be estimated from
the data. These are the regression coefficients of the trend (when
applicable) and the parameters of the variance-covariance structure of
the stochastic residual. Commonly used estimation methods are least
squares and maximum likelihood. Both methods have been extensively
described in the literature (e.g. @Webster2001Wiley and
@Diggle2007Springer). More complex trend models may also use the same
techniques to estimate their parameters, although they might also need
to rely on more complex parameter estimation methods such as genetic
algorithms and *simulated annealing* [@lark2003fitting].
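A least-squares fit of, for example, an exponential variogram can be sketched in base R with `optim` (the sample semivariances below are hypothetical; in practice one would use dedicated functionality such as that in the gstat package):

```{r vgm-fit-sketch}
# Hypothetical sample semivariances at a set of lag distances:
lags  <- c(50, 100, 200, 400, 800)
gamma <- c(0.3, 0.5, 0.8, 1.1, 1.2)
# Exponential variogram: nugget + partial sill * (1 - exp(-h / range)):
vgm_exp <- function(p, h) p[1] + p[2] * (1 - exp(-h / p[3]))
# Least-squares estimate of (nugget, partial sill, range):
fit <- optim(c(0.1, 1, 200),
             function(p) sum((gamma - vgm_exp(p, lags))^2))
fit$par
```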
```{block2, type="rmdnote"}
Spatial prediction under the linear Gaussian model with a trend boils
down to *regression-kriging* when the trend coefficients are determined
prior to kriging, and to *universal kriging* or *kriging with external drift* when they are estimated together with the kriging weights. Both
computational approaches — regression-kriging, kriging with external
drift or universal kriging — yield exactly the same predictions if run
using the same inputs and assuming the same (global) geostatistical
model [@hengl2007regression].
```
The optimal spatial prediction in the case of a model
Eq.\@ref(eq:ukm-gstat) with a linear trend Eq.\@ref(eq:MRK2D) and a
normally distributed residual is given by the well-known *Best Linear
Unbiased Predictor* (BLUP):
\begin{equation}
\hat z({{{s}}_0}) = {{X}}_{{0}}^{{T}}\cdot \hat{{\beta}} + \hat{{\lambda}}_{{0}}^{{T}}\cdot({{z}} - {{X}}\cdot \hat{{\beta}} )
(\#eq:BLUP)
\end{equation}
where the regression coefficients and kriging weights are estimated
using:
\begin{equation}
\begin{aligned}
\hat{{\beta}} &= {\left( {{{{X}}^{{T}}}\cdot{{{C}}^{ - {{1}}}}\cdot{{X}}} \right)^{ - {{1}}}}\cdot{{{X}}^{{T}}}\cdot{{{C}}^{ - {{1}}}}\cdot{{z}} \\
\hat{{\lambda}}_{{0}} &= {C}^{ - {{1}}} \cdot {{c}}_{{0}} \notag\end{aligned}
(\#eq:betas)
\end{equation}
and where ${{X}}$ is the matrix of $p$ predictors at the $n$ sampling
locations, $\hat{{\beta}}$ is the vector of estimated regression
coefficients, ${C}$ is the $n \times n$ variance-covariance matrix of
residuals, ${c}_{{0}}$ is the $n \times 1$ vector of covariances at the
prediction location, and ${\lambda}_{{0}}$ is the vector of $n$
kriging weights used to interpolate the residuals. Derivation of BLUP
for spatial data can be found in many standard statistical books e.g.
@Stein1999Springer, @Christensen2001Springer [p.277],
@Venables2002Springer [p.425–430] and/or @schabenberger2005statistical.
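Eqs.\@ref(eq:BLUP) and \@ref(eq:betas) translate almost directly into matrix algebra in R. The sketch below uses hypothetical data and a unit-sill exponential residual covariance; it is an illustration of the formulas, not a production implementation:

```{r blup-sketch}
# Minimal BLUP / regression-kriging sketch (n = 4 observations, p = 2):
X  <- cbind(1, c(2, 4, 6, 8))           # design matrix (intercept + covariate)
z  <- c(5.1, 7.9, 11.2, 13.8)           # observations
h  <- as.matrix(dist(cbind(c(0, 1, 2, 3), c(0, 0, 1, 1))))
C  <- exp(-h)                           # residual covariance (unit sill)
c0 <- exp(-c(0.5, 0.7, 1.2, 1.5))       # covariances to the prediction point
X0 <- c(1, 5)                           # covariates at the prediction point
Ci   <- solve(C)
beta <- solve(t(X) %*% Ci %*% X, t(X) %*% Ci %*% z)   # GLS coefficients
lam  <- Ci %*% c0                                     # kriging weights
pred <- drop(X0 %*% beta + t(lam) %*% (z - X %*% beta))
pred
```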
Any form of kriging computes the conditional distribution of
$Z({{s}}_0)$ at an unobserved location ${{s}}_0$ from the
observations $z({{s}}_1 )$, $z({{s}}_2 ), \ldots , z({{s}}_n )$
and the covariates ${{X}}({{s}}_0)$ (a vector of $p$ covariate values at the prediction location).
From a statistical perspective this is straightforward for the case of a
linear model and normally distributed residuals. However, solving large
matrices and more sophisticated model fitting algorithms such as
restricted maximum likelihood can take a significant amount of time if
the number of observations is large and/or the prediction grid dense.
Pragmatic approaches to addressing constraints imposed by large data
sets are to constrain the observation data set to local neighbourhoods
or to take a multiscale nested approach.
Kriging not only yields optimal predictions but also quantifies the
prediction error with the kriging standard deviation. Prediction
intervals can be computed easily because the prediction errors are
normally distributed. Alternatively, uncertainty in spatial predictions
can also be quantified with spatial stochastic simulation. While kriging
yields the *‘optimal’* prediction of the soil property at any one
location, spatial stochastic simulation yields a series of possible
values by sampling from the conditional probability distribution. In
this way a large number of *‘realizations’* can be generated, which can
be useful when the resulting map needs to be back-transformed or when it
is used in a spatial uncertainty propagation analysis. Spatial
stochastic simulation of the linear Gaussian model can be done using a
technique known as sequential Gaussian simulation
[@Goovaerts1997Oxford; @Yamamoto2008]. It is not, in principle, more
difficult than kriging but it is certainly numerically more demanding,
i.e. it takes significantly more time to compute.
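The principle of drawing realizations from a Gaussian model can be sketched with an unconditional simulation via Cholesky decomposition of the covariance matrix (in practice, conditional sequential Gaussian simulation would be used, e.g. via the gstat package; the grid and parameters below are hypothetical):

```{r sgs-sketch}
# Unconditional Gaussian simulation sketch: draw realizations of a
# zero-mean field with exponential covariance on a 10 x 10 unit grid:
set.seed(1)
xy <- expand.grid(x = 1:10, y = 1:10)
C  <- exp(-as.matrix(dist(xy)) / 3)     # exponential covariance, range 3
L  <- chol(C)                           # upper-triangular factor, C = t(L) L
realizations <- t(L) %*% matrix(rnorm(nrow(C) * 5), ncol = 5)  # 5 draws
dim(realizations)                        # 100 grid cells x 5 realizations
```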
### Regression-kriging (generic model) {#RK-generic}
Ignoring the assumptions about the cross-correlation between the trend
and residual components, we can extend the regression-kriging model and
use any type of (non-linear) regression to predict values (e.g.
regression trees, artificial neural networks and other machine learning
models), calculate residuals at observation locations, fit a variogram
for these residuals, interpolate the residuals using ordinary or simple
kriging, and add the result to the predicted regression part. This means
that RK can, in general, be formulated as:
\begin{equation}
{\rm prediction} \; = \;
\begin{matrix}