# Soil resource inventories and soil maps {#soil-introduction}
*Edited by: Hengl T. & MacMillan R.A.*
## Introduction
This chapter presents a description and discussion of soils and
conventional soil inventories framed within the context of Predictive Soil Mapping (PSM). Soils,
their associated properties, and their spatial and temporal distributions are the
central focus of PSM. We discuss how the products and
methods associated with conventional soil mapping relate to new, and
emerging, methods of PSM and automated soil mapping. We discuss similarities and
differences, strengths and weaknesses of conventional soil mapping (and
its inputs and products) relative to PSM.
The universal model of soil variation, presented in more detail in
chapter \@ref(statistical-theory), is adopted as a framework for comparison of
conventional soil mapping and PSM. Our aim is to show how the products
and methods of conventional soil mapping can complement, and contribute to,
PSM and equally, how the theories and methods of
PSM can extend and strengthen conventional soil mapping.
PSM aims to implement tools and methods that can be supportive of
growth, change and improvement in soil mapping and that can stimulate a
rebirth and reinvigoration of soil inventory activity globally.
## Soils and soil inventories
### Soil: a definition
Soil is a natural body composed of biota and air, water and minerals,
developed from unconsolidated or semi-consolidated material that forms the
topmost layer of the Earth’s surface [@chesworth2008encyclopedia]. The
upper limit of the soil is either air, shallow water, live plants or
plant materials that have not begun to decompose. The lower limit is
defined by the presence of hard rock or the lower limit of biologic
activity [@Richter1995; @SSDS1993]. Although soil profiles up to tens
of meters in depth can be found in some tropical areas [@Richter1995], for
soil classification and mapping purposes, the lower limit of soil is often
arbitrarily set to 2 m (http://soils.usda.gov/education/facts/soil.html).
Soils are rarely described to depths beyond 2 m and many soil sampling projects
put a primary focus on the upper (0–100 cm) depths.
The chemical, physical and biological properties of the soil differ from
those of unaltered (unconsolidated) parent material, from which the soil
is derived over a period of time under the influence of climate, organisms
and relief effects. Soil should show a capacity to support life,
otherwise we are dealing with inert unconsolidated parent material. Hence, for
purposes of developing statistical models to predict soil
properties using PSM, it proves useful to distinguish between *actual*
and *potential* soil areas (see further section \@ref(soil-covariates)).
A significant aspect of the accepted definition of soil is that it is
seen as a *natural body* that merits study, description,
*classification* and interpretation in, and of, itself. As a *natural
body* soil is viewed as an object that occupies space, has defined
physical dimensions and that is more than the sum of its individual
properties or attributes. This concept requires that all properties of
soils be considered collectively and simultaneously in terms of a completely integrated
natural body [@SSDS1993]. A consequence of this is that one must
generally assume that all soil properties covary in space in lockstep
with specific named soils and that different soil properties do not
exhibit different patterns of spatial variation independently.
From a management point of view, soil can be seen from at least three
perspectives. It is a:
- *Resource* of materials — It contains quantities of unconsolidated
materials, rock fragments, texture fractions, organic carbon,
nutrients, minerals and metals, water and so on.
- *Stabilizing medium / ecosystem* — It acts as a medium that supports
both global and local processes from carbon and nitrogen fixation to
retention and transmission of water, to provision of nutrients and
minerals and so on.
- *Production system* — Soil is the foundation for plant growth. In
fact, it is the basis of all sustainable terrestrial
ecosystem services. It is also a source of livelihood for people
who grow crops and raise livestock.
According to @frossard2006function there are six key functions of soil:
1. *food and other biomass production*,
2. *storage, filtering, and transformation of water, gases and
minerals*,
3. *biological habitat and gene pool*,
4. *source of raw materials*,
5. *physical and cultural heritage* and
6. *platform for man-made structures: buildings, highways*.
Soil is the largest terrestrial carbon pool, containing about 82% of total terrestrial
organic carbon [@Lal2004Science].
### Soil variables
Knowledge about soil is often assembled and catalogued through *soil
resource inventories*. Conventional soil resource inventories describe
the geographic distribution of *soil bodies* i.e. *polypedons*
[@Wysocki2005Geoderma]. The spatial distribution of soil properties is
typically recorded and described through reference to mapped soil
individuals and not through separate mapping of individual soil
properties. In fact, the definition of a soil map in the US Soil Survey
Manual specifically *“excludes maps showing the distribution of a single
soil property such as texture, slope, or depth, alone or in limited
combinations; maps that show the distribution of soil qualities such as
productivity or erodibility; and maps of soil-forming factors, such as
climate, topography, vegetation, or geologic material”* [@SSDS1993].
In contrast to conventional soil mapping, PSM is primarily interested
in representing the spatial distribution of *soil variables* — measurable
or descriptive attributes commonly collected through field sampling
and then either measured *in-situ* or *a posteriori* in a laboratory.
Soil variables can be roughly grouped into:
1. *quantities of some material* ($y \in [0 \rightarrow +\infty]$);
2. *transformed or standardized quantities* such as pH
($y \in [-\infty \rightarrow +\infty]$);
3. *relative percentages* such as mass or volume percentages
($y \in [0 \rightarrow 1]$);
4. *boolean values e.g. showing occurrence and/or non-occurrence* of
qualitative soil attributes or objects ($y \in \{0,1\}$);
5. *categories* (i.e. factors) such as soil classes
($y \in \{a,b,\ldots,x\}$);
6. *probabilities* e.g. probabilities of occurrence of some class or object ($p(y) \in [0 \rightarrow 1]$);
7. *censored values* e.g. depth to bedrock, which is often observed only down to 2 m.
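As a rough illustration of how these value types might be stored in R, the sketch below holds one invented value per type and checks its domain. All object names (`soc_gkg`, `has_Bt`, `bedrock`, etc.) are hypothetical, and encoding the censored depth as a value plus a flag is just one possible convention.

```r
# One hypothetical value per variable type listed above (all values invented).
soc_gkg   <- 12.5                              # 1. quantity of material, >= 0
ph_h2o    <- 6.8                               # 2. transformed quantity (pH is a log scale)
clay_frac <- 0.27                              # 3. relative percentage stored as a fraction
has_Bt    <- TRUE                              # 4. boolean: horizon present / absent
wrb_class <- factor("Luvisol",                 # 5. category with a fixed set of levels
                    levels = c("Cambisol", "Luvisol", "Podzol"))
p_class   <- c(Cambisol = 0.2, Luvisol = 0.7, Podzol = 0.1)  # 6. class probabilities
# 7. right-censored value: bedrock not reached within the 2 m described,
#    so we keep the observed depth together with a censoring flag
bedrock   <- data.frame(depth_cm = 200, censored = TRUE)

# simple checks mirroring the value domain of each type
stopifnot(
  soc_gkg >= 0,
  clay_frac >= 0, clay_frac <= 1,
  is.logical(has_Bt),
  !is.na(wrb_class),
  abs(sum(p_class) - 1) < 1e-8,
  bedrock$censored
)
```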
The nature of a soil variable determines how the attribute is modeled
and presented on a map in PSM. Some soil variables are
normally described as discrete entities (or classes), but classes can also be
depicted as continuous quantities on a map in the form of
probabilities or memberships
[@DeGruijter1997Geoderma; @McBratney2003Geoderma; @Kempen2009Geoderma; @Odgers201130].
For example, a binary soil variable (e.g. the presence/absence of a
specific layer or horizon) can be modeled as a binomial random variable
with a logistic regression model. Spatial prediction (mapping) with this
model gives a map depicting (continuous) probabilities in the range of
0–1. These probabilities can be used to determine the most likely presence/absence
of a class at each prediction location, which results in a discrete
representation of the soil attribute variation.
In that context, the aims of most soil resource inventories consist of the
identification, measurement, modelling, mapping and interpretation of
soil variables that represent transformed or standardized quantities of
some material, relative percentages, occurrence and/or non-occurrence of
qualitative attributes or objects, and/or soil categories.
### Primary and secondary soil variables
Soil properties can be *primary* or *inferred* (see further section \@ref(soil-variables-chapter)).
Primary properties are properties that can be measured directly in the
field or in the laboratory. Inferred properties are properties that
cannot be measured directly (or are difficult or too expensive to
measure) but can be inferred from primary properties, for example through
pedotransfer functions [@Wosten2001JH; @wosten2013soil].
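Perhaps the simplest pedotransfer-type relation is the conversion of organic carbon to organic matter with a fixed factor. The sketch below uses the conventional van Bemmelen factor of 1.724; treat it as an assumption, since the ratio is known to vary between soils and horizons. The function name `som_from_soc` is hypothetical.

```r
# Infer soil organic matter (g/kg) from measured organic carbon (g/kg)
# with a fixed conversion factor (default: van Bemmelen factor, 1.724).
som_from_soc <- function(soc_gkg, factor = 1.724) {
  stopifnot(all(soc_gkg >= 0, na.rm = TRUE))   # contents cannot be negative
  soc_gkg * factor
}

soc <- c(5, 12.5, 40)     # hypothetical measured SOC values, g/kg
som_from_soc(soc)         # 8.62 21.55 68.96
```

A fitted regression (e.g. with `lm()`) calibrated on samples where both properties were measured is the more typical form of a pedotransfer function; the fixed factor just keeps the example short.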
@Dobos2006digital also distinguish between primary and secondary soil
properties and *‘functional’* soil properties representing *soil
functions* or *soil threats*. Such functional properties can be used directly
for financial assessment or for decision making. For example, soil
organic carbon content in grams per kilogram of soil is a primary soil
property, while organic carbon sequestration rate in kilograms per unit
area per year is a *functional* soil property.
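To make the distinction between a primary and a functional property concrete, the sketch below converts SOC content (g/kg) into a stock (kg/m²) for one layer and then into a mean sequestration rate. It assumes that fine-earth bulk density and coarse fragments are known and unchanged between two sampling dates, and it compares fixed depths rather than equivalent soil masses; the function name and all inputs are invented.

```r
# SOC stock of one layer in kg per m^2.
#   soc_gkg : organic carbon content of the fine earth, g/kg
#   bd_gcm3 : bulk density of the fine earth, g/cm^3 (numerically equal to t/m^3)
#   thick_m : layer thickness, m
#   cf_vol  : volumetric coarse-fragment fraction, 0-1
soc_stock_kgm2 <- function(soc_gkg, bd_gcm3, thick_m, cf_vol = 0) {
  (soc_gkg / 1000) *          # kg C per kg fine earth
    (bd_gcm3 * 1000) *        # kg fine earth per m^3
    thick_m * (1 - cf_vol)    # m^3 of fine earth per m^2 of land
}

# hypothetical 0-30 cm layer sampled twice, 10 years apart
stock_t1 <- soc_stock_kgm2(12, 1.3, 0.3, cf_vol = 0.1)   # 4.212 kg/m^2
stock_t2 <- soc_stock_kgm2(13, 1.3, 0.3, cf_vol = 0.1)   # 4.563 kg/m^2

# functional property: mean sequestration rate, kg C per m^2 per year
rate_kgm2_yr <- (stock_t2 - stock_t1) / 10                # 0.0351
rate_kgm2_yr * 10000 / 1000                               # same rate in t C/ha/yr (0.351)
```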
## Soil mapping
### What are soil resource inventories?
Soil resource inventories describe the types, attributes and geographic
distributions of soils in a given area. They can consist of spatially
explicit maps or of non-spatial lists. Lists simply itemize the kinds
and amounts of different soils that occupy an area to address questions
about what soils and soil properties occur in an area.
Maps attempt to portray, with some degree of detail, the patterns of
spatial variation in soils and soil properties, within limits imposed
by mapping scale and resources.
According to the USDA Soil Survey Manual [@SSDS1993], a soil survey:
- describes the characteristics of the soils in a given area,
- classifies the soils according to a standard system of
classification,
- plots the boundaries of the soils on a map, and
- makes predictions about the behavior of soils.
The information collected in a soil survey helps in developing
land-use plans and in evaluating and predicting the effects of land use on the
environment. Hence, the different uses of the soils and how the soils respond
to management need to be considered.
This focus of conventional soil mapping on *soil individuals* represents a significant
difference compared to PSM, where the object of study is
frequently an individual soil property and the objective is to map the
pattern of spatial distribution of that property (over some depth
interval) independently of the spatial distribution
of soil individuals or other soil properties.
Soil maps give answers to three basic questions: (1) what is mapped?
(2) what is the predicted value? and (3) where is it? Thematic accuracy
of a map tells us how accurate predictions of targeted soil properties
are overall, while the spatial resolution helps us locate features
with some specified level of spatial precision.
```{block2 type="rmdnote"}
The most common output of a soil resource inventory is a *soil map*. Soil maps convey information
about the geographic distribution of named soil types in a given area.
They are meant to help answer the questions *“what is here”* and *“where is what”* [@Burrough1998OUP].
```
Any map is an abstraction and generalization of reality. The only
perfect one-to-one representation of reality is reality itself. To fully
describe reality one would need a model at 1:1 scale at which 1 m$^2$ of reality
is represented by 1 m$^2$ of the model. Since this is not feasible, we condense
and abstract reality in such a way that we hope to describe the major
differences in true space at a much reduced scale in model (map) space.
When this is done for soil maps, it needs to be understood that a soil map can
only describe that portion of the total variation that is systematic and
has structure and occurs over distances that are as large as, or larger
than, the smallest area that can be feasibly portrayed and described at
any given scale. Issues of scale and resolution are discussed in greater
detail in section \@ref(downscaling-upscaling).
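The link between map scale and the smallest area that can be portrayed is simple arithmetic: ground area equals map area times the square of the scale number. In the sketch below the 0.25 cm² "smallest legible delineation" is an assumed, illustrative value, not a standard stated in this chapter, and `ground_area_ha` is a hypothetical helper.

```r
# Ground area (ha) represented by an area drawn on a map at scale 1:scale_number.
#   map_area_cm2 : area on the paper map, cm^2
#   scale_number : denominator of the map scale, e.g. 50000 for 1:50,000
ground_area_ha <- function(map_area_cm2, scale_number) {
  map_m2 <- map_area_cm2 * 1e-4          # cm^2 -> m^2 on paper
  map_m2 * scale_number^2 / 1e4          # linear scale squared, then m^2 -> ha
}

scales <- c(5000, 10000, 50000, 250000)
# assumed (illustrative) smallest legible delineation on paper: 0.25 cm^2
data.frame(scale       = paste0("1:", formatC(scales, big.mark = ",", format = "d")),
           min_area_ha = ground_area_ha(0.25, scales))
# min_area_ha: 0.0625, 0.25, 6.25, 156.25
```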
An important functionality of PSM is the production and distribution of
maps depicting the spatial distribution of soils and, more specifically,
soil attributes. In this chapter we, therefore, concentrate on
describing processes for producing maps as spatial depictions of the
patterns of arrangement of soil attributes and soil types.
### Soil mapping approaches and concepts
As mentioned previously, spatial information about the distribution of
soil properties or attributes, i.e. soil maps or GIS layers focused on
soil, are produced through soil resource inventories, also known as soil
surveys or soil mapping projects
[@Burrough1971; @Avery1987; @Wysocki2005Geoderma; @Legros2006SP]. The
main idea of soil survey is, thus, the production and dissemination of soil
information for an area of interest, usually to address a specific
question or questions of interest i.e. production of soil maps and soil
geographical databases. Although soil surveyors are usually not *per se*
responsible for final use of soil information, how soil survey information
is used is increasingly important.
In statistical terms, the main objective of soil mapping is to describe
the spatial variability i.e. spatial complexity of soils, then represent
this complexity using maps, summary measures, mathematical models and
simulations. Some known **sources of spatial variability** in soil variables
are:
1. *Natural spatial variability in 2D (different at various scales),
mainly due to climate, parent material, land cover and land use*;
2. *Variation by depth*;
3. *Temporal variation due to regular or periodic changes in the
ecosystem*;
4. *Measurement error (in situ or in lab)*;
5. *Spatial location error*;
6. *Small scale variation*.
```{block2 type="rmdnote"}
In statistical terms, the main objective of
soil mapping is to describe the spatial complexity of soils, then
represent this complexity using maps, summary measures, mathematical
models and simulations. From the application point of view, the main
objective of soil mapping is to accurately predict the response of a
soil(-plant) ecosystem to various soil management strategies.
```
Soil mappers try to explain the first two items above and to
minimize, or exclude from modelling, the remaining components: temporal
variation, measurement error, spatial location error and small scale
variation.
```{r soil-crop-model-scheme, echo=FALSE, fig.cap="Inputs to soil-plant, soil-hydrology or soil-ecology models and their relationship.", out.width="100%", out.extra="angle=0"}
knitr::include_graphics("figures/Fig_soil_crop_model_scheme.png")
```
From the application point of view, the main objective of soil mapping
is to accurately predict soil properties and their response to possible
or actual management practices
(Fig. \@ref(fig:soil-crop-model-scheme)). In other words, if the soil
mapping system is efficient, we should be able to accurately predict
the behavior of soil-plant, soil-hydrology or similar ecosystems to various
soil management strategies, and hence provide useful advice to
agronomists, engineers, environmental modelers, ecologists and others.
We elect here to recognize two main variants of soil mapping which we
refer to as *conventional soil mapping* and *pedometric* or *predictive soil mapping* as
described and discussed below (Fig. \@ref(fig:comparison-dsm)).
```{r comparison-dsm, echo=FALSE, fig.cap="Matrix comparison between traditional (primarily expert-based) and automated (data-driven) soil mapping.", out.width="90%"}
knitr::include_graphics("figures/Table_comparison_DSM.png")
```
### Theoretical basis of soil mapping: in context of the universal model of spatial variation {#soil-mapping-theory}
Stated simply, *“the scientific basis of soil mapping is that the
locations of soils in the landscape have a degree of predictability”*
[@Miller1979]. According to the USDA Soil Survey Manual, *“The
properties of soil vary from place to place, but this variation is not
random. Natural soil bodies are the result of climate and living
organisms acting on parent material, with topography or local relief
exerting a modifying influence and with time required for soil-forming
processes to act. For the most part, soils are the same wherever all
elements of these five factors are the same. Under similar environments in
different places, soils are expected to be similar. This regularity permits prediction
of the location of many different kinds of soil”* [@SSDS1993].
@Hudson2000SSSAJ considers that this *soil-landscape paradigm* provides
the fundamental scientific basis for soil survey.
In the most general sense, both conventional soil mapping and PSM
represent ways of applying the *soil-landscape paradigm* via the universal model of spatial
variation, which is explained in greater detail in
chapter \@ref(statistical-theory). @Burrough1998OUP [p.133] described the
universal model of soil variation as a special case of the universal
model of spatial variation. This model distinguishes between three major
components of soil variation: (1) a deterministic component (trend), (2)
a spatially correlated component and (3) pure noise.
\begin{equation}
Z({\bf{s}}) = m({\bf{s}}) + \varepsilon '({\bf{s}}) + \varepsilon ''({\bf{s}})
(\#eq:univ-var)
\end{equation}
where $\bf{s}$ is the two-dimensional location, $m({\bf{s}})$ is the
deterministic component, $\varepsilon '({\bf{s}})$ is the spatially
correlated stochastic component and $\varepsilon ''({\bf{s}})$ is the
pure noise (micro-scale variation and measurement error).
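To make the three components of Eq. \@ref(eq:univ-var) tangible, the base-R sketch below simulates them along a 1D transect. The linear trend, the exponential covariance (sill 4, range 150 m) and the nugget variance of 1 are arbitrary assumptions chosen for illustration; the correlated component is drawn using a Cholesky factor of its covariance matrix.

```r
set.seed(42)
s <- seq(0, 1000, by = 10)                  # 101 locations along a transect, m
n <- length(s)

# (1) deterministic component m(s): an assumed linear trend
m <- 20 + 0.01 * s

# (2) spatially correlated component eps'(s): zero-mean Gaussian process
#     with exponential covariance C(h) = sill * exp(-h / range)
sill <- 4; range_m <- 150
H <- abs(outer(s, s, "-"))                  # distance matrix
C <- sill * exp(-H / range_m)
L <- t(chol(C))                             # lower-triangular factor, C = L L'
eps1 <- as.vector(L %*% rnorm(n))

# (3) pure noise eps''(s): independent micro-scale variation / measurement error
nugget <- 1
eps2 <- rnorm(n, sd = sqrt(nugget))

z <- m + eps1 + eps2                        # Z(s) = m(s) + eps'(s) + eps''(s)
```

Plotting `z` against `s` alongside `m` shows the smooth trend, the wavy correlated departures around it, and the point-to-point scatter that no map can reproduce.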
```{block2 type="rmdnote"}
The *universal model of soil variation* assumes that
there are three major components of soil variation: (1) a
deterministic component (function of covariates), (2) a spatially
correlated component (treated as stochastic) and (3) pure noise.
```
The deterministic part of the equation describes that part of the
variation in soils and soil properties that can be explained by
reference to some model that relates observed and measured variation to
readily observable and interpretable factors that control or influence
this spatial variation. In conventional soil mapping, this model is the
empirical and knowledge-based *soil-landscape paradigm*
[@Hudson2000SSSAJ]. In PSM, a wide variety of statistical and machine learning
models have been used to capture and apply the soil-landscape paradigm
in a quantitative and optimal fashion using the CLORPT model:
\begin{equation}
S = f (cl, o, r, p, t)
(\#eq:clorpt)
\end{equation}
where $S$ stands for soil (properties and classes), $cl$ for climate,
$o$ for organisms (including humans), $r$ is relief, $p$ is parent
material or geology and $t$ is time. Eq. \@ref(eq:clorpt) is the
CLORPT model originally presented by Jenny [-@jenny1994factors].
@McBratney2003Geoderma re-conceptualized and extended the CLORPT model via the
*“scorpan”* model in which soil properties are modeled as a function of:
- (auxiliary) **s**oil classes or properties,
- **c**limate,
- **o**rganisms, vegetation, fauna or human activity,
- **r**elief,
- **p**arent material,
- **a**ge i.e. the time factor,
- **n** space, spatial context or spatial position.
Pedometric models are quantitative in that they capture
relationships between observed soils, or soil properties, and
controlling environmental influences (as represented by environmental
co-variates) using statistically-formulated expressions. Pedometric
models are often seen as optimal because, by design (e.g. least-squares
fitting), they minimize the variance of the differences between observed and
predicted values at locations with known values. So, within the chosen model
form, no better fit exists for that particular set of observed values at that
specific set of locations.
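A sketch of a pedometric model in the sense above, assuming synthetic data: a few hypothetical scorpan-style covariates (`precip`, `elev`, `parent`) and an ordinary least-squares fit with `lm()`. The last lines check the "optimality" claim in its limited, within-model sense: for the same design matrix, any other coefficient vector gives a larger residual sum of squares.

```r
set.seed(7)
n <- 150
# hypothetical scorpan-style covariates at sampled sites (synthetic values)
pts <- data.frame(
  precip = runif(n, 400, 1200),      # c: mean annual precipitation, mm
  elev   = runif(n, 100, 900),       # r: elevation, m
  parent = factor(sample(c("granite", "limestone", "loess"), n, replace = TRUE))  # p
)
# synthetic "observed" soil property (e.g. SOC, g/kg) with an assumed structure
shift <- c(granite = -1, limestone = 2, loess = 0)
pts$soc <- 5 + 0.01 * pts$precip + 0.005 * pts$elev +
  unname(shift[as.character(pts$parent)]) + rnorm(n, sd = 1.5)

# deterministic part m(s) fitted by ordinary least squares
fit <- lm(soc ~ precip + elev + parent, data = pts)

# residual sum of squares for any coefficient vector on the same design matrix
X   <- model.matrix(fit)
rss <- function(beta) sum((pts$soc - X %*% beta)^2)

rss_ols <- rss(coef(fit))
rss_alt <- rss(coef(fit) + 0.01)   # any other coefficients fit these data worse
c(ols = rss_ols, perturbed = rss_alt)
```

Note that this guarantees the best fit only at the sampled locations and only for the chosen model form; it says nothing about accuracy at unvisited locations, which is why independent validation is still needed.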
Both conventional and pedometric soil mapping use models to explain
the deterministic part of the spatial variation in soils and soil properties.
These models differ mainly in terms of whether they are empirical and
subjective (conventional) or quantitative and objective (pedometric).
Both can be effective, and the empirical and subjective models based on expert knowledge have, until
recently, proven to be the most cost-effective and the most widely applied for
production of soil maps by conventional means.
```{block2 type="rmdnote"}
In its essence, the objective
of PSM is to produce optimal unbiased predictions of a mean value at some new location along with the uncertainty associated with the prediction, at the finest possible resolution.
```
One way in which PSM differs significantly from
conventional soil mapping in terms of the universal model of soil
variation is in the use of geostatistics or machine learning to
quantitatively correct for error in predictions, defined as the
difference between predicted and observed values at locations with known
values. Conventional soil mapping has no formal or quantitative
mechanism for correcting an initial set of predicted values by computing
the difference between predicted and observed values at sampled
locations and then correcting initial values at all locations in
response to these observed differences. PSM uses
geostatistics to determine (via the semi-variogram) if the differences between predicted and
observed values (the residuals) exhibit spatial structure (i.e. are
predictable). If they do exhibit spatial structure, then it is useful
and reasonable to interpolate the computed error at known locations to
predict the likely magnitude of error of predictions at all locations
[@hengl2007regression].
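The residual check can be sketched in base R as below, assuming synthetic data: a regression on a hypothetical covariate removes the trend, and an empirical semivariogram, $\gamma(h) = \frac{1}{2N(h)}\sum (r_i - r_j)^2$, is computed from the residuals in 50 m lag bins. The helper `emp_variogram()` is illustrative only; dedicated packages such as gstat provide tested implementations.

```r
set.seed(123)
n  <- 200
xy <- cbind(x = runif(n, 0, 1000), y = runif(n, 0, 1000))  # sampling sites, m
elev <- 300 + 0.2 * xy[, "x"] + rnorm(n, sd = 20)           # hypothetical covariate

# synthetic spatially correlated component (exponential covariance, 200 m range)
D    <- as.matrix(dist(xy))
eps1 <- as.vector(t(chol(2 * exp(-D / 200))) %*% rnorm(n))
z    <- 10 + 0.02 * elev + eps1 + rnorm(n, sd = 0.5)        # "observed" property

# 1) deterministic part by regression; keep the residuals
res <- residuals(lm(z ~ elev))

# 2) empirical semivariogram: gamma(h) = mean of 0.5 * (r_i - r_j)^2 over the
#    pairs whose separation distance falls in each lag bin
emp_variogram <- function(coords, r, width = 50, cutoff = 500) {
  d    <- as.matrix(dist(coords))
  g    <- 0.5 * outer(r, r, "-")^2
  up   <- upper.tri(d)                                      # each pair counted once
  bins <- cut(d[up], breaks = seq(0, cutoff, by = width))   # pairs beyond cutoff -> NA
  data.frame(dist  = as.vector(tapply(d[up], bins, mean)),
             gamma = as.vector(tapply(g[up], bins, mean)),
             np    = as.vector(table(bins)))
}
v <- emp_variogram(xy, res)
v
```

If `gamma` rises with distance and then levels off, the residuals carry spatial structure that is worth interpolating; a roughly flat semivariogram indicates that the residuals are mostly pure noise.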
Neither conventional soil mapping nor PSM can do more
than simply describe and quantify the amount of variation that is not
predictable and has to be treated as pure noise. Conventional soil maps
can be criticized for ignoring this component of the total variation and
typically treating it as if it did not exist. For many soil properties,
short range, local variation in soil properties that cannot be explained
by either the deterministic or stochastic components of the universal
model of soil variation can account for a substantial proportion
(e.g. 30–40% or more) of the
total observed range of variation in any given soil property. Such
variation is simply not mappable but it exists and should be identified
and quantified. We do our users and clients a disservice when we fail to
alert them to the presence, and the magnitude, of spatial variation that
is not predictable. In cases where the local spatial variation is not
predictable (or mappable) the best estimate for any property of interest
is the mean value for that local area or spatial entity (hence not a map).
### Traditional (conventional) soil mapping {#conventional-mapping}
Traditional soil resource inventories are largely based on manual
application of expert tacit knowledge through the soil-landscape
paradigm [@Burrough1971; @Hudson2000SSSAJ]. In this approach, soil
surveyors develop and apply conceptual models of where and how soils
vary in the landscape through a combination of field inspections to
establish spatial patterns and photo-interpretation to extrapolate the
patterns to similar portions of the landscape
(Fig. \@ref(fig:soilsurvey-scheme)). Traditional soil mapping
procedures mainly address the deterministic part of the universal model
of soil variation.
```{r soilsurvey-scheme, echo=FALSE, fig.cap="Typical soil survey phases and intermediate and final products.", out.width="100%", fig.pos="h"}
knitr::include_graphics("figures/Fig_soilsurvey_scheme.png")
```
Conventional (traditional) manual soil mapping typically adheres to the
following sequence of steps, with minor variations
[@McBratney2003Geoderma]:
1. *Specify the objective(s) to be served by the soil survey and
resulting map*;
2. *Identify which attributes of the soil or land need to be observed,
described and mapped to meet the specified objectives*;
3. *Identify the minimum sized area that must be described and the
corresponding scale of mapping to meet the specified objectives*;
4. *Collate and interpret existing relevant land resource information
(geology, vegetation, climate, imagery) for the survey area*;
5. *Conduct preliminary field reconnaissance and use these observations
to construct a preliminary legend of conceptual mapping units
(described in terms of soil individuals)*;
6. *Apply the preliminary conceptual legend using available source
information to delineate initial map unit boundaries (pre-typing)*;
7. *Plan and implement a field program to collect samples and
observations to obtain values of the target soil attributes
(usually classes) at known locations to test and refine initial
conceptual prediction models*;
8. *Using field observations, refine the conceptual models and finalize
map unit legends and boundaries to generate conventional area–class
soil maps*;
9. *Conduct a field correlation exercise to match mapping with adjacent
areas and to confirm mapping standards were adhered to*;
10. *Select and analyse representative soil profile site data to
characterize each mapped soil type and soil map unit*;
11. *Prepare final documentation describing all mapped soils and
soil map units (legends) according to an accepted format*;
12. *Publish and distribute the soil information in the form of maps,
geographical databases and reports*.
Expert knowledge about soil-landform patterns is generally used to
produce manually drawn polygon maps that outline areas of different
dominant soils or combinations of soils — *soil map units* (see
Figs. \@ref(fig:smu-aggregation) and \@ref(fig:from-photointerpretation-to-soilmap)). Soil
map units (polygons of different soil types) are described in terms of the
composition of soil classes (and often also landscape attributes) within
each unit, with various soil physical and chemical variables attached to
each class. Most commonly, the objective of conventional soil mapping is
to delineate recognizable portions of a landscape (soil–landform units)
as polygons in which the variation of soils and soil properties is
describable and usually (but not always) more limited than between polygons. Because most
soil mapping projects have limited resources and time, soil surveyors
cannot typically afford to survey areas in great detail (e.g. 1:5,000)
so as to map actual *polypedons* (complexes of contiguous pedons).
As a compromise, the survey team generally has to choose some best achievable
target scale (e.g. 1:10,000 – 1:50,000).
Maps produced at some initial scale can be further generalized, depending
on the application and user demands [@Wysocki2005Geoderma].
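Map unit composition can be turned into property estimates in several ways; one simple option, sketched below with an invented two-unit legend, is a component-weighted mean per map unit. Components without data (here "Inclusions") are dropped and the remaining percentages renormalized. That choice, the helper `weighted_smu()`, and all names and values are assumptions for illustration.

```r
# Hypothetical map unit composition: each soil map unit (SMU) is a mix of
# named soil components with estimated area percentages.
comp <- data.frame(
  smu       = c("A1", "A1", "A1", "B2", "B2"),
  component = c("Soil X", "Soil Y", "Inclusions", "Soil Y", "Soil Z"),
  pct       = c(60, 30, 10, 70, 30),
  clay_pct  = c(22, 35, NA, 35, 12)       # representative clay content, %
)

# area-weighted mean of a property per map unit, ignoring components
# without data and renormalizing the remaining percentages
weighted_smu <- function(d, prop) {
  ok <- !is.na(d[[prop]])
  sapply(split(d[ok, ], d$smu[ok]),
         function(x) sum(x$pct * x[[prop]]) / sum(x$pct))
}
weighted_smu(comp, "clay_pct")   # A1 = 26.33, B2 = 28.1
```

Reporting only the dominant component is a common alternative; either way, the within-unit spread (22% vs 35% clay in A1) is hidden by a single polygon value.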
```{r smu-aggregation, echo=FALSE, fig.cap="Three basic conceptual scales in soil mapping: (left) most detailed scale showing the actual distribution of soil bodies, (center) target scale i.e. scale achievable by the soil survey budget, (right) generalized intermediate scale or coarse resolution maps. In a conventional soil survey, soils are described and conceptualized as groups of similar pedons (smallest elements of 1–10 square meters), called “polypedons”, the smallest mappable entity. These can then be further generalized to soil map units, which can be various combinations (systematic or random) of dominant and contrasting soils (inclusions).", out.width="85%"}
knitr::include_graphics("figures/Fig_SMU_aggregation.png")
```
Where variation within a polygon is systematic and predictable, the
pattern of variation in soils within any given polygon is often
described in terms of the most common position, or positions, in the
landscape occupied by each named soil class [@MacMillan2005CJSS]. In other cases, soil
patterns are not clearly related to systematic variations in observable
landscape attributes and it is not possible to describe where each named
soil type is most likely to occur within any polygon or why.
Conventional soil mapping has some limitations related to the fact that
mapping concepts (mental models) are not always applied consistently by different mappers.
Application of conceptual models is largely manual and it is difficult to automate.
In addition, conventional soil survey methods differ from country to country, and even within a single
region, depending largely on the scope and level-of-detail of the
inventory [@Schelling1970Geoderma; @SSS1983USDA; @Rossiter2001]. The key
advantages of conventional soil maps, on the other hand, are that:
- *they portray the spatial distribution of stable, recognizable and
repeating patterns of soils that usually occupy identifiable portions of the landscape*, and
- *these patterns can be extracted from legends and maps to model (predict) the
most likely soil at any other location in the landscape using expert
knowledge alone* [@Zhu2001].
Resource inventories, and in particular soil surveys, have been
notoriously reluctant, or unable, to provide objective quantitative
assessments of the accuracy of their products. For example, most soil
survey maps have only been subjected to qualitative assessments of map
accuracy through visual inspection and subjective correlation exercises.
In the very few examples of quantitative evaluation
[@Marsman1986ALTERRA; @Finke2006Elsevier], the assessments have
typically focused on measuring the degree to which predictions of
soil classes at specific locations on a map, or within polygonal areas
on a map, agreed with on-the-ground assessments of the soil class at
these same locations or within these same polygons. Measurement error
can be large in assessing the accuracy of soil class maps.
@MacMillan2010DSM, for example, demonstrated that experts disagreed
with each other regarding the correct classification of ecological site
types at the same locations about as often as they disagreed with the
classifications reported by a map produced using a predictive model.
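The kind of point-by-point comparison described above can be sketched in R with a confusion matrix. The class labels and validation data below are hypothetical, and Cohen's kappa is included only as one commonly used chance-corrected agreement measure; it is not necessarily the statistic used in the studies cited. The same code could equally compare the labels assigned by two experts at the same locations.

```r
# Hypothetical validation data: class shown on the map at each checked
# location and the class assigned in the field at the same location
mapped   <- c("A", "A", "B", "B", "C", "A", "C", "B", "A", "C")
observed <- c("A", "B", "B", "B", "C", "A", "A", "B", "A", "C")

# Confusion matrix: rows = map, columns = field observation
lev <- sort(union(mapped, observed))
cm  <- table(map   = factor(mapped,   levels = lev),
             field = factor(observed, levels = lev))
print(cm)

# Overall agreement: share of locations where map and field agree
overall_accuracy <- sum(diag(cm)) / sum(cm)   # 8 of 10 agree: 0.8

# Cohen's kappa: agreement corrected for agreement expected by chance
p_e <- sum(rowSums(cm) * colSums(cm)) / sum(cm)^2
kappa_stat <- (overall_accuracy - p_e) / (1 - p_e)   # about 0.70
```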
### Variants of soil maps
In the last 20–30 years, soil maps have evolved from purely 2D polygon
maps showing the distribution of soil polypedons, i.e. named soil
classes, to dynamic 3D maps representing predicted or simulated values
of various primary or inferred soil properties and/or classes
(Fig. \@ref(fig:soilmap-types)). Examples of 2D+T (2D space + time) and/or 3D+T soil maps
are less common but increasingly popular (see e.g.
@Rosenbaum2012WRCR and @Gasch2015SPASTA). In general, we expect demand for
spatio-temporal soil data to grow.
```{r soilmap-types, echo=FALSE, fig.cap="Classification of types of soil maps based on spatial representation and variable type.", out.width="85%"}
knitr::include_graphics("figures/Fig_soilmap_types.png")
```
```{block2 type="rmdnote"}
A soil map can represent 2D, 3D, 2D+T
and/or 3D+T distribution of quantitative soil properties or soil
classes. It can show predicted or simulated values of target soil
properties and/or classes, or inferred soil-functions.
```
The spatial model increasingly used to represent soil spatial
information is the *gridded or raster data model*, where most of the
technical properties are defined by the grid cell size, i.e. the ground
resolution. Vector-based polygon maps can be converted to gridded maps
and *vice versa*, so in practice there are few meaningful differences
between the two models. In this book, to avoid any ambiguity, when
mentioning soil maps we will often refer to the spatio-temporal
reference and support size of the maps at the finest possible level of
detail. Below, for example, is a full list of specifications attached to
a *soil map* produced for the African continent [@Hengl2015AfSoilGrids250m]:
- *target variable*: soil organic carbon in permille;
- *values presented*: predictions (mean value);
- *prediction method*: 3D regression-kriging;
- *prediction depths*: 6 standard layers (0–5, 5–15, 15–30, 30–60,
60–100, 100–200 cm);
- *temporal domain (period)*: 1950–2005;
- *spatial support (resolution) of covariate layers*: 250 m;
- *spatial support of predictions*: point support (center of a grid
cell);
- *amount of variation explained by the spatial prediction model*: 45%.
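To make the notions of grid cell size and point support concrete, the sketch below computes the cell centres at which point-support predictions would be made on a 250 m grid, together with the thickness of the six standard depth layers listed above. The extent is hypothetical; a real product would use a specific projected coordinate system.

```r
# Hypothetical extent of a small tile in projected coordinates (metres)
xmin <- 0; xmax <- 1000; ymin <- 0; ymax <- 750
res  <- 250   # grid cell size (ground resolution), m

# Point-support predictions refer to the centre of each grid cell
x_centres <- seq(xmin + res / 2, xmax - res / 2, by = res)
y_centres <- seq(ymin + res / 2, ymax - res / 2, by = res)
cells <- expand.grid(x = x_centres, y = y_centres)
nrow(cells)   # 4 columns x 3 rows = 12 cells

# The six standard depth layers (cm) and their thickness
depth_top <- c(0, 5, 15, 30, 60, 100)
depth_bot <- c(5, 15, 30, 60, 100, 200)
thickness <- depth_bot - depth_top   # 5, 10, 15, 30, 40, 100
```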
Until recently, maps of individual soil properties, or of soil functions
or soil interpretations, were not considered to be true soil maps, but
rather, to be single-factor derivative maps or interpretive maps. This
is beginning to change and maps of the spatial pattern of distribution
of individual soil properties are increasingly being viewed as a
legitimate form of soil mapping.
### Predictive and automated soil mapping {#pedometric-mapping}
In contrast to traditional soil mapping, which is primarily based on
applying qualitative expert knowledge, the emerging, *‘predictive’* approach to soil
mapping is generally more quantitative and data-driven and based on the use of
statistical methods and technology
[@grunwald2005environmental; @Lagacherie2006Elsevier; @Hartemink2008Springer; @Boettinger2010Springer].
The emergence of new soil mapping methods is undoubtedly a reflection of newly
developed technologies and newly available global data layers, especially
those that are free and publicly distributed such as MODIS products,
SRTM DEM and similar (Fig. \@ref(fig:new-technologies)). PSM can be compared to, and shares similar concepts with, other applications of statistics and machine learning in physical geography, for example Predictive Vegetation Mapping [@Fran01; @Hengl2018PNV].
```{r new-technologies, echo=FALSE, fig.cap="Evolution of digital soil mapping parallels the emergence of new technologies and global, publicly available data sources.", out.width="100%",out.extra="angle=0"}
knitr::include_graphics("figures/Fig_new_technologies.png")
```
The objective of using pedometric techniques for soil mapping is to
develop and apply objective and optimal sets of rules to predict the
spatial distribution of soil properties and/or soil classes. Most
typically, rules are developed by fitting statistical relationships
between digital databases representing the spatial distribution of
selected environmental covariates and observed instances of a soil class
or soil property at geo-referenced sample locations. The environmental
covariate databases are selected as predictors of the soil attributes on
the basis of either expert knowledge of known relationships to soil
patterns or objective assessment of meaningful correlations with
observed soil occurrences. The whole process is amenable to complete
automation and documentation, which allows for *reproducible
research* (http://en.wikipedia.org/wiki/Reproducibility).
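A minimal sketch of this idea, assuming simulated data: a soil property observed at sample locations is related to two hypothetical covariates (elevation and a vegetation index) by a linear regression, and the fitted rule is then applied where only the covariates are known. Linear regression is used here only for simplicity; many other statistical and machine learning models can play the same role.

```r
set.seed(42)   # fixed seed so the example is reproducible

# Simulated (hypothetical) training data: covariate values at 50 sampled
# locations and an observed soil property (e.g. SOC in permille)
n <- 50
pts <- data.frame(
  elev = runif(n, 100, 900),   # elevation, m
  ndvi = runif(n, 0.1, 0.8)    # vegetation index
)
pts$soc <- 5 + 0.01 * pts$elev + 20 * pts$ndvi + rnorm(n, sd = 2)

# The "rule": a fitted statistical relationship between covariates and soil
fit <- lm(soc ~ elev + ndvi, data = pts)

# Apply the rule at unvisited locations where only covariates are known
new_cells <- data.frame(elev = c(200, 500, 800), ndvi = c(0.2, 0.5, 0.7))
new_cells$soc_pred <- predict(fit, newdata = new_cells)

summary(fit)$r.squared   # share of variation explained by the model
```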
Pedometric soil mapping typically follows six steps as outlined by
@McBratney2003Geoderma:
1. *Select soil variables (or classes) of interest and suitable
measurement techniques (decide what to map and describe)*;
2. *Prepare a sampling design (select the spatial locations of sampling
points and define a sampling intensity)*;
3. *Collect samples in the field and then estimate values of the target soil
variables at unknown locations to test and refine prediction
models*;
4. *Select and implement the most effective spatial prediction (or extrapolation)
models and use these to generate soil maps*;
5. *Select the most representative data model and distribution system*;
6. *Publish and distribute the soil information in the form of maps,
geographical databases and reports (and provide support to users)*.
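Step 2 can be illustrated with a small sketch that converts a target sampling intensity into a number of observations and then places them either at random or on a regular grid. The study area size and intensity are hypothetical values chosen for illustration.

```r
set.seed(1)
# Hypothetical 10 km x 10 km study area in projected coordinates (metres)
area_km2  <- 100
intensity <- 0.25                          # planned observations per km2
n_points  <- round(area_km2 * intensity)   # 25 observations

# Option 1: simple random sampling
srs <- data.frame(x = runif(n_points, 0, 10000),
                  y = runif(n_points, 0, 10000))

# Option 2: regular (systematic) grid with the same sampling intensity
spacing <- sqrt(area_km2 / n_points) * 1000   # 2000 m between points
g <- seq(spacing / 2, 10000 - spacing / 2, by = spacing)
grid_sample <- expand.grid(x = g, y = g)
```

The grid spreads observations evenly in geographic space, whereas random sampling supports design-based statistical inference; which is preferable depends on the purpose of the survey.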
```{block2 type="rmdnote"}
Differences between *conventional soil mapping* and *digital soil mapping* (or
*technology-driven or data-driven mapping*) relate primarily to the
degree of use of robust statistical methods in developing prediction
models to support the mapping process.
```
We recognize four classes of advanced soil mapping methods (B, C, D and E in
Fig. \@ref(fig:pedometric-mapping-vs-dsm)) which all belong to a
continuum of *digital soil mapping* methods [@malone2016using; @mcbratney2018pedometrics].
In this book we specifically promote the Class E soil mapping approach,
which we refer to as *predictive* and/or *automated soil mapping*.
```{r pedometric-mapping-vs-dsm, echo=FALSE, fig.cap="A classification of approaches to soil mapping: from purely expert driven (Class A), to various types of digital soil mapping including fully automated soil mapping (Class E).", out.width="85%"}
knitr::include_graphics("figures/Fig_pedometric_mapping_vs_DSM.png")
```
Some key advantages of the pedometric (statistical) approach to soil
mapping are that it is objective, systematic, repeatable, updatable and
represents an optimal expression of statistically validated
understanding of soil-environmental relationships in terms of the
currently available data.
There are, of course, also limitations with pedometric methods that
still require improvement. Firstly, the number of accurately
georeferenced locations of reliable soil observations (particularly with
analytical data) is often not sufficient to completely capture and
describe all significant patterns of soil variation in an area. There
may be too few sampled points and the exact location of available point
data may not be well recorded. Thus, data-driven soil mapping is
demanding in terms of field data, and collecting field data can require significant
expenditures of time, effort and money.
With legacy soil point data the sampling design, or rationale, used to
decide where to locate soil profile observation or sampling points is
often not clear and may vary from project to project or point to point.
Therefore there is no guarantee that available point data are actually
representative of the dominant patterns and soil forming conditions in
any area. Points may have been selected and sampled to capture
information about unusual conditions or to locate boundaries at points
of transition and maximum confusion about soil properties. Once a soil
becomes recognized as being widely distributed and dominant in the
landscape, many conventional field surveys elect not to record
observations when that soil is encountered, preferring to focus instead
on recording unusual or transition soils. Thus the population of
available legacy soil point observations may not be representative of
the true population of soils, with some soils being either over- or
under-represented.
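One simple way to check whether legacy points are representative, sketched here with hypothetical numbers, is to compare the share of points in each soil class with the share of the area that class occupies on an existing map. The class names and counts are invented; the chi-squared goodness-of-fit test is one possible formal check and assumes independent observations, which clustered legacy data often are not.

```r
# Hypothetical figures: share of the study area occupied by each soil
# class according to a polygon map, and counts of legacy profiles per class
map_share   <- c(Luvisols = 0.60, Gleysols = 0.25, Histosols = 0.15)
point_count <- c(Luvisols = 12,   Gleysols = 30,   Histosols = 18)

point_share <- point_count / sum(point_count)

# Ratio > 1: class over-represented in the point data; < 1: under-represented
representation <- point_share / map_share
round(representation, 2)   # Luvisols 0.33, Gleysols 2.00, Histosols 2.00

# Goodness-of-fit test of the point counts against the map area shares
chisq.test(point_count, p = map_share)
```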
```{block2 type="rmdnote"}
We define automated or predictive soil mapping as
a data-driven approach to soil mapping with little or no human
interaction, commonly based on using optimal (where possible)
statistical methods that elucidate relationships between target soil
variables (sampled in the field and geolocated) and covariate layers,
primarily coming from remote sensing data.
```
A second key limitation of the automated approach to soil
mapping is that there may be no obvious relationship between observed
patterns of soil variation and the available environmental covariates.
This may occur when a soil property of interest does, indeed, strongly covary
with some mappable environmental covariate (e.g. soil clay content with
airborne radiometric data) but data for that environmental covariate are
not available for an area. It may also transpire that the pattern of
soil variation is essentially not predictable or related to any known
environmental covariate, available or not. In such cases, only closely
spaced, direct field observation and sampling is capable of detecting
the spatial pattern of variation in soils because there is no, or only a
very weak, correlation with available covariates [@kondolf2003tools].
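The situation in which a useful covariate is simply missing can be illustrated with simulated data. In the hypothetical example below, clay content is generated from airborne radiometric potassium, so it correlates strongly with that layer but hardly at all with slope; if the radiometric layer were unavailable, only the weakly related covariate would remain.

```r
set.seed(7)
# Simulated (hypothetical) data: clay depends on radiometric K, not on slope
n <- 100
radiometric_k <- runif(n, 0.5, 3)
slope         <- runif(n, 0, 30)   # degrees
clay <- 10 + 8 * radiometric_k + rnorm(n, sd = 3)   # clay content, %

# Pearson correlation of the target with each candidate covariate
r <- c(radiometric_k = cor(clay, radiometric_k),
       slope         = cor(clay, slope))
round(r, 2)
```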
### Comparison of conventional and pedometric or predictive soil mapping {#comparison-conventional-pm}
There has been a tendency to view conventional soil mapping and
automated soil mapping as competing and non-complementary approaches. In
fact, they share more similarities than differences. Indeed, they can be
viewed as end members of a logical continuum. Both rely on applying the
underlying idea that the distribution of soils in the landscape is
largely predictable (the deterministic part) and, where it is not
predictable, it must be revealed through intensive observation, sampling
and interpolation (the stochastic part).
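The split into a deterministic and a stochastic part can be written compactly. The decomposition below is a commonly used formulation, often called the universal model of soil variation, and is given here as a sketch of the idea rather than as a definition used elsewhere in this section:

$$
Z(\mathbf{s}) = m(\mathbf{s}) + \varepsilon'(\mathbf{s}) + \varepsilon''
$$

where $Z(\mathbf{s})$ is the value of the soil variable at location $\mathbf{s}$, $m(\mathbf{s})$ is the deterministic component (for example a trend explained by environmental covariates), $\varepsilon'(\mathbf{s})$ is the spatially correlated stochastic component that has to be revealed through sampling and interpolation, and $\varepsilon''$ is spatially uncorrelated noise such as measurement error.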
In most cases, the basis of prediction is to relate the distribution of
soils, or soil properties, in the landscape to observable environmental
factors such as topographic position, slope, aspect, underlying parent
material, drainage conditions, patterns of climate, vegetation or land
use and so on. This is done manually and empirically (subjectively) in
conventional soil survey, while in automated soil mapping it is done
objectively and mostly in an automated fashion. At the time it was
developed, conventional soil survey lacked both the digital data sets of
environmental covariates and the statistical tools required to
objectively analyze relationships between observed soil properties and
environmental covariates. So, these relationships were, out of necessity,
developed empirically and expressed conceptually as expert knowledge.
In general, we suggest that next-generation soil surveyors will
increasingly benefit from having a solid background in statistics and computer
science, especially in machine learning and artificial intelligence (AI). However, effective selection and application of
appropriate statistical sampling and analysis techniques can also benefit from
consideration of expert knowledge.
### Top-down versus bottom-up approaches: subdivision versus agglomeration {#top-down}
There are two fundamentally different ways to approach the production of
soil maps for areas of larger extent, whether by conventional or
pedometric means. For ease of understanding we refer to these two
alternatives here as *“bottom-up”* versus *“top-down”*. @Rossiter2001
refers to a synthetic approach that he calls the *“bottom-up”* or *“name
and then group”* approach versus an analytic approach that he calls the
*“top-down”* or *“divide and then name”* approach.
The bottom up approach is agglomerative and synthetic. It is implemented
by first collecting observations and making maps at the finest possible
resolution and with the greatest possible level of detail. Once all
facts are collected and all possible soils and soil properties, and
their respective patterns of spatial distribution, are recorded, these
detailed data are generalized at successively coarser levels of
generalization to detect, analyse and describe broader scale (regional
to continental) patterns and trends. The fine detail synthesized to
extract broader patterns leads to the identification and formulation of
generalizations, theories and concepts about how and why soils organize
themselves spatially. The bottom-up approach makes little or no use of
generalizations and theories as tools to aid in the conceptualization
and delineation of mapping entities. Rather, it waits until all the
facts are in before making generalizations. The bottom-up approach tends
to be applied by countries and organizations that have sufficient
resources (people and finances) to make detailed field surveys feasible
to complete for entire areas of jurisdiction. Soil survey activities of
the US National Cooperative Soil Survey (NCSS) primarily adopt this
bottom-up approach. Other smaller countries with significant resources
for field surveys have also adopted this approach (e.g. Netherlands,
Denmark, Cuba). The bottom-up approach was, for example, used in the
development and elaboration of the US Soil Taxonomy system of
classification and of the US SSURGO (1:20,000) and STATSGO (1:250,000)
soil maps [@ZHONG2011491].
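For gridded data, the successive generalization step of the bottom-up approach can be sketched as block averaging, where each coarse cell takes the mean of the fine cells it contains. The function `aggregate_blocks()` below is a hypothetical helper written only for this illustration; for real rasters, packages such as terra provide an `aggregate()` function for the same purpose. Soil classes would need a majority rule rather than a mean.

```r
# Hypothetical fine-resolution map: 4 x 6 grid of a soil property
fine <- matrix(1:24, nrow = 4, ncol = 6)

# Generalize to a coarser grid by averaging non-overlapping blocks of
# f x f fine cells (f = aggregation factor)
aggregate_blocks <- function(m, f) {
  stopifnot(nrow(m) %% f == 0, ncol(m) %% f == 0)
  rows <- (seq_len(nrow(m)) - 1) %/% f   # block index of each row
  cols <- (seq_len(ncol(m)) - 1) %/% f   # block index of each column
  out <- matrix(NA_real_, nrow(m) / f, ncol(m) / f)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      out[i, j] <- mean(m[rows == i - 1, cols == j - 1])
    }
  }
  out
}

coarse <- aggregate_blocks(fine, 2)   # 2 x 3 matrix of block means
```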
The top-down approach is synoptic, analytic and divisive. It is
implemented by first collecting just enough observations and data to
permit construction of generalizations and theoretical concepts about
how soils arrange themselves in the landscape in response to controlling
environmental variables. Once general theories are developed about how
environmental factors influence how soils arrange themselves spatially,
these concepts and theories are tested by using them to predict what
types of soils are likely to occur under similar conditions at
previously unvisited sites. The theories and concepts are adjusted in
response to initial application and testing until such time as they are
deemed to be reliable enough to use for production mapping. Production
mapping proceeds in a divisive manner by stratifying areas of interest
into successively smaller, and presumably more homogeneous, areas or
regions through application of the concepts and theories to available
environmental data sets. The procedures begin with a synoptic overview
of the environmental conditions that characterize an entire area of
interest. These conditions are then interpreted to impose a hierarchical
subdivision of the whole area into smaller, and more homogeneous
subareas. This hierarchical subdivision approach owes its origins to
early Russian efforts to explain soil patterns in terms of the
geographical distribution of observed soils and vegetation. The top-down approach tends
to be applied preferentially by countries and agencies that need to
produce maps for very large areas but that lack the people and resources
to conduct detailed field programs everywhere (see e.g.
@Henderson2004Geoderma and @Mansuy201459). Many of these divisive
hierarchical approaches adopt principles and methods associated with the
ideas of Ecological Land Classification [@rowe1981ecological] (in
Canada) or Land Systems Mapping [@gibbons1964study; @rowan1990land] (in
Australia).
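The hierarchical subdivision of the top-down approach can be sketched as nested stratification of grid cells by environmental covariates: first by a broad climatic factor, then by landform within each climatic zone. The covariates, thresholds and class names are hypothetical and serve only to show the divisive, hierarchical logic.

```r
# Hypothetical grid cells with two environmental covariates
cells <- data.frame(
  precip = c(300, 350, 900, 950, 1200, 400, 1100, 800),  # mm/yr
  slope  = c(2,   15,  3,   20,  8,    25,  1,    12)    # degrees
)

# Level 1: divide the whole area by a broad climatic factor
cells$zone <- cut(cells$precip, breaks = c(-Inf, 600, Inf),
                  labels = c("dry", "humid"))

# Level 2: subdivide each zone by landform (slope class)
cells$landform <- cut(cells$slope, breaks = c(-Inf, 10, Inf),
                      labels = c("plain", "hill"))

# Hierarchical stratum label, e.g. "dry/hill"
cells$stratum <- interaction(cells$zone, cells$landform, sep = "/",
                             lex.order = TRUE, drop = TRUE)
table(cells$stratum)
```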
As observed by @Rossiter2001 *“neither approach is usually applied in
its pure form”* and most approaches to soil mapping use both approaches
simultaneously, to varying degrees. Similarly, it can be argued that PSM provides
support for both approaches to soil mapping. PSM implements two
activities that bear similarities to these approaches. Firstly, PSM
uses *all* available soil profile data globally as input to initial
global predictions at coarser resolutions (*“top-down”* mapping).
Secondly, PSM is set up to ingest finer resolution maps produced via
detailed *“bottom-up”* mapping methods and to merge these more detailed
maps with initial, coarser-resolution predictions [@ramcharan2018soil].
## Sources of soil data for soil mapping
### Soil data sources targeted by PSM
PSM aims at integrating and facilitating exchange of global soil data.
Most (global) soil mapping initiatives currently rely on capture and use
of *legacy soil data*. This raises several questions. What is meant by
legacy soil data? What kinds of legacy soil data exist? What are the
advantages and limitations of the main kinds of legacy soil data?
In its most general sense, a legacy is something of value bequeathed
from one generation to the next. It can be said that global soil legacy
data consists of the sum of soil data and knowledge accumulated since
the first soil investigations 100 or more years ago [@arrouays2017soil]. More specifically,
the concept of a legacy is usually accompanied by an understanding that
there is an obligation and duty of the recipient generation to not
simply protect the legacy but to make positive and constructive use of
it.
```{block2 type="rmdnote"}
Four main groups of legacy data of
interest for global soil mapping are: (1) soil field records, (2) soil
polygon maps and legends, (3) soil-landscape diagrams and sketches, (4)
soil (profile) photographs.
```
In the context of soils, legacy soil data consist of the sum total of
data, information and knowledge about soils accumulated since soils were
first studied as independent natural objects. At its broadest, this
includes information about soil characteristics and classification, soil
use and management, soil fertility, soil bio-chemistry, soil formation,
soil geography and many other sub-disciplines.
In the more focused context of PSM, we are primarily interested in
four main kinds of legacy soil data:
- *Soil field observations and measurements* — Observations and
analytical data obtained for soils at point locations represent a
primary type of legacy soil data. These point source data provide
objective evidence of observed soil characteristics at known
locations that can be used to develop knowledge and rules about how
soils, or individual soil properties, vary across the landscape. The
quality and precision of these data can vary greatly. Some data
points might be accurately located, or geo-referenced, while others
might have very coarse geo-referencing (for example coordinates
rounded to decimal minutes or kilometers). Some point data might
only have a rough indication of the location obtained from a report
(for example *‘2 km south of village A’*), or might even
lack geo-referencing. Soil profile descriptions can be obtained from
pits (relatively accurate) or auger bores (less accurate). Soil
attributes can be determined in the laboratory (relatively accurate)
or by hand-estimation in the field (less accurate). Legacy point
data is characterized by great variation in precision, accuracy,
completeness, relevance and age. It needs to be used with caution
and with understanding of how these issues affect its potential use.
- *Soil (polygon) maps and legends* — Soil maps and legends are one of
the primary means by which information and knowledge about how soils
vary spatially have been observed, distilled, recorded and presented
to users. Soil maps provide lists, or inventories, of soils that
occur in mapped regions, illustrate the dominant spatial patterns
displayed by these listed soils and provide information to
characterize the main properties of these soils. Soil maps can themselves
be used as sources of evidence to develop knowledge and quantitative rules about how soils,
or individual soil properties, vary across the landscape. On the
other hand, similar to soil observations, soil maps can also exhibit
significant errors with respect to measurement, classification,
generalization, interpretation and spatial interpolation.
- *Tacit expert soil knowledge* — In the context of soils, tacit
expert knowledge represents a diffuse domain of information about
the characteristics and spatial distribution of soils that has not
been captured and recorded formally or explicitly. It may reside in
the minds and memories of experts who have conducted field and
laboratory studies but have been unable to record all their
observations in a formal way. It may be captured informally and
partially in maps, legends, conceptual diagrams, block diagrams,
generalized decision rules and so on. Tacit knowledge represents
soft data, in comparison to the harder data of point observations
and maps.
- *Photographs* — Traditional soil survey is heavily based on use of
aerial photographs. Older aerial photographs (even if
not stereoscopic) are an important resource for land degradation
monitoring and vegetation succession studies. Field photographs of
soil profiles, soil sites and soil processes are another important
source of information that has been under-used for soil mapping.
ISRIC, for example, has an archive of over 30 thousand photographs
from various continents. Most of these can be geo-coded and
distributed via image-sharing web services such as Wikimedia,
Instagram and/or Flickr. In theory, even a single photograph of a
soil profile could be used to identify soil types, possibly
automatically, or even to estimate analytical soil properties.
Although predictions based on photographs alone would very likely be
fairly imprecise, such data could potentially help fill large gaps in
areas where there are simply no soil observations.
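The effect of coarse geo-referencing, mentioned in the first item above, can be quantified with a little arithmetic. The sketch assumes a spherical Earth, on which one minute of latitude is about 1852 m, and a hypothetical profile at 45° latitude whose coordinates were rounded to whole minutes.

```r
# Positional uncertainty caused by rounding legacy coordinates.
# Assumes a spherical Earth: 1 minute of latitude ~ 1852 m
m_per_deg_lat <- 1852 * 60   # ~111.1 km per degree of latitude
m_per_deg_lon <- function(lat) m_per_deg_lat * cos(lat * pi / 180)

# Rounding to the nearest whole minute shifts a point by up to half a minute
half_minute_deg <- 0.5 / 60
lat <- 45   # hypothetical latitude of the profile, degrees

err_ns <- half_minute_deg * m_per_deg_lat        # north-south, m
err_ew <- half_minute_deg * m_per_deg_lon(lat)   # east-west, m
round(c(north_south = err_ns, east_west = err_ew))   # 926 and 655
```

Such a point could lie several hundred metres from its recorded position, more than the width of a 250 m grid cell, so it cannot be reliably assigned to a single cell of such a grid.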
### Field observations of soil properties {#field-observations}
Perhaps the most significant, but certainly the most reliable, inputs to
soil mapping are the *field observations* (usually at point locations)
of descriptive and analytical soil properties
[@SSDS1993; @Schoeneberger1998]. This is the *hard data* or *ground
truth* in soil mapping [@Rossiter2001]. Field observations are also the
main input to spatial prediction modelling and the basis for assessment
of mapping accuracy. Other synthetically or empirically generated
estimates of values of target variables in the field are considered
*soft data* (data based on qualitative information or quick observations).
Soft data are less desirable as the primary input to model
estimation, but sometimes there is no alternative. It is in any case
important to recognize differences between *hard* and *soft* data and to
suggest ways to assess the uncertainty of models that are based on
either or both.
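One way to use hard and soft data together while accounting for their different reliability, sketched below with simulated data, is weighted least squares in which each observation is weighted by the inverse of its assumed measurement-error variance. The error standard deviations assigned to laboratory (hard) and field-estimated (soft) values are hypothetical; in practice they would have to be estimated.

```r
set.seed(3)
# Simulated (hypothetical) data: 30 laboratory measurements (hard data) and
# 30 field estimates (soft data) of clay content against one covariate
n <- 30
x_hard <- runif(n, 0, 10)
x_soft <- runif(n, 0, 10)
sd_hard <- 2   # assumed measurement-error SD of hard data
sd_soft <- 6   # assumed measurement-error SD of soft data
y_hard <- 15 + 2 * x_hard + rnorm(n, sd = sd_hard)
y_soft <- 15 + 2 * x_soft + rnorm(n, sd = sd_soft)

d <- data.frame(
  x = c(x_hard, x_soft),
  y = c(y_hard, y_soft),
  err_var = rep(c(sd_hard^2, sd_soft^2), each = n)
)

# Weighted least squares: weights = 1 / error variance, so the less
# precise soft data influence the fitted relationship less
fit_w <- lm(y ~ x, data = d, weights = 1 / err_var)
coef(fit_w)
```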
The object of observation and description of a soil is almost always a
soil profile or *pedon*. Officially, a soil pedon is defined as a body
of soil having a limited horizontal extent of no more than 1–2 m in the
horizontal dimension and a vertical dimension ($d$) that typically extends to only
1–2 m but may occasionally extend to greater depths. In practice, the vast
majority of soil profile data pertain to soil observations and samples
collected over very limited horizontal dimensions (10–50 cm) and down to
maximum depths of 1–2 m.
In geostatistical terms, soil observations are most commonly collected at
point support, meaning that they are representative of a point in space
with very limited horizontal extent. It is relatively rare to encounter
legacy soil profile data collected over larger horizontal extents and
bulked to create a sample representative of a larger volume of soil that
can be treated as providing block support for statistical purposes. On
the other hand, there is increasing interest in soil predictions at
varying support sizes (e.g. 1 ha), for which composite sampling can be used.
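The difference between point and block support can be illustrated by simulation. The sketch below assumes 200 hypothetical point values within a 1 ha block and treats a composite sample as the mean of 10 randomly chosen subsamples, a simplification that assumes thorough mixing and an additive property. Because the simulated values are spatially independent, the reduction in variance is larger than would typically be seen in spatially correlated soils.

```r
set.seed(11)
# Simulated (hypothetical) point-support SOC values (permille) at
# 200 random locations inside a single 1 ha block
point_values <- rnorm(200, mean = 20, sd = 5)

# Composite (bulked) sample: 10 subsamples mixed into one sample, whose
# value is approximated here by the mean of the subsamples
composite <- function(values, k) mean(sample(values, k))

# Repeat the composite sampling many times to see how much it varies
composites <- replicate(1000, composite(point_values, 10))
c(sd_point = sd(point_values), sd_composite = sd(composites))
```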
In the vertical dimension, soil profiles are usually described and
sampled with respect to *genetic soil horizons*, which are identifiable
layers in the soil that reflect differences in soil development or
depositional environments. Less frequently, soils are described and
sampled in the vertical dimension with respect to arbitrary depth
intervals or layers, e.g. at fixed depths such as 10, 20, 30, 40,
$\ldots$ cm.
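Horizon-based descriptions are often converted to fixed depth intervals before mapping. A minimal sketch, assuming a hypothetical profile and a simple thickness-weighted average, is shown below; the helper `depth_interval_mean()` is written only for this illustration, and more refined methods such as mass-preserving (equal-area) splines are also used for this purpose.

```r
# Hypothetical profile described by genetic horizons (depths in cm)
hz <- data.frame(top    = c(0, 12, 35, 70),
                 bottom = c(12, 35, 70, 120),
                 soc    = c(25, 12, 6, 3))   # e.g. SOC in permille

# Thickness-weighted mean of a property over one fixed depth interval:
# each horizon contributes in proportion to its overlap with the interval
depth_interval_mean <- function(hz, upper, lower, var) {
  overlap <- pmax(0, pmin(hz$bottom, lower) - pmax(hz$top, upper))
  if (sum(overlap) == 0) return(NA_real_)
  sum(overlap * hz[[var]]) / sum(overlap)
}

standard <- data.frame(upper = c(0, 5, 15, 30), lower = c(5, 15, 30, 60))
standard$soc <- mapply(depth_interval_mean,
                       upper = standard$upper, lower = standard$lower,
                       MoreArgs = list(hz = hz, var = "soc"))
standard   # soc: 25.0, 21.1, 12.0, 7.0
```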
```{block2 type="rmdnote"}
A soil profile record is a set of field
observations of the soil at a location — a collection of descriptive and
analytical soil properties attached to a specific location, depth and