Summary
This study introduces a hybrid spatial modelling framework that accounts for spatial non‐stationarity, spatial autocorrelation and environmental correlation. A set of spatially autocorrelated geographic Euclidean distance fields (EDFs) was used to provide additional spatially relevant predictors alongside the environmental covariates commonly used for mapping. The approach was used in combination with machine‐learning methods, so we called the method Euclidean distance fields in machine‐learning (EDM). This method provides advantages over other prediction methods that integrate spatial dependence and state factor models, for example, regression kriging (RK) and geographically weighted regression (GWR). We used seven generic EDFs and several commonly used predictors with different regression algorithms in two digital soil mapping (DSM) case studies and compared the results to those achieved with ordinary kriging (OK), RK and GWR as well as the multiscale methods ConMap, ConStat and contextual spatial modelling (CSM). The algorithms tested in EDM were a linear model, bagged multivariate adaptive regression splines (MARS), radial basis function support vector machines (SVM), Cubist, random forest (RF) and a neural network (NN) ensemble. The study demonstrated that DSM with EDM provided results comparable to RK and to the contextual multiscale methods. The best results were obtained with Cubist, RF and bagged MARS. Because the tree‐based approaches produce discontinuous response surfaces, the resulting maps can show visible artefacts when only the EDFs are used as predictors (i.e. no additional environmental covariates). Artefacts were not obvious for SVM and NN and were less pronounced for bagged MARS. An advantage of EDM is that it accounts for spatial non‐stationarity and spatial autocorrelation using only a small set of additional predictors.
The EDM is a new method that provides a practical alternative to more conventional spatial modelling and thus it enhances the DSM toolbox.
Highlights
We present a hybrid mapping approach that accounts for spatial dependence and environmental correlation.
The approach is based on a set of generic Euclidean distance fields (EDF).
Our Euclidean distance fields in machine learning (EDM) can model non‐stationarity and spatial autocorrelation.
The EDM approach eliminates the need for kriging of residuals and produces accurate digital soil maps.
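The idea of adding distance fields to the usual covariates can be illustrated with a minimal sketch. The abstract does not list the seven fields, so the set used here (X and Y coordinates, distances to the four corners of the bounding box and distance to its centre) is an assumption, and the function names, sample values and bounding box are hypothetical.

```python
import math

def euclidean_distance_fields(x, y, xmin, ymin, xmax, ymax):
    """Return seven distance-based covariates for one location.

    Assumed set (not spelled out in the abstract): the X and Y coordinates,
    the distances to the four corners of the study-area bounding box and
    the distance to its centre.
    """
    cx, cy = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    return {
        "x": x,
        "y": y,
        "d_sw": math.hypot(x - xmin, y - ymin),
        "d_se": math.hypot(x - xmax, y - ymin),
        "d_nw": math.hypot(x - xmin, y - ymax),
        "d_ne": math.hypot(x - xmax, y - ymax),
        "d_centre": math.hypot(x - cx, y - cy),
    }

def add_edfs(samples, bbox):
    """Append the EDF covariates to each sample's environmental covariates."""
    rows = []
    for sample in samples:
        row = dict(sample["covariates"])
        row.update(euclidean_distance_fields(sample["x"], sample["y"], *bbox))
        rows.append(row)
    return rows

# Hypothetical sampling locations with one environmental covariate each.
samples = [
    {"x": 0.0, "y": 0.0, "covariates": {"elevation": 312.0}},
    {"x": 30.0, "y": 40.0, "covariates": {"elevation": 298.5}},
]
rows = add_edfs(samples, bbox=(0.0, 0.0, 60.0, 80.0))
```

Each row in `rows` could then be passed to any regression algorithm as a feature vector; the same fields would have to be computed for every grid cell at prediction time.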
Soils are a limited resource and the largest terrestrial sink of organic carbon. In this respect, 3D modelling of soil organic carbon (SOC) offers substantial improvements in the understanding and assessment of the spatial distribution of SOC stocks. Previous three-dimensional SOC modelling approaches usually averaged each depth increment for multi-layer two-dimensional predictions. These models are therefore limited in their vertical resolution and thus in the interpretability of the soil as a volume, as well as in the accuracy of the SOC stock predictions. So far, only a few approaches have used spatially modelled depth functions for SOC predictions. This study implemented and evaluated an approach that compared polynomial, logarithmic and exponential depth functions using non-linear machine learning techniques, i.e. multivariate adaptive regression splines, random forests and support vector machines, to quantify SOC stocks spatially and by depth in the context of biodiversity and ecosystem functioning research. The legacy datasets used for modelling include profile data for SOC and bulk density (BD), sampled at five depth increments (0-5, 5-10, 10-20, 20-30, 30-50 cm). The samples were taken in an experimental forest in the Chinese subtropics as part of the biodiversity and ecosystem functioning (BEF) China experiment. Here we compared the depth functions by means of the results of the different machine learning approaches obtained from multi-layer 2D models as well as 3D models. The main findings were (i) that third-degree polynomials provided the best results for SOC and BD (R² = 0.99 and R² = 0.98; RMSE = 0.36% and 0.07 g cm⁻³). However, they did not adequately describe the general asymptotic trend of SOC and BD. In this respect the exponential (SOC: R² = 0.94; RMSE = 0.56%) and logarithmic (BD: R² = 0.84; RMSE = 0.21 g cm⁻³) functions provided more reliable estimates.
(ii) Random forests with the exponential function for SOC correlated better with the corresponding 2.5D predictions (R²: 0.96 to 0.75) than the third-degree polynomials (R²: 0.89 to 0.15), which support vector machines fitted best. We recommend against using polynomial functions with sparsely sampled profiles, as they have many turning points and tend to overfit the data of a given profile. This may limit their spatial prediction capacity. Instead, less adaptive functions with a higher degree of generalisation, such as exponential and logarithmic functions, should be used to spatially map sparse vertical soil profile datasets. We conclude that spatial prediction of SOC using exponential depth functions in conjunction with random forests is well suited for 3D SOC stock modelling and provides much finer vertical resolution than 2.5D approaches.
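The per-profile step of such an approach can be sketched as follows. This is only an illustration under stated assumptions: the exponential form SOC(z) = a · exp(−b · z), the log-linear least-squares fit, the use of increment midpoints and the SOC values are all hypothetical choices, not the study's actual procedure. In the study, the fitted coefficients would then be predicted spatially with a machine learning model.

```python
import math

def fit_exponential(depths_cm, values):
    """Fit value = a * exp(-b * depth) by linear least squares on log(value).

    Note: this log-linear fit is a simplification and is not identical to a
    non-linear least-squares fit on the untransformed values.
    """
    n = len(depths_cm)
    logs = [math.log(v) for v in values]
    mean_z = sum(depths_cm) / n
    mean_log = sum(logs) / n
    sxx = sum((z - mean_z) ** 2 for z in depths_cm)
    sxy = sum((z - mean_z) * (lv - mean_log) for z, lv in zip(depths_cm, logs))
    slope = sxy / sxx
    a = math.exp(mean_log - slope * mean_z)
    return a, -slope

# Midpoints of the five increments (0-5, 5-10, 10-20, 20-30, 30-50 cm).
midpoints = [2.5, 7.5, 15.0, 25.0, 40.0]
# Hypothetical SOC contents (%) for one profile.
soc_percent = [3.1, 2.2, 1.4, 0.9, 0.5]

a, b = fit_exponential(midpoints, soc_percent)

# Evaluate the fitted function at 1 cm resolution (centre of each 1 cm slice).
profile_1cm = [a * math.exp(-b * (z + 0.5)) for z in range(50)]
```

Because only two coefficients describe the whole profile, the function cannot oscillate between sampled depths, which is the generalisation property the abstract argues for.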
Soil organic C (SOC) and soil moisture (SM) affect the agricultural productivity of soils. For sustainable food production, knowledge of the horizontal as well as vertical variability of SOC and SM at field scale is crucial. Machine learning models using depth-related data from multiple electromagnetic induction (EMI) sensors and a gamma-ray spectrometer can provide insights into this variability of SOC and SM. In this work, we applied weighted conditioned Latin hypercube sampling to select 25 representative soil profile locations for sampling and modeling, based on geophysical measurements on the surveyed agricultural field. Ten additional random profiles were used for independent model validation. Soil samples were taken from four equal depth increments of 15 cm each. These were used to fit polynomial and exponential functions that reproduce the vertical trends of SOC and SM as soil depth functions. We modeled the coefficients of the soil depth functions spatially with Cubist and random forests, using the geophysical measurements as environmental covariates. The spatial prediction of the depth functions provides three-dimensional (3D) maps at the field scale. The main findings are (a) the 3D models of SOC and SM had low errors; (b) the polynomial function provided better results than the exponential function, as the vertical trends of SOC and SM did not decrease uniformly; and (c) the spatial prediction of SOC and SM with Cubist provided slightly lower error than with random forests. Hence, we recommend modeling the second-degree polynomial with Cubist for 3D prediction of SOC and SM at field scale.
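How spatially predicted coefficients become a 3D map can be sketched in a few lines. The example assumes that a model has already predicted the three coefficients of a second-degree polynomial for every grid cell; the coefficient values, grid size and function names are hypothetical.

```python
def evaluate_polynomial(coeffs, depth_cm):
    """Evaluate c0 + c1*z + c2*z**2 + ... at one depth using Horner's scheme."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * depth_cm + c
    return result

def build_voxels(coeff_grid, depths_cm):
    """Turn a 2D grid of coefficient tuples into a [layer][row][col] stack."""
    return [
        [[evaluate_polynomial(cell, z) for cell in row] for row in coeff_grid]
        for z in depths_cm
    ]

# Hypothetical predicted coefficients (c0, c1, c2) for a 2 x 2 grid.
coeff_grid = [
    [(1.8, -0.04, 0.0003), (2.1, -0.05, 0.0004)],
    [(1.5, -0.03, 0.0002), (1.9, -0.04, 0.0003)],
]

# Midpoints of four 15 cm increments.
depths = [7.5, 22.5, 37.5, 52.5]
voxels = build_voxels(coeff_grid, depths)
```

Any list of depths can be passed to `build_voxels`, so the vertical resolution of the resulting stack is not tied to the sampled increments.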
The soil organic carbon (SOC) pool of the Northern Hemisphere contains about half of the global SOC stored in soils. As the Arctic is exceptionally sensitive to global warming, rising temperatures and prolonged summers lead to deeper thawing of permafrost-affected soils and might progressively increase greenhouse gas emissions. To assess the overall feedback of soil organic carbon stocks (SOCS) to global warming in permafrost-affected regions, the spatial variation in SOCS at different environmental scales is of great interest. However, sparse and unequally distributed soil data sets at various scales in such regions result in highly uncertain estimates of SOCS for the Northern Hemisphere, and particularly for Greenland. The objectives of this study are to compare and evaluate three controlling factors for SOCS distribution (vegetation, landscape, aspect) at two different scales (local, regional). The regional scale reflects the different environmental conditions between the two study areas at the coast and the ice margin. On the local scale, characteristics of each controlling factor in the form of defined units (vegetation units, landscape units, aspect units) are used to describe the variation in SOCS over short distances within each study area, where the variation in SOCS is high. On the regional scale, we investigate the variation in SOCS by comparing the same units between the study areas. The results show that, in both study areas, SOCS amount to 8 kg m⁻² in the uppermost 25 cm and 16 kg m⁻² in the first 100 cm of the soil, i.e., 3 to 6 kg m⁻² (37.5%) higher than existing large-scale estimates of SOCS in West Greenland. Our approach allows the scale-dependent importance of the controlling factors to be ranked within and between the study areas. However, vegetation and aspect explain variations in SOCS better than landscape units. Therefore, we recommend vegetation and aspect for determining the variation in SOCS in West Greenland on both scales.
<p>Over the last decades, progressive glacier melting induced by climate change has been detected, which causes a rapid enlargement of ice-free areas in glacier forelands in Arctic, Antarctic and Alpine regions. These recently deglaciated areas represent highly dynamic environments in terms of vegetation development and soil formation. Tundra plant communities of glacier forelands mainly consist of cryptogamic species forming biological soil crusts (BSCs) on the surface. These BSCs are known to promote the accumulation of aeolian particles and organic material relevant to soil formation. It is important to understand both BSC development and soil formation in glacier forelands, as they are fundamental to the future development of mature tundra, which contributes to an increase in soil organic carbon (SOC) and nitrogen (N) stocks in soil. The heterogeneous terrain of glacier forelands affects the spatial variation in both soil and vegetation characteristics, which are additionally influenced by the distance to the glacier terminus. This study focuses on the spatial variation in soil and BSC characteristics in Arctic glacier forelands of Svalbard using multi-scale contextual soil mapping (CSM) and Euclidean distance fields (EDF). The data set comprises soil (SOC, N, texture) and BSC characteristics (species composition, percent cover) from 168 sampling locations as well as terrain covariates (elevation, slope, aspect, curvature) at several scales using CSM and spatial covariates (EDF). Random forests (RF) are used to analyse the relationships between the covariates and soil and BSC characteristics, respectively.</p><p>Preliminary results show good RF model quality (R&#178;/RMSE), which is similar for SOC (0.41/6.19) and N (0.44/0.22). Elevation, curvature and slope at large scales are the most important covariates to explain the spatial variation in SOC and N. 
At large scales, these covariates represent the distance to the glacier terminus and generally explain the increase in SOC and N with increasing distance from the glacier terminus. Additionally, elevation at small scales represents the signature of predominant geomorphological features (e.g. moraine topography) that are relevant to soil formation and BSC development. Analyses of the spatial variation and interrelationships of soil and BSC characteristics are still ongoing and further results will be presented at EGU 2020.</p>
<p>Land cover information plays an essential role in resource development, environmental monitoring and protection. Amongst other natural resources, soils and soil properties are strongly affected by land cover and land cover change, which can lead to soil degradation. Remote sensing techniques are very suitable for spatio-temporal land cover mapping and change detection. Remote sensing programs have established vast data archives. Machine learning applications provide appropriate algorithms to analyse such amounts of data efficiently and with accurate results. However, machine learning methods require specific sampling techniques and are usually designed for balanced datasets with an even training sample frequency. Most real-world datasets, though, are imbalanced, so methods that reduce the imbalance through synthetic sampling are required. Synthetic sampling methods increase the number of samples in the minority class and/or decrease the number in the majority class to achieve higher model accuracy. The Synthetic Minority Over-Sampling Technique (SMOTE) is a method used in many machine learning applications to generate synthetic samples and balance the dataset. In the middle Guadalquivir basin, Andalusia, Spain, we used random forests with Landsat images from 1984 to 2018 as covariates to map the land cover change with the Google Earth Engine. The sampling design was based on stratified random sampling according to the CORINE land cover classification of 2012. The land cover classes in our study were arable land, permanent crops (plantations), pastures/grassland, forest and shrub. Artificial surfaces and water bodies were excluded from modelling. However, the 130 training samples were imbalanced across classes. The classes pasture (7&#160;samples) and shrub (13&#160;samples) had fewer samples than the other classes (48, 47 and 16&#160;samples). This led to misclassifications and negatively affected the classification accuracy. 
Therefore, we applied SMOTE to increase the number of samples and the classification accuracy of the model. Preliminary results are promising and show an increase in classification accuracy, especially for the previously underrepresented classes pasture and shrub. This corresponds to the results of studies with other objectives, which also regard synthetic sampling methods as an improvement for the performance of classification frameworks.</p>
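<p>The core interpolation step of SMOTE can be sketched as follows. This is a simplified variant under stated assumptions, not the configuration used in the study: base samples are drawn at random rather than looped over, and the feature vectors, the number of neighbours <code>k</code> and the target class size are hypothetical.</p>

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of the base sample within the minority class.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the line from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical two-band feature vectors for a minority class with 7 samples.
pasture = [
    (0.21, 0.35), (0.24, 0.33), (0.19, 0.38), (0.26, 0.31),
    (0.22, 0.36), (0.20, 0.34), (0.25, 0.37),
]

# Oversample to a hypothetical target of 48 samples for this class.
new_samples = smote(pasture, n_new=48 - len(pasture), k=3)
balanced_pasture = pasture + new_samples
```

<p>Because every synthetic sample lies on a line segment between two existing minority samples, the new points stay within the range of the observed feature values.</p>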
<p>Soils and soil functions have been recognized as a key resource for human well-being throughout time. From an agricultural and forestry perspective, soil functions contribute to food and timber production. Other soil functions are related to freshwater security and energy provisioning. In general, the capacity of a soil to function within specific boundaries is summarised as soil quality. Knowledge about the spatial distribution of soil quality is crucial for sustainable land use and the protection of soils and their functions. This spatial knowledge can be obtained with accurate and efficient machine-learning-based soil mapping approaches, which allow the estimation of the soil quality at distinct locations. However, the vertical distribution of soil properties is usually neglected when assessing soil quality at distinct locations. To overcome such limitations, the depth function of soil properties needs to be incorporated in the modelling. This is important not only to get a better estimate of the overall soil quality throughout the rooting zone, but also to identify factors that limit plant growth, such as strong acidity or alkalinity, and the water holding capacity. Thus, the objective of this study was to model and map the soil quality indicators pH, soil organic carbon, sand, silt and clay content as a volumetric entity. The study area is located in southern Spain in the Province of Seville on the Guadalquivir river. It covers 1,000&#160;km<sup>2</sup> of farmland, citrus and olive plantations, pastures and wood pasture (Dehesa) in the Sierra Morena mountain range, on the Guadalquivir flood plain and on tertiary terraces. Soil samples were taken at 130 soil profiles at five depths (or fewer in shallow soils). The profile locations were selected by stratified random sampling according to slope position and land cover. 
We used a subset of 99 samples from representative soil profiles to assess all 513 samples with FT-IR spectroscopy and machine learning methods, and then modelled equal-area spline, polynomial and exponential depth functions for each soil quality indicator at each of the 130 profiles. These depth functions were modelled and predicted spatially with a comprehensive set of environmental covariates from remote sensing data, multi-scale terrain analysis and geological maps. By solving the spatially predicted depth functions with a vertical resolution of 5&#160;cm, we obtained a volumetric, i.e. three-dimensional, map of pH, soil organic carbon content and soil texture. Preliminary results are promising for volumetric soil mapping and the estimation of soil quality and limiting factors in three-dimensional space.</p>