Abstract. In this paper, we present and analyze a novel global database of
soil infiltration measurements, the Soil Water Infiltration Global (SWIG)
database. In total, 5023 infiltration curves were collected across all
continents in the SWIG database. These data were either provided and quality
checked by the scientists who performed the experiments or they were
digitized from published articles. Data from 54 different countries were
included in the database with major contributions from Iran, China, and the USA.
In addition to its extensive geographical coverage, the database spans
research from 1976 to late 2017. Basic information
on measurement location and method, soil properties, and land use was
gathered along with the infiltration data, making the database valuable for
the development of pedotransfer functions (PTFs) for estimating soil hydraulic
properties, for the evaluation of infiltration measurement methods, and for
developing and validating infiltration models. Soil textural information
(clay, silt, and sand content) is available for 3842 of the 5023 infiltration
measurements (∼76%), covering nearly all USDA soil textural classes
except for the sandy clay and silt classes. Information on land use is
available for 76% of the experimental sites, with agricultural land use as
the dominant type (∼40%). We are confident that the SWIG database
will allow for a better parameterization of the infiltration process in land
surface models and for testing infiltration models. All collected data and
related soil characteristics are provided online in
*.xlsx and *.csv formats for reference. The database is for public-domain
use only and may be copied freely, provided it is referenced. Supplementary
data are available at
https://doi.org/10.1594/PANGAEA.885492 (Rahmati et al., 2018). Data
quality assessment is strongly advised prior to any use of this database.
Finally, we encourage scientists to extend and update the SWIG database
by contributing new data.
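The statement that textural data cover nearly all USDA classes except sandy clay and silt can be checked mechanically by mapping each (clay, silt, sand) triple onto the USDA texture triangle. The sketch below assumes percentages by mass and uses the commonly published boundary rules of the triangle; the function names `usda_texture_class` and `missing_classes` are hypothetical helpers, not part of the SWIG files, and the input tolerance of 1 percentage point is an arbitrary choice.

```python
import math
from typing import Iterable, List, Tuple

# The 12 USDA soil textural classes, in the order they are tested below.
USDA_CLASSES = (
    "sand", "loamy sand", "sandy loam", "loam", "silt loam", "silt",
    "sandy clay loam", "clay loam", "silty clay loam",
    "sandy clay", "silty clay", "clay",
)


def usda_texture_class(clay: float, silt: float, sand: float) -> str:
    """Classify particle-size fractions (percent by mass) into a USDA class.

    Fractions are rescaled to sum to exactly 100 so the rules below cover
    the whole triangle; checks run in order, so an earlier class wins on a
    shared boundary.
    """
    total = clay + silt + sand
    if min(clay, silt, sand) < 0 or not math.isclose(total, 100.0, abs_tol=1.0):
        raise ValueError(f"expected non-negative fractions summing to ~100%, got {total}")
    clay, silt, sand = (100.0 * x / total for x in (clay, silt, sand))

    if silt + 1.5 * clay < 15:
        return "sand"
    if silt + 2 * clay < 30:
        return "loamy sand"
    if (7 <= clay < 20 and sand > 52) or (clay < 7 and silt < 50):
        return "sandy loam"
    if 7 <= clay < 27 and 28 <= silt < 50 and sand <= 52:
        return "loam"
    if (silt >= 50 and 12 <= clay < 27) or (50 <= silt < 80 and clay < 12):
        return "silt loam"
    if silt >= 80 and clay < 12:
        return "silt"
    if 20 <= clay < 35 and silt < 28 and sand > 45:
        return "sandy clay loam"
    if 27 <= clay < 40 and 20 < sand <= 45:
        return "clay loam"
    if 27 <= clay < 40 and sand <= 20:
        return "silty clay loam"
    if clay >= 35 and sand > 45:
        return "sandy clay"
    if clay >= 40 and silt >= 40:
        return "silty clay"
    return "clay"  # remaining region: clay >= 40, sand <= 45, silt < 40


def missing_classes(samples: Iterable[Tuple[float, float, float]]) -> List[str]:
    """List the USDA classes not represented by any (clay, silt, sand) sample."""
    present = {usda_texture_class(c, si, sa) for c, si, sa in samples}
    return [name for name in USDA_CLASSES if name not in present]
```

Applied to the 3842 textured records, `missing_classes` would be expected to return the two classes named in the abstract; the rescaling step is a design choice that tolerates small rounding errors in reported fractions while still rejecting clearly inconsistent rows.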
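Because the database is meant for developing and validating infiltration models, a typical first step is fitting a model to each cumulative infiltration curve. As one illustration only (the abstract does not prescribe a model), Philip's two-term equation I(t) = S·t^(1/2) + A·t is linear in its parameters, so an ordinary least-squares fit reduces to a 2×2 system. The function name `fit_philip` and the assumption that each curve is given as paired time and cumulative-infiltration lists are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_philip(times: Sequence[float], cum_inf: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Philip's equation I(t) = S*sqrt(t) + A*t.

    The model is linear in S (sorptivity) and A, so the 2x2 normal
    equations are solved directly with Cramer's rule. Units follow the
    inputs (e.g. t in h and I in mm give S in mm h^-1/2, A in mm h^-1).
    """
    if len(times) != len(cum_inf) or len(times) < 2:
        raise ValueError("need at least two (t, I) pairs of equal length")
    if any(t < 0 for t in times):
        raise ValueError("times must be non-negative")
    roots = [math.sqrt(t) for t in times]
    # Normal-equation sums; sum(sqrt(t)**2) is simply sum(t).
    s_rr = sum(times)
    s_rt = sum(r * t for r, t in zip(roots, times))
    s_tt = sum(t * t for t in times)
    s_ri = sum(r * i for r, i in zip(roots, cum_inf))
    s_ti = sum(t * i for t, i in zip(times, cum_inf))
    det = s_rr * s_tt - s_rt * s_rt
    if det <= 1e-12 * s_rr * s_tt:
        raise ValueError("time points too few or too similar to separate S and A")
    sorptivity = (s_ri * s_tt - s_rt * s_ti) / det
    a_coef = (s_rr * s_ti - s_rt * s_ri) / det
    return sorptivity, a_coef
```

Solving the normal equations in closed form keeps the sketch dependency-free; with real field curves, a library solver and a check of the residuals would be the more robust choice.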
Estimation of the soil organic carbon (SOC) content is of utmost importance in understanding the chemical, physical, and biological functions of the soil. This study evaluates several machine learning algorithms for predicting SOC: support vector machines (SVM), artificial neural networks (ANN), regression tree, random forest (RF), extreme gradient boosting (XGBoost), and a conventional deep neural network (DNN). The models were trained on 1879 composite surface soil samples with 105 auxiliary variables as predictors, and a genetic algorithm was used for feature selection to identify the most effective variables. Precipitation was the most important predictor, accounting for 14.9% of the spatial variability of SOC, followed by the normalized difference vegetation index (12.5%), the day temperature index of the moderate resolution imaging spectroradiometer (MODIS; 10.6%), multiresolution valley bottom flatness (8.7%), and land use (8.2%). Under 10-fold cross-validation, the DNN model was the best-performing algorithm, with the lowest prediction error and uncertainty: a mean absolute error of 0.59%, a root mean squared error of 0.75%, a coefficient of determination of 0.65, and a Lin’s concordance correlation coefficient of 0.83. SOC content was highest in the udic soil moisture regime class (mean 3.71%), followed by the aquic (2.45%) and xeric (2.10%) classes. Soils in dense forestlands had the highest SOC contents, whereas soils of younger geological age and alluvial fans had lower SOC. The proposed DNN (hidden layers = 7, size = 50) is a promising algorithm for handling large numbers of auxiliary data at the province scale. Owing to its flexible structure and its ability to extract more information from the auxiliary data surrounding the sampled observations, it produced the baseline SOC map with high accuracy and minimal uncertainty.
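The four accuracy measures reported for the DNN can be computed from paired observed and predicted SOC values as below. This is a sketch using standard textbook definitions; the study may have used variants (for example, R² as the squared Pearson correlation, or 1/(n−1) moments in Lin's coefficient), and the function name `regression_metrics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(obs: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE, R^2 (1 - SS_res/SS_tot) and Lin's concordance coefficient."""
    n = len(obs)
    if n != len(pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    errors = [p - o for o, p in zip(obs, pred)]
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    if ss_tot == 0:
        raise ValueError("observations are constant; R^2 and CCC are undefined")
    # Population (1/n) moments are used here; some implementations use 1/(n-1).
    var_o = ss_tot / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "R2": 1.0 - ss_res / ss_tot,
        # CCC penalizes both scatter and systematic bias (mean shift).
        "CCC": 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2),
    }
```

For example, predictions that are perfectly correlated with the observations but shifted by a constant still lose concordance: `regression_metrics([1, 2, 3, 4], [2, 3, 4, 5])` gives a CCC of about 0.71 and an R² of 0.2, even though the Pearson correlation is 1.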
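The 10-fold cross-validation behind these scores can be sketched as an index-splitting routine: every sample is held out exactly once, and the model is refit on the remaining nine folds each time. The abstract does not state how folds were formed, so this assumes plain random folds (spatially blocked folds are a common alternative for mapping studies); `k_fold_indices` and `cv_splits` are hypothetical names.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices 0..n-1 and deal them into k disjoint folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Dealing round-robin keeps fold sizes within one sample of each other.
    return [idx[i::k] for i in range(k)]


def cv_splits(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each sample is tested exactly once."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

With the 1879 samples mentioned above, this yields nine test folds of 188 samples and one of 187; pooling the held-out predictions across folds gives the observed/predicted pairs from which the error statistics are computed.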