An accurate understanding of the spatial distribution and variability of soil total nitrogen (TN) is critical for site-specific nitrogen management. Based on 4337 newly obtained soil observations and 33 covariates, this study applied the random forest (RF) algorithm and a modified regression kriging model (RF combined with residual kriging; hereafter RFK) to spatially predict and map topsoil TN content in agricultural areas of Henan Province, central China. According to the RFK prediction, topsoil TN content ranged from 0.52 to 1.81 g kg⁻¹, and farmland with topsoil TN contents of 1.00–1.23 g kg⁻¹ and 0.80–1.23 g kg⁻¹ accounted for 48.2% and 81.2% of the total farmland area, respectively. Spatially, topsoil TN in the study area was generally higher in the west and lower in the east. Using the Boruta variable selection algorithm, soil organic matter (SOM) and available potassium contents in topsoil, nitrogen deposition, average annual precipitation, livestock discharges, and topsoil pH were identified as the main factors driving the spatial distribution and variation of soil TN in the study area. Both the RF and RFK models performed as expected and achieved acceptable TN prediction accuracy, with RFK performing slightly better than RF: the R² and RMSE achieved by the RFK model were each improved by 4.5% compared with those of the RF model. However, the results suggest that RFK was inferior to the RF model in quantifying prediction uncertainty and thus may have a slight disadvantage in model reliability.
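The R² and RMSE comparison reported above can be illustrated with a short sketch. The abstract does not state how the 4.5% figures were derived, so the `relative_improvement` helper below is an assumption (percent change relative to the baseline model), and the function names are hypothetical rather than taken from the study.

```python
import math

def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean squared error between observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def relative_improvement(baseline, candidate, higher_is_better):
    """Percent improvement of a candidate metric over a baseline metric.

    Assumed definition: for R² a larger value is better, for RMSE a
    smaller value is better, and the change is scaled by the baseline.
    """
    change = (candidate - baseline) if higher_is_better else (baseline - candidate)
    return 100.0 * change / abs(baseline)
```

With this definition, a model would be compared against a baseline by calling `relative_improvement(r2_baseline, r2_candidate, True)` for R² and `relative_improvement(rmse_baseline, rmse_candidate, False)` for RMSE, using metrics computed on the same validation set.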
To accurately predict soil properties, various machine learning (ML) approaches and hybrid models constructed by integrating ML into the regression kriging framework were used to predict and map arable land topsoil pH in Henan Province, central China. Random forest (RF), cubist (Cu), support vector machine, artificial neural network, multiple linear regression, classification and regression trees (CART), and their hybrid models were compared in terms of pH prediction accuracy. Among the standalone ML models, RF had the best predictive performance on the metrics employed in this study, followed by Cu, and CART had the worst. Compared with their ML counterparts, the hybrid models improved the accuracy of topsoil pH prediction to varying extents. The accuracy improvement of hybrid models built on simple ML models was much greater than that of hybrid models built on complex ensemble ML models. Except for artificial neural network kriging, there was no significant difference among the hybrid models in their predictions of topsoil pH. The outputs of the best predictive models showed that weakly acidic and weakly alkaline soils were the dominant arable soils in the study region, accounting for more than 30% and more than 50% of the total arable land area, respectively. The topsoil pH of arable land was generally higher in the north of the study area than in the south. Boruta variable selection revealed that altitude, climatic covariates closely related to soil moisture availability, and some soil properties were the most critical factors affecting and controlling the topsoil pH of arable land.
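The hybrid idea described above (an ML trend prediction plus kriged residuals) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `trend` is a hypothetical stand-in for any fitted ML model such as RF, the exponential variogram parameters are fixed by assumption rather than fitted to an empirical variogram, and the sample values are toy numbers invented for illustration.

```python
import math

def exp_variogram(h, nugget=0.0, sill=1.0, rng=1.0):
    """Exponential semivariogram; the parameters are assumed, not fitted."""
    if h == 0.0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / rng))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def krige_residual(coords, residuals, target, variogram=exp_variogram):
    """Ordinary kriging of residuals at one target location."""
    n = len(coords)
    # Kriging system: semivariances between samples, bordered by the
    # unbiasedness constraint (weights sum to one).
    a = [[variogram(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(c, target)) for c in coords] + [1.0]
    weights = solve(a, b)[:n]  # drop the Lagrange multiplier
    return sum(w * r for w, r in zip(weights, residuals))

def hybrid_predict(trend, coords, covariates, observed, target_xy, target_cov):
    """Trend prediction plus kriged residual (regression kriging)."""
    residuals = [y - trend(x) for x, y in zip(covariates, observed)]
    return trend(target_cov) + krige_residual(coords, residuals, target_xy)

# Toy data, invented for illustration only.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
covariates = [100.0, 150.0, 200.0, 250.0]      # e.g. one covariate per site
observed = [6.4, 7.1, 7.6, 8.0]                # e.g. topsoil pH
trend = lambda cov: 6.0 + 0.008 * cov          # hypothetical stand-in for an ML model

print(hybrid_predict(trend, coords, covariates, observed, (0.5, 0.5), 175.0))
```

Because the nugget is assumed to be zero, the kriging step interpolates the residuals exactly at sampled locations, so the hybrid prediction reproduces the observed value there. In practice the trend would come from a fitted model and the variogram would be estimated from the residuals.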