Probabilistic predictions provide a prediction interval, with a probability attached to each possible outcome, instead of a single predicted value. In multiple regression problems, this can be achieved by propagating the known uncertainties of the response-variable data through a Monte Carlo approach. This paper analyses the impact of the uncertainty in the training response variable on the prediction uncertainties by comparing the results with probabilistic predictions obtained with a quantile regression random forest. The result is a quantification of how data uncertainty affects the prediction. The approach is illustrated with the probabilistic regionalization of soil moisture derived from cosmic-ray neutron sensing measurements, which yields a regional-scale soil moisture map with data uncertainty quantification covering the Selke river catchment, eastern Germany.
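The Monte Carlo propagation described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: to stay within the Python standard library, a one-predictor least-squares line is swapped in for the random forest, the response noise is assumed Gaussian, and the names fit_line and monte_carlo_interval as well as all numbers are hypothetical.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (stand-in for the random forest)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def monte_carlo_interval(x, y, y_sigma, x_new, n_draws=500, seed=0):
    """Propagate response uncertainty into a prediction interval.

    Each draw perturbs every training response by its own (assumed Gaussian)
    uncertainty, refits the model, and predicts at x_new. The spread of the
    resulting ensemble reflects the training-data uncertainty only.
    """
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_draws):
        y_draw = [rng.gauss(yi, si) for yi, si in zip(y, y_sigma)]
        intercept, slope = fit_line(x, y_draw)
        predictions.append(intercept + slope * x_new)
    cuts = statistics.quantiles(predictions, n=20)  # cut points at 5 %, ..., 95 %
    return cuts[0], statistics.median(predictions), cuts[-1]


# Synthetic toy data (hypothetical): one predictor, soil moisture, its uncertainty.
predictor = [0.1, 0.3, 0.5, 0.7, 0.9]
soil_moisture = [0.12, 0.18, 0.22, 0.30, 0.33]
uncertainty = [0.02] * 5

lower, median, upper = monte_carlo_interval(predictor, soil_moisture, uncertainty, 0.6)
print(f"5 %: {lower:.3f}  median: {median:.3f}  95 %: {upper:.3f}")
```

With all uncertainties set to zero, every draw reproduces the same fit and the interval collapses to a single value, which is a quick sanity check that the interval width stems from the response uncertainty alone.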
<p>Information about soil water content (SWC) at adequate spatial and temporal resolution is in high demand for a variety of scientific and practical applications. Cosmic-Ray Neutron Sensing (CRNS) has become an established method for passive SWC data collection, providing SWC information over several hectares, either with stationary CRNS sensors (local continuous measurements) or with mobile CRNS roving (expanding the footprint on certain field campaign days). Recent approaches using automatic rail-based CRNS roving (Rail-CRNS) have extended the monitored area to the kilometer scale at high temporal resolution. While a pilot study on Rail-CRNS provided promising results along the railway track, currently at daily resolution, it also raised the question of how transferable these SWC data are to areas not directly adjacent to the footprints along the railway. In this study, we tested the performance of SWC regionalization by probabilistic predictions based on Rail-CRNS-derived SWC data. We applied a Monte Carlo approach within random forest regression, using static (e.g. topographical indices, soil properties) and dynamic (precipitation) predictors, and quantified the impact of these predictors on the prediction accuracy. Using daily SWC values from a ~9 km long railway in the Harz mountains, Germany, recorded by the Rail-CRNS between September 2021 and July 2022, we predicted the daily spatial SWC variation for an area of ~85 km&#178; and a period of 300 days on a 250 x 250 m grid. The resulting maps of gravimetric soil moisture showed realistic patterns for both spatial and temporal SWC variation. The maps resolved spatial variation related to land cover, seasonal SWC dynamics and the individual responses of single areas to wetting and drying periods. The data shown here cover a relatively narrow area because the Rail-CRNS training data are limited. Extending the approach through larger railway networks, future technical improvements and automation of the workflow has clear potential to offer near-real-time SWC products at the large scale (> 100 km).</p>
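<p>For a sense of scale, the area, grid size and period stated above imply roughly the following number of daily grid-cell predictions. The rounded figures are treated as exact here, which is an assumption made only for this estimate.</p>

```latex
N_{\text{cells}} \approx \frac{85\ \text{km}^2}{(0.25\ \text{km})^2} = 1360,
\qquad
N_{\text{cells}} \times 300\ \text{days} \approx 4.1 \times 10^{5}\ \text{cell-days}
```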
<p>Upscaling of soil water content (SWC) information towards the large scale (>10 km) is highly desired to address the increasing demand for SWC products in various sectors. Random forest (RF) regression has been suggested as a suitable method to generate large SWC maps from a limited number of observations. RF uses multiple prediction variables (predictors) to derive the missing values of a desired variable (e.g. SWC) based on their internal relationship. Cosmic ray neutron sensing (CRNS) is an alternative method for passive SWC mapping and monitoring, either with stationary CRNS sensors or with mobile CRNS roving. CRNS has an advantage over most classical hydrogeophysical approaches because its footprint covers the hectare scale and beyond, particularly for roving data, which qualifies CRNS data as suitable input for RF regression. However, CRNS roving data commonly contain a large amount of noise and outlier values, related to the statistical distribution of neutron counting, which hinders signal interpretation and could lower the performance of the RF regression. So far there are two ways to overcome the noise problem and to achieve higher data stability: (i) increasing the aggregation time, which decreases the signal uncertainty but also reduces the spatial resolution, and (ii) applying smoothing algorithms, e.g. interpolation or moving averages, which yield more stable values but do not solve the outlier problem.</p>
<p>We used SWC data from CRNS roving in the Selketal catchment in the Harz mountains, Germany, to test the performance of score criteria for an adaptive removal of potential outliers. The score criteria are internal test parameters that indicate how likely a value is to be an outlier. To this end, each observation was subjected to a group of queries that tested its conformity with the surrounding values using selected statistical parameters. Based on the total score of the queries, potentially unreliable observations were removed using various thresholds, and the remaining data were used as input for the RF regression. RF regression was performed using static (e.g. topographical indices, soil properties) and dynamic (precipitation) predictors to generate SWC maps for an area of ~2700 km&#178;. SWC input data were split into training (~2/3) and validation (~1/3) sets.</p>
<p>Preliminary results showed that applying the score criteria resulted in more stable spatial patterns and improved the R&#178; from 0.099 to 0.196, 0.266 and 0.308 for score 6, 4 and 3, respectively. The root mean squared error also decreased with stronger filtering, from 0.14 for the original dataset to 0.078 for score 3. However, the score 3 threshold omitted 22.4% of the data. Hence, an optimization that balances the amount of excluded data against the resulting improvement in prediction needs to be developed and tested. Accounting for the spatial relationship between the observations and weighting the score values according to their importance should further increase the performance. Due to its easy application and its adjustable criteria selection, the proposed filtering approach has the potential to become more popular in CRNS roving studies.</p>
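<p>The score-based filtering described above could look roughly like the following sketch. The abstract does not list the actual queries, so the three conformity checks, the along-track neighbourhood window, the rule that an observation is removed once its score reaches the threshold, and the names <code>outlier_scores</code> and <code>filter_by_score</code> are all assumptions made for illustration. The input values are synthetic.</p>

```python
import statistics


def outlier_scores(values, half_window=2, k=3.0):
    """Count, for each observation, how many conformity queries it fails.

    The queries are hypothetical examples; each compares the observation
    with its neighbours inside a moving window (the observation excluded).
    """
    scores = []
    for i, value in enumerate(values):
        start = max(0, i - half_window)
        stop = min(len(values), i + half_window + 1)
        neighbours = values[start:i] + values[i + 1:stop]

        median = statistics.median(neighbours)
        mad = statistics.median([abs(n - median) for n in neighbours])
        mean = statistics.fmean(neighbours)
        spread = statistics.pstdev(neighbours)

        failed = 0
        # Query 1: far from the neighbourhood median (robust check).
        failed += abs(value - median) > k * mad
        # Query 2: more than two standard deviations from the neighbourhood mean.
        failed += abs(value - mean) > 2.0 * spread
        # Query 3: outside the range spanned by the neighbours.
        failed += not (min(neighbours) <= value <= max(neighbours))
        scores.append(failed)
    return scores


def filter_by_score(values, threshold, **kwargs):
    """Keep observations whose score stays below the threshold (assumed rule)."""
    scores = outlier_scores(values, **kwargs)
    return [v for v, s in zip(values, scores) if s < threshold]


# Synthetic along-track soil moisture values with one obvious spike.
track = [0.20, 0.21, 0.19, 0.22, 0.55, 0.21, 0.20, 0.22, 0.19]
print(outlier_scores(track))
print(filter_by_score(track, threshold=3))
```

<p>Under these assumptions, lowering the threshold removes more observations, which mirrors the trade-off reported above between stronger filtering and the share of data that is lost.</p>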