The existence of water markets establishes water prices, promoting the trading of water from low- to high-valued uses. However, because water rights are heterogeneous, market participants can face uncertainty when setting asking and offering prices, which makes the market inefficient. This paper proposes three random forest regression (RFR) models to predict water prices in the western United States: a full variable set model and two reduced models whose optimal numbers of variables are selected with a backward variable elimination (BVE) approach. Transactions from 12 semiarid states between 1987 and 2009, together with a dataset of various predictors, were assembled. Multiple replications of k-fold cross-validation were applied to assess model performance, and the models' generalizability was tested on unused data. The importance of price-influencing factors was then analyzed based on two plausible variable importance rankings. Results show that the RFR models have good predictive power for water price and outperform a baseline model without overfitting. Also, the gain in accuracy of the reduced models is insignificant, reflecting the robustness of RFR to the inclusion of less informative variables. This study suggests that, due to their ability to automatically learn from and make predictions on data, RFR-based models can aid water market participants in making more efficient decisions.
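The workflow described above (a random forest regressor, backward variable elimination, and repeated k-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn is available, not the authors' implementation: the data are synthetic, the variable names (`signal_a`, `noise_a`, and so on) are hypothetical, and dropping the variable with the lowest impurity-based importance at each step is only one plausible way to carry out the elimination.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score


def cv_r2(X, y, seed=0):
    """Mean R^2 over repeated k-fold cross-validation (3 folds, 2 repeats)."""
    model = RandomForestRegressor(n_estimators=30, random_state=seed)
    folds = RepeatedKFold(n_splits=3, n_repeats=2, random_state=seed)
    return cross_val_score(model, X, y, cv=folds, scoring="r2").mean()


def backward_elimination(X, y, names, seed=0):
    """Drop the least important variable one at a time.

    Returns a list of (cv_score, kept_names), one entry per subset size.
    """
    keep = list(range(X.shape[1]))
    history = []
    while keep:
        history.append((cv_r2(X[:, keep], y, seed), [names[i] for i in keep]))
        if len(keep) == 1:
            break
        # Refit on the current subset and remove the lowest-ranked variable.
        forest = RandomForestRegressor(n_estimators=30, random_state=seed)
        forest.fit(X[:, keep], y)
        keep.pop(int(np.argmin(forest.feature_importances_)))
    return history


# Synthetic stand-in data (an assumption): only the first two columns
# influence the target; the remaining three are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=150)
names = ["signal_a", "signal_b", "noise_a", "noise_b", "noise_c"]

history = backward_elimination(X, y, names)
best_score, best_subset = max(history, key=lambda step: step[0])
print(f"best CV R^2 = {best_score:.3f} with variables {best_subset}")
```

Comparing the scores in `history` across subset sizes gives a rough sense of how little a random forest loses by keeping uninformative variables, which is the kind of robustness the abstract reports.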
Flood frequency analysis generally involves the use of simple parametric probability distributions to smooth and extrapolate the information provided by short flood records to estimate extreme flood flow quantiles. Parametric probability distributions can have difficulty simultaneously fitting both the largest and smallest floods. A danger is that the smallest observations in a record can distort the exceedance probabilities assigned to the large floods of interest. The identification and treatment of such Potentially Influential Low Floods (PILFs) frees a fitting algorithm to describe the distribution of the larger observations. This can allow parametric flood frequency analysis to be both efficient and robust to deviations from the proposed probability model's lower tail. Historically, PILF identification involved subjective judgement. We propose a new multiple Grubbs‐Beck outlier test (MGBT) for objective PILF identification. MGBT PILF identification rates (akin to Type I errors) are reported for the lognormal (LN) distribution and the log‐Pearson Type III (LP3) distribution with a variety of skew coefficients. MGBT PILF identification generally matched subjective identification from a recent California flood frequency study. Monte Carlo results show that censoring of PILFs identified by the MGBT algorithm improves the extreme quantile estimator efficiency of the expected moments algorithm (EMA) for negatively skewed LP3 distributions and has little effect for zero or positive skews; at the same time, it protects against deviations from the LP3 in the lower tail, as illustrated by distorted LN examples. Thus, MGBT generally makes flood frequency analysis based on the LP3 distribution with EMA both more accurate and more robust.
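For orientation, the idea of screening a record for unusually low floods can be illustrated with the classic single Grubbs-Beck test that the MGBT generalizes. The sketch below is not the MGBT itself, whose p-value computation is not described in this abstract. It assumes the commonly cited approximation to the one-sided 10% critical value associated with Bulletin 17B, and the annual peak flows are hypothetical values chosen only for illustration.

```python
import math
import statistics


def grubbs_beck_low_threshold(flows):
    """Low-outlier threshold from the single Grubbs-Beck test (about the 10% level)."""
    logs = [math.log10(q) for q in flows]
    n = len(logs)
    # Assumed approximation to the 10% critical value K_N as a function of n.
    k_n = -0.9043 + 3.345 * math.sqrt(math.log10(n)) - 0.4046 * math.log10(n)
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)  # sample standard deviation of the log flows
    return 10 ** (mean - k_n * sd)


def flag_low_floods(flows):
    """Return the flows that fall below the Grubbs-Beck low-outlier threshold."""
    threshold = grubbs_beck_low_threshold(flows)
    return [q for q in flows if q < threshold]


# Hypothetical annual peak flows with one very small observation.
peaks = [5, 800, 950, 1100, 1200, 1300, 1500, 1700,
         1900, 2200, 2600, 3100, 3800, 4500, 6000]

threshold = grubbs_beck_low_threshold(peaks)
low_floods = flag_low_floods(peaks)
print(f"threshold = {threshold:.1f}, flagged = {low_floods}")
```

A single-threshold test of this kind checks only whether the smallest observations fall below one cutoff; the abstract's point is that a multiple test can identify several low floods objectively, where that step previously relied on judgement.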