When best is the enemy of good – critical evaluation of performance criteria in hydrological models

Cinkus, Guillaume; Mazzilli, Naomi; Jourde, Hervé; Wünsch, Andreas; Liesch, Tanja; Ravbar, Nataša; Chen, Zhao; Goldscheider, Nico

doi:10.5194/hess-2022-380

Cited by 5 publications

(5 citation statements)

References 34 publications


“…Overall, the overestimation in autumn and the underestimation in spring can still lead to an adequate Qmean over the entire time period. This and other counterbalancing errors (Cinkus et al., 2023) are one reason why KGE is unlikely to lead to adequate model structures that capture relevant hydrological mechanisms in a catchment.…”

Section: Discussion (mentioning)

confidence: 99%
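The counterbalancing effect described in the statement above can be illustrated with a small sketch. It uses the standard Kling-Gupta Efficiency decomposition (correlation r, variability ratio alpha, bias ratio beta); the seasonal flow values and the helper names `kge_components` and `relative_bias` are hypothetical and chosen only for illustration, not taken from any of the cited studies.

```python
import math

def kge_components(obs, sim):
    """Return (KGE, r, alpha, beta) using the standard KGE decomposition."""
    n = len(obs)
    mu_o = sum(obs) / n
    mu_s = sum(sim) / n
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in obs) / n)
    sd_s = math.sqrt(sum((s - mu_s) ** 2 for s in sim) / n)
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)) / n
    r = cov / (sd_o * sd_s)      # linear correlation
    alpha = sd_s / sd_o          # variability ratio
    beta = mu_s / mu_o           # bias ratio (simulated mean / observed mean)
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return kge, r, alpha, beta

def relative_bias(obs, sim):
    """Relative volume error of sim against obs over a sub-period."""
    return sum(sim) / sum(obs) - 1.0

# Hypothetical flows, three values per season: spring, summer, autumn, winter.
obs = [6.0] * 3 + [2.0] * 3 + [4.0] * 3 + [5.0] * 3
# Assumed simulation: too low in spring, too high in autumn by the same volume.
sim = [4.5] * 3 + [2.0] * 3 + [5.5] * 3 + [5.0] * 3

kge, r, alpha, beta = kge_components(obs, sim)
spring_bias = relative_bias(obs[0:3], sim[0:3])  # negative: underestimation
autumn_bias = relative_bias(obs[6:9], sim[6:9])  # positive: overestimation

print(f"beta over the whole period: {beta:.3f}")
print(f"spring bias: {spring_bias:+.1%}, autumn bias: {autumn_bias:+.1%}")
print(f"KGE: {kge:.3f}")
```

In this toy series the bias term of the KGE is perfect over the whole period even though each of the two seasons is off by a quarter or more, which is the kind of compensation an aggregated score cannot reveal on its own.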

“…Thus, it may be a sign of an adequate model choice or simply luck if a KGE calibrated model manages to reproduce a more nuanced understanding of hydrological processes as described for example, through the hydrological signatures tested. This is another reason why aggregated metrics have been criticized (Cinkus et al., 2023; Clark et al., 2021) and the calls for using additional methods (e.g., Bouaziz et al., 2021; Knoben et al., 2020; Pool et al., 2017) or better metrics (Fowler et al., 2018; Pool et al., 2018; Schwemmle et al., 2021) for evaluating model performance accumulate.…”

Section: Discussion (mentioning)

confidence: 99%


Investigating the Model Hypothesis Space: Benchmarking Automatic Model Structure Identification With a Large Model Ensemble

Spieler, Schütze. 2024. Water Resources Research


Selecting an appropriate model for a catchment is challenging, and choosing an inappropriate model can yield unreliable results. The Automatic Model Structure Identification (AMSI) method simultaneously calibrates model structural choices and model parameters, which reduces the workload of comparing different models. In this study we benchmark AMSI's capabilities in two ways, using 12 hydro‐climatically diverse Model Parameter Estimation Experiment catchments. First, we calibrate parameter values for 7,488 different model structures and test AMSI's ability to find the best‐performing models in this set. Second, we compare the performance of these 7,488 models and AMSI's selection to the performance of 45 commonly used, structurally more diverse, conceptual models. In both cases, we quantify model accuracy (through the Kling‐Gupta Efficiency) and model adequacy (through various hydrologic signatures). AMSI effectively identifies high‐accuracy models among the 7,488 options, with Kling‐Gupta‐Efficiency scores comparable to the best among the 45 models. However, model adequacy remains poor for the accurate models, regardless of the selection method. In nine of the tested catchments, none of the most accurate models replicate observed signatures with less than 50% errors; in the remaining three catchments, only a handful of models do so. This paper thus provides strong empirical evidence that relying on aggregated efficiency metrics is unlikely to result in hydrologically adequate models, no matter how the models themselves are selected. Nevertheless, AMSI has been shown to effectively search the model hypothesis space it was given. Combined with an improved calibration approach it can therefore offer new ways to address the challenges of model structure selection.


“…Previous studies (Ambroise et al, 1995; Refsgaard, 1997) have highlighted that calibrating a distributed hydrological model solely against single-point hydrological parameters may not yield satisfactory results for the entire catchment. The criteria to calibrate and evaluate the hydrological models play an important role in its performance (Cinkus et al, 2022). To achieve better performance, it is recommended to employ multiple variables and multiple site calibration strategies in distributed hydrological modelling.…”

Section: Calibration and Validation (mentioning)

confidence: 99%

Surface-subsurface interaction analysis and the influence of precipitation spatial variability on a lowland mesoscale catchment

Sardar, Ali, Popescu et al. 2023. Preprint


Abstract. The hydrology of the catchments is primarily shaped by the intricate and dynamic interactions between surface water and groundwater. This is particularly evident in lowland catchments, where these interactions assume a complex nature. This study investigated the complex interaction between surface water and groundwater in the transboundary catchment Aa of Weerijs, shared by the Netherlands and Belgium. A hydrological model, MIKE SHE coupled with MIKE 11, was calibrated and validated over twelve years using streamflow, groundwater levels, and evapotranspiration data. The model performance was analyzed using model efficiency parameters i.e., correlation coefficient and Nash-Sutcliffe Efficiency coefficient. The model performed well, with satisfactory simulations of streamflow, groundwater levels, and evapotranspiration dynamics. Groundwater levels rose in winter and declined from April to September due to increased evapotranspiration in summer. Precipitation drove the water balance, with 60 % lost through evapotranspiration. Base flow from subsurface drainage networks significantly contributed to river water. Spatial variability in precipitation minimally impacted streamflow but caused localized fluctuations in groundwater levels. Higher spatial resolution precipitation data led to fluctuations due to local recharge points, yet overall catchment hydrology was unaffected. The findings highlight the importance of surface water-groundwater interactions in lowland catchments. The developed model provides insights for water resource planning and climate change adaptation in the catchment.


“…Ignoring these assumptions and limitations can lead to contradictory results and confusion about model evaluation (Bennett et al, 2013; Castaneda‐Gonzalez et al, 2018; Criss & Winston, 2008; Knoben et al, 2019; Koskinen et al, 2017). Although several modifications regarding the existing metrics of R² (Legates & McCabe Jr, 1999; Onyutha, 2022), NSE (Criss & Winston, 2008; Duc & Sawada, 2023; Mathevet et al, 2006), and KGE (Cinkus et al, 2023; Kling et al, 2012; Lamontagne et al, 2020; Liu, 2020; Pool et al, 2018) have been proposed, these revised versions have not been widely accepted and there is still no broad consensus on how to evaluate the performance of hydrologic and hydraulic models by using an appropriate criterion given the availability and accuracy of observed hydrologic data, epistemic uncertainty in the modeling process (Beven & Lane, 2022; Clark et al, 2021; Huang & Merwade, 2023b; Knoben et al, 2019). Therefore, in order to evaluate the reliability and accuracy of flood model predictions, the pros and cons of multiple commonly used evaluation metrics for ensemble flood modeling are investigated and demonstrated in this study.…”

Section: Introduction (mentioning)

confidence: 99%

Beyond a fixed number: Investigating uncertainty in popular evaluation metrics of ensemble flood modeling using bootstrapping analysis

Huang, Merwade. 2024. J Flood Risk Management


Evaluation of the performance of flood models is a crucial step in the modeling process. Considering the limitations of single statistical metrics, such as uncertainty bounds, Nash Sutcliffe efficiency, Kling Gupta efficiency, and the coefficient of determination, which are widely used in the model evaluation, the inherent properties and sampling uncertainty in these metrics are demonstrated. A comprehensive evaluation is conducted using an ensemble of one‐dimensional Hydrologic Engineering Center's River Analysis System (HEC‐RAS) models, which account for the uncertainty associated with the channel roughness and upstream flow input, of six reaches located in Indiana and Texas of the United States. Specifically, the effects of different prior distributions of the uncertainty sources, multiple high‐flow scenarios, and various types of measurement errors in observations on the evaluation metrics are investigated using bootstrapping. Results show that the model performances based on the uniform and normal priors are comparable. The statistical distributions of all the evaluation metrics in this study are significantly different under different high‐flow scenarios, thus suggesting that the metrics should be treated as “random” variables due to both aleatory and epistemic uncertainties and conditioned on the specific flow periods of interest. Additionally, the white‐noise error in observations has the least impact on the metrics.
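The idea of treating an evaluation metric as a random variable can be sketched with a simple paired bootstrap of the Nash-Sutcliffe efficiency. This is a minimal illustration under stated assumptions, not the procedure of the study above: the data are synthetic, the function names `nse` and `bootstrap_interval` are hypothetical, and time steps are resampled independently, which ignores autocorrelation in real flow series.

```python
import math
import random

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of obs about its mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance

def bootstrap_interval(obs, sim, metric, n_boot=500, level=0.95, seed=0):
    """Percentile interval of a metric from resampled (obs, sim) pairs."""
    rng = random.Random(seed)
    n = len(obs)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        o = [obs[i] for i in idx]
        s = [sim[i] for i in idx]
        if len(set(o)) < 2:
            continue  # skip degenerate resamples with zero observed variance
        values.append(metric(o, s))
    values.sort()
    tail = (1.0 - level) / 2.0
    lower = values[int(tail * len(values))]
    upper = values[min(len(values) - 1, int((1.0 - tail) * len(values)))]
    return lower, upper

# Synthetic series (assumption): a smooth signal and a biased, noisy simulation.
noise = random.Random(42)
obs = [10.0 + 5.0 * math.sin(i / 3.0) for i in range(40)]
sim = [0.9 * o + 0.5 + noise.gauss(0.0, 0.8) for o in obs]

point = nse(obs, sim)
lower, upper = bootstrap_interval(obs, sim, nse)
print(f"NSE = {point:.3f}, 95% bootstrap interval = [{lower:.3f}, {upper:.3f}]")
```

Reporting the interval alongside the point value makes it visible how much of a metric's score depends on which time steps happen to be in the evaluation sample.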
