“…Ignoring these assumptions and limitations can lead to contradictory results and confusion about model evaluation (Bennett et al., 2013; Castaneda‐Gonzalez et al., 2018; Criss & Winston, 2008; Knoben et al., 2019; Koskinen et al., 2017). Although several modifications of the existing metrics R² (Legates & McCabe Jr, 1999; Onyutha, 2022), NSE (Criss & Winston, 2008; Duc & Sawada, 2023; Mathevet et al., 2006), and KGE (Cinkus et al., 2023; Kling et al., 2012; Lamontagne et al., 2020; Liu, 2020; Pool et al., 2018) have been proposed, these revised versions have not been widely accepted, and there is still no broad consensus on which criterion is appropriate for evaluating the performance of hydrologic and hydraulic models, given the availability and accuracy of observed hydrologic data and the epistemic uncertainty in the modeling process (Beven & Lane, 2022; Clark et al., 2021; Huang & Merwade, 2023b; Knoben et al., 2019). Therefore, to evaluate the reliability and accuracy of flood model predictions, this study investigates and demonstrates the pros and cons of multiple commonly used evaluation metrics for ensemble flood modeling.…”