Abstract. Continental- to global-scale hydrologic and land surface models increasingly include representations of the groundwater system. Such large-scale models are essential for examining, communicating, and understanding the dynamic interactions between the Earth system above and below the land surface as well as the opportunities and limits of groundwater resources. We argue that both large-scale and regional-scale groundwater models have utility, strengths, and limitations, so continued modeling at both scales is essential and mutually beneficial. A crucial quest is how to evaluate the realism, capabilities, and performance of large-scale groundwater models given their modeling purpose of addressing large-scale science or sustainability questions as well as limitations in data availability and commensurability. Evaluation should identify if, when, or where large-scale models achieve their purpose or where opportunities for improvements exist so that such models better achieve their purpose. We suggest that reproducing the spatiotemporal details of regional-scale models and matching local data are not relevant goals. Instead, it is important to decide on reasonable model expectations regarding when a large-scale model is performing “well enough”
in the context of its specific purpose. Deciding on reasonable
expectations is necessarily subjective even if the evaluation criteria are
quantitative. Our objective is to provide recommendations for improving the
evaluation of groundwater representation in continental- to global-scale
models. We describe current modeling strategies and evaluation practices,
and we subsequently discuss the value of three evaluation strategies: (1) comparing model outputs with available observations of groundwater levels or
other state or flux variables (observation-based evaluation), (2) comparing
several models with each other with or without reference to actual
observations (model-based evaluation), and (3) comparing model behavior with
expert expectations of hydrologic behaviors in particular regions or at
particular times (expert-based evaluation). Based on evolving practices in
model evaluation as well as innovations in observations, machine learning,
and expert elicitation, we argue that combining observation-, model-, and
expert-based model evaluation approaches, while accounting for
commensurability issues, may significantly improve the realism of
groundwater representation in large-scale models, thus advancing our ability
to quantify, understand, and predict crucial Earth science
and sustainability problems. We encourage greater community-level
communication and cooperation on this quest, including among global
hydrology and land surface modelers, local to regional hydrogeologists, and
hydrologists focused on model development and evaluation.