Models are used to predict and/or to investigate and explain phenomena in nature. Often, many competing hypotheses exist for these two tasks. Naturally, the question arises which of the competing modeling approaches predicts or explains nature best. Bayesian model selection (BMS, e.g., Wasserman, 2000) is a statistical method that uses observed data to select between competing models. BMS is set in a rigorous probabilistic framework and follows the scheme of Bayesian updating: a prior belief about the plausibility of each candidate model is updated, in the light of measured data, to a posterior model weight (i.e., the probability of the model to have generated the data, given the model set). Posterior model weights are then used as a basis for Bayesian model ranking, selection, or averaging (BMA, Hoeting et al., 1999).

To help with the interpretation of posterior model weights, the so-called model confusion matrix (MCM) has been introduced by Schöniger, Illman, et al. (2015). It reveals whether a lack of confidence in model choice is due to similarity between the candidate models or due to weakly informative data. The MCM is a purely synthetic analysis that can be used as a scale of reference for model weights obtained from real data. Schäfer Rodrigues Silva et al. (2020) have recently extended the MCM analysis to identify the best surrogate model from a set of candidates to replace an expensive full-complexity model in stochastic analysis.

Technically, the Bayesian updating procedure requires calculating the so-called Bayesian model evidence (BME). BME is the likelihood of a model to have generated the data, integrated over its whole parameter space and all involved probability distributions. While the likelihood accounts for uncertainty in measured data, the integration considers parameter uncertainty, and potentially also uncertainty in model drivers or boundary conditions.
In some cases, the integration even accounts for statistical representations of model errors (Leube et al., 2012; Nowak et al., 2012), which many studies instead treat as part of the likelihood.
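
To make the two steps described above concrete (integrating the likelihood over the prior to obtain BME, then updating prior model weights to posterior model weights), the following is a minimal sketch and not the procedure of any of the cited studies. It assumes a Gaussian likelihood with known measurement standard deviation and estimates BME by brute-force Monte Carlo averaging over prior samples, which is only one of several possible estimators. All function names, the two toy models, the priors, and the synthetic data are hypothetical and chosen purely for illustration.

```python
import numpy as np


def log_bme_monte_carlo(simulate, prior_sampler, data, sigma, n_samples, rng):
    """Estimate log(BME) as the log of the prior-averaged Gaussian likelihood.

    Assumptions (illustrative only): independent Gaussian measurement
    errors with known standard deviation `sigma`.
    """
    theta = prior_sampler(rng, n_samples)      # shape (n_samples, n_params)
    predictions = simulate(theta)              # shape (n_samples, n_data)
    residuals = data - predictions
    log_like = -0.5 * np.sum(
        (residuals / sigma) ** 2 + np.log(2.0 * np.pi * sigma**2), axis=1
    )
    # log-mean-exp: average the likelihoods without numerical underflow
    shift = log_like.max()
    return shift + np.log(np.mean(np.exp(log_like - shift)))


def posterior_model_weights(log_bme, prior_weights):
    """Bayesian updating of model weights: prior weight times BME, normalized."""
    log_post = np.asarray(log_bme) + np.log(np.asarray(prior_weights))
    log_post -= log_post.max()                 # stabilize before exponentiating
    weights = np.exp(log_post)
    return weights / weights.sum()


# --- Hypothetical toy setup: synthetic data from a linear relation ---------
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
sigma = 0.1
data = 2.0 * x + rng.normal(0.0, sigma, size=x.size)


def prior_sampler(rng, n):
    # Assumed prior: one parameter, normally distributed around zero
    return rng.normal(0.0, 3.0, size=(n, 1))


def linear_model(theta):
    return theta[:, :1] * x                    # y = theta * x


def constant_model(theta):
    return theta[:, :1] * np.ones_like(x)      # y = theta


log_bme = [
    log_bme_monte_carlo(model, prior_sampler, data, sigma, 20000, rng)
    for model in (linear_model, constant_model)
]
weights = posterior_model_weights(log_bme, prior_weights=[0.5, 0.5])
print(dict(zip(["linear", "constant"], weights)))
```

In this toy setup the data were generated by the linear relation, so the linear model should receive nearly all of the posterior weight. With less informative data (for example a larger `sigma` or fewer data points), the weights would be expected to move closer to the prior weights, which is the kind of effect the MCM is designed to diagnose.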