Abstract: The comparison of benchmark error sets is an essential tool for the evaluation of theories in computational chemistry. The standard ranking of methods by their Mean Unsigned Error is unsatisfactory for several reasons linked to the non-normality of the error distributions and the presence of underlying trends. Complementary statistics have recently been proposed to palliate such deficiencies, such as quantiles of the absolute error distribution or the mean prediction uncertainty. We introduce here a new score…
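The contrast between the Mean Unsigned Error and a quantile of the absolute errors can be illustrated on synthetic data. The sketch below (all numbers invented, not taken from any benchmark in the paper) computes both statistics for a deliberately heavy-tailed error distribution, where the two summaries tell different stories:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical benchmark errors (method minus reference); a heavy-tailed,
# non-normal Student-t distribution is chosen on purpose.
errors = rng.standard_t(df=3, size=1000)

mue = np.mean(np.abs(errors))             # Mean Unsigned Error
q95 = np.quantile(np.abs(errors), 0.95)   # 95th percentile of absolute errors

print(f"MUE = {mue:.3f}, Q95 = {q95:.3f}")
```

For heavy-tailed errors the 95% quantile sits well above the MUE, which is exactly the kind of information a single mean-based ranking hides.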
“…In some instances, prediction errors due to model inadequacy can be handled by statistical correction of predictions, which may provide a reliable uncertainty measure [20]. Various surrogate methods have been developed for the estimation of prediction uncertainty, such as bootstrap-based methods, Gaussian process regression, neural networks and deep learning ensembles [21][22][23]. Gaussian process regression has been employed to identify particular calculations within a given dataset for which the uncertainties exceed a given threshold [24,25].…”
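As a minimal illustration of the bootstrap-based route mentioned in the snippet, the sketch below (synthetic data; not the cited authors' implementation) estimates a systematic correction to model predictions and attaches a bootstrap uncertainty to it:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical calibration set: reference values and biased, noisy predictions.
y_ref = rng.normal(0.0, 1.0, size=200)
y_pred = y_ref + rng.normal(0.1, 0.3, size=200)
residuals = y_ref - y_pred

# Bootstrap the mean residual (the statistical correction) to obtain an
# uncertainty estimate for that correction.
n_boot = 2000
boot_means = np.array([
    rng.choice(residuals, size=residuals.size, replace=True).mean()
    for _ in range(n_boot)
])
correction = residuals.mean()
u_correction = boot_means.std(ddof=1)
print(f"correction = {correction:.3f} +/- {u_correction:.3f}")
```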
Molecular dynamics simulation is now a widespread approach for understanding complex systems on the atomistic scale. It finds applications from physics and chemistry to engineering, life and medical science. In the last decade, the approach has begun to advance from being a computer-based means of rationalizing experimental observations to producing apparently credible predictions for a number of real-world applications within industrial sectors such as advanced materials and drug discovery. However, key aspects concerning the reproducibility of the method have not kept pace with the speed of its uptake in the scientific community. Here, we present a discussion of uncertainty quantification for molecular dynamics simulation designed to endow the method with better error estimates that will enable it to be used to report actionable results. The approach adopted is a standard one in the field of uncertainty quantification, namely using ensemble methods, in which a sufficiently large number of replicas are run concurrently, from which reliable statistics can be extracted. Indeed, because molecular dynamics is intrinsically chaotic, the need to use ensemble methods is fundamental and holds regardless of the duration of the simulations performed. We discuss the approach and illustrate it in a range of applications from materials science to ligand–protein binding free energy estimation.
This article is part of the theme issue ‘Reliability and reproducibility in computational science: implementing verification, validation and uncertainty quantification in silico’.
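The ensemble strategy described in the abstract reduces, for a scalar observable, to a few lines of statistics: run many independent replicas and report the ensemble mean with its standard error. The sketch below uses synthetic placeholder numbers in place of actual replica simulations:

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical observable (e.g. a binding free energy, kcal/mol) from 25
# replica simulations differing only in their initial velocities.
n_replicas = 25
replica_values = rng.normal(-10.0, 1.5, size=n_replicas)

mean = replica_values.mean()
# Replicas are independent, so the usual sigma/sqrt(n) standard error applies.
sem = replica_values.std(ddof=1) / np.sqrt(n_replicas)
print(f"estimate = {mean:.2f} +/- {sem:.2f}")
```

Because chaotic trajectories decorrelate replicas rapidly, this independence assumption is precisely what makes the ensemble estimate well behaved, regardless of how long each individual simulation runs.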
“…Overall, we consider the Gaussian fit a reasonable approximation to the empirical distribution, which is also reflected by the respective 95% confidence intervals: σ_0.95 = 1.70×10⁻¹ (normal distribution), Q_0.95 = 1.76×10⁻¹ (empirical distribution). The latter quantity refers to the distribution of absolute values of the residuals [28,29].…”
Section: Model Dispersion vs Measurement Uncertainty
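The comparison quoted above, a normal-theory bound against the empirical 95% quantile of the absolute residuals, can be reproduced on synthetic data. The residual scale below is illustrative only, chosen so that 1.96σ lands near the quoted 1.7×10⁻¹:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical, approximately normal fit residuals.
residuals = rng.normal(0.0, 0.087, size=5000)

sigma = residuals.std(ddof=1)
sigma95 = 1.96 * sigma                       # normal-theory 95% bound on |residual|
q95 = np.quantile(np.abs(residuals), 0.95)   # empirical 95% quantile of |residual|

print(f"sigma95 = {sigma95:.3f}, Q95 = {q95:.3f}")
# Close agreement between the two values indicates the Gaussian fit is adequate,
# mirroring the argument in the quoted passage.
```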
Herbert Mayr’s research on reactivity scales tells a success story of how polar organic synthesis can be rationalized by a simple empirical relationship. In this work, we propose an extension to Mayr’s reactivity approach that is rooted in uncertainty quantification (UQ). It transforms the unique values of the reactivity parameters (s_N, N, E) into value distributions. Through uncertainty propagation, these distributions can be exploited to quantify the uncertainty of bimolecular rate constants. Our UQ-based extension serves three purposes. First, predictions of polar organic reactivity can be transformed into testable hypotheses, which increases the overall reliability of the method and guides the exploration of new research directions. Second, it is also possible to quantify the discriminability of two competing reactions, which is particularly important if subtle reactivity differences matter. Third, since rate-constant uncertainty can also be quantified for reactions that have yet to be observed, new opportunities arise for benchmarking computational chemistry methods (benchmarking under uncertainty). We demonstrate the functionality and performance of the UQ-extended reactivity approach using the example of the 2001/12 reference data set released by Mayr and co-workers [J. Am. Chem. Soc. 2001, 123, 9500; J. Am. Chem. Soc. 2012, 134, 13902]. As a by-product of the new approach, we obtain revised reactivity parameters for the electrophiles and nucleophiles of the reference set.
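The uncertainty propagation described in the abstract can be sketched as a Monte Carlo pass through Mayr’s relation log₁₀ k = s_N (N + E). The parameter values and standard uncertainties below are invented for illustration; they are not the revised parameters reported in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
# Illustrative (hypothetical) nucleophile parameters (s_N, N) and
# electrophile parameter E, each with a standard uncertainty.
sN, u_sN = 0.90, 0.05
N,  u_N  = 13.4, 0.3
E,  u_E  = -7.0, 0.2

# Monte Carlo propagation through log10(k) = s_N * (N + E), assuming
# independent Gaussian parameter distributions.
n = 100_000
samples = (rng.normal(sN, u_sN, n)
           * (rng.normal(N, u_N, n) + rng.normal(E, u_E, n)))

logk = samples.mean()
u_logk = samples.std(ddof=1)
print(f"log10(k) = {logk:.2f} +/- {u_logk:.2f}")
```

The resulting spread on log₁₀ k is exactly the kind of distribution that turns a point prediction into the testable hypothesis the abstract describes.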
“…This lack of correlation supports the main message of this work: the number of fitted parameters is not an effective measure of the transferability of a functional. More reliable statistical criteria, such as those developed in this work or, alternatively, the probabilistic performance estimator recently introduced by Pernot and Savin [91,92], should be used to evaluate the reliability of new and existing xc functionals.…”
Section: Evaluation of 60 Exchange-Correlation Functionals
Counting parameters has become customary in the density functional theory community as a way to infer the transferability of popular approximations to the exchange-correlation functional. Recent work in data science, however, has demonstrated that the number of parameters of a fitted model is related neither to the complexity of the model itself nor to its eventual overfitting. Using similar arguments, we show here that it is possible to represent every modern exchange-correlation functional approximation using a single parameter. This procedure demonstrates the futility of the parameter count as a measure of transferability. To counteract this shortcoming, we introduce and analyze the performance of three statistical criteria for evaluating the transferability of exchange-correlation functionals: the Akaike information criterion, the Vapnik-Chervonenkis criterion, and the cross-validation criterion. The three criteria are used in a preliminary assessment to rank 60 exchange-correlation functional approximations using the ASCDB database of chemical data.
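Of the three criteria named in the abstract, the Akaike information criterion is the simplest to illustrate. The sketch below uses synthetic data (not the ASCDB database) and stand-in polynomial "models" in place of functionals; the AIC form is the standard Gaussian one, n·log(RSS/n) + 2k:

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical benchmark: 100 points with a genuinely linear trend.
n = 100
x = rng.uniform(-1.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, n)

def aic(y, y_hat, k):
    """Gaussian AIC: n*log(RSS/n) + 2k; lower is better."""
    rss = np.sum((y - y_hat) ** 2)
    return len(y) * np.log(rss / len(y)) + 2 * k

# Stand-in models: constant (1 parameter), line (2), quintic (6).
models = {}
for deg, k in [(0, 1), (1, 2), (5, 6)]:
    coefs = np.polynomial.polynomial.polyfit(x, y, deg)
    models[deg] = aic(y, np.polynomial.polynomial.polyval(x, coefs), k)

for deg, score in sorted(models.items()):
    print(f"degree {deg}: AIC = {score:.1f}")
```

The underfitting constant model is penalized through its large residuals, while the overparametrized quintic pays through the 2k term, which is the trade-off that a raw parameter count cannot capture.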