2019
DOI: 10.1021/acs.jctc.8b00959

Fast and Accurate Uncertainty Estimation in Chemical Machine Learning

Abstract: We present a scheme to obtain an inexpensive and reliable estimate of the uncertainty associated with the predictions of a machine-learning model of atomic and molecular properties. The scheme is based on resampling, with multiple models being generated based on sub-sampling of the same training data. The accuracy of the uncertainty prediction can be benchmarked by maximum likelihood estimation, which can also be used to correct for correlations between resampled models, and to improve the performance of the uncertainty estimation via a cross-validation procedure.
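The resampling scheme described in the abstract can be illustrated with a minimal sketch: train several regressors on random sub-samples of the training set and use the spread of their predictions as an uncertainty estimate. The sketch below is illustrative only; the choice of estimator (scikit-learn's KernelRidge), the generic feature matrix X and targets y, and the helper names are assumptions, not the authors' implementation.

    # Minimal sketch of sub-sampling (resampling) uncertainty estimation.
    # Assumptions: scikit-learn is available; X, y are a generic feature matrix
    # and target vector, not the descriptors used in the cited paper.
    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    def train_committee(X, y, n_models=8, subsample=0.7, seed=0):
        """Train n_models regressors, each on a random sub-sample of (X, y)."""
        rng = np.random.default_rng(seed)
        models = []
        n = len(y)
        for _ in range(n_models):
            idx = rng.choice(n, size=int(subsample * n), replace=False)
            models.append(KernelRidge(kernel="rbf", alpha=1e-6).fit(X[idx], y[idx]))
        return models

    def predict_with_uncertainty(models, X_new):
        """Committee mean as the prediction, committee variance as the
        (uncalibrated) uncertainty estimate."""
        preds = np.stack([m.predict(X_new) for m in models])  # (n_models, n_points)
        return preds.mean(axis=0), preds.var(axis=0)

The committee variance produced this way is generally miscalibrated because the resampled models share training data; the maximum-likelihood correction mentioned in the abstract rescales it, as sketched further below.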

Cited by 137 publications (142 citation statements). References: 57 publications.
“…The SOAP-GPR framework is robust, easily trained, has recently been generalised to the prediction of tensorial properties such as (anisotropic) chemical shielding tensors [70]. Furthermore, it provides accurate estimates of prediction uncertainty [67]. These are particularly important in this context, not only to estimate the reliability of assignments, but also because GIPAW calculations can at times yield unreliable results, and the ML model can be improved by automatically discarding problematic training data (see appendix A).…”
Section: Methods (mentioning)
confidence: 99%
“…where I and J run over the training structures. As detailed in [40], this strategy is, however, not very practical because of its computational expense, so other kinds of methods, such as bootstrapping or subsampling, can be used instead to estimate the prediction errors. In addition, in the particular case of the present work one would like to propagate the error that occurs in the prediction of the polarizability α to the Raman spectrum.…”
Section: Errors and Uncertainty Estimation (mentioning)
confidence: 99%
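The excerpt above points to propagating the model error on the polarizability through to the Raman spectrum. With a committee of resampled models, a natural way to do this is to push each member's polarizability prediction through the spectrum calculation and take the spread of the resulting spectra. The sketch below is a hedged illustration; the raman_spectrum function and the committee helpers are hypothetical placeholders, not the cited work's code.

    # Hypothetical sketch: propagate the committee spread on the polarizability
    # to the Raman spectrum by evaluating the spectrum for each committee member.
    import numpy as np

    def propagate_to_spectrum(models, X_new, raman_spectrum, frequencies):
        """raman_spectrum(alpha, frequencies) is a placeholder for whatever
        maps predicted polarizabilities to a Raman intensity profile."""
        spectra = []
        for m in models:
            alpha = m.predict(X_new)                 # one member's polarizability prediction
            spectra.append(raman_spectrum(alpha, frequencies))
        spectra = np.stack(spectra)                  # (n_models, n_frequencies)
        return spectra.mean(axis=0), spectra.std(axis=0)  # mean spectrum and its uncertainty band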
“…This is of course not true in general, so one needs to correct the model to take the underlying correlations into account. Following [40], a maximum likelihood recipe can be adopted to linearly scale the variance of the predictions by a constant factor ν². The calibration of this scaling factor is carried out by computing the actual prediction errors of the polarizabilities over a suitably selected validation set N_val, for which the reference polarizabilities are known, and then considering …, where σ²(j) are the variances of the predicted polarizabilities.…”
Section: Errors and Uncertainty Estimation (mentioning)
confidence: 99%
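The calibration equation referenced in the excerpt above did not survive extraction. A hedged reconstruction, consistent with the maximum-likelihood recipe the excerpt describes, is given below; the symbols y_j (reference value), ỹ_j (committee prediction) and the rescaled variance are assumed notation rather than a verbatim quote of the citing work.

    % Hedged reconstruction of the calibration formula dropped from the excerpt:
    % nu^2 rescales the committee variance so that, on average over the N_val
    % validation points, it matches the observed squared prediction errors.
    \nu^2 = \frac{1}{N_\mathrm{val}} \sum_{j=1}^{N_\mathrm{val}}
            \frac{\left( y_j - \tilde{y}_j \right)^2}{\sigma^2(j)},
    \qquad
    \tilde{\sigma}^2(j) = \nu^2 \, \sigma^2(j)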
“…Their scheme is based on resampling, wherein they generate multiple models based on subsampling of the same training data. They benchmark the accuracy of the uncertainty prediction by maximum likelihood estimation, which in turn can correct for correlations between resampled models and improve the performance of the uncertainty estimation via a cross-validation procedure [120]. By tracking model uncertainty during the MD simulation, a call to the high-fidelity (but expensive) DFT calculation can be made when the system drifts to a configuration where the model is uncertain in the energy/force prediction beyond a certain threshold [102,121,122].…”
Section: Iterative Improvement of FF Using Active and Transfer Learning (mentioning)
confidence: 99%
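The excerpt above describes the generic on-the-fly pattern: monitor the committee uncertainty along an MD trajectory and fall back to the reference calculation when it exceeds a threshold. The schematic sketch below assumes hypothetical interfaces (md_step, run_dft, retrain, predict_energy, predict_forces, and the threshold value are placeholders, not part of the cited works).

    # Schematic active-learning loop: trust the ML committee while its spread is
    # small; call the expensive reference method (e.g. DFT) and retrain otherwise.
    # md_step, run_dft, retrain and the model methods are hypothetical placeholders.
    import numpy as np

    UNCERTAINTY_THRESHOLD = 0.05  # assumed units, e.g. eV/atom

    def run_on_the_fly_md(models, structure, n_steps, md_step, run_dft, retrain):
        training_updates = []
        for step in range(n_steps):
            energies = np.array([m.predict_energy(structure) for m in models])
            if energies.std() > UNCERTAINTY_THRESHOLD:
                # Committee disagrees: fall back to the reference calculation
                # and add the new configuration to the training set.
                e_ref, f_ref = run_dft(structure)
                training_updates.append((structure, e_ref, f_ref))
                models = retrain(models, training_updates)
                energy, forces = e_ref, f_ref
            else:
                energy = energies.mean()
                forces = np.mean([m.predict_forces(structure) for m in models], axis=0)
            structure = md_step(structure, forces)
        return models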