Calibration and generalizability of probabilistic models on low-data chemical datasets with DIONYSUS

Tom, Gary; Hickman, Riley J.; Zinzuwadia, Aniket; Mohajeri, Afshan; Sánchez-Lengeling, Benjamín; Aspuru‐Guzik, Alán

doi:10.1039/d2dd00146b

Cited by 16 publications (26 citation statements)

References 81 publications


“…Empirically, GPs tend to be effective surrogate models for Bayesian optimization of molecules in the small-data regime. 109…”

Section: Results (mentioning)

confidence: 99%

“…, require a small number of examples to learn to make accurate predictions), and (iii) express well-calibrated uncertainty. 109,113,114 The acquisition function for experimental planning must balance exploration, exploitation, and cost. The surrogate model and acquisition function must be cheap to train and evaluate, respectively, relative to the simulations/experiments to evaluate the material property.…”

Section: Discussion (mentioning)

confidence: 99%


Multi-fidelity Bayesian optimization of covalent organic frameworks for xenon/krypton separations

Gantzler, Deshwal, Doppa, et al. 2023

Digital Discovery



“…Severity of the Distribution Shifts. The ultimate severity measure of the training to deployment covariate shift is the gap in performance and uncertainty calibration 17 (see Section 4.6.1). Due to the cyclic nature of the drug discovery process, uncertainty calibration is specifically important during deployment to effectively balance between exploration and exploitation.…”

Section: Results (mentioning)

confidence: 99%

Real-World Molecular Out-Of-Distribution: Specification and Investigation

Tossou, Wognum, Craig, et al. 2024

J. Chem. Inf. Model.


This study presents a rigorous framework for investigating molecular out-of-distribution (MOOD) generalization in drug discovery. The concept of MOOD is first clarified through a problem specification that demonstrates how the covariate shifts encountered during real-world deployment can be characterized by the distribution of sample distances to the training set. We find that these shifts can cause performance to drop by up to 60% and uncertainty calibration by up to 40%. This leads us to propose a splitting protocol that aims to close the gap between the deployment and testing. Then, using this protocol, a thorough investigation is conducted to assess the impact of model design, model selection, and data set characteristics on MOOD performance and uncertainty calibration. We find that appropriate representations and algorithms with built-in uncertainty estimation are crucial to improving performance and uncertainty calibration. This study sets itself apart by its exhaustiveness and opens an exciting avenue to benchmark meaningful algorithmic progress in molecular scoring.
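To make the distance-based view of covariate shift described in this abstract concrete, here is a minimal Python sketch; it is not the MOOD authors' code. It assumes molecules are already encoded as binary fingerprints stored as sets of on-bit indices and uses Tanimoto distance as the metric (both assumptions), and the helper names `tanimoto_distance` and `nearest_train_distances` are hypothetical. For each query molecule it computes the distance to its closest training molecule; a set of query molecules whose distances sit well above those of the test split would indicate a shift.

```python
from typing import Iterable, List, Set


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """1 - |a & b| / |a | b| for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def nearest_train_distances(queries: Iterable[Set[int]], train: List[Set[int]]) -> List[float]:
    """Distance from each query fingerprint to its nearest training fingerprint."""
    return [min(tanimoto_distance(q, t) for t in train) for q in queries]


# Toy fingerprints (hypothetical data, not taken from the study).
train = [{1, 2, 3, 4}, {2, 3, 5}, {7, 8}]
in_dist = [{1, 2, 3}, {2, 3, 5, 6}]
shifted = [{10, 11, 12}, {7, 20, 21}]

print(nearest_train_distances(in_dist, train))  # [0.25, 0.25]
print(nearest_train_distances(shifted, train))  # [1.0, 0.75]
```

In practice the fingerprints would come from a cheminformatics toolkit, and one would compare the full distributions of these distances (for example as histograms) rather than a handful of values.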


“…Existing research, however, suggests that the application of BO can still help reach promising results even in those scenarios. 49 Despite these challenges, we demonstrate that augmenting BO with adequate reaction representations, initialisation schemes and appropriate surrogate models results in an efficient search towards the best-performing additives in less than 100 evaluations while using as little as ten initialisation reactions.…”

Section: Introduction (mentioning)

confidence: 97%

Bayesian optimisation for additive screening and yield improvements – beyond one-hot encoding

Ranković, Griffiths, Moss, et al. 2024

Digital Discovery


Calibration and generalizability of probabilistic models on low-data chemical datasets with DIONYSUS

Abstract: A toolkit for the study of the calibration, performance, and generalizability of probabilistic models and molecular featurizations for low-data chemical datasets.
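As a minimal sketch of one common way to check the calibration of a probabilistic regression model, here is a Python example. It is not DIONYSUS's API and not necessarily the metric the paper uses. It assumes each prediction is a Gaussian with a mean and a standard deviation, and compares the nominal confidence level of central prediction intervals with the fraction of targets that actually fall inside them. The function names `interval_coverage` and `mean_calibration_gap` are hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence


def interval_coverage(y_true: Sequence[float], mean: Sequence[float],
                      std: Sequence[float], levels: Sequence[float]) -> List[float]:
    """Observed fraction of targets inside the central Gaussian interval at each nominal level."""
    std_normal = NormalDist()
    observed = []
    for p in levels:
        z = std_normal.inv_cdf(0.5 + p / 2)  # interval half-width in units of the predicted std
        inside = sum(abs(y - m) <= z * s for y, m, s in zip(y_true, mean, std))
        observed.append(inside / len(y_true))
    return observed


def mean_calibration_gap(levels: Sequence[float], observed: Sequence[float]) -> float:
    """Average |observed - nominal| coverage; 0 means calibrated at the chosen levels."""
    return sum(abs(o - p) for p, o in zip(levels, observed)) / len(levels)


# Toy predictions (hypothetical data, not from the paper).
y_true = [0.1, 0.5, -1.0, 3.0]
mean = [0.0, 0.0, 0.0, 0.0]
std = [1.0, 1.0, 1.0, 1.0]
levels = [0.5, 0.9]

observed = interval_coverage(y_true, mean, std, levels)
print(observed)  # [0.5, 0.75]
print(mean_calibration_gap(levels, observed))
```

With four toy points the numbers are only illustrative; a real check would use a held-out set and many evenly spaced levels, and the gap between the nominal and observed curves summarizes over- or under-confidence.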
