2020
DOI: 10.48550/arxiv.2010.01118
Preprint

Gaussian Process Molecule Property Prediction with FlowMO

Henry B. Moss,
Ryan-Rhys Griffiths

Abstract: We present FlowMO: an open-source Python library for molecular property prediction with Gaussian Processes. Built upon GPflow and RDKit, FlowMO enables the user to make predictions with well-calibrated uncertainty estimates, an output central to active learning and molecular design applications. Gaussian Processes are particularly attractive for modelling small molecular datasets, a characteristic of many real-world virtual screening campaigns where high-quality experimental data is scarce. Computational exper…
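To make the workflow described in the abstract concrete, the sketch below shows GP regression on molecular fingerprints using GPflow and RDKit directly. It is a minimal illustration only: the featurisation, kernel choice, and toy data are assumptions for demonstration, not FlowMO's actual API or recommended settings.

```python
# Illustrative sketch: GP regression on Morgan fingerprints with GPflow + RDKit.
# Not FlowMO's API; it only shows the kind of pipeline the library automates.
import numpy as np
import gpflow
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def featurise(smiles_list, n_bits=2048):
    """Convert SMILES strings to Morgan fingerprint feature vectors."""
    X = np.zeros((len(smiles_list), n_bits))
    for i, smi in enumerate(smiles_list):
        mol = Chem.MolFromSmiles(smi)
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
        arr = np.zeros((n_bits,))
        DataStructs.ConvertToNumpyArray(fp, arr)
        X[i] = arr
    return X

# Toy data: SMILES strings and made-up scalar property values (illustrative only).
smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
y = np.array([[-0.3], [2.1], [-0.2], [-0.1]])

X = featurise(smiles)
model = gpflow.models.GPR(data=(X, y), kernel=gpflow.kernels.RBF())
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

# Predictive mean and variance: the calibrated uncertainty that active learning
# and molecular design applications rely on.
mean, var = model.predict_f(featurise(["CCCO"]))
```

In practice a kernel suited to bit-vector representations would typically replace the generic RBF kernel used here.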

Cited by 13 publications (14 citation statements)
References 25 publications
“…Gaussian processes confer a Bayesian nonparametric framework to model general time-series data (Roberts et al 2013; Tobar et al 2015) and have proven effective in tasks such as periodicity detection (Durrande et al 2016) and spectral density estimation (Tobar 2018). More broadly, Gaussian processes (GPs) have recently demonstrated modeling success across a wide range of spatial and temporal application domains including robotics (Deisenroth & Rasmussen 2011; Greeff & Schoellig 2020), Bayesian optimization (Shahriari et al 2015; Grosnit et al 2020; Cowen-Rivers et al 2021; Grosnit et al 2021), as well as areas of the natural sciences such as molecular machine learning (Nigam et al 2021; Griffiths & Hernández-Lobato 2020; Moss & Griffiths 2020; Thawani et al 2020; Griffiths et al 2021; Hase et al 2020; Bartók et al 2010), genetics, and materials science (Cheng et al 2020; Zhang et al 2020). In the context of astrophysics there is a recent trend favoring nonparametric models such as GPs due to the flexibility afforded when specifying the underlying data modeling assumptions.…”
Section: Introduction
confidence: 99%
“…via active learning 84 and Bayesian optimisation. The confidence-error curves in the ESI† show initial promise in this direction and indeed understanding how best to tailor calibrated Bayesian models to molecular representations 65,85 is an avenue worthy of pursuit. We release our curated dataset and all code to train models at https://github.com/Ryan-Rhys/The-Photoswitch-Dataset in order that the photoswitch community may derive benefit from our work.…”
Section: Discussion
confidence: 94%
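The confidence-error curve referred to in this excerpt is a standard diagnostic for calibrated uncertainty: test predictions are ranked by predictive variance and the error is recomputed on increasingly confident subsets. Below is a minimal illustrative sketch; the function name and arrays are hypothetical and do not come from the cited paper's code.

```python
# Hypothetical sketch of a confidence-error curve for a probabilistic regressor.
# Inputs are arrays of ground-truth values, predictive means, and predictive variances.
import numpy as np

def confidence_error_curve(y_true, y_mean, y_var, n_points=20):
    """Return (fraction retained, RMSE) pairs when keeping only the most
    confident predictions, i.e. those with the smallest predictive variance."""
    order = np.argsort(y_var)  # most confident (lowest variance) first
    fractions, rmses = [], []
    for frac in np.linspace(0.05, 1.0, n_points):
        k = max(1, int(frac * len(y_true)))
        idx = order[:k]
        rmses.append(np.sqrt(np.mean((y_true[idx] - y_mean[idx]) ** 2)))
        fractions.append(frac)
    return np.array(fractions), np.array(rmses)
```

For a well-calibrated model the RMSE should rise as the retained fraction grows, since the least confident predictions are added last.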
“…Practical advantages of GPs for molecular datasets include the fact that they have few hyperparameters to tune and maintain uncertainty estimates over property values. [63][64][65] A GP is defined as a collection of random variables, {f(x₁), f(x₂), …}, any finite subset of which are distributed according to a multivariate Gaussian.…”
Section: Gaussian Processes
confidence: 99%
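The definition quoted above can be written out explicitly. The following uses standard GP notation (e.g. Rasmussen & Williams) and is not taken from the cited papers:

```latex
% A Gaussian process f ~ GP(m, k) is fully specified by a mean function m and a
% covariance (kernel) function k. For any finite set of inputs x_1, ..., x_n the
% corresponding function values are jointly multivariate Gaussian:
\begin{equation}
  \begin{pmatrix} f(x_1) \\ \vdots \\ f(x_n) \end{pmatrix}
  \sim \mathcal{N}\!\left(
    \begin{pmatrix} m(x_1) \\ \vdots \\ m(x_n) \end{pmatrix},\; K
  \right),
  \qquad K_{ij} = k(x_i, x_j).
\end{equation}
```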
“…As for molecular property prediction, the SPOC models gave results comparable with literature models on these datasets. [38] Taking the FreeSolv dataset as an example (Figure 6b), SPOC performs significantly better (test RMSE 1.03 kcal/mol) than the Extended-Connectivity fingerprint (ECFP) based models (Kernel Ridge Regression (KRR), RF, GP, [39] XGBoost [40]), and better than most of the graph-based models (Directed acyclic graph model (DAG), [41] Graph Convolutional model (GC), Information Maximizing Graph Neural Networks (EIGNN), [42] Weave, Message Passing Neural Network (MPNN) [43] and baseline Chemception models [44]), and is inferior only to specifically designed graph networks such as EAGNN [42] and Attentive FP, [45] and to SMILES-X, [46] which uses SMILES as model input directly. It should be noted that the FreeSolv dataset comprises both experimental values and values calculated by molecular dynamics simulation, and the RMSE of the simulation-based method is ~1.5 kcal/mol.…”
Section: Discussion
confidence: 99%