2020
DOI: 10.48550/arxiv.2010.01118
Preprint

Gaussian Process Molecule Property Prediction with FlowMO

Henry B. Moss,
Ryan-Rhys Griffiths

Abstract: We present FlowMO: an open-source Python library for molecular property prediction with Gaussian Processes. Built upon GPflow and RDKit, FlowMO enables the user to make predictions with well-calibrated uncertainty estimates, an output central to active learning and molecular design applications. Gaussian Processes are particularly attractive for modelling small molecular datasets, a characteristic of many real-world virtual screening campaigns where high-quality experimental data is scarce. Computational exper…
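To make the workflow described in the abstract concrete, the sketch below shows GP regression on molecular fingerprints using GPflow and RDKit directly. It is a minimal illustration only: the featurisation, kernel choice, and toy data are assumptions for demonstration, not FlowMO's actual API or recommended settings.

```python
# Illustrative sketch: GP regression on Morgan fingerprints with GPflow + RDKit.
# Not FlowMO's API; it only shows the kind of pipeline the library automates.
import numpy as np
import gpflow
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def featurise(smiles_list, n_bits=2048):
    """Convert SMILES strings to Morgan fingerprint feature vectors."""
    X = np.zeros((len(smiles_list), n_bits))
    for i, smi in enumerate(smiles_list):
        mol = Chem.MolFromSmiles(smi)
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
        arr = np.zeros((n_bits,))
        DataStructs.ConvertToNumpyArray(fp, arr)
        X[i] = arr
    return X

# Toy data: SMILES strings and made-up scalar property values (illustrative only).
smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
y = np.array([[-0.3], [2.1], [-0.2], [-0.1]])

X = featurise(smiles)
model = gpflow.models.GPR(data=(X, y), kernel=gpflow.kernels.RBF())
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

# Predictive mean and variance: the calibrated uncertainty that active learning
# and molecular design applications rely on.
mean, var = model.predict_f(featurise(["CCCO"]))
```

In practice a kernel suited to bit-vector representations would typically replace the generic RBF kernel used here.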

Cited by 13 publications (14 citation statements)
References 25 publications
“…Gaussian processes confer a Bayesian nonparametric framework to model general time-series data (Roberts et al 2013; Tobar et al 2015) and have proven effective in tasks such as periodicity detection (Durrande et al 2016) and spectral density estimation (Tobar 2018). More broadly, Gaussian processes (GPs) have recently demonstrated modeling success across a wide range of spatial and temporal application domains including robotics (Deisenroth & Rasmussen 2011; Greeff & Schoellig 2020), Bayesian optimization (Shahriari et al 2015; Grosnit et al 2020; Cowen-Rivers et al 2021; Grosnit et al 2021), as well as areas of the natural sciences such as molecular machine learning (Nigam et al 2021; Griffiths & Hernández-Lobato 2020; Moss & Griffiths 2020; Thawani et al 2020; Griffiths et al 2021; Hase et al 2020; Bartók et al 2010), genetics, and materials science (Cheng et al 2020; Zhang et al 2020). In the context of astrophysics there is a recent trend favoring nonparametric models such as GPs due to the flexibility afforded when specifying the underlying data modeling assumptions.…”
Section: Introduction
confidence: 99%
“…via active learning 84 and Bayesian optimisation. The confidence-error curves in the ESI† show initial promise in this direction and indeed understanding how best to tailor calibrated Bayesian models to molecular representations 65,85 is an avenue worthy of pursuit. We release our curated dataset and all code to train models at https://github.com/Ryan-Rhys/The-Photoswitch-Dataset in order that the photoswitch community may derive benefit from our work.…”
Section: Discussion
confidence: 94%
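The confidence-error curve referred to in this excerpt is a standard diagnostic for calibrated uncertainty: test predictions are ranked by predictive variance and the error is recomputed on increasingly confident subsets. Below is a minimal illustrative sketch; the function name and arrays are hypothetical and do not come from the cited paper's code.

```python
# Hypothetical sketch of a confidence-error curve for a probabilistic regressor.
# Inputs are arrays of ground-truth values, predictive means, and predictive variances.
import numpy as np

def confidence_error_curve(y_true, y_mean, y_var, n_points=20):
    """Return (fraction retained, RMSE) pairs when keeping only the most
    confident predictions, i.e. those with the smallest predictive variance."""
    order = np.argsort(y_var)  # most confident (lowest variance) first
    fractions, rmses = [], []
    for frac in np.linspace(0.05, 1.0, n_points):
        k = max(1, int(frac * len(y_true)))
        idx = order[:k]
        rmses.append(np.sqrt(np.mean((y_true[idx] - y_mean[idx]) ** 2)))
        fractions.append(frac)
    return np.array(fractions), np.array(rmses)
```

For a well-calibrated model the RMSE should rise as the retained fraction grows, since the least confident predictions are added last.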
“…Practical advantages of GPs for molecular datasets include the fact that they have few hyperparameters to tune and maintain uncertainty estimates over property values. [63][64][65] A GP is defined as a collection of random variables, {f(x₁), f(x₂), …}, any finite subset of which are distributed according to a multivariate Gaussian.…”
Section: Gaussian Processes
confidence: 99%
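The definition quoted above can be written out explicitly. The following uses standard GP notation (e.g. Rasmussen & Williams) and is not taken from the cited papers:

```latex
% A Gaussian process f ~ GP(m, k) is fully specified by a mean function m and a
% covariance (kernel) function k. For any finite set of inputs x_1, ..., x_n the
% corresponding function values are jointly multivariate Gaussian:
\begin{equation}
  \begin{pmatrix} f(x_1) \\ \vdots \\ f(x_n) \end{pmatrix}
  \sim \mathcal{N}\!\left(
    \begin{pmatrix} m(x_1) \\ \vdots \\ m(x_n) \end{pmatrix},\; K
  \right),
  \qquad K_{ij} = k(x_i, x_j).
\end{equation}
```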
“…As for molecular property prediction, the SPOC models gave results comparable with literature models on these datasets. [38] Taking the FreeSolv dataset as an example (Figure 6b), SPOC performs significantly better (test RMSE 1.03 kcal/mol) than the Extended-Connectivity fingerprint (ECFP) based models (Kernel Ridge Regression (KRR), RF, GP, [39] XGBoost [40]), and better than most of the graph-based models (Directed acyclic graph model (DAG), [41] Graph Convolutional model (GC), Information Maximizing Graph Neural Networks (EIGNN), [42] Weave, Message Passing Neural Network (MPNN) [43] and baseline Chemception models [44]), and is inferior only to specifically designed graph networks such as EAGNN [42] and Attentive FP, [45] and to SMILES-X, [46] which uses SMILES as model input directly. It should be noted that the FreeSolv dataset comprises both experimental values and values calculated by molecular dynamics simulation, and the RMSE of the simulation-based method is ~1.5 kcal/mol.…”
Section: Discussion
confidence: 99%