2019
DOI: 10.1021/acs.jpca.9b01398
Improved Chemical Prediction from Scarce Data Sets via Latent Space Enrichment

Abstract: Modern machine learning provides promising methods for accelerating the discovery and characterization of novel chemical species. However, in many areas experimental data remain costly and scarce, and computational models are unavailable for targeted figures of merit. Here we report a promising pathway to address this challenge by using chemical latent space enrichment, whereby disparate data sources are combined in joint prediction tasks to enable improved prediction in data-scarce applications. The approach …

Cited by 33 publications (32 citation statements)
References 55 publications
“…Thus, for multi-layer ANNs, feature-based proximity can be very different from the intrinsic relationship between points in the model. Such ideas have been explored in generative modeling, where distances in auto-encoded latent representations have informed chemical diversity [55,56], and in anomaly detection, where separate models [57,58] (e.g., autoencoders [59–61] or nearest-neighbor classifiers [62,63]) have enabled identification of ‘poisoned’ input data [64].…”
Section: Introduction
confidence: 99%
“…For example, synthetic protocols have been optimized via the training of ML models on experimental reaction databases (USPTO, Reaxys, and SciFinder) (6), while generative design strategies have enabled targeted small-molecule design (7). However, materials science often presents problems where substantially less data are available, thereby necessitating the development of creative approaches for navigating data-scarce regimes (8, 9).…”
Section: Introduction
confidence: 99%
“…It has been shown capable of learning underlying relationships from a diverse set of molecular data by letting multiple data tasks and domains interact adaptively while generating the joint embeddings. Joint training is also a type of joint embedding that has been successfully applied to improve deep learning-based molecule generation and enable transfer learning [25–29]. Joint training incorporates a property prediction task into a variational autoencoder (VAE) [30] and has been shown to organize points in the VAE latent space, making the latent space amenable to inverse molecular design and optimization [25,29].…”
Section: Introduction
confidence: 99%
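The joint-training idea quoted above amounts to adding a property-prediction term to the usual VAE objective, so the latent space is shaped by both reconstruction and the property. A minimal numpy sketch of such a joint loss is below; the shapes, linear encoder/decoder, deterministic use of the latent mean, and the weighting hyperparameters `beta` and `gamma` are all illustrative assumptions, not the cited papers' architectures.

```python
import numpy as np

# Toy sketch of a jointly trained VAE-style objective (hypothetical setup):
# total = reconstruction + KL(q(z|x) || N(0, I)) + property-prediction loss.
rng = np.random.default_rng(0)

x = rng.normal(size=(8, 16))       # toy molecular feature vectors
y = rng.normal(size=(8,))          # toy scalar property (e.g. a figure of merit)

W_enc = rng.normal(scale=0.1, size=(16, 4))   # linear encoder (latent mean)
W_dec = rng.normal(scale=0.1, size=(4, 16))   # linear decoder
w_prop = rng.normal(scale=0.1, size=(4,))     # property head on the latent mean

mu = x @ W_enc                     # latent means, shape (8, 4)
log_var = np.zeros_like(mu)        # fixed unit variance, for simplicity

x_hat = mu @ W_dec                 # reconstruction from the latent mean
y_hat = mu @ w_prop                # property predicted from the latent space

recon = np.mean((x - x_hat) ** 2)
kl = 0.5 * np.mean(np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1))
prop = np.mean((y - y_hat) ** 2)

beta, gamma = 1.0, 1.0             # weighting hyperparameters (assumed)
total = recon + beta * kl + gamma * prop
```

Because gradients of `prop` flow back through `W_enc`, minimizing `total` organizes the latent space around the property as well as the structure, which is what makes it useful for the scarce-data transfer described in the excerpt.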
“…jointly trained a VAE using drug-likeness and synthetic accessibility, then performed Bayesian optimization in the resulting latent space to identify novel drug-like molecules [25]. In addition to latent space organization, joint training provides a platform for knowledge transfer between abundant and scarce data tasks [27,29]. When more than one property is used to develop the model, this constitutes a multitask transfer learning approach.…”
Section: Introduction
confidence: 99%
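The Bayesian optimization step mentioned above fits a surrogate over already-evaluated latent points and uses an acquisition function to pick the next point to decode. A sketch under simplifying assumptions: a 1-D latent coordinate, a stand-in objective in place of a trained property model, a basic RBF-kernel Gaussian process, and an upper-confidence-bound acquisition rather than whatever acquisition the cited work used.

```python
import numpy as np

# Minimal Bayesian-optimization sketch over a 1-D "latent coordinate"
# (hypothetical; real VAE latent spaces are higher-dimensional and the
# objective would be a trained property predictor, not a closed form).
rng = np.random.default_rng(1)

def objective(z):
    # Stand-in for a property evaluated at a latent point.
    return -(z - 0.3) ** 2

def rbf(a, b, length=0.3):
    # Squared-exponential kernel between two sets of 1-D points.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)

# A few already-evaluated latent points.
Z = np.array([-1.0, 0.0, 1.0])
y = objective(Z)

# GP posterior mean and variance on a candidate grid.
grid = np.linspace(-1.5, 1.5, 301)
K = rbf(Z, Z) + 1e-6 * np.eye(len(Z))     # jitter for numerical stability
K_s = rbf(grid, Z)
mean = K_s @ np.linalg.solve(K, y)
var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
std = np.sqrt(np.clip(var, 1e-12, None))

# Upper-confidence-bound acquisition: favor high mean plus high uncertainty.
kappa = 1.0
z_next = grid[np.argmax(mean + kappa * std)]
```

Decoding `z_next` back to a molecule and re-evaluating it would close the loop; in a jointly trained model the property head and the decoder share the same latent space, which is what makes this search well posed.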