2018
DOI: 10.1021/acscentsci.7b00572

Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules

Abstract: We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This model allows us to generate new molecules for efficient exploration and optimization through open-ended spaces of chemical compounds. A deep neural network was trained on hundreds of thousands of existing chemical structures to construct three coupled functions: an encoder, a decoder, and a predictor. The encoder converts the discrete representation of a molecule into a real-valued…
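To make the encoder/decoder/predictor interface concrete, here is a minimal Python sketch. It is a toy stand-in, not the trained neural network the abstract describes: the vocabulary `VOCAB`, the length limit `MAX_LEN`, and the functions `encode`, `decode`, and `predict` are all hypothetical names chosen for illustration. The "encoder" simply one-hot encodes a SMILES string into a flat real-valued vector, the "decoder" takes an argmax per position, and the "predictor" is a made-up property (a heteroatom count) read off the vector.

```python
# Toy illustration only: these functions mimic the *interfaces* of an
# encoder, decoder, and predictor. They are NOT the learned model.

PAD = " "  # padding symbol (assumption for this sketch)
VOCAB = [PAD, "C", "N", "O", "c", "n", "o", "(", ")", "=", "#", "1", "2"]
INDEX = {ch: i for i, ch in enumerate(VOCAB)}
MAX_LEN = 12  # fixed string length (assumption for this sketch)


def encode(smiles):
    """Map a SMILES string to a flat real-valued vector (one-hot per position)."""
    if len(smiles) > MAX_LEN:
        raise ValueError("SMILES string longer than MAX_LEN")
    vec = []
    for ch in smiles.ljust(MAX_LEN, PAD):
        row = [0.0] * len(VOCAB)
        row[INDEX[ch]] = 1.0  # raises KeyError for characters outside VOCAB
        vec.extend(row)
    return vec


def decode(vec):
    """Map a real-valued vector back to a string by argmax at each position."""
    width = len(VOCAB)
    chars = []
    for start in range(0, len(vec), width):
        row = vec[start:start + width]
        chars.append(VOCAB[row.index(max(row))])
    return "".join(chars).rstrip(PAD)


def predict(vec):
    """Hypothetical property: total weight on N/O symbols across all positions."""
    width = len(VOCAB)
    hetero = [INDEX[ch] for ch in ("N", "O", "n", "o")]
    return sum(vec[start + i] for start in range(0, len(vec), width) for i in hetero)


if __name__ == "__main__":
    z = encode("CCO")           # ethanol as a SMILES string
    print(len(z))               # MAX_LEN * len(VOCAB) entries
    print(decode(z))            # round trip back to the string
    print(predict(z))           # toy property computed from the vector
```

In a learned model the vector would be a compact latent code rather than a one-hot layout, and the three functions would be trained jointly; the sketch only shows how the three pieces fit together.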

Citations: cited by 2,445 publications (2,915 citation statements)
References: 46 publications
“…Deep Patient, an unsupervised deep feature learning method based on Stacked DAE, was developed to predict future of patients with different cancers from Electronic Health Records data (Miotto et al, 2016). VAE and AAE were also successfully utilized in designing new molecules with desired properties for drug discovery purposes (Gómez-Bombarelli et al, 2016;Kadurin et al, 2017). Moreover, VAE was also able to capture patterns in the gene expression pan-cancer data for specific tissues (Way and Greene, 2018).…”
Section: Introduction (mentioning)
confidence: 99%
“…For this purpose, the use of Gaussian mixture models and cluster-wise MLR left considerable room for improvement, due to its multi-parametric nature and tendency of overfitting if training data were organized into large number of clusters 14 . Recently, autoencoder modeling was proposed as an approach for two-stage inverse QSAR 16 . Continuous latent space, corresponding to a descriptor space, is constructed on the basis of encoding a line notation of a molecule by recurrent neural networks (RNNs).…”
Section: Introduction (mentioning)
confidence: 99%
“…As such, the approach does not depend on chosen descriptors and has the potential to automatically address two-stage inverse QSAR in a single step. However, the generation of new valid line notations (SMILES strings) for chemical structures corresponding to optimized coordinates was difficult in a case study designing organic light-emitting diodes 16 .…”
Section: Introduction (mentioning)
confidence: 99%
“…For example, Settimo et al (2013) have recently shown that the empirical methods can fail for some amines, which represent a large fraction of drugs currently on the market or in development. This problem could make these methods difficult to apply to computational exploration of chemical space (Rupakheti et al, 2016;Gómez-Bombarelli et al, 2016) where molecules with completely new chemical substructures are likely to be encountered.…”
Section: Introduction (mentioning)
confidence: 99%