Population-based De Novo Molecule Generation, Using Grammatical Evolution

Yoshikawa, N.; Terayama, Kei; Sumita, Masato; Homma, Teruki; Oono, Kenta; Tsuda, Koji

doi:10.1246/cl.180665

Cited by 100 publications

(94 citation statements)

References 22 publications


“…[13,25] Yoshikawa et al. proposed a method that evolves string molecular representations using mutations exploiting the SMILES context-free grammar.[87] For each goal-directed benchmark the 300 highest scoring molecules in the dataset are selected as the initial population. Each molecule is represented by 300 genes.…”

Section: SMILES GA (mentioning)

confidence: 99%

GuacaMol: Benchmarking Models for de Novo Molecular Design

Brown, Fiscato, Segler, et al. 2019

J. Chem. Inf. Model.

De novo design seeks to generate molecules with required property profiles by virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, models for molecular design based on neural networks appeared recently and show promising results. However, the new models have not been profiled on consistent tasks, and comparative studies to well-established algorithms have only seldom been performed. To standardize the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardized benchmarks. The benchmark tasks encompass measuring the fidelity of the models to reproduce the property distribution of the training sets, the ability to generate novel molecules, the exploration and exploitation of chemical space, and a variety of single and multi-objective optimization tasks. The benchmarking open-source Python code, and a leaderboard can be found on https://benevolent.ai/guacamol.


“…[158] Since then, the development of better reward functions has greatly helped to mitigate such issues, but low diversity and novelty remains an issue.[159][160][161] After reviewing the work that has been done so far on reward function design, we conclude that good reward functions should lead to generated molecules which meet the following desiderata:…”

Section: Reward Function Design (mentioning)

confidence: 99%

Deep learning for molecular design—a review of the state of the art

Elton, Boukouvalas, Fuge, et al. 2019

Mol. Syst. Des. Eng.

In the space of only a few years, deep generative modeling has revolutionized how we think of artificial creativity, yielding autonomous systems which produce original images, music, and text. Inspired by these successes, researchers are now applying deep generative modeling techniques to the generation and optimization of molecules; in our review we found 45 papers on the subject published in the past two years. These works point to a future where such systems will be used to generate lead molecules, greatly reducing resources spent downstream synthesizing and characterizing bad leads in the lab. In this review we survey the increasingly complex landscape of models and representation schemes that have been proposed. The four classes of techniques we describe are recursive neural networks, autoencoders, generative adversarial networks, and reinforcement learning. After first discussing some of the mathematical fundamentals of each technique, we draw high level connections and comparisons with other techniques and expose the pros and cons of each. Several important high level themes emerge as a result of this work, including the shift away from the SMILES string representation of molecules towards more sophisticated representations such as graph grammars and 3D representations, the importance of reward function design, the need for better standards for benchmarking and testing, and the benefits of adversarial training and reinforcement learning over maximum likelihood based training.


“…The literature concerning generative models of molecules has exploded since the first work on the topic Gómez-Bombarelli et al [2018]. Current methods feature molecular representations such as SMILES [Janz et al, 2018, Segler et al, 2017, Skalic et al, 2019, Ertl et al, 2017, Lim et al, 2018, Kang and Cho, 2018, Sattarov et al, 2019, Gupta et al, 2018, Harel and Radinsky, 2018, Yoshikawa et al, 2018, Bjerrum and Sattarov, 2018, Mohammadi et al, 2019] and graphs [Simonovsky and Komodakis, 2018, Li et al, 2018a, De Cao and Kipf, 2018, Kusner et al, 2017, Dai et al, 2018, Samanta et al, 2019, Li et al, 2018b, Kajino, 2019, Jin et al, 2019, Bresson and Laurent, 2019, Lim et al, 2019, Pölsterl and Wachinger, 2019, Krenn et al, 2019, Maziarka et al, 2019, Madhawa et al, 2019, Shen, 2018, Korovina et al, 2019]. In this section we conduct an empirical test of the hypothesis from [Gómez-Bombarelli et al, 2018] that the decoder's lack of efficiency is due to data point collection in "dead regions" of the latent space far from the data on which the VAE was trained. We use this information to construct a binary classification Bayesian Neural Network (BNN) to serve as a constraint function that outputs the probability of a latent point being valid, the details of which will be discussed in the section on labelling criteria.…”

Section: Related Work (mentioning)

confidence: 99%

Constrained Bayesian optimization for automatic chemical design using variational autoencoders

2020


Automatic Chemical Design is a framework for generating novel molecules with optimized properties. The original scheme, featuring Bayesian optimization over the latent space of a variational autoencoder, suffers from the pathology that it tends to produce invalid molecular structures. First, we demonstrate empirically that this pathology arises when the Bayesian optimization scheme queries latent points far away from the data on which the variational autoencoder has been trained. Secondly, by reformulating the search procedure as a constrained Bayesian optimization problem, we show that the effects of this pathology can be mitigated, yielding marked improvements in the validity of the generated molecules. We posit that constrained Bayesian optimization is a good approach for solving this class of training set mismatch in many generative tasks involving Bayesian optimization over the latent space of a variational autoencoder.
