2020
DOI: 10.1186/s13321-020-00425-8

GEN: highly efficient SMILES explorer using autodidactic generative examination networks

Abstract: Recurrent neural networks have been widely used to generate millions of de novo molecules in defined chemical spaces. Reported deep generative models are exclusively based on LSTM and/or GRU units and frequently trained using canonical SMILES. In this study, we introduce Generative Examination Networks (GEN) as a new approach to train deep generative networks for SMILES generation. In our GENs, we have used an architecture based on multiple concatenated bidirectional RNN units to enhance the validity of genera…
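The abstract describes RNNs trained on SMILES strings. Before any such string reaches an RNN it must be tokenised; the sketch below shows the kind of tokenisation commonly used for SMILES, where two-letter atoms such as Cl and Br are kept as single tokens. The regex and the `encode` helper are illustrative assumptions, not the paper's implementation.

```python
import re

# Illustrative SMILES tokeniser: bracket atoms and two-letter atoms are
# kept as single tokens; everything else splits character by character.
# This token pattern is a simplified assumption, not the paper's vocabulary.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|@@|[A-Za-z]|[0-9]|=|#|\(|\)|\+|-|/|\\|%[0-9]{2})"
)

def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into model-ready tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Sanity check: the tokens must reassemble to the original string.
    assert "".join(tokens) == smiles, f"untokenisable SMILES: {smiles}"
    return tokens

def encode(smiles: str, vocab: dict[str, int]) -> list[int]:
    """Map tokens to integer ids, as fed to an embedding/RNN layer."""
    return [vocab[t] for t in tokenize(smiles)]
```

In a character-level generator like those discussed here, the integer ids produced by `encode` would feed the embedding layer in front of the stacked bidirectional RNN units.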


Cited by 25 publications (20 citation statements) · References 35 publications (53 reference statements)
“…Even though it is possible to generate novel compounds with the desired properties, the resulting solutions often lack chemical diversity [ 23 – 25 ]. Deursen et al. proposed to address this issue with the introduction of Generative Examination Networks (GEN), which perform statistical analysis of the generated compounds during training [ 26 ]. However, their study did not include the application of this approach in any pre-defined optimization scenario.…”
Section: Introduction
confidence: 99%
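The "examination" idea quoted above — statistically scoring the generator's output during training — can be sketched as follows. Validity here is stood in for by a trivial bracket-balance check rather than a real cheminformatics parser such as RDKit, and both function names are hypothetical:

```python
def roughly_valid(smiles: str) -> bool:
    """Toy stand-in for a chemical validity check (a real GEN would use a
    cheminformatics parser, e.g. RDKit); only tests bracket balance."""
    for open_c, close_c in ("()", "[]"):
        depth = 0
        for ch in smiles:
            if ch == open_c:
                depth += 1
            elif ch == close_c:
                depth -= 1
                if depth < 0:
                    return False
        if depth != 0:
            return False
    return True

def examine(generated: list[str], training_set: set[str]) -> dict[str, float]:
    """Statistics typically monitored while training a generator:
    validity, uniqueness among valid samples, novelty vs. training data."""
    valid = [s for s in generated if roughly_valid(s)]
    unique = set(valid)
    novel = unique - training_set
    return {
        "validity": len(valid) / len(generated),
        "uniqueness": len(unique) / max(len(valid), 1),
        "novelty": len(novel) / max(len(unique), 1),
    }
```

In an optimization scenario of the kind the citing authors note is missing, these statistics would be fed back into the training loop rather than merely monitored.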
“…In principle, any kind of deep network might be used for the encoding, and the same or any other kind for the decoding [115]. In this case, the input (encoder) network [27] was mainly a CNN while the output used a specific type of RNN called a gated recurrent unit [116,117]. The latent space used [27] was mainly of 196 dimensions, and the VAE was trained to reproduce its inputs at the outputs (another module from RDKit was used to filter invalid SMILES strings).…”
Section: A Specific Example
confidence: 99%
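For readers unfamiliar with how a VAE produces such a latent code, the standard reparameterisation step (z = μ + σ·ε, with ε drawn from a standard normal) can be written out in plain Python. The 196-dimensional latent size comes from the cited work; everything else below is a generic textbook sketch, not that model's code.

```python
import math
import random

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps elementwise, with eps ~ N(0, 1) and
    sigma = exp(0.5 * logvar). In a real VAE this trick keeps sampling
    differentiable with respect to mu and logvar."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

rng = random.Random(0)
latent_dim = 196                 # latent size reported for the cited VAE
mu = [0.0] * latent_dim
logvar = [0.0] * latent_dim      # sigma = exp(0) = 1 in every dimension
z = reparameterize(mu, logvar, rng)
```

The decoder RNN then maps each sampled `z` back to a SMILES string, with invalid outputs filtered out afterwards as described in the passage.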
“…More recently, it was recognised that various kinds of architectures could, in fact, permit the reversal of this numerical encoding so as to return a molecule (or its SMILES string encoding a unique structure). These are known as generative methods [ 41 , 42 , 43 , 44 , 45 , 46 , 47 , 48 , 49 , 50 , 51 , 52 , 53 , 54 , 55 ], and at heart their aim is to generate a suitable and computationally useful representation [ 56 ] of the input data. It is common (but cf.…”
Section: Introduction
confidence: 99%
“…[ 90 , 91 , 92 , 93 , 94 ]), we use backpropagation to update the network so as to minimise the difference between the predicted and the desired output, subject to any other constraints that we may apply. We also recognise the importance of various forms of regularisation, which are all designed to prevent overfitting [ 49 , 95 , 96 , 97 , 98 ].…”
Section: Introduction
confidence: 99%
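As a concrete illustration of the regularisation this passage refers to, L2 weight decay simply adds a penalty on the squared weights to the data-fitting loss. The function name and the numbers below are arbitrary examples, not from any of the cited works.

```python
def l2_regularized_loss(data_loss, weights, lam):
    """Total training loss = data-fit term + lam * sum of squared weights.
    Penalising large weights discourages overfitting."""
    return data_loss + lam * sum(w * w for w in weights)

# Arbitrary example: a fit loss of 0.5, three weights, lam = 0.1.
total = l2_regularized_loss(0.5, [1.0, -2.0, 0.5], 0.1)  # 0.5 + 0.1 * 5.25
```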