Preprint, 2019
DOI: 10.26434/chemrxiv.8058464.v1

A Transformer Model for Retrosynthesis

Abstract: We describe a Transformer model for a retrosynthetic reaction prediction task. The model is trained on 45,033 experimental reaction examples extracted from US patents. It successfully predicts the reactant set for 42.7% of cases on an external test set. During training, we applied different learning rate schedules and snapshot learning. These techniques can prevent overfitting and thus can be a reason to get rid of the internal validation dataset that i…

Cited by 59 publications (119 citation statements: 2 supporting, 117 mentioning, 0 contrasting)
References 19 publications (41 reference statements)
“…Other rules for handling the most common two-letter elements, charges, and stereochemistry are also used to prepare the input for the neural network. In our experience, the use of more complicated schemes instead of simple character-level tokenization did not increase the accuracy of the models [30]. Therefore, a simple character-level tokenization was used in this study.…”
Section: Model Input (citation type: mentioning)
confidence: 99%
“…Each pair contained a non-canonical SMILES on the left side and the canonical SMILES of the same molecule on the right side. Such an arrangement of the training dataset allowed us to reuse the previous Transformer code, which was originally applied to retrosynthetic tasks [30]. For completeness, we added for every compound a line where both the left and right sides were identical, i.e.…”
Section: SMILES Canonicalization Model Dataset (citation type: mentioning)
confidence: 99%
“…In contrast to the reaction prediction task in the forward direction, where a defined set of reaction conditions should lead to a single distribution of product molecules, a single-step retrosynthetic prediction takes the form of a one-to-many mapping, where the target could theoretically be made through a variety of different individual reaction steps. Retrosynthesis has recently seen increased attention from the data science and cheminformatics communities, with a number of machine learning efforts leveraging reaction templates or rules, [1][2][3][4] techniques adapted from natural language processing, [5][6][7][8] and graph-based models. 9,10 However, only the template- and rule-based methods are capable of making a connection from the prediction directly back to the source of the template or rule, which is most likely a reaction that was successfully performed in a laboratory.…”
Section: Introduction (citation type: mentioning)
confidence: 99%