2023
DOI: 10.1021/acscentsci.3c00372

Unbiasing Retrosynthesis Language Models with Disconnection Prompts

Abstract: Data-driven approaches to retrosynthesis are limited in user interaction, diversity of their predictions, and recommendation of unintuitive disconnection strategies. Herein, we extend the notions of prompt-based inference in natural language processing to the task of chemical language modeling. We show that by using a prompt describing the disconnection site in a molecule we can steer the model to propose a broader set of precursors, thereby overcoming training data biases in retrosynthetic recommendations and…

Cited by 10 publications (22 citation statements)
References 33 publications
“…In the present work, we removed the tagging information, and reactions were remapped and retagged using our new SMILES tagging strategy and syntax. The same dataset split for training, validation, and test (90 : 5 : 5), as shared by Thakkar et al 30 was used across all models resulting in 1 139 608, 63 672 and 63 454 reactions respectively.…”
Section: Methods (mentioning)
confidence: 99%
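The 90 : 5 : 5 train/validation/test split described in the statement above can be sketched in plain Python. This is an illustrative sketch only: the function name, seed, and dummy reaction strings are assumptions, not taken from the cited work.

```python
import random

def split_dataset(reactions, ratios=(0.90, 0.05, 0.05), seed=42):
    """Shuffle a reaction list and partition it into train/valid/test
    according to the given ratios (here 90:5:5)."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = reactions[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_valid = int(n * ratios[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

# Example with dummy reaction SMILES strings
data = [f"reactant{i}>>product{i}" for i in range(100)]
train, valid, test = split_dataset(data)
print(len(train), len(valid), len(test))  # 90 5 5
```

Applied to the ~1.27 M reactions in the quoted dataset, a split of this shape yields set sizes on the order of those reported (1 139 608 / 63 672 / 63 454).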
“…A given template can contain multiple disconnected sets of reactive atoms. Finally, the transformer model AutoTag reported by Thakkar et al 30 was trained from untagged SMILES to the corresponding tagged molecule to provide additional tagging examples.…”
Section: Methods (mentioning)
confidence: 99%
“…They retrieve synthons from a predefined library and then employ a transformer decoder to complete the full reactant molecules. Similarly drawing inspiration from graph models, Thakkar et al. initially predict disconnection bonds and incorporate additional features. Moreover, prior research consistently finds that providing atom mapping is beneficial for chemical reaction modeling.…”
Section: Main (mentioning)
confidence: 99%
“…However, in real cases, multiple sets of precursors could be transformed into target structures. By using a prompt-based method, the model incorporates human intervention to limit the prediction to a constrained subspace.…”
Section: Main (mentioning)
confidence: 99%
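One simple way to read "constraining the prediction to a subspace" is as filtering: of all precursor sets a model proposes, keep only those consistent with the user-prompted disconnection site. The sketch below is a library-free illustration of that idea; the dictionary schema, function names, and SMILES strings are assumptions for demonstration, not the paper's actual implementation.

```python
def disconnection_site(prediction):
    """Hypothetical accessor: each prediction records the pair of atom
    indices whose bond its disconnection breaks."""
    return prediction["site"]

def filter_by_prompt(predictions, prompted_site):
    """Keep only predictions whose broken bond matches the prompt,
    regardless of atom-pair ordering."""
    site = tuple(sorted(prompted_site))
    return [p for p in predictions
            if tuple(sorted(disconnection_site(p))) == site]

# Two candidate precursor sets with different disconnection sites
preds = [
    {"precursors": ["CCO", "CC(=O)Cl"], "site": (3, 4)},
    {"precursors": ["CC(=O)OCC"], "site": (1, 2)},
]
kept = filter_by_prompt(preds, (4, 3))
print(len(kept))  # 1
```

In the paper's prompt-based setup the constraint is applied at inference time rather than by post-hoc filtering, but the effect on the output space is analogous: only precursors realizing the chosen disconnection survive.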
“…Since the first implementations of a retrosynthesis transformer model,11,24 several enhancements and developments have been presented, including a diversity-enhanced transformer,18 a triple-transformer validation loop,17 and a disconnection-prompted transformer.20 Although earlier studies show great promise for retrosynthesis transformers, there are still questions to address before introducing a transformer model like Chemformer into the AiZynthFinder production platform.31 First, most studies consider models trained on the United States Patent and Trademark Office (USPTO) data.…”
Section: Introduction (mentioning)
confidence: 99%