2023
DOI: 10.1021/acscentsci.3c00372

Unbiasing Retrosynthesis Language Models with Disconnection Prompts

Abstract: Data-driven approaches to retrosynthesis are limited in user interaction, diversity of their predictions, and recommendation of unintuitive disconnection strategies. Herein, we extend the notions of prompt-based inference in natural language processing to the task of chemical language modeling. We show that by using a prompt describing the disconnection site in a molecule we can steer the model to propose a broader set of precursors, thereby overcoming training data biases in retrosynthetic recommendations and…

Cited by 10 publications (22 citation statements)
References 33 publications
“…In the present work, we removed the tagging information, and reactions were remapped and retagged using our new SMILES tagging strategy and syntax. The same dataset split for training, validation, and test (90 : 5 : 5), as shared by Thakkar et al 30 was used across all models resulting in 1 139 608, 63 672 and 63 454 reactions respectively.…”
Section: Methods (mentioning)
confidence: 99%
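The 90 : 5 : 5 train/validation/test split described in the statement above can be sketched in plain Python. This is an illustrative sketch only: the function name, seed, and dummy reaction strings are assumptions, not taken from the cited work.

```python
import random

def split_dataset(reactions, ratios=(0.90, 0.05, 0.05), seed=42):
    """Shuffle a reaction list and partition it into train/valid/test
    according to the given ratios (here 90:5:5)."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = reactions[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_valid = int(n * ratios[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

# Example with dummy reaction SMILES strings
data = [f"reactant{i}>>product{i}" for i in range(100)]
train, valid, test = split_dataset(data)
print(len(train), len(valid), len(test))  # 90 5 5
```

Applied to the ~1.27 M reactions in the quoted dataset, a split of this shape yields set sizes on the order of those reported (1 139 608 / 63 672 / 63 454).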
“…A given template can contain multiple disconnected sets of reactive atoms. Finally, the transformer model AutoTag reported by Thakkar et al 30 was trained from untagged SMILES to the corresponding tagged molecule to provide additional tagging examples.…”
Section: Methods (mentioning)
confidence: 99%
“…They retrieve synthons from a predefined library and then employ a transformer decoder to complete the full reactant molecules. Similarly drawing inspiration from graph models, Thakkar et al. initially predict disconnection bonds and incorporate additional features. Moreover, prior research consistently finds that providing atom mapping is beneficial for chemical reaction modeling.…”
Section: Main (mentioning)
confidence: 99%
“…However, in real cases, multiple sets of precursors could be transformed into target structures. By using a prompt-based method, the model incorporates human intervention to limit the prediction to a constrained subspace.…”
Section: Main (mentioning)
confidence: 99%
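One simple way to read "constraining the prediction to a subspace" is as filtering: of all precursor sets a model proposes, keep only those consistent with the user-prompted disconnection site. The sketch below is a library-free illustration of that idea; the dictionary schema, function names, and SMILES strings are assumptions for demonstration, not the paper's actual implementation.

```python
def disconnection_site(prediction):
    """Hypothetical accessor: each prediction records the pair of atom
    indices whose bond its disconnection breaks."""
    return prediction["site"]

def filter_by_prompt(predictions, prompted_site):
    """Keep only predictions whose broken bond matches the prompt,
    regardless of atom-pair ordering."""
    site = tuple(sorted(prompted_site))
    return [p for p in predictions
            if tuple(sorted(disconnection_site(p))) == site]

# Two candidate precursor sets with different disconnection sites
preds = [
    {"precursors": ["CCO", "CC(=O)Cl"], "site": (3, 4)},
    {"precursors": ["CC(=O)OCC"], "site": (1, 2)},
]
kept = filter_by_prompt(preds, (4, 3))
print(len(kept))  # 1
```

In the paper's prompt-based setup the constraint is applied at inference time rather than by post-hoc filtering, but the effect on the output space is analogous: only precursors realizing the chosen disconnection survive.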
“…Since the first implementations of a retrosynthesis transformer model,11,24 several enhancements and developments have been presented, including a diversity-enhanced transformer,18 a triple-transformer validation loop,17 and a disconnection-prompted transformer.20 Although earlier studies show great promise for retrosynthesis transformers, there are still questions to address before introducing a transformer model like Chemformer into the AiZynthFinder production platform.31 First, most studies consider models trained on the United States Patent and Trademark Office (USPTO) data.…”
Section: Introduction (mentioning)
confidence: 99%