Data‐driven approaches for identifying hyperparameters in multi‐step retrosynthesis

Westerlund, Annie M.; Barge, Bente; Mervin, Lewis; Genheden, Samuel

doi:10.1002/minf.202300128

Cited by 4 publications (6 citation statements)

References 38 publications

“…For ChEMBL, we only find solutions to 71% of the targets, somewhat lower than the AstraZeneca sets. However, this could be explained by the extended number of iterations used for the AstraZeneca sets as recommended in a recent study of search hyperparameters [ 61 ]. For the GDB set we only find solutions to about 10%, highlighting a disconnection of the current template-based model trained on historical reaction data, with the chemistry needed to find synthesis routes for the enumerated, and therefore potentially non-synthesizable, GDB compounds.…”

Section: Results; classified as mentioning (confidence: 99%)

AiZynthFinder 4.0: developments based on learnings from 3 years of industrial application

Saigiridharan, Hassen, Lai et al. 2024. J Cheminform. Self-citation.

We present an updated overview of the AiZynthFinder package for retrosynthesis planning. Since the first version was released in 2020, we have added a substantial number of new features based on user feedback. Feature enhancements include policies for filter reactions, support for any one-step retrosynthesis model, a scoring framework and several additional search algorithms. To exemplify the typical use-cases of the software and highlight some learnings, we perform a large-scale analysis on several hundred thousand target molecules from diverse sources. This analysis looks at for instance route shape, stock usage and exploitation of reaction space, and points out strengths and weaknesses of our retrosynthesis approach. The software is released as open-source for educational purposes as well as to provide a reference implementation of the core algorithms for synthesis prediction. We hope that releasing the software as open-source will further facilitate innovation in developing novel methods for synthetic route prediction. AiZynthFinder is a fast, robust and extensible open-source software and can be downloaded from https://github.com/MolecularAI/aizynthfinder.

“…As such, the number of iterations is not a limiting factor for the template-based model. The hyperparameters for the MCTS search, including the number of iterations, were set to their default values since our intention was to evaluate the Chemformer within the production framework. For the Chemformer predictions, we used a beam size of 10, whereas for the template-based model, we added the top-50 predictions to the search tree.…”

Section: Methods; classified as mentioning (confidence: 99%)
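
The snippet above caps how many one-step predictions become children of a search-tree node: a beam size of 10 for Chemformer and the top-50 predictions for the template-based model. A minimal Python sketch of that capping step follows, assuming a hypothetical one-step model that returns (precursor SMILES, score) pairs; `Node`, `expand` and `max_children` are illustrative names, not the AiZynthFinder API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Assumed (hypothetical) output format of a one-step model: (precursor SMILES, model score).
Prediction = Tuple[str, float]


@dataclass
class Node:
    """Hypothetical search-tree node holding a molecule and its child nodes."""
    smiles: str
    children: List["Node"] = field(default_factory=list)


def expand(
    node: Node,
    one_step_model: Callable[[str], List[Prediction]],
    max_children: int,
) -> None:
    """Add at most `max_children` of the highest-scoring predictions as children."""
    predictions = one_step_model(node.smiles)
    # Rank by model score, best first, and keep only the top `max_children`.
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    for precursors, _score in ranked[:max_children]:
        node.children.append(Node(precursors))
```

With `max_children=50` this mimics adding the top-50 template predictions to the tree; for a beam-search model such as Chemformer, the beam size (10 in the quoted setup) already bounds how many predictions are returned.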

“…The maximum search time was set to 300 s for both the template-based model and Chemformer. According to our recent paper on hyperparameter-tuning MCTS, there is only a very minor difference in solvability when setting the number of iterations to 100 compared to 400, while the search time is often beyond 300 s for 400 iterations. As such, the number of iterations is not a limiting factor for the template-based model.…”

Section: Methods; classified as mentioning (confidence: 99%)
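
The argument in this snippet rests on the search stopping at whichever budget runs out first, the iteration limit or the 300 s wall-clock limit: if 400 iterations usually take longer than 300 s, raising the limit from 100 to 400 iterations changes little. A minimal Python sketch of such a dual stopping rule, using hypothetical names (`run_search`, `one_iteration`, `iteration_limit`, `time_limit`) rather than AiZynthFinder's actual interface:

```python
import time
from typing import Callable


def run_search(
    one_iteration: Callable[[], None],
    iteration_limit: int = 100,
    time_limit: float = 300.0,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run search iterations until either budget is exhausted.

    Returns the number of iterations actually completed. `clock` is injectable
    so the stopping rule can be tested without waiting in real time.
    """
    start = clock()
    completed = 0
    # Stop as soon as EITHER the iteration budget or the time budget is used up.
    while completed < iteration_limit and clock() - start < time_limit:
        one_iteration()  # e.g. one MCTS select/expand/rollout/backpropagate cycle
        completed += 1
    return completed
```

With a simulated cost of 1 s per iteration, `iteration_limit=400` stops after 300 iterations on the time budget, whereas `iteration_limit=100` stops on the iteration budget. Because the time is only checked between iterations, a slow final iteration can overshoot the limit somewhat.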

Do Chemformers Dream of Organic Matter? Evaluating a Transformer Model for Multistep Retrosynthesis

Westerlund, Manohar Koki, Kancharla et al. 2024. J. Chem. Inf. Model. Self-citation.

Synthesis planning of new pharmaceutical compounds is a well-known bottleneck in modern drug design. Template-free methods, such as transformers, have recently been proposed as an alternative to template-based methods for single-step retrosynthetic predictions. Here, we trained and evaluated a transformer model, called the Chemformer, for retrosynthesis predictions within drug discovery. The proprietary data set used for training comprised ∼18 M reactions from literature, patents, and electronic lab notebooks. Chemformer was evaluated for the purpose of both single-step and multistep retrosynthesis. We found that the single-step performance of Chemformer was especially good on reaction classes common in drug discovery, with most reaction classes showing a top-10 round-trip accuracy above 0.97. Moreover, Chemformer reached a higher round-trip accuracy compared to that of a template-based model. By analyzing multistep retrosynthesis experiments, we observed that Chemformer found synthetic routes, leading to commercial starting materials for 95% of the target compounds, an increase of more than 20% compared to the template-based model on a proprietary compound data set. In addition to this, we discovered that Chemformer suggested novel disconnections corresponding to reaction templates, which are not included in the template-based model. These findings were further supported by a publicly available ChEMBL compound data set. The conclusions drawn from this work allow for the design of a synthesis planning tool where template-based and template-free models work in harmony to optimize retrosynthetic recommendations.

“…Furthermore, the configuration of the model significantly influences the success of route discovery [99]. Template reactions model a finite number of transformations, which may lack exhaustiveness. As an alternative to template-based formalism, template-free single-step approaches have also been employed in MCTS.…”

Section: Monte Carlo Tree Search (MCTS); classified as mentioning (confidence: 99%)

“…The data source for building templates is an important factor, as highlighted by Thakkar et al., wherein the impact of four data sets (AiZynthFinder, Pistachio, Reaxys, and USPTO) on MCTS performance is investigated. Furthermore, the configuration of the model significantly influences the success of route discovery. Template reactions model a finite number of transformations, which may lack exhaustiveness.…”

Section: Multistep Retrosynthesis; classified as mentioning (confidence: 99%)

Artificial Intelligence Methods and Models for Retro-Biosynthesis: A Scoping Review

Gricourt, Meyer, Duigou et al. 2024. ACS Synth. Biol.

Retrosynthesis aims to efficiently plan the synthesis of desirable chemicals by strategically breaking down molecules into readily available building block compounds. Having a long history in chemistry, retro-biosynthesis has also been used in the fields of biocatalysis and synthetic biology. Artificial intelligence (AI) is driving us toward new frontiers in synthesis planning and the exploration of chemical spaces, arriving at an opportune moment for promoting bioproduction that would better align with green chemistry, enhancing environmental practices. In this review, we summarize the recent advancements in the application of AI methods and models for retrosynthetic and retrobiosynthetic pathway design. These techniques can be based either on reaction templates or generative models and require scoring functions and planning strategies to navigate through the retrosynthetic graph of possibilities. We finally discuss limitations and promising research directions in this field.
