2023
DOI: 10.1002/minf.202300128

Data‐driven approaches for identifying hyperparameters in multi‐step retrosynthesis

Annie M. Westerlund,
Bente Barge,
Lewis Mervin
et al.

Abstract: The multi‐step retrosynthesis problem can be solved by a search algorithm, such as Monte Carlo tree search (MCTS). The performance of multi‐step retrosynthesis, as measured by a trade‐off between search time and route solvability, therefore depends on the hyperparameters of the search algorithm. In this paper, we demonstrated the effect of three MCTS hyperparameters (number of iterations, tree depth, and tree width) on metrics such as the linear integrated speed‐accuracy score (LISAS) and the inverse efficiency score, which cons…
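The two metrics named in the abstract come from the speed-accuracy literature, and a small sketch may help show how they combine search time and solvability into one number. The code below assumes the standard textbook definitions (inverse efficiency score as mean time divided by the proportion of successes, and LISAS as mean time plus the error rate rescaled by the ratio of the time and error standard deviations). Mapping search time to response time and the unsolved fraction to error rate is an assumption here, and the paper's exact formulation may differ. The function names are hypothetical.

```python
from statistics import pstdev


def inverse_efficiency_score(mean_time, solved_fraction):
    """Mean search time divided by the fraction of solved targets (lower is better).

    Assumed mapping: search time plays the role of response time,
    solved fraction plays the role of proportion correct.
    """
    if solved_fraction <= 0:
        return float("inf")  # nothing solved: efficiency is undefined, treat as worst
    return mean_time / solved_fraction


def lisas(mean_times, error_rates):
    """Linear integrated speed-accuracy score for each hyperparameter setting.

    mean_times[j]  : mean search time for setting j
    error_rates[j] : fraction of unsolved targets for setting j
    The error rate is rescaled to time units by S_time / S_error, where both
    standard deviations are taken across the settings being compared.
    """
    s_time = pstdev(mean_times)
    s_err = pstdev(error_rates)
    if s_err == 0:
        # All settings have the same error rate, so only time differs.
        return list(mean_times)
    scale = s_time / s_err
    return [t + scale * e for t, e in zip(mean_times, error_rates)]
```

With either score, a lower value indicates a better trade-off: a setting that solves slightly more targets but takes much longer is penalized through the time term.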

Cited by 4 publications (6 citation statements)
References 38 publications
“…For ChEMBL, we only find solutions to 71% of the targets, somewhat lower than the AstraZeneca sets. However, this could be explained by the extended number of iterations used for the AstraZeneca sets as recommended in a recent study of search hyperparameters [61]. For the GDB set we only find solutions to about 10%, highlighting a disconnection of the current template-based model trained on historical reaction data, with the chemistry needed to find synthesis routes for the enumerated, and therefore potentially non-synthesizable, GDB compounds.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…As such, the number of iterations is not a limiting factor for the template-based model. The hyperparameters for the MCTS search, including the number of iterations, were set to their default values since our intention was to evaluate the Chemformer within the production framework. For the Chemformer predictions, we used a beam size of 10, whereas for the template-based model, we added the top-50 predictions to the search tree.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…The maximum search time was set to 300 s for both the template-based model and Chemformer. According to our recent paper on hyperparameter-tuning MCTS, there is only a very minor difference in solvability when setting the number of iterations to 100 compared to 400, while the search time is often beyond 300 s for 400 iterations. As such, the number of iterations is not a limiting factor for the template-based model.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Furthermore, the configuration of the model significantly influences the success of route discovery [99]. Template reactions model a finite number of transformations, which may lack exhaustiveness. As an alternative to template-based formalism, template-free single-step approaches have also been employed in MCTS.…”
Section: Monte Carlo Tree Search (MCTS)
Citation type: mentioning
confidence: 99%
“…The data source for building templates is an important factor, as highlighted by Thakkar et al., wherein the impact of four data sets (AiZynthFinder, Pistachio, Reaxys, and USPTO) on MCTS performance is investigated. Furthermore, the configuration of the model significantly influences the success of route discovery. Template reactions model a finite number of transformations, which may lack exhaustiveness.…”
Section: Multistep Retrosynthesis
Citation type: mentioning
confidence: 99%