2021
DOI: 10.26434/chemrxiv-2021-bpv0c
Preprint

Using Active Learning to Develop Machine Learning Models for Reaction Yield Prediction

Abstract: Computer-aided synthesis planning is a rapidly growing field for suggesting synthetic routes for molecules of interest. The methods used are usually dependent on access to large datasets for training, but with a finite experimental budget there are limitations on how much data can be obtained from experiments. Active learning, which has been used in recent studies with success, is a strategy to identify which data points impact model accuracy the most. However, little has been done to explore the robustness of…
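The abstract describes active learning as a way to decide which experiments to run when the experimental budget is limited. Below is a minimal sketch of a generic pool-based loop in Python with NumPy. It starts from a few randomly chosen reactions, fits a bootstrap committee of ridge regressors to the measured yields, and then queries the candidates on which the committee disagrees most. This is a query-by-committee form of uncertainty sampling. The descriptors, the synthetic yield function, and `run_experiment` (a hypothetical stand-in for running a reaction) are assumptions for illustration. None of this is the preprint's model or data.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the appended intercept column is not penalized."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)


def predict(w, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w


def committee_std(X_train, y_train, X_pool, rng, n_models=10):
    """Spread of predictions across a bootstrap committee of ridge models."""
    n = len(y_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap resample of the labeled data
        preds.append(predict(fit_ridge(X_train[idx], y_train[idx]), X_pool))
    return np.std(preds, axis=0)


def active_learning(X, run_experiment, n_init=5, n_rounds=10, batch_size=2, seed=0):
    """Pool-based loop: random start, then label the most uncertain candidates."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X), size=n_init, replace=False)]
    yields = {i: run_experiment(i) for i in labeled}
    for _ in range(n_rounds):
        pool = [i for i in range(len(X)) if i not in yields]
        if not pool:
            break
        y_train = np.array([yields[i] for i in labeled])
        std = committee_std(X[labeled], y_train, X[pool], rng)
        # Query the batch with the largest committee disagreement.
        for j in np.argsort(-std)[:batch_size]:
            i = pool[j]
            yields[i] = run_experiment(i)
            labeled.append(i)
    return labeled, yields


# Hypothetical demo: random descriptors and a synthetic linear "yield" function.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
true_yield = 50 + X @ np.array([10.0, -5.0, 2.0]) + rng.normal(scale=2.0, size=40)
labeled, yields = active_learning(X, lambda i: float(true_yield[i]))
print(len(labeled))  # 5 initial + 10 rounds x 2 = 25 labeled reactions
```

Ridge models keep the sketch dependency-free. Any regressor that exposes a per-prediction uncertainty, such as a random forest or a Gaussian process, could take their place.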

Cited by 3 publications (4 citation statements); references 17 publications (24 reference statements).
“…The goal of finding desired reaction conditions is qualitatively different from the goals of other active learning studies for classification, which usually aim to minimize error in the domain of interest. 40,47 A similar formulation was used in the context of drug discovery (up to ∼17 500 molecules), where data points were selectively labeled based on the farthest distance from the support vector machine's classification hyperplane. 70 Another study 71 under a similar setting (with ∼100 million molecules) showed greedy selection to be effective at identifying molecules with the best docking scores.…”
Section: Discussion (mentioning; confidence: 99%)
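The excerpt above names two selection rules: labeling points by their distance from a support vector machine's hyperplane, and greedy selection of the best-scoring candidates. The sketch below shows both in NumPy and assumes a linear SVM whose weight vector `w` and bias `b` have already been trained. All function names are hypothetical. The excerpt does not say whether the cited study used signed or unsigned distance, so the unsigned distance used here is an assumption. Closest-to-boundary selection is included only for contrast, because it is the more common uncertainty-sampling rule. The `higher_is_better` flag is also an assumption: docking scores are often reported so that lower is better.

```python
import numpy as np


def hyperplane_distance(w, b, X):
    """Unsigned geometric distance of each row of X to the hyperplane w.x + b = 0."""
    return np.abs(X @ w + b) / np.linalg.norm(w)


def select_farthest(w, b, X_pool, k):
    """Indices of the k pool points farthest from a linear SVM hyperplane."""
    return np.argsort(-hyperplane_distance(w, b, X_pool))[:k]


def select_closest(w, b, X_pool, k):
    """Uncertainty sampling, for contrast: the k points nearest the boundary."""
    return np.argsort(hyperplane_distance(w, b, X_pool))[:k]


def select_greedy(scores, k, higher_is_better=True):
    """Greedy selection: the k candidates with the best predicted score."""
    scores = np.asarray(scores)
    return np.argsort(-scores if higher_is_better else scores)[:k]


w, b = np.array([1.0, 0.0]), 0.0
X_pool = np.array([[0.1, 0.0], [-3.0, 0.0], [2.0, 5.0]])
print(select_farthest(w, b, X_pool, 2))  # [1 2]
print(select_closest(w, b, X_pool, 1))   # [0]
# Docking-style scores where lower is better.
print(select_greedy([-7.2, -9.1, -5.0], 1, higher_is_better=False))  # [1]
```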
“…38,39 Active learning is therefore well-suited for reaction development, which greatly benefits from efficient exploration and where chemists conduct the next batch of reactions based on previous experimental results. Based on this analogy, reaction optimization 27,28 and reaction condition identification 40 have been demonstrated to benefit from active learning. However, these prior works initiate exploration with randomly selected data points (Fig.…”
Section: Introduction (mentioning; confidence: 99%)
“…Active learning is well-suited for modeling reactions with carefully selected experiments, although its use is not widespread yet in chemical engineering [10]. In chemical engineering, the benefits of active learning have been investigated for drug discovery [10][11][12], catalysis [13][14][15], quantum chemical calculations [8,[16][17][18][19][20], polymer design [21], and retrosynthesis [22,23]. However, active learning has not yet been applied in reaction modeling to the best of our knowledge.…”
Section: Introduction (mentioning; confidence: 99%)
“…However, active learning has not yet been applied in reaction modeling to the best of our knowledge. To date, active learning has been predominantly used on simulated databases [8,14,17,18,20] or large experimental datasets (greater than 1000 datapoints) [22,23]. This data-intensity makes it impossible for utilization in a real experimental campaign where data is limited due to the high cost and long duration of experiments.…”
Section: Introduction (mentioning; confidence: 99%)