Prediction of Reaction Yield for Buchwald‐Hartwig Cross‐coupling Reactions Using Deep Learning

Sato, Akinori; Miyao, Tomoyuki; Funatsu, Kimito

doi:10.1002/minf.202100156

Cited by 14 publications

(14 citation statements)

References 37 publications

(66 reference statements)

“…Interestingly, however, regression models for Amine OOS split are superior with random forest methods. Consistent with previous work [11, 15, 27–31], random splitting of the data yields better models than any OOS scaffold splitting (Amine OOS, ArX OOS, Both OOS, Figure 4B, top). The performance of all three OOS cases for the stratified split was significantly inferior to the DRS strategy indicating that the model has a limited ability to extrapolate beyond molecules in its training set.…”

Section: Modelling (supporting)

confidence: 88%
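
The contrast drawn in the statement above, between random splitting and out-of-sample (OOS) splitting, can be illustrated with a minimal sketch. It assumes each reaction record carries an amine identifier and an aryl halide identifier. The field names, the toy data, and the helpers `random_split` and `oos_split` are hypothetical and are not taken from the cited work.

```python
import random


def random_split(reactions, test_fraction=0.2, seed=0):
    """Shuffle reactions and hold out a fraction.

    Substrates in the test set may also appear in the training set.
    """
    rng = random.Random(seed)
    shuffled = reactions[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def oos_split(reactions, key, test_fraction=0.2, seed=0):
    """Hold out whole substrates.

    Every reaction whose `key` value falls in the held-out set goes to
    the test set, so no test substrate of that type is seen in training.
    """
    rng = random.Random(seed)
    groups = sorted({r[key] for r in reactions})
    rng.shuffle(groups)
    n_test = max(1, int(len(groups) * test_fraction))
    held_out = set(groups[:n_test])
    train = [r for r in reactions if r[key] not in held_out]
    test = [r for r in reactions if r[key] in held_out]
    return train, test


# Toy data (hypothetical): 5 amines x 4 aryl halides = 20 reactions.
reactions = [
    {"amine": f"amine_{a}", "arx": f"arx_{x}"}
    for a in range(5)
    for x in range(4)
]

rand_train, rand_test = random_split(reactions)
oos_train, oos_test = oos_split(reactions, key="amine")

# In the amine OOS split, no held-out amine appears in the training set.
train_amines = {r["amine"] for r in oos_train}
test_amines = {r["amine"] for r in oos_test}
overlap = train_amines & test_amines
```

Passing `key="arx"` would hold out aryl halides instead. Holding out both substrate types at once would additionally require discarding reactions that pair a held-out substrate with a retained one, which this sketch does not attempt.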

“…The overall goal of predictive models is to predict reaction failures and successes with high fidelity. Literature reports suggest that models trained on random splits of HTE data generally perform well within the modeled datasets [11, 15, 27–31], an observation that is hypothesized to be a result of hidden patterns in the dataset and bias in its construction [43]. However, extending models to unseen structures is often difficult and limited by narrow substrate scopes.…”

Section: Modelling (mentioning)

confidence: 99%

“…Modeling the yield of these datasets (4K C-N couplings [15], or 2K Suzuki-Miyaura couplings in flow [14]) produces predictive models with R² or AUROC > 0.9 [11, 15, 27–34]. However, models trained on these datasets demonstrate limited ability to extrapolate beyond the molecules in their training sets, in part due to the minimal structural diversity in the dataset.…”

Section: Introduction (mentioning)

confidence: 99%

Roadmap to Pharmaceutically Relevant Reactivity Models Leveraging High-Throughput Experimentation

Kalyani, Struble, et al. 2022. Preprint.

The merger of High-Throughput Experimentation (HTE) and data science presents an opportunity to both accelerate and inspire innovations in synthetic chemistry. Similarly, developments in machine learning (ML) have enabled the distillation of large and complex data sets into predictive models capable of generalizing patterns in the data. However, efforts to merge HTE with ML remain constrained by a few reported datasets with limited structural diversity and corresponding trained models that do not extrapolate well to substrates beyond the training set. Herein, we detail the first ML models for Pd-catalyzed C–N couplings using pharmaceutically relevant structurally diverse large data sets (~5000 unique products) generated using nanomole scale compatible chemistry. Careful consideration is given to both the diversity of the data set and accurate model predictions for substrates bearing features beyond those present in the training set. The structural diversity in the data set is enabled by leveraging the Merck & Co., Inc. Building Block Collection with an initial focus on C–N coupling using secondary amines. The large dataset enables the systematic evaluation of model performance using five different data-splitting strategies. These five splits are carefully designed to evaluate the model’s ability to extrapolate beyond the substrates in the training set. The accuracy of classification models built with a lens toward application to medicinal chemistry campaigns exceeded the baseline precision-recall by 25-67% depending on the splitting strategy. These results would manifest as significant enrichment of successful C–N couplings using the hits recommended by the models. In addition, the accuracy of the best models for each of the five splits ranges between 70-87% suggesting excellent overall predictivity of the models even for completely unseen substrates.

“…[11–13] On the other hand, topological descriptors accompanied by nonlinear machine learning (ML) models have sufficient predictive capability when trained on high-throughput experimental (HTE) data [14, 15]. Although HTE data [16–18] provide the opportunity to analyze the comprehensive reaction space with high precision, the exhaustive combinations of substances under uniformly controlled experimental conditions are not usually available in laboratory-scale experiments for novel reaction development. Thus, methods for constructing highly predictive ML models trained on a small number of reactions are highly demanded.…”

Section: Introduction (mentioning)

confidence: 99%

Extended Connectivity Fingerprints as a Chemical Reaction Representation for Enantioselective Organophosphorus-Catalyzed Asymmetric Reaction Prediction

Asahara, Miyao. 2022. ACS Omega. Self-citation.

Predicting the outcomes of organic reactions using data-driven approaches aids in the acceleration of research. In laboratory-scale experiments, only a small number of reaction data can be accessed for machine learning model construction, where reaction representations play a pivotal role in the success of model construction. Nevertheless, representation comparison for a small data set is not adequate. Herein, focusing on the enantioselectivity of phosphoric-acid-catalyzed reactions, various two-dimensional and three-dimensional reaction representations (descriptors) were compared. Overall, the concatenated form of the extended connectivity fingerprints showed the best predictive capability for the two types of data sets: high-throughput experimental data and manually collected literature data sets. Furthermore, highlighting the substructure contribution to the prediction outcome was shown to be informative for guiding catalyst development.

“…Special attention in the issue was paid to the modeling of reaction properties. Thus, Sato et al [4] reported a deep learning-based descriptor-free model for yield prediction in important for medicinal chemistry Buchwald-Hartwig reaction. Genheden et al [5] proposed an interesting approach for predicting Buchwald-Hartwig reaction conditions, such as ligand, base, solvent, and (pre-)catalyst.…”

mentioning

confidence: 99%

Editorial: Chemical Reactions Mining

Madzhidov, Varnek. 2022. Molecular Informatics.

Tremendous progress of deep learning methods makes possible to generate potent chemical entities which stability and synthetic feasibility need to be estimated. In turn, this requires to answer such questions as (a) how given compounds can be synthesized, (b) under which conditions a given reaction should be carried out, (c) what is estimated rate/yield/selectivity of a given reaction under given conditions, (d) how one can design molecules with controlled synthetic accessibility, (e) what is the reactivity/stability of particular molecule in particular environment or organism? Here, we introduce special issue of Molecular Informatics devoted to chemical reactions mining. It covers a wide variety of topics, from condition prediction to de novo design and tends to answer the above question using chemoinformatics approaches. P. Ertl et al. [1] reviewed some original approaches for the assessment of molecular reactivity and possible molecular transformations in ex vivo and in vivo settings. The article by Gimadiev et al. [2] described reaction data curation and cleaning protocol using open-sourced tools. Lin et al. [3] benchmarked popular atom-to-atom mapping algorithms and proposed an elegant strategy of erroneous mapping correction. Special attention in the issue was paid to the modeling of reaction properties. Thus, Sato et al. [4] reported a deep learning-based descriptor-free model for yield prediction in important for medicinal chemistry Buchwald-Hartwig reaction. Genheden et al. [5] proposed an interesting approach for predicting Buchwald-Hartwig reaction conditions, such as ligand, base, solvent, and (pre-)catalyst. In order to identify the most promising reactants for the Claisen reaction, Okada et al. [6] applied a genetic algorithm-based approach coupled with quantum chemical reaction barrier estimation. Focusing on de novo design of synthetically available molecules, Ghiandoni et al. [7] proposed reaction-based tool RENATE for reaction-based structure generation that can fragment and assemble molecules following a chemical logic. We hope that this special issue will be of interest of the readers of Molecular Informatics.
