Towards Data‐Driven Design of Asymmetric Hydrogenation of Olefins: Database and Hierarchical Learning

Xu, Li‐Cheng; Zhang, Shuo‐Qing; Li, Xin; Tang, Miao‐Jiong; Xie, Pei‐Pei; Hong, Xin

doi:10.1002/anie.202106880

Cited by 30 publications (21 citation statements)

References 109 publications

“…Thanks to its hierarchical structure, it is compatible with diverse structural representations (e. g., SMILES, 3D structures), genetic operations and fitness functions. Additional functionalities, including ML‐based acceleration, [44–50] can also be conveniently deployed for the fitness evaluation. While NaviCatGA, as presented here, is a core component of inverse design efforts in catalysis, it also constitutes a powerful stand‐alone program for general optimization problems.…”

Section: Discussion (mentioning; confidence: 99%)

Genetic Optimization of Homogeneous Catalysts

2022

We present the NaviCatGA package, a versatile genetic algorithm capable of optimizing molecular catalyst structures using well‐suited fitness functions to achieve a set of targeted properties. The flexibility and generality of this tool are validated and demonstrated with two examples: i) Ligand optimization and exploration for Ni‐catalyzed aryl‐ether cleavage manipulating SMILES and using a fitness function derived from molecular volcano plots, ii) multi‐objective (i. e., activity/selectivity) optimization of bipyridine N,N‐dioxide Lewis basic organocatalysts for the asymmetric propargylation of benzaldehyde from 3D molecular fragments. We show that evolutionary optimization, enabled by NaviCatGA, is an efficient way of accelerating catalyst discovery through bypassing combinatorial scaling issues and incorporating compelling chemical constraints.
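The abstract describes an evolutionary loop built from a population of candidate structures, genetic operations, and a fitness function. A minimal Python sketch of that kind of loop follows. It is not NaviCatGA's API or implementation. The integer genome (each gene picks one fragment from a hypothetical library), size-2 tournament selection, one-point crossover, point mutation, elitism, and the toy fitness are all assumptions chosen for illustration. It is also single-objective, whereas the abstract mentions multi-objective optimization.

```python
import random
from typing import Callable, List, Sequence, Tuple

Genome = List[int]


def run_ga(
    n_genes: int,
    n_choices: int,
    fitness: Callable[[Sequence[int]], float],
    pop_size: int = 20,
    generations: int = 50,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[Genome, float]:
    """Toy single-objective genetic algorithm (maximises `fitness`).

    Hypothetical encoding: each gene is an index into a library of `n_choices`
    fragments, so a genome of length `n_genes` encodes one candidate structure.
    Requires n_genes >= 2 and pop_size >= 2.
    """
    rng = random.Random(seed)
    # Random initial population.
    population = [
        [rng.randrange(n_choices) for _ in range(n_genes)] for _ in range(pop_size)
    ]

    def tournament(scored: List[Tuple[float, Genome]]) -> Genome:
        # Size-2 tournament: return the fitter of two randomly drawn individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = sorted(
            ((fitness(g), g) for g in population), key=lambda t: t[0], reverse=True
        )
        # Elitism: the current best genome survives unchanged.
        next_pop: List[Genome] = [list(scored[0][1])]
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            # One-point crossover.
            cut = rng.randrange(1, n_genes)
            child = p1[:cut] + p2[cut:]
            # Point mutation: each gene is resampled with probability mutation_rate.
            child = [
                rng.randrange(n_choices) if rng.random() < mutation_rate else g
                for g in child
            ]
            next_pop.append(child)
        population = next_pop

    best = max(population, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    # Toy fitness: closeness to an arbitrary target genome (0 is a perfect match).
    target = [3, 1, 4, 1, 5]
    fit = lambda g: -sum(abs(a - b) for a, b in zip(g, target))
    print(run_ga(n_genes=5, n_choices=6, fitness=fit, seed=1))
```

Elitism means the best fitness in the population never decreases from one generation to the next. The search only evaluates `pop_size × generations` candidates and never enumerates all `n_choices ** n_genes` combinations, which is the sense in which such a loop can sidestep combinatorial scaling.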

“…based acceleration, [44][45][46][47][48][49][50] can also be conveniently deployed for the fitness evaluation. While NaviCatGA, as presented here, is a core component of inverse design efforts in catalysis, it also constitutes a powerful stand-alone program for general optimization problems.…”

Section: Chemistry-methods (mentioning; confidence: 99%)

Genetic Optimization of Homogeneous Catalysts

2022

“…We recently built a database of asymmetric hydrogenation of olefins (12619 enantioselectivities) based on experimentation literature between the years 2000 and 2020. [45] In addition to the literature data, the reaction data schemes from US patents were extracted as the USPTO reaction database via text-mining techniques by NextMove. [46,47] However, it is noteworthy that, based on a recent study, there may be some inherent potential problems in the data source.…”

Section: Chemistry-a European Journal (mentioning; confidence: 99%)

“…Prime examples of this strategy include Doyle's database of Ullman–Goldberg/Buchwald–Hartwig cross‐couplings (4140 reaction yields) [18] and Denmark's database of asymmetric imine addition (1075 enantioselectivities), [20] which have now been widely applied as benchmark databases for ML of synthesis/catalysis performance. We recently built a database of asymmetric hydrogenation of olefins (12619 enantioselectivities) based on experimentation literature between the years 2000 and 2020 [45]. In addition to the literature data, the reaction data schemes from US patents were extracted as the USPTO reaction database via text‐mining techniques by NextMove [46,47].…”

Section: Introduction (mentioning; confidence: 99%)

Bridging Chemical Knowledge and Machine Learning for Performance Prediction of Organic Synthesis

Zhang et al. 2022, Chemistry – A European Journal (self-citation)

Recent years have witnessed a boom of machine learning (ML) applications in chemistry, which reveals the potential of data-driven prediction of synthesis performance. Digitalization and ML modelling are the key strategies to fully exploit the unique potential within the synergistic interplay between experimental data and the robust prediction of performance and selectivity. A series of exciting studies have demonstrated the importance of chemical knowledge implementation in ML, which improves the model's capability for making predictions that are challenging and often go beyond the abilities of human beings. This Minireview summarizes the cutting-edge embedding techniques and model designs in synthetic performance prediction, elaborating how chemical knowledge can be incorporated into machine learning until June 2022. By merging organic synthesis tactics and chemical informatics, we hope this Review can provide a guide map and intrigue chemists to revisit the digitalization and computerization of organic chemistry principles.

“…To ensure statistical relevance, the input data should cover a wide range of outcomes, assuring that they properly represent the system under investigation. This can be a challenge when the synthetic chemistry literature is mined, as it is not common practice to report results of failed, unselective, or low-yielding reactions. Consequently, when a mechanistic approach is adopted on the basis of statistics, it is often the case that data mining the literature is not enough and data sets need to be augmented experimentally. In 2018, the Doyle group started investigating HTE data sets for the predictions of reaction yields.…”

Section: Weight Of Parameters (mentioning; confidence: 99%)

Mechanistic Inference from Statistical Models at Different Data-Size Regimes

Lustosa; Milo, 2022, ACS Catal.

The chemical sciences are witnessing an influx of statistics into the catalysis literature. These developments are propelled by modern technological advancements that are leading to fast and reliable data production, mining, and management. In organic chemistry, models encoded with information-rich parameters have facilitated the formulation of mechanistic hypotheses across different data-size regimes. Herein, we aim to demonstrate through selected examples that the integration of statistical principles into homogeneous catalysis can streamline not only reaction optimization protocols but also mechanistic investigation procedures. Namely, we highlight how different aspects of molecular modeling, data set design, data visualization, and nuanced data restructuring can contribute to improving chemical reactivity and selectivity, while furthering our understanding of reaction mechanisms. By mapping out these techniques at different data set sizes, we hope to encourage the broad application of data-driven approaches for mechanistic studies regardless of the accessible amount of data.