Advancing molecular graphs with descriptors for the prediction of chemical reaction yields

Yarish, Dzvenymyra; Garkot, Sofiya; Grygorenko, Oleksandr O.; Radchenko, Dmytro S.; Moroz, Yurii S.; Gurbych, Oleksandr

doi:10.1002/jcc.27016

Cited by 5 publications

(3 citation statements)

References 77 publications


“…In addition to product prediction and retrosynthesis where the essence is to define and model the changes in the local structure, works have shown that applying representations based on molecular graphs of reactants, reagents, and the products could be beneficial for more reaction tasks, e.g., yield prediction and condition prediction. For example, Kwon et al used GNN to embed the reaction into the latent space and then apply variational inference to obtain multiple reaction conditions from the GNN encoded space. Though implementing graphs to represent reactions possesses strength, molecular graphs still have limitations, as they lack critical information such as charges, energies, and steric effects.…”

Section: Main (mentioning)

confidence: 99%

Exploring Chemical Reaction Space with Machine Learning Models: Representation and Feature Perspective

Ding, Qiang, Chen et al. 2024

J. Chem. Inf. Model.


Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning models, it is imperative to derive robust and informative representations or engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of representations and the design of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, molecular graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce a new frontier in chemical reaction pretraining, holding promise as an innovative yet unexplored avenue.


“…If we are interested in assessing model performance on new molecules, we can train a model with many reaction templates but use substructure splitting to create training, validation, and testing sets. Bemis-Murcko scaffolds [70] are commonly used to partition the data for this purpose, though clustering based on other input features or chemical similarity to measure extrapolation has also been explored [23, 71–88], as has quantifying domains of model applicability [89–93]. Scaffold splitting is not perfect, but by ensuring that molecules in the testing set are structurally different than those in the training set, it offers a much better assessment of generalizability than splitting randomly [17, 24, 67, 94–109]…”

Section: Interpolation Vs Extrapolation (mentioning)

confidence: 99%

Comment on ‘Physics-based representations for machine learning properties of chemical reactions’

Spiekermann, Stuyver, Pattanaik et al. 2023

Mach. Learn.: Sci. Technol.


In a recent article in this journal, van Gerwen et al (2022 Mach. Learn.: Sci. Technol. 3 045005) presented a kernel ridge regression model to predict reaction barrier heights. Here, we comment on the utility of that model and present references and results that contradict several statements made in that article. Our primary interest is to offer a broader perspective by presenting three aspects that are essential for researchers to consider when creating models for chemical kinetics: (1) are the model’s prediction targets and associated errors sufficient for practical applications? (2) Does the model prioritize user-friendly inputs so it is practical for others to integrate into prediction workflows? (3) Does the analysis report performance on both interpolative and more challenging extrapolative data splits so users have a realistic idea of the likely errors in the model’s predictions?


“…Yarish et al [91] developed the directed message-passing neural network (RD-MPNN) yield prediction models, which they tested on Enamine's proprietary reaction data. Their binary classification model showed a commendable ROC AUC of 0.78.…”

Section: Journal Of Chemical Information and Modeling (mentioning)

confidence: 99%

When Yield Prediction Does Not Yield Prediction: An Overview of the Current Challenges

Voinarovska, Kabeshov, Dudenko et al. 2023

J. Chem. Inf. Model.


Machine Learning (ML) techniques face significant challenges when predicting advanced chemical properties, such as yield, feasibility of chemical synthesis, and optimal reaction conditions. These challenges stem from the high-dimensional nature of the prediction task and the myriad essential variables involved, ranging from reactants and reagents to catalysts, temperature, and purification processes. Successfully developing a reliable predictive model not only holds the potential for optimizing high-throughput experiments but can also elevate existing retrosynthetic predictive approaches and bolster a plethora of applications within the field. In this review, we systematically evaluate the efficacy of current ML methodologies in chemoinformatics, shedding light on their milestones and inherent limitations. Additionally, a detailed examination of a representative case study provides insights into the prevailing issues related to data availability and transferability in the discipline.
