Organic synthesis is one of the key stumbling blocks in medicinal chemistry. A necessary yet unsolved step in planning synthesis is solving the forward problem: Given reactants and reagents, predict the products. Similar to other work, we treat reaction prediction as a machine translation problem between simplified molecular-input line-entry system (SMILES) strings (a text-based representation) of reactants, reagents, and the products. We show that a multihead attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark data set. Molecular Transformer makes predictions by inferring the correlations between the presence and absence of chemical motifs in the reactant, reagent, and product present in the data set. Our model requires no handcrafted rules and accurately predicts subtle chemical transformations. Crucially, our model can accurately estimate its own uncertainty, with an uncertainty score that is 89% accurate in terms of classifying whether a prediction is correct. Furthermore, we show that the model is able to handle inputs without a reactant–reagent split and including stereochemistry, which makes our method universally applicable.
Using a text-based representation of molecules, a neural machine translation model borrowed from language processing predicts the products of chemical reactions.
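To make the "translation between SMILES strings" framing concrete, the following is a minimal sketch of how a reaction could be turned into a source/target token pair for a sequence-to-sequence model. The regular expression, the function names `tokenize` and `to_translation_pair`, the `reactants>reagents>products` input layout, and the example reaction are all assumptions made for illustration; the abstract does not specify the preprocessing actually used.

```python
import re

# Assumed token pattern (not taken from the abstract): bracket atoms such as
# [Na+] stay whole, two-letter halogens (Br, Cl) are single tokens, and every
# other symbol is a single-character token.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles):
    """Split a SMILES string into tokens, failing loudly on unknown symbols."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES: {smiles!r}")
    return tokens


def to_translation_pair(reaction_smiles):
    """Turn 'reactants>reagents>products' into (source, target) token strings.

    The source side keeps the '>' separator between reactants and reagents;
    the target side holds only the products.
    """
    reactants, reagents, products = reaction_smiles.split(">")
    source = " ".join(tokenize(f"{reactants}>{reagents}"))
    target = " ".join(tokenize(products))
    return source, target


if __name__ == "__main__":
    # Hypothetical example: acid-catalysed esterification of acetic acid.
    src, tgt = to_translation_pair("CC(=O)O.OCC>[H+]>CC(=O)OCC.O")
    print("source:", src)
    print("target:", tgt)
```

A space-separated token string of this kind is the usual input format for off-the-shelf translation toolkits, which is why the sketch joins tokens with spaces instead of returning lists.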
Organic synthesis is one of the key stumbling blocks in medicinal chemistry. A necessary yet unsolved step in planning synthesis is solving the forward problem: given reactants and reagents, predict the products. Similar to other work, we treat reaction prediction as a machine translation problem between SMILES strings of reactants-reagents and the products. We show that a multi-head attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark dataset. Our algorithm requires no handcrafted rules, and accurately predicts subtle chemical transformations. Crucially, our model can accurately estimate its own uncertainty, with an uncertainty score that is 89% accurate in terms of classifying whether a prediction is correct. Furthermore, we show that the model is able to handle inputs without reactant-reagent split and including stereochemistry, which makes our method universally applicable.
The design of molecular catalysts typically involves reconciling multiple conflicting property requirements, largely relying on human intuition and local structural searches. However, the vast number of potential catalysts requires pruning of the candidate space by efficient property prediction with quantitative structure–property relationships. Data-driven workflows embedded in a library of potential catalysts can be used to build predictive models for catalyst performance and serve as a blueprint for novel catalyst designs. Herein we introduce kraken, a discovery platform covering monodentate organophosphorus(III) ligands providing comprehensive physicochemical descriptors based on representative conformer ensembles. Using quantum-mechanical methods, we calculated descriptors for 1558 ligands, including commercially available examples, and trained machine learning models to predict properties of over 300000 new ligands. We demonstrate the application of kraken to systematically explore the property space of organophosphorus ligands and how existing data sets in catalysis can be used to accelerate ligand selection during reaction optimization.
Quantitative structure−property relationships (QSPRs) are increasingly used for the prediction of physicochemical properties of pure compounds, but only a few have been developed to predict the properties of mixtures. In this work, a series of existing and new formulas were proposed to derive mixture descriptors for the development of QSPR models for mixtures. These mixture descriptors were used to model the flash points of a series of 435 organic mixture compositions. Multilinear models were obtained using 12 different mathematical formulas, taking into account the linear or nonlinear dependence of the flash point on the concentration of each compound. The best model, issued from the newly proposed $(x_1 d_1 + x_2 d_2)^2$ formula, was a four-parameter model presenting good prediction capabilities (with a mean absolute error in prediction of 10.3 °C) compared with existing predictive methods for both mixtures and pure compounds.
■ INTRODUCTION
Quantitative structure−property relationships (QSPRs) are predictive models that allow the prediction of macroscopic properties by correlation with descriptors of the molecular structures of chemicals [1]. These molecular descriptors belong to various categories [1,2]: constitutional, topological, geometrical, or quantum-chemical. Such methods have been largely used for biological activities in the fields of toxicology [3], ecotoxicology [4], and pharmaceutics [5,6] and are increasingly used for physicochemical properties [7−9].
Various models have been developed for hazardous physicochemical properties [10−19] such as flammability [10,11], thermal stability [12−14], and explosibility [15−19]. To date, the QSPR approach has mostly been dedicated to pure compounds, and only a few recent works have been dedicated to mixtures [20]. Ajmani and co-workers proposed various models to predict the densities [21,22] and infinite-dilution activity coefficients [23] of binary mixtures. In these studies, the molecular descriptors for each pure compound were combined, e.g., by mole-weighted averaging [24], to derive mixture descriptors. These mixture descriptors were then correlated to the property of the studied mixtures. Several studies have also been dedicated to azeotropic mixtures [25−29]. In particular, Oprisiu et al. [29] developed several QSPR models to predict the boiling points [25,28] of azeotropic binary mixtures on the basis of fragment descriptors.
The flash point (FP) is the temperature at which the vapor above a flammable liquid ignites under the effect of a spark [30]. This property characterizes flammability hazards of liquids and is a key safety issue in the risk assessment of industrial processes and in various regulatory frameworks dedicated to chemicals (for use, storage, and transport) [31,32]. The flash points of pure compounds have been studied in several works with the aim of developing predictive methods taking advantage of the large availability of data [33,34]. Among them, many are based on knowledge of other properties such as the boiling point [24,35,36]. The highest performance was obtained by Carroll ...
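The two mixture-descriptor formulas named in this excerpt, mole-weighted averaging and the squared form reported as giving the best model, can be sketched for a binary mixture as follows. This is only an illustration under stated assumptions: `x1` and `x2` are read as mole fractions and `d1` and `d2` as the values of one pure-compound descriptor, the function names are hypothetical, and the numbers in the example are made up and not taken from the paper.

```python
import math


def _check_fractions(x1, x2):
    """Assumption: x1 and x2 are mole fractions of a binary mixture (sum to 1)."""
    if not math.isclose(x1 + x2, 1.0, abs_tol=1e-9):
        raise ValueError("mole fractions of a binary mixture must sum to 1")


def mole_weighted_descriptor(x1, d1, x2, d2):
    """Linear mixture descriptor: x1*d1 + x2*d2."""
    _check_fractions(x1, x2)
    return x1 * d1 + x2 * d2


def squared_mixture_descriptor(x1, d1, x2, d2):
    """Nonlinear mixture descriptor: (x1*d1 + x2*d2)^2."""
    return mole_weighted_descriptor(x1, d1, x2, d2) ** 2


if __name__ == "__main__":
    # Made-up descriptor values for two hypothetical pure compounds.
    x1, d1 = 0.25, 4.0
    x2, d2 = 0.75, 8.0
    print(mole_weighted_descriptor(x1, d1, x2, d2))    # 7.0
    print(squared_mixture_descriptor(x1, d1, x2, d2))  # 49.0
```

In a QSPR workflow of the kind described, values like these would be computed for each descriptor and each mixture composition and then used as inputs to a multilinear regression against the measured flash points.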