A minimal subnetwork is extracted from a very complex full network by exploring the reaction pathways that connect reactants and products with the minimum dissociation and formation of chemical bonds. This process reduces computational cost and correctly predicts the pathway for two representative reactions.
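As a rough illustration of the pathway search described above, the sketch below runs a standard Dijkstra shortest-path search over a small reaction network whose edge weights count bonds broken plus bonds formed. The network, its node labels (`R`, `I1`, `I2`, `I3`, `P`), and the weights are hypothetical placeholders, and Dijkstra's algorithm is swapped in here for illustration only; the abstract does not state which search procedure the authors use.

```python
import heapq


def min_bond_change_path(network, start, goal):
    """Dijkstra search over a reaction network.

    `network` maps a species label to a list of (neighbor, bond_changes)
    pairs, where bond_changes is the number of bonds broken plus formed.
    Returns (path, total_bond_changes), or (None, inf) if unreachable.
    """
    best = {start: 0}
    previous = {}
    heap = [(0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in network.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if goal not in best:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1], best[goal]


# Hypothetical toy network: R = reactant, P = product, I* = intermediates.
toy_network = {
    "R": [("I1", 2), ("I2", 4)],
    "I1": [("I3", 2), ("P", 6)],
    "I2": [("P", 3)],
    "I3": [("P", 1)],
}

path, cost = min_bond_change_path(toy_network, "R", "P")
print(path, cost)
```

The species on the returned path form the kind of reduced subnetwork the abstract refers to; every other node in the full network can be skipped in later, more expensive calculations.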
Bond dissociation enthalpies (BDEs) of organic molecules play a fundamental role in determining chemical reactivity and selectivity. However, BDE computations at sufficiently high levels of quantum mechanical theory require substantial computing resources. In this paper, we develop a machine learning model capable of accurately predicting BDEs for organic molecules in a fraction of a second. We perform automated density functional theory (DFT) calculations at the M06-2X/def2-TZVP level of theory for 42,577 small organic molecules, resulting in 290,664 BDEs. A graph neural network trained on a subset of these results achieves a mean absolute error of 0.58 kcal mol⁻¹ (vs DFT) for BDEs of unseen molecules. We further demonstrate the model on two applications: first, we rapidly and accurately predict major sites of hydrogen abstraction in the metabolism of drug-like molecules, and second, we determine the dominant molecular fragmentation pathways during soot formation.
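For reference, the quantity being predicted is the enthalpy difference between the two radical fragments and the parent molecule, BDE = H(A•) + H(B•) − H(A–B). A minimal sketch of that arithmetic follows, assuming enthalpies are supplied in hartree. The function name and the input numbers are hypothetical placeholders, not values from the dataset.

```python
# Conversion factor from hartree to kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.509474


def bond_dissociation_enthalpy(h_parent, h_fragment_a, h_fragment_b):
    """Return the BDE in kcal/mol from enthalpies given in hartree.

    BDE = H(A radical) + H(B radical) - H(A-B parent molecule)
    """
    delta_hartree = h_fragment_a + h_fragment_b - h_parent
    return delta_hartree * HARTREE_TO_KCAL_PER_MOL


# Hypothetical enthalpies (hartree), chosen only to exercise the function.
bde = bond_dissociation_enthalpy(h_parent=-1.00,
                                 h_fragment_a=-0.40,
                                 h_fragment_b=-0.45)
print(f"{bde:.1f} kcal/mol")
```

Each BDE obtained this way needs three separate DFT calculations (the parent and both radicals), which is the cost a trained model avoids at prediction time.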
Basin-hopping sampling has been widely used for searching local minima on a potential energy surface. Reaction intermediates, including reactants and products, are also local minima that make up a reaction path, but brute-force sampling of them is too demanding because of the large number of degrees of freedom. We developed an efficient Monte Carlo basin-hopping method to sample reaction intermediates through the fragmentation of molecules, together with a postanalysis scheme that uses graph theory with a matrix representation of molecular structures. The former greatly reduces the dimension of a given potential energy surface, while the latter offers not only effective screening of the resulting local minima toward desirable intermediates but also their automatic ordering along a reaction path. We combined the method with the density functional tight binding method for rapid calculations and tested its performance for organic reactions.
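The generic basin-hopping loop that the method builds on can be sketched on a toy problem: perturb the current structure, relax it to the nearest local minimum, then accept or reject the new minimum with a Metropolis criterion. The one-dimensional `energy` function, the fixed-step gradient descent, and all parameter values below are assumptions made for illustration; the abstract's method works on fragmented molecules with density functional tight binding energies, which this sketch does not attempt to reproduce.

```python
import math
import random


def energy(x):
    """Hypothetical 1D surface with several local minima."""
    return 0.05 * x * x + math.sin(3.0 * x)


def gradient(x):
    """Analytic derivative of `energy`."""
    return 0.1 * x + 3.0 * math.cos(3.0 * x)


def local_minimize(x, step=0.01, iters=2000):
    """Relax to the nearest local minimum by fixed-step gradient descent."""
    for _ in range(iters):
        x -= step * gradient(x)
    return x


def basin_hopping(x0, n_steps=200, hop=1.5, temperature=0.5, seed=0):
    """Monte Carlo basin hopping: perturb, relax, Metropolis accept/reject."""
    rng = random.Random(seed)
    x = local_minimize(x0)
    e = energy(x)
    best_x, best_e = x, e
    for _ in range(n_steps):
        trial = local_minimize(x + rng.uniform(-hop, hop))
        e_trial = energy(trial)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-dE / T).
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temperature):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


start_energy = energy(local_minimize(4.0))
best_x, best_e = basin_hopping(4.0)
print(f"start minimum: {start_energy:.3f}, best found: {best_e:.3f}")
```

Because every trial is relaxed before comparison, the walk moves between minima rather than across the raw surface, which is what makes the approach suited to collecting intermediates.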
Machine learning based on big data has emerged as a powerful solution in various chemical problems. We investigated the feasibility of machine learning models for the prediction of activation energies of gas-phase reactions. Six different models of three different types, including the artificial neural network, the support vector regression, and the tree boosting methods, were tested. We used the structural and thermodynamic properties of molecules and their differences as input features without resorting to specific reaction types so as to maintain the most general input form for broad applicability. The tree boosting method showed the best performance among them in terms of the coefficient of determination, mean absolute error, and root mean square error, the values of which were 0.89, 1.95 kcal mol⁻¹, and 4.49 kcal mol⁻¹, respectively. Computation time for the prediction of activation energies for 2541 test reactions was about one second on a single computing node without using accelerators.
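The three reported metrics follow standard definitions, sketched here in plain Python. The function name and the sample activation energies are hypothetical and serve only to show the calculation.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for paired reference and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot                               # coefficient of determination
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n  # mean absolute error
    rmse = math.sqrt(ss_res / n)                             # root mean square error
    return r2, mae, rmse


# Hypothetical activation energies in kcal/mol.
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
r2, mae, rmse = regression_metrics(reference, predicted)
print(f"R2={r2:.3f}  MAE={mae:.2f}  RMSE={rmse:.2f}")
```

RMSE weights large errors more heavily than MAE, so a gap between the two, as in the values reported above, points to a small number of reactions with large prediction errors.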
The stabilities of radicals play a central role in determining the thermodynamics and kinetics of many reactions in organic chemistry. In this data descriptor, we provide consistent and validated quantum chemical calculations for over 200,000 organic radical species and 40,000 associated closed-shell molecules containing C, H, N and O atoms. These data consist of optimized 3D geometries, enthalpies, Gibbs free energy, vibrational frequencies, Mulliken charges and spin densities calculated at the M06-2X/def2-TZVP level of theory, which was previously found to have a favorable trade-off between experimental accuracy and computational efficiency. We expect this data to be useful in the further development of machine learning techniques to predict reaction pathways, bond strengths, and other phenomena closely related to organic radical chemistry.
We present a powerful method for the conversion of molecular structures from atomic connectivity to bond orders to three-dimensional (3D) geometries. A given atomic connectivity can correspond to a number of bond-order assignments and 3D geometries. To uniquely determine an energetically more favorable one among them, we use general chemical rules without invoking any empirical parameter, which makes our method valid for any organic molecule. Specifically, we first assign a proper bond order to each atomic pair in the atomic connectivity so as to maximize their sum, and the result is converted to a SMILES notation using graph theory. The corresponding 3D geometry is then obtained using force field or ab initio calculations. This method successfully reproduced the bond order matrices and 3D geometries of 10 000 molecules randomly sampled from the PubChem database with success rates near 100%, apart from a few exceptional cases. As an application, we demonstrate that it can be used to search for molecular isomers efficiently.
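The bond-order step can be illustrated with a brute-force toy: try every combination of single, double, and triple bonds on the given connectivity, discard combinations that exceed an atom's maximum valence, and keep the one with the largest bond-order sum. The `MAX_VALENCE` table and the exhaustive search are assumptions made for this sketch; the abstract does not describe the authors' actual optimization procedure, and exhaustive enumeration is practical only for very small molecules.

```python
from itertools import product

# Assumed maximum valences for neutral C, H, N, O atoms.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


def assign_bond_orders(elements, bonds):
    """Pick bond orders (1-3) that maximize their sum within valence limits.

    `elements` is a list of element symbols; `bonds` is a list of (i, j)
    index pairs describing the atomic connectivity.
    Returns a dict {(i, j): order}, or None if no assignment is valid.
    """
    best, best_sum = None, -1
    for orders in product((1, 2, 3), repeat=len(bonds)):
        used = [0] * len(elements)
        for (i, j), order in zip(bonds, orders):
            used[i] += order
            used[j] += order
        if all(u <= MAX_VALENCE[e] for u, e in zip(used, elements)):
            total = sum(orders)
            if total > best_sum:
                best, best_sum = dict(zip(bonds, orders)), total
    return best


# Ethene: atoms 0 and 1 are carbons, the rest are hydrogens.
ethene = assign_bond_orders(
    ["C", "C", "H", "H", "H", "H"],
    [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5)],
)
# Hydrogen cyanide: H-C-N connectivity.
hcn = assign_bond_orders(["H", "C", "N"], [(0, 1), (1, 2)])
print(ethene[(0, 1)], hcn[(1, 2)])
```

When several assignments tie on the bond-order sum, this sketch keeps the first one it finds, so it does not address how such ties should be resolved.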
Hydroxymethylfurfural (HMF) is one of the important renewable platform compounds that can be obtained from biomass feedstocks through glucose conversion catalyzed by Brønsted and Lewis acids. However, it is challenging to enhance the HMF yield due to side reactions. In this study, a systematic approach combining theory and experiment was performed to investigate the influence of Lewis acids and organic solvents on the HMF yield. For the Lewis acid effect, a relationship between chemical hardness and experimental HMF yields was found in the rate-limiting step of glucose-to-fructose isomerization for six metal chlorides; HMF production was promoted when the metal chloride and a substrate had a similar chemical hardness. To study the organic solvent effect, a multivariate model was developed based on the insights gained from the mechanistic study of fructose dehydration, to predict HMF yields in a given water-organic cosolvent system. It showed a reliable accuracy in evaluating HMF yields with a mean absolute error (MAE) of 3.0% with respect to experimental HMF yields for 13 solvents, and also predicted HMF yields with a MAE of 10.7% for four new solvents. Chemical interpretation of the model revealed that it is desirable to use a solvent capable of stabilizing the carbocation intermediates with low proton transfer activity and high hydrogen bond basicity, to maximize the HMF yield. This multivariate model informs experimentalists about rational selection of solvents with very low computational costs needed to calculate only six variables for each solvent. It can be expanded to other catalytic systems such as heterogeneous Brønsted–Lewis bifunctional catalysts and enables optimization of reaction conditions to obtain other useful platform molecules through biomass conversion.