Abstract: A neural network-based first-principles method for predicting heat of formation (HOF) was previously demonstrated to achieve chemical accuracy across a broad spectrum of target molecules [L. H. Hu et al., J. Chem. Phys. 119, 11501 (2003)]. However, its accuracy deteriorates as molecular size increases. A closer inspection reveals a systematic correlation between the prediction error and the molecular size, which appears correctable by further statistical analysis, calling for a more sophisticated …
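The size-dependent error described above can, in principle, be removed by fitting a simple statistical correction. The following is a minimal sketch of that idea, assuming the systematic bias is roughly linear in the number of atoms; the data here are purely illustrative, not taken from the paper.

```python
import numpy as np

# Hypothetical residuals (predicted - experimental HOF, kcal/mol) that grow
# systematically with molecule size, mimicking the trend the abstract describes.
n_atoms = np.array([4, 8, 12, 20, 30, 44, 60], dtype=float)
residual = 0.15 * n_atoms + np.array([0.3, -0.2, 0.4, -0.1, 0.2, -0.3, 0.1])

# Fit a linear size-dependent bias: residual ~ a * n_atoms + b
a, b = np.polyfit(n_atoms, residual, 1)

def corrected_hof(raw_pred, n):
    """Subtract the fitted systematic size bias from a raw NN prediction."""
    return raw_pred - (a * n + b)
```

A more sophisticated scheme could replace the linear fit with any regression on size-related features, but even this two-parameter correction illustrates how a systematic size trend is removable after the fact.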
“…Obviously, in all these special-purpose approaches, ML's role is relegated to only predicting or improving the prediction of enthalpy of formation for a given chemical structure, and some ML-based approaches have the further limitation that they were developed only for certain classes of compounds such as acyclic hydrocarbons, 33 cyclic hydrocarbons, 34 energetic materials, 8 or fuels. 39 Such special-purpose ML approaches also rely on molecular structures and other descriptors derived from structures, which are provided to an ML model, with the consequence that the ML model itself can neither generate a new molecular geometry nor improve upon it. An alternative to both QM and special-purpose ML approaches comes from a parallel development of general-purpose, data-driven methods based on ML, which target accurate predictions of QM potential energies for a wide range of compounds and can be used as a drop-in replacement for QM or force-field methods in many simulations such as molecular dynamics and geometry optimizations.…”
Enthalpies of formation and reaction are important thermodynamic properties that have a crucial impact on the outcome of chemical transformations. Here we implement the calculation of enthalpies of formation with the general-purpose ANI-1ccx neural network atomistic potential. We demonstrate on a wide range of benchmark sets that both ANI-1ccx and our other general-purpose data-driven method AIQM1 approach the coveted chemical accuracy of 1 kcal/mol with the speed of semiempirical quantum mechanical methods (AIQM1) or faster (ANI-1ccx). Remarkably, this is achieved without specifically training the machine learning parts of ANI-1ccx or AIQM1 on formation enthalpies. Importantly, we show that these data-driven methods provide statistical means for uncertainty quantification of their predictions, which we use to detect and eliminate outliers and revise reference experimental data. Uncertainty quantification may also help in the systematic improvement of such data-driven methods.
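Two ingredients of the workflow described above can be sketched compactly: deriving an enthalpy of formation from predicted energies via an atomization scheme, and ensemble-based uncertainty quantification (the usual statistical route for neural network potentials such as ANI-1ccx). This is a simplified sketch, not the authors' implementation; the atomic heats of formation below are standard gas-phase 298 K values from common compilations and should be checked against a current reference (e.g., ATcT) before use.

```python
import numpy as np

# Experimental gas-phase atomic heats of formation at 298 K (kcal/mol).
# Illustrative standard values; verify against an authoritative compilation.
ATOM_HOF = {"H": 52.103, "C": 171.29, "N": 112.97, "O": 59.56}

def hof_from_atomization(e_mol, atom_energies, atoms):
    """Atomization-based enthalpy of formation:
    Hf(mol) = E(mol) - sum E(atom) + sum Hf_expt(atom)."""
    return (e_mol
            - sum(atom_energies[a] for a in atoms)
            + sum(ATOM_HOF[a] for a in atoms))

def ensemble_predict(member_preds):
    """Ensemble mean as the prediction; the standard deviation across
    ensemble members serves as an uncertainty estimate, which can be
    thresholded to flag possible outliers or suspect reference data."""
    preds = np.asarray(member_preds, dtype=float)
    return preds.mean(), preds.std()
```

In practice, a prediction whose ensemble standard deviation exceeds a chosen threshold would be flagged for closer inspection rather than trusted at face value.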
“…Simultaneously, machine learning (ML) has been added to the quantum chemical toolbox, leading to a significant decrease in the computational cost and/or increase in the accuracy of the corresponding calculated properties. The success of a given ML model depends on its chosen set of molecular descriptors, as the representation must fully describe patterns in the desired output values.…”
Section: Machine Learning Models in Thermochemistry
Recent advances in theoretical thermochemistry have allowed the study of small organic and bio-organic molecules with high accuracy. However, applications to larger molecules are still impeded by the steep scaling problem of highly accurate quantum mechanical (QM) methods, forcing the use of approximate, more cost-effective methods at a greatly reduced accuracy. One of the most successful strategies to mitigate this error is the use of systematic error-cancellation schemes, in which highly accurate QM calculations can be performed on small portions of the molecule to construct corrections to an approximate method. Herein, we build on ideas from fragmentation and error-cancellation to introduce a new family of molecular descriptors for machine learning modeled after the Connectivity-Based Hierarchy (CBH) of generalized isodesmic reaction schemes. The best-performing descriptor ML(CBH-2) is constructed from fragments preserving only the immediate connectivity of all heavy (non-H) atoms of a molecule, along with overlapping regions of fragments in accordance with the inclusion-exclusion principle. Our proposed approach offers a simple, chemically intuitive grouping of atoms, tuned with an optimal amount of error-cancellation, and outperforms previous structure-based descriptors using a much smaller input vector length. For a wide variety of density functionals, DFT+ΔML(CBH-2) models, trained on a set of small- to medium-sized organic HCNOSCl-containing molecules, achieved an out-of-sample MAE within 0.5 kcal/mol and a 2σ (95%) confidence interval of <1.5 kcal/mol compared to accurate G4 reference values at DFT cost.
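The CBH-2-style fragmentation described above can be illustrated with a small sketch: one fragment per heavy atom (the atom plus its bonded heavy neighbors), with the overlapping bond fragments subtracted per the inclusion-exclusion principle. This is a simplified, hypothetical rendition that ignores hydrogens and bond orders (the actual CBH-2 scheme caps fragments with hydrogens); fragment labels here are ad hoc canonical tuples.

```python
from collections import Counter

def cbh2_descriptor(atoms, bonds):
    """Toy CBH-2-style fragment counts for a heavy-atom graph.
    atoms: list of element symbols; bonds: list of (i, j) index pairs."""
    nbrs = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        nbrs[i].append(j)
        nbrs[j].append(i)

    # One fragment per heavy atom: central symbol + sorted neighbor symbols.
    frags = Counter()
    for i in range(len(atoms)):
        frags[(atoms[i], tuple(sorted(atoms[k] for k in nbrs[i])))] += 1

    # Overlaps (shared bonds) enter with a negative sign (inclusion-exclusion).
    overlaps = Counter()
    for i, j in bonds:
        overlaps[tuple(sorted((atoms[i], atoms[j])))] -= 1
    return frags, overlaps
```

For the three heavy atoms of propane (C-C-C), this yields two terminal CH3-like fragments, one central fragment with two carbon neighbors, and two subtracted C-C overlaps; concatenating such signed counts over a fixed fragment vocabulary gives the ML input vector.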
“…Yang et al. 30 introduce a size-independent NN model of heats of formation trained on small organic molecules that can be applied to large molecules. For these, the MAE from reference B3LYP numbers is reduced to 1.7 kcal/mol.…”
Section: A Prediction of Energies and Other Properties Throughout Ch…
A survey of the contributions to the Special Topic on Data-enabled Theoretical Chemistry is given, including a glossary of relevant machine learning terms.