2020
DOI: 10.1021/acs.jctc.0c00236

Effective Molecular Descriptors for Chemical Accuracy at DFT Cost: Fragmentation, Error-Cancellation, and Machine Learning

Abstract: Recent advances in theoretical thermochemistry have allowed the study of small organic and bio-organic molecules with high accuracy. However, applications to larger molecules are still impeded by the steep scaling problem of highly accurate quantum mechanical (QM) methods, forcing the use of approximate, more cost-effective methods at a greatly reduced accuracy. One of the most successful strategies to mitigate this error is the use of systematic error-cancellation schemes, in which highly accurate QM calculat…

citations
Cited by 23 publications
(26 citation statements)
references
References 65 publications
(132 reference statements)
“…The former relies heavily on chemical intuition by requiring the user to choose which atomic or molecular attributes are important for the problem at hand. Recent developments from our group proposed a class of fragmentation-based representations termed ML(CBH) or simply MLCBH, in which a system is broken apart into smaller fragments based on the generalized isodesmic schemes of the Connectivity-Based Hierarchy. CBH reactions are characterized by deconstructing the molecule into smaller n diameter fragments, corresponding to the nth rung on CBH, as well as their overlaps, to satisfy the inclusion–exclusion principle. Once the full reaction scheme is constructed, the coefficients of the fragments along with their overlaps are multi-hot encoded into a vector of all possible fragments (Figure b).…”
Section: Methods (mentioning)
confidence: 99%
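
The encoding step described in the statement above can be sketched roughly as follows. This is a minimal illustration, not the cited authors' implementation: the function name `encode_fragments`, the four-entry vocabulary, and the sign convention (fragments positive, overlaps negative, following inclusion-exclusion) are all assumptions made for the example. The two molecules use the standard isodesmic (CBH-1) bond-separation reactions, C3H8 + CH4 → 2 C2H6 and C2H5OH + CH4 → C2H6 + CH3OH.

```python
def encode_fragments(coefficients, vocabulary):
    """Place signed fragment coefficients into a fixed-order vector.

    coefficients: dict mapping a fragment label to its signed count
                  (assumed convention: fragments > 0, overlaps < 0).
    vocabulary:   ordered list of every fragment label the model knows.
    """
    unknown = set(coefficients) - set(vocabulary)
    if unknown:
        raise KeyError(f"fragments missing from vocabulary: {sorted(unknown)}")
    # Fragments absent from this molecule get a zero entry.
    return [coefficients.get(label, 0) for label in vocabulary]


# Hypothetical vocabulary; a real one would span the whole training set.
vocabulary = ["CH4", "C2H6", "CH3OH", "H2O"]

# Propane at CBH-1: two C-C bond fragments, one overlapping carbon.
propane = {"C2H6": 2, "CH4": -1}
# Ethanol at CBH-1: one C-C and one C-O fragment, one overlapping carbon.
ethanol = {"C2H6": 1, "CH3OH": 1, "CH4": -1}

propane_vector = encode_fragments(propane, vocabulary)
ethanol_vector = encode_fragments(ethanol, vocabulary)
print(propane_vector)  # [-1, 2, 0, 0]
print(ethanol_vector)  # [-1, 1, 1, 0]
```

Keeping the vocabulary order fixed is what makes vectors from different molecules comparable as inputs to a single model.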
“…Nevertheless, many of the most successful representations thus far have been designed to replace QM through standard ML techniques by describing the composition and the 3D structure of a chemical system as either an encoded vector through popular fingerprinting algorithms, such as Morgan FP or ECFP, or as a higher dimensional tensor through machine learning interatomic potentials. Such models are typically benchmarked against large datasets of DFT-calculated properties. Although these ML models can achieve mean absolute errors (MAEs) below the threshold of “chemical accuracy” (∼1 kcal/mol), the reference values being reproduced (typically DFT) are still significantly inaccurate compared to experiments or more sophisticated CCSD(T)-based cWFTs such as G4 or G4(MP2).…”
Section: Introduction (mentioning)
confidence: 99%
“…The rise of powerful machine learning (ML) methods and hardware has provided tools for chemists to approach the problem of accurate enthalpy calculations from a completely different, data-driven, angle exploiting the fact that ML calculations practically come at no cost compared to high-level QM methods. A series of special-purpose ML approaches were suggested either to specifically target prediction of enthalpies of formation directly or by correcting predictions of a baseline lower-level, DFT, Hartree–Fock, or SQM method to experimental data or higher-level QM methods such as G4 and G4MP2. Impressively, many of these special-purpose ML approaches were reported to come close to reaching chemical accuracy, but all of them are based on correcting DFT-level predictions, which strongly limits their computational efficiency.…”
Section: Uncertainty Quantification (mentioning)
confidence: 99%
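
The idea of correcting a lower-level baseline toward a higher-level reference, as the statement above describes, can be illustrated with a deliberately simple sketch. The "model" here is a single constant offset, which is an assumption chosen for brevity and not the approach of any cited work; the function names are hypothetical and the numbers are toy placeholders, not real thermochemical data.

```python
from statistics import mean


def mean_absolute_error(predicted, reference):
    """Average absolute deviation between two equal-length sequences."""
    return mean(abs(p - r) for p, r in zip(predicted, reference))


def fit_offset(baseline, reference):
    """Learn one constant correction: the mean of (reference - baseline)."""
    return mean(r - b for b, r in zip(baseline, reference))


# Toy placeholder values in kcal/mol (illustrative only).
train_baseline = [10.0, 20.0, 30.0]   # stands in for low-level results
train_reference = [11.0, 22.0, 33.0]  # stands in for high-level results

offset = fit_offset(train_baseline, train_reference)

test_baseline = [40.0]
test_reference = [42.5]
corrected = [b + offset for b in test_baseline]

mae_before = mean_absolute_error(test_baseline, test_reference)
mae_after = mean_absolute_error(corrected, test_reference)
print(offset, mae_before, mae_after)  # 2.0 2.5 0.5
```

A practical correction model would replace the constant offset with a regressor over a molecular representation, but the prediction would still be assembled the same way: baseline value plus learned correction.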
“…Indeed, the development of new descriptors and techniques to represent chemical information for specialized use cases is subject of ongoing research [497], [498], [499], [500], [501], [502], [503], [504], [505], [506]. For example, Ref.…”
Section: E Technical Aspects (mentioning)
confidence: 99%