2022
DOI: 10.1039/d2cp03966d

nablaDFT: Large-Scale Conformational Energy and Hamiltonian Prediction benchmark and dataset

Abstract: In this work, we present nablaDFT, a new dataset and benchmark for predicting Density Functional Theory (DFT) Hamiltonians and energies. We provide data for over 1 million distinct molecules and over 5 million conformations, along with baseline models for both tasks.

Cited by 8 publications (6 citation statements).
References: 67 publications.
“…So-called semiempirical methods 10, 11, such as the Neglect of Diatomic Differential Overlap (NDDO 12) and modern related methods 13, tight-binding DFT (xTB 14), or ‘quantum force fields’ 15, can greatly improve the scaling of traditional quantum chemistry methods through simplification or approximation of the underlying physics, without the restrictions on the domain of applicability imposed by purely data-driven methods. Nonetheless, there is great interest in supporting this vision by taking a purely data-driven approach and developing quantum machine learning (QML) models based on accurate physical methods such as density functional theory (DFT) combined with large and diverse collections of data (e.g., QM9 16, QMugs 17, nablaDFT 18 and QM7-X 19). Another approach is to devise transfer learning datasets and algorithms that can extract useful patterns from less accurate but cheaper and more scalable simulations that ultimately benefit predictions at a higher fidelity level 8, 17, 20, 21.…”
Section: Introduction (mentioning, confidence: 99%)
“…This chemical space is incomplete, as many drug molecules contain phosphorus, sulfur, and halogen atoms, and some contain metal ions. The ANI-2x model was extended to include S, F, and Cl, but the full data set, including the important reference energies and forces at the ωB97X/6-31G* level, to our knowledge, has not yet become publicly available. Currently, there are a number of recent data sets that include compounds that contain phosphorus, sulfur, and halogens at various levels of theory as well as metal ions. Among them, only the SPICE data set includes forces at the ωB97M-D3BJ/def2-TZVPPD level and currently includes over 420K phosphorus, 520K sulfur, 750K halogen, and 8K metal-containing structures.…”
Section: Results (mentioning, confidence: 99%)
“…Several quantum chemistry (QC) data sets have been made publicly available, but some are less suitable for pretraining on drug-like molecules. For example, QM9 54, Alchemy, nablaDFT, and QM1B have limited ranges of molecular weights and do not sufficiently cover the chemical space of drug-like molecules. SPICE is geared more toward training machine-learned potentials.…”
Section: Related Work (mentioning, confidence: 99%)