Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2021
DOI: 10.1145/3458817.3476222

Enabling large-scale correlated electronic structure calculations

Cited by 13 publications (22 citation statements)
References 37 publications

“…For GPU implementations, it is important that the integrals are never stored in the GPU global memory, as this leads to a large limitation of resources and a high number of read operations that are inherently slow. The Fock build described in refs and , which uses the aforementioned integral routines, uses an atomic-operation-oriented algorithm for digesting the integrals into the Fock matrix, thereby keeping synchronization at a minimum. This algorithm was implemented and benchmarked against state-of-the-art programs such as QUICK and Terachem, showing promising speedups on the NVIDIA V100 architecture. Figure shows speedups against Terachem and QUICK using the same benchmark systems as those for the ERIs.…”
Section: Graphical Processing Units (mentioning)
confidence: 99%
“…For this reason, more efficient algorithms and approximate implementations have been developed to improve the scaling of both RPA and MP2. Common strategies are the usage of localized orbitals, cluster-in-molecule (CIM) approaches, or implementations which rely on sparsity in the atomic orbital basis. In the latter class of methods, implementations using local DF approximations have gained increasing popularity. While they do not achieve linear scaling with systems sizes, they typically come with a very small prefactor and are believed to only introduce minor errors compared to canonical, molecular orbital based implementations.…”
Section: Introduction (mentioning)
confidence: 99%
“…For this reason, more efficient algorithms and approximate implementations have been developed to improve the scaling of both RPA and MP2. Common strategies are the usage of localized orbitals, 66−70 cluster-in-molecule (CIM) approaches, 71,72 or implementations which rely on sparsity in the atomic orbital basis. 73−90 In the latter class of methods, implementations using local DF approximations have gained increasing popularity.…”
Section: Introduction (mentioning)
confidence: 99%
“…All of these factors make it especially challenging to port Gaussian integral kernels onto accelerated coprocessors, such as general-purpose graphical processing units (GPGPUs, or, simply, GPUs), that have become the norm both on the commodity and high-end platforms. Hence there has been an intense effort to address these challenges, both on the modern central processing units (CPUs) with wide single-instruction-multiple-data (SIMD) instructions and on GPUs.…”
Section: Introduction (mentioning)
confidence: 99%