2020
DOI: 10.1021/acs.jctc.0c00768

High-Performance, Graphics Processing Unit-Accelerated Fock Build Algorithm

Abstract: We present a high-performance, GPU (graphics processing unit)-accelerated algorithm for building the Fock matrix. The algorithm is designed for efficient calculations on large molecular systems and uses a novel dynamic load balancing scheme that maximizes the GPU throughput and avoids thread divergence that could occur due to integral screening. Additionally, the code adopts a novel ERI digestion algorithm that exploits all forms of permutational symmetry, combines efficiently the evaluation of both Coulomb an…

Cited by 33 publications (29 citation statements)
References: 27 publications
“…The power of multi-GPUs has been harnessed into a range of traditional computational chemistry tools, [3][4][5][6][7][8][9][10][11][12] however, only a handful of ab initio quantum chemical packages [9][10][11][12][13] are among them. Meanwhile, with multi-GPU nodes increasingly becoming common in contemporary supercomputer centers, opensource quantum chemical codes that can fully exploit their power are in demand.…”
Section: Introduction (mentioning)
confidence: 99%
“…A second, but a hybrid ERI engine, has been developed by Kussman and Ochsenfeld, 10 and quite recently, a fragmentation based Fock build algorithm with dynamic load balancing has been reported by Gordon and coworkers. 12 In the context of XC parallelization on multi-GPUs, Williams-Young et al 11 has documented a three level parallelization scheme. In such a scheme, the load balancing is achieved by pre-estimating the FLOPs incurred by batches of grid points.…”
Section: Introduction (mentioning)
confidence: 99%
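The load-balancing idea mentioned in the statement above, estimating the cost of each batch in advance and distributing batches so that all devices finish at about the same time, can be illustrated with a generic greedy scheduler. This is a minimal sketch and not the scheme of any cited work: the function names `assign_batches` and `device_loads`, the cost values, and the device count are all hypothetical, and the heuristic used (largest batch first, always to the least-loaded device) is a standard textbook choice swapped in for illustration.

```python
import heapq


def assign_batches(costs, n_devices):
    """Assign batches to devices with a greedy largest-first heuristic.

    costs[i] is a pre-estimated cost (e.g. a FLOP count) for batch i.
    Returns one list of batch indices per device. Hypothetical helper,
    not taken from any cited code.
    """
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    # Min-heap of (accumulated cost, device index): the least-loaded
    # device is always at the top.
    heap = [(0.0, dev) for dev in range(n_devices)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_devices)]
    # Place the most expensive batches first so the small ones can
    # even out the remaining imbalance.
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    for batch in order:
        load, dev = heapq.heappop(heap)
        assignment[dev].append(batch)
        heapq.heappush(heap, (load + costs[batch], dev))
    return assignment


def device_loads(costs, assignment):
    """Total estimated cost assigned to each device."""
    return [sum(costs[i] for i in batches) for batches in assignment]


if __name__ == "__main__":
    # Made-up cost estimates for seven batches, split over two devices.
    estimated_costs = [9.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0]
    plan = assign_batches(estimated_costs, 2)
    print(plan)
    print(device_loads(estimated_costs, plan))
```

A static plan like this is only as good as the cost estimates. A dynamic scheme would instead hand out work at run time as devices become idle.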
“…Over the last few decades, an immense research effort directed towards the development of GPU accelerated Gaussian basis set KS-DFT, [268][269][270][271][272][273] a majority of which has been focused on the development of highly efficient algorithms for the evaluation and digestion of the electron repulsion integrals (ERI) required for the formation of the Coulomb and explicit exchange components of the Fock matrix. 61,65,255,268,[274][275][276][277][278][279][280] A major component of this effort in NWChemEx has been the development of an efficient and highly scalable algorithm for the numerical integration of the XC potential on clusters of GPU accelerators. 281 The key component of this algorithm is the reliance on highly tuned, microarchitecture optimized implementations of GPU accelerated batched level-3 BLAS operations such as matrix multiply (GEMM) and symmetric rank-2k updates (SYR2K)…”
Section: Newton-Krylov Solver for Coupled-Cluster Equations (mentioning)
confidence: 99%
“…The HF-3c method uses three separate geometry-dependent formulas 38,39 to add energy corrections ("3c") for the various deficiencies of minimal basis set HF: one to account for some of the missing dispersion interactions, and two to mitigate the effects of basis set incompleteness errors. Several other techniques have been proposed in the literature [40][41][42][43][44][45][46][47][48][49][50][51][52][53][54], reflecting the interest in developing computationally inexpensive methods for large systems.…”
Section: Introduction (mentioning)
confidence: 99%