2021
DOI: 10.1021/acs.jctc.1c00145

Harnessing the Power of Multi-GPU Acceleration into the Quantum Interaction Computational Kernel Program

Abstract: We report a new multi-GPU capable ab initio Hartree-Fock/density functional theory implementation integrated into the open source QUantum Interaction Computational Kernel (QUICK) program. Details on the load balancing algorithms for electron repulsion integrals and exchange correlation quadrature across multiple GPUs are described. Benchmarking studies carried out on up to 4 GPU nodes, each containing 4 NVIDIA V100-SXM2 type GPUs, demonstrate that our implementation is capable of achieving excellent load balanc…

Cited by 21 publications (20 citation statements)
References 67 publications
“…In the last decade, driven by the enticing potential of accelerators, various approaches have been developed for accelerating quantum chemistry and more specifically the Hartree–Fock (HF) method using GPUs. The majority of these implementations focus on the evaluation of two-electron repulsion integrals (ERIs) on GPUs, while few algorithms also include the digestion of ERIs on GPUs, i.e., their contraction with the density matrix to form the Fock matrix. Even fewer support multi-GPU execution, and apparently, none accelerates all of the main stages of the self-consistent field (SCF) procedure on single or multiple GPUs.…”
Section: Introduction (mentioning)
confidence: 99%
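The "digestion" step named in the statement above, contracting ERIs with the density matrix to form the Fock matrix, can be illustrated with a minimal CPU sketch. It assumes the textbook closed-shell expression F_mn = H_mn + sum_ls P_ls [ (mn|ls) - 0.5 (ml|ns) ], with the density matrix P already carrying the factor of 2 for doubly occupied orbitals. The function name `build_fock` and the nested-list layout are hypothetical and are not taken from QUICK or from the citing paper.

```python
def build_fock(h_core, eri, density):
    """Closed-shell Fock build by direct contraction (illustrative only).

    h_core  : n x n core Hamiltonian, list of lists
    eri     : n x n x n x n integrals, eri[m][v][l][s] = (mv|ls) in chemists' notation
    density : n x n density matrix P (assumed to include the factor of 2)
    """
    n = len(h_core)
    fock = [row[:] for row in h_core]  # start from a copy of the core Hamiltonian
    for m in range(n):
        for v in range(n):
            for l in range(n):
                for s in range(n):
                    coulomb = eri[m][v][l][s]    # (mv|ls) contributes to J
                    exchange = eri[m][l][v][s]   # (ml|vs) contributes to K
                    fock[m][v] += density[l][s] * (coulomb - 0.5 * exchange)
    return fock


# Toy input with a single basis function, chosen only to exercise the formula.
toy_fock = build_fock([[1.0]], [[[[2.0]]]], [[1.0]])
```

The quadruple loop scales as O(N^4) in the number of basis functions, which is why production codes screen small integrals and exploit permutational symmetry rather than looping as this sketch does.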
“…The most direct comparison can be made to results reported by Merz and co-workers on the same processors, NVIDIA Tesla K80 GPUs. They report the parallel efficiency of ERI computation in their QUICK program to be 71% on 16 K80 GPUs for a water system with 2250 basis functions. While the authors use a different integral algorithm (OSHGP), molecular system, and basis set (def2-SVP) than the systems we tested, we are interested in comparing workloads and can use a computation taking comparable time in this work: 9 DNA base pairs and 3755 basis functions.…”
Section: Performance and Discussion (mentioning)
confidence: 99%
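To put the 71% figure quoted above in context, the following sketch assumes the common definition of parallel efficiency as speedup divided by device count, measured against a single-GPU baseline. The statement itself does not say which baseline was used, so that is an assumption, and both function names are hypothetical.

```python
def parallel_efficiency(t_single, t_parallel, n_devices):
    """Efficiency = (t_single / t_parallel) / n_devices, assuming a one-device baseline."""
    if n_devices < 1 or t_single <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive and n_devices >= 1")
    speedup = t_single / t_parallel
    return speedup / n_devices


def implied_speedup(efficiency, n_devices):
    """Invert the definition: the speedup that a given efficiency corresponds to."""
    return efficiency * n_devices


# Under the stated assumption, 71% efficiency on 16 GPUs is roughly an 11.4x speedup.
k80_speedup = implied_speedup(0.71, 16)
```

Read this way, the reported efficiency means the 16-GPU run behaves like a little over 11 ideally scaling GPUs.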
“…Schemes to scale GPU implementations to multiple nodes have only emerged in the past few years [38,45,72,73]. Merz and co-workers have extended the capability of their QUICK program, which computes J and the exchange correlation potential, to 4 GPU nodes using an MPI root-worker model [72]. Kussmann and Ochsenfeld have extended their PreLinK scheme for efficiently computing exact exchange to 10 GPU nodes using MPI.…”
Section: Introduction (mentioning)
confidence: 99%
“…While efficient parallel schemes for Fock matrix construction have been devised, even when using high-performance eigensolvers, the diagonalization of large Fock matrices does not achieve a high parallel efficiency. Therefore, as computer systems dedicated to scientific calculations move toward massively parallel architectures with hundreds to millions of processor cores, the Fock matrix diagonalization becomes increasingly inefficient in taking advantage of the FLOP capabilities of the hardware and a major impediment to achieving larger molecular sizes in SCF calculations.…”
Section: Introduction (mentioning)
confidence: 99%