Abstract: To improve the efficiency of Gaussian integral evaluation on modern accelerated architectures, FLOP-efficient Obara-Saika-based recursive evaluation schemes are optimized for the memory footprint. For the 3-center 2-particle integrals that are key for the evaluation of Coulomb and other 2-particle interactions in the density-fitting approximation, the use of multiquantal recurrences (in which multiple quanta are created or transferred at once) is shown to produce significant memory savings. Other innovations …
“…To optimize the bandwidth, it is necessary to maximize the occupancy, which means minimizing the fast memory footprint. Our approach is to evaluate the 1-index integrals using eq for monotonically decreasing auxiliary indices m, reusing the memory occupied by [r̃]^(m+1) to store [r̃]^(m); this is akin to the in-place evaluation techniques we explored in ref. The prefactors in eq and the metadata (maps from index triplets r̃ to their ordinals) are independent of m.…”
Section: Methods
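The decreasing-m, in-place reuse described in the excerpt above can be illustrated with the well-known downward recurrence for the Boys function, F_m(x) = (2x F_{m+1}(x) + e^{-x})/(2m+1), which underlies Gaussian integral recursion schemes of this kind. This is a generic sketch, not the authors' multiquantal implementation: a single scalar is reused for every auxiliary index m, mimicking the reuse of the [r̃]^(m+1) storage for [r̃]^(m).

```python
import math

def boys_downward(x, m_max, extra=20):
    """Sketch of in-place downward-m evaluation (illustrative only).
    Seeds the recursion with F = 0 at a safely high index m_max + extra,
    then recurs downward; errors in the seed decay rapidly with each step.
    A single scalar f is overwritten at every m, so only one value is
    ever resident -- the analogue of reusing the (m+1)-buffer for m.
    """
    f = 0.0
    ex = math.exp(-x)
    # Burn-in: recur from the crude seed down to m_max.
    for m in range(m_max + extra, m_max - 1, -1):
        f = (2.0 * x * f + ex) / (2 * m + 1)
    vals = [0.0] * (m_max + 1)
    vals[m_max] = f
    # Record F_m for m = m_max-1 ... 0, still overwriting f in place.
    for m in range(m_max - 1, -1, -1):
        f = (2.0 * x * f + ex) / (2 * m + 1)
        vals[m] = f
    return vals
```

Because the recursion is evaluated only downward, the per-m prefactors and index metadata can be precomputed once, consistent with the excerpt's remark that they are independent of m.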
“…Each thread computes a 2-way round-robin range of 2-index integrals, one at a time, to approximately balance the load between threads. By minimizing the memory footprint of the [r̃]^(m) integrals using the in-place evaluation technique of ref, it is possible to evaluate 1-index integrals even for the [ii|ii] integrals using only 23 kB of shared memory. This allows us to assign 4 thread blocks to each SM, even on the V100 GPU with its relatively modest amount of shared memory per SM, and makes the performance of the 2-index integral evaluation less dependent on hardware details, ensuring efficient execution on current and future generations of accelerators.…”
Section: Methods
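The round-robin work assignment in the excerpt above can be sketched as follows. This is a hypothetical illustration (the function and parameter names are not from the paper): handing out small contiguous chunks of integral tasks cyclically means each thread receives a mix of early (often expensive) and late (often cheap) tasks, roughly balancing the load without any dynamic scheduling.

```python
def round_robin_ranges(n_tasks, n_threads, chunk=2):
    """Illustrative round-robin partitioner (not the authors' code).
    Tasks are dealt out in fixed-size chunks, cycling over threads,
    so each thread's share is spread across the whole task range.
    """
    assignment = [[] for _ in range(n_threads)]
    for start in range(0, n_tasks, chunk):
        tid = (start // chunk) % n_threads
        assignment[tid].extend(range(start, min(start + chunk, n_tasks)))
    return assignment
```

With chunk = 2 this mirrors the "2-way round-robin" idea: on a GPU the same index arithmetic would run inside the kernel, with each thread deriving its own task list from its thread ID rather than storing explicit lists.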
“…To make the performance analysis easier and more meaningful, we focus here on microbenchmarking the integral kernels, i.e., we analyze their performance for specific integral classes rather than computing, e.g., the entire Fock operator matrix and/or the entire set of integrals for a given problem. While microbenchmarking is less common, it provides a more detailed model of performance by removing extraneous factors (e.g., screening) that can greatly influence the performance of integration benchmarks. We strongly encourage others to follow suit.…”
Section: Performance
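A minimal microbenchmark harness in the spirit of the excerpt above might look like the following. This is a generic sketch (names and defaults are illustrative, and GPU kernels would additionally need device synchronization before each timestamp): one kernel is timed on one fixed integral class, with warm-up runs excluded, rather than timing an entire Fock build.

```python
import time

def microbenchmark(kernel, args, n_warmup=3, n_repeat=10):
    """Illustrative microbenchmark loop (not the authors' harness).
    Runs the kernel a few times untimed to warm caches/JIT, then
    reports the mean wall time per call over n_repeat timed runs.
    """
    for _ in range(n_warmup):
        kernel(*args)
    t0 = time.perf_counter()
    for _ in range(n_repeat):
        kernel(*args)
    return (time.perf_counter() - t0) / n_repeat
```

Dividing the known FLOP count of the integral class by the measured time then gives the fraction-of-peak figures quoted in the excerpts, free of screening and driver-level effects.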
“…An even more serious issue is the high memory footprint of such kernels, which exceeds the size of the lowest levels of the memory hierarchy (registers and scratchpad memory) even for relatively low angular momenta, thereby reducing the performance. Although detailed performance can be difficult to extract from the numerous publications dedicated to Gaussian AO integral evaluation on GPUs, the performance of the Head-Gordon–Pople refinement of the Obara-Saika recurrence scheme implemented by Barca et al. is in our experience representative: whereas the performance for 4-center integrals of low total angular momenta (up to (pp|pp), with s, p, d, f, g, h, i, k... denoting Gaussian AOs with angular momenta l = 0, 1, 2, 3, 4, 5, 6, 7..., respectively) was found to reach a substantial (20–50%) fraction of the peak FP64 FLOP rate, the performance for higher angular momenta dropped rapidly, to 2% of the peak rate for the (dd|dd) integrals. Another, albeit less direct, datapoint comes from a study by Johnson et al., who observed a significant loss of efficiency of the GPU code for Coulomb matrix evaluation (using the McMurchie-Davidson recurrence-based formalism) vs. the CPU counterpart as the basis set is enlarged to include higher angular momenta.…”
Section: Introduction
“…Recently, we reconsidered the design of Gaussian AO integral algorithms in order to optimize their memory footprint. For the specific case of 3-center Gaussian AO integrals, we argued that even for high angular momenta the Obara-Saika recurrence-based schemes would outperform the Rys quadrature, commonly thought to lead to optimally compact memory footprints; however, even with several algorithmic and programming innovations, the performance was reasonable for integrals up to (ff|f) but dropped for higher angular momenta to only a few percent of the hardware peak.…”
With the growing reliance of modern supercomputers on accelerator-based architectures such as graphics processing units (GPUs), the development and optimization of electronic structure methods to exploit these massively parallel resources has become a recent priority. While significant strides have been made in the development of GPU-accelerated, distributed-memory algorithms for many modern electronic structure methods, the primary focus of GPU development for Gaussian-basis atomic orbital methods has been on shared-memory systems, with only a handful of examples pursuing massive parallelism. In the present work, we present a set of distributed-memory algorithms for the evaluation of the Coulomb and exact-exchange matrices for hybrid Kohn–Sham DFT with Gaussian basis sets via direct density-fitted (DF-J-Engine) and seminumerical (sn-K) methods, respectively. The absolute performance and strong scalability of the developed methods are demonstrated on systems ranging from a few hundred to over one thousand atoms using up to 128 NVIDIA A100 GPUs on the Perlmutter supercomputer.
The traditional foundation of science lies on the cornerstones of theory and experiment. Theory is used to explain experiment, which in turn guides the development of theory. Since the advent of computers and the development of computational algorithms, computation has risen as the third cornerstone of science, joining theory and experiment on an equal footing. Computation has become an essential part of modern science, complementing experiment by enabling accurate comparison of complicated theories to sophisticated experiments, as well as guiding, by triage, both the design and targets of experiments and the development of novel theories and computational methods. Like experiment, computation relies on continued investment in infrastructure: it requires both hardware (the physical computer on which the calculation is run) and software (the source code of the programs that perform the desired simulations). In this Perspective, I discuss present-day challenges on the software side in computational chemistry, which arise from the fast-paced development of algorithms, programming models, and hardware. I argue that many of these challenges could be solved with reusable open source libraries, which are a public good, enhance the reproducibility of science, and accelerate the development and availability of state-of-the-art methods and improved software.