2023
DOI: 10.1021/acs.jctc.2c00995

Memory-Efficient Recursive Evaluation of 3-Center Gaussian Integrals

Abstract: To improve the efficiency of Gaussian integral evaluation on modern accelerated architectures, FLOP-efficient Obara-Saika-based recursive evaluation schemes are optimized for the memory footprint. For the 3-center 2-particle integrals that are key for the evaluation of Coulomb and other 2-particle interactions in the density-fitting approximation, the use of multiquantal recurrences (in which multiple quanta are created or transferred at once) is shown to produce significant memory savings. Other innovations i…

citations
Cited by 6 publications
(6 citation statements)
references
References 73 publications
“…To optimize the bandwidth, it is necessary to maximize the occupancy, which means minimizing the fast memory footprint. Our approach is to evaluate the 1-index integrals using eq for monotonically decreasing auxiliary indices $m$, reusing the memory occupied by $[\tilde{\mathbf{r}}]^{(m+1)}$ to store $[\tilde{\mathbf{r}}]^{(m)}$; this is akin to the in-place evaluation techniques we explored in ref . The prefactors in eq and metadata (maps from index triplets $\tilde{\mathbf{r}}$ to their ordinals) are independent of $m$.…”
Section: Methods (mentioning)
confidence: 99%
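
The in-place idea in the statement above can be illustrated with a minimal sketch. The recurrence for the 1-index integrals is not reproduced in the quoted text, so this example substitutes the well-known downward recursion for the Boys function, $F_m(x) = (2x\,F_{m+1}(x) + e^{-x})/(2m+1)$, purely as a stand-in. The function names `boys_reference` and `boys_in_place_downward` are hypothetical, and the code is not the cited authors' implementation. It shows the two features the statement mentions: a single buffer that is overwritten as $m$ decreases, and prefactors that are computed once because they do not depend on $m$.

```python
import math


def boys_reference(m, x, terms=200):
    """Boys function F_m(x) from its convergent series (for moderate x).

    F_m(x) = exp(-x) * sum_i (2x)^i / ((2m+1)(2m+3)...(2m+2i+1))
    """
    total = 0.0
    term = 1.0 / (2 * m + 1)
    for i in range(terms):
        total += term
        term *= 2.0 * x / (2 * m + 2 * i + 3)
    return math.exp(-x) * total


def boys_in_place_downward(m_max, xs):
    """Return F_0(x) for each x, using one buffer for every level m.

    Stand-in for the in-place scheme: the storage holding level m+1
    is overwritten with level m as m decreases monotonically.
    """
    # Seed the single buffer with the highest level, m = m_max.
    buf = [boys_reference(m_max, x) for x in xs]
    # Prefactors that do not depend on m are computed once.
    exp_neg = [math.exp(-x) for x in xs]
    for m in range(m_max - 1, -1, -1):
        for k, x in enumerate(xs):
            # buf[k] holds level m+1 on entry and level m on exit.
            buf[k] = (2.0 * x * buf[k] + exp_neg[k]) / (2 * m + 1)
    return buf
```

Storing every level separately would need `m_max + 1` buffers of the same length; here the footprint stays at one buffer plus the $m$-independent prefactors, at the cost of discarding the intermediate levels.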
“…Each thread computes a 2-day round-robin range of 2-index integrals, one at a time, to approximately balance the load between threads. By minimizing the memory footprint of $[\tilde{\mathbf{r}}]^{(m)}$ integrals using the in-place evaluation technique of ref , it is possible to evaluate 1-index integrals even for the [ii|ii] integrals using only 23 kB of shared memory. This allows us to assign 4 thread blocks to each SM even on the V100 GPU with a relatively modest amount of shared memory per SM and make the performance of the 2-index integral evaluation less dependent on the hardware details to ensure efficient execution on current and future generations of accelerators.…”
Section: Methods (mentioning)
confidence: 99%
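
The occupancy figure quoted above can be checked with simple arithmetic. The sketch below assumes that a V100 SM can be configured with up to 96 kB of shared memory (the figure from NVIDIA's published Volta specifications, not from the quoted text), and that shared memory is the only limiting resource; register and thread-count limits are ignored. The helper name `blocks_per_sm` is hypothetical.

```python
def blocks_per_sm(shared_per_sm_kb, shared_per_block_kb):
    """Number of thread blocks that fit on one SM, limited by shared memory only."""
    return int(shared_per_sm_kb // shared_per_block_kb)


# Assumption: up to 96 kB of configurable shared memory per SM on the V100.
V100_SHARED_PER_SM_KB = 96
# From the quoted statement: 23 kB of shared memory per thread block.
SHARED_PER_BLOCK_KB = 23

resident_blocks = blocks_per_sm(V100_SHARED_PER_SM_KB, SHARED_PER_BLOCK_KB)
used_kb = resident_blocks * SHARED_PER_BLOCK_KB
```

Under these assumptions `resident_blocks` is 4 and `used_kb` is 92, which fits within the 96 kB budget and matches the 4 thread blocks per SM reported in the statement.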