The Massively Parallel Quantum Chemistry (MPQC) program is a 30-year-old project that enables facile development of electronic structure methods for molecules for efficient deployment to massively parallel computing architectures. Here, we describe the historical evolution of MPQC’s design into its latest (fourth) version, the capabilities and modular architecture of today’s MPQC, how MPQC facilitates rapid composition of new methods, and its state-of-the-art performance on a variety of commodity and high-end distributed-memory computer platforms.
A task-based formulation of the Scalable Universal Matrix Multiplication Algorithm (SUMMA), a popular algorithm for matrix multiplication (MM), is applied to the multiplication of hierarchy-free, rank-structured matrices that appear in the domain of quantum chemistry (QC). The novel features of our formulation are: (1) concurrent scheduling of multiple SUMMA iterations, and (2) fine-grained task-based composition. These features make it tolerant of the load imbalance due to the irregular matrix structure and eliminate all artifactual sources of global synchronization. Scalability of iterative computation of the square-root inverse of block-rank-sparse QC matrices is demonstrated; for full-rank (dense) matrices the performance of our SUMMA formulation usually exceeds that of the state-of-the-art dense MM implementations (ScaLAPACK and Cyclops Tensor Framework). (arXiv:1509.00309v2 [cs.DC], 9 Oct 2015)
Footnote 1: Related matrix data structures have appeared under many names (matrices with decay, H-matrices, rank-structured matrices, and mosaic skeleton approximation), but no single globally accepted terminology exists. For the history of these types of matrices see Ref. [37].
... (a) communication costs can be partially or fully hidden by overlapping computation and communication, (b) performance should be less sensitive to topology, latency, and CPU clock variations, (c) fine-grained, task-based parallelism is a proven means to attain high intra-node performance by leveraging massively multicore platforms and hiding the costs of the memory hierarchy (e.g. Intel TBB, Cilk), (d) lack of global synchronization allows the overlap of multiple high-level stages of computation (e.g. two or more matrix multiplications contributing to the same expression). The new formulation was used to implement iterative computation of the square-root inverse of a matrix, a prototypical operation in which block ranks of intermediate matrices change dynamically during the iteration.
The usual advantages of the task formulation, tolerance of load imbalance and latency, are demonstrated in the regime where matrices approach full rank, by comparison against the state-of-the-art dense MM implementations.
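For readers unfamiliar with SUMMA, the loop structure it is built on can be sketched in a few lines. The following is a serial NumPy simulation of the classic (dense, synchronous) SUMMA iteration on a logical process grid; it is not the task-based, rank-structured formulation described above. The function name `summa_simulated` and the parameters `grid` and `panel` are hypothetical, chosen only for this illustration.

```python
import numpy as np

def summa_simulated(A, B, grid=(2, 2), panel=2):
    """Serially simulate dense SUMMA for C = A @ B on a logical P x Q grid.

    Each SUMMA iteration takes one panel of columns of A and the matching
    panel of rows of B. In a distributed run the A panel is broadcast along
    process rows and the B panel along process columns; here the "broadcast"
    is just a slice, and every process (p, q) updates its own block of C.
    """
    P, Q = grid
    m, k = A.shape
    k_b, n = B.shape
    if k != k_b:
        raise ValueError("inner dimensions of A and B must match")

    # Contiguous row/column ranges owned by each process row / column.
    row_edges = np.linspace(0, m, P + 1).astype(int)
    col_edges = np.linspace(0, n, Q + 1).astype(int)

    C = np.zeros((m, n))
    for k0 in range(0, k, panel):            # one SUMMA iteration per panel
        k1 = min(k0 + panel, k)
        for p in range(P):
            rows = slice(row_edges[p], row_edges[p + 1])
            a_panel = A[rows, k0:k1]         # piece of the A panel on process row p
            for q in range(Q):
                cols = slice(col_edges[q], col_edges[q + 1])
                b_panel = B[k0:k1, cols]     # piece of the B panel on process column q
                # Local rank-(k1 - k0) update of the block owned by process (p, q).
                C[rows, cols] += a_panel @ b_panel
    return C
```

In this simulation the iterations over `k0` run strictly one after another. The formulation in the abstract instead schedules several such iterations concurrently and expresses each local update as a fine-grained task, which is what removes the global synchronization between iterations.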
The Clustered Low-Rank (CLR) framework for block-sparse and block-low-rank tensor representation and computation is described. The CLR framework depends on two parameters that control precision: one controls the CLR block rank truncation and the other controls screening of small contributions in arithmetic operations on CLR tensors. As these parameters approach zero, CLR representation and arithmetic become exact. There are no other ad hoc heuristics, such as domains. Use of the CLR format for the order-2 and order-3 tensors that appear in the context of density fitting (DF) evaluation of the Hartree-Fock (exact) exchange significantly reduced the storage and computational complexities below their standard O(N^3) and O(N^4) figures. Even for relatively small systems and realistic basis sets, CLR-based DF HF becomes more efficient than the standard DF approach, and significantly more efficient than the conventional non-DF HF, while negligibly affecting molecular energies and properties.
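The block rank truncation that the first CLR parameter controls can be illustrated with a truncated singular value decomposition of a single block. This is a minimal sketch, assuming a Frobenius-norm truncation criterion and an SVD-based compressor; the abstract does not specify either, and the names `compress_block` and `eps` are hypothetical.

```python
import numpy as np

def compress_block(block, eps):
    """Compress one matrix block to low-rank form, block ~= U @ V.

    Keeps the smallest rank for which the discarded singular values have a
    Frobenius norm of at most eps (assumed criterion). Returns None when the
    whole block can be dropped, which gives block sparsity for free.
    """
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    # tail[r] is the Frobenius norm of the singular values s[r:], i.e. the
    # error made by keeping only the first r singular triplets.
    tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]
    rank = int(np.sum(tail > eps))
    if rank == 0:
        return None
    # Fold the singular values into the left factor.
    return U[:, :rank] * s[:rank], Vt[:rank, :]
```

With `eps` approaching zero every block keeps its full numerical rank and the representation becomes exact, matching the limiting behavior stated above. The second CLR parameter, which screens small contributions during arithmetic, is not modeled in this sketch.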
We present the coupled-cluster singles and doubles method formulated in terms of truncated pair natural orbitals (PNO) that are optimized to minimize the effect of truncation. Compared to the standard ground-state PNO coupled-cluster approaches, in which truncated PNOs derived from first-order Møller-Plesset (MP1) amplitudes are used to compress the CC wave operator, the iteratively optimized PNOs ("iPNOs") offer moderate improvement for small PNO ranks but rapidly increase their effectiveness for large PNO ranks. The error introduced by PNO truncation in the CCSD energy is reduced by orders of magnitude in the asymptotic regime, with an insignificant increase in PNO ranks. PNO optimization is particularly effective when combined with Neese's perturbative correction for the PNO incompleteness of the CCSD energy. The use of the perturbative correction in combination with the PNO optimization procedure seems to produce the most precise approximation to the canonical CCSD energies for small and large PNO ranks. For the standard benchmark set of noncovalent binding energies, remarkable improvements with respect to the standard PNO approach range from a factor of 3 with PNO truncation threshold τ = 10 (with the maximum PNO truncation error in the binding energy of only 0.1 kcal/mol) to more than 2 orders of magnitude with τ = 10.
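The truncation step that defines a PNO rank can be sketched as follows: diagonalize a pair density matrix and keep only the eigenvectors whose occupation numbers reach the threshold τ. This is an illustrative sketch only. It takes the pair density as a given symmetric matrix, does not reproduce how the abstract's method builds that density from amplitudes, and does not include the iterative PNO optimization. The names `truncated_pnos`, `pair_density`, and `tau` are hypothetical.

```python
import numpy as np

def truncated_pnos(pair_density, tau):
    """Return (occupations, pnos) for one electron pair.

    pair_density: symmetric matrix in the virtual-orbital basis (assumed input).
    tau: occupation threshold; eigenvectors with occupation below tau are dropped.
    The columns of `pnos` are the retained PNOs, sorted by decreasing occupation.
    """
    occ, vecs = np.linalg.eigh(pair_density)   # eigenvalues in ascending order
    keep = occ >= tau
    occ_kept, vecs_kept = occ[keep], vecs[:, keep]
    order = np.argsort(occ_kept)[::-1]         # largest occupation first
    return occ_kept[order], vecs_kept[:, order]
```

The number of retained columns is the PNO rank of the pair. A smaller `tau` keeps more PNOs and reduces the truncation error at higher cost, which is the trade-off the reported thresholds probe.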
The evaluation of the exact [Hartree–Fock (HF)] exchange operator is a crucial ingredient for the accurate description of the electronic structure in periodic systems through ab initio and hybrid density functional approaches. An efficient formulation of periodic HF exchange in a linear combination of atomic orbitals representation presented here is based on the concentric atomic density fitting approximation, a domain-free local density fitting approach in which the product of two atomic orbitals is approximated using a linear combination of fitting basis functions centered at the same nuclei as the AOs in that product. A significant reduction in the computational cost of exact exchange is demonstrated relative to the conventional approach due to avoiding the need to evaluate four-center two-electron integrals, with sub-millihartree/atom errors in absolute HF energies and good cancellation of fitting errors in relative energies. The novel aspects of the evaluation of the Coulomb contribution to the Fock operator, such as the use of real two-center multipole expansions and spheropole-compensated unit cell densities, are also described.