Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms 2015
DOI: 10.1145/2833179.2833186

Scalable task-based algorithm for multiplication of block-rank-sparse matrices

Abstract: A task-based formulation of Scalable Universal Matrix Multiplication Algorithm (SUMMA), a popular algorithm for matrix multiplication (MM), is applied to the multiplication of hierarchy-free, rank-structured matrices that appear in the domain of quantum chemistry (QC). The novel features of our formulation are: (1) concurrent scheduling of multiple SUMMA iterations, and (2) fine-grained task-based composition. These features make it tolerant of the load imbalance due to the irregular matrix structure and elimi…
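A minimal Python sketch of the two ideas named in the abstract, not the paper's implementation: a tile-sparse matrix is stored as a dictionary of non-zero blocks, and every per-tile multiply-accumulate is submitted as an independent fine-grained task, so work from different contracted tile indices can overlap rather than proceed in lock step. The tile size, sparsity pattern, and thread-pool executor are illustrative assumptions.

```python
# Illustrative sketch of fine-grained, task-based block-sparse matrix multiplication.
# Not the paper's (distributed-memory) code; a single-node analogy of the idea.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import numpy as np

TILE = 64  # tile (block) edge length -- an assumption for this sketch

def random_block_sparse(nrow_tiles, ncol_tiles, fill=0.3, rng=np.random.default_rng(0)):
    """Return {(i, j): dense tile} keeping only a random subset of tiles."""
    return {(i, j): rng.standard_normal((TILE, TILE))
            for i in range(nrow_tiles) for j in range(ncol_tiles)
            if rng.random() < fill}

def block_sparse_gemm(A, B, workers=4):
    """C[i, j] += A[i, k] @ B[k, j], with one task per contributing (i, k, j) triple."""
    C = defaultdict(lambda: np.zeros((TILE, TILE)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for (i, k), a in A.items():
            for (k2, j), b in B.items():
                if k == k2:                        # only tiles that actually meet
                    futures.append(((i, j), pool.submit(np.matmul, a, b)))
        for (i, j), fut in futures:                # reduce task results into C
            C[(i, j)] += fut.result()
    return dict(C)

A = random_block_sparse(4, 6)
B = random_block_sparse(6, 5)
C = block_sparse_gemm(A, B)
print(f"{len(C)} non-zero result tiles")
```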

Cited by 32 publications (38 citation statements) · References 35 publications
“…The idea of DF reconstruction in batches of course is not novel; what is novel is how batched DF reconstruction is implemented in the context of a distributed-memory contraction W^{ab}_{cd} τ^{cd}_{ij}. This contraction is implemented as a matrix multiplication using an asynchronous (task-based) 2-dimensional block-sparse scalable universal matrix multiplication algorithm (SUMMA) described elsewhere. The standard formulation of SUMMA, in which the result matrix is stationary, evaluates contributions to the result from each index cd (since in TiledArray index spaces are tiled, the outer loop of SUMMA is over tiles of cd).…”
Section: Methods (mentioning, confidence: 99%)
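A minimal serial sketch, not the TiledArray/SUMMA code, of the stationary-result structure this excerpt describes for T^{ab}_{ij} = Σ_{cd} W^{ab}_{cd} τ^{cd}_{ij}: the outer loop runs over tiles of the contracted index cd, and each iteration reconstructs only that cd panel of W from density-fitting (DF) factors before accumulating into the stationary result. The dimensions and DF factors below are made-up placeholders.

```python
# Stationary-result accumulation over tiles of the contracted index cd,
# with batched DF reconstruction of each W panel inside the loop.
import numpy as np

n_ab, n_ij, n_cd, n_aux = 50, 40, 60, 30   # flattened composite indices (assumed)
tile_cd = 15                                # tile size of the contracted index cd

rng = np.random.default_rng(1)
L_ab = rng.standard_normal((n_ab, n_aux))   # DF factors: W[ab, cd] ~ L_ab @ L_cd.T
L_cd = rng.standard_normal((n_cd, n_aux))
tau = rng.standard_normal((n_cd, n_ij))

T = np.zeros((n_ab, n_ij))                  # stationary result
for lo in range(0, n_cd, tile_cd):          # outer loop over tiles of cd
    hi = min(lo + tile_cd, n_cd)
    W_panel = L_ab @ L_cd[lo:hi].T          # batched DF reconstruction of one cd panel
    T += W_panel @ tau[lo:hi]               # this tile's contribution to T

# Check against the unbatched contraction.
assert np.allclose(T, (L_ab @ L_cd.T) @ tau)
print("stationary-result accumulation matches:", T.shape)
```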
“…In tensor contractions, data locality is exploited such that MPI_Raccumulate is intra-node while MPI_Rget can be inter-node; we made this decision because MPI_Raccumulate, unlike MPI_Rget and MPI_Rput, is typically not implemented at the hardware level. The index permutation of tensors is currently performed at the destination; further optimization using a scalable universal matrix multiplication algorithm (SUMMA) [29,30] to avoid the repeated permutation operations will be performed in the future.…”
Section: F. Code Generator and Parallelization (mentioning, confidence: 99%)
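A minimal mpi4py sketch, an assumption-laden illustration rather than the cited code, of the locality rule in this excerpt: issue MPI_Raccumulate only toward ranks on the same node, and MPI_Rget toward ranks on other nodes, since the accumulate path is often not hardware-offloaded. Run with an MPI launcher, e.g. `mpiexec -n 4 python sketch.py`.

```python
# Locality-aware choice between one-sided accumulate (intra-node) and get (inter-node).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Every rank exposes one local tile of the distributed tensor through an RMA window.
tile = np.full(8, float(rank))
win = MPI.Win.Create(tile, comm=comm)

# Host names tell us which world ranks share a node with us.
hostnames = comm.allgather(MPI.Get_processor_name())

target = (rank + 1) % size          # rank owning the tile we contribute to / read from
contribution = np.ones(8)           # data to accumulate into the target tile
fetched = np.empty(8)               # buffer for a remote read

win.Lock_all()
if hostnames[target] == hostnames[rank]:
    # Intra-node: accumulate our contribution directly into the remote tile.
    req = win.Raccumulate(contribution, target, op=MPI.SUM)
else:
    # Inter-node: only fetch the remote tile; accumulation is done locally afterwards.
    req = win.Rget(fetched, target)
req.Wait()
win.Unlock_all()
win.Free()
```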
“…[114] Several computer libraries capable of efficiently evaluating the final sequence of binary contractions have recently become available [70-73]. We here use a prototyping library developed by one of us, which will be described elsewhere.…”
Section: Wick's Theorem and Tensor Contractions (mentioning, confidence: 99%)
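A small illustration, not the prototyping library mentioned in the excerpt, of evaluating a multi-tensor contraction as a sequence of binary (pairwise) contractions: R_{il} = Σ_{jk} A_{ij} B_{jk} C_{kl} computed as two pairwise products.

```python
# A three-tensor contraction evaluated as a sequence of two binary contractions.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 7))
B = rng.standard_normal((7, 8))
C = rng.standard_normal((8, 9))

AB = np.einsum("ij,jk->ik", A, B)        # first binary contraction
R = np.einsum("ik,kl->il", AB, C)        # second binary contraction

# Same result in a single (non-binary) contraction, for comparison.
assert np.allclose(R, np.einsum("ij,jk,kl->il", A, B, C))
```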