2019 IEEE/ACM Workshop on Exascale MPI (ExaMPI)
DOI: 10.1109/exampi49596.2019.00009
Accelerating the Global Arrays ComEx Runtime Using Multiple Progress Ranks

Cited by 3 publications (3 citation statements). References 22 publications.
“…At present, the coupled-cluster code within TCE can utilize both the CPU and GPU hardware at a massive scale. 32,359 The emergence of many-core processors in the last ten years provided the opportunity for starting a collaborative effort with Intel Corporation to optimize NWChem on this new class of computer architecture. As part of this collaboration, the TCE implementation of the CCSD(T) code was ported to the Intel Xeon Phi line of many-core processors 35 using a parallelization strategy based on a hybrid GA-OpenMP approach.…”
Section: Parallel Performance (mentioning; confidence: 99%)
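The hybrid GA-OpenMP approach mentioned in the excerpt above pairs Global Arrays for inter-node data distribution with OpenMP threading inside each process. The following is a minimal sketch of that general pattern, assuming the standard Global Arrays C API (NGA_Create, NGA_Distribution, NGA_Get/NGA_Put); the array size, blocking, and the trivial per-element kernel are illustrative and are not taken from the TCE CCSD(T) code.

```c
/* Sketch of a hybrid GA-OpenMP pattern: Global Arrays distributes a 2-D
 * array across MPI processes, and OpenMP threads work on the locally
 * owned patch. Sizes and the per-element kernel are illustrative only. */
#include <stdlib.h>
#include <mpi.h>
#include <omp.h>
#include "ga.h"
#include "macdecls.h"

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    GA_Initialize();

    int dims[2]  = {1024, 1024};
    int chunk[2] = {-1, -1};               /* let GA choose the blocking */
    char name[]  = "work";
    int g_a = NGA_Create(C_DBL, 2, dims, name, chunk);
    GA_Zero(g_a);

    /* Find the patch owned by this process. */
    int lo[2], hi[2], ld[1];
    NGA_Distribution(g_a, GA_Nodeid(), lo, hi);

    if (hi[0] >= lo[0] && hi[1] >= lo[1]) { /* this rank owns a patch */
        int nrow = hi[0] - lo[0] + 1;
        int ncol = hi[1] - lo[1] + 1;
        double *buf = malloc((size_t)nrow * ncol * sizeof(double));
        ld[0] = ncol;
        NGA_Get(g_a, lo, hi, buf, ld);      /* one-sided get of local patch */

        /* Thread the node-local work with OpenMP. */
        #pragma omp parallel for collapse(2)
        for (int i = 0; i < nrow; i++)
            for (int j = 0; j < ncol; j++)
                buf[i * ncol + j] += 1.0;   /* placeholder compute kernel */

        NGA_Put(g_a, lo, hi, buf, ld);      /* one-sided put of the result */
        free(buf);
    }

    GA_Sync();
    GA_Destroy(g_a);
    GA_Terminate();
    MPI_Finalize();
    return 0;
}
```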
“…The power of using multiple GPUs has been harnessed in a range of traditional computational chemistry tools, including several ab initio electronic structure software packages. However, among these are only a few Gaussian-function-based quantum chemistry codes for mean-field Hartree–Fock (HF) and density functional theory (DFT) calculations.…”
Section: Introduction (mentioning; confidence: 99%)
“…A crucial component for performing these very large-scale calculations was the efficient implementation of the Global Arrays (GA) operations over the Cray Aries network that connects the processing elements comprising the NERSC Cori parallel computer. This level of parallel performance was achieved by using the progress-rank runtime, which translates GA one-sided operations into MPI operations. In order to fully exploit thousands of KNL nodes at once, we had to explore ways to avoid network congestion issues.…”
(mentioning; confidence: 99%)
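The excerpt above describes the mechanism this paper addresses: a progress-rank runtime that turns GA one-sided operations into MPI messages serviced by dedicated ranks. The sketch below illustrates the general idea only, using plain MPI point-to-point messaging; it is not the ComEx implementation, and names such as RANKS_PER_PR, TAG_PUT, and TAG_STOP are hypothetical.

```c
/* Conceptual sketch of the progress-rank idea: a subset of MPI ranks is
 * set aside to service one-sided-style requests from compute ranks, so
 * remote progress does not depend on the target's application code.
 * NOT the ComEx implementation; the request format is simplified. */
#include <mpi.h>

#define RANKS_PER_PR 4   /* assumed ratio: 1 progress rank per 4 world ranks */
#define TAG_PUT  1
#define TAG_STOP 2

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Every RANKS_PER_PR-th rank acts as a progress rank. */
    int is_progress = (world_rank % RANKS_PER_PR == 0);

    /* Compute ranks get their own communicator for the application. */
    MPI_Comm compute_comm;
    MPI_Comm_split(MPI_COMM_WORLD, is_progress ? MPI_UNDEFINED : 0,
                   world_rank, &compute_comm);

    if (is_progress) {
        /* Service loop: receive "put" payloads until every client stops. */
        int clients = RANKS_PER_PR - 1;
        if (world_rank + clients >= world_size)
            clients = world_size - world_rank - 1;
        double payload;
        MPI_Status st;
        while (clients > 0) {
            MPI_Recv(&payload, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) {
                clients--;
            } else {
                /* A real runtime would copy the payload into the target's
                 * globally addressable memory here. */
            }
        }
    } else {
        /* Compute rank: forward a "one-sided put" to its progress rank. */
        int my_pr = (world_rank / RANKS_PER_PR) * RANKS_PER_PR;
        double value = (double)world_rank;
        MPI_Send(&value, 1, MPI_DOUBLE, my_pr, TAG_PUT,  MPI_COMM_WORLD);
        MPI_Send(&value, 1, MPI_DOUBLE, my_pr, TAG_STOP, MPI_COMM_WORLD);
        MPI_Comm_free(&compute_comm);
    }

    MPI_Finalize();
    return 0;
}
```

Separating the ranks this way means a remote put can complete while the target's compute ranks stay busy in application code, which is the asynchronous-progress property the excerpt relies on for scaling to thousands of KNL nodes.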