2019 IEEE/ACM Workshop on Exascale MPI (ExaMPI)
DOI: 10.1109/exampi49596.2019.00009
Accelerating the Global Arrays ComEx Runtime Using Multiple Progress Ranks

Cited by 3 publications (3 citation statements). References 22 publications.
“…At present, the coupled-cluster code within TCE can utilize both the CPU and GPU hardware at a massive scale. 32,359 The emergence of many-core processors in the last ten years provided the opportunity for starting a collaborative effort with Intel Corporation to optimize NWChem on this new class of computer architecture. As part of this collaboration, the TCE implementation of the CCSD(T) code was ported to the Intel Xeon Phi line of many-core processors 35 using a parallelization strategy based on a hybrid GA-OpenMP approach.…”
Section: Parallel Performance (mentioning; confidence: 99%)
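The hybrid GA-OpenMP approach mentioned in the excerpt above pairs Global Arrays for inter-node data distribution with OpenMP threading inside each process. The following is a minimal sketch of that general pattern, assuming the standard Global Arrays C API (NGA_Create, NGA_Distribution, NGA_Get/NGA_Put); the array size, blocking, and the trivial per-element kernel are illustrative and are not taken from the TCE CCSD(T) code.

```c
/* Sketch of a hybrid GA-OpenMP pattern: Global Arrays distributes a 2-D
 * array across MPI processes, and OpenMP threads work on the locally
 * owned patch. Sizes and the per-element kernel are illustrative only. */
#include <stdlib.h>
#include <mpi.h>
#include <omp.h>
#include "ga.h"
#include "macdecls.h"

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    GA_Initialize();

    int dims[2]  = {1024, 1024};
    int chunk[2] = {-1, -1};               /* let GA choose the blocking */
    char name[]  = "work";
    int g_a = NGA_Create(C_DBL, 2, dims, name, chunk);
    GA_Zero(g_a);

    /* Find the patch owned by this process. */
    int lo[2], hi[2], ld[1];
    NGA_Distribution(g_a, GA_Nodeid(), lo, hi);

    if (hi[0] >= lo[0] && hi[1] >= lo[1]) { /* this rank owns a patch */
        int nrow = hi[0] - lo[0] + 1;
        int ncol = hi[1] - lo[1] + 1;
        double *buf = malloc((size_t)nrow * ncol * sizeof(double));
        ld[0] = ncol;
        NGA_Get(g_a, lo, hi, buf, ld);      /* one-sided get of local patch */

        /* Thread the node-local work with OpenMP. */
        #pragma omp parallel for collapse(2)
        for (int i = 0; i < nrow; i++)
            for (int j = 0; j < ncol; j++)
                buf[i * ncol + j] += 1.0;   /* placeholder compute kernel */

        NGA_Put(g_a, lo, hi, buf, ld);      /* one-sided put of the result */
        free(buf);
    }

    GA_Sync();
    GA_Destroy(g_a);
    GA_Terminate();
    MPI_Finalize();
    return 0;
}
```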
“…The power of using multiple GPUs has been harnessed in a range of traditional computational chemistry tools, including several ab initio electronic structure software packages. However, among these are only a few Gaussian-function-based quantum chemistry codes for mean-field Hartree–Fock (HF) and density functional theory (DFT) calculations.…”
Section: Introduction (mentioning; confidence: 99%)
“…A crucial component for performing these very large-scale calculations was the efficient implementation of the Global Arrays (GA) operations over the Cray Aries network that connects the processing elements comprising the NERSC Cori parallel computer. This level of parallel performance was achieved by using the progress-rank runtime, which translates GA one-sided operations into MPI operations. In order to fully exploit thousands of KNL nodes at once, we had to explore ways to avoid network congestion issues.…”
(mentioning; confidence: 99%)
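The excerpt above describes the mechanism this paper addresses: a progress-rank runtime that turns GA one-sided operations into MPI messages serviced by dedicated ranks. The sketch below illustrates the general idea only, using plain MPI point-to-point messaging; it is not the ComEx implementation, and names such as RANKS_PER_PR, TAG_PUT, and TAG_STOP are hypothetical.

```c
/* Conceptual sketch of the progress-rank idea: a subset of MPI ranks is
 * set aside to service one-sided-style requests from compute ranks, so
 * remote progress does not depend on the target's application code.
 * NOT the ComEx implementation; the request format is simplified. */
#include <mpi.h>

#define RANKS_PER_PR 4   /* assumed ratio: 1 progress rank per 4 world ranks */
#define TAG_PUT  1
#define TAG_STOP 2

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Every RANKS_PER_PR-th rank acts as a progress rank. */
    int is_progress = (world_rank % RANKS_PER_PR == 0);

    /* Compute ranks get their own communicator for the application. */
    MPI_Comm compute_comm;
    MPI_Comm_split(MPI_COMM_WORLD, is_progress ? MPI_UNDEFINED : 0,
                   world_rank, &compute_comm);

    if (is_progress) {
        /* Service loop: receive "put" payloads until every client stops. */
        int clients = RANKS_PER_PR - 1;
        if (world_rank + clients >= world_size)
            clients = world_size - world_rank - 1;
        double payload;
        MPI_Status st;
        while (clients > 0) {
            MPI_Recv(&payload, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) {
                clients--;
            } else {
                /* A real runtime would copy the payload into the target's
                 * globally addressable memory here. */
            }
        }
    } else {
        /* Compute rank: forward a "one-sided put" to its progress rank. */
        int my_pr = (world_rank / RANKS_PER_PR) * RANKS_PER_PR;
        double value = (double)world_rank;
        MPI_Send(&value, 1, MPI_DOUBLE, my_pr, TAG_PUT,  MPI_COMM_WORLD);
        MPI_Send(&value, 1, MPI_DOUBLE, my_pr, TAG_STOP, MPI_COMM_WORLD);
        MPI_Comm_free(&compute_comm);
    }

    MPI_Finalize();
    return 0;
}
```

Separating the ranks this way means a remote put can complete while the target's compute ranks stay busy in application code, which is the asynchronous-progress property the excerpt relies on for scaling to thousands of KNL nodes.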