Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2017
DOI: 10.1145/3126908.3126956
An efficient MPI/OpenMP parallelization of the Hartree-Fock method for the second generation of Intel® Xeon Phi processor

Abstract: Modern OpenMP threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two separate implementations are considered, which differ in whether key data structures, the density and Fock matrices, are shared or replicated among threads. All implementations are benchmarked on a supercomputer of 3,000 Intel® Xeon Phi™ processors. With 64 cores per processor, scaling numbers are reported on up to 192,000 cores. The hybrid MPI/OpenMP implementation reduces th…
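The trade-off the abstract describes, one Fock matrix shared by all OpenMP threads versus a private replica per thread, can be illustrated with a short OpenMP sketch. This is a toy illustration only, not the GAMESS code: the names fock_shared, fock_replicated, toy_integral, the basis size N, and the uniform density are invented for the example, and the integral routine is a placeholder.

/* Minimal, hypothetical sketch (not the GAMESS source) contrasting the two
 * strategies named in the abstract: a shared Fock matrix updated with atomics
 * versus per-thread replicas reduced at the end.
 * Build: gcc -fopenmp fock_sketch.c -o fock_sketch */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 64  /* toy basis-set dimension, illustrative only */

/* Placeholder for a two-electron integral; real code calls an integral package. */
static double toy_integral(int i, int j) { return 1.0 / (1.0 + i + j); }

/* Strategy 1: shared Fock matrix. One integral touches more than one Fock
 * element, so threads can race on the same entry; omp atomic serializes them. */
static void fock_shared(const double *density, double *fock) {
    #pragma omp parallel for collapse(2) schedule(dynamic)
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            double g = toy_integral(i, j);
            #pragma omp atomic
            fock[i * N + j] += g * density[j * N + i];
            #pragma omp atomic
            fock[j * N + i] += g * density[i * N + j];
        }
}

/* Strategy 2: replicated Fock matrix. No atomics in the hot loop, but every
 * thread holds an N*N copy, and the copies are summed afterwards. */
static void fock_replicated(const double *density, double *fock) {
    #pragma omp parallel
    {
        double *local = calloc((size_t)N * N, sizeof(double));
        #pragma omp for collapse(2) schedule(dynamic)
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) {
                double g = toy_integral(i, j);
                local[i * N + j] += g * density[j * N + i];
                local[j * N + i] += g * density[i * N + j];
            }
        #pragma omp critical            /* reduce replicas one thread at a time */
        for (int k = 0; k < N * N; ++k)
            fock[k] += local[k];
        free(local);
    }
}

int main(void) {
    double *density = malloc((size_t)N * N * sizeof(double));
    double *f1 = calloc((size_t)N * N, sizeof(double));
    double *f2 = calloc((size_t)N * N, sizeof(double));
    for (int k = 0; k < N * N; ++k) density[k] = 1.0;
    fock_shared(density, f1);
    fock_replicated(density, f2);
    printf("F[0][0]: shared = %f, replicated = %f\n", f1[0], f2[0]);
    free(density); free(f1); free(f2);
    return 0;
}

In the replicated variant the memory footprint grows with the thread count, which is exactly the pressure point on a 64-core Xeon Phi node with limited memory per core; the shared variant trades that memory for synchronization overhead in the innermost loop.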

Cited by 15 publications (21 citation statements). References 26 publications.
“…In addition, KNL-specific optimizations and tuning are proposed in [63] for seismic computations, with detailed comparisons between KNC and Haswell. In [78], the performance of the hybrid MPI+OpenMP programming paradigm is provided on the Theta supercomputer based on KNL compute nodes. The many-core scalability of the edge-based graph coloring algorithm performance is provided in [79], which targets both GPU as well as KNC.…”
Section: State-of-the-art Shared-memory Optimizations
Citation type: mentioning
Confidence: 99%
“…It also eliminates intranode MPI communications. This hybrid MPI+OpenMP parallelization has been recently implemented in GAMESS …”
Section: Methods
Citation type: mentioning
Confidence: 99%
(A minimal sketch of this hybrid launch pattern appears after the citation list.)
“…This matrix (Equation ) is composed of the internal contribution and ESP. The former is the usual Fock matrix as in non-FMO calculations, and its parallelization is described elsewhere. Therefore, the description below (Sections 2.2.1 and 2.2.2) focuses on ESP, which is FMO specific, and according to Equation  consists of one-electron and two-electron contributions.…”
Section: Methods
Citation type: mentioning
Confidence: 99%
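The Methods citation above notes that the hybrid MPI+OpenMP parallelization eliminates intranode MPI communication. As a hedged illustration of that launch pattern, again not the GAMESS code, the following sketch runs one MPI rank per node with OpenMP threads covering the cores; the per-thread work and the final reduction are stand-ins.

/* Minimal illustrative sketch of the hybrid pattern cited above: one MPI rank
 * per node, OpenMP threads inside the node, so no MPI messages cross cores of
 * the same node. Not GAMESS code; the per-thread work is a stand-in.
 * Build: mpicc -fopenmp hybrid_sketch.c -o hybrid_sketch */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    /* FUNNELED: only the master thread calls MPI, the usual choice when
     * OpenMP covers all intranode parallelism. */
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank = 0, nranks = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Intranode parallelism: plain OpenMP threads, no MPI traffic. */
    double local_sum = 0.0;
    #pragma omp parallel reduction(+:local_sum)
    local_sum += 1.0;               /* stand-in for per-thread Fock-build work */

    /* Internode parallelism: a single collective per node-level rank. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d ranks x %d threads -> %.0f work units\n",
               nranks, omp_get_max_threads(), global_sum);

    MPI_Finalize();
    return 0;
}

Launched with one rank per node and OMP_NUM_THREADS set to the core count, this pattern replaces what would otherwise be 64 MPI ranks per Xeon Phi node with 64 threads sharing one address space, which is the source of the intranode-communication savings the citation describes.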