“…The NIC that attaches the node to the parallel network is itself connected to a programmable "local network" that links it to both the CPU memory and the GPU memory. This combination of interconnects means that parallel communication latency and bandwidth (see the first report [8]) are limited by the NIC, the local network, the NVLinks from the CPU to the local network, and the GPU memory, but not by the CPU memory. However, CUDA-aware MPI calls (sends, receives, and waits) must still be issued by code running on the CPU cores.…”
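To make the last point concrete, the sketch below shows a minimal CUDA-aware MPI exchange over GPU-resident buffers. It assumes an MPI library built with CUDA support so device pointers can be passed directly to MPI; the message size, ring neighbors, and tag are illustrative and not taken from the report. The data path can bypass CPU memory, but every MPI call is still host code executed on the CPU cores.

```c
/* Sketch of a CUDA-aware MPI ring exchange.
 * Assumes an MPI build with CUDA support; buffer size, neighbors,
 * and tag are illustrative. All MPI calls run on CPU cores, even
 * though the buffers live in GPU memory. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int N = 1 << 20;              /* illustrative message size (doubles) */
    double *d_send, *d_recv;            /* buffers allocated in GPU memory */
    cudaMalloc((void **)&d_send, N * sizeof(double));
    cudaMalloc((void **)&d_recv, N * sizeof(double));
    cudaMemset(d_send, 0, N * sizeof(double));

    int right = (rank + 1) % nranks;            /* send to right neighbor */
    int left  = (rank + nranks - 1) % nranks;   /* receive from left neighbor */
    MPI_Request reqs[2];

    /* CUDA-aware MPI: device pointers are handed straight to the
     * send/receive calls, issued by the CPU. */
    MPI_Irecv(d_recv, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(d_send, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    cudaFree(d_send);
    cudaFree(d_recv);
    MPI_Finalize();
    return 0;
}
```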