On the efficacy of GPU-integrated MPI for scientific applications

Aji, Ashwin M.; Panwar, Lokendra S.; Ji, Feng; Chabbi, Milind; Murthy, Karthik; Balaji, Pavan; Bisset, Keith R.; Dinan, James; Feng, Wu-chun; Mellor-Crummey, John; Ma, Xiaosong; Thakur, Rajeev

doi:10.1145/2462902.2462915

Cited by 3 publications

(1 citation statement)

References 25 publications

“…Most MPI implementations for distributed GPU programming focus on host-initiated techniques [28], which has been simplified with the introduction of unified virtual addressing [29]. RDMA-based programming frameworks, with their simple semantics and low overheads, enable efficient device-initiated distributed GPU programming.…”

Section: Related Work (mentioning)

confidence: 99%

Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches

Groves, Brock, Chen, et al. 2020

2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS)

Network communication on GPU-based systems is a significant roadblock for many applications with small but frequent messaging requirements. One common question for application developers is, "How can they reduce the overheads and achieve the best communication performance on GPUs?" This work examines device-initiated versus host-initiated internode GPU communication using NVSHMEM. We derive basic communication model parameters for single message and batched communication before validating our model against distributed GEMM benchmarks. We use our model to estimate performance benefits for applications transitioning from CPUs to GPUs for fixed-size and scaled workloads and provide general guidelines for reducing communication overheads. Our findings show that the host-initiated approach generally outperforms the device-initiated approach for the system evaluated.
