2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
DOI: 10.1109/ipdpsw52791.2021.00079

GPU-aware Communication with UCX in Parallel Programming Models: Charm++, MPI, and Python

Abstract: As an increasing number of leadership-class systems embrace GPU accelerators in the race towards exascale, efficient communication of GPU data is becoming one of the most critical components of high-performance computing. For developers of parallel programming models, implementing support for GPU-aware communication using native APIs for GPUs such as CUDA can be a daunting task as it requires considerable effort with little guarantee of performance. In this work, we demonstrate the capability of the Unified Com…

Cited by 3 publications (2 citation statements)
References 17 publications
“…Charm4Py [15] is a parallel programming model built on top of Charm++. Charm4Py features the message-driven scheduling of Charm++ [16], and has support for many Charm++ features such as dynamic load balancing, GPU-direct communication [17], overdecomposition, and sections. Following the programming model of Charm++, Charm4Py programs consist of one or more chares on each PE in the computation.…”
Section: A. Charm4Py (mentioning)
confidence: 99%
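
The quoted passage summarizes the Charm4Py programming model described in the cited work. Below is a minimal sketch of that model, assuming a standard Charm4Py installation; the Worker class, its entry method, and the barrier-style reduction are illustrative choices, not code from the paper:

from charm4py import charm, Chare, Array, Future

class Worker(Chare):
    def work(self, step, done):
        # Entry method: delivered as a message and scheduled on whichever PE hosts this chare.
        print(f'chare {self.thisIndex} on PE {charm.myPe()} running step {step}')
        # Empty reduction used as a barrier; the future completes once every chare has contributed.
        self.contribute(None, None, done)

def main(args):
    # One chare per PE here; overdecomposition would simply create more chares than PEs.
    workers = Array(Worker, charm.numPes())
    done = Future()
    workers.work(0, done)  # asynchronous broadcast to every element of the chare array
    done.get()             # wait until all chares have reported back
    charm.exit()

charm.start(main)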
“…In addition to optimizations for host-resident data, Charm4Py and mpi4py are capable of inter-process communication consisting of GPU-resident data without first staging data on the host. Charm4Py uses the underlying UCX capabilities of Charm++ [17], and mpi4py utilizes CUDA-aware MPI implementations.…”
Section: Messaging in Python (mentioning)
confidence: 99%
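
As a concrete illustration of the mpi4py case mentioned in the quote, GPU-resident buffers such as CuPy arrays can be passed directly to mpi4py's buffer-based calls when the underlying MPI library is CUDA-aware, so no host staging is needed. The sketch below assumes such a build, CuPy installed, and two ranks; it is an illustrative example, not code from the cited paper:

from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
n = 1 << 20  # one million floats, resident in GPU memory

if rank == 0:
    sendbuf = cp.arange(n, dtype=cp.float32)
    comm.Send(sendbuf, dest=1, tag=0)      # GPU pointer handed to MPI; no host staging
elif rank == 1:
    recvbuf = cp.empty(n, dtype=cp.float32)
    comm.Recv(recvbuf, source=0, tag=0)    # received directly into GPU memory
    cp.cuda.runtime.deviceSynchronize()
    print('rank 1 received', recvbuf.size, 'floats on the GPU')

Charm4Py achieves the same effect through the UCX machine layer of Charm++ [17], as the quoted passage notes.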