2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
DOI: 10.1109/ipdpsw52791.2021.00079

GPU-aware Communication with UCX in Parallel Programming Models: Charm++, MPI, and Python

Abstract: As an increasing number of leadership-class systems embrace GPU accelerators in the race towards exascale, efficient communication of GPU data is becoming one of the most critical components of high-performance computing. For developers of parallel programming models, implementing support for GPU-aware communication using native APIs for GPUs such as CUDA can be a daunting task as it requires considerable effort with little guarantee of performance. In this work, we demonstrate the capability of the Unified Com…

Cited by 3 publications (2 citation statements)
References 17 publications
“…Charm4Py [15] is a parallel programming model built on top of Charm++. Charm4Py features the message-driven scheduling of Charm++ [16], and has support for many Charm++ features such as dynamic load balancing, GPU-direct communication [17], overdecomposition, and sections. Following the programming model of Charm++, Charm4Py programs consist of one or more chares on each PE in the computation.…”
Section: A. Charm4Py (mentioning)
confidence: 99%
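
The quoted passage summarizes the Charm4Py programming model described in the cited work. Below is a minimal sketch of that model, assuming a standard Charm4Py installation; the Worker class, its entry method, and the barrier-style reduction are illustrative choices, not code from the paper:

from charm4py import charm, Chare, Array, Future

class Worker(Chare):
    def work(self, step, done):
        # Entry method: delivered as a message and scheduled on whichever PE hosts this chare.
        print(f'chare {self.thisIndex} on PE {charm.myPe()} running step {step}')
        # Empty reduction used as a barrier; the future completes once every chare has contributed.
        self.contribute(None, None, done)

def main(args):
    # One chare per PE here; overdecomposition would simply create more chares than PEs.
    workers = Array(Worker, charm.numPes())
    done = Future()
    workers.work(0, done)  # asynchronous broadcast to every element of the chare array
    done.get()             # wait until all chares have reported back
    charm.exit()

charm.start(main)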
“…In addition to optimizations for host-resident data, Charm4Py and mpi4py are capable of inter-process communication consisting of GPU-resident data without first staging data on the host. Charm4Py uses the underlying UCX capabilities of Charm++ [17], and mpi4py utilizes CUDA-aware MPI implementations.…”
Section: Messaging in Python (mentioning)
confidence: 99%
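
As a concrete illustration of the mpi4py case mentioned in the quote, GPU-resident buffers such as CuPy arrays can be passed directly to mpi4py's buffer-based calls when the underlying MPI library is CUDA-aware, so no host staging is needed. The sketch below assumes such a build, CuPy installed, and two ranks; it is an illustrative example, not code from the cited paper:

from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
n = 1 << 20  # one million floats, resident in GPU memory

if rank == 0:
    sendbuf = cp.arange(n, dtype=cp.float32)
    comm.Send(sendbuf, dest=1, tag=0)      # GPU pointer handed to MPI; no host staging
elif rank == 1:
    recvbuf = cp.empty(n, dtype=cp.float32)
    comm.Recv(recvbuf, source=0, tag=0)    # received directly into GPU memory
    cp.cuda.runtime.deviceSynchronize()
    print('rank 1 received', recvbuf.size, 'floats on the GPU')

Charm4Py achieves the same effect through the UCX machine layer of Charm++ [17], as the quoted passage notes.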