2018
DOI: 10.1002/cpe.4667

Design considerations for GPU‐aware collective communications in MPI

Abstract: GPU accelerators have established themselves in state‐of‐the‐art clusters by offering high performance and energy efficiency. In such systems, efficient inter‐process GPU communication is of paramount importance to application performance. This paper investigates various algorithms in conjunction with the latest GPU features to improve GPU collective operations. First, we propose a GPU Shared Buffer‐aware (GSB) algorithm and a Binomial Tree Based (BTB) algorithm for GPU collectives on single‐GPU no…
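
The abstract names a Binomial Tree Based (BTB) algorithm for GPU collectives. As a rough illustration of the general idea only (not the paper's GSB or BTB implementations), the sketch below runs a binomial-tree broadcast directly on a GPU buffer by handing device pointers to MPI point-to-point calls. It assumes root rank 0, a CUDA-aware MPI library that accepts device pointers, and that d_buf was allocated with cudaMalloc; the function name gpu_bcast_binomial is hypothetical.

#include <mpi.h>

/* Binomial-tree broadcast of a device buffer, root fixed at rank 0.
 * Sketch under the assumption of a CUDA-aware MPI that can send/receive
 * directly from GPU memory; d_buf is a cudaMalloc'd pointer. */
static void gpu_bcast_binomial(void *d_buf, int count, MPI_Datatype type,
                               MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* Receive phase: the lowest set bit of the rank identifies the round
     * in which this rank receives the data from its parent. */
    int mask = 1;
    while (mask < size) {
        if (rank & mask) {
            MPI_Recv(d_buf, count, type, rank - mask, 0, comm,
                     MPI_STATUS_IGNORE);
            break;
        }
        mask <<= 1;
    }

    /* Send phase: forward to children at successively lower bit positions. */
    mask >>= 1;
    while (mask > 0) {
        if (rank + mask < size)
            MPI_Send(d_buf, count, type, rank + mask, 0, comm);
        mask >>= 1;
    }
}

A tree of this shape finishes in ceil(log2(p)) communication rounds for p processes, which is why tree-based schemes are attractive for latency-bound collectives; the paper's GSB and BTB designs go further by exploiting GPU shared buffers and other GPU features, which this sketch does not attempt to reproduce.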

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1

Citation Types

0
2
0

Year Published

2018
2018
2022
2022

Publication Types

Select...
3
3

Relationship

0
6

Authors

Journals

Cited by 6 publications (2 citation statements) | References 24 publications

Citation statements:

“…In particular, this selection includes four interesting works. Two of them were contributions from the last two workshop editions (HUCAA 2015 and 2016, both collocated with the International Conference on Parallel Processing ‐ ICPP'15 and ICPP'16).…”
Section: Themes of This Special Issue
confidence: 99%
“…Scientists usually rely on either MPI parallelism or GPU parallelism alone. However, MPI and GPUs can be combined for large-scale computing tasks in many fields, and heterogeneous MPI-GPU computing has been widely used [29][30][31]. In computational fluid dynamics, Choi et al. used a floating-point compression algorithm to optimize GPU memory capacity in a heterogeneous MPI-GPU implementation [32], and Lai et al. developed a heterogeneous parallel program combining MPI and CUDA for CFD applications on high-performance computing clusters, greatly improving computational efficiency [33].…”
Section: Introduction
confidence: 99%
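
As a minimal sketch of the MPI+CUDA combination the statement above describes, the following illustrative program binds each MPI rank to one local GPU, performs placeholder per-rank work in a kernel, and combines the partial results with MPI_Allreduce. It is not code from any of the cited CFD applications; the kernel fill and all variable names are hypothetical.

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

/* Placeholder for the real per-rank computation. */
__global__ void fill(double *x, int n, double v)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] = v;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size, ndev;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Bind each rank to one of the GPUs visible on its node. */
    cudaGetDeviceCount(&ndev);
    cudaSetDevice(rank % ndev);

    const int n = 1 << 20;
    double *d_x;
    cudaMalloc((void **)&d_x, n * sizeof(double));

    /* Local GPU work: each rank writes its own value. */
    fill<<<(n + 255) / 256, 256>>>(d_x, n, rank + 1.0);
    cudaDeviceSynchronize();

    /* Bring one partial result back to the host and combine across ranks.
     * A CUDA-aware MPI could reduce directly from device memory instead. */
    double local, global;
    cudaMemcpy(&local, d_x, sizeof(double), cudaMemcpyDeviceToHost);
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("combined result across %d ranks: %f\n", size, global);

    cudaFree(d_x);
    MPI_Finalize();
    return 0;
}

Built with nvcc and an MPI compiler wrapper (for example, using mpicxx as the host compiler), this is the standard one-rank-per-GPU layout that heterogeneous MPI-GPU codes such as those cited above build on.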