2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum
DOI: 10.1109/ipdpsw.2013.256
Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming

Abstract: Despite the vast interest in accelerator-based systems, programming large multinode GPU systems is still a complex task, particularly with respect to optimal data movement across the host-GPU PCIe connection and then across the network. To address such issues, GPU-integrated MPI solutions have been developed that fold GPU data movement into existing MPI implementations. Currently available GPU-integrated MPI frameworks differ in aspects related to the buffer synchronization and ordering semantic…

Cited by 4 publications (3 citation statements)
References 14 publications (18 reference statements)
“…With our design, one can simply implicitly denote ordering of MPI and GPU operations by associating GPU events or streams with MPI calls, and the MPI-ACC implementation applies different heuristics to synchronize and make efficient communication progress. We have shown in our prior work [26] that this approach improves productivity and performance, while being compatible with the MPI standard. Moreover, our approach introduces a lightweight runtime attribute check to each MPI operation, but the overhead is much less than with automatic detection, as shown in Figure 2.…”
Section: MPI-ACC's Datatype Attributes Approach
Mentioning confidence: 92%
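The statement above describes associating GPU events or streams with MPI calls so that ordering is expressed implicitly and the runtime synchronizes on the application's behalf. The following is a minimal, self-contained Python mock of that idea; it is an illustrative sketch only, and none of the names (`FakeStream`, `mpi_send`) belong to the actual MPI-ACC API.

```python
# Illustrative mock of implicit MPI/GPU ordering via an attribute:
# the MPI call carries an optional stream, and the runtime drains
# pending GPU work before moving data. All names are hypothetical.

class FakeStream:
    """Stand-in for a CUDA stream: records pending kernel work."""
    def __init__(self):
        self.pending = []

    def launch(self, kernel_name):
        self.pending.append(kernel_name)

    def synchronize(self):
        # Complete all outstanding work on this stream.
        self.pending = []

def mpi_send(buf, stream=None):
    """Mock MPI send. The `if stream` test models the lightweight
    runtime attribute check described in the citation: it is cheap,
    and only calls with a GPU attribute pay the synchronization cost."""
    if stream is not None:
        stream.synchronize()  # ensure the producing kernel finished
    return ("sent", buf)

s = FakeStream()
s.launch("kernel_producing_buf")
status, payload = mpi_send([1, 2, 3], stream=s)
print(status, payload, s.pending)  # sent [1, 2, 3] []
```

The design point this mirrors: the per-call check is a constant-time attribute lookup, whereas automatic detection (e.g., querying every pointer's location) would impose its cost on all operations.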
“…This method requires no modifications to the MPI interface. Also, we have shown previously that while their approach works well for standalone point-to-point communication, programmers have to explicitly synchronize between interleaved and dependent MPI and CUDA operations, thereby requiring significant programmer effort to achieve ideal performance [26]. Moreover, as shown in Figure 2, the penalty for runtime checking can be significant and is incurred by all operations, including those that require no GPU data movement at all.…”
Section: API Design
Mentioning confidence: 99%
“…Aji et al [22] examine GPU integrated MPI frameworks and discuss alternatives for buffer synchronization and ordering semantics. In particular, they discuss using MPI communicator or datatype attributes to pass semantic information to the runtime implementation.…”
Section: B. Architectures and Relaxed Ordering
Mentioning confidence: 99%
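The citation above refers to passing semantic information to the runtime through MPI communicator or datatype attributes, in the spirit of MPI's keyval mechanism (`MPI_Type_create_keyval` / `MPI_Type_set_attr`). A hedged, self-contained Python sketch of that pattern follows; the class and function names are illustrative stand-ins, not the real MPI bindings.

```python
# Hypothetical sketch: attaching semantic hints (buffer location,
# associated GPU stream) to a datatype via a key->value attribute
# table, so the runtime can choose the right data path per call.

class Datatype:
    """Minimal datatype object carrying an attribute table."""
    def __init__(self, name):
        self.name = name
        self.attrs = {}

def type_set_attr(dtype, key, value):
    dtype.attrs[key] = value

def type_get_attr(dtype, key):
    # Returns None when no hint was attached, so host-only
    # datatypes incur no GPU handling at all.
    return dtype.attrs.get(key)

gpu_float = Datatype("float")
type_set_attr(gpu_float, "buffer_location", "gpu")  # semantic hint
type_set_attr(gpu_float, "gpu_stream", 7)           # ordering hint

loc = type_get_attr(gpu_float, "buffer_location")
print(loc)  # gpu
```

Because the hints travel with the datatype rather than with each call, the MPI interface itself needs no new arguments, which is the compatibility property the cited discussion emphasizes.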