2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems
DOI: 10.1109/hpcc.2012.69

DMA-Assisted, Intranode Communication in GPU Accelerated Systems

Abstract: Accelerator awareness has become a pressing issue in data movement models, such as MPI, because of the rapid deployment of systems that utilize accelerators. In our previous work, we developed techniques to enhance MPI with accelerator awareness, thus allowing applications to easily and efficiently communicate data between accelerator memories. In this paper, we extend this work with techniques to perform efficient data movement between accelerators within the same node using a DMA-assisted, peer-to-p…
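
The peer-to-peer transfer path described in the abstract can be illustrated with the public CUDA runtime API. The following is a minimal sketch of a DMA-assisted copy between two GPUs in one node, not the paper's MPI-internal implementation; the device numbers and buffer size are illustrative assumptions.

/* Minimal sketch of a peer-to-peer copy between two GPUs in the same node,
 * using the public CUDA runtime API.  Illustrative only; not the paper's
 * MPI-internal code.  Assumes the node has at least two GPUs. */
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                  \
                    cudaGetErrorString(err_), __FILE__, __LINE__);       \
            exit(EXIT_FAILURE);                                          \
        }                                                                \
    } while (0)

int main(void)
{
    const size_t nbytes = 1 << 20;   /* 1 MiB payload (illustrative) */
    int src_dev = 0, dst_dev = 1;
    void *src_buf, *dst_buf;

    /* Check whether the destination GPU can reach the source GPU directly. */
    int can_access = 0;
    CHECK(cudaDeviceCanAccessPeer(&can_access, dst_dev, src_dev));

    CHECK(cudaSetDevice(src_dev));
    CHECK(cudaMalloc(&src_buf, nbytes));

    CHECK(cudaSetDevice(dst_dev));
    CHECK(cudaMalloc(&dst_buf, nbytes));
    if (can_access)
        CHECK(cudaDeviceEnablePeerAccess(src_dev, 0));

    /* With peer access enabled, cudaMemcpyPeer issues a direct GPU-to-GPU
     * DMA; otherwise the runtime falls back to staging through host memory. */
    CHECK(cudaMemcpyPeer(dst_buf, dst_dev, src_buf, src_dev, nbytes));
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaFree(dst_buf));
    CHECK(cudaSetDevice(src_dev));
    CHECK(cudaFree(src_buf));
    return 0;
}

The fallback path through host memory is exactly the staging overhead that a DMA-assisted, peer-to-peer design avoids.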

Cited by 11 publications (11 citation statements).
References 14 publications (16 reference statements).
“…This way, there is no need to stage the GPU data in and out of the host memory, which can significantly enhance the performance of intranode inter-process GPU-to-GPU communication. Previous research has used CUDA IPC to optimize point-to-point and one-sided communications in MPI [12,5]. However, to the best of our knowledge, CUDA IPC has not been used in the design of collective operations.…”
Section: Introduction
Mentioning confidence: 99%
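
To make the CUDA IPC mechanism mentioned in the quote concrete, the sketch below shows one way two MPI ranks on the same node can share a device buffer without staging through host memory. It assumes exactly two ranks (e.g., mpiexec -n 2) and omits error handling; it is not the code of the cited papers.

/* Sketch: rank 0 exports a device buffer via CUDA IPC, rank 1 maps it and
 * copies it directly GPU-to-GPU, with no host bounce buffer.
 * Assumes exactly two ranks running on the same node. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t nbytes = 1 << 20;

    if (rank == 0) {
        void *d_src;
        cudaMalloc(&d_src, nbytes);

        /* Export an opaque handle describing the device allocation ... */
        cudaIpcMemHandle_t handle;
        cudaIpcGetMemHandle(&handle, d_src);

        /* ... and ship the small, host-resident handle to the peer rank. */
        MPI_Send(&handle, sizeof(handle), MPI_BYTE, 1, 0, MPI_COMM_WORLD);

        MPI_Barrier(MPI_COMM_WORLD);  /* keep d_src alive until rank 1 is done */
        cudaFree(d_src);
    } else {
        cudaIpcMemHandle_t handle;
        MPI_Recv(&handle, sizeof(handle), MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

        /* Map rank 0's device buffer into this process's address space. */
        void *d_remote;
        cudaIpcOpenMemHandle(&d_remote, handle, cudaIpcMemLazyEnablePeerAccess);

        void *d_dst;
        cudaMalloc(&d_dst, nbytes);
        /* Direct device-to-device copy; the GPU's DMA engine moves the data. */
        cudaMemcpy(d_dst, d_remote, nbytes, cudaMemcpyDeviceToDevice);

        cudaIpcCloseMemHandle(d_remote);
        cudaFree(d_dst);
        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
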
“…It has been shown that intranode and internode communications between GPUs in HPC platforms play an important role in the performance of scientific applications [1,10]. In this regard, researchers have started looking into incorporating GPU-awareness into the MPI library, targeting both point-to-point and collective communications [12,5,14,11].…”
Section: Introduction
Mentioning confidence: 99%
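
From the application's point of view, the GPU-awareness discussed above means device pointers can be handed directly to MPI communication calls. A minimal sketch, assuming a CUDA-aware MPI build and exactly two ranks:

/* Sketch of application-level use of a GPU-aware MPI library: device
 * pointers are passed straight to MPI, and the library picks the transfer
 * path (IPC/peer DMA within a node, pipelined staging across nodes).
 * Assumes an MPI build with CUDA support and exactly two ranks. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1 << 20;                 /* number of doubles */
    double *d_buf;
    cudaMalloc((void **)&d_buf, count * sizeof(double));

    if (rank == 0) {
        /* The GPU buffer goes to MPI directly: no explicit cudaMemcpy to a
         * host bounce buffer appears in application code. */
        MPI_Send(d_buf, count, MPI_DOUBLE, 1, 42, MPI_COMM_WORLD);
    } else {
        MPI_Recv(d_buf, count, MPI_DOUBLE, 0, 42, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
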
“…In addition to enhanced programmability, transparent architecture-specific and vendor-specific performance optimizations can be provided within the MPI layer. For example, MPI-ACC enables automatic data pipelining for internode communication, NUMA affinity management, and direct GPU-to-GPU data movement (GPUDirect) for all applicable intranode CUDA communications [6,19], thus providing a heavily optimized end-to-end communication platform.…”
Section: Application Design Using GPU-integrated MPI Framework
Mentioning confidence: 99%
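
The automatic data pipelining that the quote attributes to MPI-ACC can be approximated by hand. The sketch below shows the sender side of a chunked, double-buffered staging pipeline; the helper name, chunk size, and buffering scheme are illustrative assumptions, not MPI-ACC internals.

/* Sketch of the sender side of a manually pipelined GPU-to-remote transfer:
 * the device buffer is staged to pinned host memory in chunks, and the MPI
 * send of chunk i overlaps the device-to-host copy of chunk i+1.  A
 * GPU-integrated MPI performs this pipelining internally and transparently. */
#include <mpi.h>
#include <cuda_runtime.h>

#define CHUNK_BYTES (1 << 20)   /* 1 MiB pipeline stage (illustrative) */

void send_gpu_buffer_pipelined(const char *d_buf, size_t nbytes,
                               int dest, int tag, MPI_Comm comm)
{
    char *h_stage[2];
    cudaStream_t stream[2];
    MPI_Request req[2] = { MPI_REQUEST_NULL, MPI_REQUEST_NULL };

    for (int i = 0; i < 2; i++) {
        cudaMallocHost((void **)&h_stage[i], CHUNK_BYTES);  /* pinned staging */
        cudaStreamCreate(&stream[i]);
    }

    size_t offset = 0;
    int slot = 0;
    while (offset < nbytes) {
        size_t len = nbytes - offset < CHUNK_BYTES ? nbytes - offset
                                                   : CHUNK_BYTES;

        /* Make sure the previous send that used this staging slot is done. */
        MPI_Wait(&req[slot], MPI_STATUS_IGNORE);

        /* Stage the next chunk to the host ... */
        cudaMemcpyAsync(h_stage[slot], d_buf + offset, len,
                        cudaMemcpyDeviceToHost, stream[slot]);
        cudaStreamSynchronize(stream[slot]);

        /* ... and send it; the other slot can be staged while this send is
         * still in flight, which is where the overlap comes from. */
        MPI_Isend(h_stage[slot], (int)len, MPI_BYTE, dest, tag, comm,
                  &req[slot]);

        offset += len;
        slot ^= 1;
    }

    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    for (int i = 0; i < 2; i++) {
        cudaStreamDestroy(stream[i]);
        cudaFreeHost(h_stage[i]);
    }
}

A matching receiver would post one receive per chunk and copy each staged chunk back to the device, overlapping in the same way.
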
“…All-to-all communication [27] and noncontiguous datatype communication [17,29] have also been studied in the context of GPU-aware MPI. With a focus on intranode communication, our previous work [18,19] extends transparent GPU buffers support for MPICH [1] and optimizes the cross-PCIe data movement by using shared memory data structures and interprocess communication (IPC) mechanisms. In contrast to those efforts, here we study the synergistic effect between GPU-accelerated MPI applications and a GPU-integrated MPI implementation.…”
Section: Related Work
Mentioning confidence: 99%
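
The shared-memory path mentioned in the quote can be sketched as follows: one process stages its device buffer into a shared host segment and a peer process copies it onto its own GPU. The fork()-based setup, semaphore handshake, and fixed sizes here are simplifications for illustration; separate MPI processes would instead attach to a named segment (e.g., via shm_open).

/* Sketch of host shared-memory staging for intranode GPU-to-GPU data
 * movement: sender copies device -> shared host buffer, receiver copies
 * shared host buffer -> its own device.  This is the baseline path that
 * the IPC/peer-DMA designs above improve on.  Illustrative only. */
#include <cuda_runtime.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define NBYTES (1u << 20)

struct shm_region {
    sem_t ready;            /* posted once the payload has been staged */
    char  payload[NBYTES];  /* host bounce buffer shared by both processes */
};

int main(void)
{
    /* Anonymous shared mapping, inherited across fork(). */
    struct shm_region *shm = mmap(NULL, sizeof(*shm),
                                  PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    sem_init(&shm->ready, /*pshared=*/1, 0);

    if (fork() == 0) {                    /* child acts as the receiver */
        void *d_dst;
        cudaSetDevice(1);                 /* assumes a second GPU exists */
        cudaMalloc(&d_dst, NBYTES);
        sem_wait(&shm->ready);            /* wait for the staged payload */
        cudaMemcpy(d_dst, shm->payload, NBYTES, cudaMemcpyHostToDevice);
        cudaFree(d_dst);
        _exit(0);
    }

    /* parent acts as the sender */
    void *d_src;
    cudaSetDevice(0);
    cudaMalloc(&d_src, NBYTES);
    cudaMemcpy(shm->payload, d_src, NBYTES, cudaMemcpyDeviceToHost);
    sem_post(&shm->ready);

    wait(NULL);
    cudaFree(d_src);
    munmap(shm, sizeof(*shm));
    return 0;
}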