Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing 2013
DOI: 10.1145/2493123.2462915

On the efficacy of GPU-integrated MPI for scientific applications

Abstract: Scientific computing applications are quickly adapting to leverage the massive parallelism of GPUs in large-scale clusters. However, the current hybrid programming models require application developers to explicitly manage the disjoint host and GPU memories, thus reducing both efficiency and productivity. Consequently, GPU-integrated MPI solutions, such as MPI-ACC and MVAPICH2-GPU, have been developed that provide unified programming interfaces and optimized implementations for end-to-end data communication …
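To make the programming-model gap concrete, here is a minimal sketch in the CUDA-aware style that GPU-integrated MPI libraries such as MVAPICH2-GPU support (MPI-ACC exposes comparable functionality through buffer attributes): a device pointer is passed directly to a standard MPI call, and the library performs the host staging and pipelining internally. The message size and ranks are illustrative assumptions, not taken from the paper.

/* Hedged sketch: send a GPU buffer with GPU-integrated (CUDA-aware)
 * MPI. Without such support, the application would first cudaMemcpy
 * the data to a host buffer and send that instead. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;              /* illustrative message size */
    double *d_buf;
    cudaMalloc((void **)&d_buf, n * sizeof(double));

    if (rank == 0) {
        /* The device pointer goes straight into MPI_Send; the MPI
         * library detects GPU memory and stages it itself. */
        MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}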

Cited by 20 publications (10 citation statements).
References 25 publications (41 reference statements).
“…Moreover, there are two GPU computation modes depending on how the visit messages are processed on the GPUs. In this paper, we discuss the exclusive GPU computation mode, but discussion of the cooperative CPU-GPU computation mode can be found in our prior work [27].…”
Section: Computation-Communication Patterns and MPI-ACC-Driven Optimizations
confidence: 99%
“…We compare the combined performance of all the phases of GPU-EpiSimdemics (computeVisits and computeInteractions), with and without the MPI-ACC-driven optimizations discussed in Section 5. Analysis of CA is described in our prior work [27]. We also vary the number of compute nodes from 8 to 128 and the number of GPU devices between 1 and 2.…”
Section: Case Study Analysis: EpiSimdemics
confidence: 99%
“…GeMTC could benefit from a Grophecy or Singe-like module for creating warp-optimized AppKernels and vice versa. MPI-ACC [41] aims to provide integrated MPI support for accelerators to allow the programmer to easily execute code on a CPU or GPU.…”
Section: Related Work
confidence: 99%
“…State-of-the-art techniques that combine distributed- and shared-memory programming models [80], as well as many PGAS approaches [6,24,47,48], have demonstrated the potential benefits of combining both levels of parallelism [81,82,39,83], including increased communication-computation overlap [84,85], improved memory utilization [86,87], power optimization [88] and effective use of accelerators [89,90,91,92]. The hybrid MPI and thread model, such as MPI and OpenMP, can take advantage of those optimized shared-memory algorithms and data structures.…”
Section: Chapter 4 Habanero-C Runtime Communication System
confidence: 99%
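A hedged sketch of the hybrid MPI-plus-OpenMP model this statement describes: MPI provides the distributed-memory level while OpenMP threads exploit shared memory within a node. The summation kernel is an illustrative placeholder, not drawn from the cited chapter.

/* Hedged sketch: hybrid MPI + OpenMP. Each rank computes a partial
 * sum with OpenMP threads, then the ranks combine results with MPI. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided;
    /* Request thread support so MPI and OpenMP coexist safely. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;              /* illustrative problem size */
    double local = 0.0;

    /* Shared-memory parallelism within the node. */
    #pragma omp parallel for reduction(+ : local)
    for (int i = 0; i < n; i++)
        local += 1.0 / (double)(rank * n + i + 1);

    /* Distributed-memory combination across nodes. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", global);

    MPI_Finalize();
    return 0;
}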