Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07)
DOI: 10.1145/1362622.1362692
Implementation and performance analysis of non-blocking collective operations for MPI

Abstract: Collective operations and non-blocking point-to-point operations have always been part of MPI. Although non-blocking collective operations are an obvious extension to MPI, there have been no comprehensive studies of this functionality. In this paper we present LibNBC, a portable high-performance library for implementing non-blocking collective MPI communication operations. LibNBC provides non-blocking versions of all MPI collective operations, is layered on top of MPI-1, and is portable to nearly all parallel …
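The abstract's key idea — initiate a collective, overlap independent computation, then complete the operation — can be sketched with the MPI-3 calls that later standardized this functionality. LibNBC itself exposes equivalent NBC_-prefixed routines; since their exact signatures are not quoted here, this sketch uses the standardized names rather than LibNBC's own API:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double data[1024] = {0};
    if (rank == 0)
        for (int i = 0; i < 1024; i++) data[i] = (double)i;

    /* Initiate the broadcast; unlike MPI_Bcast, this returns
       immediately with a request handle instead of blocking. */
    MPI_Request req;
    MPI_Ibcast(data, 1024, MPI_DOUBLE, 0, MPI_COMM_WORLD, &req);

    /* Computation that does not touch `data` can proceed while
       the collective progresses. */
    double local = 0.0;
    for (int i = 0; i < 1000000; i++) local += i * 1e-9;

    /* Complete the collective before reading `data`. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    if (rank == 0) printf("done: %f %f\n", data[1023], local);
    MPI_Finalize();
    return 0;
}
```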

Cited by 156 publications (146 citation statements)
References 29 publications
“…However, to retain the benefits of optimized collective operations, we would need a nonblocking version of those operations to enable overlap. Such nonblocking collective operations are proposed for MPI-3 [15] and a reference implementation exists with LibNBC [16]. We can easily apply nonblocking collective operations to the map as well as the reduce functionality.…”
Section: Further Optimization Possibilities
confidence: 99%
“…In order to optimize the parallel algorithm, we reduce the overhead arising from the allreduce step by overlapping its communication with computations that are independent of the communicated data. We use LibNBC's [6] non-blocking version of MPI_Allreduce called NBC_Iallreduce, and the MPI_Wait counterpart NBC_Wait.…”
Section: Algorithm Parallelization Concept
confidence: 99%
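The NBC_-prefixed calls quoted above predate the MPI-3 standardization of this interface, and their exact signatures are not given in the quote. The sketch below therefore shows the same overlap pattern with MPI-3's MPI_Iallreduce and MPI_Wait, which adopted LibNBC's semantics; the compute kernel is a hypothetical placeholder:

```c
#include <mpi.h>

/* Hypothetical stand-in for work that neither reads `global`
   nor writes `partial` while the reduction is in flight. */
static void independent_computation(void) {
    volatile double x = 0.0;
    for (int i = 0; i < 1000000; i++) x += 1e-9;
}

void overlapped_allreduce(double *partial, double *global, int n) {
    MPI_Request req;

    /* Start the reduction; equivalent in spirit to the quoted
       NBC_Iallreduce call. */
    MPI_Iallreduce(partial, global, n, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* Overlap: data-independent computation runs here. */
    independent_computation();

    /* Counterpart of the quoted NBC_Wait: complete the
       reduction before `global` is read. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```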
“…LibNBC's allreduce uses multiple communication rounds (cf. [6]). This requires the user to ensure progress manually by calling NBC_Test or by running a separate thread that manages LibNBC's progression (i.e., a progress thread).…”
Section: Implementation With LibNBC
confidence: 99%
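Manual progression means re-entering the library often enough to advance each communication round of the schedule. A minimal sketch of that pattern, using MPI-3's MPI_Test in place of the quoted NBC_Test; the slice-based loop and compute kernel are assumptions for illustration:

```c
#include <mpi.h>

/* Hypothetical unit of computation independent of the reduction. */
static void compute_one_slice(int i) {
    volatile double x = 0.0;
    for (int j = 0; j < 100000; j++) x += i * 1e-9;
}

void allreduce_with_manual_progress(double *partial, double *global,
                                    int n, int slices) {
    MPI_Request req;
    MPI_Iallreduce(partial, global, n, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    int done = 0;
    for (int i = 0; i < slices; i++) {
        compute_one_slice(i);
        /* Each test call gives the library a chance to advance
           the next communication round of the schedule. */
        if (!done) MPI_Test(&req, &done, MPI_STATUS_IGNORE);
    }
    if (!done) MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```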
“…This distinction between CPU overhead and network parameters enables researchers to model overlap of communication and computation efficiently. We use this ability to assess the overlap potential of different network interconnect architectures and to optimize the implementation of our non-blocking collective operations library LibNBC [12]. The models of the LogP family have been used by different research groups to derive new algorithms for parallel computing, predict the performance of existing algorithms, or prove an algorithm's optimality [13-18].…”
Section: Introduction
confidence: 99%
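For context, the split between CPU overhead and network parameters that this citation exploits is explicit in the LogGP member of the model family. In the standard LogGP formulation, transmitting a k-byte message costs:

```latex
% LogGP cost of a k-byte point-to-point message:
% sender overhead, per-byte gap, wire latency, receiver overhead.
T(k) = o_s + (k - 1)\,G + L + o_r
% Only o_s and o_r occupy the CPU; the L + (k-1)G portion is,
% in principle, available for overlap with computation.
```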