2006
DOI: 10.1007/11942634_17
A Case for Non-blocking Collective Operations

Abstract: Non-blocking collective operations for MPI have been in discussion for a long time. We want to contribute to this discussion, give a rationale for the usage of these operations, and assess their possible benefits. A LogGP model for the CPU overhead of collective algorithms and a benchmark to measure it are provided and show a large potential to overlap communication and computation. We show that non-blocking collective operations can provide at least the same benefits as non-blocking point-to-point…

Cited by 27 publications (14 citation statements)
References 24 publications
“…Non-blocking collective operations can move the pseudo-synchronization to the background and allow the user application to tolerate process skew to a certain extent. A detailed discussion of pseudo-synchronization and its effect on parallel program runs is given in [20,21].…”
Section: Non-blocking Collective Operations
confidence: 99%
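The skew tolerance described above can be sketched with the MPI-3 interface, which later standardized the operations this paper argues for. The snippet below starts a non-blocking barrier and keeps computing until all processes have arrived; `do_local_work` is a hypothetical application routine.

```c
/* Sketch: tolerating process skew with a non-blocking barrier.
 * Assumes the MPI-3 interface (MPI_Ibarrier / MPI_Test). */
#include <mpi.h>

void do_local_work(void);  /* hypothetical application routine */

void skew_tolerant_phase(MPI_Comm comm)
{
    MPI_Request req;
    int done = 0;

    MPI_Ibarrier(comm, &req);   /* pseudo-synchronization moves to the background */
    while (!done) {
        do_local_work();        /* useful computation instead of idling */
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);
    }
}
```

A late process simply performs less background work before the barrier completes, so moderate skew no longer translates directly into idle time.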
“…In this work an MPI_Alltoall collective operation is used to gather data from neighbor nodes instead of using the typical MPI_Send/MPI_Recv semantics. This collective operation is partially overlapped with the computation on locally available data by using a particular non-blocking version of the MPI_Alltoall collective operation [14]. In addition, the MPI_Allreduce collective operation in the CG solver could not be overlapped with computation due to data dependencies.…”
Section: Related Work
confidence: 99%
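The overlap scheme in the statement above can be illustrated with MPI-3's `MPI_Ialltoall`: the exchange is started, computation proceeds on data that needs no remote input, and only the boundary part waits for completion. The compute routines and buffer layout here are hypothetical placeholders, not the cited work's actual code.

```c
/* Sketch: overlapping an all-to-all exchange with computation on
 * locally available data. Assumes the MPI-3 interface. */
#include <mpi.h>

void compute_local_part(const double *local, int n);     /* needs no remote data */
void compute_boundary_part(const double *recvd, int n);  /* needs exchanged data */

void exchange_and_compute(const double *sendbuf, double *recvbuf,
                          int count, const double *local, int n,
                          MPI_Comm comm)
{
    MPI_Request req;

    MPI_Ialltoall(sendbuf, count, MPI_DOUBLE,
                  recvbuf, count, MPI_DOUBLE, comm, &req);
    compute_local_part(local, n);       /* overlapped with the exchange */
    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* remote data now available */
    compute_boundary_part(recvbuf, n);
}
```

The `MPI_Allreduce` in the CG solver admits no such split because every element of its result is needed before the dependent computation can start.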
al. proposed using host-based techniques for designing non-blocking collective operations [8]. However, host-based techniques offer limited performance portability and may not deliver complete overlap.…”
Section: Impact of System Noise on PCG Run-times
confidence: 99%
“…Simplistic designs of non-blocking collectives that require progressing the MPI library explicitly through CPU intervention, e.g. calling MPI_Test [8], offset much of the benefit of non-blocking communication. Similarly, if threads within the library are used for progression, application performance can be hurt by interrupt processing, thread scheduling, and other such factors [9].…”
Section: Introduction
confidence: 99%
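The explicit-progression pattern criticized in the statement above looks roughly like the following: the application must sprinkle `MPI_Test` calls through its compute loop so the library can advance the collective, and those calls (plus the code restructuring they force) are the overhead being objected to. `compute_chunk` and the chunking granularity are illustrative assumptions.

```c
/* Sketch: manual progression of a pending non-blocking collective
 * by periodic MPI_Test calls. Assumes the MPI-3 interface. */
#include <mpi.h>

void compute_chunk(int i);  /* hypothetical slice of the computation */

void progressed_compute(MPI_Request *req, int nchunks)
{
    int done = 0;

    for (int i = 0; i < nchunks; ++i) {
        compute_chunk(i);
        if (!done)  /* manual progression: steals cycles from computation */
            MPI_Test(req, &done, MPI_STATUS_IGNORE);
    }
    if (!done)
        MPI_Wait(req, MPI_STATUS_IGNORE);
}
```

Hardware- or thread-based progression removes these interleaved calls, but, as noted above, progression threads introduce their own costs via interrupt processing and scheduling.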