Proceedings of the Twentieth Annual Symposium on Parallelism in Algorithms and Architectures (SPAA 2008)
DOI: 10.1145/1378533.1378554

Leveraging non-blocking collective communication in high-performance applications

Abstract: Although overlapping communication with computation is an important mechanism for achieving high performance in parallel programs, developing applications that actually achieve good overlap can be difficult. Existing approaches are typically based on manual or compiler-based transformations. This paper presents a pattern and library-based approach to optimizing collective communication in parallel high-performance applications, based on using non-blocking collective operations to enable overlapping of communic…
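The overlap pattern the abstract describes can be sketched in a few lines: start a collective, do computation that does not depend on its result, then complete the collective. The sketch below is a toy, in-process emulation under stated assumptions. It is not MPI and not the paper's library. `ToyComm`, `iallreduce_sum`, `rank_main`, and `run` are hypothetical names, simulated ranks are threads, and the returned `Future` stands in for a request handle (`result()` plays the role of a wait, `done()` of a test).

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class ToyComm:
    """Hypothetical in-process stand-in for a communicator of `size` ranks.

    Not MPI and not the paper's library: it only mimics the start/wait
    shape of a non-blocking sum-allreduce, with ranks simulated by threads.
    Each rank may have at most one collective outstanding at a time.
    """

    def __init__(self, size: int) -> None:
        self.size = size
        self._lock = threading.Lock()
        self._contrib = [0.0] * size
        self._barrier = threading.Barrier(size)
        # One worker per rank so every pending collective can reach the barrier.
        self._pool = ThreadPoolExecutor(max_workers=size)

    def _allreduce_sum(self, rank: int, value: float) -> float:
        with self._lock:
            self._contrib[rank] = value
        self._barrier.wait()   # all contributions have been posted
        total = sum(self._contrib)
        self._barrier.wait()   # all ranks have read before the buffer is reused
        return total

    def iallreduce_sum(self, rank: int, value: float) -> Future:
        """Start the collective and return immediately with a request handle."""
        return self._pool.submit(self._allreduce_sum, rank, value)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def rank_main(comm: ToyComm, rank: int, results: list) -> None:
    local = float(rank + 1)
    req = comm.iallreduce_sum(rank, local)            # 1. start the collective
    independent = sum(i * i for i in range(10_000))   # 2. work not needing the result
    total = req.result()                              # 3. complete it (like a wait)
    results[rank] = (total, independent)


def run(size: int = 4) -> list:
    comm = ToyComm(size)
    results = [None] * size
    threads = [threading.Thread(target=rank_main, args=(comm, r, results))
               for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    comm.close()
    return results


if __name__ == "__main__":
    for rank, (total, _) in enumerate(run(4)):
        print(f"rank {rank}: allreduce sum = {total}")
```

Because of CPython's global interpreter lock, this toy does not demonstrate a real speedup; it only shows the start, compute, wait structure. In an MPI-3 program the same shape would use a non-blocking collective such as `MPI_Iallreduce` followed by `MPI_Wait`.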

Cited by 20 publications (14 citation statements). References: 10 publications.

“…This overlap potential of computation and communication has been analyzed by Hoefler et al and Hoefler and Lumsdaine [13,14]. However, nonblocking collective operations also provide semantic advantages that allow separate start and completion of a globally synchronizing operation.…”
Section: Algorithm NBX: Nonblocking Consensus (mentioning; confidence: 99%)

“…Although the communication volumes of IS and FT are not as high as CG's, their bulky communication behavior based mostly on all-to-all type operations causes problems for low-bandwidth environments. Recent work by Hoefler et al [21] on non-blocking collective communication operations demonstrated benefits for Fast-Fourier-Transform operations, such as those used in FT, by allowing collective communication to overlap with computation. A version of FT using nonblocking collective could therefore extend the range of execution environments that lead to reasonable performance.…”
Section: Discussion on Suitable Applications for Parallel Volunteer C… (mentioning; confidence: 99%)

“…hide latency costs, properly applying them to existing real-world applications is non-trivial. Their use often requires significant restructuring to exploit communication/computation overlap fully [20]. This requirement confronts the programmer with yet more complexity in the optimization process.…”
Section: Automatically Updating to MPI 3.0 (mentioning; confidence: 99%)