2006
DOI: 10.1007/11846802_52

Optimizing a Conjugate Gradient Solver with Non-Blocking Collective Operations

Abstract: This paper presents a case study that analyzes the suitability and usage of non-blocking collective operations in parallel applications. As with their point-to-point counterparts, non-blocking collective operations provide the ability to overlap communication with computation and to avoid unnecessary synchronization. These operations are provided for MPI programs with LibNBC, a portable low-overhead implementation of non-blocking collective operations built on MPI-1. The straightforward applicability of the Lib…

Citations: cited by 12 publications (10 citation statements)
References: 14 publications

“…The plain MPI implementation is extracted from a state of the art work on optimizing CG solvers with non‐blocking collective operations. () This latter implementation contains roughly 900 lines of code. The same output has been reproduced using ExaShark with 150 lines of C++ code.…”
Section: Applications (mentioning)
confidence: 99%
“…The reference implementation is an open-source code 2 that uses MPI blocking and non-blocking all-to-all collective operations to implement the halo exchange [17]. [17] demonstrates the effectiveness of using non-blocking collective operations to overlap the communication and computation in the halo exchange. We decouple the halo exchange operation onto a separate group of processes, denoted as group G_1.…”
Section: Conjugate Gradient Solver (mentioning)
confidence: 99%
“…However, our previous works involving overlap, such as optimization of a Poisson solver [5] or the optimization of a Fast Fourier Transformation [6] showed that this simple heuristic is not sufficient to achieve good overlap. The two main reasons for this have been found in a theoretical and practical analysis of nonblocking collective operations [7].…”
Section: Manual Transformation Technique (mentioning)
confidence: 99%
“…Code must often be significantly restructured to take full advantage of non-blocking collective operations. We learned in several application studies [5,6] that using non-blocking collectives can lead to performance benefits of up to 35% by overlapping computation and communication. We also showed that their usage is likely to be labor-intensive and error-prone, and may decrease code readability as well.…”
Section: Introduction (mentioning)
confidence: 99%