27th European MPI Users' Group Meeting 2020
DOI: 10.1145/3416315.3416323

MPI Detach - Asynchronous Local Completion

Abstract: When aiming for large scale parallel computing, waiting time due to network latency, synchronization, and load imbalance are the primary opponents of high parallel efficiency. A common approach to hide latency with computation is the use of non-blocking communication. In the presence of a consistent load imbalance, synchronization cost is just the visible symptom of the load imbalance. Tasking approaches as in OpenMP, TBB, OmpSs, or C++20 coroutines promise to expose a higher degree of concurrency, which can b…

Cited by 6 publications (9 citation statements)
References 11 publications
“…MPI+OpenMP(tasks) model may lead to a loss of thread, when a thread executes blocking MPI code within an OpenMP task [11,12]. Many works addressed this issue [4,9,11,16,18,19]. Some approaches [4,11] consist of marking communication tasks from user codes and dedicating threads to communication or computation.…”
Section: Related Work (mentioning)
confidence: 99%
“…Schuchart et al [19] explored various implementations of it, and having implementations that effectively suspend, enables the expression of fine MPI data movement within OpenMP tasks. This resulted in a more efficient implementation of the blocked Cholesky factorization with fewer synchronizations and led to new approaches on MPI+OpenMP(tasks) interoperability, such as TAMPI [18] and MPI_Detach [16]. TAMPI was proposed as a user library to enable blocking-tasks pause and resume mechanism.…”
Section: Related Work (mentioning)
confidence: 99%