MPI Detach - Asynchronous Local Completion

Protze, Joachim; Hermanns, Marc-André; Demiralp, Ali Can; Müller, Matthias S.; Kuhlen, Torsten

doi:10.1145/3416315.3416323

Cited by 6 publications

(9 citation statements)

References 11 publications


“…MPI+OpenMP(tasks) model may lead to a loss of thread, when a thread executes blocking MPI code within an OpenMP task [11,12]. Many works addressed this issue [4,9,11,16,18,19]. Some approaches [4,11] consist of marking communication tasks from user codes and dedicating threads to communication or computation.…”

Section: Related Work (mentioning)

confidence: 99%

“…Schuchart et al [19] explored various implementations of it, and having implementations that effectively suspend, enables the expression of fine MPI data movement within OpenMP tasks. This resulted in a more efficient implementation of the blocked Cholesky factorization with fewer synchronizations and led to new approaches on MPI+OpenMP(tasks) interoperability, such as TAMPI [18] and MPI_Detach [16]. TAMPI was proposed as a user library to enable blocking-tasks pause and resume mechanism.…”

Section: Related Work (mentioning)

confidence: 99%

“…It transforms calls to MPI blocking operations to non-blocking ones through the PMPI interface and interoperates with the underlying tasking runtime, typically using the taskyield in OpenMP, or nanos6_block_current_task in Nanos6. The authors of [16] proposed another interoperability approach using the detach clause, which implies MPI specifications extensions to add asynchronous callbacks on communications completion, and also user code adaptations. Among all these works, our solution on the loss of threads issue differs from [4,11,16]: we aim at no user code modifications, and to progress both communications and computations by any thread opportunistically.…”

Section: Related Work (mentioning)

confidence: 99%

“…The authors of [16] proposed another interoperability approach using the detach clause, which implies MPI specifications extensions to add asynchronous callbacks on communications completion, and also user code adaptations. Among all these works, our solution on the loss of threads issue differs from [4,11,16]: we aim at no user code modifications, and to progress both communications and computations by any thread opportunistically. Our approach is more likely a mix of [9,16,18] with automation through runtime interoperations.…”

Section: Related Work (mentioning)

confidence: 99%

“…Deadlocks can be due to the loss of cores when threads execute blocking MPI calls within OpenMP tasks [11]. Several solutions address this issue [16,18,20] and enable working MPI+OpenMP(tasks) codes, but performance issues remain. Task scheduling in this hybrid context can significantly improve the overall performance.…”

Section: Introduction (mentioning)

confidence: 99%


Communication-Aware Task Scheduling Strategy in Hybrid MPI+OpenMP Applications

Pereira, Roussel, Carribault, et al. 2021

Lecture Notes in Computer Science


While task-based programming, such as OpenMP, is a promising way to exploit large HPC compute nodes, it has to be combined with data communications such as MPI. However, performance, and even thread progression, may depend on the underlying runtime implementations. In this paper, we focus on improving application performance when an OpenMP task blocks inside MPI communications. The technique requires no additional effort from application developers. It relies on an online task re-ordering strategy that aims to run first the tasks that send data to other processes. We evaluate our approach on a Cholesky factorization and show that we gain around 19% of execution time on a machine with Intel Skylake compute nodes, each node having two 24-core processors.




Enhancing MPI+OpenMP Task Based Applications for Heterogeneous Architectures with GPU Support

Ferat, Pereira, Roussel, et al. 2022

OpenMP in a Modern World: From Multi-Device Support to Meta Programming


Heterogeneous supercomputers are widespread among HPC systems, and programming efficient applications on these architectures is a challenge. Task-based programming models are a promising way to tackle this challenge. Since OpenMP 4.0 and 4.5, the target directives make it possible to offload pieces of code to GPUs and to express them as tasks with dependencies. Therefore, heterogeneous machines can be programmed using MPI+OpenMP(task+target) to exhibit a very high level of concurrent asynchronous operations, in which data transfers, kernel executions, communications and CPU computations can be overlapped. Hence, it is possible to suspend tasks performing these asynchronous operations on the CPUs and to overlap their completion with the execution of another task. Suspended tasks can resume opportunistically, at any scheduling point, once the associated asynchronous event has completed. We have integrated this feature into the MPC framework, validated it on an AXPY microbenchmark, and evaluated it on an MPI+OpenMP(tasks) implementation of the LULESH proxy application. The results show that we are able to improve asynchronism and overall HPC performance, allowing applications to benefit from asynchronous execution on heterogeneous machines.
