Proceedings of the 2020 SIAM Conference on Parallel Processing for Scientific Computing
DOI: 10.1137/1.9781611976137.7
Two-level Dynamic Load Balancing for High Performance Scientific Applications

Abstract: Scientific applications are often complex, irregular, and computationally-intensive. To accommodate the ever-increasing computational demands of scientific applications, high performance computing (HPC) systems have become larger and more complex, offering parallelism at multiple levels (e.g., nodes, cores per node, threads per core). Scientific applications need to exploit all the available multilevel hardware parallelism to harness the available computational power. The performance of applications executing …

Cited by 9 publications (13 citation statements). References 27 publications (58 reference statements).
“…It has been recently shown that thread-level load imbalance has a significant impact on the performance of hybrid MPI+OpenMP applications [13]. OpenMP is the most widely-used standard for expressing and exploiting node-level parallelism.…”
Section: Introduction
confidence: 99%
“…This work builds on recent work on multilevel load balancing [13] by concentrating on thread-level scheduling and deepening the analysis of its performance impact for multithreaded applications executing on hierarchical parallel systems. Specifically, this work provides a broad range of dynamic loop self-scheduling (DLS) techniques, implemented in a unified OpenMP runtime library (RTL), called LB4OMP [17] that can readily be used for MPI+OpenMP applications.…”
Section: Introduction
confidence: 99%
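The dynamic loop self-scheduling (DLS) idea quoted above can be illustrated with a minimal, hypothetical simulation; this is not LB4OMP's implementation, just a sketch of why on-demand chunk assignment reduces the makespan of an irregular loop compared to static block scheduling. All names (`simulate`, `costs`) are illustrative.

```python
# Minimal sketch of dynamic loop self-scheduling (illustrative only):
# workers repeatedly grab the next chunk of loop iterations from a shared
# queue, so faster (or less loaded) workers automatically take more work.
import heapq

def simulate(costs, n_workers, chunk):
    """Simulated makespan when workers pull fixed-size chunks on demand."""
    clocks = [0.0] * n_workers      # per-worker finish times, kept as a min-heap
    heapq.heapify(clocks)
    i = 0
    while i < len(costs):
        t = heapq.heappop(clocks)            # earliest-free worker grabs next chunk
        work = sum(costs[i:i + chunk])       # cost of the chunk it just claimed
        heapq.heappush(clocks, t + work)
        i += chunk
    return max(clocks)

# Irregular workload: iteration i costs i units (triangular load imbalance).
costs = list(range(1, 101))
P = 4
print(simulate(costs, P, len(costs) // P))   # static blocks -> 2200.0 (last block dominates)
print(simulate(costs, P, 1))                 # pure self-scheduling -> close to 5050/4
```

With static blocks the worker holding iterations 76-100 finishes long after the others; with chunk size 1 each worker's finish time stays near the ideal average. Real DLS techniques (e.g., guided or factoring schedules) trade off this balance against the per-chunk scheduling overhead by shrinking chunk sizes over time.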
“…LB4MPI extends the LB tool [13] by including certain bug fixes and additional DLS techniques. LB4MPI has been used to enhance the performance of various scientific applications [31]. In this work, we extend the LB4MPI in two directions: (1) We enable the support of DCA.…”
Section: DCA Implementation Into LB4MPI
confidence: 99%
“…It has been recently shown that thread-level load imbalance has a significant impact on the performance of hybrid MPI+OpenMP applications [11]. OpenMP is the most widely-used standard for expressing and exploiting node-level parallelism.…”
Section: Introduction
confidence: 99%
“…This work builds on recent work on multilevel load balancing [11] by concentrating on thread-level scheduling and deepening the analysis of its performance impact for multithreaded applications executing on hierarchical parallel systems. Specifically, this work provides a broad range of dynamic loop self-scheduling (DLS) techniques, implemented in a unified OpenMP runtime library (RTL), called LB4OMP that can readily be used for MPI+OpenMP applications.…”
Section: Introduction
confidence: 99%