Proceedings of the 2020 SIAM Conference on Parallel Processing for Scientific Computing
DOI: 10.1137/1.9781611976137.7
Two-level Dynamic Load Balancing for High Performance Scientific Applications

Abstract: Scientific applications are often complex, irregular, and computationally-intensive. To accommodate the ever-increasing computational demands of scientific applications, high performance computing (HPC) systems have become larger and more complex, offering parallelism at multiple levels (e.g., nodes, cores per node, threads per core). Scientific applications need to exploit all the available multilevel hardware parallelism to harness the available computational power. The performance of applications executing …

Cited by 9 publications (13 citation statements). References 27 publications (58 reference statements).
“…It has been recently shown that thread-level load imbalance has a significant impact on the performance of hybrid MPI+OpenMP applications [13]. OpenMP is the most widely-used standard for expressing and exploiting node-level parallelism.…”
Section: Introduction
confidence: 99%
“…This work builds on recent work on multilevel load balancing [13] by concentrating on thread-level scheduling and deepening the analysis of its performance impact for multithreaded applications executing on hierarchical parallel systems. Specifically, this work provides a broad range of dynamic loop self-scheduling (DLS) techniques, implemented in a unified OpenMP runtime library (RTL), called LB4OMP [17] that can readily be used for MPI+OpenMP applications.…”
Section: Introduction
confidence: 99%
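The dynamic loop self-scheduling (DLS) idea quoted above can be illustrated with a minimal, hypothetical simulation; this is not LB4OMP's implementation, just a sketch of why on-demand chunk assignment reduces the makespan of an irregular loop compared to static block scheduling. All names (`simulate`, `costs`) are illustrative.

```python
# Minimal sketch of dynamic loop self-scheduling (illustrative only):
# workers repeatedly grab the next chunk of loop iterations from a shared
# queue, so faster (or less loaded) workers automatically take more work.
import heapq

def simulate(costs, n_workers, chunk):
    """Simulated makespan when workers pull fixed-size chunks on demand."""
    clocks = [0.0] * n_workers      # per-worker finish times, kept as a min-heap
    heapq.heapify(clocks)
    i = 0
    while i < len(costs):
        t = heapq.heappop(clocks)            # earliest-free worker grabs next chunk
        work = sum(costs[i:i + chunk])       # cost of the chunk it just claimed
        heapq.heappush(clocks, t + work)
        i += chunk
    return max(clocks)

# Irregular workload: iteration i costs i units (triangular load imbalance).
costs = list(range(1, 101))
P = 4
print(simulate(costs, P, len(costs) // P))   # static blocks -> 2200.0 (last block dominates)
print(simulate(costs, P, 1))                 # pure self-scheduling -> close to 5050/4
```

With static blocks the worker holding iterations 76-100 finishes long after the others; with chunk size 1 each worker's finish time stays near the ideal average. Real DLS techniques (e.g., guided or factoring schedules) trade off this balance against the per-chunk scheduling overhead by shrinking chunk sizes over time.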
“…LB4MPI extends the LB tool [13] by including certain bug fixes and additional DLS techniques. LB4MPI has been used to enhance the performance of various scientific applications [31]. In this work, we extend the LB4MPI in two directions: (1) We enable the support of DCA.…”
Section: DCA Implementation Into LB4MPI
confidence: 99%
“…It has been recently shown that thread-level load imbalance has a significant impact on the performance of hybrid MPI+OpenMP applications [11]. OpenMP is the most widely-used standard for expressing and exploiting node-level parallelism.…”
Section: Introduction
confidence: 99%
“…This work builds on recent work on multilevel load balancing [11] by concentrating on thread-level scheduling and deepening the analysis of its performance impact for multithreaded applications executing on hierarchical parallel systems. Specifically, this work provides a broad range of dynamic loop self-scheduling (DLS) techniques, implemented in a unified OpenMP runtime library (RTL), called LB4OMP that can readily be used for MPI+OpenMP applications.…”
Section: Introduction
confidence: 99%