2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
DOI: 10.1109/ipdpsw.2019.00117

Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach

Abstract: Computationally-intensive loops are the primary source of parallelism in scientific applications. Such loops are often irregular, and a balanced execution of their loop iterations is critical for achieving high performance. However, several factors may lead to an imbalanced load execution, such as problem characteristics and algorithmic and systemic variations. Dynamic loop self-scheduling (DLS) techniques are devised to mitigate these factors and, consequently, improve application performance. On distributed-mem…
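
As a concrete illustration of the chunk-based execution model behind DLS, the C sketch below (our own illustrative code, not the paper's implementation; all names are placeholders) shows two classic chunk-size rules: pure self-scheduling (SS), which assigns one iteration per request, and guided self-scheduling (GSS), which assigns remaining/workers iterations so chunks shrink as the loop drains.

    /* Minimal sketch of dynamic chunk-size rules used by common DLS
     * techniques; illustrative only, not the paper's scheduler. */
    #include <stdio.h>

    /* Self-scheduling (SS): one iteration per request. */
    static long chunk_ss(long remaining, int num_workers) {
        (void)num_workers;
        return remaining > 0 ? 1 : 0;
    }

    /* Guided self-scheduling (GSS): remaining iterations divided by
     * the number of workers, so chunks shrink as the loop drains. */
    static long chunk_gss(long remaining, int num_workers) {
        long c = remaining / num_workers;
        return c > 0 ? c : (remaining > 0 ? 1 : 0);
    }

    int main(void) {
        long remaining = 1000;   /* loop iterations left */
        int  workers   = 4;      /* processing elements  */
        while (remaining > 0) {
            long c = chunk_gss(remaining, workers);
            printf("assign chunk of %ld iterations\n", c);
            remaining -= c;
        }
        return 0;
    }

Adaptive rules such as GSS trade scheduling overhead (fewer requests than SS) against load balance (smaller chunks near the end absorb iteration-cost irregularity).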

Cited by 4 publications (3 citation statements, all mentioning). References 35 publications. Citing publications from 2019 and 2023.

“…(1) Separation between concepts and implementations: the DCA [11] and its hierarchical version [12] were motivated by the new advancements in the MPI 3.1 standard, namely MPI one-sided communication and MPI shared-memory. The following question arises: Is DCA limited to specific MPI features?…”
Section: Introduction (mentioning; confidence: 99%)
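
For readers unfamiliar with the MPI 3.1 features this statement refers to, the following C sketch shows how a decentralized chunk counter can be built on MPI one-sided communication (an atomic MPI_Fetch_and_op on a window hosted by rank 0). It is an illustration of the mechanism only, not the DCA implementation from [11] or [12]; the problem size and chunk size are placeholder constants.

    /* Sketch: decentralized chunk assignment via a shared iteration
     * counter accessed with MPI one-sided atomics. Illustrative of the
     * MPI-3 features referenced above, not the actual DCA code. */
    #include <mpi.h>
    #include <stdio.h>

    #define N_ITERS 1000
    #define CHUNK   50   /* fixed chunk size, for simplicity */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        long *counter = NULL;
        MPI_Win win;
        /* Rank 0 exposes the global iteration counter in a window. */
        MPI_Win_allocate(rank == 0 ? sizeof(long) : 0, sizeof(long),
                         MPI_INFO_NULL, MPI_COMM_WORLD, &counter, &win);
        if (rank == 0) *counter = 0;      /* initialize before any access */
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Win_lock_all(0, win);         /* passive-target epoch */

        const long inc = CHUNK;
        long start;
        for (;;) {
            /* Atomically fetch the next chunk's start index from rank 0. */
            MPI_Fetch_and_op(&inc, &start, MPI_LONG, 0, 0, MPI_SUM, win);
            MPI_Win_flush(0, win);
            if (start >= N_ITERS) break;
            long end = start + CHUNK < N_ITERS ? start + CHUNK : N_ITERS;
            printf("rank %d executes iterations [%ld, %ld)\n",
                   rank, start, end);
            /* ... execute loop iterations start..end-1 here ... */
        }

        MPI_Win_unlock_all(win);
        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }

Because every rank grabs chunks directly with an atomic, no rank is dedicated to serving requests; this is the property that depends on one-sided (and, hierarchically, shared-memory) support in the MPI runtime.
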
“…We highlight specific requirements that a DLS technique needs to fulfill to separate chunk calculation that can be distributed across all PEs and the chunk assignment that should be synchronized across all PEs. In contrast to earlier efforts [11,12], we introduce and evaluate a two-sided MPI-based implementation of DCA. This implementation applies to all existing MPI runtime libraries because they fully support two-sided MPI communication.…”
Section: Introduction (mentioning; confidence: 99%)
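
As a contrast to the one-sided sketch above, the following C sketch shows the classic two-sided pattern this statement alludes to: a coordinator rank serves chunk requests over plain MPI_Send/MPI_Recv, so only baseline two-sided support is required of the MPI library. It is our illustrative sketch under the same placeholder constants, not the cited implementation.

    /* Sketch: chunk assignment synchronized via two-sided MPI; a
     * coordinator rank answers chunk requests. Illustrates the
     * portability argument above, not the authors' code. */
    #include <mpi.h>
    #include <stdio.h>

    #define N_ITERS 1000
    #define CHUNK   50
    #define TAG_REQ 1
    #define TAG_ANS 2

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* Coordinator: hand out [start, start+CHUNK) ranges until the
             * loop drains, then answer -1 to each remaining request. */
            long next = 0;
            int done = 0;
            while (done < size - 1) {
                long dummy, start;
                MPI_Status st;
                MPI_Recv(&dummy, 1, MPI_LONG, MPI_ANY_SOURCE, TAG_REQ,
                         MPI_COMM_WORLD, &st);
                if (next < N_ITERS) { start = next; next += CHUNK; }
                else                { start = -1;   done++;        }
                MPI_Send(&start, 1, MPI_LONG, st.MPI_SOURCE, TAG_ANS,
                         MPI_COMM_WORLD);
            }
        } else {
            /* Worker: request chunks until told there is no work left. */
            for (;;) {
                long dummy = 0, start;
                MPI_Send(&dummy, 1, MPI_LONG, 0, TAG_REQ, MPI_COMM_WORLD);
                MPI_Recv(&start, 1, MPI_LONG, 0, TAG_ANS, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                if (start < 0) break;
                long end = start + CHUNK < N_ITERS ? start + CHUNK : N_ITERS;
                printf("rank %d executes iterations [%ld, %ld)\n",
                       rank, start, end);
                /* ... execute loop iterations start..end-1 here ... */
            }
        }

        MPI_Finalize();
        return 0;
    }

The chunk calculation (here a fixed CHUNK; in general any DLS rule) is independent of the transport, while the chunk assignment is the part that must be synchronized; the two-sided variant buys portability at the cost of dedicating a rank to coordination.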