2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) 2018
DOI: 10.1109/cahpc.2018.8645953

A Batch Task Migration Approach for Decentralized Global Rescheduling

Abstract: Effectively mapping tasks of High Performance Computing (HPC) applications on parallel systems is crucial to assure substantial performance gains. As platforms and applications grow, load imbalance becomes a priority issue. Even though centralized rescheduling has been a viable solution to mitigate this problem, its efficiency is not able to keep up with the increasing size of shared memory platforms. To efficiently solve load imbalance today, and in the years to come, we should prioritize decentralized strate…

citations
Cited by 6 publications
(14 citation statements)
references
References 25 publications
“…The increasing performance demand of parallel applications in HPC environments creates a need for fast, reliable and efficient schedulers. Both our experiments and the literature [10,12,13,22] indicate that parallel and distributed load balancers are the best candidate to fulfill this role, especially when dynamic rescheduling is required. We believe that DGM will bring more benefits to communication-aware strategies in the future, helping to achieve Exascale performance in distributed memory environments.…”
Section: Results (mentioning)
confidence: 59%
“…Traditionally, these algorithms used a diffusive approach [11], which consists of iteratively sending work to neighbors that carry a lighter workload. This leads to refinement-based approaches, which attempt to send work units to underloaded resources using probabilistic distributions (e.g., Grapevine [12]) or accumulating work to mitigate communication increases after migration (e.g., PackDrop [13]).…”
Section: Efforts In Load Balancing (mentioning)
confidence: 99%
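
The diffusive approach described in the statement above can be illustrated with a minimal sketch. It is not the algorithm of any cited paper: the ring topology, the `alpha` transfer fraction, and the function names `diffuse_step`, `ring`, and `diffuse` are all assumptions chosen for illustration. Each processing element (PE) repeatedly pushes a fraction of its surplus to neighbors that carry a lighter load.

```python
def diffuse_step(loads, neighbors, alpha=0.25):
    """One synchronous round: every PE sends a fraction `alpha` of the
    load difference to each neighbor that is lighter than itself.
    (`alpha` and the synchronous update are illustrative assumptions.)"""
    new_loads = list(loads)
    for i, load in enumerate(loads):
        for j in neighbors[i]:
            if loads[j] < load:
                transfer = alpha * (load - loads[j])
                new_loads[i] -= transfer
                new_loads[j] += transfer
    return new_loads


def ring(n):
    """Hypothetical topology: PE i is connected to PEs i-1 and i+1."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def diffuse(loads, neighbors, rounds=50, alpha=0.25):
    """Repeat the diffusion round a fixed number of times."""
    for _ in range(rounds):
        loads = diffuse_step(loads, neighbors, alpha)
    return loads


initial = [40.0, 0.0, 0.0, 0.0]   # all work starts on PE 0
balanced = diffuse(initial, ring(4))
```

Only neighbor-local information is used in each round, which is what distinguishes this style from a centralized scheduler that gathers the load of every PE before deciding.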
“…We may divide LB algorithms in two main categories: Global and Diffusive [10]. While global algorithms centralize system information to make scheduling decisions (most list scheduling approaches follow this methodology [11], [12]); diffusive algorithms will use local machine information to dissipate load among its neighbors or other machines in the system [13], [14], [15].…”
Section: Background a Load Balancingmentioning
confidence: 99%
“…Our approach reduces scheduling overhead by applying the following design: (i) execute scheduling steps in a distributed fashion; (ii) perform workload discretization to simplify decisions; (iii) group local tasks to avoid numerous fine-grained migrations and (iv) minimize the messages between scheduling actors. The discretization technique, called packing, extends a previous technique [6] with notions related to ε-Nash Equilibrium. We also present PackStealLB, a new distributed load balancing algorithm that uses packing and inherits ideas related to constrained [7,8] and randomized [9] Work Stealing (WS) heuristics.…”
Section: Introduction (mentioning)
confidence: 99%
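
Design point (iii) in the statement above, grouping local tasks so that they migrate in batches, might look roughly like the sketch below. This is one possible reading, not the packing procedure of the cited work: the greedy lightest-first order, the `pack_size` threshold, the `local_load_target` parameter, and the name `make_packs` are all hypothetical.

```python
def make_packs(task_loads, local_load_target, pack_size):
    """Greedily group the lightest local tasks into packs.

    Tasks are taken lightest-first and added to the current pack until
    the pack reaches `pack_size`; packing stops once removing another
    task would drop the PE below `local_load_target`.
    Returns (packs of task ids, load left on the PE).
    All parameters are illustrative assumptions.
    """
    remaining = sum(task_loads)
    packs, current, current_load = [], [], 0.0
    for task_id, load in sorted(enumerate(task_loads), key=lambda t: t[1]):
        if remaining - load < local_load_target:
            break  # keep enough work locally
        current.append(task_id)
        current_load += load
        remaining -= load
        if current_load >= pack_size:
            packs.append(current)  # pack is full, start a new one
            current, current_load = [], 0.0
    if current:
        packs.append(current)  # leftover partial pack
    return packs, remaining


# An overloaded PE holding 20 units of work that should keep about 10.
packs, remaining = make_packs([1.0, 2.0, 3.0, 4.0, 5.0, 5.0],
                              local_load_target=10.0, pack_size=3.0)
```

In this example four tasks leave the PE as three packs, so the scheduler would negotiate three migrations instead of four; the saving grows as tasks become finer grained relative to the pack size.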