2012
DOI: 10.1007/978-3-642-29740-3_36

Impact of Over-Decomposition on Coordinated Checkpoint/Rollback Protocol

Abstract: Failure-free execution will become rare on future exascale computers. Thus, fault tolerance is now an active field of research. In this paper, we study the impact of decomposing an application into much more parallelism than the physical parallelism on the rollback step of fault-tolerant coordinated protocols. This over-decomposition gives the runtime a better opportunity to balance the workload after a failure, without the need for spare nodes, while preserving performance. We show that the overhead on normal execu…
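The mechanism the abstract describes, over-decomposition, splits the application into many more tasks than there are physical nodes, so that after a failure the runtime can redistribute the lost node's tasks across the survivors instead of requiring a spare. Below is a minimal sketch of that redistribution idea, not the paper's actual protocol: the task count, node names, and round-robin assign() helper are all hypothetical.

    # Minimal sketch of over-decomposition for post-failure load balancing.
    # Hypothetical example, not the paper's protocol: task counts, node names,
    # and the round-robin assignment are illustrative assumptions.

    def assign(tasks, nodes):
        """Round-robin the task list over the currently alive nodes."""
        placement = {n: [] for n in nodes}
        for i, t in enumerate(tasks):
            placement[nodes[i % len(nodes)]].append(t)
        return placement

    nodes = [f"node{i}" for i in range(4)]   # 4 physical nodes
    tasks = list(range(32))                  # 32 tasks: 8x over-decomposition

    before = assign(tasks, nodes)            # normal execution: 8 tasks per node

    # One node fails; after rollback to the coordinated checkpoint, the same
    # 32 tasks are spread over the 3 survivors -- no spare node is needed.
    survivors = [n for n in nodes if n != "node2"]
    after = assign(tasks, survivors)

    print({n: len(ts) for n, ts in before.items()})  # 8 tasks on each of the 4 nodes
    print({n: len(ts) for n, ts in after.items()})   # 10-11 tasks on each survivor

With exactly one task per node (no over-decomposition), the failed node's work could not be spread out this way; a spare node would be required to restore the original balance.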

Cited by 5 publications (3 citation statements) | References 19 publications

Citation statements (ordered by relevance):
“…As systems are approaching billion‐way concurrency at exascale, we argue that data‐driven programming models will likely employ over‐decomposition to generate more fine‐grained tasks than available parallelism. While over‐decomposition has the ability to improve utilization and fault tolerance at extreme scales, it poses severe challenges on the scheduling system to make fast scheduling decisions (e.g., millions/s) and to remain available, in order to achieve the best performance. These requirements are far beyond the capability of today's centralized batch scheduling systems.…”
Section: Introduction (mentioning)
Confidence: 99%
“…As systems are growing exponentially in parallelism, approaching billion-way concurrency at exascale [2], we argue that future programming models will likely employ over-decomposition, generating many more fine-grained tasks than available parallelism. While over-decomposition has been shown to improve utilization at extreme scales as well as to make fault tolerance more efficient [3] [4], it poses significant challenges on the task scheduling system to make extremely fast scheduling decisions (e.g., millions/sec), in order to achieve the highest throughput and utilization.…”
Section: Introduction (mentioning)
Confidence: 99%
“…This time could be shortened if one could store and resubmit the task graph from one timestep to another, such as in [2].…”
(mentioning)
Confidence: 99%
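The shortcut this last statement points to, storing the task graph and resubmitting it at each timestep rather than rebuilding it, can be sketched as follows. This is a hedged illustration only; build_task_graph() and submit() are hypothetical placeholders, not an API from [2] or from the cited paper, and it assumes the graph structure does not change between timesteps.

    # Hypothetical sketch: reuse a stored task graph across timesteps instead of
    # reconstructing it each iteration. All names here are illustrative.

    def build_task_graph(num_tasks=1000):
        # Stand-in for an expensive graph-construction phase.
        return [("task", i) for i in range(num_tasks)]

    def submit(graph, step):
        # Stand-in for handing the graph to the runtime scheduler.
        return len(graph)

    cached_graph = None
    for step in range(10):
        if cached_graph is None:
            cached_graph = build_task_graph()   # built once, on the first timestep
        submit(cached_graph, step)              # resubmitted unchanged thereafter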