“…With empty tasks [28], the resulting upper bound on task scheduling throughput fails to represent useful work within a realistic application. With non-empty tasks, since the efficiency of the overall application is typically not reported [3,6], TPS is not a measurement of runtime-limited performance. Large tasks may be used to hide any amount of runtime overhead, while small tasks may result in a drop in total application throughput even as TPS increases.…”
Section: METG
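To make the critique concrete, consider a toy single-worker model (an illustration of the argument above, not code from the paper): with a fixed per-task runtime overhead o and useful work g per task, TPS is 1/(g + o) while efficiency is g/(g + o), so shrinking tasks drives TPS up toward its ceiling of 1/o even as useful throughput collapses.

# Toy model in Python; `o` (per-task overhead) is an assumed value.
def tasks_per_second(g, o):
    return 1.0 / (g + o)

def efficiency(g, o):
    return g / (g + o)  # fraction of time spent on useful work

o = 10e-6  # assume 10 us of scheduling overhead per task
for g in (1e-3, 1e-4, 1e-6):  # task granularities in seconds
    print(f"g={g*1e6:7.1f} us  TPS={tasks_per_second(g, o):9.0f}  "
          f"efficiency={efficiency(g, o):6.1%}")
# TPS rises monotonically as tasks shrink (990 -> 9091 -> 90909 per second)
# while efficiency falls (99.0% -> 90.9% -> 9.1%), which is why TPS alone
# says nothing about runtime-limited performance.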
“…Limit studies of task scheduling throughput in various runtime systems often make additional assumptions. A popular assumption is the use of trivially parallel tasks [3,6], which as shown in Section 5.5 underestimates (often substantially) the cost of scheduling a task and can also impact scalability.…”
Section: Related Work
“…Intuitively, for a given workload, METG(50%) is the smallest task granularity that maintains at least 50% efficiency, meaning that the application achieves at least 50% of the highest performance (in FLOP/s, B/s, or other application-specific measure) achieved on a given machine. The efficiency bound in METG is a key innovation over previous approaches, such as tasks per second (TPS), that fail to consider the amount of useful work performed (if tasks are non-empty [3,6]) or to perform useful work at all (if tasks are empty [28]).…”
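A minimal sketch of how METG(50%) could be measured, assuming a run_benchmark(g) callable that reports achieved application performance (FLOP/s, B/s, or similar) at task granularity g; the names here are illustrative, not Task Bench's actual API:

def metg(granularities, run_benchmark, threshold=0.5):
    # granularities: candidate task granularities in seconds
    # run_benchmark(g): achieved performance at granularity g
    perf = {g: run_benchmark(g) for g in granularities}
    peak = max(perf.values())  # best performance observed on this machine
    efficient = [g for g in granularities if perf[g] >= threshold * peak]
    return min(efficient)  # smallest granularity keeping >= 50% efficiency

# Toy usage with the analytic overhead model from above (o = 10 us):
o = 10e-6
print(metg([1e-3, 1e-4, 1e-5, 1e-6], lambda g: g / (g + o)))  # -> 1e-05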
We present Task Bench, a parameterized benchmark designed to explore the performance of parallel and distributed programming systems under a variety of application scenarios. Task Bench lowers the barrier to benchmarking multiple programming systems by making the implementation for a given system orthogonal to the benchmarks themselves: every benchmark constructed with Task Bench runs on every Task Bench implementation. Furthermore, Task Bench's parameterization enables a wide variety of benchmark scenarios that distill the key characteristics of larger applications. We conduct a comprehensive study with implementations of Task Bench in 15 programming systems on up to 256 Haswell nodes of the Cori supercomputer. We introduce a novel metric, minimum effective task granularity (METG), to study the baseline runtime overhead of each system. We show that when running at scale, 100 µs is the smallest granularity that even the most efficient systems can reliably support with current technologies. We also study each system's scalability and its ability to hide communication and to mitigate load imbalance.
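The orthogonality the abstract describes can be sketched as follows: a benchmark is just a parameterized task graph, and each programming system supplies one driver that can execute any such graph, so supporting N systems and M benchmarks takes N + M implementations rather than N x M. The TaskGraph class below is a hypothetical illustration, not Task Bench's actual interface:

from dataclasses import dataclass

@dataclass
class TaskGraph:
    steps: int        # number of timesteps in the graph
    width: int        # tasks per timestep
    pattern: str      # dependence pattern, e.g. "trivial" or "stencil"
    kernel_us: float  # useful work per task, in microseconds

    def dependencies(self, index):
        # Which tasks in the previous timestep a task at `index` depends on.
        if self.pattern == "trivial":
            return [index]
        if self.pattern == "stencil":
            return [i for i in (index - 1, index, index + 1)
                    if 0 <= i < self.width]
        raise ValueError(f"unknown pattern: {self.pattern}")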
“…Most cluster computing frameworks, such as Spark [64], CIEL [40], and Dryad [28] implement a centralized scheduler, which can provide locality but at latencies in the tens of ms. Distributed schedulers such as work stealing [12], Sparrow [45] and Canary [47] can achieve high scale, but they either don't consider data locality [12], or assume tasks belong to independent jobs [45], or assume the computation graph is known [47].…”
“…Canary [47] achieves impressive performance by having each scheduler instance handle a portion of the task graph, but does not handle dynamic computation graphs.…”
The next generation of AI applications will continuously interact with the environment and learn from these interactions. These applications impose new and demanding systems requirements, both in terms of performance and flexibility. In this paper, we consider these requirements and present Ray, a distributed system designed to address them. Ray implements a unified interface that can express both task-parallel and actor-based computations, supported by a single dynamic execution engine. To meet the performance requirements, Ray employs a distributed scheduler and a distributed and fault-tolerant store to manage the system's control state. In our experiments, we demonstrate scaling beyond 1.8 million tasks per second and better performance than existing specialized systems for several challenging reinforcement learning applications.
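A minimal sketch of the unified interface described above, using Ray's public Python API (ray.init, @ray.remote, .remote(), ray.get); the function and class names are illustrative:

import ray

ray.init()

@ray.remote              # task-parallel: a stateless remote function
def square(x):
    return x * x

@ray.remote              # actor-based: a stateful remote class
class Counter:
    def __init__(self):
        self.value = 0
    def increment(self):
        self.value += 1
        return self.value

futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]

counter = Counter.remote()
print(ray.get(counter.increment.remote()))  # 1

Both remote function calls and actor method calls return futures that are resolved with ray.get, so the two models compose within a single dynamic execution engine.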