Proceedings of the Thirteenth EuroSys Conference 2018
DOI: 10.1145/3190508.3190516

Decoupling the control plane from program control flow for flexibility and performance in cloud computing

Abstract: Existing cloud computing control planes do not scale to more than a few hundred cores, while frameworks without control planes scale but take seconds to reschedule a job. We propose an asynchronous control plane for cloud computing systems, in which a central controller can dynamically reschedule jobs but worker nodes never block on communication with the controller. By decoupling control plane traffic from program control flow in this way, an asynchronous control plane can scale to run millions of computation…
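The abstract's central idea, that workers can be rescheduled by a central controller yet never block waiting on it, can be illustrated with a non-blocking message check. The following Python sketch is illustrative only and is not the paper's implementation; the `Worker` class, `control_q` channel, and `("migrate", task)` message format are all hypothetical names invented for this example.

```python
# Toy sketch of an asynchronous control plane: the worker polls a
# controller message queue without blocking, so control-plane traffic
# never stalls program control flow. (Hypothetical names throughout.)
import queue
import threading
import time

class Worker(threading.Thread):
    def __init__(self, wid, control_q):
        super().__init__()
        self.wid = wid
        self.control_q = control_q        # controller -> worker messages
        self.tasks = ["t0", "t1", "t2"]   # locally queued work
        self.done = []

    def run(self):
        while self.tasks:
            # Non-blocking check for rescheduling messages: if the
            # controller is silent, the worker keeps computing.
            try:
                msg = self.control_q.get_nowait()
                if msg[0] == "migrate":
                    self.tasks.remove(msg[1])  # task moved to another node
                    continue
            except queue.Empty:
                pass
            task = self.tasks.pop(0)
            time.sleep(0.01)              # simulate computation
            self.done.append(task)

control_q = queue.Queue()
w = Worker(0, control_q)
control_q.put(("migrate", "t1"))          # controller reschedules t1 elsewhere
w.start()
w.join()
print(w.done)                             # t1 never ran locally
```

The key property is that `get_nowait` returns immediately whether or not a control message is pending, which is what lets such a design scale: rescheduling latency depends only on how often workers poll, not on round trips that block computation.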


Cited by 12 publications (9 citation statements)
References 21 publications
“…Thus, unawareness of application-level QoS at runtime could lead to host resource imbalance or over-saturation (intrusive applications consume too many resources), making neighboring protected workloads experience performance outliers. 14,16 As shown in Figure 1A,B, we observed the JCT of the Spark Kmeans job co-located with stream 17 and the JCT of the MapReduce Terasort job co-located with fio. 18 As the concurrency of the co-located intrusive workloads increases, the JCT of the Spark and MapReduce jobs continues to grow, and the growth goes from initially flat to relatively sharp as the intrusive workloads steal more and more resources.…”
Section: Performance Interference (mentioning)
confidence: 72%
“…Existing efforts on performance interference migration of scale-out workloads focus on application-level scheduling or rescheduling. 6,10,14,16,43 Work in Reference 14 employs a white-box method to collect and analyze the footprints of the scale-out application at runtime to guide the placement of intrusive tasks to avoid interference, which means that a certain amount of resource utilization is sacrificed in exchange for QoS assurance. In response to the evicted scale-out task, multiple replicas were launched and placed to different hosts according to their load level in Reference 10.…”
Section: Migrating Interference Using Application-level Based Scheduling (mentioning)
confidence: 99%
“…We evaluate four things: (1) how much micro‐partitioning and randomized partition assignment improve the end‐to‐end performance of simulations, (2) how the number of partitions affects performance, (3) how Birdshot performs when using different numbers of nodes and (4) how well Birdshot performs compared with other load balancing algorithms. Birdshot scheduling uses a task‐based runtime implemented in C++ [QMSL18]. MPI implementations (Open MPI 1.6.5 [url17]) are used as a reference point without micro‐partitioning or randomized assignment.…”
Section: Discussion (mentioning)
confidence: 99%
“…General cloud schedulers A variety of task and cluster management systems include scheduling subsystems. Many architectures have been designed, from distributed [68,75,78,78,100] to centralized [48,97,98] techniques. Kubernetes can assign QoS to pods [19], but cannot provide function-level QoS.…”
Section: Related Work (mentioning)
confidence: 99%