Decentralized task-aware scheduling for data center networks

Dogar, Fahad R.; Karagiannis, Thomas; Ballani, Hitesh; Rowstron, Antony

doi:10.1145/2740070.2626322

Cited by 107 publications

(84 citation statements)

References 21 publications

(38 reference statements)


“…EyeQ [12] and Gatekeeper [18], in turn, can offer bandwidth guarantees only when the core of the network is congestion-free. Baraat [1] and Varys [10] achieve high network utilization, but cannot provide strict bandwidth guarantees for tenants. Finally, ElasticSwitch [8] and the Logistic Model [3] are orthogonal to our approach, as they assume there exists an allocation method in the cloud platform (i.e., applications are already allocated).…”

Section: Related Work (mentioning)

confidence: 99%

IoNCloud: Exploring application affinity to improve utilization and predictability in datacenters

Marcon, Neves, Oliveira, et al. 2015

2015 IEEE International Conference on Communications (ICC)


The intra-cloud network is typically shared in a best-effort manner, which causes tenant applications to have no actual bandwidth guarantees. Recent proposals address this issue either by statically reserving a slice of the physical infrastructure for each application or by providing proportional sharing among flows. The former approach results in overprovisioned network resources, while the latter requires substantial management overhead. In this paper, we introduce a resource allocation strategy that aims at providing an efficient way to predictably share bandwidth among applications and at minimizing resource underutilization while maintaining low management overhead. To demonstrate the benefits of the strategy, we develop IoNCloud, a system that implements the proposed allocation scheme. IoNCloud employs the abstraction of attraction/repulsion among applications according to their temporal bandwidth demands in order to group them in virtual networks. In doing so, we explore the trade-off between high resource utilization (which is desired by providers to achieve economies of scale) and strict network guarantees (necessary for tenants to run jobs predictably). Evaluation results show that IoNCloud can (a) provide predictable network sharing; and (b) reduce allocated bandwidth, resource underutilization and management overhead when compared against state-of-the-art proposals.

I. INTRODUCTION

Cloud providers lack practical, efficient and reliable mechanisms to offer bandwidth guarantees for applications [1], [2]. The intra-cloud network is typically oversubscribed and shared in a best-effort manner, relying on TCP to achieve high network utilization and scalability. TCP, nonetheless, does not provide robust isolation among flows in the network [3]; in fact, long-lived flows with a large number of packets are privileged over small ones (which is typically called performance interference [4]) [5]. Moreover, recent studies [6], [7] show that bandwidth available for virtual machines (VMs) in the intra-cloud network can vary by a factor of five or more, resulting in poor and unpredictable overall application performance.

The lack of network guarantees directly impacts both tenants and providers. Tenants are unable to enforce the allocation of network resources for their requests (which particularly hinders applications with strict bandwidth requirements) and can only deploy some specific enterprise applications in the cloud [8]. Moreover, costs are unpredictable due to high network variability (in many services, the subsequent computation depends on the data received from the network [9], [10]). Providers, in turn, may lose revenue, because performance interference ends up reducing datacenter throughput [1], [6].

Recent proposals [3], [6], [8], [11], [12] address this issue either by offering minimum guarantees or by providing proportional sharing. The former explicitly reserves a slice of the physical infrastructure for each application, which results in overprovisioned resources for tenants (since the temporal network...


“…In these frameworks, data-intensive jobs are divided into multiple successive data-parallel computation stages; and a succeeding computation stage cannot start until getting all its required inputs, which is exactly the outputs of the previous stage. Furthermore, the transmission of the intermediate data is not a negligible phase in a job [1]-[3]. For example, some real traces from Facebook show that, the data transferring phase between successive stages accounts for 33% of the running times of jobs in the system [1].…”

Section: Introduction (mentioning)

confidence: 99%

“…For example, some real traces from Facebook show that, the data transferring phase between successive stages accounts for 33% of the running times of jobs in the system [1]. Accordingly, speed up the data transfer between computation stages will accelerate the job completion and increase the data center utilization [1]-[3].…”

Section: Introduction (mentioning)

confidence: 99%

“…Many existing works [1]-[3] focus on minimizing average CCT in DCNs. To the best of our knowledge, Varys [2] and Baraat [3] are the state-of-the-art schemes in centralized and decentralized manner, respectively.…”

Section: Introduction (mentioning)

confidence: 99%

“…To the best of our knowledge, Varys [2] and Baraat [3] are the state-of-the-art schemes in centralized and decentralized manner, respectively. However, centralized schemes like Varys have the scalability problem.…”

Section: Introduction (mentioning)

confidence: 99%


Minimizing average coflow completion time with decentralized scheduling

Luo, Zhao, et al. 2015

2015 IEEE International Conference on Communications (ICC)


In current data centers, an application (e.g., MapReduce) usually generates a collection of parallel flows sharing a common goal. These flows compose a coflow, and only the completion of all of them is meaningful. Accordingly, minimizing the average coflow completion time (CCT) becomes a critical objective for flow scheduling. For this problem, the state-of-the-art centralized method, Varys, achieves a good average CCT, but it has a scalability problem. Alternatively, the only existing decentralized method, Baraat, suffers from the head-of-line blocking problem. To solve these problems, we propose D-CAS, a preemptive, decentralized, coflow-aware scheduling system. D-CAS pursues the coflow-level minimum-remaining-time-first (MRTF) principle by leveraging a simple negotiation mechanism between each coflow's data senders and receivers. As the MRTF principle is inherently preemptive and proven to be a near-optimal guideline to minimize average CCT, D-CAS avoids the head-of-line blocking problem and achieves good performance. Through extensive simulations, we find that D-CAS achieves a performance close to Varys (gap < 15%) and outperforms Baraat significantly (about 1.4-4×).
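The contrast drawn in the abstract above, between arrival-order service that can block short coflows behind a long one and a preemptive remaining-time-first rule, can be illustrated with a toy model. The sketch below is not the D-CAS negotiation mechanism or Baraat's actual scheduler. It assumes a single bottleneck link of unit capacity and treats each coflow as one aggregate demand; the function name `average_cct` and the example workload are hypothetical.

```python
def average_cct(coflows, preemptive):
    """Average coflow completion time on one link of capacity 1 unit/time.

    coflows: list of (arrival_time, size) pairs (hypothetical toy workload).
    preemptive=False serves coflows strictly in arrival order (FIFO).
    preemptive=True always serves the coflow with the least remaining size.
    """
    order = sorted(range(len(coflows)), key=lambda c: coflows[c][0])
    remaining = {}
    active = []          # admitted, unfinished coflow ids in arrival order
    now = 0.0
    total_cct = 0.0
    nxt = 0              # index into `order` of the next coflow to arrive
    finished = 0
    while finished < len(coflows):
        # Admit every coflow that has arrived by the current time.
        while nxt < len(order) and coflows[order[nxt]][0] <= now:
            cid = order[nxt]
            remaining[cid] = float(coflows[cid][1])
            active.append(cid)
            nxt += 1
        if not active:
            # Link is idle: jump to the next arrival.
            now = float(coflows[order[nxt]][0])
            continue
        if preemptive:
            current = min(active, key=lambda c: remaining[c])
        else:
            current = active[0]
        next_arrival = coflows[order[nxt]][0] if nxt < len(order) else float("inf")
        finish_at = now + remaining[current]
        if finish_at <= next_arrival:
            # The chosen coflow completes before anything new arrives.
            now = finish_at
            active.remove(current)
            total_cct += now - coflows[current][0]
            finished += 1
        else:
            # Serve until the next arrival, then re-evaluate the choice.
            remaining[current] -= next_arrival - now
            now = float(next_arrival)
    return total_cct / len(coflows)


# One long coflow followed by two short ones (made-up numbers).
workload = [(0, 10), (1, 1), (2, 1)]
fifo_avg = average_cct(workload, preemptive=False)
mrtf_avg = average_cct(workload, preemptive=True)
print(fifo_avg, mrtf_avg)
```

In this made-up workload the two short coflows wait behind the long one under arrival-order service, while the preemptive rule lets them finish first at the cost of delaying the long coflow slightly. The numbers say nothing about the performance gaps reported in the abstract, which come from the authors' own simulations.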


Towards Efficient and Scalable Data-Intensive Content Delivery: State-of-the-Art, Issues and Challenges

Kilanioti, Fernández-Montes, Fernández-Cerero, et al. 2019

Lecture Notes in Computer Science


This chapter presents the authors' work for the Case Study entitled "Delivering Social Media with Scalability" within the framework of High-Performance Modelling and Simulation for Big Data Applications (cHiPSet) COST Action 1406. We identify some core research areas and give an outline of the publications we came up within the framework of the aforementioned action. The ease of user content generation within social media platforms, e.g. check-in information, multimedia data, etc., along with the proliferation of Global Positioning System (GPS)-enabled, always-connected capture devices lead to data streams of unprecedented amount and a radical change in information sharing. Social data streams raise a variety of practical challenges: derivation of real-time meaningful insights from effectively gathered social information, a paradigm shift for content distribution with the leverage of contextual data associated with user preferences, geographical characteristics and devices in general, etc. In this article we present the methodology we followed, the results of our work and the outline of a comprehensive survey, that depicts the state-of-the-art situation and organizes challenges concerning social media streams and the infrastructure of the data centers supporting the efficient access to data streams in terms of content distribution, data diffusion, data replication, energy efficiency and network infrastructure. The challenges of enabling better provisioning of social media data have been identified and they were based on the context of users accessing these resources. The existing literature has been systematized and the main research points and industrial efforts in the area were identified and analyzed. In our works, in the framework of the Action, we came up with

