2012
DOI: 10.1145/2377677.2377760

Surviving failures in bandwidth-constrained datacenters

Abstract: Datacenter networks have been designed to tolerate failures of network equipment and provide sufficient bandwidth. In practice, however, failures and maintenance of networking and power equipment often make tens to thousands of servers unavailable, and network congestion can increase service latency. Unfortunately, there exists an inherent tradeoff between achieving high fault tolerance and reducing bandwidth usage in the network core; spreading servers across fault domains improves fault tolerance, but r…

Cited by 49 publications (52 citation statements)
References 29 publications
“…The 9s are a logarithmic measure; that is, a system with five 9s availability is 10 times more available than another one with four 9s. • Fault domain: A fault domain is a set of devices that share a single point of failure [10]. For instance, servers connected to the same top-of-rack switch belong to the same fault domain.…”
Section: Survivability-related Concepts (mentioning, confidence: 99%)
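As a worked illustration of the "9s" arithmetic in the statement above (not taken from the cited paper; just the standard logarithmic relation), the sketch below converts a number of 9s into availability and allowed downtime per year, and checks that each additional 9 reduces the permitted downtime tenfold.

```python
# Illustrative sketch of the "9s" availability arithmetic cited above.
# Not from the paper; a worked example of the logarithmic relation.

MINUTES_PER_YEAR = 365 * 24 * 60

def availability(nines: int) -> float:
    """Availability as a fraction, e.g. four 9s -> 0.9999."""
    return 1.0 - 10.0 ** (-nines)

def downtime_minutes_per_year(nines: int) -> float:
    """Expected unavailable minutes per year for a given number of 9s."""
    return (1.0 - availability(nines)) * MINUTES_PER_YEAR

for n in (3, 4, 5):
    print(f"{n} nines: availability={availability(n):.5f}, "
          f"downtime ~{downtime_minutes_per_year(n):.1f} min/year")

# Each extra 9 divides the allowed downtime by 10:
assert abs(downtime_minutes_per_year(4) / downtime_minutes_per_year(5) - 10.0) < 1e-9
```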
“…Bodik et al [10] studied resource allocation in data centers that achieve the best tradeoff between fault tolerance and bandwidth usage. Indeed, when VMs of the same VDC (termed "service" in the paper) are spread across the data center, they are less likely to be affected by the same failure (e.g., top-of-rack failures) but they consume significant bandwidth in the data center network, as they are far from each other (Fig.…”
Section: Wcsmentioning
confidence: 99%
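A toy sketch of the tradeoff this statement describes, under purely illustrative assumptions (the topology, placements, and cost model below are mine, not the allocation algorithm of Bodik et al. [10]): spreading a service's VMs across racks bounds how many are lost when a single top-of-rack switch fails, but pushes its traffic onto the core, while packing them into one rack does the opposite.

```python
# Toy model of the fault-tolerance / core-bandwidth tradeoff.
# Assumptions (not from the paper): 4 racks, 8 VMs in one service,
# uniform all-to-all traffic, and any VM pair in different racks
# counted as using core bandwidth.

from itertools import combinations

RACKS = 4
VMS = 8

def worst_case_loss(placement):
    """Largest fraction of VMs lost if one ToR switch (rack) fails."""
    per_rack = [placement.count(r) for r in range(RACKS)]
    return max(per_rack) / len(placement)

def core_pairs(placement):
    """Number of VM pairs whose traffic must cross the network core."""
    return sum(1 for a, b in combinations(range(len(placement)), 2)
               if placement[a] != placement[b])

spread = [vm % RACKS for vm in range(VMS)]   # 2 VMs in each of 4 racks
packed = [0] * VMS                            # all 8 VMs in one rack

for name, placement in (("spread", spread), ("packed", packed)):
    print(f"{name}: worst-case loss={worst_case_loss(placement):.0%}, "
          f"core pairs={core_pairs(placement)}")
# spread: a ToR failure loses 25% of VMs, but 24 of 28 pairs cross the core
# packed: a ToR failure loses 100% of VMs, but 0 pairs cross the core
```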
“…Each TTA request also has a randomly generated resource requirement graph G_k, as tenant applications can have a diverse range of communication patterns ranging from star and mesh to linear and ring [38]. Since the number of tiers in the resource requirement graph is likely to be small [38], we consider the resource requirement graph G_k with three different sizes: 2, 4, and 8. Following real cloud providers such as Amazon and Google, which provide a small finite set of instances [14,35], we support resource heterogeneity by defining an instance-type set {"S", "M", "L", "XL"}, where a VM belonging to "S", "M", "L", and "XL" consumes 1, 2, 4, and 8 units of the server's resources, respectively.…”
Section: Simulation Settings (mentioning, confidence: 99%)
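A minimal sketch of the simulation setup quoted above, assuming details not given in the statement (the generator, function, and variable names below are hypothetical and the cited work's construction may differ): instance types S/M/L/XL consuming 1/2/4/8 resource units, and a requirement graph drawn at random from the star, mesh, linear, and ring patterns over 2, 4, or 8 tiers.

```python
# Hypothetical sketch of the simulation settings quoted above.
# Instance-type units, graph sizes, and patterns follow the text;
# the generator itself is an illustrative assumption, not the authors' code.

import random
from itertools import combinations

INSTANCE_UNITS = {"S": 1, "M": 2, "L": 4, "XL": 8}   # resource units per VM
GRAPH_SIZES = (2, 4, 8)                              # number of tiers/nodes
PATTERNS = ("star", "mesh", "linear", "ring")

def requirement_graph(rng: random.Random):
    """Return (pattern, per-node demands, edge list) for a random TTA request."""
    n = rng.choice(GRAPH_SIZES)
    pattern = rng.choice(PATTERNS)
    demands = [INSTANCE_UNITS[rng.choice(list(INSTANCE_UNITS))] for _ in range(n)]

    if pattern == "star":
        edges = [(0, i) for i in range(1, n)]
    elif pattern == "mesh":
        edges = list(combinations(range(n), 2))
    elif pattern == "linear":
        edges = [(i, i + 1) for i in range(n - 1)]
    else:  # ring; a 2-node ring degenerates to a single link
        edges = [(i, (i + 1) % n) for i in range(n)] if n > 2 else [(0, 1)]

    return pattern, demands, edges

rng = random.Random(0)
print(requirement_graph(rng))
```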