2012
DOI: 10.1145/2377677.2377760

Surviving failures in bandwidth-constrained datacenters

Abstract: Datacenter networks have been designed to tolerate failures of network equipment and provide sufficient bandwidth. In practice, however, failures and maintenance of networking and power equipment often make tens to thousands of servers unavailable, and network congestion can increase service latency. Unfortunately, there exists an inherent tradeoff between achieving high fault tolerance and reducing bandwidth usage in the network core; spreading servers across fault domains improves fault tolerance, but r…

Cited by 49 publications (52 citation statements)
References 29 publications
“…The 9s are a logarithmic measure; that is, a system with five 9s availability is 10 times more available than another one with four 9s. • Fault domain: A fault domain is a set of devices that share a single point of failure [10]. For instance, servers connected to the same top-of-rack switch belong to the same fault domain.…”
Section: Survivability-related Concepts (mentioning, confidence: 99%)
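As a worked illustration of the "9s" arithmetic in the statement above (not taken from the cited paper; just the standard logarithmic relation), the sketch below converts a number of 9s into availability and allowed downtime per year, and checks that each additional 9 reduces the permitted downtime tenfold.

```python
# Illustrative sketch of the "9s" availability arithmetic cited above.
# Not from the paper; a worked example of the logarithmic relation.

MINUTES_PER_YEAR = 365 * 24 * 60

def availability(nines: int) -> float:
    """Availability as a fraction, e.g. four 9s -> 0.9999."""
    return 1.0 - 10.0 ** (-nines)

def downtime_minutes_per_year(nines: int) -> float:
    """Expected unavailable minutes per year for a given number of 9s."""
    return (1.0 - availability(nines)) * MINUTES_PER_YEAR

for n in (3, 4, 5):
    print(f"{n} nines: availability={availability(n):.5f}, "
          f"downtime ~{downtime_minutes_per_year(n):.1f} min/year")

# Each extra 9 divides the allowed downtime by 10:
assert abs(downtime_minutes_per_year(4) / downtime_minutes_per_year(5) - 10.0) < 1e-9
```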
“…Bodik et al [10] studied resource allocation in data centers that achieve the best tradeoff between fault tolerance and bandwidth usage. Indeed, when VMs of the same VDC (termed "service" in the paper) are spread across the data center, they are less likely to be affected by the same failure (e.g., top-of-rack failures) but they consume significant bandwidth in the data center network, as they are far from each other (Fig.…”
Section: Wcsmentioning
confidence: 99%
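A toy sketch of the tradeoff this statement describes, under purely illustrative assumptions (the topology, placements, and cost model below are mine, not the allocation algorithm of Bodik et al. [10]): spreading a service's VMs across racks bounds how many are lost when a single top-of-rack switch fails, but pushes its traffic onto the core, while packing them into one rack does the opposite.

```python
# Toy model of the fault-tolerance / core-bandwidth tradeoff.
# Assumptions (not from the paper): 4 racks, 8 VMs in one service,
# uniform all-to-all traffic, and any VM pair in different racks
# counted as using core bandwidth.

from itertools import combinations

RACKS = 4
VMS = 8

def worst_case_loss(placement):
    """Largest fraction of VMs lost if one ToR switch (rack) fails."""
    per_rack = [placement.count(r) for r in range(RACKS)]
    return max(per_rack) / len(placement)

def core_pairs(placement):
    """Number of VM pairs whose traffic must cross the network core."""
    return sum(1 for a, b in combinations(range(len(placement)), 2)
               if placement[a] != placement[b])

spread = [vm % RACKS for vm in range(VMS)]   # 2 VMs in each of 4 racks
packed = [0] * VMS                            # all 8 VMs in one rack

for name, placement in (("spread", spread), ("packed", packed)):
    print(f"{name}: worst-case loss={worst_case_loss(placement):.0%}, "
          f"core pairs={core_pairs(placement)}")
# spread: a ToR failure loses 25% of VMs, but 24 of 28 pairs cross the core
# packed: a ToR failure loses 100% of VMs, but 0 pairs cross the core
```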
“…Each TTA request also has a randomly generated resource requirement graph G_k, as tenant applications can have a diverse range of communication patterns ranging from star and mesh to linear and ring [38]. Since the number of tiers in the resource requirement graph is likely to be small [38], we consider the resource requirement graph G_k with three different sizes: 2, 4, and 8. Following real cloud providers such as Amazon and Google, which provide a small finite set of instances [14,35], we support resource heterogeneity by defining an instance-type set {"S", "M", "L", "XL"}, where a VM belonging to "S", "M", "L", and "XL" consumes 1, 2, 4, and 8 units of the server's resources, respectively.…”
Section: Simulation Settings (mentioning, confidence: 99%)
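A minimal sketch of the simulation setup quoted above, assuming details not given in the statement (the generator, function, and variable names below are hypothetical and the cited work's construction may differ): instance types S/M/L/XL consuming 1/2/4/8 resource units, and a requirement graph drawn at random from the star, mesh, linear, and ring patterns over 2, 4, or 8 tiers.

```python
# Hypothetical sketch of the simulation settings quoted above.
# Instance-type units, graph sizes, and patterns follow the text;
# the generator itself is an illustrative assumption, not the authors' code.

import random
from itertools import combinations

INSTANCE_UNITS = {"S": 1, "M": 2, "L": 4, "XL": 8}   # resource units per VM
GRAPH_SIZES = (2, 4, 8)                              # number of tiers/nodes
PATTERNS = ("star", "mesh", "linear", "ring")

def requirement_graph(rng: random.Random):
    """Return (pattern, per-node demands, edge list) for a random TTA request."""
    n = rng.choice(GRAPH_SIZES)
    pattern = rng.choice(PATTERNS)
    demands = [INSTANCE_UNITS[rng.choice(list(INSTANCE_UNITS))] for _ in range(n)]

    if pattern == "star":
        edges = [(0, i) for i in range(1, n)]
    elif pattern == "mesh":
        edges = list(combinations(range(n), 2))
    elif pattern == "linear":
        edges = [(i, i + 1) for i in range(n - 1)]
    else:  # ring; a 2-node ring degenerates to a single link
        edges = [(i, (i + 1) % n) for i in range(n)] if n > 2 else [(0, 1)]

    return pattern, demands, edges

rng = random.Random(0)
print(requirement_graph(rng))
```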