2010
DOI: 10.1007/978-3-642-15277-1_10

A Model for Space-Correlated Failures in Large-Scale Distributed Systems

Abstract: Distributed systems such as grids, peer-to-peer systems, and even Internet DNS servers have grown significantly in size and complexity in the last decade. This rapid growth has allowed distributed systems to serve a large and increasing number of users, but has also made resource and system failures inevitable. Moreover, perhaps as a result of system complexity, in distributed systems a single failure can trigger within a short time span several more failures, forming a group of time-correlated failures. To el…

Cited by 35 publications (48 citation statements)
References 19 publications
“…The FTA traces have been instrumental in the development of various failure models [46,55,56]. In a recent study [46], we analyzed and model the time-varying behavior of failures in large-scale distributed systems.…”
Section: Discussion: On the Current And Future Use Of The FTA (mentioning)
confidence: 99%
“…However, they pose more fundamental challenges for problem determination. Problem determination in traditional distributed systems has always focused on one application at a time [8]. Collocation faults imply that the model for normal behavior of an application needs to be learned in the context of other collocated applications.…”
Section: B End-to-end Problem Determination In a Cloud Ecosystem (mentioning)
confidence: 99%
“…We configure a cluster of 8 Hadoop Sort benchmark is run on 5 GB of data. We correlate CPU, memory and disk utilization across the VMs and report their average values in Figure 7.…”
Section: ) Stability Of Correlation With Change In VM Configuration (mentioning)
confidence: 99%
“…There are other papers that discuss correlated failures in various contexts ([12] and [13]) other than communication networks. There are also models for temporally correlated failures [14].…”
Section: Introduction (mentioning)
confidence: 99%