Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004
DOI: 10.1109/reldis.2004.1353004

The φ accrual failure detector

Abstract: The detection of failures is a fundamental issue for fault tolerance in distributed systems. Recently, many people have come to realize that failure detection ought to be provided as some form of generic service, similar to IP address lookup or time synchronization. However, this has not been successful so far; one of the reasons being the fact that classical failure detectors were not designed to satisfy several application requirements simultaneously. We present a novel abstraction, called accrual failure dete…

Cited by 141 publications (131 citation statements)
References 25 publications
“…A fault tolerance service to check the cloud providers and other services status will be developed and evaluated. We also plan to use an adaptive fault monitoring algorithm, as proposed by [18,30] and [70], which are more adaptable to be used in a large-scale distributed environment. It is also important to include a security service and an SLA service in the federated platform.…”
Section: Discussion (mentioning)
confidence: 99%
“…There are extensive studies in the literature on failure detection systems [16,31,45,70]. On the other hand, few systems are designed to scale with a large number of nodes as those found on clouds.…”
Section: Fault Tolerance Service and High Availability (mentioning)
confidence: 99%
“…Each node periodically disseminates its status information to a number of randomly selected nodes and relays status information received from other nodes. This method is also used to detect and advertise node failures across the cluster [35].…”
Section: Background: the Systems We Target (mentioning)
confidence: 99%
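
The dissemination scheme described in this statement can be illustrated with a minimal sketch. It is not the protocol of the cited system, whose details are not given here: the names `Node`, `gossip_round`, `fanout`, and `timeout` are hypothetical, time is counted in abstract rounds, and suspicion uses a plain fixed timeout rather than an accrual (φ) suspicion level.

```python
import random


class Node:
    """One cluster member holding a gossip view of its peers (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.heartbeat = 0
        # peer name -> (highest heartbeat counter seen, local round it last advanced)
        self.view = {name: (0, 0)}

    def tick(self, now):
        """Advance this node's own heartbeat counter for the current round."""
        self.heartbeat += 1
        self.view[self.name] = (self.heartbeat, now)

    def merge(self, remote_view, now):
        """Relay step: adopt any entry that carries a newer heartbeat counter."""
        for peer, (beat, _) in remote_view.items():
            known = self.view.get(peer)
            if known is None or beat > known[0]:
                self.view[peer] = (beat, now)

    def suspects(self, now, timeout):
        """Peers whose heartbeat has not advanced for more than `timeout` rounds."""
        return sorted(
            peer
            for peer, (_, seen) in self.view.items()
            if peer != self.name and now - seen > timeout
        )


def gossip_round(nodes, alive, now, fanout, rng):
    """Each live node sends its whole view to `fanout` randomly selected peers."""
    for name in sorted(alive):
        node = nodes[name]
        node.tick(now)
        others = [n for n in nodes if n != name]
        for target in rng.sample(others, min(fanout, len(others))):
            if target in alive:  # messages sent to a crashed node are lost
                nodes[target].merge(node.view, now)


if __name__ == "__main__":
    rng = random.Random(0)
    nodes = {f"n{i}": Node(f"n{i}") for i in range(5)}
    alive = set(nodes)
    for now in range(30):
        if now == 10:
            alive.discard("n4")  # simulate a crash of n4
        gossip_round(nodes, alive, now, fanout=2, rng=rng)
    print(nodes["n0"].suspects(now=29, timeout=8))
```

With a small `fanout`, a live node can occasionally be suspected when no fresh heartbeat reaches an observer within `timeout` rounds; that fixed-threshold trade-off is the kind of tuning problem adaptive detectors aim to relax.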
“…One of the key messages of this book is that it is important to distinguish between porting a code … Table 1: Syntactical constructs used in several failure detector protocols. ϕ is the accrual failure detector discussed in (Hayashibara, 2004; Hayashibara et al., 2004). D is the eventually perfect failure detector of (Chandra & Toueg, 1996).…”
Section: Failure Detection Protocols In the Application Layer (mentioning)
confidence: 99%