Chandra and Toueg introduced the concept of unreliable failure detectors and showed that adding such detectors to an asynchronous system makes it possible to solve the Consensus problem. In this paper, we propose a new failure detector implementation: a variant of the heartbeat failure detector that is adaptable and can support scalable applications. The implementation separates two aspects: a basic estimation of the expected arrival date of heartbeats, which provides a short detection time, and an adaptation of the quality of service to application needs. The latter relies on two mechanisms: an adaptation layer and a heuristic that adjusts the sending period of "I am alive" messages.
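The split between a basic arrival-date estimation and an application-specific adaptation can be illustrated with a minimal Python sketch. It is not the implementation proposed in the paper: the class name `HeartbeatDetector`, the windowed-average estimator, and the `margin` parameter (standing in for the quality-of-service adaptation) are all assumptions made for illustration.

```python
from collections import deque


class HeartbeatDetector:
    """Hypothetical monitor for one remote process, driven by heartbeat arrivals."""

    def __init__(self, period, margin, window=100):
        self.period = period    # sending period of "I am alive" messages, in seconds
        self.margin = margin    # assumed per-application safety margin, in seconds
        # Keep only the most recent (sequence number, arrival time) pairs.
        self.arrivals = deque(maxlen=window)

    def heartbeat(self, seq, arrival_time):
        """Record the arrival of heartbeat number `seq`."""
        self.arrivals.append((seq, arrival_time))

    def expected_arrival(self):
        """Estimate the arrival date of the next heartbeat.

        Each arrival is normalized by subtracting seq * period; the mean of
        these offsets approximates the transmission delay, which is then
        added to the nominal sending date of the next heartbeat.
        """
        if not self.arrivals:
            return None
        offset = sum(t - self.period * s for s, t in self.arrivals) / len(self.arrivals)
        next_seq = self.arrivals[-1][0] + 1
        return offset + self.period * next_seq

    def suspects(self, now):
        """Suspect the process once the estimate plus the margin has passed."""
        expected = self.expected_arrival()
        if expected is None:
            return False  # nothing received yet, so no basis for suspicion
        return now > expected + self.margin


detector = HeartbeatDetector(period=1.0, margin=0.5)
for seq, arrival in [(1, 1.1), (2, 2.1), (3, 3.1)]:
    detector.heartbeat(seq, arrival)
```

In this sketch a smaller `margin` shortens detection time at the cost of more false suspicions, which is the kind of trade-off an adaptation layer could tune per application.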
We present a new failure detector implementation. This implementation, a variant of the heartbeat failure detector, is both adaptable and designed for scalability. Its first distinguishing feature is that it is designed as a service shared among several applications by way of an adaptation layer, which adapts the quality of service to each application's needs. The second is the hierarchical organization of the detection service, which reduces the number of messages and the processor load. Through an experimental evaluation, we show that our implementation adapts to the characteristics of the environment and is usable with large-scale applications.
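To see why a hierarchical organization can reduce message traffic, the following Python sketch counts heartbeat messages per sending period under a deliberately simple model. The model is an assumption, not the organization described in the paper: processes are split into equal groups, every process monitors its own group all-to-all, and one hypothetical leader per group exchanges heartbeats with the other leaders.

```python
def flat_messages(n):
    """Heartbeats per period when each of n processes sends to all the others."""
    return n * (n - 1)


def hierarchical_messages(n, group_size):
    """Heartbeats per period in the assumed two-level organization."""
    if group_size <= 0 or n % group_size != 0:
        raise ValueError("n must be a positive multiple of group_size")
    groups = n // group_size
    # All-to-all monitoring inside each group.
    local = groups * group_size * (group_size - 1)
    # All-to-all monitoring among the group leaders (one per group).
    between_leaders = groups * (groups - 1)
    return local + between_leaders


flat = flat_messages(100)                      # 9900
hierarchical = hierarchical_messages(100, 10)  # 990
```

Under these assumptions, 100 processes in groups of 10 exchange 990 heartbeats per period instead of 9900.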