Chandra and Toueg introduced the concept of unreliable failure detectors. They showed how, by adding these detectors to an asynchronous system, it is possible to solve the Consensus problem. In this paper, we propose a new implementation of a failure detector. This implementation is a variant of the heartbeat failure detector which is adaptable and can support scalable applications. In this implementation we dissociate two aspects: a basic estimation of the expected arrival date to provide a short detection time, and an adaptation of the quality of service according to application needs. The latter is based on two principles: an adaptation layer and a heuristic to adapt the sending period of "I am alive" messages.
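The first aspect, estimating the expected arrival date of the next "I am alive" message, can be illustrated with a minimal sketch. It is not the paper's implementation: the windowed-mean estimator, the constant safety margin, and the names `HeartbeatDetector`, `period`, `margin` and `window` are all assumptions made for illustration.

```python
from collections import deque


class HeartbeatDetector:
    """Suspects a monitored process when its next heartbeat is overdue.

    Illustrative only: the estimator and the constant margin are assumptions.
    """

    def __init__(self, period, margin, window=100):
        self.period = period    # sending period of "I am alive" messages (seconds)
        self.margin = margin    # constant safety margin (seconds), hypothetical
        self.samples = deque(maxlen=window)  # recent (sequence number, arrival time)
        self.last_seq = None

    def heartbeat(self, seq, arrival_time):
        """Record heartbeat number `seq`, received at `arrival_time`."""
        self.samples.append((seq, arrival_time))
        self.last_seq = seq

    def expected_arrival(self):
        """Estimate the arrival date of the next heartbeat."""
        # Shift each arrival back by seq * period, average the results,
        # then shift forward to the next sequence number.
        base = sum(t - self.period * s for s, t in self.samples) / len(self.samples)
        return base + (self.last_seq + 1) * self.period

    def suspects(self, now):
        """Return True if the next heartbeat is later than estimate + margin."""
        if not self.samples:
            return False
        return now > self.expected_arrival() + self.margin
```

In this sketch, a smaller `margin` shortens the detection time at the cost of more false suspicions, which is the kind of trade-off an adaptation layer would tune per application.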
We present a new failure detector implementation. This implementation, a variant of the heartbeat failure detector, is both adaptable and designed for scalability. Its first distinctive feature is that it is designed as a service shared among several applications by way of an adaptation layer, which adapts the quality of service according to application needs. Its second distinctive feature is the hierarchical organization of the detection service, which reduces both the number of messages and the processor load. Through an experimental evaluation, we show that our implementation adapts to the characteristics of the environment and is usable with large-scale applications.
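To see why a hierarchical organization can reduce the number of messages, consider the following back-of-the-envelope sketch. The two-level layout (all-to-all monitoring inside each group, plus all-to-all monitoring among one leader per group) and the function names are assumptions for illustration; the abstract does not specify the actual topology.

```python
def flat_messages(n):
    """Heartbeats per period when each of n processes monitors all the others."""
    return n * (n - 1)


def hierarchical_messages(group_sizes):
    """Heartbeats per period in a hypothetical two-level hierarchy.

    Processes monitor each other inside their group, and one leader
    per group monitors the leaders of the other groups.
    """
    local = sum(g * (g - 1) for g in group_sizes)
    leaders = len(group_sizes)
    return local + leaders * (leaders - 1)


print(flat_messages(100))                # 9900
print(hierarchical_messages([10] * 10))  # 990
```

Under these assumptions, 100 processes split into 10 groups of 10 exchange 990 heartbeats per period instead of 9900.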
The Internet of Things (IoT) generates massive streams of data which call for ever more efficient real-time processing. Designing and implementing a big data service for the real-time processing of such data requires extensive knowledge of both the input load and the data distribution, so that the service can cope with the workload. In this context, this paper studies the challenges inherent to the real-time processing of massive data flows from the IoT. We provide a detailed analysis of traces gathered from a well-known healthcare sport-oriented application in order to illustrate our conclusions from a big data perspective.
This paper presents DARX, our framework for building applications that provide adaptive fault tolerance. It relies on the fact that multi-agent platforms constitute a very strong basis for decentralized software that is both flexible and scalable, and it assumes that the relative importance of each agent varies during the course of the computation. DARX brings together solutions which facilitate the creation of multi-agent applications in a large-scale context.
Its most important feature is adaptive replication: replication strategies are applied on a per-agent basis with respect to transient environment characteristics such as the importance of the agent for the computation, the network load or the mean time between failures. First, the interwoven concerns of multi-agent systems and fault-tolerant solutions are put forward. An overview of the DARX architecture follows, as well as an evaluation of its performance. We conclude, after outlining the promising outcomes, by presenting prospective work.
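The idea of choosing a replication strategy per agent can be sketched as follows. This is not DARX's API or policy: the `AgentContext` fields, the 0-to-1 scales, the thresholds, and the strategy names returned by `choose_strategy` are all hypothetical and only illustrate how the three characteristics named above could drive the decision.

```python
from dataclasses import dataclass


@dataclass
class AgentContext:
    importance: float    # 0.0 (negligible) to 1.0 (critical); hypothetical scale
    network_load: float  # 0.0 (idle) to 1.0 (saturated); hypothetical scale
    mtbf_hours: float    # mean time between failures of the agent's host


def choose_strategy(ctx):
    """Return (strategy, replica_count) for one agent; thresholds are illustrative."""
    # Unimportant agents are not replicated at all.
    if ctx.importance < 0.3:
        return ("none", 0)
    # Hosts that fail often get an extra replica.
    replicas = 2 if ctx.mtbf_hours < 24 else 1
    # Active replication costs more messages, so it is reserved for
    # critical agents while the network is lightly loaded.
    if ctx.importance >= 0.8 and ctx.network_load < 0.5:
        return ("active", replicas)
    return ("passive", replicas)
```

Because the inputs are transient, such a function would be re-evaluated as conditions change, so the same agent may move between strategies during the computation.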