Rachele Fuzzati scite author profile

Abstract. The concept of unreliable failure detectors for reliable distributed systems was introduced by Chandra and Toueg as a fine-grained means to add weak forms of synchrony into asynchronous systems. Various kinds of such failure detectors have been identified as each being the weakest to solve some specific distributed programming problem. In this paper, we provide a fresh look at failure detectors from the point of view of programming languages, more precisely using the formal tool of operational semantics. Inspired by this, we propose a new failure detector model that we consider easier to understand, easier to work with and more natural. Using operational semantics, we prove formally that representations of failure detectors in the new model are equivalent to their original representations within the model used by Chandra and Toueg.

Executive Summary. Background. In the field of Distributed Algorithms, a widely-used computation model is based on asynchronous communication between a fixed number n of connected processes, where no timing assumptions can be made. Moreover, processes are subject to crash-failure: once crashed, they do not recover. The concept of unreliable failure detectors was introduced by Chandra and Toueg [CT96] as a fine-grained means to add weak forms of synchrony into asynchronous systems. Various kinds of such failure detectors have been identified as each being the weakest to solve some specific distributed programming problem [CHT96].

The two communities of Distributed Algorithms and Programming Languages do not always speak the same "language". In fact, it is often not easy to understand each other's terminology, concepts, and hidden assumptions. Thus, in this paper, we provide a fresh look at the concept of failure detectors from the point of view of programming languages, using the formal tool of operational semantics.

This paper complements previous work [NFM03] in which we used an operational semantics for a distributed process calculus to formally prove that a particular algorithm (also presented in [CT96]) solves the Distributed Consensus problem. Readers who are interested in proofs about algorithms within our new model (rather than proofs about it) are thus referred to our previous paper.

Distributed Consensus, revisited

Fuzzati

Merro

Nestmann

2007

Acta Informatica

We provide a novel model to formalize a well-known algorithm, by Chandra and Toueg, that solves Consensus among asynchronous distributed processes in the presence of a particular class of failure detectors (◇S or, equivalently, Ω), under the hypothesis that only a minority of processes may crash. The model is defined as a global transition system that is unambiguously generated by local transition rules. The model is syntax-free in that it does not refer to any form of programming language or pseudo code. We use our model to formally prove that the algorithm is correct.

Introduction. In the field of Distributed Algorithms, a widely-used computation model is based on asynchronous communication between a fixed number n of connected processes. No timing assumptions are made, neither on communications nor on local actions of processes. Often, processes are assumed to be subject to crash-failure: once crashed, they do not recover.

In this paper, we focus on distributed programming and coordination problems in the area of asynchronous models. In particular, we are interested in (1) how the problems are typically specified, (2) how algorithmic solutions to such problems are described, and (3) how the solutions are shown to be correct with respect to their specification. In our opinion, specifications, solutions and correctness arguments are often presented at too informal a level. The offered amount of detail is not sufficient to fully convince the reader (especially an outsider to the field) of the validity of the arguments: a reader who wants to verify the correctness of some proofs often has to prove substantial parts or entire sub-results herself, because only informal arguments were given for them. In contrast, at the core of this paper, we propose a rigorous method to formally describe problems, algorithmic solutions and their respective correctness proofs at a fine-grained level of detail. The method builds upon a largely syntax-free modeling of algorithms as executable transition systems. It is exemplified on the well-known problem of Distributed Consensus (or, for short, Consensus).

Specification. Usually, distributed programming problems are specified in terms of (often temporal) properties of admissible executions. Such an execution, also called a system run, represents a (potentially infinite) computation, starting from some initial state, and describing the global behavior of a system as a sequence of actions and configurations according to some discrete time-line. An algorithm is an artifact that generates system runs. Often, some characteristics of components are given with respect to actions that do or do not happen in system runs. For example, a process is called correct in a given run if it does not crash in that run. A solution to a problem is an algorithm that only generates system runs that respect the required properties. In the case of Consensus, a correct algorithm should only originate system runs that satisfy the following three properties:

* The original publication is available at www....

Much Ado About Nothing?

Fuzzati

Nestmann

2006

Electronic Notes in Theoretical Computer Science

Advances in the Design and Implementation of Group Communication Middleware

Bünzli

Fuzzati

Mena

et al.

2006

Abstract. Group communication is a programming abstraction that allows a distributed group of processes to provide a reliable service in spite of the possibility of failures within the group. The goal of the project was to improve the state of the art of group communication in several directions: protocol frameworks, group communication stacks, specification, verification and robustness. The paper discusses the results obtained.

Rachele Fuzzati

Modeling Consensus in a Process Calculus

Unreliable Failure Detectors via Operational Semantics

Distributed Consensus, revisited

Much Ado About Nothing?

Advances in the Design and Implementation of Group Communication Middleware
