We give a process calculus model that formalizes a well-known algorithm (introduced by Chandra and Toueg) solving consensus in the presence of a particular class of failure detectors (♦S); we use our model to formally prove that the algorithm satisfies its specification.
Abstract. The concept of unreliable failure detectors for reliable distributed systems was introduced by Chandra and Toueg as a fine-grained means to add weak forms of synchrony into asynchronous systems. Various kinds of such failure detectors have been identified as each being the weakest to solve some specific distributed programming problem. In this paper, we provide a fresh look at failure detectors from the point of view of programming languages, more precisely using the formal tool of operational semantics. Inspired by this, we propose a new failure detector model that we consider easier to understand, easier to work with and more natural. Using operational semantics, we prove formally that representations of failure detectors in the new model are equivalent to their original representations within the model used by Chandra and Toueg.

Executive Summary

Background. In the field of Distributed Algorithms, a widely-used computation model is based on asynchronous communication between a fixed number n of connected processes, where no timing assumptions can be made. Moreover, processes are subject to crash-failure: once crashed, they do not recover. The concept of unreliable failure detectors was introduced by Chandra and Toueg [CT96] as a fine-grained means to add weak forms of synchrony into asynchronous systems. Various kinds of such failure detectors have been identified as each being the weakest to solve some specific distributed programming problem [CHT96].

The two communities of Distributed Algorithms and Programming Languages do not always speak the same "language". In fact, it is often not easy to understand each other's terminology, concepts, and hidden assumptions. Thus, in this paper, we provide a fresh look at the concept of failure detectors from the point of view of programming languages, using the formal tool of operational semantics.
This paper complements previous work [NFM03] in which we used an operational semantics for a distributed process calculus to formally prove that a particular algorithm (also presented in [CT96]) solves the Distributed Consensus problem. Readers who are interested in proofs about algorithms within our new model (rather than proofs about it) are thus referred to our previous paper.
We provide a novel model to formalize a well-known algorithm, by Chandra and Toueg, that solves Consensus among asynchronous distributed processes in the presence of a particular class of failure detectors (♦S or, equivalently, Ω), under the hypothesis that only a minority of processes may crash. The model is defined as a global transition system that is unambiguously generated by local transition rules. The model is syntax-free in that it does not refer to any form of programming language or pseudo code. We use our model to formally prove that the algorithm is correct.

Introduction. In the field of Distributed Algorithms, a widely-used computation model is based on asynchronous communication between a fixed number n of connected processes. No timing assumptions are made, neither on communications nor on local actions of processes. Often, processes are assumed to be subject to crash-failure: once crashed, they do not recover.

In this paper, we focus on distributed programming and coordination problems in the area of asynchronous models. In particular we are interested in (1) how the problems are typically specified, (2) how algorithmic solutions to such problems are described, and (3) how the solutions are shown to be correct with respect to their specification. In our opinion, specifications, solutions and correctness arguments are often presented at too informal a level. The amount of detail offered is not sufficient to fully convince the reader (especially an outsider to the field) of the validity of the arguments: a reader who wants to verify the correctness of some proofs often has to prove substantial parts or entire sub-results herself, because only informal arguments were given for them. In contrast, at the core of this paper, we propose a rigorous method to formally describe problems, algorithmic solutions and their respective correctness proofs at a fine-grained level of detail.
The method builds upon a largely syntax-free modeling of algorithms as executable transition systems. It is exemplified on the well-known problem of Distributed Consensus (or Consensus for short).

Specification. Usually, distributed programming problems are specified in terms of (often temporal) properties of admissible executions. Such an execution, also called a system run, represents a (potentially infinite) computation that starts from some initial state and describes the global behavior of a system as a sequence of actions and configurations along some discrete time-line. An algorithm is an artifact that generates system runs. Often, some characteristics of components are given with respect to actions that do or do not happen in system runs. For example, a process is called correct in a given run if it does not crash in that run. A solution to a problem is an algorithm that only generates system runs that respect the required properties. In the case of Consensus, a correct algorithm should only generate system runs that satisfy the following three properties:

* The original publication is available at www....
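The notions of system run and correct process described in the Specification paragraph can be illustrated with a minimal Python sketch. It is not the formal model of the paper: the names `Configuration`, `crash`, `respects_crash_failure` and `correct_processes` are hypothetical, a configuration is reduced to the set of crashed processes, and only a finite prefix of a run is represented, although runs may be infinite.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Configuration:
    """One global state on the discrete time-line (hypothetical encoding).

    Only the set of crashed process identifiers is recorded here.
    """
    crashed: FrozenSet[int]


def crash(config: Configuration, pid: int) -> Configuration:
    """Action: process `pid` crashes, yielding the next configuration."""
    return Configuration(config.crashed | {pid})


def respects_crash_failure(run: List[Configuration]) -> bool:
    """Check the crash-failure assumption: once crashed, no recovery.

    The crashed set may only grow from one configuration to the next.
    """
    return all(a.crashed <= b.crashed for a, b in zip(run, run[1:]))


def correct_processes(run: List[Configuration], n: int) -> Set[int]:
    """Processes 1..n that do not crash anywhere in this (finite) run prefix."""
    crashed_somewhere = set().union(*(c.crashed for c in run))
    return set(range(1, n + 1)) - crashed_somewhere


# Usage: three processes, process 2 crashes in the second configuration.
run = [Configuration(frozenset())]
run.append(crash(run[-1], 2))
run.append(run[-1])  # a step in which no process crashes
```

On this prefix, `respects_crash_failure(run)` holds and `correct_processes(run, 3)` contains processes 1 and 3. Because correctness is defined per run, the same process may be correct in one run and faulty in another.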
Abstract. Group communication is a programming abstraction that allows a distributed group of processes to provide a reliable service in spite of the possibility of failures within the group. The goal of the project was to improve the state of the art of group communication in several directions: protocol frameworks, group communication stacks, specification, verification and robustness. The paper discusses the results obtained.