This paper introduces a deterministic Byzantine consensus algorithm that relies on a new weak coordinator. Unlike previous algorithms, which cannot terminate in the presence of a faulty or slow coordinator, our algorithm can terminate even when its coordinator is faulty, hence the name weak coordinator. The key idea is to allow processes to complete asynchronous rounds as soon as they receive a threshold of messages, instead of having to wait for a message from a coordinator that may be slow. The resulting algorithm assumes partial synchrony, is resilience optimal and time optimal, and does not need signatures. Our presentation *
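The round-completion rule described above can be illustrated with a minimal Python sketch. The abstract does not state the threshold, so the value n - f used here (with n > 3f, the usual resilience bound for Byzantine consensus) is an assumption, and `RoundCollector` is a hypothetical helper rather than part of the paper's algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class RoundCollector:
    """Collects the messages of one asynchronous round (hypothetical helper)."""

    n: int  # total number of processes
    f: int  # assumed upper bound on the number of Byzantine processes
    received: dict = field(default_factory=dict)  # sender id -> value

    def __post_init__(self):
        if self.n <= 3 * self.f:
            raise ValueError("resilience bound n > 3f violated")

    @property
    def threshold(self) -> int:
        # Assumption: n - f messages are enough, because up to f processes
        # (possibly including the coordinator) may never send anything.
        return self.n - self.f

    def deliver(self, sender: int, value: int) -> bool:
        """Record one message; return True once the round may complete."""
        # Only the first message per sender counts, so a faulty process
        # cannot be counted twice towards the threshold.
        self.received.setdefault(sender, value)
        return len(self.received) >= self.threshold


collector = RoundCollector(n=4, f=1)
coordinator = 0  # slow or faulty: its message never arrives
done = [collector.deliver(sender, 1) for sender in (1, 2, 3)]
print(done)
```

In this run the round completes on the third message even though nothing was received from process 0, which is the behaviour the weak coordinator is meant to permit.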
The concept of unreliable failure detector was introduced by Chandra and Toueg [2] as a mechanism that provides information about process failures. Depending on the properties the failure detectors guarantee, they proposed a taxonomy of failure detectors. It has been shown that one of the classes of this taxonomy, namely Eventually Strong (◇S), is the weakest class that allows the Consensus problem to be solved. In this paper, we present a new algorithm implementing ◇S. Our algorithm guarantees that eventually all the correct processes agree on a common correct process. This property trivially allows us to provide the accuracy and completeness properties required by ◇S. We then show that our algorithm is better than any other proposed implementation of ◇S in terms of the number of messages and the total amount of information periodically sent. In particular, previous algorithms require at least a quadratic amount of information to be exchanged periodically, while ours requires only O(n log n) (where n is the number of processes). However, we also propose a new measure to evaluate the efficiency of this kind of algorithm, the eventual monitoring degree, which does not rely on periodic behavior and better expresses the degree of processing required by the algorithms. We show that the runs of our algorithm have optimal eventual monitoring degree.
... of failure detectors they defined, namely Eventually Strong (◇S), is the weakest class that allows Consensus to be solved. Since then, many distributed fault-tolerant algorithms have been designed based on Chandra and Toueg's unreliable failure detectors [5, 6, 9, 11]. Almost all of them consider a system model in which the failure detector they require is available, i.e., an asynchronous system augmented with a failure detector, on top of which the algorithm is designed. In this work we have taken a different approach, investigating how to implement those unreliable failure detectors.
From the results of Fischer et al. and those of Chandra and Toueg, one can derive the impossibility of implementing failure detectors strong enough to solve the distributed Consensus problem in a purely asynchronous system. In [2], Chandra and Toueg presented a timeout-based algorithm implementing an Eventually Perfect (◇P) failure detector, a class strictly stronger than ◇S, in models of partial synchrony [3]. This algorithm is based on all-to-all communication: each process periodically sends an I-AM-ALIVE message to all processes to inform them that it has not crashed, and thus requires a quadratic number of messages to be sent periodically. More recently, Larrea et al. [7] proposed more efficient algorithms implementing several classes of failure detectors, including ◇S and ◇P. These algorithms are based on a ring arrangement of the processes and require only a linear number of messages to be sent periodically.
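To make the timeout-based, all-to-all scheme and the message counts more concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the algorithm of [2] or [7]: `TimeoutMonitor` is a hypothetical class driven by an explicit logical clock, the timeout increment of 1 is arbitrary, and the ring count assumes exactly one message per process per period (the text says only "linear").

```python
class TimeoutMonitor:
    """One process's view of its peers (hypothetical, logical-clock driven)."""

    def __init__(self, peers, initial_timeout=2):
        self.timeout = {q: initial_timeout for q in peers}
        self.last_heard = {q: 0 for q in peers}
        self.suspected = set()

    def on_alive(self, q, now):
        """Handle an I-AM-ALIVE message from q received at time `now`."""
        self.last_heard[q] = now
        if q in self.suspected:
            # False suspicion: trust q again and wait longer next time.
            self.suspected.discard(q)
            self.timeout[q] += 1

    def tick(self, now):
        """Suspect every peer that has been silent longer than its timeout."""
        for q, heard in self.last_heard.items():
            if now - heard > self.timeout[q]:
                self.suspected.add(q)


def all_to_all_messages(n):
    """Messages per period when each process sends to all n - 1 others."""
    return n * (n - 1)


def ring_messages(n):
    """Messages per period assuming one message per process to its successor."""
    return n


monitor = TimeoutMonitor(peers=["q", "r"])
monitor.on_alive("q", now=1)
monitor.on_alive("r", now=1)
monitor.tick(now=4)            # both silent for 3 > 2, so both are suspected
monitor.on_alive("q", now=5)   # q was only slow: trusted again, timeout grows
monitor.tick(now=8)            # q silent for 3 (not > 3); r is still silent
print(sorted(monitor.suspected), monitor.timeout["q"])
print(all_to_all_messages(16), ring_messages(16))
```

Growing the timeout after each false suspicion is what lets a correct but slow process eventually stop being suspected once message delays are bounded, which is the intuition behind implementing such detectors under partial synchrony.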
Non-blocking atomic commitment protocols enable a decision (commit or abort) to be reached at every correct participant, despite the failure of others. The cost of non-blocking, however, is (1) a high number of messages and communication steps required to reach commit, and (2) a complicated termination protocol needed in the case of failure suspicions. In this paper, we present a non-blocking protocol, called MD3PC (Modular and Decentralized Three Phase Commit), which makes it possible to trade resiliency against efficiency. As conveyed by our performance measures, MD3PC is faster than existing non-blocking protocols, and in the case of a broadcast network and a reasonable resiliency rate (e.g. 2' or 5') is almost as efficient as the classical (blocking) 2PC. The termination protocol of MD3PC is encapsulated inside a majority consensus protocol. This modularity leads to a simple structure of MD3PC and enables a precise characterization of its liveness in an asynchronous system with an unreliable failure detector.
The consensus problem is a fundamental paradigm in distributed systems, because it captures the difficulty of solving other agreement problems. Many current systems evolve with time, e.g., due to node mobility, and consensus has been little studied in these systems so far. Specifically, it is not well established how to define an appropriate set of assumptions for consensus in dynamic distributed systems. This paper studies a hierarchy of three classes of time-varying graphs, and provides a solution for each class to the problem of Terminating Reliable Broadcast (TRB). The classes introduce increasingly stronger assumptions on timeliness, so that the trade-off between weakness on the one hand and implementability and efficiency on the other can be analysed. Since TRB is equivalent to consensus in synchronous systems, the paper extends this equivalence to dynamic systems.
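As a rough illustration of why timeliness assumptions matter in a time-varying graph, the following Python sketch floods a message over a sequence of edge snapshots and records when each node first holds it. The representation (one list of undirected edges per round, one hop per round) and the function `flood` are assumptions made for this example. It shows only the dissemination step and is not a TRB solution, nor one of the paper's three graph classes.

```python
def flood(snapshots, source):
    """Return {node: round in which the node first holds the message}.

    snapshots[t - 1] lists the undirected edges present during round t.
    A message crosses at most one edge per round.
    """
    informed = {source: 0}
    for t, edges in enumerate(snapshots, start=1):
        holders = set(informed)  # nodes holding the message before round t
        for u, v in edges:
            if u in holders and v not in informed:
                informed[v] = t
            elif v in holders and u not in informed:
                informed[u] = t
    return informed


# Hypothetical dynamic topology: the edge needed next is not always present.
snapshots = [
    [("a", "b")],
    [("c", "d")],   # no edge leaves the informed set {a, b}: a wasted round
    [("b", "c")],
    [("c", "d")],
]
arrival = flood(snapshots, source="a")
print(arrival)
```

Here node d first holds the message in round 4 although it is only three hops from a, because the second round contains no useful edge. A bound on how long such gaps can last is the kind of assumption that lets a process decide when to stop waiting for the sender.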