Citation for the published paper: Dolev, S.; Georgiou, C.; Marcoullis, I. et al. (2015)

Abstract. Virtual synchrony (VS) is an important abstraction that has proven extremely useful when implemented over asynchronous, typically large, message-passing distributed systems. Fault-tolerant design is critical for the success of such implementations, since large distributed systems can be highly available as long as they do not depend on the full operational status of every system participant. Self-stabilizing systems can tolerate transient faults that drive the system to an arbitrary, unpredictable configuration. Such systems automatically regain consistency from any such configuration, and then produce the desired system behavior and ensure it for a practically infinite number of successive steps, e.g., $2^{64}$ steps. We present a new multi-purpose self-stabilizing counter algorithm that establishes an efficient, practically unbounded counter and can directly yield a self-stabilizing Multiple-Writer Multiple-Reader (MWMR) register emulation. We use our counter algorithm, together with a self-stabilizing group membership service and a self-stabilizing multicast service, to devise the first practically stabilizing VS algorithm and a self-stabilizing VS-based emulation of state machine replication (SMR). As we base the SMR implementation on VS rather than on consensus, the system progresses in more extreme asynchronous settings than consensus-based SMR does.
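The abstract does not spell out the counter's structure. As a rough sketch, assume a counter value within one epoch is a pair of a 64-bit sequence number and a writer identifier, compared lexicographically (a common timestamp pattern in MWMR register emulations). A writer then picks a value larger than every one it has seen, and must stop once the largest is exhausted. How the paper continues after exhaustion (for example, by moving to a new epoch) is not modeled here; `Counter`, `next_counter`, and `SEQ_MAX` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical bound: a 64-bit sequence number is "practically unbounded"
# because exhausting it from zero takes 2**64 increments.
SEQ_MAX = 2**64 - 1


@dataclass(frozen=True, order=True)
class Counter:
    """Counter value within one epoch (hypothetical model).

    order=True compares fields in declaration order, i.e. (seq, wid)
    lexicographically; the writer id breaks ties between concurrent writers.
    """
    seq: int
    wid: int

    def exhausted(self) -> bool:
        return self.seq >= SEQ_MAX


def next_counter(known: list, my_id: int) -> Optional[Counter]:
    """Return a counter larger than every counter in `known`, or None when the
    largest one is exhausted and a new epoch would be needed (not modeled)."""
    top = max(known, default=Counter(0, 0))
    if top.exhausted():
        return None
    return Counter(top.seq + 1, my_id)
```

Because ties on `seq` are broken by `wid`, two writers that both see `Counter(3, 5)` as the maximum still produce distinct, totally ordered values.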
Current reconfiguration techniques are based on starting the system in a consistent configuration, in which all participating entities are in a predefined state. Starting from that state, the system must preserve consistency as long as a predefined churn rate of processor joins and leaves is not violated and unbounded storage is available. Many working systems cannot control this churn rate and do not have access to unbounded storage. System designers who neglect the outcome of violating the above assumptions may doom the system to exhibit illegal behaviors. We present the first automatically recovering reconfiguration scheme that recovers from transient faults, such as temporary violations of the above assumptions. Our self-stabilizing solutions regain safety automatically by assuming temporary access to reliable failure detectors. Once safety is re-established, the failure detector's reliability is no longer needed. Still, liveness remains conditioned on the failure detector's unreliable signals. We show that our self-stabilizing reconfiguration techniques can serve as the basis for implementing several dynamic services over message-passing systems. Examples include self-stabilizing reconfigurable virtual synchrony, which in turn can be used to implement self-stabilizing reconfigurable state-machine replication and a self-stabilizing reconfigurable emulation of shared memory.
The virtual synchrony abstraction has proven extremely useful for asynchronous, large-scale, message-passing distributed systems. Self-stabilizing systems can automatically regain consistency after the occurrence of transient faults.

We present the first practically-self-stabilizing virtual synchrony algorithm. It uses a new counter algorithm that establishes an efficient, practically unbounded counter, which in turn can be used directly to emulate a self-stabilizing Multiple-Writer Multiple-Reader (MWMR) register. Other self-stabilizing services include membership, multicast, and replicated state machine (RSM) emulation. As we base the latter on virtual synchrony rather than on consensus, the system can progress in more extreme asynchronous executions than consensus-based RSM emulations.

* A preliminary version of this work has appeared in the proceedings of the 17th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS'15).

Transient faults. Transient violations of design assumptions can lead a system to an arbitrary state. For example, the assumption that error detection ensures the arrival of correct messages and the discarding of corrupted ones might be violated, since error detection is a probabilistic mechanism that may fail to detect a corrupted message. As a result, the message can be regarded as legitimate, driving the system to an arbitrary state after which availability and functionality may be damaged forever, requiring human intervention. In the presence of transient faults, large multicomputer systems providing VS-based services can prove hard to manage and control. One key problem, not restricted to virtually synchronous systems, is catering for counters (such as view identifiers) that reach an arbitrary value. How can we deal with the fact that transient faults may force counters to wrap around to zero and violate important system assumptions and correctness invariants, such as the ordering of events?
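To make the wraparound question concrete, the following minimal sketch assumes view identifiers are stored as 64-bit unsigned values and ordered with a plain `>` comparison; `increment` and `is_newer` are illustrative names, not taken from the paper. Counting up from zero, overflow would take $2^{64}$ increments, but a transient fault can place the counter at its maximum in a single step.

```python
# Assumption: view identifiers are 64-bit unsigned integers.
BITS = 64
MOD = 2**BITS


def increment(view_id: int) -> int:
    """Fixed-width increment: wraps to 0 after MOD - 1, like an unsigned hardware counter."""
    return (view_id + 1) % MOD


def is_newer(a: int, b: int) -> bool:
    """Naive ordering rule: a larger view identifier means a more recent view."""
    return a > b


# Normal operation: identifiers start near 0 and the ordering holds.
old = 41
new = increment(old)
assert is_newer(new, old)

# A transient fault corrupts the identifier to the largest representable value.
corrupted = MOD - 1
after_fault = increment(corrupted)        # wraps around to 0
print(after_fault)                        # 0
print(is_newer(after_fault, corrupted))   # False: the newer view looks older
```

A self-stabilizing counter algorithm has to cope with exactly this case instead of assuming that counters always start from zero.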
A self-stabilizing algorithm [13] can automatically recover from such unexpected failures, possibly as part of after-disaster recovery or even after benign temporary violations of the assumptions made in the design of the system. To the best of our knowledge, no stabilizing virtual synchrony solution exists; we tackle this issue in our work.

Practically-self-stabilization. A relatively new self-stabilization paradigm is practically-self-stabilization [1,7,16]. Consider an asynchronous system with bounded memory and data-link capacity in which corrupted pieces of data (stale information) exist due to a transient fault. (Recall that transient faults can result in the appearance of corrupted information, which the system tends to spread, thus reaching an arbitrary state.) These corrupted data may appear unexpectedly at any processor, as they lie in communication links, or may remain (indefinitely) "hidden" in some processor's local memory until they are added to the communication links in response to some other processor's input. Whilst these pieces of corrupted data ...
Numerous distributed applications, such as cloud computing and distributed ledgers, require the system to invoke asynchronous consensus objects an unbounded number of times, where the completion of one consensus instance is followed by the invocation of another. With only a constant number of objects available, object reuse becomes vital.

We investigate the challenge of object recycling in the presence of Byzantine processes, which can deviate from the algorithm code in any manner. Our solution must also be self-stabilizing, as self-stabilization is a powerful notion of fault tolerance. Self-stabilizing systems can recover automatically after the occurrence of arbitrary transient faults, in addition to tolerating communication and (Byzantine or crash) process failures, provided the algorithm code remains intact.

We provide a recycling mechanism for asynchronous objects that enables their reuse once their task has ended and all non-faulty processes have retrieved the decided values. This mechanism relies on synchrony assumptions and builds on a new self-stabilizing Byzantine-tolerant synchronous multivalued consensus algorithm, along with a novel composition of existing techniques.
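One way to picture the recycling condition described above is a fixed pool of $k$ consensus objects in which instance $i$ may only use slot $i \bmod k$, and a slot becomes reusable once its instance has decided and every non-faulty process has retrieved the decided value. This minimal sketch assumes the set of non-faulty processes is known locally, and it ignores Byzantine behavior, self-stabilization, and the synchronous consensus the authors build on; `Slot`, `ObjectPool`, and their methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slot:
    """One reusable consensus object (hypothetical model)."""
    instance: Optional[int] = None   # consensus instance currently occupying the slot
    done: bool = False               # has that instance decided?
    value: object = None             # decided value, meaningful only when done is True
    retrieved_by: set = field(default_factory=set)  # processes that fetched the value


class ObjectPool:
    """Constant-size pool of k objects; instance i may only use slot i % k."""

    def __init__(self, k: int, correct: set):
        self.slots = [Slot() for _ in range(k)]
        self.correct = correct  # assumption: non-faulty processes are known locally

    def _slot(self, instance: int) -> Slot:
        return self.slots[instance % len(self.slots)]

    def can_recycle(self, slot: Slot) -> bool:
        # Free if never used, or if its task ended and every non-faulty
        # process has retrieved the decided value.
        return slot.instance is None or (slot.done and self.correct <= slot.retrieved_by)

    def start(self, instance: int) -> bool:
        """Claim the slot for `instance`; False means the old occupant is still needed."""
        if not self.can_recycle(self._slot(instance)):
            return False
        self.slots[instance % len(self.slots)] = Slot(instance=instance)
        return True

    def decide(self, instance: int, value: object) -> None:
        slot = self._slot(instance)
        if slot.instance == instance:
            slot.done, slot.value = True, value

    def retrieve(self, instance: int, pid: int) -> Optional[object]:
        """Process `pid` fetches the decision; None if undecided or the slot was reused."""
        slot = self._slot(instance)
        if slot.instance != instance or not slot.done:
            return None
        slot.retrieved_by.add(pid)
        return slot.value
```

With `k=2` and non-faulty processes `{1, 2, 3}`, instance 2 can take over slot 0 only after instance 0 has decided and process 3, the last of the three, has fetched its value.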
Many distributed applications, such as cloud computing, service replication, load balancing, and distributed ledgers, e.g., Blockchain, require the system to solve consensus, in which all nodes reliably agree on a single value. Binary consensus, where the set of values that can be proposed is either zero or one, is a fundamental building block for other "flavors" of consensus, e.g., multivalued or vector consensus, and for total order broadcast. At PODC 2014, Mostéfaoui, Moumen, and Raynal (MMR, for short) presented a randomized signature-free asynchronous binary consensus algorithm. They demonstrated that their solution can deal with up to $t$ Byzantine nodes, where $t < n/3$ and $n$ is the number of nodes. MMR assumes the availability of a common coin service and fair scheduling of message arrivals, which does not depend on the current coin values. It terminates within $O(1)$ expected time.

Our study, which focuses on binary consensus, aims at the design of an even more robust consensus protocol. We do so by augmenting MMR with self-stabilization, a powerful notion of fault tolerance. In addition to tolerating node and communication failures, self-stabilizing systems can automatically recover after the occurrence of arbitrary transient faults; these faults represent any violation of the assumptions on which the system was designed to operate (provided that the algorithm code remains intact).

We present the first loosely-self-stabilizing fault-tolerant asynchronous solution to binary consensus in Byzantine message-passing systems. This is achieved via an instructive transformation of MMR into a self-stabilizing solution that can violate safety requirements with probability $\Pr = O(2^{-M})$, where $M \in \mathbb{Z}^+$ is a predefined constant that can be set to any positive value at the cost of $3Mn + \log M$ bits of local memory. The obtained self-stabilizing version of the MMR algorithm considers a far broader fault model, since it recovers from transient faults.
Additionally, the algorithm preserves MMR's properties of optimal resilience and termination, i.e., $t < n/3$ and $O(1)$ expected decision time. Furthermore, it requires only a bounded amount of memory.
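The parameter trade-off in this abstract can be worked through numerically. The sketch below assumes the logarithm in $3Mn + \log M$ is base 2 and treats $2^{-M}$ as the order of the safety-violation probability with constants omitted; the function names are illustrative only.

```python
import math


def max_byzantine(n: int) -> int:
    """Largest integer t with t < n/3, i.e., MMR's optimal resilience."""
    return (n - 1) // 3


def memory_bits(M: int, n: int) -> float:
    """Local-memory cost 3*M*n + log2(M) bits (log base 2 is an assumption)."""
    return 3 * M * n + math.log2(M)


def violation_order(M: int) -> float:
    """2**-M: the order of the safety-violation probability, constants omitted."""
    return 2.0 ** -M


if __name__ == "__main__":
    n, M = 10, 64
    print(max_byzantine(n))     # 3
    print(memory_bits(M, n))    # 1926.0
    print(violation_order(M))
```

For $n = 10$ and $M = 64$, this gives $t \le 3$, about 1926 bits of local memory, and a violation probability on the order of $2^{-64}$. Doubling $M$ roughly doubles the memory cost while squaring the probability bound.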