Consensus encalpsulates the inherent problems of building fault tolerant distributed systems. In this context, the classic model of Byzantine faulty processes can be restated such that messages from a subset of processes can be arbitrarily corrupted (including addition and omission of messages).We consider the case of dynamic and transient faults, that may affect all processes and that are not permanent, and we model them via corrupted communication. For corrupted communication it is natural to distinguish between the safety of communication, which is concerned with the number of altered messages, and the liveness of communication, which restricts message loss.We present two consensus algorithms, together with sufficient conditions on the system to ensure correctness. Our first algorithm needs strong conditions on safety but requires weak conditions on liveness in order to terminate. Our second algorithm tolerates a lower degree of communication safety at the price of stronger liveness conditions.Our algorithms allow us to circumvent the resilience lower bounds from Santoro/Widmayer and Martin/Alvisi.
Counter abstraction is a powerful tool for parameterized model checking, if the number of local states of the concurrent processes is relatively small. In recent work, we introduced parametric interval counter abstraction that allowed us to verify the safety and liveness of threshold-based fault-tolerant distributed algorithms (FTDA). Due to state space explosion, applying this technique to distributed algorithms with hundreds of local states is challenging for state-of-the-art model checkers. In this paper, we demonstrate that reachability properties of FTDAs can be verified by bounded model checking. To ensure completeness, we need an upper bound on the diameter, i.e., on the longest distance between states. We show that the diameters of accelerated counter systems of FTDAs, and of their counter abstractions, have a quadratic upper bound in the number of local transitions. Our experiments show that the resulting bounds are sufficiently small to use bounded model checking for parameterized verification of reachability properties of several FTDAs, some of which have not been automatically verified before.
Automatic verification of threshold-based fault-tolerant distributed algorithms (FTDA) is challenging: they have multiple parameters that are restricted by arithmetic conditions, the number of processes and faults is parameterized, and the algorithm code is parameterized due to conditions counting the number of received messages. Recently, we introduced a technique that first applies data and counter abstraction and then runs bounded model checking (BMC). Given an FTDA, our technique computes an upper bound on the diameter of the system. This makes BMC complete: it always finds a counterexample, if there is an actual error. To verify state-of-the-art FTDAs, further improvement is needed. In this paper, we encode bounded executions over integer counters in SMT. We introduce a new form of offline partial order reduction that exploits acceleration and the structure of the FTDAs. This aggressively prunes the execution space to be explored by the solver. In this way, we verified safety of seven FTDAs that were out of reach before.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.