“…Our experiments show that this algorithm fails in the presence of 3 faulty processes, i.e., (C) and (R) are violated. Table 2 summarizes our experiments for the algorithms in [9], [6], and [27]. The specification (F) is related to agreement and was also used in [17].…”
Section: Experiments With Spin (mentioning)
confidence: 99%
“…As there are at least n − t correct processes, the guard cannot be blocked by faulty processes, which avoids the problems of Example (b). In the distributed algorithms literature, one finds a variety of different thresholds: Typical numbers are n/2 + 1 (for majority [13,27]), t + 1 (to wait for a message from at least one correct process [34,13]), or n − t (in the Byzantine case [34,2] to wait for at least t + 1 messages from correct processes, provided n > 3t).…”
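The thresholds named in the snippet above can be made concrete with a small sketch. It is illustrative only and is not taken from the cited paper: the function names `thresholds` and `min_correct_senders` are hypothetical, and reading n/2 as integer (floor) division is an assumption. The second function checks the quoted claim that waiting for n − t messages yields at least t + 1 messages from correct processes when n > 3t.

```python
def thresholds(n: int, t: int) -> dict:
    """Typical threshold values for n processes, of which at most t are faulty.

    Assumption: n/2 is read as integer (floor) division.
    """
    return {
        "majority": n // 2 + 1,     # more than half of all processes
        "one_correct": t + 1,       # at least one sender must be correct
        "byzantine_quorum": n - t,  # as many messages as can safely be awaited
    }


def min_correct_senders(n: int, t: int) -> int:
    """Worst case: of n - t received messages, up to t come from faulty processes."""
    return (n - t) - t


if __name__ == "__main__":
    n, t = 4, 1  # smallest system satisfying n > 3t with one fault
    print(thresholds(n, t))
    print(min_correct_senders(n, t) >= t + 1)
```

With n = 4 and t = 1, the n − t guard waits for 3 messages, of which at least 2 = t + 1 come from correct processes. With n = 3 and t = 1 the resilience condition n > 3t fails and only 1 correct sender is guaranteed.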
“…For the encoding of the algorithm from [6] we were required to use two message types, as opposed to the one type of the echo messages in Algorithm 1. Finally, we implemented the asynchronous condition-based consensus algorithm from [27]. We specialized it to binary consensus, which resulted in an encoding which requires four different message types.…”
Section: Experiments With Spin (mentioning)
confidence: 99%
“…As usual for fault-tolerant algorithms, this block has three logical parts: the receive part (lines 21-24), the computation part (lines 25-32), and the sending part (lines 33-38). As we have already discussed the encoding of message passing above, it remains to discuss the control flow of the algorithm.…”
Section: Encoding the Control Flow (mentioning)
confidence: 99%
“…These algorithms include several variants of the classic asynchronous broadcasting algorithm from [34] under various fault assumptions, the broadcasting algorithm from [6] tolerating Byzantine faults, the classic broadcasting algorithm found, e.g., in [9], that tolerates crash faults, as well as a condition-based consensus algorithm [27] that also tolerates crash faults.…”
Abstract. Fault-tolerant distributed algorithms are central for building reliable, spatially distributed systems. In order to ensure that these algorithms actually make systems more reliable, we must ensure that these algorithms are actually correct. Unfortunately, model checking state-of-the-art fault-tolerant distributed algorithms (such as Paxos) is currently out of reach except for very small systems. In order to be eventually able to automatically verify such fault-tolerant distributed algorithms also in larger systems, several problems have to be addressed. In this paper, we consider modeling and verification of fault-tolerant algorithms that basically only contain threshold guards to control the flow of the algorithm. As threshold guards are widely used in fault-tolerant distributed algorithms (and also in Paxos) efficient methods to handle them bring us closer to the above mentioned goal. As case study we use the reliable broadcasting algorithm by Srikanth and Toueg that tolerates even Byzantine faults. We show how one can model this basic fault-tolerant distributed algorithm in Promela such that safety and liveness properties can be efficiently verified in Spin. We provide experimental data also for other distributed algorithms.
In recent work [12,10], we have introduced a technique for automatic verification of threshold-guarded distributed algorithms that have the following features: (1) up to t of processes may crash or behave Byzantine; (2) the correct processes count messages and progress when they receive sufficiently many messages, e.g., at least t + 1; (3) the number n of processes in the system is a parameter, as well as t; (4) and the parameters are restricted by a resilience condition, e.g., n > 3t. In this paper, we present Byzantine Model Checker that implements the above-mentioned technique. It takes two kinds of inputs, namely, (i) threshold automata (the framework of our verification techniques) or (ii) Parametric Promela (which is similar to the way in which the distributed algorithms were described in the literature). We introduce a parallel extension of the tool, which exploits the parallelism enabled by our technique on an MPI cluster. We compare performance of the original technique and of the extensions by verifying 10 benchmarks that model fault-tolerant distributed algorithms from the literature. For each benchmark algorithm we check two encodings: a manual encoding in threshold automata vs. a Promela encoding.
Threshold guards are a basic primitive of many fault-tolerant algorithms that solve classical problems of distributed computing, such as reliable broadcast, two-phase commit, and consensus. Moreover, threshold guards can be found in recent blockchain algorithms such as Tendermint consensus. In this article, we give an overview of the techniques implemented in Byzantine Model Checker (ByMC). ByMC implements several techniques for automatic verification of threshold-guarded distributed algorithms. These algorithms have the following features: (1) up to t of processes may crash or behave Byzantine; (2) the correct processes count messages and make progress when they receive sufficiently many messages, e.g., at least t + 1; (3) the number n of processes in the system is a parameter, as well as t; (4) and the parameters are restricted by a resilience condition, e.g., n > 3t. Traditionally, these algorithms were implemented in distributed systems with up to ten participating processes. Nowadays, they are implemented in distributed systems that involve hundreds or thousands of processes. To make sure that these algorithms are still correct for that scale, it is imperative to verify them for all possible values of the parameters.