Chandy and Lamport proposed a distributed snapshot algorithm, referred to here as CLDSA. One desired property of CLDSA is the following: let s1 be the state in which CLDSA is initiated, s2 the state in which CLDSA terminates, and s* the snapshot taken; then s* is reachable from s1, and s2 is reachable from s*. This property is called the distributed snapshot reachability (DSR) property. We give a more faithful formal definition of the property that involves two state machines, MUDS and CL(MUDS), where MUDS is a state machine of an underlying distributed system (UDS) and CL(MUDS) is a state machine of the UDS with CLDSA superimposed on it. By contrast, the definition of the DSR property used in an existing study involves only CL(MUDS). We also prove a theorem on the equivalence of the two definitions, which guarantees the validity of the model checking approach used in the existing study.
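The reachability condition in the DSR property can be illustrated with a minimal sketch. It assumes a state machine given as an explicit, finite transition graph; the names `reachable`, `satisfies_dsr`, and the toy graph are hypothetical and are not taken from the paper, which works with formal state machines rather than Python dictionaries.

```python
from collections import deque


def reachable(transitions, src, dst):
    """Return True if dst can be reached from src in zero or more steps.

    `transitions` maps each state to an iterable of successor states
    (an assumed encoding, chosen only for this illustration).
    """
    seen = {src}
    queue = deque([src])
    while queue:
        state = queue.popleft()
        if state == dst:
            return True
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def satisfies_dsr(transitions, s1, s_star, s2):
    """Check the two reachability conditions of the DSR property:
    s* is reachable from s1, and s2 is reachable from s*."""
    return reachable(transitions, s1, s_star) and reachable(transitions, s_star, s2)


# Hypothetical toy transition graph standing in for a UDS state machine.
toy = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
```

With this toy graph, `satisfies_dsr(toy, "a", "b", "d")` holds, while `satisfies_dsr(toy, "b", "c", "d")` does not, because `c` is not reachable from `b`.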
This paper proposes an approach to the specification and model checking of a large, important class of distributed algorithms called control algorithms (CAs), which are superimposed on underlying distributed systems (UDSs). The approach is based on rewriting logic and moves from its object level to its meta-level. We introduce the idea of specifying CAs as meta-programs that take the specifications of UDSs and automatically generate the specifications of the UDSs on which the CAs are superimposed (UDS-CAs). Because of the many options available, such as network topologies, a UDS has many instances even when the number of each kind of entity, such as mobile support stations (MSSs) and mobile hosts (MHs) in a mobile checkpointing algorithm, is fixed. To address this problem, we generate all possible initial states of a UDS for a fixed number of each kind of entity such that some constraints are fulfilled, for example that the MSSs are strongly connected by a wired network, and we conduct model checking for each of these initial states. We demonstrate the usefulness of the approach by reporting on a case study in which a counterexample is found for some specific initial states but not for the others, which reveals a subtle flaw lurking in a mobile checkpointing algorithm.
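The idea of generating every initial state that meets a connectivity constraint can be sketched as a brute-force enumeration. This is only an illustration under stated assumptions: the wired network is modeled as a directed graph over numbered nodes, "strongly connected" is read in the usual graph-theoretic sense, and the function names are hypothetical. The paper itself generates initial states within rewriting logic, not with code like this.

```python
from itertools import product


def is_strongly_connected(nodes, edges):
    """Return True if every node reaches every other node along directed edges."""
    fwd = {n: set() for n in nodes}
    bwd = {n: set() for n in nodes}
    for u, v in edges:
        fwd[u].add(v)
        bwd[v].add(u)

    def reach(start, adj):
        # Depth-first search collecting all nodes reachable from start.
        seen = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    root = nodes[0]
    # Strongly connected iff the root reaches all nodes and all nodes reach the root.
    return len(reach(root, fwd)) == len(nodes) and len(reach(root, bwd)) == len(nodes)


def strongly_connected_topologies(n):
    """Yield every directed edge set over n >= 1 nodes that is strongly connected."""
    nodes = list(range(n))
    possible = [(u, v) for u in nodes for v in nodes if u != v]
    for mask in product([False, True], repeat=len(possible)):
        edges = [e for e, keep in zip(possible, mask) if keep]
        if is_strongly_connected(nodes, edges):
            yield edges
```

Each yielded edge list would correspond to one candidate initial topology to be model checked separately. The enumeration is exponential in the number of possible edges, so a sketch like this is practical only for very small node counts.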
The first distributed snapshot algorithm was invented by Chandy and Lamport and is known as the Chandy-Lamport distributed snapshot algorithm (CLDSA). Distributed snapshot algorithms are crucial components for making distributed systems fault tolerant. Such algorithms are extremely important because many key modern software systems are distributed systems and should be fault tolerant. There are at least two desired properties that such algorithms should satisfy: 1) the distributed snapshot reachability property (called the DSR property) and 2) the ability to run concurrently with, but not alter, an underlying distributed system (UDS). This paper identifies subtle errors in a paper on the formalization of the DSR property and shows how to correct them. We give a more faithful formal definition of the DSR property. The definition involves two state machines: M_UDS, which formalizes a UDS, and M_CLDSA, which formalizes the UDS on which CLDSA is superimposed (UDS-CLDSA). It can be used for more precise model checking of the DSR property for CLDSA. We also prove a theorem on the equivalence of our new definition and an existing one that involves only M_CLDSA, which guarantees the validity of the existing model checking approach. Moreover, we prove the second property, namely that CLDSA does not alter the behaviors of the UDS.