The concept of partial synchrony in a distributed system is introduced. Partial synchrony lies between the cases of a synchronous system and an asynchronous system. In a synchronous system, there is a known fixed upper bound Δ on the time required for a message to be sent from one processor to another and a known fixed upper bound Φ on the relative speeds of different processors. In an asynchronous system no fixed upper bounds Δ and Φ exist. In one version of partial synchrony, fixed bounds Δ and Φ exist, but they are not known a priori. The problem is to design protocols that work correctly in the partially synchronous system regardless of the actual values of the bounds Δ and Φ. In another version of partial synchrony, the bounds are known, but are only guaranteed to hold starting at some unknown time T, and protocols must be designed to work correctly regardless of when time T occurs. Fault-tolerant consensus protocols are given for various cases of partial synchrony and various fault models. Lower bounds that show in most cases that our protocols are optimal with respect to the number of faults tolerated are also given. Our consensus protocols for partially synchronous processors use new protocols for fault-tolerant “distributed clocks” that allow partially synchronous processors to reach some approximately common notion of time.
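The first version of partial synchrony (a bound Δ exists but is unknown) can be illustrated with a small sketch. This is not the protocol from the paper. It only shows one common way a process might cope with an unknown bound: it waits with a timeout that doubles every round, so that some round eventually has a timeout at least as large as the true delay. The function name `first_successful_round` and its parameters are hypothetical, chosen for this illustration.

```python
def first_successful_round(actual_delay: int, initial_timeout: int = 1) -> tuple[int, int]:
    """Return (round_index, timeout) for the first round whose timeout
    is at least actual_delay.

    actual_delay stands in for the unknown bound on message delay.
    The waiting process never reads it directly; the comparison below
    only models "did the message arrive before the timeout expired".
    """
    if actual_delay < 0:
        raise ValueError("actual_delay must be non-negative")
    if initial_timeout <= 0:
        raise ValueError("initial_timeout must be positive")

    round_index = 0
    timeout = initial_timeout
    # Double the timeout until it is long enough for the message to arrive.
    while timeout < actual_delay:
        timeout *= 2
        round_index += 1
    return round_index, timeout


if __name__ == "__main__":
    for delay in (1, 5, 8, 100):
        rounds, timeout = first_successful_round(delay)
        print(f"delay={delay}: succeeds in round {rounds} with timeout {timeout}")
```

Because the timeout grows without limit, the loop terminates for every finite delay, which mirrors the requirement that a protocol work regardless of the actual value of the bound.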
The purpose of this paper is a study of computation that can be done locally in a distributed network, where “locally” means within time (or distance) independent of the size of the network. Locally Checkable Labeling (LCL) problems are considered, where the legality of a labeling can be checked locally (e.g., coloring). The results include the following: There are non-trivial LCL problems that have local algorithms. There is a variant of the dining philosophers problem that can be solved locally. Randomization cannot make an LCL problem local; i.e., if a problem has a local randomized algorithm then it has a local deterministic algorithm. It is undecidable, in general, whether a given LCL has a local algorithm. However, it is decidable whether a given LCL has an algorithm that operates in a given time t. Any LCL problem that has a local algorithm has one that is order-invariant (the algorithm depends only on the order of the processor IDs).
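The coloring example mentioned in the abstract can be made concrete with a minimal sketch of what "checked locally" means: each node inspects only its own label and those of its immediate neighbors, and the labeling is legal exactly when every node accepts. The graph representation (an adjacency dictionary) and the function names `node_accepts` and `labeling_is_legal` are assumptions made for this illustration, not notation from the paper.

```python
def node_accepts(node, adjacency, labeling):
    """Local check for proper coloring: a node accepts if none of its
    neighbors carries the same label. Only radius-1 information is used."""
    return all(labeling[node] != labeling[neighbor] for neighbor in adjacency[node])


def labeling_is_legal(adjacency, labeling):
    """A labeling is legal if and only if every node's local check passes."""
    return all(node_accepts(node, adjacency, labeling) for node in adjacency)


if __name__ == "__main__":
    # A 4-cycle: 0 - 1 - 2 - 3 - 0
    cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    proper = {0: "red", 1: "blue", 2: "red", 3: "blue"}
    improper = {0: "red", 1: "red", 2: "blue", 3: "blue"}
    print(labeling_is_legal(cycle, proper))
    print(labeling_is_legal(cycle, improper))
```

Note that this only verifies a given labeling. Whether a legal labeling can also be computed locally is the separate question the abstract addresses.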
Reaching agreement is a primitive of distributed computing. While this poses no problem in an ideal, failure-free environment, it imposes certain constraints on the capabilities of an actual system: a system is viable only if it permits the existence of consensus protocols tolerant to some number of failures. Fischer, Lynch and Paterson [FLP] have shown that in a completely asynchronous model, even one failure cannot be tolerated. In this paper we extend their work, identifying several critical system parameters, including various synchronicity conditions, and examine how varying these affects the number of faults which can be tolerated. Our proofs expose general heuristic principles that explain why consensus is possible in certain models but not possible in others.