P. M. Melliar-Smith scite author profile

Algorithms are described for maintaining clock synchrony in a distributed multiprocess system where each process has its own clock. These algorithms work in the presence of arbitrary clock or process failures, including “two-faced clocks” that present different values to different processes. Two of the algorithms require that fewer than one-third of the processes be faulty. A third algorithm works if fewer than half the processes are faulty, but requires digital signatures.
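The fault bounds in this abstract can be turned into a small arithmetic check. The sketch below is only an illustration of the two thresholds stated above (fewer than one-third faulty without signatures, fewer than half with digital signatures); the function names `max_faulty` and `min_processes` are hypothetical and do not come from the paper.

```python
def max_faulty(n: int, signatures: bool = False) -> int:
    """Largest number of faulty processes f that n processes can tolerate.

    Without signatures the abstract requires f < n/3; with digital
    signatures it requires f < n/2. Both names here are illustrative.
    """
    if n < 1:
        raise ValueError("need at least one process")
    divisor = 2 if signatures else 3
    # f < n/divisor  is equivalent to  f <= (n - 1) // divisor for integers
    return (n - 1) // divisor


def min_processes(f: int, signatures: bool = False) -> int:
    """Smallest number of processes needed to tolerate f faulty ones."""
    if f < 0:
        raise ValueError("fault count cannot be negative")
    divisor = 2 if signatures else 3
    # n > divisor * f  means  n >= divisor * f + 1
    return divisor * f + 1
```

For example, under these bounds four processes can tolerate one faulty clock without signatures, while three processes suffice for one fault when signatures are available.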

The Totem single-ring ordering and membership protocol

Amir

Moser

Melliar-Smith

et al. 1995

ACM Trans. Comput. Syst.

226

195

Fault-tolerant distributed systems are becoming more important, but in existing systems, maintaining the consistency of replicated data is quite expensive. The Totem single-ring protocol supports consistent concurrent operations by placing a total order on broadcast messages. This total order is derived from the sequence number in a token that circulates around a logical ring imposed on a set of processors in a broadcast domain. The protocol handles reconfiguration of the system when processors fail and restart or when the network partitions and remerges. Extended virtual synchrony ensures that processors deliver messages and configuration changes to the application in a consistent, systemwide total order. An effective flow control mechanism enables the Totem single-ring protocol to achieve message-ordering rates significantly higher than the best prior total-ordering protocols.
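The core ordering idea in this abstract, a total order derived from a sequence number carried in a circulating token, can be sketched in a few lines. This is a heavily simplified, single-process simulation under stated assumptions: no failures, no message loss, no membership changes, and no flow control. The class and function names (`Token`, `Processor`, `circulate`) are hypothetical and are not taken from the Totem implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    seq: int = 0  # highest sequence number assigned so far


@dataclass
class Processor:
    name: str
    pending: list = field(default_factory=list)    # messages waiting for the token
    received: dict = field(default_factory=dict)   # seq -> message, not yet delivered
    delivered: list = field(default_factory=list)  # messages delivered in total order
    next_seq: int = 1                              # next sequence number to deliver

    def on_token(self, token: Token, ring: list) -> None:
        """Only the token holder broadcasts; each message takes the next seq."""
        for payload in self.pending:
            token.seq += 1
            for p in ring:  # simulated broadcast to every processor on the ring
                p.receive(token.seq, (self.name, payload))
        self.pending.clear()

    def receive(self, seq: int, msg: tuple) -> None:
        """Buffer the message, then deliver strictly in sequence-number order."""
        self.received[seq] = msg
        while self.next_seq in self.received:
            self.delivered.append(self.received.pop(self.next_seq))
            self.next_seq += 1


def circulate(ring: list, rounds: int = 1) -> Token:
    """Pass the token around the logical ring for the given number of rounds."""
    token = Token()
    for _ in range(rounds):
        for p in ring:
            p.on_token(token, ring)
    return token
```

Because every message is stamped from the single token, all processors in this sketch deliver the same messages in the same order, which is the property the abstract describes. The real protocol additionally handles retransmission, reconfiguration, and flow control, none of which is modeled here.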

Totem

et al. 1996

Many applications can benefit from distributed systems based on multiple computers interconnected by a communication network. Distributed systems use inexpensive high-performance computers and can be configured closely to the application. Information can be replicated on several processors to improve performance and to provide fault tolerance. However, programming distributed applications is difficult, particularly when replicated information must remain consistent as it is updated in the presence of faults. Since many messages may be required, recovery from faults may introduce delays, making real-time performance objectives difficult to achieve. Ordered multicast group communication systems are a useful infrastructure on which complex distributed applications can be built.

Extended virtual synchrony

et al.

SIFT: Design and analysis of a fault-tolerant computer for aircraft control

Wensley

Lamport

Goldberg

et al. 1978

Proc. IEEE

460

A program structure for error detection and recovery

et al.

Broadcast protocols for distributed systems

Melliar-Smith

Moser

Agrawala

1990

IEEE Trans. Parallel Distrib. Syst.

209

We present an innovative approach to the design of fault-tolerant distributed systems that avoids the several rounds of message exchange required by current protocols for consensus agreement. The approach is based on broadcast communication over a local area network, such as an Ethernet or a token ring, and on two novel protocols, the Trans protocol, which provides efficient reliable broadcast communication, and the Total protocol, which with high probability promptly places a total order on messages and achieves distributed agreement even in the presence of fail-stop, omission, timing, and communication faults. Reliable distributed operations such as locking, update and commitment, typically require only a single broadcast message rather than the several tens of messages required by current algorithms.

... processors agree on exactly the same sequence of broadcast messages. It is easy to demonstrate that placing a total order on broadcast messages, so that every working processor processes the same messages in the same order, provides an immediate solution to the agreement problem. Once this total order is determined, distributed actions can be carried out using simple sequential fault-tolerant algorithms. The strategy is very efficient: for example, locking records in a distributed database typically requires only a single broadcast message to claim a lock and a single broadcast message to release it.
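The locking example can be illustrated with a minimal sketch: once every processor sees claim and release messages in the same total order, each one can run the same simple sequential routine and reach the same lock table without further message exchange. The message format and the function name `apply_lock_messages` are assumptions made for illustration, as is the policy that a losing claim is simply ignored; the paper's own algorithm is not reproduced here.

```python
def apply_lock_messages(ordered_messages):
    """Replay totally ordered lock messages and return the lock table.

    Each message is a (kind, record, processor) tuple, where kind is
    "claim" or "release". This format is hypothetical.
    """
    holders = {}  # record -> processor currently holding its lock
    for kind, record, processor in ordered_messages:
        if kind == "claim":
            # The first claim in the total order wins; later claims on a
            # held record are ignored in this simplified policy.
            holders.setdefault(record, processor)
        elif kind == "release" and holders.get(record) == processor:
            # Only the current holder can release the lock.
            del holders[record]
    return holders
```

Since the routine is deterministic and every processor replays the same sequence, all processors agree on who holds each lock after one broadcast to claim and one to release.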

P. M. Melliar-Smith

An analysis of the optimum node density for ad hoc mobile networks

Synchronizing clocks in the presence of faults
