View-oriented group communication is an important and widely used building block for many distributed applications. Much current research has been dedicated to specifying the semantics and services of view-oriented group communication systems (GCSs). However, the guarantees of different GCSs are formulated using varying terminologies and modeling techniques, and the specifications vary in their rigor. This makes it difficult to analyze and compare the different systems. This survey provides a comprehensive set of clear and rigorous specifications, which may be combined to represent the guarantees of most existing GCSs. In the light of these specifications, over 30 published GCS specifications are surveyed. Thus, the specifications serve as a unifying framework for the classification, analysis, and comparison of group communication systems. The survey also discusses over a dozen different applications of group communication systems, shedding light on the usefulness of the presented specifications. This survey is aimed at both system builders and theoretical researchers. The specification framework presented in this article will help builders of group communication systems understand and specify their service semantics; the extensive
We investigate the problem of designing a scalable overlay network to support decentralized topic-based pub/sub communication. We introduce a new optimization problem, called Minimum Topic-Connected Overlay (Min-TCO), that captures the tradeoff between the scalability of the overlay (in terms of the nodes' fanout) and the message forwarding overhead incurred by the communicating parties. Roughly, the Min-TCO problem is as follows: Given a collection of nodes and their subscriptions, connect the nodes using the minimum possible number of edges so that for each topic t, a message published on t could reach all the nodes interested in t by being forwarded by only the nodes interested in t.We show that the decision version of Min-TCO is NPcomplete, and present a polynomial algorithm that approximates the optimal solution within a logarithmic factor with respect to the number of edges in the constructed overlay. We further prove that this approximation ratio is almost tight by showing that no polynomial algorithm can approximate Min-TCO within a constant factor (unless P=NP). We show experimentally that on typical inputs, the fanout of the overlay constructed by our approximation algorithm is significantly lower than that of the overlays built by the existing algorithms, and that its running time is just a small fraction of the analytical worst case bound. As Min-TCO can be shown to capture several important aspects of most known overlay-based pub/sub implementations, our study sheds light on the inherent limitations of the existing systems and provides an insight into the best possible feasible solution.Finally, we introduce a flexible framework that generalizes Min-TCO and formalizes most similar overlay design problems that occur in scalable pub/sub systems. We also briefly discuss several examples of such problems, and show some results with respect to their complexity.
We present Byzantine Disk Paxos, an asynchronous sharedmemory consensus protocol that uses a collection of n > 3t disks, t of which may fail by becoming non-responsive or arbitrarily corrupted. We give two constructions of this protocol; that is, we construct two different building blocks, each of which can be used, along with a leader oracle, to solve consensus. One building block is a shared wait-free safe register. The second building block is a regular register that satisfies a weaker termination (liveness) condition than wait freedom: its write operations are wait-free, whereas its read operations are guaranteed to return only in executions with a finite number of writes. We call this termination condition finite writes (FW), and show that consensus is solvable with FW-terminating registers and a leader oracle. We construct each of these reliable registers from n > 3t base registers, t of which can be non-responsive or Byzantine. All the previous wait-free constructions in this model used at least 4t + 1 fault-prone registers, and we are not familiar with any prior FW-terminating constructions in this model.
The 2008 LADIS workshop on Large Scale Distributed Systems brought together leaders from the commercial cloud computing community with researchers working on a variety of topics in distributed computing. The dialog yielded some surprises: some hot research topics seem to be of limited near-term importance to the cloud builders, while some of their practical challenges seem to pose new questions to us as systems researchers. This brief note summarizes our impressions.
We present a simple and efficient protocol for mutual exclusion in synchronous, message-passing distributed systems subject to failures.
Group communication is a useful abstraction in the development of highly available distributed and communication-oriented applications in wide area networks (WANs). The most important aspects of this abstraction are the dynamic maintenance of group membership and its diverse semantics for interleaving membership change noti cations within the ow of regular messages. In this paper we propose a new architecture for a scalable group membership service for wide area environments. Our architecture provides two di erent service levels and their semantics, each geared to di erent applications with di erent needs: The congress membership service which provides simple semantics of membership approximation, and the moshe service, which extends congress, provides full virtual synchrony semantics. The novelty of our design is in its client-server approach, which allows lightweight clients to bene t from advanced membership services. Furthermore, our design supports the coexistence of full-edged clients along with thin clients.
We present an improvement to the Disk Paxos protocol by Gafni and Lamport which utilizes extended functionality and flexibility provided by Active Disks and supports unmediated concurrent data access by an unlimited number of processes. The solution facilitates coordination by an infinite number of clients using finite shared memory. It is based on a collection of read-modify-write objects with faults, that emulate a new, reliable shared memory abstraction called a ranked register. The required read-modify-write objects are readily available in Active Disks and in Object Storage Device controllers, making our solution suitable for state-of-the-art Storage Area Network (SAN) environments.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.