This paper formulates necessary and sufficient conditions on the information required for enforcing causal ordering in a distributed system with asynchronous communication. The paper then presents an algorithm for enforcing causal message ordering. The algorithm allows a process to multicast to arbitrary and dynamically changing process groups. We show that the algorithm is optimal in the space complexity of the overhead of control information in both messages and message logs. The algorithm achieves optimality by transmitting the bare minimum causal dependency information specified by the necessity conditions, and by using an encoding scheme to represent and transmit this information. We show that, in general, the space complexity of causal message ordering in an asynchronous system is Ω(n²), where n is the number of nodes in the system. Although the upper bound on the space complexity of the control-information overhead in the algorithm is O(n²), the overhead is likely to be much smaller on average, and it is always the least possible.
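To make the O(n²) figure concrete, the sketch below shows a simple, non-optimal way to enforce causal ordering of point-to-point messages, in the style of the well-known Raynal-Schiper-Toueg matrix scheme. It is not the algorithm of this paper: every message piggybacks a full n × n matrix of send counts, which is exactly the kind of overhead that an optimal algorithm tries to reduce. The class and method names (`Process`, `send`, `receive`) are hypothetical and chosen only for illustration.

```python
"""Causal ordering with a piggybacked n x n matrix of send counts.

Hypothetical sketch (not the paper's optimal algorithm): each message
carries n*n integers of control information, illustrating O(n^2) overhead.
"""
from dataclasses import dataclass


@dataclass
class Message:
    sender: int
    dest: int
    payload: str
    sent: list[list[int]]  # piggybacked matrix: n*n integers per message


class Process:
    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.n = n
        # sent[k][l]: number of messages from P_k to P_l known to this process
        self.sent = [[0] * n for _ in range(n)]
        # deliv[k]: number of messages from P_k delivered at this process
        self.deliv = [0] * n
        self.buffer: list[Message] = []  # arrived but not yet deliverable
        self.delivered: list[str] = []   # payloads in delivery order

    def send(self, dest: int, payload: str) -> Message:
        # Piggyback a copy of the matrix taken before counting this message.
        msg = Message(self.pid, dest, payload, [row[:] for row in self.sent])
        self.sent[self.pid][dest] += 1
        return msg

    def _deliverable(self, msg: Message) -> bool:
        # All messages to us that causally precede msg must already be delivered.
        return all(self.deliv[k] >= msg.sent[k][self.pid] for k in range(self.n))

    def receive(self, msg: Message) -> None:
        """Handle the arrival of msg; deliver every message that becomes deliverable."""
        self.buffer.append(msg)
        progress = True
        while progress:
            progress = False
            for m in list(self.buffer):
                if self._deliverable(m):
                    self.buffer.remove(m)
                    self._deliver(m)
                    progress = True

    def _deliver(self, msg: Message) -> None:
        self.delivered.append(msg.payload)
        self.deliv[msg.sender] += 1
        # Merge the sender's knowledge of sends into ours (componentwise max).
        for k in range(self.n):
            for l in range(self.n):
                self.sent[k][l] = max(self.sent[k][l], msg.sent[k][l])
        self.sent[msg.sender][self.pid] += 1
```

For example, if P0 sends m1 to P2 and then m2 to P1, and P1 sends m3 to P2 after delivering m2, then m1 → m3 under the happened-before relation. If m3 arrives at P2 first, the sketch buffers it and delivers it only after m1 has been delivered.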
Background and previous work

A distributed system consists of a collection of geographically dispersed autonomous sites connected by a communication network. The sites do not share any memory and communicate solely by message passing. Message propagation delay is finite but unpredictable, and between a pair of sites, messages may be delivered out of order. There is no common physical clock.

The execution of a process at a site is modeled by three types of events, namely, message send, message delivery¹, and internal events. An internal event represents a local computation at the process, whereas message send and delivery events establish cause and effect relationships among the processes. The cause and effect relationship between the events of a distributed execution is captured by the happened-before, or causality, relation (→) [14], which defines a partial order on the events.

The results of this paper appear in [10], and a brief announcement of the optimal implementation of the results appears in [11].

Correspondence to: A.D. Kshemkalyani

¹ It is important to distinguish between the arrival of a message and its delivery. The arrival of a message signifies that the communication …

There exist several paradigms for ordered delivery of messages in a distributed system (see Fig. 1). Synchronous communication between processes, which is tantamount to instantaneous message delivery, simplifies the design, verification, and analysis of distributed applications. However, it results in a loss of concurrency within the distributed application because each message exchange requires a handshake between the sender and the receiver. FIFO and non-FIFO communication on each channel are asynchronous and provide much more concurrency to the distributed application. In exchange, the asynchronous execution of processes and unpredictable communication delays create nondeterminism in distributed systems that complicates the design, verification, and analysis of distributed applications.
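The happened-before relation can be tested mechanically with vector clocks, a standard technique that is not introduced in this paper. The minimal sketch below, with hypothetical names (`VectorClock`, `happened_before`, `concurrent`), timestamps the three event types of the model. Following the footnote's distinction, the clock advances on delivery, not on arrival. Two events are ordered by → exactly when their timestamps are componentwise ordered, and otherwise they are concurrent, which shows why → is only a partial order.

```python
"""Vector clocks for deciding the happened-before relation (->).

Hypothetical sketch: one clock per process, n processes in total.
"""


class VectorClock:
    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.v = [0] * n

    def internal(self) -> tuple[int, ...]:
        # Any local event advances this process's own component.
        self.v[self.pid] += 1
        return tuple(self.v)

    def send(self) -> tuple[int, ...]:
        # A send is a local event; its timestamp is piggybacked on the message.
        return self.internal()

    def deliver(self, ts: tuple[int, ...]) -> tuple[int, ...]:
        # On delivery (not arrival), merge the piggybacked timestamp, then tick.
        self.v = [max(a, b) for a, b in zip(self.v, ts)]
        self.v[self.pid] += 1
        return tuple(self.v)


def happened_before(a: tuple[int, ...], b: tuple[int, ...]) -> bool:
    """a -> b iff a <= b componentwise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a: tuple[int, ...], b: tuple[int, ...]) -> bool:
    """Distinct events unordered by -> in either direction."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)
```

For instance, with two processes, an internal event at P0 stamped (1, 0) and an internal event at P1 stamped (0, 1) are concurrent, while P0's subsequent send (2, 0) happens before P1's delivery of that message, stamped (2, 2).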
To simplify the design and d...