We propose an algorithm for average consensus over a directed graph which is both fully asynchronous and robust to unreliable communications. We show its convergence to the average while allowing for slowly growing, and hence potentially unbounded, communication failures.

I. INTRODUCTION

Consider a set of agents whose goal is to reach consensus by exchanging information locally with their neighbors over a directed graph. There is a large body of work on consensus algorithms. Ordinary consensus has been shown to converge asymptotically under various scenarios, such as growing intercommunication intervals [1] and the presence of delays and/or unbounded intercommunication intervals [2]. Another problem of interest, for which extensive research has been carried out, is average consensus. While most related works study asymptotic convergence, [3] studies average consensus in a finite number of steps. Push-sum, first proposed in [4], is one of many algorithms for average consensus. It has been widely used to develop protocols that reach average consensus under different assumptions and scenarios, such as the presence of bounded delays [5], time-varying graphs [6], [7], or asynchronous communication [8].

Since reliable communication is a restrictive assumption in network applications, and often expensive to enforce, recent work has considered algorithms that reach consensus when communication between agents is unreliable. While push-sum might not converge to the average in this case, exponential convergence still holds, and the error between the final value and the true average can be characterized [9]. In [10], Vaidya et al. introduce the technique of running sums (counters) and modify push-sum to overcome possible packet drops and imprecise knowledge of the network in a synchronous communication setting. They prove almost sure convergence of their algorithms using weak ergodicity.
Inspired by [10], [11] takes this further and develops an asynchronous algorithm for average consensus which is robust to unreliable communication. This algorithm uses a broadcast, asymmetric communication protocol; that is, at each iteration only one node is allowed to wake up and transmit information to its neighbors. Exponential convergence of this algorithm is proved under bounded consecutive link failures and bounded node update delays.

Consensus and average consensus also have many applications in other algorithms; they can be used as building blocks to develop distributed optimization algorithms [12], [13]. For example, in [14] the authors use a robust version of push-sum as a building block to develop an asynchronous Newton-based distributed optimization algorithm that is robust to packet losses.

Many available works in the literature assume bounded intercommunication intervals, which motivated us to study and explore sufficient connectivity conditions that allow intercommunication intervals to grow slowly and potentially be unbounded. We propose logarithmically growing upper bounds which guarantee convergence.
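For concreteness, the push-sum protocol referenced throughout can be sketched as follows. This is a minimal synchronous version of the classical ratio-consensus form of [4], not the robust asynchronous variant proposed here; the function and variable names are illustrative assumptions.

```python
def push_sum(values, out_neighbors, iters=200):
    """Synchronous push-sum over a directed graph.

    values: initial value held by each node.
    out_neighbors[i]: list of node i's out-neighbors.
    Each node splits its running mass x_i and weight y_i equally among
    itself and its out-neighbors; the ratio x_i / y_i converges to the
    global average when the graph is strongly connected.
    """
    n = len(values)
    x = [float(v) for v in values]   # running "mass"
    y = [1.0] * n                    # running "weight"
    for _ in range(iters):
        nx, ny = [0.0] * n, [0.0] * n
        for i in range(n):
            targets = out_neighbors[i] + [i]   # send to out-neighbors and self
            sx, sy = x[i] / len(targets), y[i] / len(targets)
            for j in targets:
                nx[j] += sx
                ny[j] += sy
        x, y = nx, ny
    return [x[i] / y[i] for i in range(n)]

# Strongly connected directed ring on 4 nodes; average of the values is 2.5.
est = push_sum([1.0, 2.0, 3.0, 4.0], [[1], [2], [3], [0]])
```

Because each node divides its mass over its recipients, the update matrix is column-stochastic, so the totals of x and y are conserved and every ratio estimate converges to the true average.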
We consider speeding up stochastic gradient descent (SGD) by parallelizing it across multiple workers. We assume the same data set is shared among N workers, who can take SGD steps and coordinate with a central server. While it is possible to obtain a linear reduction in the variance by averaging all the stochastic gradients at every step, this requires a lot of communication between the workers and the server, which can dramatically reduce the gains from parallelism. The Local SGD method, proposed and analyzed in the earlier literature, suggests machines should make many local steps between such communications. While the initial analysis of Local SGD showed it needs Ω(√T) communications for T local gradient steps in order for the error to scale proportionately to 1/(NT), this has been successively improved in a string of papers, with the state of the art requiring Ω(N · poly(log T)) communications. In this paper, we suggest a Local SGD scheme that communicates less overall by communicating less frequently as the number of iterations grows. Our analysis shows that this can achieve an error that scales as 1/(NT) with a number of communications that is completely independent of T. In particular, we show that Ω(N) communications are sufficient. Empirical evidence suggests this bound is close to tight, as we further show that √N or N^(3/4) communications fail to achieve linear speed-up in simulations. Moreover, we show that under mild assumptions, the main of which is twice differentiability in a neighborhood of the optimal solution, one-shot averaging, which uses only a single round of communication, can also achieve the optimal convergence rate asymptotically.

Preprint. Under review.
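The idea of communicating less frequently as iterations grow can be sketched in a few lines. The schedule below (exponentially spaced averaging rounds), the step size, and all names are illustrative assumptions, not the paper's exact scheme or rates: workers take independent noisy gradient steps and average their iterates only at the listed rounds.

```python
import numpy as np

def local_sgd(grad, x0, n_workers, steps, comm_times, lr=0.05, noise=0.1, seed=0):
    """Local SGD sketch: workers run noisy gradient steps in parallel and
    average their iterates only at the rounds listed in comm_times."""
    rng = np.random.default_rng(seed)
    xs = [np.array(x0, dtype=float) for _ in range(n_workers)]
    comm = set(comm_times)
    for t in range(1, steps + 1):
        for i in range(n_workers):
            g = grad(xs[i]) + noise * rng.standard_normal(xs[i].shape)
            xs[i] = xs[i] - lr * g
        if t in comm:                        # infrequent averaging round
            avg = sum(xs) / n_workers
            xs = [avg.copy() for _ in range(n_workers)]
    return sum(xs) / n_workers

# Quadratic f(x) = ||x||^2 / 2 with minimizer 0; rounds grow sparser over time,
# so only 9 communications occur across 512 steps.
grad = lambda x: x
rounds = [2 ** k for k in range(1, 10)]      # averaging at steps 2, 4, 8, ..., 512
x_final = local_sgd(grad, [5.0], n_workers=8, steps=512, comm_times=rounds)
```

Averaging across workers reduces the variance contributed by the gradient noise, while the widening gaps between rounds keep the total communication count small relative to the number of local steps.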