We study distributed optimization problems in which N nodes minimize the sum of their individual costs subject to a common vector variable. The costs are convex and have Lipschitz continuous gradients (with constant L) and bounded gradients. We propose two fast distributed gradient algorithms based on the centralized Nesterov gradient algorithm and establish their convergence rates in terms of the per-node communications K and the per-node gradient evaluations k. Our first method, Distributed Nesterov Gradient, achieves rates O(log K/K) and O(log k/k).
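As a rough illustration of what a distributed Nesterov-type gradient iteration can look like, here is a minimal sketch. The update form, the step size c/(k+1), the momentum k/(k+3), the lazy ring weights, and the scalar costs f_i(x) = sqrt(1 + (x - a_i)^2) are all assumptions chosen for the example (the abstract does not spell them out); the costs were picked only because they are convex with Lipschitz continuous and bounded gradients.

```python
import numpy as np


def lazy_ring_weights(n):
    """Symmetric, doubly stochastic, positive definite weights on a ring.

    Assumed topology for illustration only; requires n >= 3.
    """
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 2.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 6.0
        W[i, (i + 1) % n] = 1.0 / 6.0
    return W


def local_gradients(y, a):
    """Gradients of the hypothetical costs f_i(x) = sqrt(1 + (x - a_i)^2).

    Each gradient is bounded by 1 and is Lipschitz continuous with constant 1.
    """
    return (y - a) / np.sqrt(1.0 + (y - a) ** 2)


def distributed_nesterov_gradient(a, W, num_iters, c=0.5):
    """Assumed iteration: neighbor averaging, a diminishing gradient step,
    then a Nesterov-style momentum extrapolation at every node."""
    x = np.zeros(len(a))  # x[i] is node i's current estimate
    y = np.zeros(len(a))  # y[i] is node i's extrapolated (momentum) point
    for k in range(num_iters):
        alpha = c / (k + 1)   # diminishing step size (assumption)
        beta = k / (k + 3)    # momentum weight (assumption)
        x_next = W @ y - alpha * local_gradients(y, a)
        y = x_next + beta * (x_next - x)
        x = x_next
    return x


a = np.array([-1.0, 0.0, 1.0, 2.0])  # hypothetical local data
x_final = distributed_nesterov_gradient(a, lazy_ring_weights(4), num_iters=2000)
```

By symmetry of the data, the sum of these four costs is minimized at x = 0.5, so the entries of `x_final` can be compared against that value.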
We study distributed optimization where nodes cooperatively minimize the sum of their individual, locally known, convex costs f_i(x), where the variable x ∈ R^d is global. Distributed augmented Lagrangian (AL) methods have good empirical performance on several signal processing and learning applications, but there is limited understanding of their convergence rates and how they depend on the underlying network. This paper establishes globally linear (geometric) convergence rates of a class of deterministic and randomized distributed AL methods, when the f_i's are twice continuously differentiable and have bounded Hessians. We give explicit dependence of the convergence rates on the underlying network parameters. Simulations illustrate our analytical findings.
Recently, there has been significant progress in the development of distributed first order methods. At least two different types of methods, designed from very different perspectives, have been proposed that achieve both exact and linear convergence when a constant step size is used, a favorable feature that was not achievable by most prior methods. In this paper, we unify, generalize, and improve the convergence speed of these exact distributed first order methods. We first carry out a novel unifying analysis that sheds light on how the different existing methods compare. The analysis reveals that a major difference between the methods is in how a past dual gradient of an associated augmented Lagrangian dual function is weighted. We then capitalize on the insights from the analysis to derive a novel method, with a tuned past gradient weighting, that improves upon the existing methods. For the proposed generalized method, we establish a global R-linear convergence rate under strongly convex costs with Lipschitz continuous gradients.
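To make "exact convergence with a constant step size" concrete, the sketch below runs one widely known exact first order scheme, usually called gradient tracking. Whether it coincides with either of the two method types referred to above is not stated here, so treat it as an illustrative stand-in; the quadratic costs f_i(x) = 0.5 (x - a_i)^2, the weight matrix, and the step size are assumptions.

```python
import numpy as np


def gradient_tracking(a, W, alpha=0.1, num_iters=300):
    """Gradient tracking with a constant step size (illustrative stand-in).

    Node i holds the hypothetical cost f_i(x) = 0.5 * (x - a_i)^2, so the
    minimizer of the sum is the average of the a_i.
    """
    def grad(x):
        return x - a  # stacked local gradients

    x = np.zeros_like(a)
    s = grad(x)  # s[i] tracks the network-average gradient at node i
    for _ in range(num_iters):
        x_next = W @ x - alpha * s                # consensus step plus tracked direction
        s = W @ s + grad(x_next) - grad(x)        # update the gradient tracker
        x = x_next
    return x


# Assumed lazy weights on a 4-node ring: symmetric and doubly stochastic.
W = np.array([
    [2 / 3, 1 / 6, 0.0, 1 / 6],
    [1 / 6, 2 / 3, 1 / 6, 0.0],
    [0.0, 1 / 6, 2 / 3, 1 / 6],
    [1 / 6, 0.0, 1 / 6, 2 / 3],
])
a = np.array([1.0, 2.0, 3.0, 6.0])  # hypothetical local data
x_final = gradient_tracking(a, W)
```

Unlike a plain distributed gradient method with a constant step, every entry of `x_final` here approaches the exact minimizer `a.mean()` rather than a neighborhood of it.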
We study distributed optimization in networked systems, where nodes cooperate to find the optimal quantity of common interest, x = x⋆. The objective function of the corresponding optimization problem is the sum of the nodes' private (known only by the node itself) convex objectives, and each node imposes a private convex constraint on the allowed values of x. We solve this problem for generic connected network topologies with asymmetric random link failures with a novel distributed, decentralized algorithm. We refer to this algorithm as AL-G (augmented Lagrangian gossiping), and to its variants as AL-MG. We prove convergence for all proposed algorithms and demonstrate by simulations their effectiveness on two applications: ℓ1-regularized logistic regression for classification and cooperative spectrum sensing for cognitive radio networks.
We study, by large deviations analysis, the asymptotic performance of Gaussian running consensus based distributed detection over random networks; in other words, we determine the exponential decay rate of the detection error probability. With running consensus, at each time step, each sensor averages its decision variable with its neighbors' decision variables and accounts on-the-fly for its new observation. We show that: 1) when the rate of network information flow (the speed of averaging) is above a threshold, Gaussian running consensus is asymptotically equivalent to the optimal centralized detector, i.e., the exponential decay rate of the error probability for running consensus equals the Chernoff information; and 2) when the rate of information flow is below the threshold, running consensus achieves only a fraction of the Chernoff information rate. We quantify this achievable rate as a function of the network rate of information flow. Simulation examples demonstrate our theoretical findings on the behavior of running consensus based detection over random networks.
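The per-step behavior described above (average with neighbors, then fold in the new observation) can be sketched as follows. This is a minimal simulation under stated assumptions, not the paper's exact setup: the Gaussian shift-in-mean model, the recursion weights (k-1)/k and 1/k, the complete graph with independent symmetric link failures, the Laplacian-based weights, and the zero decision threshold are all chosen for illustration.

```python
import numpy as np


def running_consensus_detection(num_nodes=5, num_steps=200, mean=1.0, sigma=1.0,
                                link_prob=0.7, hypothesis=1, seed=0):
    """Simulate running-consensus detection of H1: N(mean, sigma^2) vs H0: N(0, sigma^2).

    Returns each node's decision variable; node i decides H1 if its entry is positive.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(num_nodes)
    eps = 1.0 / num_nodes  # keeps I - eps * Laplacian entrywise nonnegative
    for t in range(1, num_steps + 1):
        # New observations and their log-likelihood ratios (the innovations).
        obs = rng.normal(hypothesis * mean, sigma, size=num_nodes)
        llr = (mean / sigma ** 2) * (obs - mean / 2.0)
        # Random symmetric link failures: each link is alive with probability link_prob.
        upper = np.triu(rng.random((num_nodes, num_nodes)) < link_prob, k=1)
        adjacency = (upper | upper.T).astype(float)
        laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
        W = np.eye(num_nodes) - eps * laplacian
        # Fold in the new observation on the fly, then average with current neighbors.
        x = W @ (((t - 1) / t) * x + llr / t)
    return x


decisions_h1 = running_consensus_detection(hypothesis=1) > 0
decisions_h0 = running_consensus_detection(hypothesis=0) > 0
```

Because each sampled weight matrix is doubly stochastic, the network-wide average of the decision variables equals the running average of all log-likelihood ratios, which is what a centralized detector would threshold.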
We design the weights in consensus algorithms with spatially correlated random topologies. These arise with: 1) networks with spatially correlated random link failures and 2) networks with randomized averaging protocols. We show that the weight optimization problem is convex for both symmetric and asymmetric random graphs. With symmetric random networks, we choose the consensus mean squared error (MSE) convergence rate as the optimization criterion and explicitly express this rate as a function of the link formation probabilities, the link formation spatial correlations, and the consensus weights. We prove that the MSE convergence rate is a convex, nonsmooth function of the weights, enabling global optimization of the weights for arbitrary link formation probabilities and link correlation structures. We extend our results to the case of asymmetric random links, where we adopt as the optimization criterion the mean squared deviation (MSdev) of the nodes' states from the current average state. We prove that MSdev is a convex function of the weights. Simulations show that our weight design method achieves a significant performance gain over methods available in the literature.
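A small numerical sketch may help picture how a convergence-rate criterion depends on the consensus weights. It makes several assumptions that go beyond the abstract: the rate is measured as the largest eigenvalue of E[W^2] - J (with J the averaging matrix), all links share one scalar weight, links fail independently on a complete graph (spatially correlated failures would only change the sampling step), and the expectation is estimated by Monte Carlo rather than by a closed-form expression.

```python
import numpy as np


def mse_rate_estimate(weight, num_nodes=4, link_prob=0.5, num_samples=2000, seed=0):
    """Monte Carlo estimate of lambda_max(E[W^2] - J) for W = I - weight * Laplacian.

    The rate definition and the link model are assumptions made for illustration.
    """
    rng = np.random.default_rng(seed)
    J = np.ones((num_nodes, num_nodes)) / num_nodes
    second_moment = np.zeros((num_nodes, num_nodes))
    for _ in range(num_samples):
        # Sample a symmetric random graph: each link is alive with probability link_prob.
        upper = np.triu(rng.random((num_nodes, num_nodes)) < link_prob, k=1)
        adjacency = (upper | upper.T).astype(float)
        laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
        W = np.eye(num_nodes) - weight * laplacian
        second_moment += W @ W
    second_moment /= num_samples
    return np.linalg.eigvalsh(second_moment - J).max()


# Scan a grid of candidate weights and keep the one with the smallest estimated rate.
candidate_weights = np.linspace(0.0, 0.5, 11)
rates = [mse_rate_estimate(w) for w in candidate_weights]
best_weight = candidate_weights[int(np.argmin(rates))]
```

A grid scan is used here only for readability; since the criterion is convex in the weights, a convex solver or subgradient method would be the natural tool for the general per-link weight design.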
We establish the large deviations asymptotic performance (error exponent) of consensus+innovations distributed detection over random networks with generic (non-Gaussian) sensor observations. At each time instant, sensors 1) combine their decision variables with those of their neighbors (consensus) and 2) assimilate their new observations (innovations). This paper shows for general non-Gaussian distributions that consensus+innovations distributed detection exhibits a phase transition behavior with respect to the network degree of connectivity. Above a threshold, distributed detection is as good as centralized detection, with the same optimal asymptotic detection performance, but, below the threshold, distributed detection is suboptimal with respect to centralized detection. We determine this threshold and quantify the performance loss below threshold. Finally, we show the dependence of the threshold and performance on the distribution of the observations: distributed detectors over the same random network, but with different observation distributions, for example, Gaussian, Laplace, or quantized, may have different asymptotic performance, even when the corresponding centralized detectors have the same asymptotic performance. the network connectivity |log r|, the optimal detector threshold is γ = 0, mimicking the (asymptotically) optimal threshold for the centralized detector. However, below the critical connectivity, we show by a numerical example that the optimal distributed detector threshold might be nonzero. Brief review of the literature. Distributed detection has been extensively studied, in the context of parallel fusion architectures, e.
We consider distributed optimization problems where networked nodes cooperatively minimize the sum of their locally known convex costs. A popular class of methods for solving these problems is the distributed gradient methods, which are attractive due to their inexpensive iterations but suffer from slow convergence rates. This motivates the incorporation of second order information in the distributed methods, but this task is challenging: although the Hessians that arise in the algorithm design respect the sparsity of the network, their inverses are dense, which renders distributed implementations difficult. We overcome this challenge and propose a class of distributed Newton-like methods, which we refer to as Distributed Quasi Newton (DQN). The DQN family approximates the Hessian inverse by: 1) splitting the Hessian into its diagonal and off-diagonal parts, 2) inverting the diagonal part, and 3) approximating the inverse of the off-diagonal part through a weighted linear function. The approximation is parameterized by the tuning variables, which correspond to different splittings of the Hessian and to different weightings of the off-diagonal Hessian part. Specific choices of the tuning variables give rise to different variants of the proposed general DQN method, dubbed DQN-0, DQN-1, and DQN-2, which trade off communication and computational costs against convergence speed. Simulations demonstrate the effectiveness of the proposed DQN methods. Here, J_logis(·) is the logistic loss J_logis(z) = log(1 + e^{−z}), and τ is a positive regularization parameter. Note that, in this example, we have
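The diagonal/off-diagonal splitting idea in the DQN abstract above can be illustrated with a generic first-order (Neumann-type) approximation of a matrix inverse. This is not the DQN update itself: the scalar weight `theta`, the function name, and the small tridiagonal test matrix are hypothetical choices for the sketch.

```python
import numpy as np


def split_inverse_approx(H, theta=1.0):
    """Approximate inv(H) using the splitting H = A - G, with A the diagonal of H.

    Returns inv(A) + theta * inv(A) @ G @ inv(A). With theta = 1 this is the
    first-order truncation of the Neumann series for inv(H); theta = 0 keeps
    only the inverted diagonal. The scalar theta is an illustrative weighting.
    """
    A_inv = np.diag(1.0 / np.diag(H))   # inverting the diagonal part is local and cheap
    G = np.diag(np.diag(H)) - H         # off-diagonal part, same sparsity as the network
    return A_inv + theta * A_inv @ G @ A_inv


# Hypothetical sparse, diagonally dominant "Hessian" on a 3-node path graph.
H = np.array([
    [4.0, -1.0, 0.0],
    [-1.0, 4.0, -1.0],
    [0.0, -1.0, 4.0],
])
exact_inverse = np.linalg.inv(H)
err_diagonal_only = np.linalg.norm(exact_inverse - split_inverse_approx(H, theta=0.0), 2)
err_first_order = np.linalg.norm(exact_inverse - split_inverse_approx(H, theta=1.0), 2)
```

The approximation only involves products with the diagonal and off-diagonal parts, so it keeps the sparsity pattern of H; on this example the first-order correction gives a smaller spectral-norm error than the diagonal-only inverse.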