An Accelerated Method For Decentralized Distributed Stochastic Optimization Over Time-Varying Graphs
Preprint (2021)
DOI: 10.48550/arxiv.2103.15598

Cited by 3 publications (7 citation statements)
References 0 publications
“…For saddle-point problems, by replacing in the Smoothing scheme the batched-consensus Accelerated gradient method [56], which is optimal for decentralized convex problems, with the batched-consensus Extragradient method [7], which is optimal for decentralized convex-concave saddle-point problems, we lose a ∼√d factor in the number of communication rounds in comparison with optimal gradient-free methods for non-smooth decentralized saddle-point problems. To sum up, in distributed optimization, for the first time, we have a situation where the Smoothing scheme generates a non-optimal method from an optimal one.…”
Section: Distributed Optimization (mentioning)
confidence: 99%
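For context on where the ∼√d factor above typically enters smoothing-based constructions, the following standard randomized-smoothing bounds may help; this is a generic sketch (the notation f_γ, M, B_2^d and the choice of the uniform distribution on the unit Euclidean ball are mine, not taken from the quoted works):

\[
  f_\gamma(x) = \mathbb{E}_{e \sim U(B_2^d)}\bigl[f(x + \gamma e)\bigr],
  \qquad
  f(x) \le f_\gamma(x) \le f(x) + \gamma M,
\]
\[
  \bigl\|\nabla f_\gamma(x) - \nabla f_\gamma(y)\bigr\|_2 \le \frac{M\sqrt{d}}{\gamma}\,\|x - y\|_2
  \quad \text{for convex, } M\text{-Lipschitz (in 2-norm) } f .
\]

Taking γ of order ε/M makes f_γ an O(ε)-accurate surrogate whose smoothness constant carries an explicit √d; this is the usual source of the dimension-dependent factors in complexities obtained via smoothing.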
“…surveys [34,17]. In particular, there exists a batched-consensus-projected Accelerated gradient method [56] that, for f from (11) which is µ-strongly convex in the 2-norm and has an L-Lipschitz gradient in the 2-norm, requires…”
Section: Distributed Optimization (mentioning)
confidence: 99%
“…It was shown in [46] that to obtain ε-optimal solutions, the gradient computation complexity is lower bounded by O(√κ · log(1/ε)), and the communication complexity is lower bounded by O(√(κ/θ) · log(1/ε)). To obtain better complexities, many accelerated decentralized gradient-type methods have been developed (e.g., [11,12,16,18,20,21,22,23,24,38,41,42,43,46,54,57,61,62]). There exist dual-based methods such as [46] that achieve optimal complexities.…”
(mentioning)
confidence: 99%
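The excerpt does not define κ and θ; in the decentralized-optimization literature they usually stand for the following quantities, which is the reading assumed here (an assumption about the quoted paper's notation, not a statement from it):

\[
  \kappa = \frac{L}{\mu} \quad \text{(condition number of the local objectives)},
  \qquad
  \theta = 1 - \sigma_2(W) \quad \text{(spectral gap of the gossip matrix } W\text{)}.
\]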
“…In this paper, we focus on dual-free methods or gradient-type methods only. Some algorithms, for instance [16,22,23,41,42,61,62], rely on inner loops to guarantee desirable convergence rates. However, inner loops place a heavier communication burden [24,38], which may limit the applicability of these methods, since communication has often been recognized as the major bottleneck in distributed or decentralized optimization.…”
(mentioning)
confidence: 99%
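To make the communication-per-iteration trade-off in the excerpt concrete, below is a minimal, self-contained sketch of a plain decentralized gradient method that spends exactly one gossip round per iteration. It is not the accelerated method of the paper nor any specific algorithm cited above; the ring topology, Metropolis weights, quadratic losses and step size are synthetic choices for illustration only.

import numpy as np

rng = np.random.default_rng(0)

n_nodes, dim = 5, 3
# Each node i privately holds f_i(x) = 0.5 * ||A_i x - b_i||^2 (synthetic, hypothetical data).
A = [rng.standard_normal((10, dim)) for _ in range(n_nodes)]
b = [rng.standard_normal(10) for _ in range(n_nodes)]

def grad(i, x):
    # gradient of the local objective f_i at x
    return A[i].T @ (A[i] @ x - b[i])

# Symmetric, doubly stochastic gossip matrix for a ring graph (Metropolis weights).
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    for j in ((i - 1) % n_nodes, (i + 1) % n_nodes):
        W[i, j] = 1.0 / 3.0
    W[i, i] = 1.0 - W[i].sum()

X = np.zeros((n_nodes, dim))   # row i is the local iterate of node i
step = 1e-2
for _ in range(2000):
    grads = np.array([grad(i, X[i]) for i in range(n_nodes)])
    X = W @ X - step * grads   # one communication round + one local gradient step per iteration

# With a constant step, plain decentralized gradient descent only reaches a
# neighborhood of the optimum; accelerated / corrected methods (the subject of
# the excerpts above) remove this bias and tighten the rate.
x_bar = X.mean(axis=0)
print("consensus gap:", np.linalg.norm(X - x_bar))
print("objective    :", sum(0.5 * np.linalg.norm(A[i] @ x_bar - b[i]) ** 2
                            for i in range(n_nodes)))

Replacing the single gossip step W @ X with an inner loop of k gossip rounds per iteration sharpens consensus but multiplies the communication count by k, which is the burden the excerpt attributes to inner-loop methods; accelerated schemes aim to keep the total number of rounds close to the O(√(κ/θ) · log(1/ε)) bound quoted earlier.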