2020
DOI: 10.48550/arxiv.2002.11534
Preprint

Distributed Algorithms for Composite Optimization: Unified Framework and Convergence Analysis

Jinming Xu,
Ye Tian,
Ying Sun
et al.

Abstract: We study distributed composite optimization over networks: agents minimize a sum of smooth (strongly) convex functions (the agents' sum-utility) plus a nonsmooth (extended-valued) convex one. We propose a general unified algorithmic framework for such a class of problems and provide a unified convergence analysis leveraging the theory of operator splitting. Distinguishing features of our scheme are: (i) when the agents' functions are strongly convex, the algorithm converges at a linear rate, whose dependence on t…


Cited by 5 publications (6 citation statements)
References: 23 publications
“…To achieve a faster convergence rate, we are motivated by the state-of-the-art NIDS [7] that has a linear convergence rate O(max{L/µ, 1/ρ} log(1/ε)) to find an ε-optimal solution of (1), i.e., ‖x_i − x*‖² ≤ ε, ∀i ∈ V [37]. NIDS can be written compactly as in (6), where W̄ = (I + W)/2 and the first iteration is initialized as X^1 = X^0 − γ∇F(X^0).…”
Section: B. Development of COLD (mentioning)
confidence: 99%
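To make the compact recursion referenced as equation (6) concrete, here is a minimal NumPy sketch of the NIDS update in the form assumed above (X^1 = X^0 − γ∇F(X^0), then X^{k+1} = W̄(2X^k − X^{k−1} − γ(∇F(X^k) − ∇F(X^{k−1}))), with W̄ = (I + W)/2); the function and argument names are chosen here for illustration and do not come from the cited works.

```python
import numpy as np

def nids(W, grad_F, X0, gamma, num_iters):
    """Sketch of the compact NIDS recursion (assumed form, see lead-in above).

    W         : (n, n) symmetric, doubly stochastic mixing matrix of the network
    grad_F    : callable mapping stacked iterates X of shape (n, d) to the
                stacked local gradients, also of shape (n, d)
    X0        : (n, d) initial point, one row per agent
    gamma     : common step size
    num_iters : total number of iterations (>= 1)
    """
    W_bar = 0.5 * (np.eye(W.shape[0]) + W)   # W_bar = (I + W) / 2
    X_prev = X0
    X = X0 - gamma * grad_F(X0)              # X^1 = X^0 - gamma * grad F(X^0)
    for _ in range(num_iters - 1):
        # X^{k+1} = W_bar (2 X^k - X^{k-1} - gamma (grad F(X^k) - grad F(X^{k-1})))
        X_next = W_bar @ (2 * X - X_prev - gamma * (grad_F(X) - grad_F(X_prev)))
        X_prev, X = X, X_next
    return X
```

Under the assumptions in the quoted statement (smooth, strongly convex local objectives and a connected network), each row of X converges linearly to the common minimizer, which is the regime in which the quoted O(max{L/µ, 1/ρ} log(1/ε)) rate applies.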
“…Recent works [11][12][13][14] achieve robustness to heterogeneous environments by leveraging certain decentralized bias-correction techniques such as EXTRA-type updates [15][16][17][18], gradient tracking [19][20][21][22][23][24][25], and primal-dual principles [16], [26][27][28]. Building on these bias-correction techniques, the very recent works [29] and [30] propose D-GET and D-SPIDER-SFO, respectively, which further incorporate online SARAH/SPIDER-type variance-reduction schemes [31][32][33] to achieve lower oracle complexities when the SFO satisfies a mean-squared smoothness property.…”
Section: Related Work (mentioning)
confidence: 99%
“…We would like to highlight that the convergence theory of DVR decomposes nicely into several building blocks, and thus simple rates are obtained. This is not so usual for decentralized algorithms; for instance, many follow-up papers were needed to obtain a tight convergence theory for EXTRA [Shi et al., 2015, Jakovetić, 2018, Xu et al., 2020, Li and Lin, 2020]. We now discuss the convergence rate of DVR in more detail.…”
Section: Distributed Implementation (mentioning)
confidence: 99%
“…Decentralized adaptations of gradient descent in the smooth and strongly convex setting include EXTRA [Shi et al., 2015], DIGing [Nedic et al., 2017] and NIDS [Li et al., 2019]. These algorithms have sparked a lot of interest, and the latest convergence results [Jakovetić, 2018, Xu et al., 2020, Li and Lin, 2020] show that EXTRA and NIDS require time O((κ_b + γ^{-1})(m + τ) log(ε^{-1})) to reach precision ε. A generic acceleration of EXTRA using Catalyst [Li and Lin, 2020] obtains the (batch) optimal O(√κ_b (1 + τ/√γ) log(ε^{-1})) rate up to log factors.…”
Section: Introduction (mentioning)
confidence: 99%
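Written out cleanly, the two time complexities quoted above read as follows; the symbol meanings (κ_b the batch condition number, γ the eigengap of the gossip matrix, m the number of local samples, τ the relative cost of one communication) are assumed from the citing work and are stated here only for orientation.

```latex
\[
  T_{\text{EXTRA/NIDS}}(\varepsilon)
    = O\!\left((\kappa_b + \gamma^{-1})\,(m + \tau)\,\log \varepsilon^{-1}\right),
  \qquad
  T_{\text{Catalyst-EXTRA}}(\varepsilon)
    = \tilde{O}\!\left(\sqrt{\kappa_b}\,\bigl(1 + \tau/\sqrt{\gamma}\bigr)\,\log \varepsilon^{-1}\right)
      \quad \text{(up to log factors).}
\]
```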