“…Among them, gradient tracking (Qu and Li, 2017; Di Lorenzo and Scutari, 2016; Nedic et al., 2017), which applies the idea of dynamic average consensus (Zhu and Martínez, 2010) to global gradient estimation, provides a systematic approach to reducing variance and has been successfully applied to decentralize many algorithms with faster convergence rates (Li et al., 2020a; Sun et al., 2019). For nonconvex problems, a small sample of gradient-tracking-aided algorithms includes GT-SAGA (Xin et al., 2021), D-GET (Sun et al., 2020), GT-SARAH (Xin et al., 2020), and DESTRESS (Li et al., 2021a). Our BEER algorithm also leverages gradient tracking to eliminate the strong assumptions of bounded gradients and bounded dissimilarity.…”
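To make the gradient-tracking idea concrete, here is a minimal NumPy sketch of the generic update (not the BEER algorithm itself): each agent mixes its iterate and tracker with its neighbors, and the tracker applies dynamic average consensus to the local gradients so that its network-wide average always equals the average of the current local gradients. The quadratic objectives, ring topology, mixing weights, and step size below are illustrative assumptions.

```python
import numpy as np

# Hypothetical setup: n agents jointly minimize sum_i f_i(x) with local
# quadratics f_i(x) = 0.5 * a[i] * (x - b[i])**2 over a 4-cycle network.
n = 4
rng = np.random.default_rng(0)
a = rng.uniform(1.0, 2.0, n)           # local curvatures (assumed)
b = rng.uniform(-1.0, 1.0, n)          # local minimizers (assumed)
x_star = np.sum(a * b) / np.sum(a)     # global minimizer of sum_i f_i

def grad(x):
    """Stacked local gradients: grad(x)[i] = f_i'(x[i])."""
    return a * (x - b)

# Doubly stochastic mixing matrix for the ring (lazy Metropolis weights).
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

eta = 0.05                             # step size (illustrative)
x = np.zeros(n)
y = grad(x)                            # tracker initialized at local gradients

for _ in range(1000):
    x_new = W @ x - eta * y            # consensus step + tracked-gradient descent
    y = W @ y + grad(x_new) - grad(x)  # dynamic average consensus on gradients
    x = x_new

# Tracking invariant: the mean of y equals the mean of the current local
# gradients at every iteration (W is doubly stochastic, y started at grad(x)).
assert np.isclose(y.mean(), grad(x).mean())
# All agents reach the global minimizer, with no bounded-gradient or
# bounded-dissimilarity assumption on the f_i.
assert np.allclose(x, x_star, atol=1e-6)
```

Note that the local minimizers `b[i]` differ across agents, so the local gradients are arbitrarily dissimilar at the common optimum; the tracker `y` is what lets each agent follow the global descent direction anyway.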