2021
DOI: 10.48550/arxiv.2110.01594
Preprint

A Stochastic Proximal Gradient Framework for Decentralized Non-Convex Composite Optimization: Topology-Independent Sample Complexity and Communication Efficiency

Ran Xin, Subhro Das, Usman A. Khan, et al.

Abstract: Decentralized optimization is a promising parallel computation paradigm for large-scale data analytics and machine learning problems defined over a network of nodes. This paper is concerned with decentralized non-convex composite problems with population or empirical risk. In particular, the networked nodes are tasked to find an approximate stationary point of the average of local, smooth, possibly non-convex risk functions plus a possibly non-differentiable extended valued convex regularizer. Under this gener…
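In symbols, the problem class described in the abstract is the composite program below (a sketch inferred from the abstract; the paper's own notation may differ):

```latex
% n networked nodes; f_i is the smooth, possibly non-convex local risk
% (population or empirical) at node i; r is the shared convex, possibly
% non-differentiable, extended-valued regularizer.
\[
  \min_{x \in \mathbb{R}^d} \; F(x) \;:=\; \frac{1}{n}\sum_{i=1}^{n} f_i(x) \;+\; r(x).
\]
```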

Cited by 8 publications (20 citation statements)
References 49 publications
“…To measure the non-stationarity in Problem (2), one should not only consider the stationarity violation at each node but also the consensus errors over the network. Therefore, Xin et al [2021a] and Mancino-Ball et al [2022] define an ε-stationary point…”
Section: Notion of Stationarity
confidence: 99%
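For context, the ε-stationarity notion these papers use combines the two error sources just mentioned: stationarity violation at the average iterate and consensus error across the network. A hedged sketch (exact normalizations in Xin et al [2021a] may differ), with x̄ the network average and G_λ the proximal gradient mapping:

```latex
\[
  \bar{x} := \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
  \mathcal{G}_{\lambda}(\bar{x}) := \frac{1}{\lambda}\Big(\bar{x}
    - \mathrm{prox}_{\lambda r}\big(\bar{x} - \lambda \nabla f(\bar{x})\big)\Big),
\]
% epsilon-stationarity: the stationarity violation at the average iterate
% plus the mean consensus error are both driven below epsilon^2.
\[
  \mathbb{E}\big[\|\mathcal{G}_{\lambda}(\bar{x})\|^2\big]
  + \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\big[\|x_i - \bar{x}\|^2\big]
  \;\le\; \epsilon^2 .
\]
```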
“…Wang et al [2021] propose SPPDM, which uses a proximal primal-dual approach to achieve O(ε^{-2}) sample complexity. ProxGT-SA and ProxGT-SR-O [Xin et al, 2021a] incorporate stochastic gradient tracking and multi-consensus updates in proximal gradient methods and obtain O(n^{-1}ε^{-2}) and O(n^{-1}ε^{-1.5}) sample complexity respectively, where the latter further uses a SARAH-type variance reduction.…”
[Table 1 of the citing paper: Comparison of decentralized proximal gradient based algorithms to find an ε-stationary solution to stochastic composite optimization in the nonconvex setting. The sample complexity is defined as the number of required samples per agent to obtain an ε-stationary point (see Definition 1).]
Section: Introduction
confidence: 99%
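To make the ProxGT template concrete, here is a minimal single-mixing-round sketch in Python. This is a hedged illustration, not the paper's code: the actual ProxGT-SA performs multiple consensus rounds per update, the exact ordering of the mixing, descent, and prox steps may differ, and `soft_threshold`, `stoch_grad`, and the l1 choice of the regularizer are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    """Prox of tau*||.||_1, used here as an illustrative convex term r."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def proxgt_step(X, Y, G_prev, W, stoch_grad, alpha, tau):
    """One sketched iteration of a proximal stochastic gradient tracking
    method. X, Y, G_prev: (n, d) arrays of iterates, gradient trackers,
    and previous stochastic gradients (row i belongs to node i);
    W: (n, n) doubly stochastic mixing matrix; stoch_grad: callable
    returning (n, d) minibatch gradients; alpha: step size; tau: l1 weight.
    """
    # Descend along the tracked direction, mix with neighbors, then prox.
    X_new = soft_threshold(W @ (X - alpha * Y), alpha * tau)
    # Gradient tracking: mix the trackers and correct with the fresh-minus-old
    # local stochastic gradients, so the row-average of Y follows the
    # network-average stochastic gradient.
    G_new = stoch_grad(X_new)
    Y_new = W @ Y + G_new - G_prev
    return X_new, Y_new, G_new
```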
“…To avoid this inaccuracy while keeping linear convergence, recent works [28], [29], [35]–[40] propose a gradient tracking technique that allows each node to estimate the global gradient with only local communications. Of note are also distributed stochastic problems where gradient tracking is combined with variance reduction to achieve state-of-the-art results for several different classes of problems [38], [41]–[46].…”
Section: A. Related Work
confidence: 99%
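The tracking recursion referenced in [28], [29], [35]–[40] typically takes the following form (a sketch; weights and update orderings vary across these works):

```latex
% Node i mixes its tracker y_i with its neighbors' trackers and corrects
% with the change in its local gradient. With a doubly stochastic W this
% preserves (1/n) sum_i y_i^k = (1/n) sum_i \nabla f_i(x_i^k) at every k,
% so each tracker estimates the global gradient from local communication.
\[
  y_i^{k+1} = \sum_{j \in \mathcal{N}_i} w_{ij}\, y_j^{k}
  + \nabla f_i\big(x_i^{k+1}\big) - \nabla f_i\big(x_i^{k}\big),
  \qquad y_i^{0} = \nabla f_i\big(x_i^{0}\big).
\]
```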
“…When the network topology is sparse (e.g., a ring or a one-peer exponential graph [3], [33]), each partial averaging step (5) incurs O(1) latency and O(1) transmission time (the inverse of bandwidth), which are independent of n. Since each node only synchronizes with its direct neighbors, there is low synchronization overhead. Although partial averaging is less effective in aggregating information than global averaging, some decentralized algorithms can match or exceed the performance of global-averaging-based distributed algorithms: [1], [29] established that decentralized SGD can achieve the same asymptotic linear speedup in convergence rate as (parameter-server-based) distributed SGD; [3], [33] used exponential graph topologies to realize both efficient communication and effective aggregation by partial averaging; [37], [38], [31], [39] improved the convergence rate of decentralized SGD by removing data heterogeneity between nodes; [40], [4], [30], [41] enhanced the effectiveness of partial averaging by periodically calling global averaging. BlueFog can implement all these algorithms, including those that use global averaging.…”
Section: A. Concepts and Theoretical Foundations
confidence: 99%
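As an illustration of why a one-peer exponential graph gives O(1) per-round cost, here is a sketched partial-averaging round in Python (following the construction commonly attributed to [3], [33]; it assumes n is a power of two, and the 0.5/0.5 mixing weights are an illustrative choice):

```python
import numpy as np

def one_peer_exp_average(X, k):
    """One partial-averaging round on a one-peer exponential graph.
    X: (n, d) array, row i is node i's parameters; k: iteration counter.
    Each node exchanges with a single peer at hop distance 2^(k mod log2 n),
    so every round costs O(1) messages per node, independent of n.
    """
    n = X.shape[0]                       # assumed to be a power of two
    hop = 2 ** (k % int(np.log2(n)))     # cycle through hops 1, 2, 4, ..., n/2
    peer = (np.arange(n) + hop) % n      # the single peer of each node
    # Pairwise 0.5/0.5 averaging; the induced mixing matrix is doubly
    # stochastic since each node sends to and receives from exactly one peer.
    return 0.5 * (X + X[peer])
```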