2021 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv48922.2021.00302
DecentLaM: Decentralized Momentum SGD for Large-batch Deep Training

Cited by 18 publications (25 citation statements)
References 22 publications
“…A limitation of the D² algorithm is that it is unclear how it can be applied to time-varying topologies, and that it can only be used on constant mixing topologies whose negative eigenvalues are bounded from below by −1/3. Other authors have proposed algorithms that perform well on heterogeneous DL tasks [25, 59], but theoretical proofs that these algorithms are independent of the degree of heterogeneity are still pending.…”
Section: Pcε
Citation type: mentioning
Confidence: 99%
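To make the quoted eigenvalue condition concrete, here is a minimal sketch that builds a symmetric doubly stochastic mixing matrix for a small ring and checks whether its smallest eigenvalue stays above −1/3, the bound D² requires of a constant mixing topology. The helper name `ring_mixing_matrix` and the uniform 1/3 weights are illustrative assumptions, not constructions from the cited papers.

```python
import numpy as np

def ring_mixing_matrix(n):
    # Assumption for illustration: uniform 1/3 weights on self and the two
    # ring neighbors give a symmetric, doubly stochastic mixing matrix.
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
    return W

def satisfies_d2_bound(W):
    # D^2 needs the smallest eigenvalue of the constant mixing matrix
    # to be bounded from below by -1/3.
    return np.linalg.eigvalsh(W).min() > -1.0 / 3.0

# A 5-node ring passes; an even-sized ring with these weights sits exactly
# at -1/3 and would violate the strict bound.
print(satisfies_d2_bound(ring_mixing_matrix(5)))  # True
```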
“…Communication efficiency. When the network topology is sparse (e.g., a ring or a one-peer exponential graph [3], [33]), each partial-averaging step (5) incurs O(1) latency and O(1) transmission time (the inverse of bandwidth), both independent of n. Since each node synchronizes only with its direct neighbors, the synchronization overhead is low. Although partial averaging is less effective at aggregating information than global averaging, some decentralized algorithms can match or exceed the performance of global-averaging-based distributed algorithms: [1], [29] established that decentralized SGD achieves the same asymptotic linear speedup in convergence rate as (parameter-server-based) distributed SGD; [3], [33] used exponential graph topologies to realize both efficient communication and effective aggregation through partial averaging; [37], [38], [31], [39] improved the convergence rate of decentralized SGD by removing data heterogeneity between nodes; and [40], [4], [30], [41] enhanced the effectiveness of partial averaging by periodically invoking global averaging.…”
Section: A. Concepts and Theoretical Foundations
Citation type: mentioning
Confidence: 99%
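To illustrate the O(1)-per-round communication claim, the sketch below simulates one partial-averaging round on a one-peer exponential graph: with n = 2^τ nodes, at round t each node averages with the single peer offset by 2^(t mod τ), so per-node traffic is constant in n. The function name and the synchronous numpy simulation are assumptions for illustration; the cited works [3], [33] define the topology, not this code.

```python
import numpy as np

def one_peer_exponential_round(x, t):
    """One gossip round on a one-peer exponential graph (illustrative sketch).

    x: (n, d) array of per-node parameters, with n a power of two.
    At round t, node i averages with node (i + 2^(t mod log2(n))) % n,
    so every node exchanges with exactly one peer: O(1) messages per node.
    """
    n = x.shape[0]
    tau = int(np.log2(n))
    offset = 2 ** (t % tau)
    shifted = np.roll(x, -offset, axis=0)  # row i of `shifted` is x[(i + offset) % n]
    return 0.5 * (x + shifted)             # mixing matrix (I + S)/2, S a cyclic shift

# Over log2(n) successive rounds, every node's value reaches every other node.
x = np.arange(8, dtype=float).reshape(8, 1)
for t in range(3):
    x = one_peer_exponential_round(x, t)
print(x.ravel())  # all entries equal the global average 3.5
```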
“…They can, however, obtain the same results through local dynamics, namely, a series of computation and agent-to-agent direct communication steps. On large-scale optimization tasks involving distributed datasets, recent decentralized computational methods have shown strong, and sometimes superior, performance [1], [2], [3], [4], [5].…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
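The "computation then agent-to-agent communication" dynamic described above fits in one line; the sketch below shows a generic adapt-then-combine decentralized SGD iteration, in which each node takes a local stochastic-gradient step and then averages with its direct neighbors through a mixing matrix W. This is a generic template under stated assumptions, not the specific update of any single cited method (DecentLaM, for instance, changes how momentum enters this recursion).

```python
import numpy as np

def decentralized_sgd_step(x, grads, W, lr):
    """Generic adapt-then-combine decentralized SGD step (illustrative).

    x:     (n, d) per-node parameters
    grads: (n, d) stochastic gradients from each node's local data
    W:     (n, n) mixing matrix; W[i, j] != 0 only if j neighbors i
    lr:    learning rate

    Each node first applies its local gradient (computation), then
    combines the intermediate iterates of its neighbors (communication).
    """
    return W @ (x - lr * grads)
```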