2021
DOI: 10.48550/arxiv.2110.13363
Preprint

Exponential Graph is Provably Efficient for Decentralized Deep Training

Abstract: Decentralized SGD is an emerging training method for deep learning, known for requiring much less communication per iteration (and therefore being faster); it relaxes the exact averaging step of parallel SGD to inexact averaging. The less exact the averaging is, however, the more total iterations the training needs. The key to making decentralized SGD efficient is therefore to realize nearly exact averaging using little communication. This requires a skillful choice of communication topology, which is an under-studied topic…
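The trade-off the abstract describes, exact averaging in parallel SGD versus inexact topology-dependent averaging in decentralized SGD, can be illustrated in a few lines of NumPy. This is a minimal sketch, assuming scalar per-node models and a ring mixing matrix as a stand-in for a sparse topology; it is not the paper's implementation.

```python
# Minimal sketch: exact averaging (parallel SGD) vs. one inexact gossip step
# (decentralized SGD) over an assumed ring topology.
import numpy as np

n = 8                                    # number of nodes
x = np.random.randn(n, 1)                # one scalar "model" per node (assumption)

# Parallel SGD: every node ends up with the exact global mean.
exact = np.full_like(x, x.mean())

# Decentralized SGD: one multiplication by a sparse mixing matrix W,
# i.e. one round of communication with direct neighbors only.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 1 / 3                      # self weight
    W[i, (i - 1) % n] = 1 / 3            # left neighbor
    W[i, (i + 1) % n] = 1 / 3            # right neighbor
approx = W @ x

# The gap below is the "inexactness" that extra training iterations must absorb.
print(np.linalg.norm(approx - exact))
```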

Cited by 1 publication (7 citation statements) | References 50 publications
“…Push matrix W is also used with directed graphs. 3) A standard weight matrix W satisfies both W1 = 1 and 1ᵀW = 1ᵀ and is used for undirected graphs, as well as special directed graphs such as the exponential graph [33]. See Fig.…”
Section: A. Concepts and Theoretical Foundations (mentioning)
confidence: 99%
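The quoted property of the standard weight matrix, W1 = 1 and 1ᵀW = 1ᵀ, can be verified numerically for a static exponential graph. The construction below is a sketch under common assumptions (n is a power of two, uniform weights 1/(log2(n) + 1) on the self-loop and the 2^j-hop in-neighbors); it is not necessarily the exact matrix used in [33].

```python
# Sketch: static exponential graph weight matrix and its row/column sums.
import numpy as np

def exponential_graph_matrix(n: int) -> np.ndarray:
    """Static exponential graph: node i receives from (i - 2**j) % n, j = 0..log2(n)-1."""
    tau = int(np.log2(n))                 # number of in-neighbors per node
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / (tau + 1)         # self weight
        for j in range(tau):
            W[i, (i - 2 ** j) % n] = 1.0 / (tau + 1)
    return W

W = exponential_graph_matrix(8)
print(np.allclose(W @ np.ones(8), 1.0))   # W 1 = 1    (row sums are 1)
print(np.allclose(np.ones(8) @ W, 1.0))   # 1^T W = 1^T (column sums are 1)
```

Even though the graph is directed, both checks pass when n is a power of two, which is what allows it to be handled like a standard (doubly stochastic) weight matrix.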
“…When the network topology is sparse (e.g., a ring or a one-peer exponential graph [3], [33]), each partial averaging step (5) incurs O(1) latency and O(1) transmission time (the inverse of bandwidth), which are independent of n. Since each node only synchronizes with its direct neighbors, there is low synchronization overhead. Although partial averaging is less effective in aggregating information than global averaging, some decentralized algorithms can match or exceed the performance of global-averaging-based distributed algorithms: [1], [29] established that decentralized SGD can achieve the same asymptotic linear speedup in convergence rate as (parameter-server-based) distributed SGD; [3], [33] used exponential graph topologies to realize both efficient communication and effective aggregation through partial averaging; [37], [38], [31], [39] improved the convergence rate of decentralized SGD by removing data heterogeneity between nodes; [40], [4], [30], [41] enhanced the effectiveness of partial averaging by periodically calling global averaging. BlueFog can implement all of these algorithms, including those that use global averaging.…”
Section: A. Concepts and Theoretical Foundations (mentioning)
confidence: 99%
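The O(1) per-iteration communication of the one-peer exponential graph mentioned above can be simulated in a few lines. This is a single-process sketch under the assumption that n is a power of two; it uses plain NumPy rather than BlueFog's actual communication API. Each node exchanges with exactly one peer per iteration, and, as the cited paper shows for power-of-two n, log2(n) consecutive one-peer steps already reproduce the exact global average.

```python
# Sketch: one-peer exponential graph schedule simulated on a single process.
import numpy as np

def one_peer_averaging(x: np.ndarray, t: int) -> np.ndarray:
    """At iteration t, node i averages with node (i + 2**(t % tau)) % n,
    so each node sends and receives exactly one message: O(1) communication."""
    n = len(x)
    tau = int(np.log2(n))                 # period of the neighbor schedule
    offset = 2 ** (t % tau)
    x_new = np.empty_like(x)
    for i in range(n):
        x_new[i] = 0.5 * (x[i] + x[(i + offset) % n])
    return x_new

n = 8
x = np.random.randn(n)                    # local models, one scalar per node (assumption)

# A decentralized SGD loop would interleave local gradient steps with this
# partial averaging; here we only run the averaging to show consensus.
for t in range(int(np.log2(n))):
    x = one_peer_averaging(x, t)
print(np.ptp(x))                          # ~0: log2(n) one-peer steps give the exact average
```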