2007 IEEE International Parallel and Distributed Processing Symposium (2007)
DOI: 10.1109/ipdps.2007.370405

Bandwidth Efficient All-reduce Operation on Tree Topologies

Cited by 32 publications (30 citation statements)
References 11 publications (22 reference statements)

“…AllReduce methods [12,33,34] also lack the flexibility to tackle straggler issues, which is more challenging than PS architecture due to the more restrictive communication pattern between workers. There are some works focusing on straggler issues in AllReduce [29,30], but their methods may lead to deadlocks and may not be able to deal with complex straggler patterns, such as transient stragglers.…”
Section: Stragglers In Distributed Model Training; citation type: mentioning
confidence: 99%
“…For the AR architecture, there is no process dedicated just for holding variables, as shown in Figure 1(b). Rather, all workers are given a replica of variables and share locally computed gradients via collective communication primitives such as AllReduce [25,30] and AllGatherv [39]. AllReduce reduces values from all processes to a single value, while AllGatherv simply gathers the values from all processes.…”
Section: Data Parallel Training Architectures; citation type: mentioning
confidence: 99%
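
The distinction drawn in the statement above between AllReduce and AllGatherv can be illustrated with a small single-process simulation. This is a minimal sketch, not code from the cited or citing papers: the function names `all_reduce_sum` and `all_gather_v` are hypothetical, each inner list stands in for one worker's local buffer, and summation is assumed as the reduction operator.

```python
from typing import List


def all_reduce_sum(worker_grads: List[List[float]]) -> List[List[float]]:
    """Simulated AllReduce (sum): every worker receives the element-wise
    sum of all workers' equally sized gradient buffers."""
    total = [sum(vals) for vals in zip(*worker_grads)]
    # Each worker gets its own copy of the reduced buffer.
    return [list(total) for _ in worker_grads]


def all_gather_v(worker_chunks: List[List[float]]) -> List[List[float]]:
    """Simulated AllGatherv: every worker receives the concatenation of all
    workers' chunks, which may differ in length (the 'v' variant)."""
    gathered = [x for chunk in worker_chunks for x in chunk]
    return [list(gathered) for _ in worker_chunks]


if __name__ == "__main__":
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(all_reduce_sum(grads))                   # same reduced buffer on each worker
    print(all_gather_v([[1.0], [2.0, 3.0], []]))   # same concatenated buffer on each worker
```

In this sketch the AllReduce output has the same size as one input buffer, whereas the AllGatherv output grows with the number of workers, which is the practical difference the statement points at.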
“…AllReduce is commonly implemented with either tree [1] or butterfly [14] topologies, as shown in Figure 1a and 1b. The tree topology uses the lowest overall bandwidth, but effectively maximizes latency since the delay is set by the slowest path in the tree.…”
Section: Introduction; citation type: mentioning
confidence: 99%
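
The claim in the statement above, that a tree-based AllReduce finishes only as fast as its slowest path, can be made concrete with a toy model. The sketch below is an illustration under stated assumptions and is not the algorithm of the cited paper: workers are assumed to form an implicit binary tree (worker `i` has parent `(i - 1) // 2`), `link_delay[i]` is a hypothetical one-way delay on the link between worker `i` and its parent, and local computation time is ignored.

```python
from typing import List, Tuple


def tree_all_reduce(values: List[float],
                    link_delay: List[float]) -> Tuple[List[float], float]:
    """Simulate a sum all-reduce on an implicit binary tree.

    Returns the per-worker results and the time at which the last worker
    has received the final sum. link_delay[0] is unused (the root has no parent).
    """
    n = len(values)
    if n == 0:
        return [], 0.0

    partial = list(values)   # running subtree sums
    ready = [0.0] * n        # time at which each node's subtree sum is complete

    # Reduce phase: children have larger indices than their parents, so a
    # sweep from the last worker to the first sees finished children first.
    for i in range(n - 1, 0, -1):
        parent = (i - 1) // 2
        partial[parent] += partial[i]
        ready[parent] = max(ready[parent], ready[i] + link_delay[i])

    # Broadcast phase: the root's sum travels back down the same links.
    arrival = [0.0] * n
    arrival[0] = ready[0]
    for i in range(1, n):
        parent = (i - 1) // 2
        arrival[i] = arrival[parent] + link_delay[i]

    return [partial[0]] * n, max(arrival)


if __name__ == "__main__":
    # Worker 2 sits behind a slow link (delay 5); it dominates the total time.
    result, finish = tree_all_reduce([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 5.0, 1.0])
    print(result, finish)
```

In this model each link carries one message in each direction, and the completion time is governed by the slow link to worker 2, in line with the observation that the delay is set by the slowest path in the tree.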