IEEE INFOCOM 2019 - IEEE Conference on Computer Communications
DOI: 10.1109/infocom.2019.8737602
The Role of Network Topology for Distributed Machine Learning

Abstract: Many learning problems are formulated as minimization of some loss function on a training set of examples. Distributed gradient methods on a cluster are often used for this purpose. In this paper, we study how the variability of task execution times at cluster nodes affects the system throughput. In particular, a simple but accurate model allows us to quantify how the time to solve the minimization problem depends on the network of information exchanges among the nodes. Interestingly, we show that, even when c…

Cited by 40 publications (41 citation statements) · References 13 publications (39 reference statements)
“…Narrowing its focus to node capabilities, [5] aims at choosing the set of learning nodes that results in the shortest learning time, solving a double-edged conundrum. On the one hand, more nodes mean that convergence can be reached in fewer iterations; on the other hand, the duration of each iteration is determined by the slowest node [7]. In a similar spirit, [2] addresses the problem of jointly selecting the learning nodes to use for the learning process and assigning them the wireless resources they need to communicate effectively.…”
Section: Federated Learning in Edge Scenarios
confidence: 99%
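The tradeoff described in this citation — more nodes mean fewer iterations to converge, but each iteration waits for the slowest node — can be sketched with a toy simulation. Everything here is illustrative and not the cited paper's model: task times are drawn i.i.d. exponential with mean 1, and the assumption that the required iteration count shrinks as 1/n is hypothetical.

```python
import random

random.seed(0)

def expected_iteration_time(n, trials=2000):
    """Per-iteration time is set by the slowest of n nodes
    (illustrative: i.i.d. exponential task times, mean 1)."""
    return sum(max(random.expovariate(1.0) for _ in range(n))
               for _ in range(trials)) / trials

def total_time(n, work=100.0):
    """Hypothetical model: iterations to converge shrink as work/n,
    while each iteration lasts as long as the slowest node takes."""
    return (work / n) * expected_iteration_time(n)

# The two opposing forces: per-iteration time grows with n
# (roughly like the harmonic number H_n), iteration count shrinks.
for n in (1, 2, 4, 8, 16):
    print(n, round(expected_iteration_time(n), 2), round(total_time(n), 1))
```

The simulation makes the "double-edged conundrum" concrete: the per-iteration column grows with n while the total-time column reflects the balance between the two effects under the assumed 1/n convergence speedup.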
“…Following an orthogonal, more theoretical approach, several works [7], [10] aim at characterizing the learning performance, deriving closed-form expressions for their (expected) training time. Such a characterization is then exploited to make optimal or near-optimal decisions on the cooperation among nodes [7] and the equilibrium between local learning and global updates [10].…”
Section: Federated Learning in Edge Scenarios
confidence: 99%
“…One approach, followed in [4], [5], is to assign more resources (e.g., radio resource blocks) to the nodes that need them the most (e.g., experience connectivity issues), so as to avoid performance bottlenecks. Another option is simply to drop overly-slow nodes ("stragglers") from the learning process, thus making individual iterations faster [3], [4], [10].…”
Section: Client Selection and Model Weighting in FL
confidence: 99%
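The straggler-dropping option mentioned in this citation — excluding the slowest nodes so an iteration ends sooner — amounts to waiting for an order statistic of the node completion times rather than the maximum. A minimal sketch, with illustrative exponential task times (not taken from any of the cited works):

```python
import random

random.seed(1)

def iteration_time(node_times, drop=0):
    """An iteration finishes when all non-dropped nodes have reported;
    dropping the `drop` slowest nodes shortens the wait to the
    (n - drop)-th order statistic of the completion times."""
    kept = sorted(node_times)[:len(node_times) - drop]
    return kept[-1]

times = [random.expovariate(1.0) for _ in range(20)]
full = iteration_time(times)             # wait for every node
trimmed = iteration_time(times, drop=3)  # ignore the 3 stragglers
```

Dropping stragglers trades some gradient information per iteration for a strictly shorter wait (`trimmed <= full` always holds), which is why [3], [4], [10] treat it as a way to make individual iterations faster.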