Proceedings of the 27th ACM Symposium on Operating Systems Principles 2019
DOI: 10.1145/3341301.3359642

A generic communication scheduler for distributed DNN training acceleration

Cited by 249 publications (167 citation statements). References 10 publications.

“…E.g., ResNet-50 and ResNet-152 [25] models that achieve up to 75% accuracy in classifying images [19] are 100MB and 240MB in size respectively. For such large models, it is well known [39] that the overall training time is dominated by communication time taken to share updates among many parallel workers; for ResNet-50 model trained with 30 GPUs (NVIDIA P100), we observe that per-iteration communication time (320 ) is 3× the gradient computation time (100 ).…”
Section: DML Performance Analysis
confidence: 93%
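The roughly 3:1 communication-to-computation ratio quoted above is the kind of per-iteration breakdown that motivates communication scheduling. As a rough illustration of how such a breakdown can be measured (this is not the cited paper's methodology; `timed_iteration`, `model`, `loss_fn`, `batch`, and `target` are illustrative names, and an already-initialized `torch.distributed` process group on GPUs is assumed), the following sketch times the backward pass separately from a naive all-reduce of every gradient:

```python
# Hypothetical sketch (not from the cited paper): time one data-parallel
# iteration's gradient computation separately from gradient communication.
# Assumes torch.distributed has already been initialized (e.g., NCCL backend)
# and that model, loss_fn, batch, and target live on the local GPU.
import time
import torch
import torch.distributed as dist

def timed_iteration(model, loss_fn, batch, target):
    # Gradient computation: forward and backward pass.
    t0 = time.time()
    loss_fn(model(batch), target).backward()
    torch.cuda.synchronize()
    t_compute = time.time() - t0

    # Gradient communication: all-reduce (and average) every gradient tensor.
    t0 = time.time()
    world = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world
    torch.cuda.synchronize()
    t_comm = time.time() - t0
    return t_compute, t_comm
```

For a model of ~100MB replicated across tens of workers, the communication time measured this way can dominate the compute time, which is the gap that communication schedulers such as the one in this paper target.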
“…Algorithms: We evaluate the following: PS-based asynchronous and synchronous variants of MLfabric, or MLfabric-A and MLfabric-S, respectively; vanilla PS-based asynchronous (Async); and AllReduce-based (using NCCL library) synchronous algorithms using ring-reduce communication (RR-Sync). We also compare with other state-of-the-art approaches: (1) SwitchML [42], where aggregation of gradient updates happen on P4 [37] network switches as opposed to end hosts, and (2) BytePS [39], an alternate communication library based on parameter server architecture.…”
Section: Datasets and ML Models
confidence: 99%
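The RR-Sync baseline quoted above uses ring-based all-reduce, the pattern commonly employed by NCCL: a reduce-scatter phase followed by an all-gather phase, so each worker transfers roughly 2(N-1)/N of the gradient size regardless of the number of workers. The sketch below is a minimal pure-Python simulation of that pattern for illustration only; it is not NCCL's implementation, and the function name and the assumption that the tensor splits evenly into N chunks are mine:

```python
# Simulated ring all-reduce over in-memory lists: reduce-scatter, then all-gather.
def ring_allreduce(grads):
    """grads: one equal-length gradient list per worker; returns reduced copies."""
    n = len(grads)
    size = len(grads[0])
    assert size % n == 0, "illustration assumes the tensor splits evenly"
    c = size // n
    buf = [list(g) for g in grads]          # each worker's local buffer

    def chunk(w, j):                        # copy of worker w's j-th chunk
        return buf[w][j * c:(j + 1) * c]

    def set_chunk(w, j, vals):
        buf[w][j * c:(j + 1) * c] = vals

    # Phase 1: reduce-scatter. After n-1 steps, worker i holds the fully
    # reduced chunk (i + 1) % n.
    for s in range(n - 1):
        sends = [(i, (i - s) % n, chunk(i, (i - s) % n)) for i in range(n)]
        for i, j, vals in sends:
            dst = (i + 1) % n
            set_chunk(dst, j, [a + b for a, b in zip(chunk(dst, j), vals)])

    # Phase 2: all-gather. Each worker forwards the reduced chunk it just
    # received until every worker holds every chunk.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n, chunk(i, (i + 1 - s) % n)) for i in range(n)]
        for i, j, vals in sends:
            set_chunk((i + 1) % n, j, vals)

    return buf

# Example: 3 workers with 6-element gradients; every worker ends with the
# elementwise sum, i.e., three copies of [6, 6, 6, 6, 6, 6].
print(ring_allreduce([[1] * 6, [2] * 6, [3] * 6]))
```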