2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing
DOI: 10.1109/dasc/picom/datacom/cyberscitec.2018.000-4

Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs

Abstract: Deep learning frameworks have been widely deployed on GPU servers for deep learning applications in both academia and industry. In training deep neural networks (DNNs), there are many standard processes or algorithms, such as convolution and stochastic gradient descent (SGD), but the running performance of different frameworks might differ even when running the same deep model on the same GPU hardware. In this study, we evaluate the running performance of four state-of-the-art distributed deep learning frameworks…
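As a rough illustration of the kind of measurement the paper performs, the sketch below times the average SGD minibatch on a single GPU. It is a minimal sketch assuming PyTorch and synthetic data, not the authors' benchmark code or the frameworks they evaluated; the stand-in model, batch size, and iteration count are arbitrary choices.

```python
# Minimal sketch (not the paper's benchmark code): time the average SGD
# iteration of a small stand-in CNN on one GPU, assuming PyTorch is installed.
import time
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tiny stand-in model; the paper benchmarks AlexNet, GoogleNet and ResNet-50.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 3, 224, 224, device=device)   # synthetic minibatch
y = torch.randint(0, 10, (64,), device=device)

# Warm-up iterations so lazy initialization and cuDNN autotuning
# do not skew the measurement.
for _ in range(5):
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()

if device.type == "cuda":
    torch.cuda.synchronize()
start = time.time()
iters = 50
for _ in range(iters):
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
if device.type == "cuda":
    torch.cuda.synchronize()
print(f"average time per minibatch: {(time.time() - start) / iters * 1e3:.2f} ms")
```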


Citations: Cited by 90 publications (61 citation statements), published between 2019 and 2024.
References: 26 publications.
“…Other models limit the degree of communication to less frequent synchronization points while allowing the individual models to temporarily diverge. Gossip Learning [139] is built around the idea that models are mobile and perform independent random walks through the peer-to-peer network. Since this forms a data- and model-parallel processing framework, the models evolve differently and need to be combined through ensembling.…”
Section: Topologies | Citation type: mentioning | Confidence: 99%
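To make the quoted description more concrete, here is an illustrative, heavily simplified sketch of gossip-style training: a few model replicas random-walk over peers, take a local SGD step at each visited peer, and are finally combined by averaging. It is not the Gossip Learning implementation of [139]; the peer count, step count, learning rate, and toy linear-regression task are all invented for illustration.

```python
# Illustrative sketch only (not the Gossip Learning system cited as [139]):
# model replicas perform random walks over a peer-to-peer network, take one
# local SGD step at each visited peer, and are combined by ensembling.
import random

random.seed(0)
NUM_PEERS, STEPS, LR = 10, 200, 0.05

def make_sample():
    """One noisy sample from the hidden target function y = 2*x + 1."""
    x = random.uniform(-1, 1)
    return x, 2 * x + 1 + random.gauss(0, 0.1)

# Each peer holds a small private dataset.
peer_data = [[make_sample() for _ in range(20)] for _ in range(NUM_PEERS)]

def local_sgd_step(model, data, lr=LR):
    """One SGD step of a 1-D linear model (w, b) on a peer's local data."""
    w, b = model
    x, y = random.choice(data)
    err = (w * x + b) - y
    return (w - lr * err * x, b - lr * err)

# Several model replicas start at random peers and random-walk independently,
# so they evolve differently over time.
models = [(0.0, 0.0) for _ in range(3)]
positions = [random.randrange(NUM_PEERS) for _ in models]
for _ in range(STEPS):
    for i in range(len(models)):
        models[i] = local_sgd_step(models[i], peer_data[positions[i]])
        positions[i] = random.randrange(NUM_PEERS)  # hop to a random peer

# Combine the diverged replicas by ensembling (here: parameter averaging).
w_avg = sum(m[0] for m in models) / len(models)
b_avg = sum(m[1] for m in models) / len(models)
print(f"ensembled model: y ~ {w_avg:.2f} * x + {b_avg:.2f}")
```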
“…Shi and Chu [139] show TensorFlow achieving about 50% efficiency when training ResNet-50 (He et al. [68]) on a 4-node, InfiniBand-connected cluster, and about 75% efficiency on GoogleNet [148], showing that communication overhead plays an important role and also depends on the architecture of the neural network being optimized.…”
Section: DIANNE (Distributed Artificial Neural Network) | Citation type: mentioning | Confidence: 99%
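The efficiency figures quoted above can be read as measured throughput divided by the ideal linear-scaling throughput. The snippet below is a back-of-the-envelope sketch of that calculation; the throughput numbers are hypothetical, chosen only to reproduce the 50% and 75% ratios, and are not measurements from [139].

```python
# Back-of-the-envelope sketch of the scaling-efficiency numbers quoted above.
# All throughput values are hypothetical illustrations, not measured data.
def scaling_efficiency(throughput_n_nodes, throughput_1_node, n_nodes):
    """Scaling efficiency: measured speedup divided by ideal linear speedup."""
    return throughput_n_nodes / (n_nodes * throughput_1_node)

single_node = 200.0            # images/sec on 1 node (hypothetical)
four_nodes_resnet50 = 400.0    # images/sec on 4 nodes (hypothetical)
four_nodes_googlenet = 600.0   # images/sec on 4 nodes (hypothetical)

print(scaling_efficiency(four_nodes_resnet50, single_node, 4))   # 0.5  -> ~50%
print(scaling_efficiency(four_nodes_googlenet, single_node, 4))  # 0.75 -> ~75%
```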
“…Guignard et al. [21] presented detailed characterization results of a set of archetypal state-of-the-art DL workloads to identify the performance bottlenecks and to guide the design of prospective acceleration platforms in a more effective manner. Shi et al. [36] evaluated the performance of four state-of-the-art distributed DL frameworks over different GPU hardware environments. They built performance models of standard processes in training DNNs with SGD, and then benchmarked the performance of the frameworks with three neural networks (i.e., AlexNet, GoogleNet and ResNet-50).…”
Section: Performance Measurement of Deep Learning | Citation type: mentioning | Confidence: 99%
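The quoted "performance models of standard processes in training DNNs with SGD" can be pictured as a per-iteration time decomposition. The sketch below shows one such decomposition under assumed phase timings and an assumed communication-overlap factor; it is not the authors' exact model, and the function name, parameters, and numbers are invented for illustration.

```python
# Minimal sketch of an iteration-time model in the spirit of the cited paper
# (not the authors' exact formulation): one distributed SGD iteration is
# decomposed into I/O, forward, backward, update, and gradient-communication
# phases, where part of the communication may overlap with the backward pass.
def iteration_time(t_io, t_forward, t_backward, t_update, t_comm, overlap=0.0):
    """overlap: fraction of communication hidden behind the backward pass."""
    exposed_comm = t_comm * (1.0 - overlap)
    return t_io + t_forward + t_backward + t_update + exposed_comm

# Hypothetical per-phase timings (milliseconds) for one minibatch.
t = iteration_time(t_io=5.0, t_forward=40.0, t_backward=80.0,
                   t_update=10.0, t_comm=60.0, overlap=0.5)
print(f"modeled time per minibatch: {t:.1f} ms")
```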
“…Our study aims to understand the training performance of cheap transient servers that have dynamic availability, revocation patterns, and unit costs. In addition, these previous studies often focus on measuring training speed using the average time to process one minibatch [25], [26], [36]. In this work, by contrast, we consider multiple important performance metrics, including training time, cost, and accuracy, that could be impacted by training on transient servers.…”
Section: Related Work | Citation type: mentioning | Confidence: 99%