2016 2nd Workshop on Machine Learning in HPC Environments (MLHPC)
DOI: 10.1109/mlhpc.2016.006

Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability

Abstract: This paper presents a theoretical analysis and practical evaluation of the main bottlenecks towards a scalable distributed solution for the training of Deep Neural Networks (DNNs). The presented results show that the current state-of-the-art approach, using data-parallelized Stochastic Gradient Descent (SGD), is quickly turning into a vastly communication-bound problem. In addition, we present simple but fixed theoretic constraints, preventing effective scaling of DNN training beyond only a few dozen…
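To make the communication-bound behavior described in the abstract concrete, here is a minimal sketch of data-parallel synchronous SGD with gradient averaging over MPI. This is our illustration, not the paper's implementation; mpi4py, NumPy, the toy quadratic loss, and all sizes are assumptions.

```python
# Minimal data-parallel synchronous SGD sketch (illustrative only, not the
# paper's code). Each worker computes a gradient on its local shard and the
# full gradient vector is averaged across workers every step.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, world = comm.Get_rank(), comm.Get_size()

n_params = 1_000_000                      # assumed model size
w = np.zeros(n_params, dtype=np.float32)  # parameters replicated on every worker
rng = np.random.default_rng(seed=rank)    # different local data per worker

def local_gradient(w, batch):
    # Placeholder for backpropagation: gradient of a toy quadratic loss.
    return w - batch.mean(axis=0)

lr = 0.01
for step in range(10):
    batch = rng.standard_normal((32, n_params), dtype=np.float32)
    g_local = local_gradient(w, batch)

    # Communication bottleneck: every step moves the full gradient
    # (n_params * 4 bytes) through an allreduce, regardless of batch size.
    g_global = np.empty_like(g_local)
    comm.Allreduce(g_local, g_global, op=MPI.SUM)
    g_global /= world

    w -= lr * g_global                    # identical update on every worker
```

The allreduce volume is fixed by the model size, while the compute per step shrinks as workers are added, which is why the step time becomes dominated by communication at scale.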

Cited by 74 publications (70 citation statements)
References 21 publications
“…Redundancy reduction can also be seen in the context of distributed systems. Training DNNs on such systems is an active field of research [11,12,13]. A problem is the transfer of the weight updates in the form of gradients between the different nodes.…”
Section: Related Work
confidence: 99%
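The gradient-transfer cost this excerpt refers to can be estimated with a back-of-envelope calculation; the parameter count and link speed below are our assumptions (an AlexNet-scale model over a 10 Gbit/s link), not figures from the cited papers.

```python
# Rough per-step gradient traffic for synchronous data-parallel training
# (illustrative numbers only).
def allreduce_bytes_per_worker(n_params: int, bytes_per_value: int = 4) -> int:
    # A ring allreduce sends and receives roughly 2x the buffer size per worker;
    # we use that as a simple upper bound.
    return 2 * n_params * bytes_per_value

n_params = 60_000_000                 # assumed AlexNet-scale parameter count
traffic = allreduce_bytes_per_worker(n_params)
bandwidth = 10e9 / 8                  # assumed 10 Gbit/s link, in bytes/s
print(f"{traffic / 1e6:.0f} MB per step, "
      f"{traffic / bandwidth * 1e3:.0f} ms at 10 Gbit/s")   # 480 MB, 384 ms
```

At these assumed numbers the gradient exchange alone costs hundreds of milliseconds per step, which is why gradient compression and other redundancy-reduction schemes are attractive.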
“…The compute complexity is high; medium-sized experiments and popular benchmarks can take days to run [8], severely compromising the productivity of the data scientist. Distributed scaling stalls after only a dozen nodes due to locking, messaging, synchronization and data locality issues.…”
Section: Challenges With Machine Learning
confidence: 99%
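The stalling behavior this excerpt describes can be illustrated with a toy scaling model in which compute divides across workers but the per-step communication and synchronization cost does not; the cost ratio below is an arbitrary assumption, not a measurement from the cited work.

```python
# Toy scaling model (our illustration): per-step time = compute/N plus a fixed
# communication/synchronization overhead, so speedup flattens quickly.
def step_time(n_workers: int, t_compute: float = 1.0, t_comm: float = 0.1) -> float:
    return t_compute / n_workers + (t_comm if n_workers > 1 else 0.0)

for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"{n:3d} workers -> speedup {step_time(1) / step_time(n):4.1f}x")
```

With a communication cost of 10% of single-node compute, speedup saturates below 10x no matter how many workers are added, consistent with scaling stalling after roughly a dozen nodes.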
“…Gradient descent optimization is an indispensable element of solving many real-world problems, including but not limited to training deep neural networks [14,19]. Because of its inherent sequentiality, it is also particularly difficult to parallelize [17]. Recently a number of advances in developing distributed versions of gradient descent algorithms have been made [15,11,38,39,36].…”
Section: Introduction
confidence: 99%