2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
DOI: 10.1109/ccgrid.2019.00064

Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation

Abstract: The current wave of advances in Machine Learning (ML) and Deep Learning (DL) has been triggered by the availability of large-scale datasets, efficient CPU and GPU hardware, and the development of easy-to-use software frameworks like TensorFlow (TF), Caffe, and Torch. TensorFlow has been, by far, the most widely adopted ML/DL framework. However, little exists in the literature that provides a thorough understanding of the capabilities which TensorFlow offers for the distributed training of large ML/DL models that n…
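For context on the kind of distributed training the paper studies, the following is a minimal sketch of multi-worker data-parallel training in TensorFlow 2.x using tf.distribute.MultiWorkerMirroredStrategy. The model, synthetic data, and launch setup are illustrative placeholders and not the paper's benchmark configuration; a real multi-node run also requires TF_CONFIG to be set on each worker.

```python
# Minimal multi-worker data-parallel training sketch (TensorFlow 2.x).
# Assumes TF_CONFIG is set per worker; model and data are placeholders.
import tensorflow as tf

# Collective all-reduce across workers; on GPUs TensorFlow can use NCCL underneath.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(0.01),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Synthetic data stands in for a real dataset.
x = tf.random.normal((1024, 784))
y = tf.random.uniform((1024,), maxval=10, dtype=tf.int32)
model.fit(tf.data.Dataset.from_tensor_slices((x, y)).batch(64), epochs=1)
```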


Cited by 31 publications (15 citation statements)
References 21 publications (19 reference statements)
“…NCCL is a set of powerful collective communication primitives for GPU which has already demonstrated accelerated performance for deep learning applications [29], [5], [7], [6], [8]. However, the utilization of NCCL for NMF has been unexplored so far.…”
Section: Rationale for PyDNMF-GPU
confidence: 99%
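To make the reference to NCCL collectives concrete, here is a minimal sketch of an NCCL-backed all-reduce. PyTorch's distributed package is used only because it exposes NCCL collectives conveniently from Python; the script name and launch command are illustrative assumptions, not code from the cited works.

```python
# Illustrative NCCL all-reduce via torch.distributed (backend="nccl").
# Launch with, e.g.: torchrun --nproc_per_node=2 nccl_allreduce.py
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")  # NCCL collectives on GPUs
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Each rank contributes its rank id; the all-reduce sums across all GPUs.
    t = torch.full((4,), float(rank), device="cuda")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: {t.tolist()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```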
“…There exist a few works that specifically evaluate and/or improve the MPI CCPs for DL, for example, taking into account the special characteristics of the messages that are exchanged in this type of applications [3,4,18,23]. In addition, MPI-based software has been developed for distributed DNN training; for example, MVAPICH2-GDR from Ohio State University or oneAPI from Intel.…”
Section: MPI Collective Communication Primitives
confidence: 99%
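As an illustration of an MPI collective primitive in the role it plays in data-parallel DL (gradient averaging), here is a minimal mpi4py sketch. The array standing in for a layer's gradients and the script name are hypothetical; a CUDA-aware MPI build could pass GPU buffers instead of host NumPy arrays.

```python
# Hypothetical gradient-averaging step using an MPI all-reduce (mpi4py).
# Run with, e.g.: mpirun -np 4 python avg_grad.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Stand-in for the local gradients computed by this rank.
local_grad = np.random.rand(1024).astype(np.float32)
global_grad = np.empty_like(local_grad)

# Sum gradients across all ranks, then divide to obtain the average.
comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
global_grad /= comm.Get_size()
```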
“…There exist a number of instances of the MPI library, with some prominent examples being OpenMPI, MPICH, MVAPICH, and Intel MPI. All these implementations adhere to the functionality and specification defined by the MPI API, while distinct realizations of the standard vary in the implementation of the primitives and, quite often, the performance they attain.…”
Section: A Family of Algorithms
confidence: 99%
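A small sketch of the point made above: the same mpi4py program is source-compatible across these MPI implementations, and the active library can be identified at run time. This snippet is illustrative and not taken from the cited work.

```python
# Report which MPI implementation the program is running on.
from mpi4py import MPI

if MPI.COMM_WORLD.Get_rank() == 0:
    print("MPI standard version:", MPI.Get_version())
    print("Library:", MPI.Get_library_version())
```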
“…Several efforts have focused on the investigation of the behavior of deep learning applications from different perspectives: performance and power characteristics [19], scalability and fine-tuning [20], GPU optimizations [21], I/O workloads [22], [23]. However, a systematic understanding of fine-grain behavior at tensor level that explains the interplay of layer-wise pipelining and all-reduce in synchronous data parallel training is missing.…”
Section: B. Horovod
confidence: 99%
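To ground the discussion of all-reduce in synchronous data-parallel training, here is a hedged sketch of a Horovod + TensorFlow 2.x training step in which per-rank gradients are averaged with an all-reduce at every iteration. The model, synthetic data, learning-rate scaling, and launch command are illustrative assumptions, not the setup from the cited study.

```python
# Sketch of synchronous data-parallel training with Horovod on TensorFlow 2.x.
# Run with, e.g.: horovodrun -np 4 python train.py
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    # Pin each rank to one local GPU.
    tf.config.set_visible_devices(gpus[hvd.local_rank() % len(gpus)], "GPU")

model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(784,))])
opt = tf.keras.optimizers.SGD(0.01 * hvd.size())  # scale LR with worker count
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def train_step(x, y, first_batch):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    # Wrap the tape so gradients are averaged with an all-reduce across ranks.
    tape = hvd.DistributedGradientTape(tape)
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
    if first_batch:
        # Ensure all ranks start from rank 0's initial weights.
        hvd.broadcast_variables(model.variables, root_rank=0)
    return loss

# Synthetic data stands in for a real per-rank shard.
x = tf.random.normal((64, 784))
y = tf.random.uniform((64,), maxval=10, dtype=tf.int32)
for step in range(10):
    train_step(x, y, step == 0)
```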