Proceedings of the Machine Learning on HPC Environments 2017
DOI: 10.1145/3146347.3146356
An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures

Cited by 56 publications (23 citation statements) · References 3 publications
“…The individual benchmarks are described below: Figure 6 and Figure 7 show the complete source code. The matrix multiplication and convolution kernels were selected because they dominate the training and inference time of the most common classical networks [4,75]. The other kernels bring interesting computation patterns, enabling expressiveness and performance comparisons across more diverse network architectures.…”

Section: Performance Results
Confidence: 99%
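The excerpt above treats matrix multiplication and convolution as the kernels that dominate DNN training and inference time. As an illustration only (not the benchmark code from the cited paper, whose actual source appears in its Figures 6 and 7), a minimal Python/NumPy sketch of timing the two kernels might look like the following; the array shapes, repetition count, and the use of SciPy's convolve2d are assumptions made here for brevity.

```python
import time
import numpy as np
from scipy.signal import convolve2d  # one of many possible conv routines

def time_kernel(fn, *args, reps=10):
    """Run fn(*args) reps times and return mean wall-clock seconds."""
    fn(*args)  # warm-up call, excluded from the measurement
    t0 = time.perf_counter()
    for _ in range(reps):
        fn(*args)
    return (time.perf_counter() - t0) / reps

# Matrix multiplication: the core operation of fully connected layers.
A = np.random.rand(1024, 1024).astype(np.float32)
B = np.random.rand(1024, 1024).astype(np.float32)
print("matmul:", time_kernel(np.matmul, A, B), "s")

# 2-D convolution: the core operation of convolutional layers.
img = np.random.rand(512, 512).astype(np.float32)
kern = np.random.rand(3, 3).astype(np.float32)
print("conv2d:", time_kernel(convolve2d, img, kern), "s")
```

Comparing the two timings on CPU and GPU backends is the essence of the kernel-level characterization the excerpt refers to.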
“…Several existing research efforts have shown the impact of different hardware platforms on the performance of DL frameworks [8], [10], [11], [26], and have compared the performance of different DL frameworks with respect to their DNN structures and default configuration settings [9], [27]. Thus, in this paper, we conduct an empirical measurement study characterizing and analyzing DL frameworks in terms of how they respond to different hyper-parameter configurations, different types of datasets, and different choices of parallel computing libraries.…”

Section: Methodology and Baselines
Confidence: 99%
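The study described above sweeps hyper-parameters (e.g., batch size) and parallel computing libraries (e.g., the BLAS thread pool backing a framework). A minimal sketch of such a sweep, assuming the real threadpoolctl package to cap BLAS threads and using a single matmul as a hypothetical stand-in for one dense-layer forward pass, might be:

```python
import time
import numpy as np
from threadpoolctl import threadpool_limits  # caps threads in BLAS/OpenMP pools

def forward_pass(batch, weights):
    """Hypothetical stand-in for a dense layer: one matmul per forward pass."""
    return batch @ weights

weights = np.random.rand(4096, 4096).astype(np.float32)

for batch_size in (32, 64, 128):        # hyper-parameter sweep
    batch = np.random.rand(batch_size, 4096).astype(np.float32)
    for n_threads in (1, 2, 4):         # parallel-library sweep
        with threadpool_limits(limits=n_threads):
            t0 = time.perf_counter()
            for _ in range(20):
                forward_pass(batch, weights)
            dt = (time.perf_counter() - t0) / 20
        print(f"batch={batch_size:4d} threads={n_threads} {dt * 1e3:7.2f} ms")
```

A real study of this kind would swap in full framework training loops and real datasets; the grid structure of the measurement, however, stays the same.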
“…It is widely recognized that choosing the right DL framework for the right application is a daunting task for many researchers, developers, and domain scientists. Although there are some existing DL benchmarking efforts, most of them have centered on studying different CPU-GPU configurations and their impact on different DL frameworks with standard datasets [8], [9], [10], [11]. Even under the same CPU-GPU configuration, no single DL framework dominates in performance and accuracy on standard datasets such as MNIST [12], CIFAR [13], and ImageNet [14].…”

Section: Introduction
Confidence: 99%
“…As a consequence, intelligent services mostly run on remote servers rather than directly on the devices where the applications execute. At the same time, the data required to train the models are collected centrally, which in turn facilitates the training of the models used by the intelligent services [6].…”

Section: Intelligent Services Decoupling
Confidence: 99%