2020 IEEE/ACM Fourth Workshop on Deep Learning on Supercomputers (DLS) 2020
DOI: 10.1109/dls51937.2020.00007
Time-Based Roofline for Deep Learning Performance Analysis

Abstract: Deep learning applications are usually very compute-intensive and require long run times for training and inference. This has been tackled by researchers on both the hardware and software sides, and in this paper we propose a Roofline-based approach to performance analysis to facilitate the optimization of these applications. This approach is an extension of the Roofline model widely used in traditional high-performance computing applications, and it incorporates both compute/bandwidth complexity and run time i…
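The time-based idea summarized in the abstract can be illustrated with a minimal sketch (the function name and all numbers below are illustrative assumptions, not values from the paper): a kernel's run time is bounded below by the larger of its compute time and its data-movement time.

```python
def runtime_lower_bound_s(flops, bytes_moved, peak_flops_per_s, bandwidth_bytes_per_s):
    """Roofline-style lower bound on kernel run time (seconds):
    the maximum of the compute-bound time and the memory-bound time."""
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / bandwidth_bytes_per_s
    return max(compute_time, memory_time)

# Hypothetical kernel: 2e12 FLOPs moving 1e10 bytes on a device with
# 1e13 FLOP/s peak compute and 9e11 B/s memory bandwidth.
t = runtime_lower_bound_s(2e12, 1e10, 1e13, 9e11)  # compute-bound: 0.2 s
```

Comparing a measured run time against this bound indicates how far a kernel sits from its hardware ceiling.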

Cited by 14 publications (9 citation statements); references 30 publications (30 reference statements).
“…Ren et al. [48] proposed the first algorithm-hardware co-design framework, combining weight pruning and quantization to reduce the performance overhead caused by irregular sparsity. Wang et al. [61] extended the Roofline model into the deep learning area and incorporated computational complexity and run time into the model, making it possible to analyze code performance for deep learning applications systematically.…”
Section: Dynamic Neural Network
confidence: 99%
“…Most studies using memory profilers are based on a high-level understanding of the individual DNN layers or on analytical models such as the Roofline [30] (Roofline analysis helps visualize the limits imposed by the hardware and determine the main limiting factor, memory bandwidth or computational capacity, thus yielding a roadmap of possible optimization steps [29]). These approaches do not capture the complex interaction between the CPU, memory, and accelerator devices.…”
Section: Introduction
confidence: 99%
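The "main limiting factor" mentioned in the statement above is exactly what the classic Roofline bound identifies; a minimal sketch follows (the peak figures are illustrative assumptions, not measurements of any cited system).

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Classic Roofline: attainable performance (GFLOP/s) at a given
    arithmetic intensity (FLOPs per byte of data moved)."""
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)

# Hypothetical device: 15000 GFLOP/s peak compute, 900 GB/s bandwidth.
# A kernel at 4 FLOPs/byte is memory-bound (4 * 900 = 3600 < 15000).
perf = attainable_gflops(4.0, 15000.0, 900.0)  # 3600.0 GFLOP/s
```

Whichever term of the `min` binds tells the developer whether to optimize data movement or computation first.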
“…To facilitate Roofline studies, a range of tools has emerged: for more accurate machine characterization, such as the Empirical Roofline Toolkit (ERT) [7], [8], and for more streamlined collection of Roofline performance data using open-source tools or workflows [3], [9]-[11]. Beyond tool development, there are also many studies on applying the Roofline model in both traditional HPC [3], [12]-[14] and the emerging field of machine learning [3], [15], [16], as well as extensions and refinements of the model, such as the instruction Roofline [17], Roofline scaling trajectories [18], Roofline-based performance portability [8], and the power and energy Roofline [19], [20].…”
Section: Introduction
confidence: 99%