2021
DOI: 10.1007/978-3-030-80126-7_35

Hierarchical Roofline Performance Analysis for Deep Learning Applications

Cited by 10 publications (13 citation statements)
References 17 publications
“…80 × 8 × 1.312 × 4³ × 2 = 107.479 TFLOP/s. ERT was recently updated with more precision and architecture support, and the implementation details are discussed in [35]. Note that the machine balance diagonal presented in this paper is the Tensor Core peak performance divided by the HBM bandwidth (107479 / 828.8 = 129.68).…”
Section: B Roofline Data Collection (mentioning)
confidence: 99%
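
The arithmetic in this excerpt can be checked with a short sketch. The excerpt gives only the numbers, so the meaning attached to each factor here is an interpretation, not something the quoted text states: 80 streaming multiprocessors, 8 Tensor Cores per multiprocessor, a 1.312 GHz clock, 4³ fused multiply-adds per Tensor Core per cycle, and 2 FLOPs per fused multiply-add. The bandwidth figure 828.8 is assumed to be in GB/s, which makes the machine balance a value in FLOP/byte. All constant and function names are hypothetical.

```python
# Factor meanings are an interpretation; the excerpt gives only the numbers.
SM_COUNT = 80                     # assumed: streaming multiprocessors
TENSOR_CORES_PER_SM = 8           # assumed: Tensor Cores per multiprocessor
CLOCK_GHZ = 1.312                 # assumed: clock frequency in GHz
FMA_PER_CORE_PER_CYCLE = 4 ** 3   # assumed: 4x4x4 fused multiply-adds per cycle
FLOPS_PER_FMA = 2                 # one multiply plus one add
HBM_BANDWIDTH_GBS = 828.8         # assumed unit: GB/s


def tensor_core_peak_gflops() -> float:
    """Peak Tensor Core throughput in GFLOP/s from the factors above."""
    return (SM_COUNT * TENSOR_CORES_PER_SM * CLOCK_GHZ
            * FMA_PER_CORE_PER_CYCLE * FLOPS_PER_FMA)


def machine_balance(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Machine balance: peak compute divided by memory bandwidth."""
    return peak_gflops / bandwidth_gbs


peak = tensor_core_peak_gflops()
balance = machine_balance(peak, HBM_BANDWIDTH_GBS)

print(f"peak    = {peak / 1000:.3f} TFLOP/s")
print(f"balance = {balance:.2f} FLOP/byte")
```

Under these assumptions the product reproduces the 107.479 TFLOP/s peak and the 129.68 balance point quoted above. A kernel whose HBM arithmetic intensity falls below that balance point is bandwidth-bound relative to the Tensor Core ceiling.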
“…A set of metrics are collected to measure the kernel run time, computational complexity, and bandwidth complexity (please refer to [35], [24] for more details). Throughout this paper, computational complexities are treated equally, thus precision-agnostic.…”
Section: B Roofline Data Collection (mentioning)
confidence: 99%
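
As a rough illustration of how the three measured quantities combine into a point on a hierarchical Roofline chart, the sketch below assumes that "computational complexity" means a FLOP count and "bandwidth complexity" means bytes moved at each memory level. The excerpt does not define the terms this precisely, and the actual metric names used in [35] and [24] are not reproduced here. The class, function, and sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class KernelSample:
    """Hypothetical container for one profiled kernel."""
    time_s: float                     # kernel run time in seconds
    flops: float                      # total floating-point operations
    bytes_by_level: Dict[str, float]  # bytes moved per memory level


def roofline_point(sample: KernelSample) -> Tuple[float, Dict[str, float]]:
    """Return achieved GFLOP/s and arithmetic intensity (FLOP/byte) per level."""
    gflops_per_s = sample.flops / sample.time_s / 1e9
    intensity = {
        level: sample.flops / nbytes
        for level, nbytes in sample.bytes_by_level.items()
    }
    return gflops_per_s, intensity


# Made-up numbers, chosen only to make the arithmetic easy to follow.
sample = KernelSample(
    time_s=2e-3,
    flops=4e9,
    bytes_by_level={"L1": 8e9, "L2": 2e9, "HBM": 5e8},
)
perf, ai = roofline_point(sample)

print(f"achieved performance: {perf:.1f} GFLOP/s")
for level, value in ai.items():
    print(f"arithmetic intensity at {level}: {value:.2f} FLOP/byte")
```

Each memory level yields its own arithmetic intensity for the same achieved performance, which is what places one kernel at several horizontal positions on a hierarchical chart.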
“…On NVIDIA GPUs, an nvprof [9] based methodology was first proposed in [17], then an Nsight Compute [5] metrics based one developed in [22], [30]. These methodologies require a dozen of metrics to be collected for hierarchical Roofline analysis, and could incur significant profiling overhead when the number of kernels in the code is high.…”
Section: B Roofline Data Collection on NVIDIA GPUs (mentioning)
confidence: 99%
“…To facilitate the Roofline study, a range of other tools have sprung to life as well, for example, the Empirical Roofline Toolkit (ERT) for more accurate machine characterization [12], [13], and [14], [15], [16], [17], [18] for more streamlined data collection methods. Other than tools development, there are many studies on the application of the Roofline model in traditional HPC [19], [17], [18], [20], [21] and Machine Learning [17], [18], [22], [23], and extension and refinement of the model to related topics in HPC, such as instruction Roofline [24], time-based Roofline [23], Roofline scaling trajectory [25], performance portability analysis based on Roofline [13], and power and energy Roofline [26], [27].…”
Section: Introduction (mentioning)
confidence: 99%
“…To facilitate the Roofline study, a range of tools have sprung to life, for more accurate machine characterization such as the Empirical Roofline Toolkit (ERT) [7], [8], and for more streamlined methods to collect Roofline performance data using open-source tools or workflows [3], [9]-[11]. Other than tools development, there are also many studies on the application of the Roofline model in both traditional HPC [3], [12]-[14] and the new, emerging field of Machine Learning [3], [15], [16], and the extension and refinement of the model, such as instruction Roofline [17], Roofline scaling trajectories [18], performance portability based on Roofline [8], and power and energy Roofline [19], [20].…”
Section: Introduction (mentioning)
confidence: 99%