2018 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC) 2018
DOI: 10.1109/p3hpc.2018.00005

An Empirical Roofline Methodology for Quantitatively Assessing Performance Portability

Cited by 38 publications (30 citation statements)
References 9 publications
“…Here, threads within a warp update access every nth element (threads stride by 32∗n words instead of the nominal Stride‐32). Unlike our previous work, which focused solely on HBM Rooflines for GPUs, the GPP hierarchical Roofline shows that the L1 and L2 cache behave quite differently from HBM. Whereas HBM intensity decreases linearly with increasing stride up to Stride‐4 (4 double complex words = 64 Bytes), L1 and L2 intensity stops decreasing beyond Stride‐2 (32B).…”
Section: Results (contrasting)
confidence: 65%
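The byte counts quoted in the statement above follow from the size of a double complex word. A minimal sketch of that arithmetic, assuming a double complex value is two 8-byte doubles (16 bytes); the constant and function names are hypothetical and used only for illustration:

```python
# Assumption: one double complex word = two 8-byte doubles = 16 bytes.
BYTES_PER_DOUBLE_COMPLEX = 16


def stride_bytes(n: int) -> int:
    """Bytes spanned by a stride of n double complex words."""
    return n * BYTES_PER_DOUBLE_COMPLEX


# Stride-2 and Stride-4, the two thresholds named in the quoted statement.
for n in (2, 4):
    print(f"Stride-{n}: {stride_bytes(n)} bytes")
```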
“…In this paper, we leverage the proof of concept methodology developed by Yang et al and extend it to support both hierarchical (L1, L2, HBM, System Memory) Roofline analysis as well as FP32 and FP16 precision (including Tensor Core). To that end, we use nvprof to collect a set of metrics for each kernel in an application.…”
Section: Roofline Methodology on NVIDIA GPUs (mentioning)
confidence: 99%
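For readers unfamiliar with hierarchical Roofline analysis, the idea can be sketched as follows: each memory level gets its own arithmetic intensity (FLOPs divided by bytes moved at that level), and attainable performance at that level is the smaller of peak compute and intensity times bandwidth. This is a generic illustration of the standard Roofline bound, not the cited authors' tooling; the function name and all numbers are hypothetical, and no nvprof metric names are assumed.

```python
def hierarchical_roofline(peak_gflops, bandwidth_gbs, flops, bytes_moved):
    """Per-level arithmetic intensity and Roofline bound.

    peak_gflops   : peak compute rate in GFLOP/s (hypothetical value)
    bandwidth_gbs : dict of memory level -> bandwidth in GB/s
    flops         : floating-point operations executed by the kernel
    bytes_moved   : dict of memory level -> bytes transferred at that level
    Returns a dict of level -> (intensity in FLOP/byte, bound in GFLOP/s).
    """
    result = {}
    for level, bandwidth in bandwidth_gbs.items():
        intensity = flops / bytes_moved[level]
        # Standard Roofline bound: min(peak compute, intensity * bandwidth).
        result[level] = (intensity, min(peak_gflops, intensity * bandwidth))
    return result


# Illustrative numbers only; they do not describe any real GPU or kernel.
bounds = hierarchical_roofline(
    peak_gflops=1000.0,
    bandwidth_gbs={"L1": 400.0, "L2": 200.0, "HBM": 100.0},
    flops=1e9,
    bytes_moved={"L1": 4e9, "L2": 2e9, "HBM": 1e9},
)
for level, (intensity, bound) in bounds.items():
    print(f"{level}: {intensity:.2f} FLOP/byte, bound {bound:.1f} GFLOP/s")
```

Plotting each level's intensity against the kernel's measured performance then shows which level of the memory hierarchy limits it.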
“…We validated the Roofline model using the Empirical Roofline Tool (ERT) [56]. We re-wrote the micro-kernels in ERT to run them under host and device placement.…”
Section: Roofline Validation (mentioning)
confidence: 99%