2018
DOI: 10.1007/978-3-319-92040-5_12

A Novel Multi-level Integrated Roofline Model Approach for Performance Characterization

Abstract: With energy-efficient architectures, including accelerators and many-core processors, gaining traction, application developers face the challenge of optimizing their applications for multiple hardware features including many-core parallelism, wide processing vector-units and on-chip high-bandwidth memory. In this paper, we discuss the development and utilization of a new application performance tool based on an extension of the classical roofline-model for simultaneously profiling multiple levels in the cache-m…

Citations: cited by 19 publications (18 citation statements)
References: 22 publications (25 reference statements)
“…Over the years, the Classical Roofline model [36] has been formulated for multicore [19,23] and GPU [18,40] architectures. Moreover, assisted methodologies and automatic tools [5,22,28,29] have been introduced to ease Roofline model generation for scientific and HPC application optimization.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Condensing the optimization space in a single performance figure, this model provides intuitive guidance to optimize complex applications. In this way, the Roofline model has become a confirmed methodology to optimize HPC applications targeting multicore [19,23] and GPU [18,40] architectures. With Field-Programmable Gate Array (FPGA) devices becoming an appealing solution to accelerate HPC applications, a dual Roofline model for reconfigurable devices is becoming of real interest.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Previously, the Roofline model was expanded to support the full memory hierarchy by adding additional bandwidth “ceilings.” Similarly, additional ceilings beneath the Roofline can be added to represent performance bottlenecks arising from lack of vectorization or the failure to exploit fused multiply‐add (FMA) instructions.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
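
The ceilings described in the statement above can be sketched numerically. This is a minimal illustration and is not taken from the cited paper: the function and dictionary names, and every peak and bandwidth figure, are hypothetical. Treating the loss of FMA as halving the compute peak is a simplifying assumption.

```python
def attainable_gflops(ai, peak_gflops, bandwidth_gbs):
    """Classical Roofline bound: min(compute peak, AI * memory bandwidth).

    ai            -- arithmetic intensity in FLOPs/Byte
    peak_gflops   -- compute ceiling in GFLOP/s
    bandwidth_gbs -- bandwidth ceiling in GB/s
    """
    return min(peak_gflops, ai * bandwidth_gbs)


# Hypothetical compute ceilings (GFLOP/s); values are for illustration only.
COMPUTE_CEILINGS = {
    "vector+FMA": 2000.0,
    "vector, no FMA": 1000.0,   # assumption: no FMA halves the peak
    "scalar, no FMA": 125.0,    # assumption: 8-wide vectors lost as well
}

# Hypothetical bandwidth ceilings (GB/s), one per memory level.
BANDWIDTH_CEILINGS = {
    "L1": 6000.0,
    "L2": 2000.0,
    "HBM": 400.0,
    "DRAM": 100.0,
}


def ceilings_at(ai):
    """Return the bound for every (compute ceiling, memory level) pair at one AI."""
    return {
        (compute, memory): attainable_gflops(ai, peak, bandwidth)
        for compute, peak in COMPUTE_CEILINGS.items()
        for memory, bandwidth in BANDWIDTH_CEILINGS.items()
    }


if __name__ == "__main__":
    for (compute, memory), bound in ceilings_at(0.5).items():
        print(f"{compute:>15} / {memory:<4}: {bound:8.1f} GFLOP/s")
```

At low arithmetic intensity the bandwidth ceilings dominate, so the choice of memory level matters more than vectorization. At high intensity the compute ceilings take over.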
“…Orthogonal to the Roofline description of hardware is characterizing applications in terms of Roofline‐related coordinates, i.e., Performance (GFLOP/s) and Arithmetic Intensity (FLOPs/Byte). One can employ a variety of methods to calculate these terms ranging from hand counting FLOPs and estimating bytes, to performance counters, to software simulators that trade performance for accuracy.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
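
The hand-counting method mentioned in the statement above can be illustrated with a short sketch. The helper name `roofline_point`, the triad kernel chosen as the example, and the runtime are all assumptions made for illustration. The byte estimate ignores write-allocate traffic and caching effects.

```python
def roofline_point(flops, bytes_moved, seconds):
    """Return (arithmetic intensity in FLOPs/Byte, performance in GFLOP/s)."""
    arithmetic_intensity = flops / bytes_moved
    gflops_per_second = flops / seconds / 1e9
    return arithmetic_intensity, gflops_per_second


# Hand count for a triad kernel, a[i] = b[i] + s * c[i], on 8-byte doubles:
#   2 FLOPs per element (one multiply, one add)
#   24 bytes per element (load b[i], load c[i], store a[i])
n = 100_000_000            # hypothetical array length
flops = 2 * n
bytes_moved = 3 * 8 * n
seconds = 0.025            # hypothetical measured runtime

if __name__ == "__main__":
    ai, perf = roofline_point(flops, bytes_moved, seconds)
    print(f"AI = {ai:.4f} FLOPs/Byte, performance = {perf:.2f} GFLOP/s")
```

Performance counters or a simulator would replace the hand-counted `flops` and `bytes_moved` with measured values. The two coordinates are computed the same way.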