Performance tuning and analysis of future vector processors based on the roofline model

Sato, Yasumoto; Nagaoka, Ryuichi; Musa, Akihiro; Egawa, Ryusuke; Takizawa, Hiroyuki; Okabe, Kimiko; Kobayashi, Hiroaki

doi:10.1145/1621960.1621962

Cited by 7 publications

(3 citation statements)

References 13 publications


“…In particular, the Roofline model [10] provides insights into inherent architectural bottlenecks and potential application optimizations. Its usefulness is patent in several works [9], both at the application [4], [6], [8] and at the architectural level [5], [7].…”

Section: Introduction (mentioning)

confidence: 99%

Cache-aware Roofline model: Upgrading the loft

Ilić, Pratas, Sousa. 2014. IEEE Comput. Arch. Lett.

107


The Roofline model graphically represents the attainable upper bound performance of a computer architecture. This paper analyzes the original Roofline model and proposes a novel approach to provide a more insightful performance modeling of modern architectures by introducing cache-awareness, thus significantly improving the guidelines for application optimization. The proposed model was experimentally verified for different architectures by taking advantage of built-in hardware counters with a curve fitness above 90%.


“…This model has been applied to real-life codes in the past to analyze and report performance including oceanic climate models [5], combustion modeling [6] and even seismic imaging [7]. It has also been used to evaluate the effectiveness of implementation-time optimizations like autotuning [8], or cache-blocking on specific hardware platforms like vector processors [9] and GPUs [10]. Tools are available to plot the machine-specific parameters of the roofline model automatically [11].…”

Section: Roofline Performance Analysis (mentioning)

confidence: 99%

Performance prediction of finite-difference solvers for different computer architectures

Louboutin, Lange, Herrmann et al. 2017. Computers & Geosciences


The life-cycle of a partial differential equation (PDE) solver is often characterized by three development phases: the development of a stable numerical discretization; development of a correct (verified) implementation; and the optimization of the implementation for different computer architectures. Often it is only after significant time and effort has been invested that the performance bottlenecks of a PDE solver are fully understood, and the precise details vary between different computer architectures. One way to mitigate this issue is to establish a reliable performance model that allows a numerical analyst to make reliable predictions of how well a numerical method would perform on a given computer architecture, before embarking upon potentially long and expensive implementation and optimization phases. The availability of a reliable performance model also saves developer effort as it both informs the developer on what kind of optimisations are beneficial, and when the maximum expected performance has been reached and optimisation work should stop. We show how discretization of a wave-equation can be theoretically studied to understand the performance limitations of the method on modern computer architectures. We focus on the roofline model, now broadly used in the high-performance computing community, which considers the achievable performance in terms of the peak memory bandwidth and peak floating point performance of a computer with respect to algorithmic choices. A first principles analysis of operational intensity for key time-stepping finite-difference algorithms is presented. With this information available at the time of algorithm design, the expected performance on target computer systems can be used as a driver for algorithm design.
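
For readers unfamiliar with the bound this abstract refers to, the following is a minimal sketch of the basic roofline relation: attainable performance is the smaller of the peak floating-point rate and the product of peak memory bandwidth and operational intensity. The function names and the machine figures (1000 GFLOP/s, 100 GB/s) are hypothetical and chosen only for illustration; they do not come from any of the papers listed here.

```python
def attainable_gflops(peak_gflops, peak_bandwidth_gbs, operational_intensity):
    """Roofline upper bound in GFLOP/s.

    operational_intensity is in FLOPs per byte of memory traffic.
    """
    return min(peak_gflops, peak_bandwidth_gbs * operational_intensity)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Operational intensity (FLOPs/byte) at which the two roofs meet."""
    return peak_gflops / peak_bandwidth_gbs


# Hypothetical machine parameters, for illustration only.
PEAK_GFLOPS = 1000.0
PEAK_BANDWIDTH_GBS = 100.0

if __name__ == "__main__":
    ridge = ridge_point(PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
    for oi in (0.25, 1.0, 10.0, 40.0):
        bound = attainable_gflops(PEAK_GFLOPS, PEAK_BANDWIDTH_GBS, oi)
        regime = "memory-bound" if oi < ridge else "compute-bound"
        print(f"OI = {oi:6.2f} FLOPs/byte: bound = {bound:7.1f} GFLOP/s ({regime})")
```

A kernel whose operational intensity lies below the ridge point is limited by memory bandwidth under this model, so raising its intensity (for example through cache blocking) lifts the bound, while a kernel above the ridge point is limited by peak compute.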


“…The roofline model [1], proposed in 2008, is a visual performance model that makes the identification of potential bottlenecks easier and provides a guideline to explore the architecture. It has been proved to be flexible enough to characterize not only multicore architectures but also innovative architectures ([2][3][4]). In the GPU community the model has been well accepted ([5][6][7]), due to the similarity of GPU architectures and multicore processors.…”

Section: Related Work (mentioning)

confidence: 99%

Performance Modeling for FPGAs: Extending the Roofline Model with High-Level Synthesis Tools

Silva, Braeken, D’Hollander et al. 2013. International Journal of Reconfigurable Computing


The potential of FPGAs as accelerators for high-performance computing applications is very large, but many factors are involved in their performance. The design for FPGAs and the selection of the proper optimizations when mapping computations to FPGAs lead to prohibitively long developing time. Alternatives are the high-level synthesis (HLS) tools, which promise a fast design space exploration due to design at high-level or analytical performance models which provide realistic performance expectations, potential impediments to performance, and optimization guidelines. In this paper we propose the combination of both, in order to construct a performance model for FPGAs which is able to visually condense all the helpful information for the designer. Our proposed model extends the roofline model, by considering the resource consumption and the parameters used in the HLS tools, to maximize the performance and the resource utilization within the area of the FPGA. The proposed model is applied to optimize the design exploration of a class of window-based image processing applications using two different HLS tools. The results show the accuracy of the model as well as its flexibility to be combined with any HLS tool.
