Charlene Yang scite author profile

Summary: The Roofline performance model provides an intuitive and insightful approach to identifying performance bottlenecks and guiding performance optimization. In preparation for the next-generation supercomputer Perlmutter at NERSC, this paper presents a methodology to construct a hierarchical Roofline on NVIDIA GPUs and extends it to support reduced precision and Tensor Cores. The hierarchical Roofline incorporates L1, L2, device memory, and system memory bandwidths into one single figure, and it offers more profound insights into performance analysis than the traditional DRAM-only Roofline. We use our Roofline methodology to analyze three proxy applications: GPP from BerkeleyGW, HPGMG from AMReX, and conv2d from TensorFlow. In doing so, we demonstrate the ability of our methodology to readily understand various aspects of performance and performance bottlenecks on NVIDIA GPUs and motivate code optimizations.
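As a rough illustration of the idea in this summary, the sketch below evaluates the classic Roofline bound, min(peak compute, arithmetic intensity × bandwidth), once per memory level. It is not taken from the paper: the function names, the level names, and every number are hypothetical placeholders, not measurements of any GPU.

```python
# Illustrative sketch only. All ceilings and intensities are made-up
# placeholder values, not data from the paper or from any real device.

def attainable_gflops(peak_gflops, bandwidth_gbs, arithmetic_intensity):
    """Classic Roofline bound: min(compute peak, AI * bandwidth)."""
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def hierarchical_roofline(peak_gflops, bandwidths_gbs, intensities):
    """Evaluate the bound for each memory level.

    bandwidths_gbs: level name -> bandwidth in GB/s
    intensities:    level name -> FLOPs per byte moved at that level
    """
    return {
        level: attainable_gflops(peak_gflops, bandwidths_gbs[level], intensities[level])
        for level in bandwidths_gbs
    }


def limiting_level(bounds):
    """Return the memory level with the lowest attainable performance."""
    return min(bounds, key=bounds.get)


# Hypothetical kernel on a hypothetical device.
peak = 7000.0  # GFLOP/s
bandwidths = {"L1": 14000.0, "L2": 3000.0, "HBM": 800.0}  # GB/s
intensities = {"L1": 0.25, "L2": 1.0, "HBM": 4.0}  # FLOPs/byte per level

bounds = hierarchical_roofline(peak, bandwidths, intensities)
print(bounds)
print("most restrictive level:", limiting_level(bounds))
```

With these placeholder inputs, the level with the lowest bound is the one a DRAM-only Roofline would not reveal, which is the kind of insight the summary attributes to the hierarchical view.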


An Empirical Roofline Methodology for Quantitatively Assessing Performance Portability

Yang, Gayatri, Kurth et al. 2018


Accelerating Large-Scale Excited-State GW Calculations on Leadership HPC Systems

Ben, Yang et al. 2020


A Case Study for Performance Portability Using OpenMP 4.5

Gayatri, Yang, Kurth et al. 2019


Fast optical absorption spectra calculations for periodic solid state systems

Henneke, Lin, Vorwerk et al. 2020. Commun. Appl. Math. Comput. Sci.


A Novel Multi-level Integrated Roofline Model Approach for Performance Characterization

Koskela, Matveev, Yang et al. 2018


With energy-efficient architectures, including accelerators and many-core processors, gaining traction, application developers face the challenge of optimizing their applications for multiple hardware features, including many-core parallelism, wide vector processing units, and on-chip high-bandwidth memory. In this paper, we discuss the development and utilization of a new application performance tool based on an extension of the classical Roofline model for simultaneously profiling multiple levels in the cache-memory hierarchy. This tool presents a powerful visual aid for the developer and can be used to frame the many-dimensional optimization problem in a tractable way. We show case studies of real scientific applications that have gained insights from the Integrated Roofline Model.


Hierarchical Roofline Performance Analysis for Deep Learning Applications

Yang, Wang, Kurth et al. 2021


Time-Based Roofline for Deep Learning Performance Analysis

Wang, Yang, Farrell et al. 2020


Deep learning applications are usually very compute-intensive and require a long run time for training and inference. This has been tackled by researchers from both the hardware and software sides, and in this paper, we propose a Roofline-based approach to performance analysis to facilitate the optimization of these applications. This approach is an extension of the Roofline model widely used in traditional high-performance computing applications, and it incorporates both compute/bandwidth complexity and run time in its formulae to provide insights into deep learning-specific characteristics. We take two sets of representative kernels, 2D convolution and long short-term memory, to validate and demonstrate the use of this new approach, and investigate how arithmetic intensity, cache locality, auto-tuning, kernel launch overhead, and Tensor Core usage can affect performance. Compared to the common ad hoc approach, this study helps form a more systematic way to analyze code performance and identify optimization opportunities for deep learning applications.



Charlene Yang

Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system
