2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
DOI: 10.1109/cgo.2015.7054182
Improving GPGPU energy-efficiency through concurrent kernel execution and DVFS

Abstract: Current-generation GPUs can accelerate high-performance, compute-intensive applications by exploiting massive thread-level parallelism. The high performance, however, comes at the cost of increased power consumption. Recently, commercial GPGPU architectures have introduced support for concurrent kernel execution to better utilize the computational/memory resources and thereby improve overall throughput. In this paper, we argue and experimentally validate the benefits of concurrent kernels towards energy-efficien…

Cited by 40 publications (22 citation statements)
References 10 publications
“…Jiao et al. studied GPU core and memory frequency scaling for two concurrent kernels on the Kepler GT640 GPU [30]. They took a set of kernels from the CUDA SDK and the Rodinia benchmark suite and measured their energy efficiency (GFlops/Watt) under different core-memory frequency settings.…”
Section: A. Experimental Studies (mentioning)
confidence: 99%
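The measurement described in this citation, energy efficiency in GFlops/Watt at a chosen core/memory clock pair, can be sketched with CUDA and NVML. The listing below is a minimal, hypothetical illustration only: the saxpy kernel, the clock values, and the single mid-run power sample are assumptions, not the cited experimental setup, and nvmlDeviceSetApplicationsClocks requires root privileges and a GPU that supports application clocks.

// Hypothetical sketch: GFlops/Watt for one kernel at one clock setting.
#include <cuda_runtime.h>
#include <nvml.h>
#include <cstdio>

__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];   // 2 FLOPs per element
}

int main() {
    nvmlInit();
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);

    // Illustrative (memory MHz, core MHz) pair; a real sweep would iterate
    // over the clocks reported by nvmlDeviceGetSupportedMemoryClocks /
    // nvmlDeviceGetSupportedGraphicsClocks.
    nvmlDeviceSetApplicationsClocks(dev, 2505, 797);

    const int n = 1 << 24;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    const int iters = 200;
    for (int it = 0; it < iters; ++it)
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);

    // Single power sample (milliwatts) while kernels are in flight;
    // a real measurement would average many samples over the run.
    unsigned int mw = 0;
    nvmlDeviceGetPowerUsage(dev, &mw);

    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    double gflops = (2.0 * n * iters) / (ms * 1e-3) / 1e9;
    double watts  = mw / 1000.0;
    printf("%.1f GFlops, %.1f W, %.2f GFlops/Watt\n", gflops, watts, gflops / watts);

    nvmlShutdown();
    return 0;
}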
“…These generally describe power or energy as linear systems, where the cost of executing various types of instructions, or of accessing different cache hierarchies, disks, or network interfaces, is found using different methodologies. While some authors have used neural networks to estimate these costs [7,10], the vast majority use multivariable linear regression. The typical way of describing, for example, the power usage of GPUs [5,9,18] and CPUs [14,19] is of the form:…”
Section: Background and Related Work (mentioning)
confidence: 99%
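The quotation above is cut off before the equation itself. As a generic illustration only, and not the citing paper's actual formula, such multivariable linear-regression power models are commonly written as

P = \beta_0 + \sum_{i=1}^{k} \beta_i x_i

where each x_i is an observed activity rate (e.g., instructions executed or cache/DRAM accesses per unit time) and the fitted coefficients \beta_i give the power cost attributed to each activity.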
“…From higher to lower, we may distinguish the following. Software: for example, changing the frequency of the GPU core and video memory according to compute- and memory-bound CUDA kernels, or combining DVFS with concurrent kernel execution to improve performance-per-watt behavior compared with sequential execution. Compiler: Wu et al. integrated a prototype of a DVFS mechanism into a dynamic compilation system, which is fine-grained and code-aware.…”
Section: Introduction (mentioning)
confidence: 99%
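The concurrent-kernel setting this citation pairs with DVFS amounts to launching independent kernels on separate CUDA streams so the hardware may overlap them. The sketch below is a minimal illustration under assumed kernel bodies (one compute-bound, one memory-bound); it is not the paper's workload, and the DVFS step is only indicated in a comment.

// Hypothetical sketch: two kernels on separate streams for possible overlap.
#include <cuda_runtime.h>

__global__ void compute_bound(float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = i * 0.5f;
    for (int k = 0; k < 512; ++k) v = v * 1.000001f + 0.5f;   // heavy ALU work
    out[i] = v;
}

__global__ void memory_bound(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;                          // streaming traffic
}

int main() {
    const int n = 1 << 22;
    float *a, *b, *c;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMalloc(&c, n * sizeof(float));
    cudaMemset(b, 0, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Core/memory frequencies would be chosen beforehand via DVFS, e.g.
    // nvidia-smi -ac <memMHz>,<coreMHz> or nvmlDeviceSetApplicationsClocks().
    compute_bound<<<(n + 255) / 256, 256, 0, s1>>>(a, n);
    memory_bound <<<(n + 255) / 256, 256, 0, s2>>>(b, c, n);

    cudaDeviceSynchronize();
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    return 0;
}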