2022
DOI: 10.1109/tpds.2021.3137867
Dynamic GPU Energy Optimization for Machine Learning Training Workloads

Abstract: GPUs are widely used to accelerate the training of machine learning workloads. As modern machine learning models become increasingly larger, they require a longer time to train, leading to higher GPU energy consumption. This paper presents GPOEO, an online GPU energy optimization framework for machine learning training workloads. GPOEO dynamically determines the optimal energy configuration by employing novel techniques for online measurement, multi-objective prediction modeling, and search optimization. To ch…
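The abstract describes an online search over GPU energy configurations guided by measured time and energy. A minimal sketch of that idea, where the candidate SM clocks, the `measure()` stub, and the objective are all hypothetical illustrations rather than the paper's actual algorithm or API (a real implementation would read NVML counters instead of an analytic model):

```python
# Illustrative sketch of an online energy-configuration search in the
# spirit of GPOEO. All numbers and functions here are hypothetical.

def measure(sm_clock_mhz):
    """Stub for per-iteration (time_s, energy_j) measurement.
    Uses a made-up analytic model: higher clocks run faster but
    draw more power."""
    time_s = 1.0 + 1500.0 / sm_clock_mhz
    power_w = 50.0 + 0.15 * sm_clock_mhz
    return time_s, time_s * power_w

def best_config(candidates, k=1):
    """Pick the clock minimizing the objective E * T^k.
    k=0 optimizes pure energy; larger k weights performance more."""
    def objective(clock):
        t, e = measure(clock)
        return e * t ** k
    return min(candidates, key=objective)

clocks = [900, 1200, 1500]  # hypothetical SM clock candidates (MHz)
print(best_config(clocks))        # with k=1, the fastest clock wins here
print(best_config(clocks, k=0))   # with k=0, the lowest-energy clock wins
```

Under this toy model the chosen configuration shifts with `k`, which is the trade-off knob a multi-objective framework exposes.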

Cited by 15 publications (9 citation statements)
References 19 publications
“…Graphical Processing Units (GPU) are processors capable of processing instructions in parallel. Standard GPU deep learning speedup techniques include convolutional-layer reuse, feature-map reuse, and filter reuse; memory access is a common bottleneck [163]. The basic idea is that functions that are computed many times should be optimized on all levels, from high to low, including the instruction-set level.…”
Section: Stack Optimizations For Deep Learning
confidence: 99%
“…As an example, for V100 average energy savings of 24%–33% were observed; for EDP, 23%–27% with a performance loss of 13%–21%; and for EDS (k=2), 23.5%–27.3% with a performance loss of 4.5%–13.8%. In Reference 36, the GPOEO solution was proposed, developed specifically for iterative machine learning applications. The tool measures performance counters as well as energy online, and time and energy models are used to find the predicted configuration that best optimizes a function of time and energy.…”
Section: Related Work
confidence: 99%
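The excerpt above compares configurations by energy, EDP, and EDS. A minimal sketch of how such metrics rank configurations, assuming EDP is the standard energy-delay product E·T and approximating the EDS-style trade-off with the common E·T^k family (the cited work's exact EDS definition may differ); the configuration names and measurements are hypothetical:

```python
# Hypothetical (energy_J, time_s) measurements for two GPU settings.

def edp(energy_j, time_s):
    """Energy-delay product E*T: lower is better."""
    return energy_j * time_s

def e_tk(energy_j, time_s, k=2):
    """Weighted metric E*T^k: larger k penalizes slowdown more."""
    return energy_j * time_s ** k

configs = {
    "default":   (25_000.0, 100.0),   # higher energy, faster
    "power_cap": (19_000.0, 115.0),   # lower energy, slower
}

best_edp = min(configs, key=lambda c: edp(*configs[c]))
best_k2 = min(configs, key=lambda c: e_tk(*configs[c], k=2))
print(best_edp)  # power-capped run wins on EDP despite the slowdown
print(best_k2)   # with k=2, the faster default run wins instead
```

The reversal between the two rankings illustrates why the reported savings differ between the EDP and EDS (k=2) objectives: larger exponents on time tolerate less performance loss.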
“…This approach considers both the model's quality and energy consumption. In [21], the authors presented an online GPU energy optimization framework for assessing iterative ML workloads and automatically predicting the best energy configuration.…”
Section: A Cpu-Gpu Based Systems
confidence: 99%