2022
DOI: 10.1109/tpds.2022.3144614
Coordinated Batching and DVFS for DNN Inference on GPU Accelerators

Abstract: Deployment of real-time ML services on warehouse-scale infrastructures is increasing. Therefore, decreasing the latency and increasing the throughput of deep neural network (DNN) inference applications that power those services have attracted attention from both academia and industry. A common solution to this challenge is leveraging hardware accelerators such as GPUs. To improve the inference throughput of DNNs deployed on GPU accelerators, two common approaches are employed: batching and multi-tenancy…
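
The batching half of this idea can be illustrated with a short, hypothetical sketch (not the paper's implementation): measuring how inference throughput of a standard CNN scales with batch size on a GPU. It assumes PyTorch, a torchvision ResNet-50 with random weights, and a CUDA device; the helper name throughput and the chosen batch sizes are illustrative only.

    import time

    import torch
    import torchvision.models as models

    # Illustrative model and input shape; any CNN shows the same trend.
    model = models.resnet50(weights=None).eval().cuda()

    def throughput(batch_size, iters=20):
        # Images per second at a given batch size; larger batches usually
        # raise throughput at the cost of per-request latency.
        x = torch.randn(batch_size, 3, 224, 224, device="cuda")
        torch.cuda.synchronize()
        start = time.perf_counter()
        with torch.no_grad():
            for _ in range(iters):
                model(x)
        torch.cuda.synchronize()
        return batch_size * iters / (time.perf_counter() - start)

    for bs in (1, 8, 32, 128):
        print(f"batch={bs:4d}  {throughput(bs):8.1f} img/s")

On most GPUs the images/s figure climbs steeply from batch 1 and then flattens once the device is saturated, which is the regime where coordinating batch size with DVFS, as the paper proposes, becomes relevant.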

Cited by 32 publications (7 citation statements)
References 87 publications
“…When we focus on the application level, improving the computation efficiency by executing a batch of input together has been successfully applied in various fields [26,28,31]. This trend is even more common in machine learning for accelerating training and inference [15,18]. However, they target a more dynamic environment, considering the batch size as a tuning parameter [3].…”
Section: State of the Art (mentioning)
confidence: 99%
“…The mapping algorithm balances the positive error and the negative error of approximation to maximize the energy reduction while minimizing the overall approximation error. Finally, to tackle DVFS for power reduction, the forefront work of [151] introduces a new control knob based on the size of input batches fed to the DNN inference in the GPU. The authors first analyzed the effects of batch size on power and performance.…”
Section: Thermal Management (mentioning)
confidence: 99%
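
As a rough, hypothetical illustration of the "batch size as a control knob" idea described in the excerpt above (not the authors' actual controller), the sketch below sweeps a few batch sizes at a few locked GPU core clocks and logs average board power through NVIDIA's NVML bindings (pynvml). The clock values, batch sizes, and the placeholder for the inference call are assumptions; locking clocks requires administrator privileges and a GPU that supports nvmlDeviceSetGpuLockedClocks.

    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)

    def average_power_watts(samples=50):
        # nvmlDeviceGetPowerUsage reports milliwatts; average a short window.
        total = sum(pynvml.nvmlDeviceGetPowerUsage(handle) for _ in range(samples))
        return total / samples / 1000.0

    for sm_clock_mhz in (900, 1200, 1500):      # candidate DVFS states (assumed)
        # Pin the SM clock to a single frequency (needs root/admin).
        pynvml.nvmlDeviceSetGpuLockedClocks(handle, sm_clock_mhz, sm_clock_mhz)
        for batch_size in (1, 8, 32, 128):
            # run_inference(batch_size) would go here (see the batching sketch above).
            print(f"clock={sm_clock_mhz} MHz  batch={batch_size}  "
                  f"power={average_power_watts():.1f} W")

    pynvml.nvmlDeviceResetGpuLockedClocks(handle)
    pynvml.nvmlShutdown()

Measurements collected this way would give the kind of batch-size vs. power/performance profile the cited analysis refers to, from which a coordinated batching and frequency policy can be derived.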
“…This makes it possible for DNN to be applied to communication systems without being limited by specific mathematical models. Moreover, DNN computations can be easily parallelized, which means they can take advantage of modern hardware accelerators such as GPUs to achieve faster training and inference speeds [8].…”
Section: Introduction (mentioning)
confidence: 99%