2022
DOI: 10.1109/tpds.2022.3144614
Coordinated Batching and DVFS for DNN Inference on GPU Accelerators

Abstract: Deployment of real-time ML services on warehouse-scale infrastructures is increasing. Therefore, decreasing the latency and increasing the throughput of deep neural network (DNN) inference applications that power those services have attracted attention from both academia and industry. A common solution to this challenge is leveraging hardware accelerators such as GPUs. To improve the inference throughput of DNNs deployed on GPU accelerators, two common approaches are employed: batching and multi-tenancy…
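
The batching half of this idea can be illustrated with a short, hypothetical sketch (not the paper's implementation): measuring how inference throughput of a standard CNN scales with batch size on a GPU. It assumes PyTorch, a torchvision ResNet-50 with random weights, and a CUDA device; the helper name throughput and the chosen batch sizes are illustrative only.

    import time

    import torch
    import torchvision.models as models

    # Illustrative model and input shape; any CNN shows the same trend.
    model = models.resnet50(weights=None).eval().cuda()

    def throughput(batch_size, iters=20):
        # Images per second at a given batch size; larger batches usually
        # raise throughput at the cost of per-request latency.
        x = torch.randn(batch_size, 3, 224, 224, device="cuda")
        torch.cuda.synchronize()
        start = time.perf_counter()
        with torch.no_grad():
            for _ in range(iters):
                model(x)
        torch.cuda.synchronize()
        return batch_size * iters / (time.perf_counter() - start)

    for bs in (1, 8, 32, 128):
        print(f"batch={bs:4d}  {throughput(bs):8.1f} img/s")

On most GPUs the images/s figure climbs steeply from batch 1 and then flattens once the device is saturated, which is the regime where coordinating batch size with DVFS, as the paper proposes, becomes relevant.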

Cited by 32 publications (7 citation statements)
References 87 publications
“…When we focus on the application level, improving the computation efficiency by executing a batch of input together has been successfully applied in various fields [26,28,31]. This trend is even more common in machine learning for accelerating training and inference [15,18]. However, they target a more dynamic environment, considering the batch size as a tuning parameter [3].…”
Section: State of the Art (mentioning)
confidence: 99%
“…The mapping algorithm balances the positive error and the negative error of approximation to maximize the energy reduction while minimizing the overall approximation error. Finally, to tackle DVFS for power reduction, the forefront work of [151] introduces a new control knob based on the size of input batches fed to the DNN inference in the GPU. The authors first analyzed the effects of batch size on power and performance.…”
Section: Thermal Management (mentioning)
confidence: 99%
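
As a rough, hypothetical illustration of the "batch size as a control knob" idea described in the excerpt above (not the authors' actual controller), the sketch below sweeps a few batch sizes at a few locked GPU core clocks and logs average board power through NVIDIA's NVML bindings (pynvml). The clock values, batch sizes, and the placeholder for the inference call are assumptions; locking clocks requires administrator privileges and a GPU that supports nvmlDeviceSetGpuLockedClocks.

    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)

    def average_power_watts(samples=50):
        # nvmlDeviceGetPowerUsage reports milliwatts; average a short window.
        total = sum(pynvml.nvmlDeviceGetPowerUsage(handle) for _ in range(samples))
        return total / samples / 1000.0

    for sm_clock_mhz in (900, 1200, 1500):      # candidate DVFS states (assumed)
        # Pin the SM clock to a single frequency (needs root/admin).
        pynvml.nvmlDeviceSetGpuLockedClocks(handle, sm_clock_mhz, sm_clock_mhz)
        for batch_size in (1, 8, 32, 128):
            # run_inference(batch_size) would go here (see the batching sketch above).
            print(f"clock={sm_clock_mhz} MHz  batch={batch_size}  "
                  f"power={average_power_watts():.1f} W")

    pynvml.nvmlDeviceResetGpuLockedClocks(handle)
    pynvml.nvmlShutdown()

Measurements collected this way would give the kind of batch-size vs. power/performance profile the cited analysis refers to, from which a coordinated batching and frequency policy can be derived.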
“…This makes it possible for DNN to be applied to communication systems without being limited by specific mathematical models. Moreover, DNN computations can be easily parallelized, which means they can take advantage of modern hardware accelerators such as GPUs to achieve faster training and inference speeds [8].…”
Section: Introduction (mentioning)
confidence: 99%