SC16: International Conference for High Performance Computing, Networking, Storage and Analysis 2016
DOI: 10.1109/sc.2016.25

SERF: Efficient Scheduling for Fast Deep Neural Network Serving via Judicious Parallelism

Cited by 20 publications (9 citation statements). References 28 publications.
“…Note that each GPU server executes only one DL task at a time. Referring to [23], we assume that both the power consumption and the DL performance are linearly proportional to the number of GPU devices, as defined in (2) and (3). β^i_{j,c}(k), β^i_{j,m}(k), β^i_{j,e}(k) denote iteration time model coefficients at time k, respectively, analogous to α^i_{j,c}(k), α^i_{j,m}(k), α^i_{j,e}(k) [13].…”
Section: B Frequency Scaling Based Power and Performance Modelmentioning
confidence: 99%
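The linear-scaling assumption quoted above can be sketched in a few lines. This is a hypothetical illustration only: the coefficient values and function names are made up for the example and are not taken from [23] or [13].

```python
# Sketch of the quoted assumption: both server power draw and DL
# performance are modeled as linear in the number of GPU devices n.
# All coefficients below are illustrative placeholders.

def power_watts(n_gpus, p_per_gpu=250.0, p_idle=100.0):
    """Server power: an idle baseline plus a linear per-GPU term."""
    return p_idle + p_per_gpu * n_gpus

def throughput(n_gpus, t_per_gpu=900.0):
    """DL performance (e.g., samples/s) assumed proportional to GPU count."""
    return t_per_gpu * n_gpus
```

Under this model, doubling the GPU count doubles throughput and adds a fixed per-GPU power increment, which is what makes the scheduling trade-off in the excerpt tractable.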
“…The primary objective of distributed machine learning is to minimise the time required to execute computing tasks [111]. While parallelisation and distributed architectures increase the available computing resources, applying them naively can harm, rather than improve, system performance [171]. For model training, the system optimization, resource allocation and scheduling challenges essentially concern how to partition a model, where to place model parts, and when to train which part of the model [110] in the shortest possible time, while fully utilising the available computing resources.…”
Section: System Optimizationmentioning
confidence: 99%
“…Automatically mapping tasks to hardware resources, scheduling and balancing workloads, and determining the task execution order are thus important. Recent efforts have investigated adaptive, dynamic load balancing [97], optimal resource allocation and dynamic scheduling [171], and automated, dependence-aware scheduling [168] to improve training speed and system response to varying loads. Meta-optimizations that can be automated to improve model and system performance include parameter search, hyper-parameter search and neural architecture search [17].…”
Section: System Optimizationmentioning
confidence: 99%
“…The result builds on Cosmetatos' approximations,14 which evaluate the M/D^interf/N queue using the M/M/N model with adjustment and exact terms. These approximation methods are adapted to the interference‐aware scenario and solve M/D^interf/N in two ways, i.e., (1) solving the M/D^interf/N queue with an interference‐aware exponential service time.…”
Section: Proposed Heterogeneous Hybridized Fuzzy‐based Dijkstra's Schmentioning
confidence: 99%
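For context on the approximation the excerpt refers to, the sketch below computes the mean M/M/c waiting time via the Erlang C formula and applies the classical Cosmetatos correction to approximate the M/D/c delay. This is the textbook form of the approximation, not the interference-aware variant developed in the cited work, and all function names are illustrative.

```python
import math

def erlang_c(c, a):
    """Erlang C probability of waiting; a = lam/mu is the offered load."""
    rho = a / c
    head = sum(a**k / math.factorial(k) for k in range(c))
    tail = a**c / (math.factorial(c) * (1.0 - rho))
    return tail / (head + tail)

def wq_mmc(lam, mu, c):
    """Mean queueing delay in an M/M/c system."""
    return erlang_c(c, lam / mu) / (c * mu - lam)

def wq_mdc_cosmetatos(lam, mu, c):
    """Cosmetatos' heuristic approximation for the mean M/D/c delay."""
    rho = lam / (c * mu)
    corr = 0.5 * (1.0 + (1.0 - rho) * (c - 1)
                  * (math.sqrt(4.0 + 5.0 * c) - 2.0) / (16.0 * rho * c))
    return wq_mmc(lam, mu, c) * corr
```

A useful sanity check: for c = 1 the correction factor reduces to 1/2, recovering the exact Pollaczek–Khinchine result that the M/D/1 delay is half the M/M/1 delay.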
“…Section 3 describes the problem statement and the major issues considered in these tasks. Section 4 explains the proposed HHFDS methodology, which hybridizes a fuzzy Dijkstra's algorithm with a deep neural network.14,15 Section 5 provides a comparative analysis to demonstrate the improved performance of the proposed algorithm.…”
Section: Introductionmentioning
confidence: 99%