IEEE INFOCOM 2019 - IEEE Conference on Computer Communications 2019
DOI: 10.1109/infocom.2019.8737460

Deep Learning-based Job Placement in Distributed Machine Learning Clusters

Cited by 105 publications (47 citation statements)
References: 21 publications
“…Gao et al [14] solve a training time minimization problem to find the best device placement of a deep neural network, using a reinforcement learning algorithm. Bao et al [7] propose a deep learning-based job placement algorithm to minimize interference among co-located ML jobs. Resource allocation among multiple jobs is not considered by these work.…”
Section: Related Work (mentioning)
confidence: 99%
“…subject to: This maximization problem involves integer variables, non-linear constraint (2b) (2c) and constraints concerning multiplication of variables (2f)(2h)(7b). To address these challenges, we first apply the compact-exponential techniques [36] to reformulate problem (7) into an equivalent conventional integer linear program (ILP) with packing structure: (8) subject to:…”
Section: The Maximum Weighted Schedule Problem (mentioning)
confidence: 99%
“…The recent advance of RL has expedited automation of system operations in many areas. They include energy optimization in data centers [38], [39], cluster resource management [5]- [7], job placement in cloud networks [40], network slicing [41], and compiler optimization [42]. In this paper, we adopt RL for the GFPS problem, which is considered challenging in the area of real-time systems.…”
Section: Name Description (mentioning)
confidence: 99%
“…In output layer of the policy NN, we mask invalid actions, which points to a direction of obstacles within one meter from the walker, by setting their probability to 0 in the probability distribution. Then we re-scale the probabilities of all actions such that the sum still equals 1 (Bao et al, 2019). The walker will then move one meter toward the chosen direction.…”
Section: Action Space (mentioning)
confidence: 99%
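
The action-masking step described in the last statement (set the probability of invalid actions to 0, then re-scale so the distribution still sums to 1) can be illustrated with a minimal sketch. This is not code from Bao et al. or from the citing paper; the function name `mask_and_renormalize`, the example probabilities, and the uniform fallback when no valid action keeps any probability mass are all assumptions made for illustration.

```python
from typing import List, Sequence


def mask_and_renormalize(probs: Sequence[float], valid: Sequence[bool]) -> List[float]:
    """Zero out invalid actions, then rescale so the result sums to 1."""
    if len(probs) != len(valid):
        raise ValueError("probs and valid must have the same length")

    # Step 1: set the probability of every invalid action to 0.
    masked = [p if ok else 0.0 for p, ok in zip(probs, valid)]

    # Step 2: divide by the remaining mass so the probabilities sum to 1 again.
    total = sum(masked)
    if total <= 0.0:
        # Assumption (not from the source): if the valid actions carry no
        # probability mass, fall back to a uniform choice over valid actions.
        n_valid = sum(1 for ok in valid if ok)
        if n_valid == 0:
            raise ValueError("no valid action available")
        return [1.0 / n_valid if ok else 0.0 for ok in valid]
    return [p / total for p in masked]


# Hypothetical example: four movement directions, the second one blocked.
policy_output = [0.1, 0.5, 0.2, 0.2]
is_valid = [True, False, True, True]
masked_policy = mask_and_renormalize(policy_output, is_valid)
```

In this hypothetical example the blocked direction ends up with probability 0 and the remaining three directions keep their relative proportions, so sampling from `masked_policy` can never select the masked action.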