2016
DOI: 10.1109/tc.2015.2481403

Heterogeneity and Interference-Aware Virtual Machine Provisioning for Predictable Performance in the Cloud

Abstract: GPUs are essential to accelerating the latency-sensitive deep neural network (DNN) inference workloads in cloud datacenters. To fully utilize GPU resources, spatial sharing of GPUs among co-located DNN inference workloads becomes increasingly compelling. However, GPU sharing inevitably brings severe performance interference among co-located inference workloads, as motivated by an empirical measurement study of DNN inference on EC2 GPU instances. While existing works on guaranteeing inference performance servic…

Cited by 83 publications (51 citation statements)
References 27 publications
“…Constraint (17) gives the load intensity at the second queue. Constraints (18) and (19) give the expression for the expected completion time and expected AoI for all jobs j, respectively. Finally, Constraint (22) ensures the positivity of the scheduling probabilities.…”
Section: A Joint Optimization for AoI and Completion Time (mentioning)
confidence: 99%
“…Applications that are running inside VMs are affected by many factors including virtualisation and co-allocation [17]. In the literature [20], [21], [22], [23], different approaches have been suggested to manage datacenter resources in an energy, performance and cost-efficient way. These can be categorized as: (i) resource provisioning; (ii) consolidation with migration; and (iii) methods that describe the trade-off between energy, performance and cost.…”
Section: Related Work (mentioning)
confidence: 99%
“…Xu et al. have introduced a heterogeneity and interference‐aware VM provisioning framework (Heifer) for tenant applications by focusing on MapReduce as a representative cloud application. This framework calculated the MapReduce application's performance by developing a lightweight performance model based on the online‐measured resource utilization and capturing VM interference.…”
Section: Literature Review (mentioning)
confidence: 99%