2021
DOI: 10.1109/mm.2021.3097287

Datacenter-Scale Analysis and Optimization of GPU Machine Learning Workloads

Cited by 4 publications (2 citation statements)
References 3 publications
“…1 Service-level: AI-centric cloud services handle millions of service queries simultaneously [44]. With the massive computing capacity of GPUs, multiple DL queries can be strategically co-located for efficient concurrent execution, which is one key difference between multi-tenant GPU computing and traditional CPU multi-tasking.…”
Section: A. Challenges for Multi-Tenant DL Computing
Confidence: 99%
“…Such optimizations mainly lie in datacenter-level management for optimal infrastructure utilization and cost. Currently, publicly available works [18,47,50] mainly target optimizing training jobs, which consume more resources (e.g., 4/8-GPU machines, running for hours to days). Public works targeting inference MIMD optimizations remain limited.…”
Section: Large-Scale DL Serving System: A Novel Taxonomy
Confidence: 99%