2021
DOI: 10.1109/mm.2021.3097287

Datacenter-Scale Analysis and Optimization of GPU Machine Learning Workloads

Cited by 4 publications (2 citation statements)
References 3 publications
“…1 Service-level: AI-centric cloud services handle millions of service queries simultaneously [44]. With the massive computing capacity of GPUs, multiple DL queries can be strategically co-located for efficient concurrent execution, which is one key difference between multi-tenant GPU computing and traditional CPU multi-tasking.…”
Section: A. Challenges for Multi-Tenant DL Computing
Confidence: 99%
“…Such optimizations mainly lie in datacenter-level management for optimal infrastructure utilization and cost. Currently, publicly available works [18,47,50] mainly target optimizing training jobs, which consume more resources (e.g., 4/8-GPU machines, running for hours to days). Public works targeting inference MIMD optimizations remain limited.…”
Section: Large-Scale DL Serving System: A Novel Taxonomy
Confidence: 99%