Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference 2017
DOI: 10.1145/3135974.3135993
Swayam: Distributed Autoscaling to Meet SLAs of Machine Learning Inference Services with Resource Efficiency

Abstract: Developers use Machine Learning (ML) platforms to train ML models and then deploy these ML models as web services for inference (prediction). A key challenge for platform providers is to guarantee response-time Service Level Agreements (SLAs) for inference workloads while maximizing resource efficiency. Swayam is a fully distributed autoscaling framework that exploits characteristics of production ML inference workloads to deliver on the dual challenge of resource efficiency and SLA compliance. Our key contributio…
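The abstract's core claim, meeting response-time SLAs while minimizing resources, reduces to estimating how many model-serving backends a given request load needs. The sketch below illustrates that kind of estimate with textbook M/M/1 queueing; it is not Swayam's actual estimator (the paper models production inference workload characteristics in more detail), and the function name, headroom factor, and formula are illustrative assumptions.

```python
# Illustrative sketch only: not the paper's algorithm. Uses a simple
# M/M/1 queueing bound to size a pool of model-serving backends.
import math

def backends_needed(arrival_rate_rps: float,
                    service_time_s: float,
                    sla_latency_s: float,
                    headroom: float = 1.2) -> int:
    """Estimate how many backends keep mean response time under the SLA.

    arrival_rate_rps : observed request arrival rate (requests/second)
    service_time_s   : mean per-request inference time on one backend
    sla_latency_s    : response-time SLA target
    headroom         : over-provisioning factor for bursts (assumed value)
    """
    if sla_latency_s <= service_time_s:
        raise ValueError("SLA tighter than a single inference; cannot be met")
    # For an M/M/1 queue, mean response time T = s / (1 - rho). Solve for
    # the utilization rho at which T equals the SLA target, then cap each
    # backend's load at that utilization.
    max_util = 1.0 - service_time_s / sla_latency_s
    per_backend_capacity = max_util / service_time_s   # sustainable rps/backend
    return max(1, math.ceil(headroom * arrival_rate_rps / per_backend_capacity))

# Example: 200 rps of 50 ms inferences under a 250 ms SLA
print(backends_needed(200.0, 0.05, 0.25))  # -> 15 with 20% headroom
```

A real autoscaler, Swayam included, must additionally decide when to scale (reacting to or predicting load shifts) and how to drain or warm up backends; the estimate above only answers the sizing question at one point in time.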

Cited by 75 publications (23 citation statements)
References 18 publications
“…
                               MS [38]  INFaaS [30]  Cocktail [20]  VPA [9]  InfAdapter
Cost Optimization              ✕        ✓            ✓*             ✓        ✓
Accuracy Maximization          ✓        ✕            ✓              ✕        ✓
Predictive Decision-Making     ✓        ✕            ✓              ✓        ✓
Container as a Service (CaaS)  ✕        ✕            ✕              ✓        ✓
Latency SLO-aware              ✓        ✓            ✓              ✕        ✓

…machine translation, chatbots, medical, and recommender systems are running in data centers [13,28,32,34], comprising more than 90% of computing resources allocated to ML [10,13,25]. ML inference services are user-facing, which mandates high responsiveness [18,37]. Moreover, high accuracy is crucial for these services [20,26].…”
Section: Feature
Mentioning (confidence: 99%)
“…Moreover, high accuracy is crucial for these services [20,26]. Consequently, inference systems must deliver highly accurate predictions with fewer computing resources (cost-efficient) while meeting latency constraints under workload variations [18,19,29,37].…”
Section: Feature
Mentioning (confidence: 99%)