2022 IEEE Real-Time Systems Symposium (RTSS)
DOI: 10.1109/rtss55097.2022.00032
Jellyfish: Timely Inference Serving for Dynamic Edge Networks


Cited by 10 publications (3 citation statements)
References 31 publications (42 reference statements)
“…ML inference services are user-facing, which mandates high responsiveness [18,37]. Moreover, high accuracy is crucial for these services [20,26]. Consequently, inference systems must deliver highly accurate predictions with fewer computing resources (cost-efficient) while meeting latency constraints under workload variations [18,19,29,37].…”
Section: Feature (mentioning)
confidence: 99%
“…Conversely, overprovisioning wastes computing resources [30,36]. To address these problems caused by dynamic workloads, Autoscaling [2,9,17,18,30,36] resizes the resources of the service, and Model-switching [26,38] switches between ML model variants that differ in their inference latency and accuracy (higher accuracy, higher latency); the former tries to be cost-efficient, and the latter tries to be more accurate, while both guarantee latency SLOs.…”
Section: Feature (mentioning)
confidence: 99%
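
The model-switching idea quoted above lends itself to a short sketch: keep a profiled table of model variants and, per request, serve the most accurate variant whose inference latency still fits the budget left by the latency SLO. Everything below (the variant names, their latency/accuracy numbers, and the pick_variant helper) is a hypothetical illustration under those assumptions, not the policy from Jellyfish or the cited systems.

```python
# Minimal sketch of an SLO-driven model-switching policy.
# All variants and numbers below are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class Variant:
    name: str
    accuracy: float      # profiled accuracy (fraction)
    latency_ms: float    # profiled per-request inference latency

# Hypothetical variants, ordered by accuracy
# (higher accuracy, higher latency).
VARIANTS = [
    Variant("small",  0.70,  8.0),
    Variant("medium", 0.76, 20.0),
    Variant("large",  0.82, 55.0),
]

def pick_variant(slo_ms: float, network_delay_ms: float) -> Variant:
    """Return the most accurate variant whose inference latency fits
    in the budget left after subtracting the observed network delay."""
    budget = slo_ms - network_delay_ms
    feasible = [v for v in VARIANTS if v.latency_ms <= budget]
    if not feasible:
        # No variant fits: fall back to the fastest one (best effort).
        return min(VARIANTS, key=lambda v: v.latency_ms)
    return max(feasible, key=lambda v: v.accuracy)

# Example: a 50 ms SLO with 25 ms of network delay leaves a 25 ms
# budget, so the policy switches from "large" down to "medium".
print(pick_variant(slo_ms=50.0, network_delay_ms=25.0).name)  # medium
```

As the quoted statement notes, this trades accuracy for latency under load, whereas autoscaling instead resizes the service's resources; both aim to keep the latency SLO, one by being more accurate, the other by being more cost-efficient.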