Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2021
DOI: 10.1145/3458817.3476143

Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction

Cited by 38 publications (17 citation statements)
References 34 publications
“…Ebird [13] also proposes a batch-based approach to enable concurrent execution of DNNs with high data transfer-compute overlapping. Abacus [14] proposes an operator overlapping strategy based on precise latency prediction. Besides the multi-DNN serving scenario, emerging microservice-based workloads also have complex inner structures similar to DNN models [20], to which our design may also be applied.…”
Section: Multi-tenant Deep Learning Service
mentioning
confidence: 99%
“…I1: Deterministic online execution [28,51]. Different from offline training which could be resource-intensive and last for days or weeks, the inference for each query is often completed with sub-second response time and consumes much less resources.…”
Section: Training
mentioning
confidence: 99%
“…Reducing the parallelism of execution eliminates the interference from other tasks, but inevitably brings lower throughput and resource utilization. To address this issue, Abacus [29] tries to guarantee SLO for query requests under the GPU co-location scenarios. It controls the execution sequence and the co-location situation proactively, rather than the default random-ordered execution overlap.…”
Section: Efficiency
mentioning
confidence: 99%
“…Ebird [59] also proposes a batch-based approach to enable concurrent execution of DNNs with high data transfer-compute overlapping. Abacus [60] proposes an operator overlapping strategy based on precise latency prediction. Besides the multi-DNN serving scenario, emerging microservice-based workloads also have complex inner structures similar to DNN models [61], to which our design may also be applied.…”
Section: Multi-tenant Deep Learning Service
mentioning
confidence: 99%