2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA)
DOI: 10.1109/isca45697.2020.00081
A Multi-Neural Network Acceleration Architecture

Cited by 74 publications (53 citation statements)
References 43 publications
“…So, to fully utilize the hardware and increase energy efficiency, a growing number of deep learning service providers are introducing task-level parallelism by sharing one computing device among multiple customers/requests, known as multi-tenant deep learning service, through either temporal multiplexing (e.g., PREMA [12], AI-MT [2]) or spatial multiplexing (e.g., NVIDIA Multi-Process Service [44], NVIDIA Ampere Multi-Instance GPU [47]).…”
Section: DNN Execution Characterization on CPU
confidence: 99%
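To make the temporal-multiplexing idea in this citation concrete, here is a minimal Python sketch of round-robin time slicing of a single accelerator across tenants. It is an illustration only, not PREMA's or AI-MT's actual policy; the class name, layer latencies, and time quantum are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class InferenceRequest:
    tenant: str
    layers: deque  # remaining per-layer latencies in ms (hypothetical profile)

def temporal_multiplex(requests, quantum_ms=2.0):
    """Round-robin time slicing of one accelerator across tenants.

    On each turn a request executes whole layers until its time quantum
    runs out, then yields the device to the next tenant in the queue.
    """
    ready = deque(requests)
    clock = 0.0
    while ready:
        req = ready.popleft()
        budget = quantum_ms
        while req.layers and budget > 0:
            latency = req.layers.popleft()  # run one layer to completion
            clock += latency
            budget -= latency
        if req.layers:
            ready.append(req)               # quantum spent: preempt and re-queue
        else:
            print(f"{req.tenant} finished at t={clock:.1f} ms")

temporal_multiplex([
    InferenceRequest("tenant-A", deque([1.5, 0.8, 2.1])),
    InferenceRequest("tenant-B", deque([0.5, 0.5, 0.5])),
])
```

Spatial multiplexing (MPS, MIG) would instead partition the device's resources so both tenants run concurrently, trading per-request latency for isolation.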
“…Our main finding is two-fold. First, a fixed scheduling granularity, such as the entire model [12] or the sub-layer block [2,21], leads to sub-optimal performance, owing to the diversity of DNN models and their distinctive inner characteristics. Second, the performance of existing compilation strategies, which aim to maximize code performance in the solo-run case, degrades significantly when multiple DNN models run together and interfere with each other.…”
Section: Optimization Space Analysis
confidence: 99%
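The finding about fixed granularity can be illustrated with a small heuristic: decide per model whether to schedule it as a whole or split it into sub-layer blocks, based on how skewed its layer costs are. The threshold and latency profiles below are assumed for illustration and are not taken from the cited work.

```python
def pick_granularity(layer_costs_ms, skew_threshold=3.0):
    """Pick a scheduling granularity for one model.

    layer_costs_ms: per-layer latency profile (illustrative numbers).
    skew_threshold: max/min cost ratio above which the model is split
                    into sub-layer blocks (assumed value).
    """
    skew = max(layer_costs_ms) / max(min(layer_costs_ms), 1e-9)
    return "block" if skew > skew_threshold else "model"

print(pick_granularity([1.0, 1.1, 0.9]))       # uniform layers -> 'model'
print(pick_granularity([0.2, 5.0, 0.3, 4.8]))  # skewed layers  -> 'block'
```

A model with uniform layer costs gains little from splitting, while a skewed model leaves gaps that a co-running model can fill only at block granularity.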
“…Due to the wide spectrum of NPUs and DNN models, it is nearly impossible to balance the usage of NPU resources for all DNNs. Recently, several proposals have addressed this problem by time-multiplexing the layer-wise execution of multiple DNN models with opposing characteristics (e.g., memory-intensive and compute-intensive) to saturate both compute and memory bandwidth [5], [12]. Figure 1 […] individual requests.…”
Section: Limitations of the Prior Art
confidence: 99%
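A minimal sketch of the layer-wise pairing idea, assuming hypothetical layer tags: interleave a compute-bound model's layers with a memory-bound model's layers so each scheduling slot exercises both the compute units and the memory bandwidth. This is an illustration of the technique the citation describes, not AI-MT's actual scheduler.

```python
from itertools import zip_longest

def interleave(compute_layers, memory_layers):
    """Yield (compute_layer, memory_layer) pairs to co-issue each slot."""
    for pair in zip_longest(compute_layers, memory_layers):
        yield pair  # one side saturates the MAC array, the other DRAM bandwidth

conv_model = ["conv1", "conv2", "conv3"]  # compute-intensive (hypothetical tags)
embed_model = ["embed_lookup", "gather"]  # memory-intensive (hypothetical tags)
for c, m in interleave(conv_model, embed_model):
    print("co-issue:", c, m)  # a leftover slot pairs with None
```

When the two models' layer counts differ, the leftover layers run alone, which is exactly the underutilization the layer-wise multiplexing proposals try to minimize.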