Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture 2019
DOI: 10.1145/3352460.3358302
Simba

Cited by 232 publications (67 citation statements)
References 40 publications
“…• For networks with a low operation count per layer (e.g., ResNet and AlexNet), the layer fusion provided by our work yields large benefits (compared with the rules-based fusion in TensorRT, where one fusion block contains at most one CONV). • For MobileNet, although it also has a low operation count per layer, it uses depthwise separable convolutions, whose access patterns exhibit low data reuse (Shao et al. 2019; Sandler et al. 2018) compared to ordinary convolutions, and which therefore require manual tuning of code at an even lower level than the provided SDK. So the fine-tuned TensorRT outperforms our work on MobileNet.…”
Section: Discussion
confidence: 99%
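The statement above hinges on the arithmetic structure of depthwise separable convolution, the factorization MobileNet uses (Sandler et al. 2018). A minimal sketch, not taken from any of the cited papers (the function names and layer shape are illustrative assumptions), shows why it trades away data reuse: it replaces one dense multiply-accumulate (MAC) volume with two much smaller ones.

```python
# Hypothetical illustration (not from the cited papers): compare MAC counts
# of an ordinary convolution against a depthwise separable one.

def standard_conv_macs(k, c_in, c_out, h, w):
    """MACs for a k x k standard convolution over an h x w feature map."""
    return k * k * c_in * c_out * h * w

def depthwise_separable_macs(k, c_in, c_out, h, w):
    """MACs for a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = k * k * c_in * h * w   # one k x k filter per input channel
    pointwise = c_in * c_out * h * w   # 1x1 conv mixing channels
    return depthwise + pointwise

# An assumed MobileNet-like layer: 3x3 kernel, 64 -> 64 channels, 32x32 map.
std = standard_conv_macs(3, 64, 64, 32, 32)
sep = depthwise_separable_macs(3, 64, 64, 32, 32)
print(std, sep, round(sep / std, 4))  # reduction factor ~ 1/c_out + 1/k^2
```

The roughly 8x fewer MACs come with far fewer arithmetic operations per byte fetched, which is the low-data-reuse property the citing authors say forces tuning below the SDK level.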
“…There has been an incredible amount of interest in DNN hardware acceleration. Broadly speaking, the architecture community has focused on designing efficient dataflows to maximize local reuse of data and functional unit utilization [4,10,11,15,28,34,37,39], exploring the space of possible dataflows and mappings [26,45,74], exploiting model sparsity and data quantization [17,21,29,38,46,53,71,73,78], mapping DNN accelerators to FPGAs [20,66,69], and exploring alternative compute, memory, and packaging technologies [35,58,59,67]. All of these works are highly relevant to this field.…”
Section: Related Work
confidence: 99%
“…Image classification applications widely use deep convolutional neural networks (CNNs) and are deployed from cloud to edge computational frameworks for a variety of scenarios, such as search engines and self-driving cars [1,2,3,4,5,6]. As the complexity of these applications and the resolution of images continue to increase, conventional homogeneous architectures (such as multi-core CPU/GPU) are constrained by excessively long latency and significant power dissipation [7,8,9]. To efficiently process these applications, heterogeneous architectures have been proposed with pre-processing and inference cores [7,8,9,10,11,12,13].…”
Section: Introduction
confidence: 99%