2020
DOI: 10.1109/les.2019.2947312
Exploring NEURAghe: A Customizable Template for APSoC-Based CNN Inference at the Edge

Cited by 6 publications (5 citation statements) · References 11 publications
“…We do not account for contention on the off-chip memory, since previous experiments have shown that its effect is limited; [51] shows this for the case of multiple NEURAghe instances contending for the same DDR memory. We also assume, as in most approaches in the literature, that communication with the host is asynchronous, so we do not consider host CPU intervention to be a bottleneck.…”
Section: CNN Metric Aggregation
confidence: 82%
“…Although systems with multiple accelerators are rarely available in the embedded domain, some research efforts have demonstrated that such a design technique can be useful [51], [52]. Our aggregation methodology therefore allows the estimate to account for an arbitrary number of processing elements.…”
Section: CNN Metric Aggregation
confidence: 99%
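The excerpt above describes aggregating performance estimates over an arbitrary number of processing elements, under the stated assumptions of negligible off-chip memory contention and asynchronous host communication. A minimal sketch of what such an aggregation could look like is given below; the function names, the per-layer MAC counts, and the ideal-parallelism model are illustrative assumptions, not the cited work's actual methodology.

```python
# Illustrative sketch (hypothetical model, not the cited methodology):
# estimate total CNN latency when each layer's MAC workload is split
# evenly across n_accel identical accelerators, assuming ideal
# partitioning and no DDR contention, as the excerpt above assumes.

def layer_latency(macs: float, peak_macs_per_s: float, n_accel: int) -> float:
    """Estimated latency of one CNN layer parallelized over n_accel units."""
    return macs / (peak_macs_per_s * n_accel)

def network_latency(layers_macs, peak_macs_per_s, n_accel):
    """Total estimate: layers execute sequentially, each layer in parallel."""
    return sum(layer_latency(m, peak_macs_per_s, n_accel) for m in layers_macs)

# Example: three layers on a hypothetical 100 GMAC/s accelerator.
layers = [1.8e9, 3.7e9, 0.9e9]          # MAC operations per layer (assumed)
t1 = network_latency(layers, 100e9, 1)  # single accelerator instance
t4 = network_latency(layers, 100e9, 4)  # four instances, ideal scaling
```

Under this idealized model the four-instance estimate is exactly a quarter of the single-instance one; a real aggregation would add contention and host-interaction terms where those assumptions break down.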
“…In Table VIII we report comparative results against the state of the art on two well-known image-classification networks, ResNet-18 and VGG-16. We compare against other accelerator architectures, [44], [45] and [46], that are implemented on the same kind of hardware and use the same 16-bit data precision. On VGG-16, which exposes quite regular kernel sizes and stride values, our work shows performance comparable to the alternatives.…”
Section: E. Comparison To Other APSoC-Based Accelerators
confidence: 99%
“…On VGG-16, which exposes quite regular kernel sizes and stride values, our work shows performance comparable to the alternatives. It executes convolutions slightly faster than [44] and [45], and around 13% slower than [46]. On ResNet-18, which exposes more variable kernel sizes and strides, we avoid much of the overhead that must be paid by more static architectures, executing the whole convolution workload 40% faster than [44].…”
Section: E. Comparison To Other APSoC-Based Accelerators
confidence: 99%