2021 18th International SoC Design Conference (ISOCC)
DOI: 10.1109/isocc53507.2021.9613997

CNN Accelerator with Minimal On-Chip Memory Based on Hierarchical Array

Cited by 9 publications (8 citation statements)
References 2 publications
“…A Hybrid Precision FP MAC (HP-MAC) unit is presented as an example implementation of an accelerator for YOLOv2-Tiny, which consists of 3 × 3 convolution kernels in all nine layers based on the diagonal cyclic array proposed by [40]. The input activations are propagated horizontally through each row, while weight parameters are propagated vertically through each column of the 3 × 3 array, as illustrated in Figure 10.…”
Section: HPFP Multiplication and Accumulation (HPFP MAC)
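The dataflow quoted above — input activations streaming horizontally through each row while weights stream vertically through each column of a 3 × 3 array — can be sketched as a small output-stationary systolic simulation. This is an illustrative assumption only: the function name, the output-stationary accumulation style, and the plain row/column shifting are not taken from the paper, and the sketch does not model the diagonal cyclic scheduling of [40].

```python
import numpy as np

def systolic_matmul(A, B, n=3):
    """Simulate an n x n output-stationary systolic array computing A @ B.

    Rows of A enter from the left edge (row i delayed by i cycles) and
    move one PE to the right per cycle; columns of B enter from the top
    edge (column j delayed by j cycles) and move one PE down per cycle.
    Each PE multiplies the pair it currently holds and accumulates
    locally, so PE (i, j) ends up with sum_k A[i, k] * B[k, j].
    """
    acc = np.zeros((n, n))            # per-PE local accumulators
    a_reg = np.full((n, n), np.nan)   # activation held by each PE (NaN = bubble)
    w_reg = np.full((n, n), np.nan)   # weight held by each PE
    for t in range(3 * n - 2):        # skewed wavefront drains in 3n - 2 cycles
        # shift: activations move right, weights move down, one PE per cycle
        a_reg[:, 1:] = a_reg[:, :-1].copy()
        w_reg[1:, :] = w_reg[:-1, :].copy()
        # inject skewed inputs at the array edges
        for i in range(n):
            k = t - i                 # row i is delayed by i cycles
            a_reg[i, 0] = A[i, k] if 0 <= k < n else np.nan
        for j in range(n):
            k = t - j                 # column j is delayed by j cycles
            w_reg[0, j] = B[k, j] if 0 <= k < n else np.nan
        # every PE holding a valid (activation, weight) pair accumulates
        valid = ~np.isnan(a_reg) & ~np.isnan(w_reg)
        prod = a_reg * w_reg
        acc[valid] += prod[valid]
    return acc
```

With this skew, PE (i, j) sees A[i, k] and B[k, j] on the same cycle (k = t − i − j), which is why the local accumulator converges to the matrix product.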
“…including state machine control, register configuration, and address updates during continuous computing. In addition to the convolution layer, the operation core also supports activation and pooling, and the three functional modules are cascaded [5].…”
Section: Convolutional Layer Operation Analysis
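The cascade described in that statement — convolution, then activation, then pooling applied back-to-back — can be sketched minimally in NumPy. The function names, the single-channel 3 × 3 valid convolution, ReLU, and the 2 × 2 max-pool are illustrative assumptions, not details taken from [5]:

```python
import numpy as np

def conv3x3(x, w):
    """Valid 3x3 convolution over a single-channel feature map."""
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * w)
    return out

def relu(x):
    """Elementwise activation module."""
    return np.maximum(x, 0.0)

def maxpool2x2(x):
    """Non-overlapping 2x2 max pooling (odd trailing rows/cols dropped)."""
    H, W = x.shape
    return x[:H // 2 * 2, :W // 2 * 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

def operation_core(x, w):
    # the three functional modules cascaded: conv -> activation -> pooling
    return maxpool2x2(relu(conv3x3(x, w)))
```

In hardware the three stages would be pipelined rather than called sequentially, but the data dependence is the same: each module consumes the previous module's output stream.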
“…Next, we present two experiments to compare the inference computation times of quantized models from ONNX Runtime dynamic quantization and the proposed method. The first experiment is based on actual inference on a GPU-based PC, while the second is based on estimating computation time on an NPU architecture [48,49]. In the first experiment, the quantized YOLOv5 models are tested on a GPU-based PC with an Intel Core i7-9700 CPU @ 3.00 GHz and an NVIDIA GeForce RTX 2060 GPU (6 GB).…”
Section: CNN Model Number of Parameterized Layers
“…Table 2 demonstrates that the proposed method offers a substantially higher speed improvement for deeper and more complex CNNs. In the second experiment, we estimated the computation time based on the NPU architecture simulator reported in [48,49] using the YOLOv5-n (3) model, as shown in Table 3. Table 3 compares the NPU's estimated inference time for ONNX Runtime dynamic quantization and USPIQ.…”
Section: CNN Model Number of Parameterized Layers