2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig)
DOI: 10.1109/reconfig.2016.7857144
A high-efficiency runtime reconfigurable IP for CNN acceleration on a mid-range all-programmable SoC

Cited by 21 publications (16 citation statements)
References 15 publications
“…Frequent data caching and parameter loading will be limited by memory bandwidth. Therefore, in many studies, CNN hardware structures are designed around the two bottlenecks of floating-point resources and bandwidth [12], [13], [16], [17].…”
Section: B. CNNs Implemented by FPGAs
confidence: 99%
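The bandwidth bottleneck this statement refers to can be made concrete with a back-of-the-envelope roofline check. The C sketch below compares the arithmetic intensity of one convolutional layer with full on-chip reuse against the same layer with no reuse; the layer dimensions and the 100 GFLOP/s / 4 GB/s platform figures are assumptions for illustration, not numbers from the cited papers.

```c
#include <stdio.h>

/* Back-of-the-envelope roofline check for one convolutional layer.
 * Every figure below is a hypothetical example, not from the papers. */
int main(void) {
    /* Assumed layer: 256x256 maps, 3x3 kernels, 64 in/out channels, fp32. */
    double H = 256, W = 256, K = 3, Cin = 64, Cout = 64, bpw = 4.0;

    double flops = 2.0 * H * W * K * K * Cin * Cout;   /* one MAC = 2 FLOPs */

    /* Best case: on-chip buffers large enough that every input word,
     * weight, and output word crosses the DRAM bus exactly once. */
    double bytes_full_reuse =
        bpw * (H * W * Cin + K * K * Cin * Cout + H * W * Cout);

    /* Worst case: no on-chip reuse, so every MAC refetches one input
     * word and one weight word from DRAM ("frequent data caching and
     * parameter loading"). */
    double bytes_no_reuse = bpw * 2.0 * (flops / 2.0);

    /* Assumed mid-range SoC: 100 GFLOP/s peak, 4 GB/s DRAM bandwidth. */
    double ridge = 100e9 / 4e9;   /* FLOPs/byte needed to be compute-bound */

    printf("full reuse: %6.2f FLOPs/byte (ridge %.1f)\n",
           flops / bytes_full_reuse, ridge);
    printf("no reuse:   %6.2f FLOPs/byte (ridge %.1f)\n",
           flops / bytes_no_reuse, ridge);
    return 0;
}
```

With full reuse this layer sits well above the ridge point, but without on-chip buffering the same layer falls two orders of magnitude below it, which is why these designs revolve around buffering and bandwidth.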
“…In detail, each of the DMACs serves as a bus master and accesses the DRAM subsystem through the on-chip bus. The hardware accelerator is assumed to be equipped with either one ([1], [8,9]) or multiple DMACs ([2], [5]-[7]), each of which accesses the DRAM subsystem as a bus master. For example,…”
Section: System Under Consideration
confidence: 99%
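A DMAC acting as a bus master is typically driven by the processor through a handful of memory-mapped registers. The bare-metal C sketch below shows the usual program-start-poll sequence for one DRAM-to-accelerator transfer; the base address, register offsets, and bit fields are hypothetical placeholders, not the layout of any specific DMAC in the cited works (under Linux, the register page would additionally be mmap'd rather than dereferenced directly).

```c
#include <stdint.h>

/* Hypothetical memory-mapped register layout of one DMAC channel.
 * Offsets and bit meanings are placeholders, not a real device map. */
#define DMAC_BASE   0x40400000u
#define REG_SRC     0x00u  /* DRAM source address (bus-master read) */
#define REG_LEN     0x04u  /* transfer length in bytes              */
#define REG_CTRL    0x08u  /* bit 0: start                          */
#define REG_STAT    0x0Cu  /* bit 0: done                           */

static inline void reg_write(uintptr_t base, uint32_t off, uint32_t v) {
    *(volatile uint32_t *)(base + off) = v;
}
static inline uint32_t reg_read(uintptr_t base, uint32_t off) {
    return *(volatile uint32_t *)(base + off);
}

/* Fetch one tile from DRAM into the accelerator's local buffer.
 * The DMAC, as bus master, performs the actual DRAM reads. */
void dma_fetch_tile(uint32_t dram_addr, uint32_t nbytes) {
    reg_write(DMAC_BASE, REG_SRC, dram_addr);
    reg_write(DMAC_BASE, REG_LEN, nbytes);
    reg_write(DMAC_BASE, REG_CTRL, 1u);            /* kick off transfer  */
    while ((reg_read(DMAC_BASE, REG_STAT) & 1u) == 0)
        ;                                           /* busy-wait for done */
}
```

With multiple DMACs, the same sequence is simply replicated once per channel base address, and the channels then contend for the DRAM subsystem as independent bus masters.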
“…After all, it follows that the processor core makes it possible to reconfigure the hardware accelerator according to the bank allocations and the number of DMACs. Since a hardware accelerator is usually designed as a standalone IP block, a standardized interface eases its integration into the system [1,2], [6]-[9]. The AMBA AXI4 interface, the standardized interface used in [27], is assumed in this work, as illustrated in Figure 1(a).…”
Section: On-chip / Off-chip
confidence: 99%
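Integration through a standardized interface usually means the IP block exposes an AXI4-Lite control port alongside its AXI4 data ports, so the processor core can reconfigure and start it with plain register writes. The sketch below shows one such host-side sequence; the base address, register offsets, and field names (including REG_NUM_DMACS and REG_BANK_MAP) are hypothetical, not taken from the cited paper.

```c
#include <stdint.h>

/* Hypothetical AXI4-Lite control map of the accelerator IP.
 * Base address, offsets, and bit fields are placeholders. */
#define ACC_BASE      0x43C00000u
#define ACC           ((volatile uint32_t *)ACC_BASE)
#define REG_CTRL      (0x00u / 4)  /* bit 0: start, bit 1: done (read-only) */
#define REG_NUM_DMACS (0x10u / 4)  /* number of DMAC channels to enable     */
#define REG_BANK_MAP  (0x14u / 4)  /* DRAM bank allocation bitmap           */

/* Reconfigure, start, and wait: the processor core only ever touches
 * the standardized control port, never the accelerator internals. */
void acc_run(uint32_t num_dmacs, uint32_t bank_map) {
    ACC[REG_NUM_DMACS] = num_dmacs;
    ACC[REG_BANK_MAP]  = bank_map;
    ACC[REG_CTRL]      = 1u;               /* start */
    while ((ACC[REG_CTRL] & 2u) == 0)
        ;                                   /* poll done */
}
```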
“…For embedded systems that allow the FPGA accelerator itself [10], [11] to proactively fetch data from main memory into local FPGA memory, the state of the art is still copy-based shared memory. The main memory is statically split into two sections: one exclusively accessed by the host via cached, paged virtual addressing, and a second accessed by both the host and the FPGA via uncached, contiguous physical addressing.…”
Section: Introduction
confidence: 99%
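Copy-based shared memory of this kind is commonly set up from Linux user space by mapping the reserved, physically contiguous section through /dev/mem and copying buffers into it before the FPGA is started. The sketch below illustrates that pattern; the physical base address and size of the reserved region are assumptions, as is the hypothetical helper name stage_for_fpga.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical reserved region: carved out of DRAM at boot (e.g. via a
 * reserved-memory device-tree node) so the kernel never pages or caches
 * it for normal use. Address and size are assumptions. */
#define SHARED_PHYS  0x38000000ul
#define SHARED_SIZE  (128ul * 1024 * 1024)

/* Map the FPGA-visible section and copy a host buffer into it.
 * Returns the virtual address of the copy, or NULL on failure. */
void *stage_for_fpga(const void *src, size_t len, size_t offset) {
    if (offset + len > SHARED_SIZE)
        return NULL;
    int fd = open("/dev/mem", O_RDWR | O_SYNC);    /* O_SYNC: uncached */
    if (fd < 0)
        return NULL;
    uint8_t *shared = mmap(NULL, SHARED_SIZE, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, SHARED_PHYS);
    close(fd);                                      /* mapping survives */
    if (shared == MAP_FAILED)
        return NULL;
    memcpy(shared + offset, src, len);              /* the "copy" step  */
    return shared + offset;   /* FPGA addresses it as SHARED_PHYS+offset */
}
```

The copy through memcpy is exactly the overhead the statement calls "copy-based": the host works in its cached virtual section and must duplicate every buffer into the uncached physical section before the FPGA can fetch it.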