2020
DOI: 10.1109/access.2020.3031055

CNN Acceleration With Hardware-Efficient Dataflow for Super-Resolution

Abstract: The convolutional neural network (CNN)-based super-resolution (SR) has shown outstanding performance in the field of computer vision. The implementation of inference hardware for CNN-based SR has suffered from the intensive computation with severely unbalanced computation load among layers. Various light-weighted SR networks have been researched with little performance degradation. However, the hardware-efficient dataflow is also required to efficiently accelerate inference hardware within limited resources. I…

Cited by 18 publications (23 citation statements)
References 36 publications
“…The experiments, performed on the Xilinx XC7K410T field programmable gate array (FPGA) chip, demonstrated the benefits of the proposed approach in terms of area occupancy and energy saving over several state-of-the-art counterparts. In fact, the new accelerator exhibited a logic resource requirement and a power consumption up to ~63% and ~48% lower, respectively, than previous designs [11, 13, 14, 15, 16, 17]. The adopted parallelism and the achieved 227 MHz running frequency allow the above advantages to be obtained without compromising the competitiveness of the proposed design in terms of speed performance.…”
Section: Introduction
mentioning, confidence: 93%
“…Unfortunately, these characteristics may represent a bottleneck for those application scenarios in which real time and low power are mandatory. For this reason, designing ad-hoc hardware accelerators suitable for exploitation also within time- and power-constrained operating environments has recently received a great deal of attention [11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23]. Among the possible hardware realization platforms, FPGAs are widely recognized as powerful solutions [11, 13, 15, 17, 20] for merging the benefits from custom hardware designs, such as computational parallelism and limited energy consumption, with the strengths of software designs, including reconfigurability and short time to market.…”
Section: Background and Related Work
mentioning, confidence: 99%
“…Then, the WCB generates the final 8-b output by combining four 4-b elements of data. The four VAs of kth kernel operation represent the product as 15 15…”
Section: B Macro Architecture
mentioning, confidence: 99%
“…This problem is referred to as the von Neumann bottleneck or memory wall [11]. Several innovative approaches have been presented to address this issue [12]-[15].…”
Section: Introduction
mentioning, confidence: 99%