2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)
DOI: 10.23919/date.2018.8342188

Block convolution: Towards memory-efficient inference of large-scale CNNs on FPGA

Abstract: Deep convolutional neural networks have achieved remarkable progress in recent years. However, the large volume of intermediate results generated during inference poses a significant challenge to accelerator design for resource-constrained FPGAs. Due to the limited on-chip storage, partial results of intermediate layers are frequently transferred back and forth between on-chip memory and off-chip DRAM, leading to a non-negligible increase in latency and energy consumption. In this paper, we propose block conv…


Cited by 22 publications (30 citation statements)
References 19 publications
“…Since the majority of the computations in a network are matrix-matrix/matrix-vector multiplications, it is critical to handle the massive nested loops efficiently to achieve high throughput. Loop optimization is one of the most frequently adopted techniques in accelerator design [92,56,73,2,88,49], including loop tiling, loop unrolling, loop interchange, etc. Loop tiling divides all of the data into multiple small blocks in order to relieve the pressure on on-chip storage [56,2,64], while loop unrolling attempts to improve the parallelism of the computing engine for high speed [56,64].…”
Section: Optimizing For High Throughput
confidence: 99%
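The tiling-plus-unrolling structure this excerpt describes can be made concrete with a small sketch. The C fragment below is illustrative only: the dimensions, tile sizes, and function name are assumptions, not taken from any of the cited designs, and the output array is assumed to be zero-initialized by the caller. Tiling restructures the nested convolution loops so that each tile's working set fits in on-chip buffers; unrolling then parallelizes the inner tile loops.

```c
/* A minimal sketch of loop tiling and loop unrolling for one convolutional
 * layer. All dimensions, tile sizes, and names are illustrative assumptions;
 * out[] is assumed zero-initialized by the caller. */
#define N  64   /* output channels (assumed)          */
#define M  64   /* input channels (assumed)           */
#define R  56   /* output feature-map rows (assumed)  */
#define C  56   /* output feature-map cols (assumed)  */
#define K  3    /* kernel size (assumed)              */
#define Tn 8    /* tile sizes, chosen to divide N, M, R, C evenly (assumed) */
#define Tm 8
#define Tr 14
#define Tc 14

void conv_tiled(float out[N][R][C],
                const float in[M][R + K - 1][C + K - 1],
                const float w[N][M][K][K])
{
    /* Loop tiling: the outer loops walk over tiles, so each tile's
     * working set is small enough to stay in on-chip buffers. */
    for (int n0 = 0; n0 < N; n0 += Tn)
    for (int m0 = 0; m0 < M; m0 += Tm)
    for (int r0 = 0; r0 < R; r0 += Tr)
    for (int c0 = 0; c0 < C; c0 += Tc)
        /* Loop unrolling: on an FPGA the n/m loops below would be fully
         * unrolled into Tn x Tm parallel multiply-accumulate units. */
        for (int n = n0; n < n0 + Tn; n++)
        for (int m = m0; m < m0 + Tm; m++)
        for (int r = r0; r < r0 + Tr; r++)
        for (int c = c0; c < c0 + Tc; c++)
        for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++)
            out[n][r][c] += w[n][m][i][j] * in[m][r + i][c + j];
}
```

Loop interchange, the third technique the excerpt names, then amounts to reordering these loops to match the on-chip buffer layout and the desired data-reuse pattern.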
“…[70] designed a flexible data buffering scheme to reduce bandwidth requirements, and [2] and [88] proposed fusion-based methods to reduce off-chip traffic. Most recently, [49] presented a block-based convolution that can completely avoid off-chip transfers of intermediate data in VGG-16 with high throughput.…”
Section: Optimizing For Low Energy Consumption
confidence: 99%
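To make the block-based idea attributed to [49] easier to picture, here is a minimal single-channel sketch. The dimensions, names, and the zero treatment of block borders are assumptions for illustration; the paper's exact boundary handling may differ. The key point is that each spatial block is convolved independently, so its inputs and partial results never have to leave on-chip memory.

```c
/* Minimal single-channel sketch of block convolution. FH/FW/BS/KS and
 * the zero treatment at block borders are assumptions for illustration. */
#define FH 56          /* feature-map height (assumed) */
#define FW 56          /* feature-map width (assumed)  */
#define BS 14          /* block size (assumed)         */
#define KS 3           /* kernel size (assumed)        */
#define PD (KS / 2)    /* padding radius               */

void block_conv(float out[FH][FW], const float in[FH][FW],
                const float ker[KS][KS])
{
    for (int bh = 0; bh < FH; bh += BS)
    for (int bw = 0; bw < FW; bw += BS) {
        /* Each block is processed independently: pixels outside the
         * current block are treated as zero (local padding), so no
         * cross-block data, and hence no off-chip transfer of
         * intermediate results, is needed. */
        for (int r = bh; r < bh + BS; r++)
        for (int c = bw; c < bw + BS; c++) {
            float acc = 0.0f;
            for (int i = -PD; i <= PD; i++)
            for (int j = -PD; j <= PD; j++) {
                int rr = r + i, cc = c + j;
                if (rr >= bh && rr < bh + BS && cc >= bw && cc < bw + BS)
                    acc += ker[i + PD][j + PD] * in[rr][cc];
            }
            out[r][c] = acc;
        }
    }
}
```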
“…However, as discussed in [33], in state-of-the-art deep CNNs, CONVs consume most of the computation time, making them one of the most critical tasks limiting the achievable speed. For this reason, the design of hardware parallel convolutional engines suited to the inference of deep CNNs in high-performance, low-power applications has recently received a great deal of attention [26][27][28][29][30][31]34]. The most widely exploited design techniques aim to boost achievable performance by increasing the level of parallelism with which data is processed [28][29][30][31]34].…”
Section: Background and Motivations
confidence: 99%
“…For this reason, the design of hardware parallel convolutional engines suited to the inference of deep CNNs in high-performance, low-power applications has recently received a great deal of attention [26][27][28][29][30][31]34]. The most widely exploited design techniques aim to boost achievable performance by increasing the level of parallelism with which data is processed [28][29][30][31]34]. Indeed, as is visible in Figure 1a, most of the computations involved in a convolutional layer are independent of each other, offering the possibility of parallelizing the operations within the kernel and across both ifmaps and ofmaps.…”
Section: Background and Motivations
confidence: 99%
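The within-kernel independence noted in this excerpt can be shown directly: for a 3x3 kernel, the nine products contributing to one output pixel have no dependences on one another, so a hardware engine can compute them with nine parallel multipliers feeding an adder tree. The sketch below is illustrative only; the function name and argument layout are assumptions.

```c
/* Minimal sketch of intra-kernel parallelism for a 3x3 window. The
 * function name and argument layout are illustrative assumptions. */
static inline float mac3x3(const float win[3][3], const float ker[3][3])
{
    /* All nine products are independent (no loop-carried dependence);
     * in hardware they can map to nine multipliers firing in the
     * same cycle. */
    float p0 = win[0][0] * ker[0][0], p1 = win[0][1] * ker[0][1],
          p2 = win[0][2] * ker[0][2], p3 = win[1][0] * ker[1][0],
          p4 = win[1][1] * ker[1][1], p5 = win[1][2] * ker[1][2],
          p6 = win[2][0] * ker[2][0], p7 = win[2][1] * ker[2][1],
          p8 = win[2][2] * ker[2][2];
    /* Adder tree: four levels of pairwise additions instead of a
     * sequential accumulation chain. */
    return (((p0 + p1) + (p2 + p3)) + ((p4 + p5) + (p6 + p7))) + p8;
}
```

Replicating such a unit across multiple input and output feature maps is the across-ifmap/across-ofmap parallelism the excerpt refers to.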