2019
DOI: 10.1587/elex.16.20190633

A streaming accelerator of Convolutional Neural Networks for resource-limited applications

Abstract: The implementation of Convolutional Neural Networks (CNNs) on embedded devices is constrained by the number of layers in some CNN models. In this context, this paper describes a novel architecture based on Layer Operation Chaining (LOC), which uses fewer convolvers than convolution layers. Reuse of hardware convolvers is promoted through kernel decomposition. Thus, an architectural design with reduced resource utilization is achieved, suitable for implementation on low-end devices as a solution for porta…
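To make the Layer Operation Chaining idea more concrete, the following minimal Python sketch, offered as a software analogy under stated assumptions rather than the paper's actual hardware design, time-multiplexes a single `convolver` routine across several convolution layers, mirroring the claim of using fewer convolvers than convolution layers. The 3×3 kernels, single-channel feature maps, and ReLU activation are illustrative choices, not values taken from the paper.

```python
# Minimal software analogy of Layer Operation Chaining (an assumption, not the
# paper's RTL): one shared convolver routine is reused for every convolution
# layer instead of instantiating one convolver per layer.
import numpy as np
from scipy.signal import correlate2d

def convolver(fmap, kernel):
    """The single convolver engine that all layers share (valid padding)."""
    return correlate2d(fmap, kernel, mode="valid")

def chained_layers(image, kernels):
    """Chain several convolution layers through the same convolver instance."""
    fmap = image
    for k in kernels:                                # layers processed back-to-back
        fmap = np.maximum(convolver(fmap, k), 0.0)   # convolution followed by ReLU
    return fmap

# Toy usage: three 3x3 layers, single channel, sharing one convolver.
rng = np.random.default_rng(0)
out = chained_layers(rng.standard_normal((32, 32)),
                     [rng.standard_normal((3, 3)) for _ in range(3)])
print(out.shape)   # (26, 26) after three valid 3x3 convolutions
```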

Cited by 5 publications (6 citation statements)
References 29 publications (52 reference statements)
“…The processing time per image for our architecture is 245.6, 161.83, and 261.75 µs for the one-, two-, and three-convolution layer CNNs, respectively. In the case of the custom CNN with three convolution layers, this time is less than that obtained by our previous works [22,53] and that reported in [54], which uses a CNN of similar complexity. The throughput for our architecture is in the range of [0.95, 2.71] GOp/s, which is superior to the works that are based on a similar CNN model.…”
Section: Results (mentioning)
confidence: 60%
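For context, the quoted throughput figure is simply operations per image divided by processing time per image; the small Python sketch below makes that relation explicit. The operation count used is purely hypothetical, since the excerpt does not state the networks' actual operation counts.

```python
# Throughput in GOp/s = (operations per image) / (time per image in seconds) / 1e9.
# The operation count below is hypothetical; the excerpt does not provide it.
def throughput_gops(total_ops, time_us):
    return total_ops / (time_us * 1e-6) / 1e9

print(round(throughput_gops(5e5, 261.75), 2))   # ~1.91 GOp/s for a 0.5 MOp network
```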
“…This table shows the design strategy (method) for each work, the complexity of the CNN model, the throughput (quantified in millions of operations per second), and the time required to process an image. In the case of complexity, for the works [22,53] and ours, the total number of multiplications required by the network is considered. For the remaining works, it is assumed that complexity is expressed in multiply-accumulate operations (GMACs or GOps), since no information is provided on how this metric was quantified.…”
Section: Results (mentioning)
confidence: 99%
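The "total number of multiplications" metric mentioned above follows from the standard per-layer count, mults = H_out × W_out × C_in × C_out × K × K. The sketch below applies it to a hypothetical two-layer CNN; the layer shapes are placeholders, not the networks compared in the cited table.

```python
# Standard multiplication count of a dense 2-D convolution layer:
#   mults = H_out * W_out * C_in * C_out * K * K
# The layer shapes below are placeholders, not the networks compared in the table.
def conv_mults(h_out, w_out, c_in, c_out, k):
    return h_out * w_out * c_in * c_out * k * k

layers = [(28, 28, 1, 8, 3), (13, 13, 8, 16, 3)]    # hypothetical two-layer CNN
total = sum(conv_mults(*shape) for shape in layers)
print(f"{total / 1e6:.2f} M multiplications")       # ~0.25 M for this toy case
```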
“…The designs proposed in [16,17,21] can deal with convolutions of several common kernel sizes, but they are still not applicable to convolutions of arbitrary kernel sizes. The authors in [22–26] adopt multiple computing engines to handle convolutions with different kernel sizes in order to improve performance. However, such a design costs too many hardware resources, which is not suitable for low-cost FPGAs.…”
Section: Introduction (mentioning)
confidence: 99%
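Kernel decomposition, which the reviewed paper's abstract names as the basis for convolver reuse, is one way to avoid dedicating a separate engine to each kernel size. The sketch below is a generic software model of that idea, not the paper's actual datapath: a 5×5 correlation is rebuilt from four zero-padded 3×3 sub-kernels applied at spatial offsets through a single fixed 3×3 convolver, with the partial sums accumulated.

```python
# Generic model of kernel decomposition (an illustration, not the paper's design):
# a 5x5 correlation is reproduced with one fixed 3x3 convolver by splitting the
# 5x5 kernel into four zero-padded 3x3 sub-kernels and accumulating partial sums.
import numpy as np
from scipy.signal import correlate2d

def conv3x3(fmap, k3):
    """The single fixed-size convolver that gets reused."""
    return correlate2d(fmap, k3, mode="valid")

def conv5x5_via_3x3(x, k5):
    h, w = x.shape
    out = np.zeros((h - 4, w - 4))
    pieces = [((0, 0), np.s_[0:3], np.s_[0:3]),   # top-left 3x3 block
              ((0, 2), np.s_[0:3], np.s_[3:5]),   # top-right 3x2 block
              ((2, 0), np.s_[3:5], np.s_[0:3]),   # bottom-left 2x3 block
              ((2, 2), np.s_[3:5], np.s_[3:5])]   # bottom-right 2x2 block
    for (du, dv), rows, cols in pieces:
        block = k5[rows, cols]
        k3 = np.zeros((3, 3))
        k3[3 - block.shape[0]:, 3 - block.shape[1]:] = block   # zero-pad to 3x3
        out += conv3x3(x[du:, dv:], k3)[:h - 4, :w - 4]        # offset + accumulate
    return out

rng = np.random.default_rng(1)
x, k5 = rng.standard_normal((16, 16)), rng.standard_normal((5, 5))
print(np.allclose(conv5x5_via_3x3(x, k5), correlate2d(x, k5, mode="valid")))  # True
```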
“…Thus, without an elaborate design, this mismatch between data and computation elements may lead to ultra-low utilization of FPGA resources, which is undesirable for low-cost FPGAs. For example, in [16–31], although the size of the CNN model is significantly reduced, the bit utilization of FPGA resources is very low. To solve the above problems, this brief proposes an efficient hardware accelerator for low-bit quantized lightweight CNN models, and its contributions can be summarized as follows:…”
Section: Introduction (mentioning)
confidence: 99%
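As a rough illustration of what low-bit quantization and bit utilization refer to here, the hedged sketch below maps weights to signed 4-bit integers and packs four such fields into one 16-bit word. The bit widths, scaling scheme, and packing layout are assumptions chosen for illustration, not the cited brief's actual method.

```python
# Hedged sketch of low-bit quantization and bit packing (illustrative only):
# weights become signed 4-bit integers, and four 4-bit fields fit in one 16-bit word.
import numpy as np

def quantize(w, n_bits=4):
    """Uniform symmetric quantization of weights to signed n-bit integers."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8), scale

def pack4(vals):
    """Pack four 4-bit two's-complement fields into one 16-bit word (toy example)."""
    word = 0
    for i, v in enumerate(vals):
        word |= (int(v) & 0xF) << (4 * i)
    return word

w = np.array([0.31, -0.08, 0.52, -0.47])
q, s = quantize(w)
print(q, s)            # [ 4 -1  7 -6 ] with scale ~0.074
print(hex(pack4(q)))   # 0xa7f4: four 4-bit fields in one 16-bit word
```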
“…Several machine learning (ML) techniques have been implemented to classify brain activity, diseases, and behaviours, achieving approximate solutions in this discipline [3–6]. Owing to the remarkable results of these techniques, dedicated hardware is currently being developed for ML tasks [7–9]. An interesting example of ML applied to MRIs is presented in [10]; in their study, the authors investigated deep learning framework algorithms for predicting the Soil Organic Matter (SOM) content by VIS-NIR spectroscopy.…”
Section: Introduction (mentioning)
confidence: 99%