2019
DOI: 10.1587/elex.16.20190633

A streaming accelerator of Convolutional Neural Networks for resource-limited applications

Abstract: The implementation of Convolutional Neural Networks (CNNs) on embedded devices is constrained by the number of layers in some CNN models. In this context, this paper describes a novel architecture based on Layer Operation Chaining (LOC), which uses fewer convolvers than convolution layers. Reuse of hardware convolvers is promoted through kernel decomposition. Thus, an architectural design with reduced resource utilization is achieved, suitable for implementation on low-end devices as a solution for porta…
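To make the Layer Operation Chaining idea more concrete, the following minimal Python sketch, offered as a software analogy under stated assumptions rather than the paper's actual hardware design, time-multiplexes a single `convolver` routine across several convolution layers, mirroring the claim of using fewer convolvers than convolution layers. The 3×3 kernels, single-channel feature maps, and ReLU activation are illustrative choices, not values taken from the paper.

```python
# Minimal software analogy of Layer Operation Chaining (an assumption, not the
# paper's RTL): one shared convolver routine is reused for every convolution
# layer instead of instantiating one convolver per layer.
import numpy as np
from scipy.signal import correlate2d

def convolver(fmap, kernel):
    """The single convolver engine that all layers share (valid padding)."""
    return correlate2d(fmap, kernel, mode="valid")

def chained_layers(image, kernels):
    """Chain several convolution layers through the same convolver instance."""
    fmap = image
    for k in kernels:                                # layers processed back-to-back
        fmap = np.maximum(convolver(fmap, k), 0.0)   # convolution followed by ReLU
    return fmap

# Toy usage: three 3x3 layers, single channel, sharing one convolver.
rng = np.random.default_rng(0)
out = chained_layers(rng.standard_normal((32, 32)),
                     [rng.standard_normal((3, 3)) for _ in range(3)])
print(out.shape)   # (26, 26) after three valid 3x3 convolutions
```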

Cited by 5 publications (6 citation statements)
References 29 publications (52 reference statements)
“…The processing time per image for our architecture is 245.6, 161.83, and 261.75 µs for the one-, two-, and three-convolution layer CNNs, respectively. In the case of the custom CNN with three convolution layers, this time is less than that obtained by our previous works [22,53] and that reported in [54], which uses a CNN of similar complexity. The throughput for our architecture is in the range of [0.95, 2.71] GOp/s, which is superior to the works that are based on a similar CNN model.…”
Section: Results (mentioning)
confidence: 60%
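For context, the quoted throughput figure is simply operations per image divided by processing time per image; the small Python sketch below makes that relation explicit. The operation count used is purely hypothetical, since the excerpt does not state the networks' actual operation counts.

```python
# Throughput in GOp/s = (operations per image) / (time per image in seconds) / 1e9.
# The operation count below is hypothetical; the excerpt does not provide it.
def throughput_gops(total_ops, time_us):
    return total_ops / (time_us * 1e-6) / 1e9

print(round(throughput_gops(5e5, 261.75), 2))   # ~1.91 GOp/s for a 0.5 MOp network
```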
“…This table shows the design strategy (method) for each work, the complexity of the CNN model, the throughput (quantified in millions of operations per second), and the time required to process an image. In the case of complexity, for the works [22,53] and ours, the total number of multiplications required by the network is considered. For the remaining works, it is assumed that complexity is expressed in multiply-accumulate operations (GMACs or GOps), since no information is provided on how this metric was quantified.…”
Section: Results (mentioning)
confidence: 99%
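The "total number of multiplications" metric mentioned above follows from the standard per-layer count, mults = H_out × W_out × C_in × C_out × K × K. The sketch below applies it to a hypothetical two-layer CNN; the layer shapes are placeholders, not the networks compared in the cited table.

```python
# Standard multiplication count of a dense 2-D convolution layer:
#   mults = H_out * W_out * C_in * C_out * K * K
# The layer shapes below are placeholders, not the networks compared in the table.
def conv_mults(h_out, w_out, c_in, c_out, k):
    return h_out * w_out * c_in * c_out * k * k

layers = [(28, 28, 1, 8, 3), (13, 13, 8, 16, 3)]    # hypothetical two-layer CNN
total = sum(conv_mults(*shape) for shape in layers)
print(f"{total / 1e6:.2f} M multiplications")       # ~0.25 M for this toy case
```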
“…The designs proposed in [16,17,21] can deal with convolutions of several common kernel sizes, but they are still not applicable to convolutions of arbitrary kernel sizes. The authors in [22–26] adopt multiple computing engines to handle convolutions with different kernel sizes in order to improve performance. However, such a design costs too many hardware resources, which is not suitable for low-cost FPGAs.…”
Section: Introduction (mentioning)
confidence: 99%
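Kernel decomposition, which the reviewed paper's abstract names as the basis for convolver reuse, is one way to avoid dedicating a separate engine to each kernel size. The sketch below is a generic software model of that idea, not the paper's actual datapath: a 5×5 correlation is rebuilt from four zero-padded 3×3 sub-kernels applied at spatial offsets through a single fixed 3×3 convolver, with the partial sums accumulated.

```python
# Generic model of kernel decomposition (an illustration, not the paper's design):
# a 5x5 correlation is reproduced with one fixed 3x3 convolver by splitting the
# 5x5 kernel into four zero-padded 3x3 sub-kernels and accumulating partial sums.
import numpy as np
from scipy.signal import correlate2d

def conv3x3(fmap, k3):
    """The single fixed-size convolver that gets reused."""
    return correlate2d(fmap, k3, mode="valid")

def conv5x5_via_3x3(x, k5):
    h, w = x.shape
    out = np.zeros((h - 4, w - 4))
    pieces = [((0, 0), np.s_[0:3], np.s_[0:3]),   # top-left 3x3 block
              ((0, 2), np.s_[0:3], np.s_[3:5]),   # top-right 3x2 block
              ((2, 0), np.s_[3:5], np.s_[0:3]),   # bottom-left 2x3 block
              ((2, 2), np.s_[3:5], np.s_[3:5])]   # bottom-right 2x2 block
    for (du, dv), rows, cols in pieces:
        block = k5[rows, cols]
        k3 = np.zeros((3, 3))
        k3[3 - block.shape[0]:, 3 - block.shape[1]:] = block   # zero-pad to 3x3
        out += conv3x3(x[du:, dv:], k3)[:h - 4, :w - 4]        # offset + accumulate
    return out

rng = np.random.default_rng(1)
x, k5 = rng.standard_normal((16, 16)), rng.standard_normal((5, 5))
print(np.allclose(conv5x5_via_3x3(x, k5), correlate2d(x, k5, mode="valid")))  # True
```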
“…Thus, without an elaborate design, this mismatch between data and computation elements may lead to ultra-low utilization of FPGA resources, which is undesirable for low-cost FPGAs. For example, in [16–31], although the size of the CNN model is significantly reduced, the bit utilization of FPGA resources is very low. To solve the above problems, this brief proposes an efficient hardware accelerator for low-bit quantized lightweight CNN models, and its contributions can be summarized as follows:…”
Section: Introduction (mentioning)
confidence: 99%
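As a rough illustration of what low-bit quantization and bit utilization refer to here, the hedged sketch below maps weights to signed 4-bit integers and packs four such fields into one 16-bit word. The bit widths, scaling scheme, and packing layout are assumptions chosen for illustration, not the cited brief's actual method.

```python
# Hedged sketch of low-bit quantization and bit packing (illustrative only):
# weights become signed 4-bit integers, and four 4-bit fields fit in one 16-bit word.
import numpy as np

def quantize(w, n_bits=4):
    """Uniform symmetric quantization of weights to signed n-bit integers."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8), scale

def pack4(vals):
    """Pack four 4-bit two's-complement fields into one 16-bit word (toy example)."""
    word = 0
    for i, v in enumerate(vals):
        word |= (int(v) & 0xF) << (4 * i)
    return word

w = np.array([0.31, -0.08, 0.52, -0.47])
q, s = quantize(w)
print(q, s)            # [ 4 -1  7 -6 ] with scale ~0.074
print(hex(pack4(q)))   # 0xa7f4: four 4-bit fields in one 16-bit word
```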
“…Several machine learning (ML) techniques have been implemented to classify brain activity, diseases, and behaviours, achieving approximate solutions in this discipline [3–6]. Owing to the remarkable results of these techniques, dedicated hardware is currently being developed for ML tasks [7–9]. An interesting example of ML applied to MRIs is presented in [10]; in their study, the authors investigated deep learning framework algorithms for predicting the Soil Organic Matter (SOM) content by VIS-NIR spectroscopy.…”
Section: Introduction (mentioning)
confidence: 99%