2022
DOI: 10.3390/s22062184

Customizable FPGA-Based Hardware Accelerator for Standard Convolution Processes Empowered with Quantization Applied to LiDAR Data

Abstract: In recent years, there has been an increase in research and development of deep learning solutions for object detection applied to driverless vehicles. This application has benefited from the growing adoption of innovative perception solutions, such as LiDAR sensors, currently the preferred devices for these tasks in autonomous vehicles. There is a broad variety of research on models based on point clouds, which stand out for being efficient and robust in their intended tasks, …

Cited by 7 publications (7 citation statements) | References 37 publications
“…Quantization of the ANN model is a crucial step for successful deployment. It is the process of reducing the bit width of a deep learning model's weights and activation functions by sharing parameters, decreasing hardware resource usage, and consequently optimizing the model for the target FPGA [40]. Apache TVM can convert a high-level ANN model into a deployable quantized module on a range of hardware platforms.…”
Section: AI-Accelerator VTA
confidence: 99%
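To make the bit-width reduction described above concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in NumPy. The function names and the quantization scheme are illustrative assumptions for this example, not APIs from the cited paper or from Apache TVM:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization of float32 weights to int8.

    Returns the int8 tensor and the scale needed to dequantize it.
    """
    scale = np.max(np.abs(w)) / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from the int8 representation."""
    return q.astype(np.float32) * scale

# Example: a float32 weight tensor shrinks from 32 to 8 bits per value.
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.max(np.abs(w - dequantize(q, s))))
```

Reducing each weight from 32 to 8 bits cuts memory traffic by 4x and lets the FPGA replace floating-point arithmetic with much cheaper integer logic, at the cost of the small rounding error printed above.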
“…In addition, it utilizes VTA to perform further quantization to optimize the ANN models, striking a balance between model accuracy and FPGA resource constraints. The purpose of such a quantization process is to improve time efficiency and resource utilization, since quantization affects the performance of a model as a function of the model depth [40]. Beyond quantization, VTA also performs other operations, including fetch, load, compute, and store, which work together to manage the data flow and optimize the performance of the inference process.…”
Section: AI-Accelerator VTA
confidence: 99%
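As a purely illustrative model of the fetch, load, compute, and store stages mentioned above, the following Python sketch emulates that dataflow in software. VTA's actual modules are hardware units synchronized through dependency queues; the instruction format here is invented for the example:

```python
import numpy as np

def run_inference(instructions, dram):
    """Toy software model of a fetch/load/compute/store dataflow."""
    sram = {}                          # stand-in for on-chip buffers
    for inst in instructions:          # FETCH: read the next instruction
        if inst["op"] == "load":       # LOAD: DRAM -> on-chip buffer
            sram[inst["dst"]] = dram[inst["src"]]
        elif inst["op"] == "compute":  # COMPUTE: e.g. a quantized matmul
            a, b = sram[inst["a"]], sram[inst["b"]]
            sram[inst["dst"]] = a.astype(np.int32) @ b.astype(np.int32)
        elif inst["op"] == "store":    # STORE: on-chip buffer -> DRAM
            dram[inst["dst"]] = sram[inst["src"]]
    return dram

# Hypothetical program: load operands, multiply, write the result back.
dram = {"w": np.ones((2, 2), np.int8), "x": np.ones((2, 2), np.int8)}
prog = [
    {"op": "load", "src": "x", "dst": "x_buf"},
    {"op": "load", "src": "w", "dst": "w_buf"},
    {"op": "compute", "a": "x_buf", "b": "w_buf", "dst": "y_buf"},
    {"op": "store", "src": "y_buf", "dst": "y"},
]
print(run_inference(prog, dram)["y"])
```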
“…Compared to other hardware convolution implementations, the voting convolution has low area and power consumption. The approach followed in [25], an efficient convolution implementation inspired by [26], consumes a total of 10,832 LUTs, and its number of DSPs is proportional to the filter size multiplied by the number of allocated processing elements. Its declared total power consumption is 1.739 W for just one convolution, almost 8.7 times higher than that required by the Voting Block.…”
Section: Functional Validation
confidence: 99%
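To illustrate the stated proportionality, here is a hypothetical worked example; the numbers and the unit proportionality constant are assumptions for illustration, not figures from [25]:

```python
def dsp_count(filter_h: int, filter_w: int, num_pes: int) -> int:
    """DSP usage modeled as filter size times allocated processing
    elements, with the proportionality constant assumed to be 1."""
    return filter_h * filter_w * num_pes

# Hypothetical example: a 3x3 filter with 4 processing elements.
print(dsp_count(3, 3, 4))  # -> 36 DSP slices
```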
“…During the various tests, the dense convolution implemented in [25] was used as a reference. Equation (3) presents an approximation of the processing time of the traditional convolution implemented in [25] according to the size of the input feature map (considering both IFM_channels and OFM_channels equal to one).…”
Section: Sparsity Effect
confidence: 99%
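Equation (3) itself is not reproduced in this excerpt. As a generic stand-in, a dense single-channel convolution with one multiply-accumulate per cycle scales roughly as follows; this is an assumption for illustration, not the paper's actual equation:

```python
def conv_cycles(ifm_h: int, ifm_w: int, k: int,
                pad: int = 0, stride: int = 1) -> int:
    """Illustrative cycle count for a dense k-by-k convolution with one
    MAC per cycle and single input/output channels. NOT Equation (3)."""
    out_h = (ifm_h + 2 * pad - k) // stride + 1
    out_w = (ifm_w + 2 * pad - k) // stride + 1
    return out_h * out_w * k * k  # one multiply-accumulate per cycle

print(conv_cycles(224, 224, 3))  # grows with input feature-map size
```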
“…In most cases, the idea is to implement a processing element that can run convolutions efficiently, onto which the different layers are sequentially mapped, storing the intermediate results in main or local memory. For instance, Silva et al. develop a highly configurable convolution block on an FPGA to accelerate object detection in autonomous driving applications [40]. Yan et al. also develop an accelerator on FPGA, optimizing the design using resource multiplexing and parallel processing, limiting the implementation to kernels to avoid issues with reconfiguration [41].…”
Section: Introduction
confidence: 99%
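A minimal software sketch of this scheme follows, assuming a single shared convolution engine onto which layers are mapped one at a time; `conv_engine` is a host-side stand-in for the hardware processing element, not an API from the cited works:

```python
import numpy as np

def conv_engine(fm: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Stand-in for the hardware PE: a dense 2-D convolution
    (valid padding, stride 1) computed on the host for illustration."""
    k = w.shape[0]
    out_h, out_w = fm.shape[0] - k + 1, fm.shape[1] - k + 1
    out = np.zeros((out_h, out_w), dtype=fm.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(fm[i:i + k, j:j + k] * w)
    return out

def run_network(layer_weights, fm):
    """Map each layer in turn onto the single engine, keeping the
    intermediate feature map in (host) memory between layers."""
    for w in layer_weights:
        fm = conv_engine(fm, w)  # one layer at a time on the shared PE
    return fm

# Hypothetical example: two 3x3 layers run sequentially on one engine.
x = np.random.randn(8, 8).astype(np.float32)
y = run_network([np.ones((3, 3), np.float32)] * 2, x)
print(y.shape)  # (4, 4)
```

The design choice this models is area efficiency: one well-optimized processing element is time-multiplexed across all layers, trading latency for a much smaller footprint than instantiating per-layer hardware.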