2022
DOI: 10.1109/access.2022.3157893

FxP-QNet: A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs With Dynamic Fixed-Point Representation

Abstract: Deep neural networks (DNNs) have demonstrated their effectiveness in a wide range of computer vision tasks, with state-of-the-art results obtained through complex and deep structures that require intensive computation and memory. In the past, graphics processing units enabled these breakthroughs because of their greater computational speed. Nowadays, efficient model inference is crucial for consumer applications on resource-constrained platforms. As a result, there is much interest in the research and dev…

Cited by 7 publications (3 citation statements)
References 80 publications
“…For example, the values of FMs and weight parameters are originally represented as 32-bit floating-point numbers. However, it has been demonstrated that fewer bits can be used to represent these values without a noticeable accuracy drop [25]. This reduces the hardware requirements for CNN implementation and reduces the inference latency.…”
Section: Related Work (mentioning)
confidence: 99%
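The statement above refers to representing feature maps and weights with fewer bits than 32-bit floating point. The sketch below is a minimal illustration of that idea using a dynamic fixed-point scheme in the spirit of the paper's title; the helper name and the per-tensor fraction-length choice are assumptions, not the paper's actual implementation.

```python
import numpy as np

def quantize_fixed_point(x, n_bits=8):
    """Simulate signed fixed-point quantization of a float tensor with n_bits.

    The fractional length is chosen per tensor (dynamic fixed point) so that
    the largest magnitude still fits in the signed n_bits range.
    """
    max_val = np.max(np.abs(x)) + 1e-12
    int_bits = int(np.ceil(np.log2(max_val)))   # integer bits to cover the range
    frac_bits = n_bits - 1 - int_bits           # one bit reserved for the sign
    scale = 2.0 ** frac_bits
    q = np.clip(np.round(x * scale), -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    return q / scale                            # de-quantized value for simulation

# Example: 8-bit weights stay close to their 32-bit values.
w = np.random.randn(1000).astype(np.float32)
w_q = quantize_fixed_point(w, n_bits=8)
print("mean abs error:", np.mean(np.abs(w - w_q)))
```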
“…Several other quantization methods applied optimization algorithms on quantized NN models but without any hardware model guidance [7], [50]-[53]. For instance, Loss Aware Post-training quantization (LAPQ) is a layer-wise iterative optimization algorithm to calculate the optimum quantization step for clipping [7].…”
Section: Group C: Optimization Without Hardware Awareness (mentioning)
confidence: 99%
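The quoted description of LAPQ mentions a layer-wise search for the optimum quantization step used for clipping. The snippet below is a simplified stand-in for that idea, not LAPQ itself: it scans candidate clipping thresholds for a single layer and keeps the one with the lowest quantization MSE, whereas the real method iteratively optimizes a loss-aware objective.

```python
import numpy as np

def quantize_with_clip(x, clip, n_bits=4):
    """Uniform symmetric quantization of x after clipping to [-clip, clip]."""
    levels = 2 ** (n_bits - 1) - 1
    step = clip / levels
    x_c = np.clip(x, -clip, clip)
    return np.round(x_c / step) * step

def search_clip(x, n_bits=4, n_candidates=100):
    """Pick the clipping threshold minimizing quantization MSE for one layer."""
    max_abs = np.max(np.abs(x))
    best_clip, best_err = max_abs, np.inf
    for c in np.linspace(0.1 * max_abs, max_abs, n_candidates):
        err = np.mean((x - quantize_with_clip(x, c, n_bits)) ** 2)
        if err < best_err:
            best_clip, best_err = c, err
    return best_clip

w = np.random.randn(4096).astype(np.float32)
print("chosen clip:", search_clip(w, n_bits=4))
```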
“…This assumption simplifies the computation of the overall sensitivity for different quantization configurations in the search space. Shawahna et al. proposed post-training self-distillation and a network prediction-error function to search for the optimal mixed-precision configuration for neural networks [51]. The method progressively quantizes the model to maximize the compression rate under a predefined accuracy threshold.…”
Section: Group C: Optimization Without Hardware Awareness (mentioning)
confidence: 99%
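The quoted summary says the method progressively quantizes the model to maximize compression under a predefined accuracy threshold. The greedy sketch below only illustrates that control flow under assumed interfaces; `layers`, `evaluate`, and `acc_threshold` are hypothetical placeholders, not FxP-QNet's actual API or search strategy.

```python
def progressive_quantize(layers, evaluate, acc_threshold, bit_widths=(8, 6, 4, 2)):
    """Greedy sketch of progressive bit-width reduction under an accuracy budget.

    `layers` maps layer names to their current bit-width, `evaluate(config)`
    returns validation accuracy for a bit-width configuration, and
    `acc_threshold` is the minimum accuracy the quantized model must keep.
    """
    config = dict(layers)
    for bits in bit_widths:                       # move to lower precision in stages
        for name in config:
            trial = dict(config)
            trial[name] = bits
            if evaluate(trial) >= acc_threshold:  # keep the reduction only if accuracy holds
                config = trial
    return config

# Toy usage with a dummy evaluator that tolerates down to 4-bit layers.
layers = {"conv1": 32, "conv2": 32, "fc": 32}
dummy_eval = lambda cfg: 0.75 if min(cfg.values()) >= 4 else 0.60
print(progressive_quantize(layers, dummy_eval, acc_threshold=0.70))
```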