2022
DOI: 10.1109/access.2022.3157893

FxP-QNet: A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs With Dynamic Fixed-Point Representation

Abstract: Deep neural networks (DNNs) have demonstrated their effectiveness in a wide range of computer vision tasks, with state-of-the-art results obtained through complex and deep structures that require intensive computation and memory. In the past, graphics processing units enabled these breakthroughs because of their greater computational speed. Nowadays, efficient model inference is crucial for consumer applications on resource-constrained platforms. As a result, there is much interest in the research and dev…

Cited by 7 publications (3 citation statements)
References 80 publications
“…For example, the values of FMs and weight parameters are originally represented as 32-bit floating-point numbers. However, it has been demonstrated that fewer bits can be used to represent these values without a noticeable accuracy drop [25]. This reduces the hardware requirements for CNN implementation and reduces the inference latency.…”
Section: Related Work (mentioning)
confidence: 99%
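The statement above refers to representing feature maps and weights with fewer bits than 32-bit floating point. The sketch below is a minimal illustration of that idea using a dynamic fixed-point scheme in the spirit of the paper's title; the helper name and the per-tensor fraction-length choice are assumptions, not the paper's actual implementation.

```python
import numpy as np

def quantize_fixed_point(x, n_bits=8):
    """Simulate signed fixed-point quantization of a float tensor with n_bits.

    The fractional length is chosen per tensor (dynamic fixed point) so that
    the largest magnitude still fits in the signed n_bits range.
    """
    max_val = np.max(np.abs(x)) + 1e-12
    int_bits = int(np.ceil(np.log2(max_val)))   # integer bits to cover the range
    frac_bits = n_bits - 1 - int_bits           # one bit reserved for the sign
    scale = 2.0 ** frac_bits
    q = np.clip(np.round(x * scale), -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    return q / scale                            # de-quantized value for simulation

# Example: 8-bit weights stay close to their 32-bit values.
w = np.random.randn(1000).astype(np.float32)
w_q = quantize_fixed_point(w, n_bits=8)
print("mean abs error:", np.mean(np.abs(w - w_q)))
```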
“…Several other quantization methods applied optimization algorithms on quantized NN models but without any hardware model guidance [7], [50]-[53]. For instance, Loss Aware Post-training quantization (LAPQ) is a layer-wise iterative optimization algorithm to calculate the optimum quantization step for clipping [7].…”
Section: Group C: Optimization Without Hardware Awareness (mentioning)
confidence: 99%
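The quoted description of LAPQ mentions a layer-wise search for the optimum quantization step used for clipping. The snippet below is a simplified stand-in for that idea, not LAPQ itself: it scans candidate clipping thresholds for a single layer and keeps the one with the lowest quantization MSE, whereas the real method iteratively optimizes a loss-aware objective.

```python
import numpy as np

def quantize_with_clip(x, clip, n_bits=4):
    """Uniform symmetric quantization of x after clipping to [-clip, clip]."""
    levels = 2 ** (n_bits - 1) - 1
    step = clip / levels
    x_c = np.clip(x, -clip, clip)
    return np.round(x_c / step) * step

def search_clip(x, n_bits=4, n_candidates=100):
    """Pick the clipping threshold minimizing quantization MSE for one layer."""
    max_abs = np.max(np.abs(x))
    best_clip, best_err = max_abs, np.inf
    for c in np.linspace(0.1 * max_abs, max_abs, n_candidates):
        err = np.mean((x - quantize_with_clip(x, c, n_bits)) ** 2)
        if err < best_err:
            best_clip, best_err = c, err
    return best_clip

w = np.random.randn(4096).astype(np.float32)
print("chosen clip:", search_clip(w, n_bits=4))
```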
“…This assumption simplifies the computation of the overall sensitivity for different quantization configurations in the search space. Shawahna et al. proposed post-training self-distillation and a network prediction-error function to search for the optimal mixed-precision configuration for neural networks [51]. The method progressively quantizes the model to maximize the compression rate under a predefined accuracy threshold.…”
Section: Group C: Optimization Without Hardware Awareness (mentioning)
confidence: 99%
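The quoted summary says the method progressively quantizes the model to maximize compression under a predefined accuracy threshold. The greedy sketch below only illustrates that control flow under assumed interfaces; `layers`, `evaluate`, and `acc_threshold` are hypothetical placeholders, not FxP-QNet's actual API or search strategy.

```python
def progressive_quantize(layers, evaluate, acc_threshold, bit_widths=(8, 6, 4, 2)):
    """Greedy sketch of progressive bit-width reduction under an accuracy budget.

    `layers` maps layer names to their current bit-width, `evaluate(config)`
    returns validation accuracy for a bit-width configuration, and
    `acc_threshold` is the minimum accuracy the quantized model must keep.
    """
    config = dict(layers)
    for bits in bit_widths:                       # move to lower precision in stages
        for name in config:
            trial = dict(config)
            trial[name] = bits
            if evaluate(trial) >= acc_threshold:  # keep the reduction only if accuracy holds
                config = trial
    return config

# Toy usage with a dummy evaluator that tolerates down to 4-bit layers.
layers = {"conv1": 32, "conv2": 32, "fc": 32}
dummy_eval = lambda cfg: 0.75 if min(cfg.values()) >= 4 else 0.60
print(progressive_quantize(layers, dummy_eval, acc_threshold=0.70))
```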