2021
DOI: 10.48550/arxiv.2109.09113
Preprint

HPTQ: Hardware-Friendly Post Training Quantization

Abstract: Neural network quantization enables the deployment of models on edge devices. An essential requirement for their hardware efficiency is that the quantizers are hardware-friendly: uniform, symmetric and with power-of-two thresholds. To the best of our knowledge, current post-training quantization methods do not support all of these constraints simultaneously. In this work we introduce a hardware-friendly post training quantization (HPTQ) framework, which addresses this problem by synergistically combining several…
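The constraints named in the abstract (uniform, symmetric quantization with power-of-two thresholds) can be illustrated with a short sketch. The code below is a minimal, self-contained example of such a quantizer written for this summary; the function names, the 8-bit default, and the ceiling-to-power-of-two threshold choice are illustrative assumptions, not HPTQ's actual algorithm.

```python
import numpy as np

def po2_threshold(x):
    """Smallest power-of-two threshold covering the tensor's dynamic range."""
    max_abs = float(np.max(np.abs(x)))
    return 2.0 ** np.ceil(np.log2(max_abs)) if max_abs > 0 else 1.0

def quantize_symmetric(x, n_bits=8):
    """Uniform, symmetric quantization with a power-of-two threshold.

    Returns the integer tensor and the scale needed to dequantize it.
    """
    t = po2_threshold(x)                 # power-of-two threshold, e.g. 1, 2, 4, ...
    scale = t / (2 ** (n_bits - 1))      # step size of the uniform grid
    q = np.clip(np.round(x / scale), -2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    """Map integers back to the real-valued grid points."""
    return q.astype(np.float32) * scale

# Example: quantize a random weight tensor and measure the rounding error.
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_symmetric(w)
print("threshold:", s * 2 ** 7, "max abs error:", np.max(np.abs(w - dequantize(q, s))))
```

Because the threshold is a power of two, the scale is also a power of two (up to the bit-width factor), which lets hardware implement the rescaling with shifts rather than general multiplications.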


Cited by 3 publications (2 citation statements)
References 32 publications
“…2) Post-Training Quantization: The post-training quantization (PTQ) [75], [78], [82], [90] is a conversion technique in which all trained weights and activations of the NN model are converted to some fixed point representation, following some quantization precision established after the training phase. As indicated in Fig.…”
Section: Quantization (mentioning)
confidence: 99%
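The quoted description of PTQ, converting a model's already-trained weights and activations to fixed-point at a precision chosen after training, can be sketched as follows. This is a hypothetical, framework-agnostic illustration: the `trained_weights` dictionary and the per-tensor int8 conversion are assumptions made for the example, not the procedure of any cited work.

```python
import numpy as np

def to_fixed_point(w, n_bits=8):
    """Per-tensor symmetric conversion of a trained float tensor to integers plus a scale."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / (2 ** (n_bits - 1) - 1) if max_abs > 0 else 1.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

# Hypothetical trained model: layer name -> float32 weights (no retraining involved).
trained_weights = {
    "conv1": np.random.randn(3, 3, 3, 16).astype(np.float32),
    "fc":    np.random.randn(128, 10).astype(np.float32),
}

# Post-training quantization: every tensor is converted only after training finishes.
quantized = {name: to_fixed_point(w) for name, w in trained_weights.items()}
for name, (q, s) in quantized.items():
    print(f"{name}: int8 weights, scale={s:.5f}")
```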
“…However, it may become detrimental where quantization is a mandatory operation for final deployment. For example, many well-known architectures have quantization collapse issues like MobileNet (Howard et al 2017;Sandler et al 2018;Howard et al 2019) and EfficientNet (Tan and Le 2019), which calls for remedy designs or advanced quantization schemes like (Sheng et al 2018;Yun and Wong 2021) and (Bhalgat et al 2020;Habi et al 2021) respectively.…”
Section: Introduction (mentioning)
confidence: 99%