2021
DOI: 10.48550/arxiv.2103.07156
Preprint

Learnable Companding Quantization for Accurate Low-bit Neural Networks

Abstract: Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed, and is thus useful for implementation in resource-constrained devices. However, it is still hard for extremely low-bit models to achieve accuracy comparable with that of full-precision models. To address this issue, we propose learnable companding quantization (LCQ) as a novel nonuniform quantization method for 2-, 3-, and 4-bit models. LCQ jointly optimizes model weights and learnable companding…
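The abstract describes a compress/quantize/expand scheme in which the companding function itself is learned along with the weights. The sketch below illustrates only the general companding idea: it substitutes a fixed mu-law compander for the paper's learnable, jointly optimized companding functions, so the function names, the mu-law choice, and the [-1, 1] weight range are illustrative assumptions rather than the method from the paper.

```python
# Minimal sketch of companding quantization: compress -> uniform quantize -> expand.
# NOTE: the paper learns the companding function jointly with the model weights;
# this sketch uses a fixed mu-law compander purely as an illustrative stand-in.
import numpy as np

def mu_law_compress(x, mu=255.0):
    """Nonuniformly compress values in [-1, 1] (mu-law, illustrative only)."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=255.0):
    """Inverse of mu_law_compress."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

def companding_quantize(w, bits=2, mu=255.0):
    """Quantize weights in [-1, 1] on a nonuniform (companded) grid."""
    steps = 2 ** bits - 1                                   # number of quantization steps
    y = mu_law_compress(w, mu)                              # map to the companded domain
    y_q = np.round((y + 1) / 2 * steps) / steps * 2 - 1     # uniform quantization there
    return mu_law_expand(y_q, mu)                           # map back: nonuniform levels

if __name__ == "__main__":
    w = np.random.uniform(-1.0, 1.0, size=8)
    print("fp32:           ", np.round(w, 3))
    print("2-bit companded:", np.round(companding_quantize(w, bits=2), 3))
```

Because the compander is steeper near zero, the expanded quantization levels are denser where small weights concentrate, which is the intuition behind using a nonuniform grid at 2 to 4 bits.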

Cited by 2 publications (1 citation statement)
References 18 publications (38 reference statements)
“…AI-based solutions on mobile devices requires a careful adaptation of the neural network architecture to the restrictions of AI hardware in mobiles. Such optimizations can include network pruning (Guo et al., 2021), low-bit quantization (Esser et al., 2020; Yamamoto, 2021; Young et al., 2021) or platform-aware neural architecture search (Wu et al., 2019; Kim et al., 2022). Specifically, low-bit quantization allows reducing the precision needed to represent neurons' activations and network parameters, i.e., weights and biases, thus reducing the computation and memory requirements while minimizing the accuracy loss compared to full-precision Artificial Neural Networks (ANNs).…”
Classification: mentioning (confidence: 99%)
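To make the citing passage concrete, here is a minimal sketch of the kind of uniform low-bit quantization it refers to: weights (or activations) are mapped to a small signed integer grid with a per-tensor scale, shrinking storage from 32 bits to a few bits per value. The symmetric per-tensor scheme and all names below are illustrative assumptions, not the specific method of any cited work.

```python
# Hedged sketch: symmetric per-tensor uniform quantization of a weight tensor.
import numpy as np

def quantize_uniform(x, bits=4):
    """Quantize floats to signed `bits`-bit integers with a per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1             # e.g. 7 for 4-bit signed values
    scale = np.max(np.abs(x)) / qmax       # per-tensor scale factor (assumes x != 0)
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float values."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_uniform(w, bits=4)
    err = np.abs(w - dequantize(q, s)).mean()
    print(f"mean abs quantization error: {err:.4f}")
    print(f"storage: {w.nbytes} B (fp32) -> {q.size * 4 // 8} B if packed at 4 bits")
```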