2019
DOI: 10.48550/arxiv.1912.10207
Preprint

Towards Efficient Training for Neural Network Quantization

Cited by 15 publications (31 citation statements)
References 25 publications
“…For the initialization of quantization parameters, such as quantization scales and bit-widths, we set the initial bit-widths to N + 1 when the complexity constraint is N-bit, except that the patch embedding (first) layer and the classification (last) layer are 8-bit. Note that, unlike previous works [21,40], the quantization parameters of the first and last layers are optimized and are not fixed during training. We initialize all the scales in the switchable scale vectors using a typical MSE-based approach.…”
Section: Implementation Details
confidence: 99%
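As an aside on the MSE-based scale initialization mentioned in this statement, below is a minimal sketch of one common way such an initialization can be done: grid-searching for the scale that minimizes the mean-squared quantization error under a symmetric uniform quantizer. This is an illustrative sketch under those assumptions; the function and parameter names (mse_init_scale, num_candidates) are hypothetical and not taken from the cited paper.

```python
import numpy as np

def mse_init_scale(weights, n_bits, num_candidates=100):
    """Pick a quantization scale by minimizing the MSE between the
    full-precision tensor and its symmetric uniform quantization.
    Illustrative sketch only; not the cited paper's implementation."""
    w = np.asarray(weights).flatten()
    max_abs = max(np.abs(w).max(), 1e-8)
    qmax = 2 ** (n_bits - 1) - 1  # e.g. 127 for 8-bit signed

    best_scale, best_mse = max_abs / qmax, np.inf
    # Grid-search candidate scales below the max-abs baseline.
    for frac in np.linspace(0.1, 1.0, num_candidates):
        scale = frac * max_abs / qmax
        w_q = np.clip(np.round(w / scale), -qmax - 1, qmax) * scale
        mse = np.mean((w - w_q) ** 2)
        if mse < best_mse:
            best_scale, best_mse = scale, mse
    return best_scale
```

In such a scheme, a smaller scale trades clipping error for finer resolution on small weights, and the MSE search picks the balance point per tensor before training begins.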
“…For a further understanding of how Q-ViT works and the mechanism behind its performance improvements, we visualize the learned bit-width allocation of the different components in ViT. First & last layers: as mentioned before, unlike previous work [21,40], we enable bit-width learning for the patch embedding (first) layer and the classification (last) layer in ViT. The standard practice in quantization is to allocate a high bit-width to the first and last layers of a deep neural network, e.g.…”
Section: Learned Bit-width Allocation
confidence: 99%
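The "bit-width learning" referenced in this statement can be sketched as a continuous per-layer parameter that is rounded in the forward pass, with a straight-through estimator carrying gradients back to it. This is a hedged illustration under those assumptions, not the Q-ViT implementation; the class and argument names (LearnableBitWidth, init_bits, min_bits, max_bits) are hypothetical.

```python
import torch
import torch.nn as nn

class LearnableBitWidth(nn.Module):
    """Per-layer learnable bit-width (illustrative sketch, not Q-ViT's code).
    The continuous parameter is clamped to a valid range and rounded in the
    forward pass; a straight-through estimator keeps it trainable."""
    def __init__(self, init_bits=5.0, min_bits=2, max_bits=8):
        super().__init__()
        self.bits = nn.Parameter(torch.tensor(float(init_bits)))
        self.min_bits, self.max_bits = min_bits, max_bits

    def forward(self):
        b = self.bits.clamp(self.min_bits, self.max_bits)
        # Straight-through rounding: integer value forward, identity gradient backward.
        return b + (torch.round(b) - b).detach()
```

In a setup like this, the rounded value would feed the layer's quantizer, and a complexity penalty on the bit-widths would steer the allocation toward the target budget during training.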
“…Fig. 4 depicts the evolution of the bit-width of each layer when quantizing a 4-bit ResNet18 using DDQ with gradient calibration (Jain et al., 2019; Esser et al., 2020; Jin et al., 2019; Bhalgat et al., 2020).…”
Section: Evaluation on ImageNet
confidence: 99%
“…To accelerate inference and save storage space for huge models without sacrificing performance, previous works propose to compress models with techniques including weight pruning [24], channel slimming [43,44], layer skipping [4,73], patterned or block pruning [17,35,40,42,49,50,51,52,56,57,82,84], and network quantization [12,18,30,31,32,38,75]. Specifically, these studies elaborate on compressing discriminative models for image classification, detection, or segmentation tasks.…”
Section: Introduction
confidence: 99%