ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9747552

Integer-Only Zero-Shot Quantization for Efficient Speech Recognition

Cited by 25 publications (25 citation statements) | References 16 publications

“…Lastly, a common application is to finetune (similar to training) BERT models to particular datasets. This not only decreases the model footprint and increases inference speed but adjusts the model to new data [2,31,73,53,74].…”
Section: Small Scale Low-precision Training
Mentioning confidence: 99%
“…It creates extreme asymmetric and unbalanced distributions by converting to the exponent. Therefore, many methods are devoted to designing specific quantizers for the quantization of Softmax output to maximize the information, such as Segmental quantizers [12,47], Logarithmic quantizers [22,28] or apply sparsification before quantization [20]. As shown in Figure 5, the Logarithmic quantizer has the largest 3.82 mutual information.…”
Section: Matthew-effect Preserving Quantization
Mentioning confidence: 99%
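
The logarithmic quantizer mentioned in the quote above maps each softmax probability to a power-of-two level, which suits the heavily skewed distribution of softmax outputs. As a rough illustration only (the 4-bit width, function names, and clipping policy are assumptions of this sketch, not details taken from the cited works), a minimal NumPy version could look like:

```python
import numpy as np

def log2_quantize(probs, num_bits=4):
    # Map each probability p to an integer exponent k with p ≈ 2**(-k).
    # The 4-bit default is an illustrative choice, not a value from the cited papers.
    max_exp = 2 ** num_bits - 1                   # largest representable exponent
    eps = 2.0 ** (-max_exp)                       # floor to avoid log2(0)
    k = np.round(-np.log2(np.clip(probs, eps, 1.0)))
    return np.clip(k, 0, max_exp).astype(np.int32)

def log2_dequantize(codes):
    # Recover approximate probabilities from the stored exponents.
    return 2.0 ** (-codes.astype(np.float64))

logits = np.array([3.0, 1.0, 0.2, -1.5])
probs = np.exp(logits) / np.exp(logits).sum()
codes = log2_quantize(probs)
print(codes)                   # [0 3 4 7]: small exponents for large probabilities
print(log2_dequantize(codes))  # power-of-two approximations of probs
```

Because the levels are spaced logarithmically, small probabilities retain relative resolution that a uniform quantizer of the same bit-width would collapse, which is the intuition behind the mutual-information comparison in the quote.
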
“…Existing quantization methods can be post-training quantization (PTQ) or in-training / quantization aware training (QAT). PTQ is applied after the model training is complete by compressing models into 8-bit representations and is relatively well supported by various libraries [3,4,5,6,7,8], such as TensorFlow Lite [9] and AIMET [10] for on-device deployment. However, almost no existing PTQ supports customized quantization configurations to compress machine learning (ML) layers and kernels into sub-8-bit (S8B) regimes [11].…”
Section: Introduction
Mentioning confidence: 99%
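
As a rough sketch of the 8-bit PTQ flow such libraries expose (the SavedModel path, input shape, and random calibration data below are placeholders, and this is not the configuration used in the paper under discussion), TensorFlow Lite's converter can compress a trained model into a full-integer 8-bit representation:

```python
import numpy as np
import tensorflow as tf

# Placeholder: path to an already-trained SavedModel (illustrative, not from the paper).
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")

def representative_dataset():
    # Calibration samples let the converter estimate activation ranges; random
    # data stands in for real inputs, and (1, 16000) is a placeholder shape
    # (e.g. one second of 16 kHz audio).
    for _ in range(100):
        yield [np.random.rand(1, 16000).astype(np.float32)]

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict conversion to built-in int8 kernels so weights, activations,
# and the model I/O all end up as 8-bit integers.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

As the quote notes, interfaces like this generally stop at 8 bits; customized sub-8-bit (S8B) configurations for individual layers and kernels require dedicated quantization schemes.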