2021
DOI: 10.48550/arxiv.2101.01321
Preprint

I-BERT: Integer-only BERT Quantization

Abstract: Transformer-based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for many edge processors, and it has been a challenge to deploy these models for edge applications and devices that have resource constraints. While quantization can be a viable solution to this, previous work on quantizing Transformer-based models uses floating-point arithmetic during inference…

Cited by 7 publications (17 citation statements)
References 46 publications
“…Furthermore, calculating these statistics requires floating point operations that would prevent us from doing integer-only quantization. Therefore, in this work, we only use static quantization where we pre-compute the clipping ranges and fix them during inference as in [17,18,31]. It is straightforward to pre-compute the ranges for weights as they are fixed during inference.…”
Section: A. Basic Quantization Methods
confidence: 99%
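As a concrete illustration of the static quantization described in this statement, here is a minimal sketch assuming asymmetric 8-bit uniform quantization; the clipping values, function names, and calibration step are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def compute_scale_zero_point(clip_min, clip_max, num_bits=8):
    """Derive a uniform-quantization scale and zero point from a
    pre-computed (static) clipping range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (clip_max - clip_min) / (qmax - qmin)
    zero_point = int(round(qmin - clip_min / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits=8):
    """Map float values into the fixed integer range; because the range
    is frozen after calibration, no statistics are computed at inference."""
    qmin, qmax = 0, 2 ** num_bits - 1
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

# Hypothetical clipping range, calibrated offline and then fixed.
scale, zp = compute_scale_zero_point(-2.5, 2.5)
x = np.array([-3.0, -1.2, 0.0, 0.7, 2.9])
print(quantize(x, scale, zp))  # out-of-range values saturate at 0 / 255
```

Because the range is frozen after calibration, inputs falling outside it simply saturate at the integer bounds; that is the trade-off static quantization accepts in order to avoid computing range statistics, and hence floating-point operations, at inference time.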
“…Integer-only quantization [17,18,31] not only represents the model weights and activations with low-precision integer values, but it also carries out the entire inference with integer arithmetic. Broadly speaking, the core of integer-only quantization is the linear property of the operations.…”
Section: B. Integer-only Quantization
confidence: 99%
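The linearity argument can be made concrete: if x ≈ s_x·q_x and W ≈ s_w·q_w, then x·W ≈ (s_x·s_w)·(q_x·q_w), so the matrix multiply itself runs entirely on integers, and the combined scale factor can be folded into a fixed-point multiply-and-shift. The sketch below assumes symmetric int8 quantization (zero points of zero) and a dyadic rescaling factor; the function names, scales, and shift width are hypothetical, not taken from the paper.

```python
import numpy as np

def int_only_matmul(q_x, q_w, s_x, s_w, s_y):
    """Integer-only linear layer: y = (s_x*q_x) @ (s_w*q_w)
    = (s_x*s_w) * (q_x @ q_w), so accumulation runs in int32 and only
    a fixed rescaling maps the accumulator to the output scale."""
    acc = q_x.astype(np.int32) @ q_w.astype(np.int32)  # integer accumulation
    # Fold s_x*s_w/s_y into an integer multiplier and a bit shift, so no
    # floating-point operation is needed at inference time.
    shift = 24
    multiplier = int(round(s_x * s_w / s_y * (1 << shift)))
    q_y = (acc.astype(np.int64) * multiplier) >> shift
    return np.clip(q_y, -128, 127).astype(np.int8)

# Hypothetical symmetric int8 tensors and scales.
rng = np.random.default_rng(0)
q_x = rng.integers(-128, 128, size=(2, 4), dtype=np.int8)
q_w = rng.integers(-128, 128, size=(4, 3), dtype=np.int8)
print(int_only_matmul(q_x, q_w, s_x=0.02, s_w=0.01, s_y=0.05))
```

The multiplier is computed once, offline, from the known scales; at inference only the integer matmul, one integer multiply, and one shift remain, which is what allows such kernels to run on integer-only hardware.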