2020
DOI: 10.48550/arxiv.2012.15701
Preprint

BinaryBERT: Pushing the Limit of BERT Quantization

Abstract: The rapid development of large pre-trained language models has greatly increased the demand for model compression techniques, among which quantization is a popular solution. In this paper, we propose BinaryBERT, which pushes BERT quantization to the limit with weight binarization. We find that a binary BERT is harder to train directly than a ternary counterpart due to its complex and irregular loss landscape. Therefore, we propose ternary weight splitting, which initializes the binary model by equivalent s…
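The splitting idea behind the abstract can be illustrated with a small sketch. This is a simplification under assumed notation, not the paper's exact formulation: a ternary weight taking values in {-alpha, 0, +alpha} can be written as the sum of two binary weights whose entries are +/- alpha/2, so the binary model is initialized to be numerically equivalent to the trained ternary one.

```python
# Illustrative sketch of ternary weight splitting (a simplification, not the
# paper's exact formulation): a ternary tensor with values in {-alpha, 0, +alpha}
# is rewritten as the sum of two binary tensors whose entries are +/- alpha/2.
import numpy as np

def split_ternary(w_ternary: np.ndarray, alpha: float):
    """Split a ternary tensor into two binary tensors with b1 + b2 == w_ternary."""
    half = alpha / 2.0
    b1 = np.where(w_ternary >= 0, half, -half)  # +alpha -> +h, 0 -> +h, -alpha -> -h
    b2 = np.where(w_ternary > 0, half, -half)   # +alpha -> +h, 0 -> -h, -alpha -> -h
    return b1, b2

alpha = 0.05
w_t = alpha * np.random.choice([-1, 0, 1], size=(4, 4))
b1, b2 = split_ternary(w_t, alpha)
assert np.allclose(b1 + b2, w_t)  # the split reproduces the ternary weights exactly
```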


Cited by 27 publications (33 citation statements). References: 34 publications.
“…Since the BERT era (Vaswani et al, 2017), quantization has been extensively studied to reduce the memory and computation complexity of transformer architectures (Prato et al, 2019; Zafrir et al, 2019; Shen et al, 2020; Zhang et al, 2020; Bai et al, 2020). In detail, (Zafrir et al, 2019) fine-tunes BERT with 8-bit quantization-aware training, and successfully compresses BERT with minimal accuracy loss.…”
Section: Transformer Quantization (mentioning; confidence: 99%)
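The 8-bit quantization-aware training mentioned in this statement can be sketched roughly as symmetric "fake quantization": weights are rounded to an int8 grid and dequantized in the forward pass, while gradients typically flow through the rounding via a straight-through estimator. The per-tensor scale and clipping range below are illustrative assumptions, not necessarily the exact scheme of Zafrir et al (2019).

```python
# Rough sketch of symmetric 8-bit "fake quantization" as used in
# quantization-aware training; scale and clipping choices are assumptions.
import numpy as np

def fake_quantize_int8(x: np.ndarray) -> np.ndarray:
    """Quantize x to 8-bit levels and dequantize back to float."""
    scale = np.max(np.abs(x)) / 127.0 + 1e-12    # symmetric per-tensor scale
    q = np.clip(np.round(x / scale), -127, 127)  # integer grid
    return q * scale                             # dequantized values

w = np.random.randn(3, 3).astype(np.float32)
w_q = fake_quantize_int8(w)
print(np.max(np.abs(w - w_q)))  # elementwise error is bounded by about scale / 2
```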
“…The later TernaryBERT (Zhang et al, 2020) proposes approximation-based and loss-aware ternarization to ternarize the weights in BERT, and uses distillation to further reduce the accuracy drop caused by the lower capacity. BinaryBERT (Bai et al, 2020) suggests that it is difficult to train a binary BERT directly due to its complex loss landscape, and proposes a ternary weight splitting strategy so that the binary BERT inherits the good performance of the ternary one. However, all of them are designed for NLP, not for computer vision tasks.…”
Section: Transformer Quantization (mentioning; confidence: 99%)
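The approximation-based ternarization referenced in this statement can be sketched in the style of Ternary Weight Networks: threshold the weight magnitudes and fit a single scaling factor. The 0.7 threshold factor and per-layer granularity here are assumptions for illustration, not necessarily TernaryBERT's exact settings.

```python
# Sketch of approximation-based ternarization in the style of Ternary Weight
# Networks; threshold factor and granularity are illustrative assumptions.
import numpy as np

def ternarize(w: np.ndarray):
    """Approximate w by alpha * t with t in {-1, 0, +1}."""
    delta = 0.7 * np.mean(np.abs(w))                       # magnitude threshold
    mask = np.abs(w) > delta                               # entries kept as +/-1
    t = np.sign(w) * mask
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0  # scaling factor
    return alpha, t

w = np.random.randn(4, 4)
alpha, t = ternarize(w)
w_ternary = alpha * t  # ternary approximation of the full-precision weights
```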