2022
DOI: 10.48550/arxiv.2205.13016
Preprint

BiT: Robustly Binarized Multi-distilled Transformer

Abstract: Modern pre-trained transformers have rapidly advanced the state-of-the-art in machine learning, but have also grown in parameters and computational complexity, making them increasingly difficult to deploy in resource-constrained environments. Binarization of the weights and activations of the network can significantly alleviate these issues; however, it is technically challenging from an optimization perspective. In this work, we identify a series of improvements which enables binary transformers at a much higher …
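As a rough illustration of the kind of weight binarization the abstract refers to (not the paper's exact scheme), a minimal PyTorch sketch of a sign-binarized linear layer with a straight-through estimator might look as follows; the module name, per-channel scaling, and initialization are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class BinaryLinear(nn.Module):
    """Linear layer with sign-binarized weights (illustrative sketch, not the BiT scheme)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Per-output-channel scale: mean absolute value of the real-valued weights.
        scale = self.weight.abs().mean(dim=1, keepdim=True)
        w_bin = torch.sign(self.weight) * scale
        # Straight-through estimator: forward uses the binarized weights,
        # backward passes gradients to the real-valued weights unchanged.
        w_ste = self.weight + (w_bin - self.weight).detach()
        return nn.functional.linear(x, w_ste, self.bias)
```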

Cited by 1 publication (2 citation statements)
References 24 publications
“…Binary Distilled Transformer [55] A binarized multi-distilled transformer including a two-set binarization scheme, an elastic binary activation function with learned parameters, and a method for successively distilling models.…”
Section: SLQ Training [52]
Mentioning confidence: 99%
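The "elastic binary activation function with learned parameters" mentioned in the citation above could be sketched roughly as below; the exact BiT parameterization may differ, and the learnable scale alpha and threshold beta here are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ElasticBinaryActivation(nn.Module):
    """Binarize activations to {0, alpha} with a learned scale and threshold.
    Illustrative sketch only; not necessarily the exact BiT formulation."""

    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(1.0))  # learned scale
        self.beta = nn.Parameter(torch.tensor(0.0))   # learned threshold

    def forward(self, x):
        # Hard binarization in the forward pass ...
        x_hard = self.alpha * (x > self.beta).float()
        # ... and a clipped soft surrogate in the backward pass, so gradients
        # flow to x, alpha, and beta (straight-through estimator).
        denom = self.alpha.abs().clamp(min=1e-6)
        x_soft = self.alpha * torch.clamp((x - self.beta) / denom + 0.5, 0.0, 1.0)
        return x_soft + (x_hard - x_soft).detach()
```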
“…The choice between the two modelling approaches depends on the similarity among the input data and the task relations. A number of MTL works are surveyed and compared in [55,235,237], giving an overview of the literature and recent advances. One important research challenge of MTL lies in multi-task modelling that accounts for task and data relations when sharing parameter structure.…”
Section: Multi-task Learning
Mentioning confidence: 99%