Findings of the Association for Computational Linguistics: EMNLP 2022
DOI: 10.18653/v1/2022.findings-emnlp.101
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers

Cited by 4 publications (2 citation statements); references 0 publications.
“…Following recent research (Hassid et al. 2022), we first replace the dynamic self-attention matrix A_n with a constant attention matrix C_n ∈ R^{N_h × N_t × N_t}. We initialize C_n with the average of A_n over the train set, i.e., …”
Section: The Merge Module (MM)
Mentioning confidence: 99%
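To make the quoted construction concrete, here is a minimal PyTorch sketch of replacing dynamic self-attention with a constant attention matrix C_n initialized as the train-set average of the dynamic maps A_n. The names (ConstantAttention, average_attention) and the layer layout are illustrative assumptions, not code from Hassid et al. (2022) or the citing paper.

```python
# Hedged sketch (assumed, not from either paper): self-attention whose
# weights are a fixed matrix C instead of softmax(QK^T / sqrt(d)).
import torch
import torch.nn as nn


class ConstantAttention(nn.Module):
    """Attention block using a constant weight matrix C of shape
    (n_heads, seq_len, seq_len) in place of input-dependent attention."""

    def __init__(self, d_model: int, n_heads: int, seq_len: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # C is a buffer (not trained here); start uniform, then overwrite
        # it with the train-set average of the dynamic attention maps A_n.
        self.register_buffer(
            "C", torch.full((n_heads, seq_len, seq_len), 1.0 / seq_len)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        v = self.v_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # Constant attention: the same mixing weights for every input.
        out = torch.einsum("hqk,bhkd->bhqd", self.C[:, :t, :t], v)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out)


@torch.no_grad()
def average_attention(attn_maps: list[torch.Tensor]) -> torch.Tensor:
    """Initialize C_n as the mean of dynamic attention maps A_n collected
    over the training set, each of shape (n_heads, T, T)."""
    return torch.stack(attn_maps, dim=0).mean(dim=0)
```

After collecting the per-layer maps A_n on the training data, one would copy `average_attention(...)` into the `C` buffer of the corresponding layer.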
“…Embedding resending helps to bypass the embedding-table query operation and decouples the computation between forward representation learning and next-token sampling. Besides, following recent research (Hassid et al. 2022) on the attention mechanism, we approximate self-attention with constant attention matrices and merge tensor computations in the Transformer module before inference. Nevertheless, these two strategies are challenging because: 1) PLMs are usually sensitive to input embeddings, while there are some unavoidable errors in the generated embeddings; 2) constant attention in our merge module might hurt the performance of PLMs.…”
Section: Introduction
Mentioning confidence: 99%
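The second strategy the quote mentions, merging tensor computations before inference, follows from the attention weights being constant: the whole attention block becomes a fixed linear map, so the value and output projections can be pre-folded offline. The sketch below is an assumed illustration of that idea with weights applied as V = X @ w_v and Y = O @ w_o; the exact merge used in the cited work may differ.

```python
# Hedged sketch: with constant attention C_h, the block output is
#   Y = sum_h C_h @ X @ (W_v[:, h-block] @ W_o[h-block, :]),
# so value and output projections can be merged per head before inference.
import torch


@torch.no_grad()
def merge_value_output(w_v: torch.Tensor, w_o: torch.Tensor,
                       n_heads: int) -> torch.Tensor:
    """Fold the value and output projections into one (d_model, d_model)
    matrix per head. Assumes V = X @ w_v and Y = concat_heads(O_h) @ w_o."""
    d_model = w_v.shape[0]
    d_head = d_model // n_heads
    w_v_heads = w_v.view(d_model, n_heads, d_head)   # columns grouped by head
    w_o_heads = w_o.view(n_heads, d_head, d_model)   # rows grouped by head
    # W_h = w_v[:, h-block] @ w_o[h-block, :]
    return torch.einsum("dhe,hef->hdf", w_v_heads, w_o_heads)


def constant_attention_block(x: torch.Tensor, C: torch.Tensor,
                             w_merged: torch.Tensor) -> torch.Tensor:
    """x: (batch, T, d_model), C: (n_heads, T, T),
    w_merged: (n_heads, d_model, d_model). One fused matmul at inference."""
    return torch.einsum("hqk,bkd,hdf->bqf", C, x, w_merged)
```

The merged form removes the per-token query/key projections and the softmax entirely, which is what makes pre-inference fusion of the remaining tensor computations possible.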