2021
DOI: 10.48550/arxiv.2112.05359
Preprint

Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences

Abstract: Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules. To address this limitation, Linformer and Informer were proposed to reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively. These two models are intrinsically connected, and to understand their connection, we introduce a theoretical framework of matrix sketching. Based on the theoretical …
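
To make the abstract's unifying claim concrete, here is a minimal numpy sketch in which both approaches become choices of a single sketching matrix S applied to the keys and values: a dense projection gives Linformer-style attention, a 0/1 row-selection matrix gives Informer-style attention. Everything in the snippet (dimensions, the Gaussian projection, the uniform row sampling, and all names) is an illustrative assumption based on the abstract's framing, not the paper's actual construction.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sketched_attention(Q, K, V, S):
    """Self-attention with a sketching matrix S applied to keys/values.

    S is (k x n), acting along the sequence axis:
      - dense random/learned projection -> Linformer-style
      - 0/1 row-selection matrix        -> Informer-style
    Cost drops from O(n^2 d) to O(n k d) when k << n.
    """
    K_s, V_s = S @ K, S @ V                          # (k, d) sketched keys/values
    A = softmax(Q @ K_s.T / np.sqrt(Q.shape[-1]))    # (n, k) attention weights
    return A @ V_s                                   # (n, d) output

n, d, k = 512, 64, 32
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

# Linformer-style sketch: dense Gaussian projection of the sequence axis
# (Linformer learns this projection; a Gaussian matrix is a stand-in here).
S_proj = rng.standard_normal((k, n)) / np.sqrt(k)

# Informer-style sketch: keep k of the n rows. Informer chooses them with a
# sparsity score; uniform sampling here is purely illustrative.
rows = rng.choice(n, size=k, replace=False)
S_rows = np.zeros((k, n))
S_rows[np.arange(k), rows] = 1.0

out_proj = sketched_attention(Q, K, V, S_proj)   # (512, 64)
out_rows = sketched_attention(Q, K, V, S_rows)   # (512, 64)
```

Both variants touch only k sketched rows instead of all n, which is the linear-in-sequence-length complexity (modulo logarithmic factors) that the abstract refers to.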

Cited by 0 publications
References 19 publications