2022
DOI: 10.48550/arxiv.2207.13955
Preprint

Neural Architecture Search on Efficient Transformers and Beyond

Abstract: Recently, numerous efficient Transformers have been proposed to reduce the quadratic computational complexity of standard Transformers caused by the Softmax attention. However, most of them simply swap Softmax with an efficient attention mechanism without considering the customized architectures specially for the efficient attention. In this paper, we argue that the handcrafted vanilla Transformer architectures for Softmax attention may not be suitable for efficient Transformers. To address this issue, we prop…
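The quadratic complexity the abstract refers to comes from materializing the full sequence-by-sequence score matrix in Softmax attention; the efficient attention mechanisms that replace it typically avoid this, for example by kernelizing the similarity and reordering the matrix products. The PyTorch sketch below contrasts the two. It is a minimal illustration under assumed shapes and an assumed elu+1 feature map, not the searched architectures from the paper.

import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard attention: the (seq_len x seq_len) score matrix makes
    # time and memory quadratic in sequence length.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Kernel-based variant (illustrative): apply a positive feature map
    # phi to q and k, then contract k with v first so the intermediate
    # is (dim x dim) and the cost is linear in sequence length.
    phi = lambda x: F.elu(x) + 1.0
    q, k = phi(q), phi(k)
    kv = k.transpose(-2, -1) @ v                                # (dim, dim)
    z = q @ k.sum(dim=1, keepdim=True).transpose(-2, -1) + eps  # normalizer
    return (q @ kv) / z

q, k, v = (torch.randn(2, 128, 64) for _ in range(3))
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)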

Cited by 6 publications (6 citation statements) | References 25 publications
“…In [160], the authors surveyed several NAS techniques for ViTs. To the best of our knowledge, there are limited studies on the NAS exploration in ViTs [161][162][163][164][165][166], and more attention is needed in the future. The NAS exploration for ViTs may be a new direction for young investigators in the future.…”
Section: Neural Architecture Search (NAS)
Mentioning (confidence: 99%)
“…Efficient Transformers. The concept of efficient Transformers was originally introduced in NLP, aiming to reduce the quadratic time and space complexity caused by the Transformer attention [41,75,37,38,87,51]. The mainstream methods use either patterns or kernels [76].…”
Section: Related Work
Mentioning (confidence: 99%)
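The "patterns" family mentioned in the Related Work snippet above keeps the Softmax but restricts each token to a fixed sparse neighborhood. Below is a rough sketch of a sliding-window (banded) pattern in PyTorch; the window size is an arbitrary assumption, and for clarity the dense score matrix is still built here, whereas practical implementations use banded or blocked kernels to realize the savings.

import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window=8):
    # Banded pattern: token i may only attend to tokens j with |i - j| <= window.
    n = q.shape[-2]
    idx = torch.arange(n)
    band = (idx[:, None] - idx[None, :]).abs() <= window
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~band, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 32, 16)
print(sliding_window_attention(q, k, v).shape)  # torch.Size([1, 32, 16])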
“…FBNet uses a proxy task, i.e., optimizing over a smaller dataset to evaluate candidate architectures. These architecture search techniques are compatible with both convolutional and transformer-based architectures [28].…”
Section: Architecture Search and Design
Mentioning (confidence: 99%)
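The proxy-task idea the snippet above attributes to FBNet, ranking candidate architectures by a cheap evaluation on a reduced dataset, can be sketched roughly as follows. The search space (hidden widths of a toy MLP), the synthetic data, and the training budget are hypothetical placeholders, not FBNet's actual search space or schedule.

import torch
from torch import nn
from torch.utils.data import DataLoader, Subset, TensorDataset

def build_candidate(width):
    # Toy search space: a two-layer MLP whose hidden width is the searched choice.
    return nn.Sequential(nn.Linear(16, width), nn.ReLU(), nn.Linear(width, 2))

def proxy_score(model, loader, steps=20):
    # Train for only a few steps on the small proxy subset and return the last
    # training loss as a cheap (noisy) fitness estimate; lower is better.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    batches = iter(loader)
    loss = torch.tensor(float("inf"))
    for _ in range(steps):
        try:
            x, y = next(batches)
        except StopIteration:
            batches = iter(loader)
            x, y = next(batches)
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# Synthetic stand-in for "optimizing over a smaller dataset": a 128-sample proxy subset.
full = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))
proxy_loader = DataLoader(Subset(full, range(128)), batch_size=32, shuffle=True)

scores = {w: proxy_score(build_candidate(w), proxy_loader) for w in (8, 32, 128)}
print(scores)  # the width with the lowest proxy loss would be selected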