2022
DOI: 10.48550/arxiv.2207.13955
Preprint

Neural Architecture Search on Efficient Transformers and Beyond

Abstract: Recently, numerous efficient Transformers have been proposed to reduce the quadratic computational complexity of standard Transformers caused by the Softmax attention. However, most of them simply swap Softmax with an efficient attention mechanism without considering the customized architectures specially for the efficient attention. In this paper, we argue that the handcrafted vanilla Transformer architectures for Softmax attention may not be suitable for efficient Transformers. To address this issue, we prop…
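The quadratic complexity the abstract refers to comes from materializing the full sequence-by-sequence score matrix in Softmax attention; the efficient attention mechanisms that replace it typically avoid this, for example by kernelizing the similarity and reordering the matrix products. The PyTorch sketch below contrasts the two. It is a minimal illustration under assumed shapes and an assumed elu+1 feature map, not the searched architectures from the paper.

import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard attention: the (seq_len x seq_len) score matrix makes
    # time and memory quadratic in sequence length.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Kernel-based variant (illustrative): apply a positive feature map
    # phi to q and k, then contract k with v first so the intermediate
    # is (dim x dim) and the cost is linear in sequence length.
    phi = lambda x: F.elu(x) + 1.0
    q, k = phi(q), phi(k)
    kv = k.transpose(-2, -1) @ v                                # (dim, dim)
    z = q @ k.sum(dim=1, keepdim=True).transpose(-2, -1) + eps  # normalizer
    return (q @ kv) / z

q, k, v = (torch.randn(2, 128, 64) for _ in range(3))
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)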

Cited by 6 publications (6 citation statements) | References 25 publications
“…In [160], the authors surveyed several NAS techniques for ViTs. To the best of our knowledge, there are limited studies on the NAS exploration in ViTs [161][162][163][164][165][166], and more attention is needed in the future. The NAS exploration for ViTs may be a new direction for young investigators in the future.…”
Section: Neural Architecture Search (NAS)
Mentioning (confidence: 99%)
“…Efficient Transformers. The concept of efficient Transformers was originally introduced in NLP, aiming to reduce the quadratic time and space complexity caused by the Transformer attention [41,75,37,38,87,51]. The mainstream methods use either patterns or kernels [76].…”
Section: Related Work
Mentioning (confidence: 99%)
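The "patterns" family mentioned in the Related Work snippet above keeps the Softmax but restricts each token to a fixed sparse neighborhood. Below is a rough sketch of a sliding-window (banded) pattern in PyTorch; the window size is an arbitrary assumption, and for clarity the dense score matrix is still built here, whereas practical implementations use banded or blocked kernels to realize the savings.

import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window=8):
    # Banded pattern: token i may only attend to tokens j with |i - j| <= window.
    n = q.shape[-2]
    idx = torch.arange(n)
    band = (idx[:, None] - idx[None, :]).abs() <= window
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~band, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 32, 16)
print(sliding_window_attention(q, k, v).shape)  # torch.Size([1, 32, 16])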
“…FBNet uses a proxy task, i.e., optimizing over a smaller dataset to evaluate candidate architectures. These architecture search techniques are compatible with both convolutional and transformer-based architectures [28].…”
Section: Architecture Search and Design
Mentioning (confidence: 99%)
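The proxy-task idea the snippet above attributes to FBNet, ranking candidate architectures by a cheap evaluation on a reduced dataset, can be sketched roughly as follows. The search space (hidden widths of a toy MLP), the synthetic data, and the training budget are hypothetical placeholders, not FBNet's actual search space or schedule.

import torch
from torch import nn
from torch.utils.data import DataLoader, Subset, TensorDataset

def build_candidate(width):
    # Toy search space: a two-layer MLP whose hidden width is the searched choice.
    return nn.Sequential(nn.Linear(16, width), nn.ReLU(), nn.Linear(width, 2))

def proxy_score(model, loader, steps=20):
    # Train for only a few steps on the small proxy subset and return the last
    # training loss as a cheap (noisy) fitness estimate; lower is better.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    batches = iter(loader)
    loss = torch.tensor(float("inf"))
    for _ in range(steps):
        try:
            x, y = next(batches)
        except StopIteration:
            batches = iter(loader)
            x, y = next(batches)
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# Synthetic stand-in for "optimizing over a smaller dataset": a 128-sample proxy subset.
full = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))
proxy_loader = DataLoader(Subset(full, range(128)), batch_size=32, shuffle=True)

scores = {w: proxy_score(build_candidate(w), proxy_loader) for w in (8, 32, 128)}
print(scores)  # the width with the lowest proxy loss would be selected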