Can Attention Enable MLPs To Catch Up With CNNs?

2021 · Preprint
DOI: 10.48550/arxiv.2105.15078

Abstract: In the first week of May 2021, researchers from four different institutions (Google, Tsinghua University, Oxford University, and Facebook) shared their latest work [16,7,12,17] on arXiv.org almost simultaneously, each proposing a new learning architecture consisting mainly of linear layers and claiming it to be comparable, or even superior, to convolution-based models. This sparked immediate discussion and debate in both the academic and industrial communities as to whether MLPs are sufficient, many thinking tha…

Cited by 2 publications (1 citation statement) · References 17 publications
“…Several MLP-based architectures for computer vision that also operate on sequences of image patches have been recently proposed [7]. The aim of these architectures is to reduce the computational cost of ViT by removing the attention mechanism, while achieving comparable performance by preserving a global receptive field similar to that of ViT.…”
Section: Attention-free MLP-based Architectures (mentioning)
Confidence: 99%
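The citation statement above describes the core idea of these attention-free models: mix information across the sequence of image patches with plain linear layers, so every patch retains a global receptive field without attention. As a concrete illustration, below is a minimal sketch of an MLP-Mixer-style block in PyTorch; the class name, layer sizes, and patch count are illustrative assumptions, not code from any of the cited papers.

```python
# Minimal sketch of a token-mixing MLP block (MLP-Mixer style).
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    def __init__(self, num_patches: int, dim: int,
                 token_hidden: int, channel_hidden: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Token-mixing MLP: acts across the patch dimension, giving every
        # patch a global receptive field without any attention mechanism.
        self.token_mlp = nn.Sequential(
            nn.Linear(num_patches, token_hidden),
            nn.GELU(),
            nn.Linear(token_hidden, num_patches),
        )
        self.norm2 = nn.LayerNorm(dim)
        # Channel-mixing MLP: acts on each patch's features independently.
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, channel_hidden),
            nn.GELU(),
            nn.Linear(channel_hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim)
        y = self.norm1(x).transpose(1, 2)          # (batch, dim, num_patches)
        x = x + self.token_mlp(y).transpose(1, 2)  # mix across patches
        x = x + self.channel_mlp(self.norm2(x))    # mix across channels
        return x

# Example: 196 patches (a 14x14 grid from a 224x224 image), 512 channels.
block = MixerBlock(num_patches=196, dim=512, token_hidden=256, channel_hidden=2048)
out = block(torch.randn(2, 196, 512))              # -> (2, 196, 512)
```

Compared with self-attention, whose cost grows quadratically with the number of patches through the attention matrix, the token-mixing MLP here is a fixed learned linear map over patches, which is the computational saving the cited statement refers to.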