2021
DOI: 10.48550/arxiv.2109.14382
Preprint
UFO-ViT: High Performance Linear Vision Transformer without Softmax

Abstract: Vision transformers have become one of the most important models for computer vision tasks. While they outperform earlier convolutional networks, the complexity quadratic in the sequence length N is one of the major drawbacks of traditional self-attention algorithms. Here we propose UFO-ViT (Unit Force Operated Vision Transformer), a novel method that reduces the computation of self-attention by eliminating some non-linearity. By modifying a few lines of self-attention, UFO-ViT achieves linear complexity without the degrad…
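The softmax-free idea summarized in the abstract can be sketched briefly: once the softmax non-linearity is removed and replaced with an L2 normalization, the product K^T V can be computed first as a small d-by-d matrix, so the cost becomes linear in the sequence length N rather than quadratic. The following is a minimal single-head NumPy sketch of that idea; the learnable scaling, multi-head splitting, and exact normalization axes of the actual UFO-ViT are simplified away here and should be treated as assumptions:

```python
import numpy as np

def l2norm(x, axis, eps=1e-6):
    # Scale slices along `axis` to (approximately) unit L2 norm.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def ufo_attention(q, k, v):
    """Softmax-free attention, linear in sequence length N.

    q, k, v: (N, d) arrays for a single head.
    Computing k.T @ v first costs O(N * d^2), avoiding the
    O(N^2 * d) score matrix of standard softmax attention.
    """
    kv = k.T @ v                                   # (d, d)
    return l2norm(q, axis=1) @ l2norm(kv, axis=0)  # (N, d)
```

Because no N-by-N score matrix is ever materialized, memory also stays linear in N, which is the practical benefit the abstract refers to.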

Cited by 3 publications (5 citation statements)
References 54 publications
“…Despite the good classification performance of transformer-based methods, the original self-attention mechanism incurs high time and energy complexity due to matrix multiplication [41]. In contrast to these neural networks, SNNs with asynchronous and event-based information processing are suitable for processing event streams with high computational efficiency and low cost [42]. SNNs with various learning rules are used in the tasks of event stream classification.…”
Section: Event-based Feature Recognition Methods For Event Camera
confidence: 99%
“…Naturally, the output of attention represents the features of the input sequence, which can be learned by the subsequent multi-layer perceptrons to complete the point cloud analysis. In summary, inspired by UFO-ViT [18], we propose a new framework, UFO-Net, adopting the idea of a unified force operation (UFO) layer that uses the L2-norm to normalize the feature map in the attention mechanism. UFO decomposes the transformation layer into a product of multiple heads and feature dimensions.…”
Section: Introduction
confidence: 99%
“…The essence of CNorm is a common L2-norm. CNorm learns point-to-point relational features by generating a unit hypersphere [18]. Furthermore, the offset matrices [16] introduced in UFO attention are effective in reducing the impact of noise and providing sufficient characteristic information for downstream tasks.…”
Section: Introduction
confidence: 99%
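The "unit hypersphere" phrasing in the statement above is straightforward to verify numerically: L2-normalizing each feature vector yields vectors of length 1, i.e. points on the unit hypersphere. A minimal sketch follows; the function name `cnorm` and the omission of any learnable scale are assumptions for illustration, not the cited papers' exact implementation:

```python
import numpy as np

def cnorm(x, eps=1e-6):
    # L2-normalize each feature vector (last axis), projecting it
    # onto the unit hypersphere as described for CNorm.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)
```

For example, the vector (3, 4) has length 5 and maps to (0.6, 0.8), which lies on the unit circle; the same holds in any feature dimension.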