2021
DOI: 10.48550/arxiv.2109.14382
Preprint
UFO-ViT: High Performance Linear Vision Transformer without Softmax

Abstract: Vision transformers have become one of the most important models for computer vision tasks. While they outperform earlier convolutional networks, the complexity quadratic in the sequence length N is one of the major drawbacks of traditional self-attention algorithms. Here we propose UFO-ViT (Unit Force Operated Vision Transformer), a novel method that reduces the computation of self-attention by eliminating some non-linearity. By modifying a few lines of self-attention, UFO-ViT achieves linear complexity without the degrad…
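The softmax-free idea summarized in the abstract can be sketched briefly: once the softmax non-linearity is removed and replaced with an L2 normalization, the product K^T V can be computed first as a small d-by-d matrix, so the cost becomes linear in the sequence length N rather than quadratic. The following is a minimal single-head NumPy sketch of that idea; the learnable scaling, multi-head splitting, and exact normalization axes of the actual UFO-ViT are simplified away here and should be treated as assumptions:

```python
import numpy as np

def l2norm(x, axis, eps=1e-6):
    # Scale slices along `axis` to (approximately) unit L2 norm.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def ufo_attention(q, k, v):
    """Softmax-free attention, linear in sequence length N.

    q, k, v: (N, d) arrays for a single head.
    Computing k.T @ v first costs O(N * d^2), avoiding the
    O(N^2 * d) score matrix of standard softmax attention.
    """
    kv = k.T @ v                                   # (d, d)
    return l2norm(q, axis=1) @ l2norm(kv, axis=0)  # (N, d)
```

Because no N-by-N score matrix is ever materialized, memory also stays linear in N, which is the practical benefit the abstract refers to.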

Cited by 3 publications (5 citation statements)
References 54 publications
“…Despite the good classification performance of transformer-based methods, the original self-attention mechanism incurs high time and energy complexity due to matrix multiplication [41]. In contrast to these neural networks, SNNs with asynchronous and event-based information processing are suitable for processing event streams with high computational efficiency and low cost [42]. SNNs with various learning rules are used in the tasks of event stream classification.…”
Section: Event-based Feature Recognition Methods For Event Camera
confidence: 99%
“…Naturally, the output of attention represents the features of the input sequence, which can be learned by the subsequent multi-layer perceptrons to complete the point cloud analysis. In summary, inspired by UFO-ViT [18], we propose a new framework, UFO-Net, adopting the idea of a unified force operation (UFO) layer that uses the L2-norm to normalize the feature map in the attention mechanism. UFO decomposes the transformation layer into a product of multiple heads and feature dimensions.…”
Section: Introduction
confidence: 99%
“…The essence of CNorm is a common L2-norm. CNorm learns point-to-point relational features by generating a unit hypersphere [18]. Furthermore, the offset matrices [16] introduced in UFO attention are effective in reducing the impact of noise and providing sufficient characteristic information for downstream tasks.…”
Section: Introduction
confidence: 99%
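The "unit hypersphere" phrasing in the statement above is straightforward to verify numerically: L2-normalizing each feature vector yields vectors of length 1, i.e. points on the unit hypersphere. A minimal sketch follows; the function name `cnorm` and the omission of any learnable scale are assumptions for illustration, not the cited papers' exact implementation:

```python
import numpy as np

def cnorm(x, eps=1e-6):
    # L2-normalize each feature vector (last axis), projecting it
    # onto the unit hypersphere as described for CNorm.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)
```

For example, the vector (3, 4) has length 5 and maps to (0.6, 0.8), which lies on the unit circle; the same holds in any feature dimension.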