2021
DOI: 10.1109/access.2021.3109102

TransAnomaly: Video Anomaly Detection Using Video Vision Transformer

Abstract: Video anomaly detection is challenging because abnormal events are unbounded, rare, equivocal, and irregular in real scenes. In recent years, transformers have demonstrated powerful modelling abilities for sequence data. Thus, we attempt to apply transformers to video anomaly detection. In this paper, we propose a prediction-based video anomaly detection approach named TransAnomaly. Our model combines the U-Net and the Video Vision Transformer (ViViT) to capture richer temporal information and more global contexts…
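The abstract describes a prediction-based approach: a model predicts the next frame, and frames the model predicts poorly are flagged as anomalous. The paper's exact scoring is not reproduced here, but prediction-based methods of this family commonly score frames by PSNR between the predicted and actual frame, min-max normalised over the video. A minimal sketch of that generic scoring (function names are illustrative, not from the paper):

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio between a predicted and a ground-truth frame."""
    mse = np.mean((pred - gt) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def anomaly_scores(psnr_values):
    """Min-max normalise per-frame PSNR over a video clip.

    Low PSNR (poorly predicted frame) maps to a score near 1 (likely anomaly);
    high PSNR maps to a score near 0 (likely normal).
    """
    p = np.asarray(psnr_values, dtype=float)
    normalised = (p - p.min()) / (p.max() - p.min() + 1e-8)
    return 1.0 - normalised
```

A frame sequence with PSNR values `[30.0, 10.0, 30.0]`, for example, yields a score near 1 for the middle frame, singling it out as anomalous.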

Cited by 46 publications (25 citation statements)
References 22 publications
“…Hence, one may assume that applying ViT to the VAD task is not appropriate, as VAD datasets are small compared to those for other tasks such as image classification or object detection. Several approaches applying Transformers to anomaly detection [3], [14] are likewise based only on Transformers with convolutional layers, not ViT. However, our approach proves that ViT can be successfully trained to detect anomalies in video even without a huge amount of data.…”
Section: B. Vision Transformer
confidence: 83%
“…As shown in TABLE IV, we compare our model with other previous VAD approaches, which can be classified into three main categories: reconstruction-based, prediction-based, and hybrid methods. First, reconstruction-based methods include Conv-AE [4], 3D-Conv [35], MemAE [1], and MNAD-R [6], while MNAD-P [6], AMMC-Net [8], Frame-Pred [5], VEC [7], C2-D2GAN [3], and TransAnomaly [14] are prediction-based methods, and HF2-VAD [2] is the hybrid method. We observe that our model achieves better results than other state-of-the-art methods on Ped2, except for HF2-VAD [2].…”
Section: F. Results
confidence: 99%
“…Yuan et al. in [119] proposed TransAnomaly, a video ViT and U-Net-based framework for detecting anomalies in videos. They used three datasets: Ped1, Ped2, and Avenue.…”
Section: ViTs for Anomaly Detection
confidence: 99%
“…Another hybrid model for SC classification was proposed by Sharma et al. [48], who fused the features of a cascaded ensemble of CNNs and a handcrafted-features-based DL model and achieved state-of-the-art performance. There is no doubt that vision transformers play an important role in several challenging vision-based applications, such as fire detection [49], [50], anomaly detection [51], and medical image classification [52], [53]. It is well documented in the recent literature that multiclass SC classification is not an easy task because of the large amount of similarity in the dermoscopic images.…”
Section: Introduction
confidence: 99%