Proceedings of the 25th ACM International Conference on Multimedia 2017
DOI: 10.1145/3123266.3123451
Spatio-Temporal AutoEncoder for Video Anomaly Detection

Cited by 437 publications (242 citation statements)
References 18 publications
“…Hasan et al. [9] detect anomalies from the reconstruction error of a convolutional AE. Zhao et al. [44] proposed 3D-convolution-based reconstruction and prediction. Luo et al. [24] iteratively update the sparse coefficients via a stacked RNN to detect anomalies in videos.…”
Section: Related Work
confidence: 99%
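The reconstruction-error approach cited above can be sketched in a few lines. A minimal illustration, assuming the regularity-score normalization used by Hasan et al. [9] (frames the autoencoder reconstructs poorly get a low regularity score); the threshold value here is an arbitrary choice for the example:

```python
import numpy as np

def regularity_score(errors):
    """Map per-frame reconstruction errors e(t) to a regularity score in [0, 1].

    Frames with high reconstruction error (poorly modeled by the
    autoencoder trained on normal video) receive a low score and can
    be flagged as anomalous.
    """
    errors = np.asarray(errors, dtype=float)
    e_min, e_max = errors.min(), errors.max()
    return 1.0 - (errors - e_min) / (e_max - e_min)

# Toy example: the third frame has an unusually large reconstruction error.
scores = regularity_score([0.10, 0.12, 0.90, 0.11])
anomalous = scores < 0.5  # the threshold is a free parameter
# anomalous → [False, False, True, False]
```

In practice the per-frame error is the pixel-wise squared difference between a frame and its autoencoder reconstruction, aggregated over the frame.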
“…Since spatial appearance and temporal relations are both important cues for video understanding, our work investigates the use of both dimensions with 3D CNNs on videos. To the best of our knowledge, only a few works (Zhao et al. 2017; Vondrick, Pirsiavash, and Torralba 2016) exploit 3D architectures for self-supervised feature learning. They use reconstruction/generation-based pretext tasks and each aims at a specific target task: anomaly detection and video generation, respectively.…”
Section: Self-supervised Representation Learning
confidence: 99%
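The idea of modeling appearance and motion jointly with 3D convolutions can be sketched as a tiny autoencoder. This is a hypothetical architecture for illustration only (the layer sizes are made up and do not match the paper's network): `Conv3d` layers compress a short clip over time and space, transposed convolutions reconstruct it, and the mean squared reconstruction error serves as a per-clip anomaly score.

```python
import torch
from torch import nn

class STAutoEncoder(nn.Module):
    """Minimal spatio-temporal autoencoder sketch (illustrative sizes).

    3D convolutions slide over (time, height, width), so a single filter
    captures both appearance and short-range motion in a clip.
    """
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(16, 8, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose3d(8, 1, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
        )

    def forward(self, clip):  # clip: (N, 1, T, H, W)
        return self.decoder(self.encoder(clip))

model = STAutoEncoder()
clip = torch.rand(2, 1, 8, 32, 32)  # batch of 2 clips, 8 frames of 32x32
recon = model(clip)
# Per-clip anomaly score: mean squared reconstruction error.
score = ((recon - clip) ** 2).mean(dim=(1, 2, 3, 4))
```

Trained only on normal footage, such a model reconstructs regular clips well, so anomalous clips stand out through a larger score.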
“…Since there are few prior works on self-supervised representation learning with 3D CNNs, we enumerate several alternative self-supervision tasks to provide our own reference levels and to validate the effectiveness of our method. While we mainly focus on context-based approaches, we also explore reconstruction-based methods: spatio-temporal autoencoders (Zhao et al. 2017) and 3D inpainting (Pathak et al. 2016). All methods and experiments use the same 3D ResNet-18 backbone and the Kinetics dataset (without labels).…”
Section: Alternative Pretraining Strategies
confidence: 99%
“…Recent approaches rejuvenate the field by using convolutional neural networks (CNNs) to extract high-level features from video frame intensity, achieving improved results. These methods include the convolutional autoencoder [13], the spatio-temporal autoencoder [5], the 3D ConvNet AE [27], and Temporally-coherent Sparse Coding with a stacked RNN [20]. Acknowledging the limitations of intensity-based features, such as sensitivity to appearance noise, Liu et al. [18] proposed predicting optical flow in their temporal-coherence loss, effectively filtering out part of the noise in pixel appearance.…”
Section: Video Anomaly Detection
confidence: 99%