2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2016.86
Learning Temporal Regularity in Video Sequences

Abstract: Perceiving meaningful activities in a long video sequence is a challenging problem due to ambiguous definition of 'meaningfulness' as well as clutters in the scene. We approach this problem by learning a generative model for regular motion patterns (termed as regularity) using multiple sources with very limited supervision. Specifically, we propose two methods that are built upon the autoencoders for their ability to work with little to no supervision. We first leverage the conventional handcrafted spatio-temp…

Cited by 1,058 publications (919 citation statements); references 49 publications.
“…However, this method is sensitive to appearance and motion deviations during evaluation. A fully convolutional autoencoder network that generalizes abnormalities across various datasets by learning temporal regularity in videos was presented by Hasan et al. [16]. Conventional handcrafted descriptors, Histogram of Oriented Gradients (HOG) [9] and Histogram of Optical Flow (HOF) [3], are used as appearance and motion features.…”
Section: Encoding Module
confidence: 99%
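The HOG descriptor cited above bins image gradient orientations into magnitude-weighted histograms. A minimal numpy sketch of that core step (one cell only, without the block normalization of the full HOG pipeline in [9]) might look like:

```python
import numpy as np

def orientation_histogram(patch, n_bins=9):
    """Core HOG step: histogram of gradient orientations over one cell,
    weighted by gradient magnitude (unsigned orientations, 0-180 deg)."""
    gy, gx = np.gradient(patch.astype(float))      # row and column gradients
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # fold to unsigned range
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())     # accumulate magnitudes
    return hist

# A vertical step edge yields purely horizontal gradients, so all
# histogram mass lands in the 0-degree bin.
patch = np.zeros((8, 8))
patch[:, 4:] = 1.0
h = orientation_histogram(patch)
```

In the full descriptor, such per-cell histograms are concatenated over overlapping blocks and L2-normalized before being fed to a classifier or, as above, an autoencoder.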
“…Building such a cuboid and supplying it as input to the network enables the network to better incorporate temporal details of the input video frames. This is mainly attributed to the cuboid's ability to preserve appearance and motion patterns and to sustain temporal information from multiple frames over a longer duration [16,34,37]. This allows a more reliable feature representation across the model.…”
Section: Overview Of Our SiTGRU Network
confidence: 99%
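A spatio-temporal cuboid of the kind described above is simply a stack of consecutive frames along a temporal axis. A minimal sketch (the depth and stride here are illustrative assumptions, not the exact configuration used in [16,34,37]):

```python
import numpy as np

def make_cuboids(frames, depth=10, stride=5):
    """Slice a grayscale video of shape (T, H, W) into overlapping
    spatio-temporal cuboids of `depth` consecutive frames,
    advancing `stride` frames between cuboids."""
    T = frames.shape[0]
    starts = range(0, T - depth + 1, stride)
    return np.stack([frames[s:s + depth] for s in starts])

video = np.random.rand(30, 64, 64)   # 30 grayscale frames, 64x64 each
cuboids = make_cuboids(video)        # shape (5, 10, 64, 64)
```

Each cuboid then serves as one network input, so a single forward pass sees motion across `depth` frames rather than a single static image.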
“…To quantitatively evaluate the performance of the HORG descriptor, Fig. 12 plots ROC curves of the detection results, and Table 6 lists the AUC values of the HORG descriptor and state-of-the-art comparison approaches [28,31,32,49-52]. The comparisons demonstrate that the HORG descriptor outperforms the comparison approaches.…”
Section: PETS2009
confidence: 99%
“…Different types of deep neural networks have been designed to learn rich discriminative features, and strong performance has been achieved in AED. Hasan et al. [28] proposed a convolutional autoencoder framework for reconstructing a scene, and the reconstruction costs were computed to identify abnormalities in the scene. Sabokrou et al. [29] proposed a deep network cascade for AED.…”
Section: Deep-Learned Descriptor
confidence: 99%
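The reconstruction-cost idea attributed to [28] can be sketched as follows: each frame is scored by its autoencoder reconstruction error, and errors are normalized into a per-video regularity score where low values flag candidate anomalies. The min-max normalization below follows the general form used by Hasan et al.; the error values are a stand-in, not output of a trained model:

```python
import numpy as np

def regularity_score(errors):
    """Map per-frame reconstruction errors e(t) to a regularity score
    s(t) = 1 - (e(t) - min e) / (max e - min e).
    Low s(t) means the frame reconstructs poorly, i.e. is irregular."""
    errors = np.asarray(errors, dtype=float)
    lo, hi = errors.min(), errors.max()
    return 1.0 - (errors - lo) / (hi - lo)

# Stand-in errors: frame 2 reconstructs poorly (anomaly candidate).
e = [0.10, 0.12, 0.90, 0.11]
s = regularity_score(e)
# s[2] == 0.0 (least regular); the best-reconstructed frame scores 1.0
```

In practice the anomaly decision is then a threshold on s(t), or local minima of s(t) are reported as abnormal events.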
“…A ConvNet consists of a stack of convolutional layers followed by a fully-connected layer and a softmax classifier. Zhou et al. [18] applied a 3D ConvNet to classifying anomalies, whereas Hasan et al. [19] used an end-to-end convolutional autoencoder to detect anomalies in surveillance videos.…”
Section: Deep Learning
confidence: 99%