Abnormal activity detection is one of the most challenging tasks in the field of computer vision. This study is motivated by the recent state-of-art work of abnormal activity detection, which utilizes both abnormal and normal videos in learning abnormalities with the help of multiple instance learning by providing the data with video-level information. In the absence of temporal-annotations, such a model is prone to give a false alarm while detecting the abnormalities. For this reason, in this paper, we focus on the task of minimizing the false alarm rate while performing an abnormal activity detection task. The mitigation of these false alarms and recent advancement of 3D deep neural network in video action recognition task collectively give us motivation to exploit the 3D ResNet in our proposed method, which helps to extract spatial-temporal features from the videos. Afterwards, using these features and deep multiple instance learning along with the proposed ranking loss, our model learns to predict the abnormality score at the video segment level. Therefore, our proposed method 3D deep Multiple Instance Learning with ResNet (MILR) along with the new proposed ranking loss function achieves the best performance on the UCF-Crime benchmark dataset, as compared to other state-of-art methods. The effectiveness of our proposed method is demonstrated on the UCF-Crime dataset.
Given the scarcity of annotated datasets, learning the context-dependency of anomalous events as well as mitigating false alarms represent challenges in the task of anomalous activity detection. We propose a framework, Deep-network with Multiple Ranking Measures (DMRMs), which addresses context-dependency using a joint learning technique for motion and appearance features. In DMRMs, the spatial-time-dependent features are extracted from a video using a 3D residual network (ResNet), and deep motion features are extracted by integrating the motion flow maps’ information with the 3D ResNet. Afterward, the extracted features are fused for joint learning. This data fusion is then passed through a deep neural network for deep multiple instance learning (DMIL) to learn the context-dependency in a weakly-supervised manner using the proposed multiple ranking measures (MRMs). These MRMs consider multiple measures of false alarms, and the network is trained with both normal and anomalous events, thus lowering the false alarm rate. Meanwhile, in the inference phase, the network predicts each frame’s abnormality score along with the localization of moving objects using motion flow maps. A higher abnormality score indicates the presence of an anomalous event. Experimental results on two recent and challenging datasets demonstrate that our proposed framework improves the area under the curve (AUC) score by 6.5% compared to the state-of-the-art method on the UCF-Crime dataset and shows AUC of 68.5% on the ShanghaiTech dataset.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.