2021
DOI: 10.3390/electronics10131601
ViolenceNet: Dense Multi-Head Self-Attention with Bidirectional Convolutional LSTM for Detecting Violence

Abstract: Introducing efficient automatic violence detection in video surveillance or audiovisual content monitoring systems would greatly facilitate the work of closed-circuit television (CCTV) operators, rating agencies or those in charge of monitoring social network content. In this paper we present a new deep learning architecture, using an adapted version of DenseNet for three dimensions, a multi-head self-attention layer and a bidirectional convolutional long short-term memory (LSTM) module, that allows encoding r…

Cited by 35 publications (25 citation statements)
References 53 publications
“…Table 5 shows that our proposed model is more lightweight than previously proposed methods for violence detection. Although the models presented by Sudhakaran and Lanz [22], Akti et al. [42], and Rendón-Segador et al. [40] are slightly more accurate, our proposed model has a much lower parameter count than these models, which makes our method faster and more computationally efficient. The only model with fewer parameters than ours was the end-to-end CNN-LSTM model presented by AlDahoul et al. [43]; however, experiments showed that this model is less accurate and less precise (model precision is 72.53 ± 4.6%) than ours.…”
Section: Experiments and Results
confidence: 92%
“…For violence detection, they summarized the video sequences into dynamic images [53] and used these images to train a CNN classifier. Rendón-Segador et al. [8] adopted a 3D DenseNet and combined it with a self-attention mechanism and a bidirectional convolutional LSTM to detect violence. The method relies on optical flow as input, which is first encoded by the DenseNet into sequences of feature maps, then passed to the self-attention and ConvLSTM layers before the fully connected layers of the classifier carry out the prediction.…”
Section: Deep Learning-Based Methods
confidence: 99%
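The pipeline described above (optical flow → 3D feature encoder → temporal self-attention → bidirectional ConvLSTM → fully connected classifier) can be sketched as a minimal PyTorch model. This is an illustrative approximation, not the authors' implementation: the real model uses a full 3D DenseNet, which is stood in for here by a single `Conv3d` block, and the `ViolenceNetSketch` and `ConvLSTMCell` names, layer sizes, and pooling choices are all assumptions made for brevity.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: one conv produces all four gates."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.conv(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = f.sigmoid() * c + i.sigmoid() * g.tanh()
        h = o.sigmoid() * c.tanh()
        return h, c

class ViolenceNetSketch(nn.Module):
    """Sketch of the described flow; Conv3d stands in for the 3D DenseNet."""
    def __init__(self, feat_ch=32, heads=4):
        super().__init__()
        self.encoder = nn.Conv3d(2, feat_ch, 3, padding=1)  # 2-channel optical flow
        self.attn = nn.MultiheadAttention(feat_ch, heads, batch_first=True)
        self.fwd = ConvLSTMCell(feat_ch, feat_ch)
        self.bwd = ConvLSTMCell(feat_ch, feat_ch)
        self.fc = nn.Linear(2 * feat_ch, 2)  # violent / non-violent logits

    def forward(self, flow):                       # flow: (B, 2, T, H, W)
        f = self.encoder(flow)                     # (B, C, T, H, W)
        B, C, T, H, W = f.shape
        # Self-attention over time on spatially pooled tokens, re-injected.
        tok = f.mean(dim=(3, 4)).transpose(1, 2)   # (B, T, C)
        tok, _ = self.attn(tok, tok, tok)
        f = f + tok.transpose(1, 2)[..., None, None]
        # Bidirectional ConvLSTM along the temporal axis.
        h_f = f.new_zeros(B, C, H, W); c_f = h_f.clone()
        h_b = h_f.clone(); c_b = h_f.clone()
        for t in range(T):
            h_f, c_f = self.fwd(f[:, :, t], (h_f, c_f))
            h_b, c_b = self.bwd(f[:, :, T - 1 - t], (h_b, c_b))
        pooled = torch.cat([h_f, h_b], dim=1).mean(dim=(2, 3))  # (B, 2C)
        return self.fc(pooled)

model = ViolenceNetSketch()
logits = model(torch.randn(1, 2, 8, 32, 32))  # 8 flow frames of 32x32
```

A real implementation would replace the single `Conv3d` with dense 3D blocks and stack several ConvLSTM layers, but the data flow between the stages matches the description in the citation above.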
“…The need for improved techniques for autonomous detection is gaining more and more attention, mainly because of the enormous amounts of surveillance data being generated and the impracticality of monitoring it manually given the human toil involved. Several traditional (e.g., [3][4][5]) as well as deep learning-based methods (e.g., [6][7][8]) have addressed the problem. Abnormal event detection encompasses two types of video scenes: crowded and uncrowded [9].…”
Section: Introduction
confidence: 99%
“…Rendón-Segador et al. (2021) present a new approach for determining whether a video contains a violent scene, based on an adapted 3D DenseNet, a multi-head self-attention layer, and a bidirectional ConvLSTM module that enables encoding relevant spatio-temporal features. In addition, an ablation analysis of the input frames is carried out, comparing dense optical flow against neighboring-frame removal, as well as the effect of the attention layer, revealing that combining optical flow and the attention mechanism improves results by up to 4.4 percent.…”
Section: Classification of Violence Detection Techniques
confidence: 99%