Human Activity Recognition (HAR) is an active research area in which several Convolutional Neural Network (CNN) based feature extraction and classification methods are employed for surveillance and other applications. However, accurately recognizing human activity from a sequence of frames is challenging due to cluttered backgrounds, varying viewpoints, low resolution, and partial occlusion. Current CNN-based techniques use computationally heavy, large-scale classifiers along with convolutional operators that have local receptive fields, which limits their ability to capture long-range temporal information. Therefore, in this work, we introduce a convolution-free approach for accurate HAR that overcomes the above-mentioned problems and accurately encodes relative spatial information. In the proposed framework, frame-level features are extracted via a pretrained Vision Transformer; these features are then passed to a multilayer long short-term memory (LSTM) network to capture the long-range dependencies of the actions in surveillance videos. To validate the performance of the proposed framework, we carried out extensive experiments on the UCF50 and HMDB51 benchmark HAR datasets and improved accuracy by 0.944% and 1.414%, respectively, compared to state-of-the-art deep models.
Background and motivation: Every year, millions of Muslims worldwide come to Mecca to perform the Hajj. To maintain the security of the pilgrims, the Saudi government has installed about 5000 closed-circuit television (CCTV) cameras to monitor crowd activity efficiently. Problem: These cameras generate an enormous amount of visual data, and manual or offline monitoring of it requires numerous human resources for efficient tracking. Therefore, there is an urgent need for an intelligent, automatic system that efficiently monitors crowds and identifies abnormal activity. Method: Existing methods are unable to extract discriminative features from surveillance videos because they rely on pre-trained weights of different architectures. This paper develops a lightweight approach for accurately identifying violent activity in surveillance environments. In the first step of the proposed framework, a lightweight CNN model is trained on our own pilgrim dataset to detect pilgrims in the surveillance camera footage. In the second step, these preprocessed salient frames are passed to a lightweight CNN model for spatial feature extraction. In the third step, a Long Short-Term Memory (LSTM) network is developed to extract temporal features. Finally, in the last step, when violent activity or an accident is detected, the proposed system generates an alarm in real time to inform law enforcement agencies so that they can take appropriate action, thus helping to avoid accidents and stampedes. Results: We conducted multiple experiments on two publicly available violent activity datasets, namely the Surveillance Fight and Hockey Fight datasets; our proposed model achieved accuracies of 81.05% and 98.00%, respectively.
For efficient energy distribution, microgrids (MGs) provide significant assistance to main grids and act as a bridge between power generation and consumption. Renewable energy generation resources, particularly photovoltaics (PVs), are considered a clean source of energy but are highly complex, volatile, and intermittent in nature, which makes forecasting their output challenging. A reliable, optimized, and robust forecasting method deployed at the MG addresses these challenges by providing accurate renewable energy production forecasts and establishing a precise match between power generation and consumption at the MG. Furthermore, it ensures effective planning, operation, and acquisition from the main grid in the case of an energy surplus or deficit. Therefore, in this work, we develop an end-to-end hybrid network for automatic PV power forecasting, comprising three basic steps. In the first step, data preprocessing is performed to normalize the data, remove outliers, and handle missing values. In the second step, temporal features are extracted using deep sequential modelling schemes, followed by the extraction of spatial features via convolutional neural networks; these features are then fed to fully connected layers for optimal PV power forecasting. In the third step, the proposed model is evaluated on publicly available PV power generation datasets, where it achieves lower error rates than state-of-the-art methods.
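The preprocessing step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which techniques the authors use, so everything here is an assumption: missing values are filled by linear interpolation, outliers are clipped (rather than removed) to a band around the median based on the median absolute deviation, and normalization is min-max scaling. The function names, the order of the steps, and the factor `k` are hypothetical.

```python
from statistics import median


def fill_missing(series):
    """Fill None entries by linear interpolation between known neighbours.

    Gaps at the start or end copy the nearest known value (an assumption).
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled


def clip_outliers(series, k=3.0):
    """Clip values to [median - k*MAD, median + k*MAD].

    MAD is the median absolute deviation; k is a hypothetical tuning factor.
    """
    med = median(series)
    mad = median(abs(v - med) for v in series)
    if mad == 0:
        return list(series)  # no spread to measure against, leave unchanged
    lo, hi = med - k * mad, med + k * mad
    return [min(max(v, lo), hi) for v in series]


def min_max_normalize(series):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def preprocess(series):
    """Fill gaps, tame outliers, then normalize (assumed order)."""
    return min_max_normalize(clip_outliers(fill_missing(series)))
```

For example, `preprocess([1.0, None, 3.0, 4.0, 100.0])` first interpolates the gap to 2.0, then clips the spike at 100.0 down to 6.0, and finally rescales the series to the unit interval. Clipping is used so that the series keeps its length, which matters for sequential models that expect evenly spaced readings.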
Digital surveillance systems are ubiquitous and continuously generate massive amounts of data, and manual monitoring is required in order to recognise human activities in public areas. Intelligent surveillance systems that can automatically identify normal and abnormal activities are highly desirable, as these would allow for efficient monitoring by selecting only those camera feeds in which abnormal activities are occurring. This paper proposes an energy-efficient camera prioritisation framework that intelligently adjusts the priority of cameras in a vast surveillance network using feedback from the activity recognition system. The proposed system addresses the limitations of existing manual monitoring surveillance systems using a three-step framework. In the first step, the salient frames are selected from the online video stream using a frame differencing method. In the second step, a lightweight 3D convolutional neural network (3DCNN) architecture is applied to extract spatio-temporal features from the salient frames. Finally, the probabilities predicted by the 3DCNN and the metadata of the cameras are processed using a linear threshold gate sigmoid mechanism to control the priority of the camera. The proposed system performs well compared to state-of-the-art violent activity recognition methods in terms of efficient camera prioritisation in large-scale surveillance networks. Comprehensive experiments and an evaluation of activity recognition and camera prioritisation showed that our approach achieved an accuracy of 98% with an F1-score of 0.97 on the Hockey Fight dataset, and an accuracy of 99% with an F1-score of 0.98 on the Violent Crowd dataset.
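The first and last steps of this framework can be sketched roughly as follows. This is not the authors' implementation: the abstract gives no thresholds or formulas, so the mean absolute difference measure, the `threshold` value, and the `camera_priority` function with its `camera_weight`, `bias`, and `gain` parameters are all hypothetical stand-ins for the frame differencing method and the sigmoid-based priority mechanism it mentions. Frames are assumed to be equally sized grayscale images stored as nested lists.

```python
import math


def frame_difference(prev, curr):
    """Mean absolute pixel difference between two grayscale frames."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(c - p)
            count += 1
    return total / count


def select_salient_frames(frames, threshold=10.0):
    """Return indices of frames that differ enough from their predecessor.

    threshold is a hypothetical value; a real system would tune it per camera.
    """
    salient = []
    for i in range(1, len(frames)):
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            salient.append(i)
    return salient


def camera_priority(p_abnormal, camera_weight=1.0, bias=-0.5, gain=10.0):
    """Hypothetical sigmoid gate mapping an abnormal-activity probability
    (scaled by a per-camera weight taken from metadata) to a priority in (0, 1).
    """
    z = gain * (camera_weight * p_abnormal + bias)
    return 1.0 / (1.0 + math.exp(-z))
```

In this sketch a camera whose recogniser reports a high abnormal probability receives a priority close to 1, while a quiet camera stays close to 0, so an operator console could sort feeds by that value. Skipping near-identical frames before the network runs is what would save computation on mostly static scenes.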