With the growing availability of video surveillance cameras and the need for techniques that automatically identify events in video footage, there is increasing interest in automatic violence detection in videos. Deep learning-based architectures, such as 3D Convolutional Neural Networks (3D CNNs), have demonstrated their capability to extract spatio-temporal features from videos, proving effective for violence detection. However, friendly behaviours or fast movements such as hugs, small hits, claps, and high fives can still cause false positives, leading harmless actions to be classified as violent. To address this, we present three deep learning-based models for violence detection and test them on the AIRTLab dataset, a novel dataset designed to check the robustness of algorithms against false positives. The objective is twofold: on the one hand, we compute accuracy metrics for the three proposed models (two based on transfer learning and one trained from scratch), establishing a baseline of metrics for the AIRTLab dataset; on the other hand, we validate the capability of the proposed dataset to challenge robustness against false positives. The results of the proposed models are in line with the scientific literature in terms of accuracy, with the transfer learning-based networks exhibiting better generalization capabilities than the network trained from scratch. Moreover, the tests highlighted that most classification errors concern the identification of non-violent clips, validating the design of the proposed dataset. Finally, to demonstrate the significance of the proposed models, the paper presents a comparison with the related literature, as well as with models based on well-established pre-trained 2D Convolutional Neural Networks (2D CNNs). This comparison highlights that 3D models achieve better accuracy than time-distributed 2D CNNs (combined with a recurrent model) in processing the spatio-temporal features of video clips.
The source code of the experiments and the AIRTLab dataset are available in public repositories.
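The core idea behind the 3D CNNs mentioned above, convolving jointly over time and space, can be illustrated with a minimal NumPy sketch. This is not the paper's architecture; it only shows why a spatio-temporal kernel can respond to motion (e.g. a fast hit) while ignoring static content: a kernel that differences consecutive frames outputs near-zero on a still scene and a non-zero response where frame content changes.

```python
import numpy as np

def conv3d_single(video, kernel):
    """Valid 3D convolution of a (T, H, W) clip with a (t, h, w) kernel.

    Illustrative only: real 3D CNNs stack many such kernels with
    channels, nonlinearities, and pooling over deep blocks.
    """
    T, H, W = video.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(video[i:i + t, j:j + h, k:k + w] * kernel)
    return out

# A temporal-difference kernel: subtract a 3x3 patch of frame i
# from the same patch of frame i+1, so it responds only to motion.
kernel = np.zeros((2, 3, 3))
kernel[0] = -1.0 / 9
kernel[1] = 1.0 / 9

static = np.ones((4, 8, 8))      # identical frames, no motion
moving = np.ones((4, 8, 8))
moving[2:] += 1.0                # brightness jump halfway through the clip

print(np.abs(conv3d_single(static, kernel)).max())  # ~0: nothing changes
print(np.abs(conv3d_single(moving, kernel)).max())  # large: change detected
```

A time-distributed 2D CNN would instead convolve each frame independently and delegate all temporal reasoning to a downstream recurrent layer, which is the architectural difference the comparison in the abstract refers to.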
The developments in the internet of things (IoT), artificial intelligence (AI), and cyber-physical systems (CPS) are paving the way to the implementation of smart factories in what is commonly recognized as the fourth industrial revolution. In the manufacturing sector, these technological advancements are making Industry 4.0 a reality, with data-driven methodologies based on machine learning (ML) that are capable of extracting knowledge from the data collected by sensors placed on production machines. This is particularly relevant in plastic injection molding, with the objective of monitoring the quality of molded products from the parameters of the production process. In this regard, the main contribution of this paper is the systematic comparison of ML techniques to predict the quality classes of plastic molded products, using real data collected during the production process. Specifically, we compare six different classifiers on the data coming from the production of plastic road lenses. To run the comparison, we collected a dataset composed of the process parameters of 1451 road lenses. On such samples, we tested a multi-class classification, providing a statistical analysis of the results as well as of the importance of the input features. Among the tested classifiers, the ensembles of decision trees, i.e., random forest and gradient-boosted trees (GBT), achieved 95% accuracy in predicting the quality classes of molded products, demonstrating the viability of ML-based techniques for this purpose. The collected dataset and the source code of the experiments are available in a public, open-access repository, making the presented research fully reproducible.
Although face recognition technology is currently integrated into industrial applications, open challenges remain, such as verification and identification from arbitrary poses. Specifically, there is a lack of research on face recognition in surveillance videos using, as reference images, mugshots taken from multiple Points of View (POVs) in addition to the frontal picture and the right profile traditionally collected by national police forces. To start filling this gap and to tackle the scarcity of databases devoted to the study of this problem, we present the Face Recognition from Mugshots Database (FRMDB). It includes 28 mugshots and 5 surveillance videos taken from different angles for 39 distinct subjects. The FRMDB is intended to support the analysis of the impact of using mugshots taken from multiple points of view on face recognition on the frames of the surveillance videos. To validate the FRMDB and provide a first benchmark on it, we ran accuracy tests using two CNNs, namely VGG16 and ResNet50, pre-trained on the VGGFace and VGGFace2 datasets for the extraction of face image features. We compared the results to those obtained from a dataset from the related literature, the Surveillance Cameras Face Database (SCFace). In addition to showcasing the features of the proposed database, the results highlight that the subset of mugshots composed of the frontal picture and the right profile yields the lowest accuracy among those tested. Therefore, further research is suggested to determine the ideal number of mugshots for face recognition on frames from surveillance videos.