Crime generates significant losses, both human and economic. Every year, billions of dollars are lost to attacks, crimes, and scams. Surveillance camera networks generate vast amounts of video data, and surveillance staff cannot process all of it in real time. Human sight has critical limitations, and visual focus is among the most critical for surveillance: in a surveillance room, a crime can occur in a different screen segment or on a different monitor, and the staff may overlook it. Our proposal focuses on shoplifting by analyzing situations that an average person would consider typical but that may eventually lead to a crime. While other approaches identify the crime itself, we instead model suspicious behavior (the behavior that may occur before the build-up phase of a crime) by detecting precise segments of a video with a high probability of containing a shoplifting crime. By doing so, we give the staff more opportunities to act and prevent crime. We implemented a 3DCNN model as a video feature extractor and tested its performance on a dataset composed of daily-action and shoplifting samples. The results are encouraging: the model correctly classifies suspicious behavior in most of the scenarios where it was tested. For example, when classifying suspicious behavior, the best model generated in this work obtains a precision of 0.8571 and a recall of 1 in one of the test scenarios.
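To make the reported metrics concrete, the sketch below computes precision and recall from confusion-matrix counts. The counts are hypothetical: the abstract does not report them, and they were chosen only because 6 true positives with 1 false positive and 0 false negatives reproduce the quoted values of 0.8571 and 1. The function name `precision_recall` is likewise an illustrative assumption.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts."""
    # Precision: fraction of clips flagged as suspicious that really were.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of truly suspicious clips that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts (NOT from the source): 6 suspicious clips correctly
# flagged, 1 normal clip wrongly flagged, no suspicious clip missed.
precision, recall = precision_recall(tp=6, fp=1, fn=0)
print(f"precision={precision:.4f} recall={recall:.4f}")
```

A recall of 1 means no suspicious segment was missed in that scenario, while a precision below 1 means some normal segments were flagged as well, which is usually the preferable trade-off when the goal is prevention.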
Artificial neural networks are efficient learning algorithms that are considered universal approximators and are used to solve numerous real-world problems in areas such as computer vision, language processing, and reinforcement learning. To approximate a given function, neural networks train a large number of parameters, up to millions or even billions in some cases. The large number of parameters and hidden layers makes neural networks hard to interpret, which is why they are often referred to as black boxes. In the quest to make artificial neural networks interpretable in computer vision, feature visualization stands out as one of the most developed and promising research directions. While feature visualizations are a valuable tool for gaining insight into the function learned by the network, they are still considered simple visual aids that require human interpretation. In this paper, we propose that feature visualizations, and class visualizations in particular, are analogous to mental imagery in humans, resembling the experience of seeing or perceiving the actual training data. We therefore argue that class visualizations contain embedded knowledge that can be exploited in a more automated manner. We present a series of experiments that shed light on the nature of class visualizations and demonstrate that they can be considered a conceptual compression of the data used to train the underlying model. Finally, we show that class visualizations can be regarded as convolutional filters and experimentally demonstrate their potential for extreme model compression.
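As a rough illustration of what a class visualization is, the sketch below runs activation maximization, that is, gradient ascent on the input to maximize one class score, on a toy linear classifier. The abstract does not describe the optimization procedure, so everything here is an assumption made for illustration: the function name `class_visualization`, the L2 penalty, the hyperparameters, and the tiny weight matrix stand in for a trained convolutional network.

```python
def class_visualization(weights, target, steps=200, lr=0.1, decay=0.5):
    """Gradient ascent on the input x to maximize one class logit.

    Toy linear model (an assumption): logit_k(x) = sum_i weights[k][i] * x[i].
    Objective: logit_target(x) - (decay / 2) * ||x||^2.
    """
    x = [0.0] * len(weights[target])
    for _ in range(steps):
        # Gradient of the objective with respect to x is w_target - decay * x.
        x = [xi + lr * (wi - decay * xi)
             for xi, wi in zip(x, weights[target])]
    return x


# Hypothetical 2-class classifier over 4 "pixels" (values are made up).
weights = [[1.0, -1.0, 0.5, 0.0],
           [-0.5, 2.0, 0.0, 1.0]]
vis = class_visualization(weights, target=1)
print([round(v, 3) for v in vis])
```

For this linear toy model the optimized input converges to the target class's weight vector scaled by `1 / decay`, so the visualization is a direct readout of what the model learned for that class. In a deep network the relationship is nonlinear, but the same idea motivates treating class visualizations as carriers of learned knowledge.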