While action recognition has become an important line of research in computer vision, the recognition of particular events such as aggressive behaviors, or fights, has been studied relatively little. These tasks may be extremely useful in several video surveillance scenarios such as psychiatric wards, prisons or even personal smartphone cameras. This potential usefulness has led to a surge of interest in developing fight or violence detectors. One of the key requirements in this case is efficiency, that is, these methods should be computationally fast. "Handcrafted" spatiotemporal features that account for both motion and appearance information can achieve high accuracy rates, although the computational cost of extracting some of those features is still prohibitive for practical applications. The deep learning paradigm has also recently been applied to this task for the first time, in the form of a 3D Convolutional Neural Network that processes the whole video sequence as input. However, results on human perception of others' actions suggest that, in this specific task, motion features are crucial. This means that using the whole video as input may add both redundancy and noise to the learning process. In this work, we propose a hybrid "handcrafted/learned" feature framework which provides better accuracy than the previous feature learning method, with similar computational efficiency. The proposed method is evaluated on three related benchmark datasets and outperforms state-of-the-art methods on two of them.
Embedded systems control and monitor a great deal of our reality. While some “classic” features are intrinsically necessary, such as low power consumption, rugged operating ranges, fast response and low cost, these systems have evolved in the last few years to emphasize connectivity functions, thus contributing to the Internet of Things paradigm. A myriad of sensing/computing devices are being attached to everyday objects, each able to send and receive data and to act as a unique node in the Internet. Apart from the obvious necessity to process at least some data at the edge (to increase security and reduce power consumption and latency), a major breakthrough will arguably come when such devices are endowed with some level of autonomous “intelligence”. Intelligent computing aims to solve problems for which no efficient exact algorithm can exist or for which we cannot conceive an exact algorithm. Central to such intelligence is Computer Vision (CV), i.e., extracting meaning from images and video. While not everything needs CV, visual information is the richest source of information about the real world: people, places and things. The possibilities of embedded CV are endless if we consider new applications and technologies, such as deep learning, drones, home robotics, intelligent surveillance, intelligent toys, wearable cameras, etc. This paper describes the Eyes of Things (EoT) platform, a versatile computer vision platform tackling those challenges and opportunities.
Computer vision and deep learning are clearly demonstrating a capability to create engaging cognitive applications and services. However, these applications have been mostly confined to powerful Graphics Processing Units (GPUs) or the cloud because of their demanding computational requirements. Cloud processing has obvious bandwidth, energy consumption and privacy issues. The Eyes of Things (EoT) is a powerful and versatile embedded computer vision platform which allows the user to develop artificial vision and deep learning applications that analyse images locally. In this article, we use the deep learning capabilities of an EoT device for a real-life facial informatics application: a doll that uses deep learning techniques to recognize emotions and act accordingly. The main significance of the presented application is that it shows a toy can now perform advanced processing locally, without the need for further computation in the cloud, thus reducing latency and removing most of the ethical issues involved. Finally, the performance of the convolutional neural network developed for this purpose is studied, and a pilot study was conducted with a panel of 12 children aged four to ten years to test the doll.