We present an autonomous assistive robotic system for human activity recognition from video sequences. Because video captured from a mobile robot (as opposed to a fixed camera) exhibits large variability, and because the robot's computing resources are limited, the implementation has been guided by robustness to this variability and by memory and computing-speed efficiency. To accommodate motion-speed variability across users, we encode motion using dense interest point trajectories. Our recognition model harnesses the dense interest point bag-of-words representation through an intersection kernel-based SVM, which better accommodates the large intra-class variability stemming from a robot operating in different locations and conditions. To contextually assess the engine as implemented in the robot, we compare it with the most recent approaches to human action recognition evaluated on public (non-robot) datasets, including a novel approach of our own based on a two-layer SVM and hidden conditional random field sequential recognition model, whose performance is among the best in the recent state of the art. We show that our robot-based recognition engine, while less accurate than the sequential model, nonetheless performs well, especially given the adverse test conditions of the robot relative to those of a fixed camera.
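The classifier described above pairs bag-of-words histograms of dense trajectory features with a histogram intersection kernel. A minimal sketch of that pairing, assuming L1-normalised visual-word histograms and using scikit-learn's support for custom kernel callables (the toy data and class labels here are illustrative, not from the paper):

```python
import numpy as np
from sklearn.svm import SVC

def intersection_kernel(X, Y):
    """Histogram intersection kernel: K(x, y) = sum_i min(x_i, y_i)."""
    # Broadcast to shape (n_X, n_Y, n_bins), then sum over bins
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=-1)

# Toy bag-of-words data: rows = video clips, columns = visual words
rng = np.random.default_rng(0)
X_train = rng.random((20, 50))
X_train /= X_train.sum(axis=1, keepdims=True)  # L1-normalise histograms
y_train = np.arange(20) % 2                    # illustrative binary labels

clf = SVC(kernel=intersection_kernel)
clf.fit(X_train, y_train)
pred = clf.predict(X_train[:5])  # array of 5 class labels
```

The intersection kernel compares histograms bin by bin, so a clip is never penalised for words it shares with the support vectors, which is one reason it tolerates the intra-class variability the abstract mentions.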
Most recent methods for action/activity recognition, usually based on static classifiers, have achieved improvements by integrating the context of local interest point (IP) features, such as spatiotemporal IPs, by characterising their neighbourhood at different scales. In this study, the authors propose a new approach that explicitly models the sequential aspect of activities. First, a sliding-window segmentation technique splits the video stream into overlapping short segments. Each window is characterised by a local bag of words of IPs encoded by motion information. A first-layer support vector machine provides, for each window, a vector of conditional class probabilities that summarises all discriminant information relevant for sequence recognition. The sequence of these stochastic vectors is then fed to a hidden conditional random field for inference at the sequence level. The authors also show how their approach extends naturally to the conjoint segmentation and recognition of a sequence of action classes within a continuous video stream. They have tested their model on various human action and activity datasets, and the results obtained compare favourably with the current state of the art.
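The first layer of the pipeline above can be sketched as follows: overlapping windows over the frame stream, one bag-of-words histogram per window, and an SVM with probability outputs producing the per-window class-probability vectors that would feed the HCRF. Window width, stride, feature dimensionality, and labels are all illustrative assumptions, not values from the paper:

```python
import numpy as np
from sklearn.svm import SVC

def sliding_windows(n_frames, width, stride):
    """Return (start, end) frame bounds of overlapping windows."""
    starts = range(0, max(n_frames - width, 0) + 1, stride)
    return [(s, s + width) for s in starts]

# Hypothetical 300-frame video, 30-frame windows with 10-frame stride
windows = sliding_windows(300, width=30, stride=10)

rng = np.random.default_rng(1)
X = rng.random((len(windows), 40))   # one toy BoW histogram per window
y = np.arange(len(windows)) % 3      # illustrative 3-class labels

# First layer: SVM with Platt-scaled probability outputs; each window
# becomes a stochastic vector, and the sequence of these vectors is
# what the second-layer sequence model would consume
svm = SVC(probability=True, random_state=0).fit(X, y)
prob_sequence = svm.predict_proba(X)  # shape (n_windows, n_classes)
```

Each row of `prob_sequence` sums to one, so the second layer sees a compact, discriminative summary of every window rather than raw features.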