4-dimensional local spatio-temporal features for human activity recognition

MATEC Web of Conferences

Chen et al. 2016

Abstract. In this paper, we propose a robust and effective framework that substantially improves the performance of human action recognition from depth maps. The key contribution is the Sub-action Motion History Image (SMHI) and the Static History Image (SHI) computed from a depth sequence. We evenly subdivide the normalized motion energy into a set of segments whose corresponding frame indices are used to partition a video into sub-action segments. The Local Binary Patterns (LBP) descriptor is then computed from the SMHI and SHI to represent an action. We evaluate the proposed framework on the MSR Action3D dataset. Experimental results indicate that the proposed approach outperforms most state-of-the-art methods, demonstrating its effectiveness.
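The segmentation and encoding steps described in this abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: per-frame motion energy is assumed to be the count of pixels whose depth changes by more than a threshold, each sub-action image is a classic motion history image (the SHI is not reproduced), and the LBP variant is the basic 8-neighbour operator. The names `motion_energy`, `subaction_boundaries`, `motion_history_image`, `lbp_histogram` and `smhi_descriptor`, and the threshold of 10 depth units, are hypothetical.

```python
import numpy as np

def motion_energy(depth, thresh=10.0):
    """Per-frame motion energy of a (T, H, W) depth video.

    Assumed definition: the number of pixels whose depth changes by more than
    `thresh` from the previous frame. energy[0] is 0 by convention.
    """
    moving = np.abs(np.diff(depth.astype(np.float64), axis=0)) > thresh
    energy = moving.reshape(moving.shape[0], -1).sum(axis=1)
    return np.concatenate([[0.0], energy])

def subaction_boundaries(energy, k):
    """End frame of each of k sub-actions, obtained by splitting the
    normalized cumulative motion energy into k equal parts."""
    cum = np.cumsum(energy, dtype=np.float64)
    if cum[-1] == 0:
        # No motion at all: fall back to equal-length segments.
        return [int(round(len(energy) * (i + 1) / k)) - 1 for i in range(k)]
    cum /= cum[-1]
    levels = np.arange(1, k + 1) / k
    # First frame at which the normalized cumulative energy reaches each level.
    return [int(np.searchsorted(cum, lv)) for lv in levels]

def motion_history_image(depth, start, end, thresh=10.0):
    """Classic MHI over frames start..end (inclusive), scaled to [0, 1]:
    recently moving pixels are bright, older motion decays linearly."""
    seg = depth[start:end + 1].astype(np.float64)
    tau = max(len(seg) - 1, 1)
    mhi = np.zeros(seg.shape[1:])
    for t in range(1, len(seg)):
        moving = np.abs(seg[t] - seg[t - 1]) > thresh
        mhi = np.where(moving, tau, np.maximum(mhi - 1, 0))
    return mhi / tau

def lbp_histogram(img):
    """Basic 8-neighbour LBP codes of the interior pixels as a normalized 256-bin histogram."""
    h, w = img.shape
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()

def smhi_descriptor(depth, k=3, thresh=10.0):
    """Concatenated LBP histograms of the k sub-action MHIs (hypothetical pipeline)."""
    ends = subaction_boundaries(motion_energy(depth, thresh), k)
    starts = [0] + [e + 1 for e in ends[:-1]]
    feats = []
    for s, e in zip(starts, ends):
        e = max(e, s)  # guard against empty sub-actions
        feats.append(lbp_histogram(motion_history_image(depth, s, e, thresh)))
    return np.concatenate(feats)
```

Splitting on cumulative energy rather than on frame count means that a brief but high-motion portion of an action can still receive its own sub-action.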

Section: Related Work (mentioning)


Action Recognition Based on Sub-action Motion History Image and Static History Image


Computer Vision -- ACCV 2014

“…An extension to this method is a probabilistic approach that combines prior domain knowledge to model each activity as a distribution over the codewords and each video as a distribution over the activities [24]. Although approaches based on image descriptors have the advantage of not requiring skeleton or object tracks to describe the observed activity, they cannot take into account the spatio-temporal relations between the relevant entities in the scene, which are important elements when learning and recognising human activities [25,17].…”
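The “distribution over the codewords” representation mentioned in this quotation can be illustrated with a minimal bag-of-words sketch, assuming each video is described by a set of local descriptors and that a codebook has already been learned (commonly by k-means, which is not shown). The function names are hypothetical, and the probabilistic topic model of [24], which places a further distribution over activities on top of these counts, is not reproduced.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Assign each descriptor (n x d) to its nearest codeword (k x d) by Euclidean distance."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def video_codeword_distribution(descriptors, codebook):
    """Normalized histogram of codeword assignments for one video."""
    words = quantize(descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(np.float64)
    return hist / hist.sum()
```

Because the histogram discards where and when each codeword occurred, it also illustrates the limitation the quotation raises: spatio-temporal relations between entities are not represented.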

Section: Related Work (mentioning)


Qualitative and Quantitative Spatio-temporal Relations in Daily Living Activity Recognition

Tayyub, Tavanai, Gatsoulis et al. 2015

Abstract. For the effective operation of intelligent assistive systems working in real-world human environments, it is important to be able to recognise human activities and their intentions. In this paper we propose a novel approach to activity recognition from visual data. Our approach is based on qualitative and quantitative spatio-temporal features which encode the interactions between human subjects and objects in an abstract and efficient manner. Unlike current state-of-the-art approaches, our approach makes significantly fewer assumptions and does not require any knowledge about object types, their affordances, or the sub-level activities of which high-level activities consist. We perform an automatic feature selection process that yields the most representative descriptions of the learnt activities. We validated our method using these descriptions on the CAD-120 benchmark dataset, which consists of video sequences of humans performing daily real-world activities. The experimental results show that our approach significantly outperforms the current state of the art on this benchmark.
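To make “qualitative spatio-temporal features” more concrete, the sketch below abstracts a hand-object trajectory into per-frame symbols: a qualitative distance label and a simplified QTC-style approach/recede symbol. It is an assumption-laden illustration, not the feature set of Tayyub et al.: the metric thresholds, the three distance labels, the `eps` tolerance and all function names are hypothetical.

```python
import numpy as np

# Hypothetical distance thresholds in metres, checked in order.
DIST_BINS = [(0.1, "touch"), (0.5, "near"), (float("inf"), "far")]

def qualitative_distance(a, b):
    """Map the Euclidean distance between two 3D points to a coarse label."""
    d = float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))
    for limit, label in DIST_BINS:
        if d <= limit:
            return label
    return DIST_BINS[-1][1]

def approach_relation(a_prev, a_now, b, eps=0.01):
    """Simplified QTC-style symbol: '-' if a moves towards b,
    '+' if it moves away, '0' if the distance is (nearly) unchanged."""
    b = np.asarray(b, float)
    d_prev = np.linalg.norm(np.asarray(a_prev, float) - b)
    d_now = np.linalg.norm(np.asarray(a_now, float) - b)
    if d_now < d_prev - eps:
        return "-"
    if d_now > d_prev + eps:
        return "+"
    return "0"

def relation_sequence(hand, obj):
    """Per-frame (distance, motion) symbols of a hand trajectory relative to an object."""
    return [
        (qualitative_distance(hand[t], obj[t]), approach_relation(hand[t - 1], hand[t], obj[t]))
        for t in range(1, len(hand))
    ]
```

Symbols like these are independent of object type and image appearance, which is the kind of abstraction the abstract describes.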

“…Dollár et al. [18] detected such features using separable filters in the spatial and temporal dimensions of color videos. Recently, Zhang and Parker [3] extended [18] to detect features in color-depth videos. These methods extract LST features from the entire frame; as a result, they detect a large portion of irrelevant features from background clutter and are incapable of distinguishing features belonging to different individuals in a group.…”
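The separable-filter detector attributed to Dollár et al. [18] applies a 2D Gaussian spatially and a quadrature pair of 1D Gabor filters temporally, then keeps local maxima of the response R = (I * g * h_ev)^2 + (I * g * h_od)^2. The NumPy sketch below follows that form under assumptions: the parameter values (`sigma`, `tau`, `omega`) are illustrative only, and the helper names are hypothetical.

```python
import numpy as np

def conv_same(v, k):
    """1D convolution returning len(v) samples centred on the odd-length kernel k."""
    full = np.convolve(v, k, mode="full")
    start = (len(k) - 1) // 2
    return full[start:start + len(v)]

def conv_along(video, k, axis):
    """Filter every 1D line of `video` along `axis` with kernel k."""
    return np.apply_along_axis(conv_same, axis, video, k)

def gaussian_1d(sigma):
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1, dtype=np.float64)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    return g / g.sum()

def gabor_pair(tau, omega):
    """Even/odd temporal Gabor filters with Gaussian envelope exp(-t^2 / tau^2)."""
    r = int(np.ceil(2 * tau))
    t = np.arange(-r, r + 1, dtype=np.float64)
    env = np.exp(-t ** 2 / tau ** 2)
    return -np.cos(2 * np.pi * omega * t) * env, -np.sin(2 * np.pi * omega * t) * env

def cuboid_response(video, sigma=2.0, tau=2.5, omega=0.25):
    """R = (I*g*h_ev)^2 + (I*g*h_od)^2 for a (T, H, W) video."""
    v = video.astype(np.float64)
    g = gaussian_1d(sigma)
    v = conv_along(conv_along(v, g, axis=1), g, axis=2)  # separable spatial Gaussian
    h_ev, h_od = gabor_pair(tau, omega)
    return conv_along(v, h_ev, axis=0) ** 2 + conv_along(v, h_od, axis=0) ** 2

def local_maxima(resp, thresh):
    """(t, y, x) of voxels above `thresh` that are maximal in their 3x3x3 neighbourhood."""
    T, H, W = resp.shape
    p = np.pad(resp, 1, mode="constant", constant_values=-np.inf)
    is_max = resp > thresh
    for dt in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dt == dy == dx == 0:
                    continue
                neighbour = p[1 + dt:1 + dt + T, 1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
                is_max &= resp >= neighbour
    return np.argwhere(is_max)
```

The response is computed over the entire frame, so any moving background pixel can produce a detection; this is the limitation the quotation points out.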

Section: B. Local Spatio-temporal Features (mentioning)


“…Although several approaches in the robotic perception literature address single-person action reasoning [1], [2], [3] and group action recognition [4], [5], [6], [7], interpreting the actions of each individual, or of a specific individual, within a group has not previously been well investigated. We name this essential problem activity recognition of multiple individuals (ARMI), as depicted in the Simon Says game in Fig.…”

Section: Introduction (mentioning)


“…The use of orientation-based descriptors gives the representation additional robustness to illumination variations. Recently, affordable structured-light cameras have become popular for constructing 3D robotic vision systems, in which human representations are developed and valuable depth information is encoded into LST features; this line of work continues to attract increasing attention from the computer vision [2], [8] and robotics [3] communities.…”
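The claim that orientation-based descriptors are robust to illumination variations can be checked with a small sketch: a magnitude-weighted histogram of gradient orientations is unchanged by an additive brightness offset (the gradients remove it) and by a multiplicative gain (the normalization removes it). The bin count and the function name are assumptions; the same function could be applied to a depth patch, though how the cited works combine color and depth channels is not specified here.

```python
import numpy as np

def orientation_histogram(patch, n_bins=8):
    """Magnitude-weighted histogram of gradient orientations, L2-normalized."""
    p = np.asarray(patch, dtype=np.float64)
    gy, gx = np.gradient(p)  # derivatives along rows, then columns
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)  # orientation in [0, 2*pi)
    bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

For example, `orientation_histogram(2 * patch + 16)` matches `orientation_histogram(patch)` up to floating-point error.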

Section: Introduction (mentioning)



Adaptive human-centered representation for activity recognition of multiple individuals from 3D point cloud sequences

2015 IEEE International Conference on Robotics and Automation (ICRA)

Reardon et al. 2015

Self Cite

Abstract. Activity recognition of multiple individuals (ARMI) within a group, which is essential to practical human-centered robotics applications such as childhood education, is a particularly challenging and previously not well-studied problem. We present a novel adaptive human-centered (AdHuC) representation based on local spatio-temporal (LST) features to address ARMI in a sequence of 3D point clouds. Our human-centered detector constructs affiliation regions to associate LST features with humans by mining depth data and using a cascade of rejectors to localize humans in 3D space. Features are then detected within each affiliation region, which avoids extracting irrelevant features from dynamic background clutter and handles moving cameras on mobile robots. Our feature descriptor adapts its support region to linear perspective view variations and encodes multi-channel information (i.e., color and depth) to construct the final representation. Empirical studies using a Meka humanoid robot playing multi-person Simon Says games validate that the AdHuC representation achieves promising performance on ARMI. Experiments on benchmark datasets further demonstrate that our adaptive human-centered representation outperforms previous approaches for activity recognition from color-depth data.
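Two ideas from this abstract, associating features with humans through 3D affiliation regions and adapting the descriptor's support region to perspective, can be sketched as follows. This is not the AdHuC implementation: affiliation regions are assumed to be axis-aligned 3D boxes already produced by a human detector (the cascade of rejectors is not shown), the support size follows the pinhole relation r_px = f · r_m / Z, and the 525-pixel focal length, the 0.1 m radius and all function names are hypothetical.

```python
import numpy as np

# Assumed focal length (pixels) of a Kinect-class depth camera; illustrative only.
FOCAL_PX = 525.0

def support_radius_px(depth_m, radius_m=0.1, focal_px=FOCAL_PX):
    """Pinhole projection: a support region of fixed metric radius covers
    fewer pixels the farther the feature is from the camera."""
    return focal_px * radius_m / np.asarray(depth_m, dtype=np.float64)

def in_region(points_xyz, box_min, box_max):
    """Boolean mask of 3D points lying inside an axis-aligned box."""
    p = np.asarray(points_xyz, dtype=np.float64)
    return np.all((p >= box_min) & (p <= box_max), axis=1)

def assign_to_humans(points_xyz, boxes):
    """Label each feature with the index of the first affiliation region
    (box around a detected human) that contains it; -1 marks background
    features, which are discarded."""
    labels = np.full(len(points_xyz), -1, dtype=np.int64)
    for i, (box_min, box_max) in enumerate(boxes):
        labels[in_region(points_xyz, box_min, box_max) & (labels == -1)] = i
    return labels
```

Under these assumed values, a feature 2 m from the camera gets a support radius of 26.25 pixels, and one 1 m away gets twice that.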