2011 IEEE/RSJ International Conference on Intelligent Robots and Systems 2011
DOI: 10.1109/iros.2011.6048130

4-dimensional local spatio-temporal features for human activity recognition

Abstract: Recognizing human activities from common color image sequences faces many challenges, such as complex backgrounds, camera motion, and illumination changes. In this paper, we propose a new 4-dimensional (4D) local spatio-temporal feature that combines both intensity and depth information. The feature detector applies separate filters along the 3D spatial dimensions and the 1D temporal dimension to detect a feature point. The feature descriptor then computes and concatenates the intensity and depth gradients wit…
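The descriptor step named in the abstract (intensity and depth gradients, computed and then concatenated) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract is cut off before it says where the gradients are sampled, so the cube-shaped neighbourhood of half-width `radius` around a detected point, the function name `gradient_descriptor`, and the final L2 normalisation are all hypothetical choices. The intensity and depth videos are assumed to be registered arrays of shape (T, H, W).

```python
import numpy as np


def gradient_descriptor(intensity, depth, t, y, x, radius=2):
    """Hypothetical sketch: gradients of the intensity and depth videos,
    sampled in a (2*radius+1)^3 cube centred on a detected point (t, y, x),
    flattened and concatenated into one vector.

    intensity, depth: registered arrays of shape (T, H, W).
    The cube neighbourhood and the final L2 normalisation are assumptions.
    """
    I = np.asarray(intensity, dtype=float)
    D = np.asarray(depth, dtype=float)
    if I.shape != D.shape or I.ndim != 3:
        raise ValueError("intensity and depth must share one (T, H, W) shape")
    if radius < 1:
        raise ValueError("radius must be at least 1 to take gradients")
    T, H, W = I.shape
    if not (radius <= t < T - radius and radius <= y < H - radius
            and radius <= x < W - radius):
        raise ValueError("neighbourhood extends past the video boundary")
    cube = (slice(t - radius, t + radius + 1),
            slice(y - radius, y + radius + 1),
            slice(x - radius, x + radius + 1))
    parts = []
    for volume in (I, D):
        # gradients are taken inside the cropped cube, so its faces use
        # one-sided differences; np.gradient returns d/dt, d/dy, d/dx
        for grad in np.gradient(volume[cube]):
            parts.append(grad.ravel())
    desc = np.concatenate(parts)
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc
```

With the default `radius=2`, each channel contributes 3 gradient components over 125 voxels, so the concatenated vector has 2 × 3 × 125 = 750 entries. Keeping the intensity and depth parts side by side (rather than summing them) lets a later classifier weight the two modalities separately.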

Cited by 65 publications (66 citation statements)
References: 7 publications

“…Recently, many methods are proposed for action recognition based on spatial-temporal interest points. Zhang et al [6] proposed a 4D local spatial-temporal feature which combines both intensity and depth information. They first used separate filters along the 3D spatial dimensions and the temporal dimension to detect interest point.…”
Section: Related Work (mentioning, confidence: 99%)
“…An extension to this method is using a probabilistic approach that combines prior domain knowledge to model each activity as a distribution over the codewords and each video as a distribution over the activities [24]. Although the advantage of these approaches that use image descriptors is that they do not require skeleton or object tracks to describe the activity observed, they are unable to take into account spatiotemporal relations between the different relevant entities in the scene, which are important elements when learning and recognising human activities [25,17].…”
Section: Related Work (mentioning, confidence: 99%)
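The statement above describes activities as distributions over codewords. The step that produces such a distribution for one video (quantising each local descriptor to its nearest codeword and counting) can be sketched as below. This is only a generic bag-of-codewords sketch under stated assumptions, not the probabilistic model of [24]: the codebook is assumed to come from some prior clustering step (for example k-means), and the function name `codeword_histogram` is hypothetical.

```python
import numpy as np


def codeword_histogram(descriptors, codebook):
    """Hypothetical sketch: assign each local descriptor to its nearest
    codeword (squared Euclidean distance) and return the normalised
    histogram, i.e. the video's empirical distribution over codewords.

    descriptors: (N, D) array; codebook: (K, D) array (assumed given).
    """
    X = np.asarray(descriptors, dtype=float)
    C = np.asarray(codebook, dtype=float)
    if X.ndim != 2 or X.shape[0] == 0:
        raise ValueError("need a non-empty (N, D) descriptor array")
    # pairwise squared distances, shape (N, K)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    counts = np.bincount(labels, minlength=len(C)).astype(float)
    return counts / counts.sum()
```

For example, with codewords at (0, 0) and (10, 10), the descriptors (1, 0), (0, 1) and (9, 10) map to codewords 0, 0 and 1, giving the distribution [2/3, 1/3]. A probabilistic model such as the one cited would then be fitted on top of histograms like this one.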
“…Dollar et al [18] detected such features using separable filters in space and time dimensions from color videos. Recently, Zhang and Parker [3] extended [18] to detect features in color-depth videos. These methods extract LST features from the entire frame; as a result, they detect a large portion of irrelevant features from background clutter and are incapable of distinguishing features from different individuals in a group.…”
Section: B. Local Spatio-temporal Features (mentioning, confidence: 99%)
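The separable space-time filtering of Dollar et al. [18], which Zhang and Parker [3] extend to color-depth video, can be sketched on a single-channel video. The sketch assumes the commonly cited form of that response function, R = (V * g * h_ev)^2 + (V * g * h_od)^2, where g is a 2D spatial Gaussian, h_ev and h_od are an even/odd pair of 1D temporal Gabor filters, and interest points are local maxima of R. The function names, the default values of `sigma`, `tau` and `omega`, and the zero-padded borders are hypothetical choices; the original paper ties the temporal frequency to `tau`, whereas here it is left as a free parameter. How the 4D variant filters along the depth dimension is not stated above, so the sketch does not attempt it.

```python
import numpy as np


def _filter_axis(volume, kernel, axis):
    # 1D convolution of every line along `axis`; mode="same" keeps the
    # original length and implicitly zero-pads the borders
    return np.apply_along_axis(
        lambda line: np.convolve(line, kernel, mode="same"), axis, volume)


def periodic_response(video, sigma=2.0, tau=2.5, omega=0.25):
    """R = (V*g*h_ev)^2 + (V*g*h_od)^2 for a video V of shape (T, H, W).

    g: 2D spatial Gaussian (applied separably along y and x).
    h_ev, h_od: even/odd 1D temporal Gabor filters with envelope
    exp(-t^2 / tau^2) and frequency omega (cycles per frame).
    Default parameter values are illustrative only.
    """
    v = np.asarray(video, dtype=float)
    r_s = int(np.ceil(3 * sigma))
    xs = np.arange(-r_s, r_s + 1)
    g = np.exp(-xs**2 / (2 * sigma**2))
    g /= g.sum()
    r_t = int(np.ceil(2 * tau))
    ts = np.arange(-r_t, r_t + 1)
    envelope = np.exp(-ts**2 / tau**2)
    h_ev = -np.cos(2 * np.pi * omega * ts) * envelope
    h_od = -np.sin(2 * np.pi * omega * ts) * envelope
    if v.ndim != 3 or v.shape[0] < len(ts) or min(v.shape[1:]) < len(g):
        raise ValueError("expected a (T, H, W) video at least as large as the filters")
    # spatial smoothing first (y, then x), then the temporal quadrature pair
    smoothed = _filter_axis(_filter_axis(v, g, 1), g, 2)
    even = _filter_axis(smoothed, h_ev, 0)
    odd = _filter_axis(smoothed, h_od, 0)
    return even**2 + odd**2


def local_maxima(response, threshold):
    """(t, y, x) indices where the response exceeds `threshold` and is
    >= all 26 neighbours (borders padded with -inf)."""
    r = np.asarray(response, dtype=float)
    T, H, W = r.shape
    padded = np.pad(r, 1, mode="constant", constant_values=-np.inf)
    keep = r > threshold
    for dt in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dt == dy == dx == 0:
                    continue
                neighbour = padded[1 + dt:1 + dt + T,
                                   1 + dy:1 + dy + H,
                                   1 + dx:1 + dx + W]
                keep &= r >= neighbour
    return np.argwhere(keep)
```

Calling `local_maxima(periodic_response(video), threshold)` returns candidate interest points as (t, y, x) rows. Because the filter is separable, each pass is a cheap 1D convolution, which is the practical reason for filtering space and time separately. As the statement notes, running it over the whole frame also fires on background clutter.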
“…Although several approaches are discussed in the robotic perception literature to address the task of single-person action reasoning [1], [2], [3] and group action recognition [4], [5], [6], [7], interpreting the actions of each or a specific individual within a group has not been previously well investigated. We name this essential problem activity recognition of multiple individuals (ARMI), as depicted in the Simon Says game in Fig.…”
Section: Introduction (mentioning, confidence: 99%)