2012 IEEE Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/CVPR.2012.6247824
A combined pose, object, and feature model for action understanding

Cited by 54 publications (49 citation statements)
References 22 publications
“…The high-level model evaluated here is based on the joint activity recognition and object tracking method of [30], which presents a model for understanding the interactions between humans and objects while performing an action over time. The model uses an "in the hand" interaction primitive and represents a variety of actions in which an object is manipulated.…”
Section: High-level Model
confidence: 99%
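To make the person-object interaction concrete, the following is a minimal Python sketch of how an "in the hand" interaction primitive might be represented per frame: the primitive fires when the tracked object lies within a threshold distance of a hand joint. The names (Frame, IN_HAND_RADIUS, in_hand) and the simple distance test are illustrative assumptions, not the authors' implementation.

    # Sketch of an "in the hand" interaction primitive (hypothetical names).
    from dataclasses import dataclass
    from math import hypot

    IN_HAND_RADIUS = 30.0  # pixels; assumed proximity threshold

    @dataclass
    class Frame:
        hand_xy: tuple[float, float]    # estimated hand joint position
        object_xy: tuple[float, float]  # tracked object position

    def in_hand(frame: Frame) -> bool:
        """True if the object is close enough to the hand to count as held."""
        dx = frame.object_xy[0] - frame.hand_xy[0]
        dy = frame.object_xy[1] - frame.hand_xy[1]
        return hypot(dx, dy) <= IN_HAND_RADIUS

    def interaction_sequence(frames: list[Frame]) -> list[bool]:
        """Binary per-frame interaction feature for a manipulation action."""
        return [in_hand(f) for f in frames]

A per-frame binary sequence like this is one simple way a structured feature for manipulation actions could be built up over time.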
“…We combine this novel representation with two existing methods: a low-level latent state temporal sequence model [28], and a high-level model based on a sequence of structured features including primitive representations of object state and person-object interaction [30]. We provide a comprehensive evaluation of these representations, separately and in combination, on the VISINT corpus.…”
Section: Introduction
confidence: 99%
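The statement above describes evaluating a low-level latent-state temporal model and a high-level structured-feature model separately and in combination. A minimal sketch of one plausible combination scheme, assuming simple weighted late fusion of per-label scores (the paper's actual combination method may differ):

    # Assumed late-fusion scheme: each model scores every candidate action
    # label (e.g., a log-likelihood), and the scores are mixed with weight alpha.
    def fuse_scores(low_level: dict[str, float],
                    high_level: dict[str, float],
                    alpha: float = 0.5) -> str:
        """Return the label with the best weighted combination of scores."""
        labels = set(low_level) & set(high_level)
        return max(labels,
                   key=lambda a: alpha * low_level[a] + (1 - alpha) * high_level[a])

    # Hypothetical usage with two candidate actions:
    print(fuse_scores({"pour": -1.2, "drink": -2.0},
                      {"pour": -0.8, "drink": -1.5}))  # -> "pour"

The single weight alpha would be tuned on held-out data; it trades off trust in the low-level temporal model against the high-level structured features.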
“…Packer et al. [19] presented a system able to recognize challenging, fine-grained human actions involving the manipulation of objects in realistic action sequences. Reddy et al. [20] propose using scene context information, obtained from the moving and stationary pixels in the key frames, in combination with motion features to address the action recognition problem on a large dataset of web videos.…”
Section: Related Work
confidence: 99%