Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition

2009
DOI: 10.1109/tpami.2009.83

Abstract: Interpretation of images and videos containing humans interacting with different objects is a daunting task. It involves understanding scene/event, analyzing human movements, recognizing manipulable objects, and observing the effect of the human movement on those objects. While each of these perceptual tasks can be conducted independently, recognition rate improves when interactions between them are considered. Motivated by psychological studies of human perception, we present a Bayesian approach which integra…

Cited by 486 publications (409 citation statements). References 44 publications.
“…Visual Genome is the first large-scale visual relationship dataset. This dataset can be used to study the extraction of visual relationships (Sadeghi et al 2015) from images, and its interactions between objects can also be used to study action recognition (Yao and Fei-Fei 2010; Ramanathan et al 2015) and spatial orientation between objects (Gupta et al 2009; Prest et al 2012).…”
Section: Relationship Extraction
confidence: 99%
“…Such strategies benefit from using the global image content, thus not suffering from low-quality appearance, small objects, or occlusions. The object-action context is addressed in [17, 23, 11, 38, 15], while spatial coherence constraints may be enforced as well [11].…”
Section: Related Work
confidence: 99%
“…Despite many successes achieved by these methods, we argue that invariant feature sets are insufficient alone for this complicated task, since most of them can only provide partial invariance — some address this type of variations and others address that, but not all; and even with these feature sets, many prototypes are still needed to cover the huge range of variability exhibited in the pose space of the human body, not to mention that such a representation is usually high-dimensional. To deal with these issues, some authors proposed to enhance the stability of feature sets using various context information (if available), such as human-object context [13][14][15] or group context [16, 11, 1], or using a multiple-cues-based approach to combine the strengths of different features [2]. Recently, Wang et al introduce a method which relies on more semantically meaningful features (i.e., poselets) and arranges them in a hierarchical manner to improve the invariance and discriminative power of the feature representation [3], achieving state-of-the-art performance on a challenging web data set of still images [17].…”
Section: Related Work
confidence: 99%