2016
DOI: 10.1109/tcsvt.2015.2397200

Exemplar-Based Recognition of Human–Object Interactions

Abstract: Human action can be recognised from a single still image by modelling human-object interactions (HOI), which infers the mutual spatial structure between the human and the manipulated object as well as their appearance. Existing approaches rely heavily on accurate detection of the human and the object and on estimation of human pose; they are thus sensitive to large variations in human pose, to occlusion, and to unsatisfactory detection of small objects. To overcome this limitation, a novel exemplar-based …

Cited by 28 publications (17 citation statements)
References 55 publications

“…It is hard to directly solve the above problem because the existence of the bounded constraints (12). Here, we introduce a method to find an approximate solution based on the popularly used projected gradient descent technique.…”
Section: A Soft RNN (SRNN) Regression Based Early Action Prediction (mentioning)
confidence: 99%
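
The statement above names projected gradient descent as the way to handle bounded constraints, but the quoted excerpt does not give the objective or the constraints (12). The sketch below is therefore only a generic illustration of the technique, not the citing paper's method: it assumes simple box constraints (every variable kept in [lower, upper]) and uses a hypothetical toy objective. All function and variable names are illustrative.

```python
from typing import Callable, List, Sequence


def project_onto_box(x: Sequence[float], lower: float, upper: float) -> List[float]:
    """Euclidean projection onto the box [lower, upper]^n (element-wise clipping)."""
    return [min(max(v, lower), upper) for v in x]


def projected_gradient_descent(
    grad: Callable[[Sequence[float]], Sequence[float]],
    x0: Sequence[float],
    lower: float,
    upper: float,
    step: float = 0.1,
    iters: int = 500,
) -> List[float]:
    """Alternate a plain gradient step with a projection back onto the box."""
    x = project_onto_box(x0, lower, upper)
    for _ in range(iters):
        g = grad(x)
        # Unconstrained gradient step, then projection to restore feasibility.
        x = project_onto_box([xi - step * gi for xi, gi in zip(x, g)], lower, upper)
    return x


# Hypothetical toy objective (an assumption, not from the cited work):
# f(x) = 0.5 * ||x - target||^2, whose gradient is x - target.
target = [1.5, -0.3, 0.4]


def toy_grad(x: Sequence[float]) -> List[float]:
    return [xi - ti for xi, ti in zip(x, target)]


solution = projected_gradient_descent(toy_grad, [0.0, 0.0, 0.0], lower=0.0, upper=1.0)
```

For this toy problem the constrained minimiser is the target clipped to the box, so `solution` ends up close to `[1.0, 0.0, 0.4]`: the first two coordinates are held at the bounds by the projection, while the third converges freely.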
“…While action recognition is a long-term research topic with considerable progress on developing robust spatiotemporal features (Cuboids [6], interest point clouds [1], HOG3D [21], dense trajectory [52], and two-stream CNN [46], [50], [55], [57] etc.) and feature learning techniques (sparse coding [64], max-margin learning [12], [68], Fisher vector [52], rank pooling [8] etc.), conventional action recognition aims at developing algorithms and systems for after-of-the-fact prediction of human action (i.e.…”
Section: Introduction (mentioning)
confidence: 99%
“…Compared to the conventional use of RGB videos, the information from depth channel is insensitive to illumination variations, invariant to color and texture changes, and more importantly reliable for body silhouette and skeleton (human posture) extraction [29]. Bearing on these merits, it is believed that the introduced depth information can greatly light up the research of human activity analysis [10], [23], [34], [48].…”
Section: Introduction (mentioning)
confidence: 99%
“…Moreover, the depth information is usually weak in capturing the important appearance information, such as object color and texture. These greatly limit the application of depth cameras on recognizing complex human activities with object and interactions, such as human-object interactions [10] and fine grained activities [15], where the color appearance is also important.…”
Section: Introduction (mentioning)
confidence: 99%
“…The research challenges and focus on collective activity recognition should differ significantly from the widely studied action recognition [21], [28], [31], [32], [35], [37], [41]-[43], where the actions performed by individuals are the main focus. It should also be distinguished from the crowd activity recognition [19], [27], [29], [30] in a way that collective activity is not to model a crowd scenario but rather to infer collective person-person interactions between several people.…”
mentioning
confidence: 99%