2019
DOI: 10.48550/arxiv.1904.03181
Preprint

Detecting Human-Object Interactions via Functional Generalization

Abstract: We present an approach for detecting human-object interactions (HOIs) in images, based on the idea that humans interact with functionally similar objects in a similar manner. The proposed model is simple and data-efficient: it uses visual features of the human, the relative spatial orientation of the human and the object, and the knowledge that functionally similar objects take part in similar interactions with humans. We provide extensive experimental validation for our approach and demonstrate state-of-the-art…

Cited by 4 publications (15 citation statements) | References 24 publications
“…However, different from [26], which aims at zero-shot learning, VCL targets Generalized Zero-Shot Learning [34]. In [3], a generic object detector was incorporated to generalize to interactions involving previously unseen objects. Also, Yang et al. [37] proposed to alleviate the predicate bias toward objects for zero-shot visual relationship detection.…”
Section: Low-shot and Zero-shot Learning
confidence: 99%
“…Also, Yang et al. [37] proposed to alleviate the predicate bias toward objects for zero-shot visual relationship detection. Similar to previous approaches [3,26,29,36], we also treat the same verb from different HOIs equally. However, all of those works [3,26,29,36] largely ignore the composition of verbs and objects.…”
Section: Low-shot and Zero-shot Learning
confidence: 99%
“…One of the main problems in detecting visual relationships is the need for tremendous amounts of varied examples, as the appearances and classes of both subject and target must vary for each interaction class to generalize. The release of large datasets [3,11,14,33] has allowed the development of several visual relationship detectors in recent years [4,13,15,20,21,29,30,31,32], as well as HOI detectors [1,3,8,10,11,23,24,28].…”
Section: Related Work
confidence: 99%
“…Some techniques [1,28,29] incorporate linguistic knowledge to address the long-tail distribution of human-object interaction classes. They exploit the contextual information present in language priors learnt with a 'word2vec' network to generalize interactions across functionally similar objects.…”
Section: Related Work
confidence: 99%
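The functional-similarity prior described above can be sketched in a few lines: objects whose word embeddings lie close together in vector space (e.g. "bicycle" and "motorcycle") tend to afford the same interactions, so HOI labels for one can transfer to the other. The sketch below uses tiny hand-made vectors purely for illustration; the vector values, the `functionally_similar` helper, and the `0.95` threshold are all hypothetical stand-ins, not part of the cited method, which would load pretrained word2vec embeddings instead.

```python
import numpy as np

# Toy embeddings standing in for word2vec vectors (hypothetical values;
# a real system would load pretrained vectors for each object class).
embeddings = {
    "bicycle":    np.array([0.9, 0.8, 0.1]),
    "motorcycle": np.array([0.85, 0.75, 0.2]),
    "horse":      np.array([0.7, 0.6, 0.5]),
    "laptop":     np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def functionally_similar(obj, candidates, threshold=0.95):
    """Return candidates whose embedding is close to `obj`'s.

    Intuition: objects near each other in embedding space tend to take
    part in the same interactions ('ride', 'push', ...), so training
    examples for one object can help recognize HOIs with the other.
    """
    v = embeddings[obj]
    return [c for c in candidates
            if c != obj and cosine_similarity(v, embeddings[c]) >= threshold]

print(functionally_similar("bicycle", list(embeddings)))  # ['motorcycle']
```

With these toy vectors, "motorcycle" clears the threshold while "horse" and "laptop" do not, mirroring how a language prior groups ridable vehicles together.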
“…Although modelling HOIs has been broadly studied in images [2,6,15,57], it has received less consideration in videos. Even though deep learning methods have been developed for recognizing human actions in videos, most of them, including ConvNets [41], recurrent neural networks (RNNs) [9,25], and 3D convolution models [5,49], only take individual frame-wise information as input (coarse-grained) without explicitly modelling (fine-grained) human-object relations across a video sequence.…”
Section: Introduction
confidence: 99%