2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00977
No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques

Abstract: We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more sophisticated approaches. Our model includes factors for detection scores, human and object appearance, and coarse (box-pair configuration) and optionally fine-grained layout (human pose). We also develop training techniques that improve learning efficiency by: (1) eliminating a train-inference mismatch; (2) rejecting easy n…
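To make the factorization described in the abstract concrete, here is a minimal sketch of how such a score could be composed: a detection-score factor gating learned factors over human appearance, object appearance, and box-pair layout. This is an illustrative reading of the abstract only, not the paper's implementation; the dimensions and names (`appearance_dim`, `layout_dim`, `FactoredHOIScorer`) are assumed placeholders.

```python
import torch
import torch.nn as nn

class FactoredHOIScorer(nn.Module):
    """Hypothetical sketch of a factorized HOI scorer: per-interaction logits
    from human appearance, object appearance, and coarse box-pair layout are
    combined and gated by the object-detector confidence."""

    def __init__(self, appearance_dim=2048, layout_dim=8, num_interactions=600):
        super().__init__()
        self.human_factor = nn.Linear(appearance_dim, num_interactions)
        self.object_factor = nn.Linear(appearance_dim, num_interactions)
        self.layout_factor = nn.Sequential(
            nn.Linear(layout_dim, 128), nn.ReLU(),
            nn.Linear(128, num_interactions))

    def forward(self, human_feat, object_feat, layout_feat, det_score):
        # Sum per-factor logits (a product of factors in probability space),
        # then weight by the pre-trained detector's confidence for the pair.
        logits = (self.human_factor(human_feat)
                  + self.object_factor(object_feat)
                  + self.layout_factor(layout_feat))
        return det_score * torch.sigmoid(logits)
```

A fine-grained layout factor (e.g. human-pose keypoints) could be added as another additive term in the same way, which is how the abstract describes the optional pose input.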


Cited by 112 publications (131 citation statements)
References 20 publications
“…In addition, the approaches in (Gupta, Schwing, and Hoiem 2019) and (Li et al 2019) require pose estimation models too. The numbers listed in table 1 do not count these parameters.…”
Section: Results (mentioning)
Confidence: 99%
“…As mentioned before, V-COCO (Gupta and Malik 2015) is a small dataset and does not provide any insights into the proposed method. In line with recent work (Gupta, Schwing, and Hoiem 2019), we avoid using it.…”
Section: Methods (mentioning)
Confidence: 92%
“…Computer vision (CV)-based human motion modelling and analysis has been extensively researched by the community. But, most of the research can be categorised into pose estimation [160], human-object interaction [63,98], activity/gesture recognition [31,65,113] or human-human interaction [53]. However, comparative analysis of human motion has received relatively less attention from the community.…”
Section: Introduction (mentioning)
Confidence: 99%
“…Most existing works on HOI detection [9,11,14,25,36,39,41] treat HOIs as individual interaction categories and focus on mining visual representations of human-object pairs to improve classification performances. Despite previous successes, these conventional Figure 2: Polysemy of action labels.…”
Section: Introduction (mentioning)
Confidence: 99%
“…As an example shown in Figure 2, collocated with different objects, the actual implications of action "ride" are sometimes inconsistent. Such phenomenon brings ambiguities and extra challenges to compositional methods [1,11,14,36].…”
Section: Introduction (mentioning)
Confidence: 99%