2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.01027
QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information

Cited by 136 publications (157 citation statements)
References 22 publications
“…Moreover, one-stage methods [39] have also appeared that directly detect HOI triplets. Besides works based on convolutional neural networks (CNNs), Transformer-based methods [40], [41] have recently been proposed and achieve decent improvements.…”
Section: Related Work
confidence: 99%
“…In Tab. 1, on the challenging HICO-DET [1], the upper bounds are 45.52 mAP (+QPIC [41], detection [41]) and 62.65 mAP (GT human-object boxes), which are significantly superior to the state of the art (about 29 mAP [41] and 44 mAP [38]). Here, detection [41] indicates using the detected human-object boxes from [41].…”
Section: Analyzing the Upper Bound of HAKE
confidence: 99%
“…One-stage methods [20][21][22] execute object detection and HOI detection concurrently and pair them afterwards. Recent studies [23][24][25][26] achieve end-to-end HOI detection with a DETR [27] style network and benefit from the wider perception field of transformers [26].…”
Section: Related Work
confidence: 99%
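The excerpt above describes the DETR-style, query-based design behind QPIC and related one-stage detectors: a set of learned queries attends over image-wide encoder features, and each query is decoded directly into a ⟨human, object, verb⟩ triplet. The following is a minimal, hypothetical PyTorch sketch of such a head; the module name, head layout, and class counts are illustrative assumptions, not QPIC's or any cited paper's actual implementation.

```python
# Minimal sketch of a DETR-style query-based HOI head.
# All names (HOIDecoderHead, num_queries, class counts) are
# illustrative assumptions, not the actual QPIC implementation.
import torch
import torch.nn as nn

class HOIDecoderHead(nn.Module):
    def __init__(self, hidden_dim=256, num_queries=100,
                 num_obj_classes=80, num_verb_classes=117):
        super().__init__()
        # Learnable queries; each one is decoded into a full
        # <human box, object box, object class, verb> triplet.
        self.query_embed = nn.Embedding(num_queries, hidden_dim)
        decoder_layer = nn.TransformerDecoderLayer(hidden_dim, nhead=8)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)
        # Per-query prediction heads: the human and object boxes are
        # predicted jointly by the same query (pairwise detection).
        self.human_box = nn.Linear(hidden_dim, 4)    # (cx, cy, w, h)
        self.object_box = nn.Linear(hidden_dim, 4)
        self.object_cls = nn.Linear(hidden_dim, num_obj_classes + 1)  # +1: "no object"
        self.verb_cls = nn.Linear(hidden_dim, num_verb_classes)

    def forward(self, memory):
        # memory: flattened image features from an encoder,
        # shape (H*W, batch, hidden_dim); queries attend image-wide.
        bsz = memory.size(1)
        tgt = self.query_embed.weight.unsqueeze(1).repeat(1, bsz, 1)
        hs = self.decoder(tgt, memory)  # (num_queries, batch, hidden_dim)
        return {
            "human_boxes": self.human_box(hs).sigmoid(),
            "object_boxes": self.object_box(hs).sigmoid(),
            "object_logits": self.object_cls(hs),
            "verb_logits": self.verb_cls(hs),
        }
```

During training, a DETR-style bipartite (Hungarian) matching would pair these per-query predictions with ground-truth triplets before computing the losses, which is what makes the pipeline end-to-end without a separate pairing stage.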
“…In order to put the proposed VSM and CMC into practice, we select an end-to-end vision model (VM) (Zou et al. 2021; Tamura, Ohashi, and Yoshinaga 2021) and compose the Object-guided Cross-modal Calibration Network (OCN). To conclude, our contributions are three-fold:…”
Section: Introduction
confidence: 99%