2017 IEEE International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2017.454

PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN

Abstract: We aim to tackle a novel vision task called Weakly Supervised Visual Relation Detection (WSVRD) to detect "subject-predicate-object" relations in an image with object relation ground truths available only at the image level. This is motivated by the fact that it is extremely expensive to label the combinatorial relations between objects at the instance level. Compared to the extensively studied problem, Weakly Supervised Object Detection (WSOD), WSVRD is more challenging as it needs to examine a large set of re…
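The supervision gap the abstract describes can be pictured with a short hypothetical sketch of the two annotation levels (field names and values are illustrative, not taken from the paper): instance-level supervision ties each triplet to bounding boxes, while the image-level supervision assumed by WSVRD only lists the triplets.

```python
# Instance-level annotation: each relation is grounded to boxes (expensive to label).
instance_level = [
    {"subject": ("person", [48, 20, 110, 200]),   # (class, [x1, y1, x2, y2])
     "predicate": "ride",
     "object": ("horse", [30, 90, 180, 240])},
]

# Image-level annotation (the WSVRD setting): only the triplets, no boxes.
image_level = [("person", "ride", "horse")]
```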

Cited by 131 publications (109 citation statements). References 44 publications.
“…To our best knowledge, it is the only work on unsupervised referring expression grounding. Note that it is also known as "weakly supervised" detection [60] as there is still image-level ground truth (i.e., the referring expression). Table 3 reports the unsupervised results on the RefCLEF.…”
Section: Evaluations of Unsupervised Grounding (mentioning, confidence: 99%)
“…In [30], an end-to-end system exploits the interaction of visual and geometric features of the subject, object and predicate. The end-to-end system in [34] exploits weakly supervised learning (i.e., the supervision is at image level). LTNs exploit the combination of the visual/geometric features of the subject/object with additional background knowledge.…”
Section: Related Work (mentioning, confidence: 99%)
“…The whole dataset is split into 73,801 images for training and 25,857 images for testing. [Flattened table rows comparing [41], Shuffle [38], VSA-Net [12], and PPR-FCN [42] omitted; the column structure is not recoverable.] We compare our complete model, denoted "RLM (ours)", with some existing methods.…”
Section: Experiments on Visual Genome (mentioning, confidence: 99%)
“…Visual relationship detection can be divided into two stages, including an object-pairs proposing stage and a predicate recognition stage. Traditional methods [22,42] follow the simple framework: given N detected objects, N² object-pairs are proposed in the object-pairs proposing stage. The main problem is that the performance of relationship models is heavily dependent on N.…”
Section: Introduction (mentioning, confidence: 99%)
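For context, the quadratic pairing this quote refers to can be sketched as follows (a hypothetical enumeration, not the actual pipeline of [22] or [42]): every ordered subject/object pair of the N detections is proposed, so the predicate stage must score roughly N² candidates, which is why the cost grows quickly with N.

```python
from itertools import permutations

def propose_pairs(detections):
    """Enumerate ordered (subject, object) candidates from N detections.

    `detections` is a list of per-object records (e.g. box + class + score).
    Excluding self-pairs yields N*(N-1) proposals; including them gives N^2,
    which is why relation models scale poorly with the number of detections.
    """
    return list(permutations(detections, 2))

# e.g. 100 detections -> 9,900 ordered pairs to score for predicates
```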