2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00474

Dynamic Graph Attention for Referring Expression Comprehension

Abstract: Referring expression comprehension aims to locate the object instance described by a natural language referring expression in an image. This task is compositional and inherently requires visual reasoning on top of the relationships among the objects in the image. Meanwhile, the visual reasoning process is guided by the linguistic structure of the referring expression. However, existing approaches treat the objects in isolation or only explore the first-order relationships between objects without being aligned …

Cited by 183 publications (86 citation statements)
References 26 publications
“…Liu et al. [15] develop a neural module tree network to regularize the visual grounding along the dependency parsing tree of the sentence. The works in [3,36,42] argue to learn the representations from expression and image regions in a stepwise manner, and perform multi-step reasoning for better matching performance. Wang et al. [30] propose a graph-based language-guided attention network to highlight the inter-object and intra-object relationships that are closely relevant to the expression for better performance.…”
Section: Related Work 2.1 Referring Expression Comprehension (mentioning)
confidence: 99%
“…The task of referring expression comprehension has attracted increasing attention in recent years, which expects to locate corresponding objects within an image based on input expressions. Previous referring expression comprehension methods [2, 6, 11, 16, 22, 25, 27, 28, 35, 41-43, 45, 47-51] can be mainly divided into two types, including proposal-region-based [2, 6, 11, 22, 25, 27, 28, 41-43, 47-51] and grid-region-based methods [16, 35, 45].…”
Section: Related Work (mentioning)
confidence: 99%
“…Most of proposal-region-based methods [2, 22, 41-43, 47, 50] are based on the "listener" strategy, which first combines language features with visual features of proposal regions, and then select the target region that best matches the input expression from these proposals. The proposal regions are typically extracted by a pretrained object detector (e.g., Faster R-CNN [34], Mask R-CNN [9] and others [4, 17, 30-32]).…”
Section: Related Work (mentioning)
confidence: 99%
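
The "listener" strategy described in the statement above (score each proposal region against the expression, then pick the best match) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `cosine` and `select_region` are hypothetical, the toy vectors stand in for features that would come from a language encoder and an object detector, and cosine similarity stands in for the learned matching score used by the cited methods.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_region(expression_feature, region_features):
    """Score every proposal region against the expression and return (best index, all scores).

    Assumption: a fixed cosine score replaces the learned matching module.
    """
    scores = [cosine(expression_feature, r) for r in region_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy inputs (hypothetical): one expression embedding and three proposal-region embeddings.
expression = [1.0, 0.0, 1.0]
regions = [
    [0.0, 1.0, 0.0],  # unrelated region
    [1.0, 0.0, 0.9],  # closest match to the expression
    [0.5, 0.5, 0.5],  # partial match
]

best_index, scores = select_region(expression, regions)
print(best_index, [round(s, 3) for s in scores])
```

In this sketch the regions are scored independently of one another, which is the "objects in isolation" limitation the abstract points to; relationship-aware methods would additionally let each region's score depend on the other regions.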