2020
DOI: 10.1007/978-3-030-58607-2_4

Linguistic Structure Guided Context Modeling for Referring Image Segmentation

citations
Cited by 92 publications
(44 citation statements)
references
References 35 publications
“…To further explicitly align the vision and language modalities in a co-embedding space, Chen et al [3] generate the visual-textual co-embedding map in several recurrent steps. As graph neural network [35,42] presents a new form of mining the relationship between data, Hui et al [15] and Yang et al introduce graph structure models to achieve efficient message passing in RIS. Moreover, some works [13,15] also consider the linguistic roles of each word during multimodal interaction process.…”
Section: Related Work (mentioning)
confidence: 99%
“…As graph neural network [35,42] presents a new form of mining the relationship between data, Hui et al [15] and Yang et al introduce graph structure models to achieve efficient message passing in RIS. Moreover, some works [13,15] also consider the linguistic roles of each word during multimodal interaction process. Words are classified into four categories, and a progressive comprehension process is proposed under the guidance of different type of words in [13].…”
Section: Related Work (mentioning)
confidence: 99%
“…[12] utilize query attention and key-word-aware visual context to model relationships among different image regions, according to the corresponding query. More recent works, [13] model multimodal context by cross-modal interaction and guided through a dependency tree structure, [14] progressively exploits various types of words in the expression to segment the referent in a graph-based structure. In contrast to existing works on RIS that directly refer to objects in an image, we ground the region adjacent to the object to provide navigational guidance to a self-driving vehicle.…”
Section: B. Referring Image Segmentation (mentioning)
confidence: 99%