2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018)
DOI: 10.1109/cvpr.2018.00602
Referring Image Segmentation via Recurrent Refinement Networks

Cited by 154 publications (137 citation statements)
References 16 publications
“…In contrast, our model uses a cross-modal self-attention module that can effectively model long-range dependencies between linguistic and visual modalities. Lastly, different from [15], which adopts ConvLSTM to refine segmentation with multi-scale visual features sequentially, the proposed method employs a novel gated fusion module for combining multi-level self-attentive features.…”
Section: Our Model (mentioning)
confidence: 99%
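The gated fusion of multi-level features mentioned in this statement can be illustrated with a short sketch. The snippet below is a minimal PyTorch illustration, assuming a per-level 1x1 convolution that produces a sigmoid gate before the levels are summed; the class name, gate design, and tensor shapes are assumptions for illustration, not the exact module from the citing paper.

import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    # Illustrative gated fusion of multi-level feature maps (hypothetical design).
    def __init__(self, channels: int, num_levels: int):
        super().__init__()
        # One 1x1 convolution per level produces a per-pixel, per-channel gate.
        self.gates = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=1) for _ in range(num_levels)]
        )

    def forward(self, features):
        # features: list of tensors, each of shape (B, C, H, W).
        fused = 0
        for feat, gate_conv in zip(features, self.gates):
            gate = torch.sigmoid(gate_conv(feat))  # gate in [0, 1]
            fused = fused + gate * feat            # gated contribution of this level
        return fused

# Example: fuse three levels of 256-channel, 28x28 self-attentive features.
fusion = GatedFusion(channels=256, num_levels=3)
feats = [torch.randn(1, 256, 28, 28) for _ in range(3)]
out = fusion(feats)  # shape (1, 256, 28, 28)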
“…For the language description with N words, we encode each word w_n as a one-hot vector, and project it into a compact word embedding represented as e_n ∈ R^{C_l} by a lookup table. Different from previous methods [10,15,22] that apply LSTM to process the word vectors sequentially and encode the entire language description as a sentence vector, we keep the individual word vectors and introduce a cross-modal self-attention module to capture long-range correlations between these words and spatial regions in the image. More details will be presented in Sec.…”
Section: Multimodal Features (mentioning)
confidence: 99%
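The embedding lookup and cross-modal attention described in this statement can also be sketched. The following PyTorch snippet is a minimal illustration, assuming flattened spatial features and scaled dot-product attention from image regions to individual word vectors; the layer names, dimensions, and attention form are assumptions and do not reproduce the citing paper's exact module.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAttention(nn.Module):
    # Hypothetical sketch: every spatial location attends over all word embeddings.
    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)  # lookup table: one-hot -> compact embedding e_n
        self.query = nn.Linear(dim, dim)            # projects visual features
        self.key = nn.Linear(dim, dim)              # projects word embeddings
        self.value = nn.Linear(dim, dim)

    def forward(self, visual, word_ids):
        # visual: (B, H*W, dim) flattened spatial features; word_ids: (B, N) token indices.
        words = self.embed(word_ids)                 # (B, N, dim), one vector per word
        q = self.query(visual)                       # (B, H*W, dim)
        k = self.key(words)                          # (B, N, dim)
        v = self.value(words)                        # (B, N, dim)
        attn = F.softmax(q @ k.transpose(1, 2) / q.size(-1) ** 0.5, dim=-1)  # (B, H*W, N)
        return attn @ v                              # language context for each region

# Example: a 10-word description attended by a 28x28 feature map.
module = CrossModalAttention(vocab_size=1000, dim=256)
visual = torch.randn(2, 28 * 28, 256)
word_ids = torch.randint(0, 1000, (2, 10))
context = module(visual, word_ids)  # shape (2, 784, 256)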