Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3475222

Two-stage Visual Cues Enhancement Network for Referring Image Segmentation

Abstract: Referring Image Segmentation (RIS) aims to segment the target object in an image that is referred to by a given natural language expression. The diverse and flexible expressions, together with the complex visual content of images, place higher demands on RIS models to investigate fine-grained matching behaviors between the words in expressions and the objects present in images. However, such matching behaviors are hard to learn and capture when the visual cues of referents (i.e. referred objects) are insuffici…

Cited by 16 publications (2 citation statements)
References 40 publications
“…One-stage frameworks (Suo et al 2021; Li and Sigal 2021; Hu, Rohrbach, and Darrell 2016) have been proposed. To model semantic relationships between vision and language, recent methods (Ding et al 2021; Feng et al 2021; Jiao et al 2021; Li and Sigal 2021; Yang et al 2022; Luo et al 2020b) incorporate complex cross-attention mechanisms inspired by the powerful abilities of Transformers (Vaswani et al 2017) for capturing long-range dependencies.…”
Section: Referring Expression Segmentation (mentioning)
confidence: 99%
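The vision-language cross-attention mentioned in the snippet above can be illustrated with a minimal sketch. It assumes generic scaled dot-product attention (Vaswani et al 2017) with pixel features as queries and word features as keys and values, and it omits learned projection matrices, multiple heads, and batching. It is not the method of this paper or of any cited work, and the function names (`cross_attention`, `matmul`, `transpose`, `softmax`) are hypothetical helpers written in plain Python for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(pixels: Matrix, words: Matrix) -> Matrix:
    """Hypothetical sketch: attend from pixels (queries) to words (keys/values).

    pixels: num_pixels x d visual features; words: num_words x d word features.
    Returns one language-conditioned feature vector (length d) per pixel.
    Learned W_Q, W_K, W_V projections are deliberately omitted for brevity.
    """
    d = len(pixels[0])
    scores = matmul(pixels, transpose(words))             # num_pixels x num_words
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]            # each row sums to 1
    return matmul(weights, words)                         # weighted sum of word features


if __name__ == "__main__":
    pixel_feats = [[1.0, 0.0], [0.0, 1.0]]
    word_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in cross_attention(pixel_feats, word_feats):
        print([round(v, 3) for v in row])
```

Because every pixel attends over all words at once, each output vector can draw on any word in the expression regardless of its position, which is the long-range property the snippet attributes to Transformer-style attention.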
“…Recent years have witnessed the great success of deep learning techniques on a series of tasks (He et al 2016; Liu et al 2018; Feng et al 2021), such as image recognition (He et al 2016; Liu et al 2020; Chen et al 2020b,a), image segmentation (Jiao et al 2021), object detection (Ren et al 2016), and video recognition and retrieval (Wu et al 2020c; Song et al 2021). Therefore, DNNs have been widely applied in real-world applications, e.g., online recognition services, navigation robots, autonomous driving (Tian et al 2018), etc.…”
Section: Introduction (mentioning)
confidence: 99%