2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.01005
Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

Cited by 202 publications (137 citation statements)
References 31 publications
“…Finally, the cross-modal features are used to generate the final prediction masks. Unlike existing RES methods [13,14], which segment objects according to the query text, we input the text in parallel with the input image to extract information. By combining cross-modal features from both image and text, we accurately segment fluorescein leakage.…”
Section: Methods
confidence: 99%
“…1c). The recent success of referring expression segmentation (RES), which uses natural language expressions to locate objects [13,14], suggests the possibility of using cross-modal data to build a robust and effective framework for fluorescein leakage segmentation.…”
Section: Introduction
confidence: 99%
“…Recurrent networks in [25,30] and pyramid feature maps in [22] are utilized to mine richer semantic context for fusion. Luo et al. [28] propose to learn referring segmentation and comprehension in a unified manner for better-aligned representations. Inspired by the prevalence of the attention mechanism in computer vision, researchers have turned to attention for effective fusion of multi-modal representations.…”
Section: Referring Image Segmentation
confidence: 99%
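The attention-based fusion described in the citation above can be sketched in a minimal form: each spatial location of the image feature map attends to the word embeddings of the expression, and the attended text context is fused back into the visual features. This is a generic illustration of cross-modal attention, not the specific architecture of the cited paper; the shapes, the residual fusion, and the function names are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(image_feats, text_feats):
    """Fuse visual and linguistic features with scaled dot-product attention.

    image_feats: (N, d) array -- N spatial locations act as queries.
    text_feats:  (T, d) array -- T word embeddings act as keys/values.
    Returns an (N, d) array where each location is augmented with the
    text context most relevant to it.
    """
    d = image_feats.shape[-1]
    # Similarity of every location to every word, scaled by sqrt(d).
    scores = image_feats @ text_feats.T / np.sqrt(d)      # (N, T)
    attn = softmax(scores, axis=-1)                       # weights over words
    attended = attn @ text_feats                          # (N, d) text context
    return image_feats + attended                         # residual fusion
```

In practice the queries, keys, and values would be linear projections learned end-to-end; the sketch omits them to keep the attention computation itself visible.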
“…(3) Incomplete utilization of instance-level features: visual embeddings at every spatial location are treated equally, without highlighting instances. Most previous methods [2,27,28,37] in this area directly use global image representations without considering instance-level features. However, for referring segmentation, instance-level features should be highlighted, since the referring expression typically describes object instances.…”
Section: Introduction
confidence: 99%
“…As for consensus constraints, multi-task learning [14,15,20] enhances a model's generalization and performance by adding related auxiliary tasks to the main task. However, it is not applicable when only a single task is available.…”
Section: Introduction
confidence: 99%