2022
DOI: 10.1007/978-3-031-19833-5_35

SeqTR: A Simple Yet Universal Network for Visual Grounding

Citations: cited by 50 publications (26 citation statements)
References: 45 publications
“…The efforts on feature fusion have explored feature concatenation [28,48], attention mechanisms [7,29,72,86], and multi-modal Transformers [19,37,42,77]. The method most related to ours is SeqTR [93], which also adopts a transformer model for generating the polygon vertices sequentially. However, SeqTR can only produce a single polygon of 18 vertices with coarse segmentation mask, failing to outline objects with complex shapes and occlusion.…”
Section: Related Work (mentioning)
confidence: 99%
“…Referring Expression Comprehension (REC) predicts a bounding box that tightly encompasses the target object in an image corresponding to a referring expression. Existing works include two-staged methods [26,27,90,94] that are based on region proposal ranking, and one-stage methods [4,34,42,46,84,93] that directly predict the target bounding box. Several papers [42,57,93] explore multitask learning of REC and RIS since they are two closely related tasks.…”
Section: Related Work (mentioning)
confidence: 99%