2022
DOI: 10.48550/arxiv.2201.12329
DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

Abstract: We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR. This new formulation directly uses box coordinates as queries in Transformer decoders and dynamically updates them layer-by-layer. Using box coordinates not only helps using explicit positional priors to improve the query-to-feature similarity and eliminate the slow training convergence issue in DETR, but also allows us to modulate the po…
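The layer-by-layer anchor update described in the abstract can be sketched minimally as follows. This is an illustrative reduction, assuming (as in the paper's formulation) that each decoder layer predicts coordinate offsets in inverse-sigmoid (logit) space; the delta values and the single-box setting here are hypothetical, and the real model predicts deltas from decoder features.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def inverse_sigmoid(p, eps=1e-5):
    # clamp to avoid infinities at 0 and 1
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def update_anchor(box, deltas):
    """One decoder layer's refinement: add predicted offsets to the
    current anchor in logit space, then map back to [0, 1]."""
    return tuple(sigmoid(inverse_sigmoid(b) + d) for b, d in zip(box, deltas))

# start from a loose anchor (cx, cy, w, h), all normalized to [0, 1]
anchor = (0.5, 0.5, 0.2, 0.2)
# hypothetical per-layer offsets standing in for decoder predictions
for layer_deltas in [(0.4, -0.2, 0.1, 0.0), (0.1, 0.05, -0.05, 0.02)]:
    anchor = update_anchor(anchor, layer_deltas)
```

Working in logit space keeps every refined coordinate inside [0, 1] by construction, regardless of the offset magnitudes.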

Cited by 58 publications (114 citation statements)
References 16 publications (38 reference statements)
“…Vision Transformer (ViT) [18,5] achieved state-of-the-art results on various vision tasks. To increase convergence speed and improve accuracy, well-explored locality inductive biases have been reintroduced into vision transformers [66,22,62,41,27,61,51,19,56,26]; among these, hybrid convolution-transformer architectures [49,57,12,21,34] achieve state-of-the-art performance on a wide range of tasks. Our ConvMAE is highly motivated by the hybrid architecture designs [21,34,12,57] in vision backbones.…”
Section: Related Work
confidence: 99%
“…Detection: Mainstream detection algorithms have been dominated by convolutional neural network-based frameworks until recently, when Transformer-based detectors [2,22,18,37] achieved great progress. DETR [2] is the first end-to-end, query-based Transformer object detector, which adopts a set-prediction objective with bipartite matching.…”
Section: Related Work
confidence: 99%
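The set-prediction objective mentioned above relies on bipartite matching between predicted queries and ground-truth boxes, which DETR solves with the Hungarian algorithm. A minimal sketch, assuming an already-computed matching cost matrix (the cost values below are illustrative; in DETR each entry combines classification and box-regression terms):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# toy cost matrix: 3 predicted queries (rows) vs. 2 ground-truth boxes (cols);
# lower cost = better match between a prediction and a ground truth
cost = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.6],
])

# Hungarian matching: a one-to-one assignment minimizing total cost
row_ind, col_ind = linear_sum_assignment(cost)
matches = dict(zip(row_ind.tolist(), col_ind.tolist()))
# queries left unmatched (here, query 2) are supervised as "no object"
```

Because the assignment is one-to-one, each ground-truth box receives supervision from exactly one query, which removes the need for post-hoc NMS.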
“…Although DETR addresses both the object detection and panoptic segmentation tasks, its segmentation performance is still inferior to classical segmentation models. To improve the detection and segmentation performance of query-based models, researchers have developed specialized models for object detection [40,22,18,37], image segmentation [38,6,4], instance segmentation [10], panoptic segmentation [27], and semantic segmentation [14].…”
Section: Introduction
confidence: 99%
“…Subsequently, Deformable DETR [51] developed a sparse attention module named deformable attention to accelerate the convergence of DETR. In the same spirit, many researchers [9,26,48,29] have proposed various schemes to speed up the convergence of DETR. More recently, Wang et al. pointed out that DETR suffers from data hunger and proposed to address it by augmenting the supervision.…”
Section: Related Work
confidence: 99%