2020
DOI: 10.48550/arxiv.2005.12872
Preprint

End-to-End Object Detection with Transformers

Cited by 223 publications (457 citation statements)
References 0 publications
“…Since their introduction by Vaswani et al. (2017), transformers, originally designed for machine translation, have been applied to a wide range of problems, from text generation (Radford et al., 2018) to image processing (Carion et al., 2020) and speech recognition (Dong et al., 2018), where they soon achieved state-of-the-art performance (Dosovitskiy et al., 2021; Wang et al., 2020b). In mathematics, transformers have been used for symbolic integration (Lample & Charton, 2019), theorem proving (Polu & Sutskever, 2020), formal logic (Hahn et al., 2021), SAT solving (Shi et al., 2021), symbolic regression (Biggio et al., 2021), and dynamical systems (Charton et al., 2020).…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
“…The experiments are conducted on two backbones, ResNet-50 and ResNet-101. We use a COCO-pre-trained DETR (Carion et al., 2020) to initialize the weights. The model is trained with AdamW; the learning rate is set to 1e-4, except for the backbone, whose learning rate is set to 1e-5.…”
Section: Experimental Settings
Citation type: mentioning
Confidence: 99%
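For concreteness, here is a minimal PyTorch sketch of the optimizer setup this excerpt describes: AdamW with a base learning rate of 1e-4 and a reduced 1e-5 rate for the backbone. The parameter-name convention ("backbone" in the attribute path) and the weight-decay value are illustrative assumptions, not taken from the cited paper's code.

```python
import torch
from torch import optim

def build_optimizer(model: torch.nn.Module) -> optim.AdamW:
    # Hypothetical sketch: split parameters into backbone vs. everything
    # else so the backbone can use a smaller learning rate (1e-5 vs. 1e-4).
    backbone_params = [p for n, p in model.named_parameters()
                       if "backbone" in n and p.requires_grad]
    other_params = [p for n, p in model.named_parameters()
                    if "backbone" not in n and p.requires_grad]
    return optim.AdamW(
        [
            {"params": other_params, "lr": 1e-4},     # transformer + heads
            {"params": backbone_params, "lr": 1e-5},  # ResNet-50/101 backbone
        ],
        weight_decay=1e-4,  # common DETR default; an assumption here
    )
```

Putting the backbone in its own parameter group is the standard way to fine-tune pre-trained CNN weights more gently than the randomly initialized (or task-adapted) transformer layers.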
“…c) This module aims at enforcing a match between the two sequences of feature maps. Transformers have also been applied to object detection [9, 107]. To better characterize inter-step correlations, we integrate the vision transformer ViT [20] into the backbone, replacing the global average pooling.…”
Section: Vision Transformer
Citation type: mentioning
Confidence: 99%
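The excerpt describes swapping a CNN backbone's global average pooling for a ViT-style transformer. A minimal sketch of that idea follows, assuming the feature map is flattened into tokens and a learned [CLS] token provides the pooled output; the class name, depth, and token count are illustrative, not the cited authors' implementation.

```python
import torch
import torch.nn as nn

class TransformerPooling(nn.Module):
    """Replace global average pooling with a small ViT-style encoder.

    Hypothetical sketch: flattens the CNN feature map into a token
    sequence, prepends a learned [CLS] token, adds learned positional
    embeddings, and returns the [CLS] output as the pooled feature.
    """
    def __init__(self, dim: int = 2048, depth: int = 2,
                 heads: int = 8, num_tokens: int = 7 * 7):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) from the CNN backbone
        b, c, h, w = feat.shape
        assert h * w == self.pos_embed.shape[1] - 1, "token count mismatch"
        tokens = feat.flatten(2).transpose(1, 2)    # (B, H*W, C)
        cls = self.cls_token.expand(b, -1, -1)      # (B, 1, C)
        x = torch.cat([cls, tokens], dim=1) + self.pos_embed
        x = self.encoder(x)
        return x[:, 0]                              # (B, C), replaces GAP
```

Unlike global average pooling, the encoder's self-attention lets the pooled feature weight spatial positions (and, in the paper's setting, steps) unequally, which is what allows it to capture the inter-step correlations the excerpt mentions.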