DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

Zhang, Hao; Li, Feng; Liu, Shilong; Zhang, Lei; Su, Hang; Zhu, Jun; Ni, Lionel M.; Shum, Heung-Yeung

doi:10.48550/arxiv.2203.03605

Cited by 142 publications

(225 citation statements)

References 0 publications

Mentioning: 225

“…Following, Deformable DETR [51] develops a sparse attention module named deformable attention to fasten the convergence speed of DETR. Sharing the same spirit, many researchers [9,26,48,29] proposed various schemes to speed up the convergence of DETR. More recently, Wang et al pointed out that DETR has the issue of data hunger and proposed to solve it by augmenting the supervision.…”

Section: Related Work (mentioning)

confidence: 99%

Improving Transferability for Domain Adaptive Detection Transformers

Gong, Li, Li et al. 2022. Preprint.


DETR-style detectors stand out amongst in-domain scenarios, but their properties in domain shift settings are under-explored. This paper aims to build a simple but effective baseline with a DETR-style detector on domain shift settings based on two findings. For one, mitigating the domain shift on the backbone and the decoder output features excels in getting favorable results. For another, advanced domain alignment methods in both parts further enhance the performance. Thus, we propose the Object-Aware Alignment (OAA) module and the Optimal Transport based Alignment (OTA) module to achieve comprehensive domain alignment on the outputs of the backbone and the detector. The OAA module aligns the foreground regions identified by pseudo-labels in the backbone outputs, leading to domain-invariant based features. The OTA module utilizes sliced Wasserstein distance to maximize the retention of location information while minimizing the domain gap in the decoder outputs. We implement the findings and the alignment modules into our adaptation method, and it benchmarks the DETR-style detector on the domain shift settings. Experiments on various domain adaptive scenarios validate the effectiveness of our method.
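The abstract above names the sliced Wasserstein distance but does not say how the OTA module computes it. As a rough illustration of the general technique only (not the authors' implementation), the sketch below projects two equally sized point sets onto random unit directions, sorts each 1-D projection, and averages the resulting 1-D transport costs. The function name, the parameters `n_projections`, `p`, and `seed`, and the equal-sample-count restriction are all assumptions made for this example.

```python
import math
import random


def sliced_wasserstein(x, y, n_projections=64, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    x, y: lists of equal length, each item a list of `dim` floats.
    Equal sample counts are an assumption of this sketch: they reduce the
    1-D optimal transport to matching sorted projections one-to-one.
    """
    if not x or len(x) != len(y):
        raise ValueError("this sketch assumes two non-empty sets of equal size")
    dim = len(x[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_projections):
        # A normalized Gaussian vector is a uniform direction on the sphere.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        # Project both sets onto the direction and sort the 1-D samples.
        px = sorted(sum(a * b for a, b in zip(row, d)) for row in x)
        py = sorted(sum(a * b for a, b in zip(row, d)) for row in y)
        # 1-D transport cost between the matched (sorted) samples.
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / n_projections) ** (1.0 / p)
```

Because each projection is sorted before matching, the estimate does not depend on the order of the points in either set, and it is zero when the two sets coincide.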


“…Other methods have introduced transformer architectures [38] that base their workflow on attention mechanisms and have obtained remarkable experimental results with a promising future [39,40].…”

Section: Network Models (mentioning)

confidence: 99%

Open Source Assessment of Deep Learning Visual Object Detection

Paniego, Sharma, Plaza. 2022. Sensors.


This paper introduces Detection Metrics, an open-source scientific software for the assessment of deep learning neural network models for visual object detection. This software provides objective performance metrics such as mean average precision and mean inference time. The most relevant international object detection datasets are supported along with the most widely used deep learning frameworks. Different network models, even those built from different frameworks, can be fairly compared in this way. This is very useful when developing deep learning applications or research. A set of tools is provided to manage and work with different datasets and models, including visualization and conversion into several common formats. Detection Metrics may also be used in automatic batch processing for large experimental tests, saving researchers time, and new domain-specific datasets can be easily created from videos or webcams. It is open-source, can be audited, extended, and adapted to particular requirements. It has been experimentally validated. The performance of the most relevant state-of-the-art neural models for object detection has been experimentally compared. In addition, it has been used in several research projects, guiding in selecting the most suitable network model architectures and training procedures. The performance of the different models and training alternatives can be easily measured, even on large datasets.


“…Transformer-based networks were successfully applied in various computer vision tasks and held impressive results. Mask DINO [11] extends DINO [12] by adding a new branch to perform mask prediction for panoptic, instance and semantic segmentation. Content query embeddings from DINO [12] are used to perform mask classification for all segmentation tasks.…”

Section: Semantic Instance Segmentation (mentioning)

confidence: 99%

“…Mask DINO [11] extends DINO [12] by adding a new branch to perform mask prediction for panoptic, instance and semantic segmentation. Content query embeddings from DINO [12] are used to perform mask classification for all segmentation tasks. QueryInst [13] proposes a query-based end-to-end instance segmentation with parallel supervision on six dynamic mask heads.…”

Section: Semantic Instance Segmentation (mentioning)

confidence: 99%

Speeding Up Semantic Instance Segmentation by Using Motion Information

2022


Environment perception and understanding represent critical aspects in most computer vision systems and/or applications. State-of-the-art techniques to solve this vision task (e.g., semantic instance segmentation) require either dedicated hardware resources to run or a longer execution time. Generally, the main efforts were to improve the accuracy of these methods rather than make them faster. This paper presents a novel solution to speed up the semantic instance segmentation task. The solution combines two state-of-the-art methods from semantic instance segmentation and optical flow fields. To reduce the inference time, the proposed framework (i) runs the inference on every 5th frame, and (ii) for the remaining four frames, it uses the motion map computed by optical flow to warp the instance segmentation output. Using this strategy, the execution time is strongly reduced while preserving the accuracy at state-of-the-art levels. We evaluate our solution on two datasets using available benchmarks. Then, we conclude on the results obtained, highlighting the accuracy of the solution and the real-time operation capability.
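The schedule described in the abstract above (full inference on every 5th frame, flow-based warping for the four frames in between) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's code: `segment_fn` and `flow_fn` are hypothetical callables standing in for the segmentation and optical flow models, masks are nested lists of integer labels, the flow is assumed to point from each pixel of the current frame to its source in the previous frame, and each skipped frame is assumed to be warped from the previous frame's output.

```python
def warp_mask(mask, flow):
    """Backward-warp an integer label mask with a per-pixel (dx, dy) flow.

    flow[y][x] = (dx, dy) points from pixel (x, y) in the current frame to
    its source in the previous frame (an assumed convention). Sources that
    fall outside the image become background (label 0).
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = x + int(round(dx)), y + int(round(dy))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = mask[sy][sx]  # nearest-neighbour lookup
    return out


def segment_video(frames, segment_fn, flow_fn, stride=5):
    """Run segment_fn on every `stride`-th frame and warp masks in between."""
    masks = []
    for i, frame in enumerate(frames):
        if i % stride == 0:
            # Key frame: pay for a full segmentation pass.
            masks.append(segment_fn(frame))
        else:
            # Skipped frame: reuse the previous output, moved by the flow.
            flow = flow_fn(frame, frames[i - 1])
            masks.append(warp_mask(masks[-1], flow))
    return masks
```

With `stride=5`, the segmentation model runs on one frame in five, so the cost of the remaining frames is that of the flow estimate plus the warp. Warping from the previous output lets errors accumulate until the next key frame, which is one reason to keep the stride small.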

