DetectFormer: Category-Assisted Transformer for Traffic Scene Object Detection

Liang, Tianjiao; Bao, Hong; Pan, Weiguo; Fan, Xinyue; Li, Han

doi:10.3390/s22134833

Cited by 24 publications

(19 citation statements)

References 52 publications

“…Weighted full convolution layers are used in R-FCN approaches [14] to discover ROI and detect the category of objects as well as the information of their surroundings. With the use of deep learning algorithms, object detection approaches also seem promising for autonomous vehicles [15] and traffic scene object detection [16]. The You Only Look Once (YOLO) architecture processes 155 frames per second in real-time cases to produce quicker results.…”

Section: Related Work (mentioning)

confidence: 99%

Multi-shift spatio-temporal features assisted deep neural network for detecting the intrusion of wild animals using surveillance cameras

Kumar, Stanley (2024). E3S Web Conf.

The coexistence of human populations and wildlife in shared habitats necessitates the development of effective intrusion detection systems to mitigate potential conflicts and promote harmonious relationships. Detecting the intrusion of wild animals, especially in areas where human-wildlife conflicts are common, is essential for both human and animal safety. Animal intrusion has become a serious threat to crop yield, impacting food security and reducing farmer profits. Rural residents and forestry workers are increasingly concerned about the issue of animal assaults. Drones and surveillance cameras are frequently used to monitor the movements of wild animals. To identify the type of animal, track its movement, and provide its position, an effective model is needed. This paper presents a novel methodology for detecting the intrusion of wild animals using deep neural networks with multi-shift spatio-temporal features from surveillance camera video images. The proposed method consists of a multi-shift attention convolutional neural network model to extract spatial features, a multi-moment gated recurrent unit attention model to extract temporal features, and a feature fusion network to fully explore the spatial semantics and temporal features of surveillance video images. The proposed model was tested with images from three different datasets and achieved promising results in terms of mean accuracy and precision.
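The abstract does not specify how the multi-moment GRU attention model or the feature fusion network is built. As a rough illustration only, the sketch below shows one generic pattern under assumed details: per-frame feature vectors (for example, GRU hidden states) are pooled with softmax attention over dot-product scores, and the pooled temporal vector is concatenated with a spatial feature vector. The names `temporal_attention_pool`, `fuse`, and `score_weights` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score before exp)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def temporal_attention_pool(frame_features: List[Vector], score_weights: Vector) -> List[float]:
    """Hypothetical attention pooling over time (not the paper's model).

    Each frame feature h_t gets a score w . h_t; the softmax of the scores
    gives weights alpha_t, and the result is sum_t alpha_t * h_t.
    """
    scores = [sum(w * x for w, x in zip(score_weights, h)) for h in frame_features]
    alphas = softmax(scores)
    dim = len(frame_features[0])
    return [sum(a * h[d] for a, h in zip(alphas, frame_features)) for d in range(dim)]


def fuse(spatial_feature: Vector, temporal_feature: Vector) -> List[float]:
    """Hypothetical fusion by plain concatenation; a learned fusion
    network would normally operate on this joint vector."""
    return list(spatial_feature) + list(temporal_feature)


# Usage: three frames with 2-D features, pooled and fused with a 3-D spatial vector.
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
clip_vector = temporal_attention_pool(frames, score_weights=[2.0, -1.0])
fused = fuse([0.3, 0.7, 0.1], clip_vector)
```

Concatenation is the simplest fusion choice and keeps both feature sets intact; any trainable fusion layer in the actual model would be applied after a step of this kind.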

“…They tested the method along several augmentation techniques aim to have a more robust traffic sign detection under light condition changes. In a similar manner, DetecFormer (Liang et al, 2022c) was introduced by fusing local and global information in a global context encoder with the same purpose of traffic scene detection.…”

Section: Two-dimensional Object Detection (mentioning)

confidence: 99%

A survey on 3D object detection in real time for autonomous driving

Contreras, Jain, Bhatt et al. (2024). Front. Robot. AI

This survey reviews advances in 3D object detection approaches for autonomous driving. A brief introduction to 2D object detection is given first, and drawbacks of existing methodologies in highly dynamic environments are identified. The paper then reviews state-of-the-art 3D object detection techniques that use monocular and stereo vision for reliable detection in urban settings. Based on depth-inference basis, learning scheme, and internal representation, this work presents a taxonomy of three method classes: model-based and geometrically constrained approaches, end-to-end learning methodologies, and hybrid methods. A dedicated segment highlights the current trend toward multi-view detectors as end-to-end methods, owing to their improved robustness. Detectors from the last two classes were specifically selected to exploit the autonomous driving context in terms of geometry, scene content, and instance distribution. To demonstrate the effectiveness of each method, 3D object detection datasets for autonomous vehicles are described together with their distinctive features (e.g., varying weather conditions, multi-modality, multi-camera perspectives) and the metrics associated with their difficulty categories. In addition, we include multi-modal visual datasets, i.e., V2X, that may address the problem of single-view occlusion. Finally, current research trends in object detection are summarized, followed by a discussion of possible directions for future research in this domain.

“…It leverages a self-supervised learning framework seamlessly integrated with a robust optimization method. Unlike approaches tailored for specific applications, such as object detection, which often enhance deep learning generalization through domain-specific augmentations [34] or particular modifications to transformer architecture [35], our focus lies in assessing the broader efficacy of self-supervised learning representations in domain generalization.…”

Section: Related Work (mentioning)

confidence: 99%

Evaluating and Improving Domain Invariance in Contrastive Self-Supervised Learning by Extrapolating the Loss Function

Zare, Van Nguyen (2023). IEEE Access

Despite the remarkable progress of self-supervised learning (SSL), how self-supervised representations generalize to out-of-distribution data remains poorly understood. In this paper, we study the effects of distribution shifts on self-supervised representations. Our findings indicate that self-supervised representation learning is more robust than traditional supervised learning (52.8% versus 17.1% on the CMNIST dataset, 63.6% versus 60.6% on the Waterbirds dataset). However, self-supervised representations still suffer significantly from domain shifts, especially when spurious correlations are present. Motivated by this limitation, we propose a risk-extrapolated information NCE (ReinformNCE) to help self-supervised learning algorithms learn more stable representations. Our approach integrates the InfoNCE loss function with a robust optimization approach that extrapolates the risks of training domains. Extensive experiments show that ReinformNCE helps extract domain-invariant self-supervised representations and substantially improves their robustness (68.2% versus 52.8% on the CMNIST dataset, 77.9% versus 63.6% on the Waterbirds dataset). To the best of our knowledge, this is the first work demonstrating the feasibility of learning domain-invariant representations based on robust optimization theory and without supervised information.
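The abstract states only that ReinformNCE combines the InfoNCE loss with a robust optimization that extrapolates the risks of training domains. A minimal sketch, assuming cosine-similarity InfoNCE with in-batch negatives and a V-REx-style objective (the sum of per-domain risks plus β times their variance), might look like the following. The function names, the temperature of 0.1, and β = 10 are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: anchors[i] should match positives[i];
    every positives[j] with j != i serves as an in-batch negative."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching positive as the target class.
        total += log_sum_exp - logits[i]
    return total / n


def rex_objective(domain_risks: List[float], beta: float = 10.0) -> float:
    """Assumed V-REx-style objective: sum of per-domain risks plus beta * variance."""
    k = len(domain_risks)
    mean = sum(domain_risks) / k
    variance = sum((r - mean) ** 2 for r in domain_risks) / k
    return sum(domain_risks) + beta * variance


def risk_extrapolated_info_nce(
    domain_batches: List[Tuple[List[Vector], List[Vector]]],
    temperature: float = 0.1,
    beta: float = 10.0,
) -> float:
    """Hypothetical combination: one InfoNCE risk per training domain, then REx."""
    risks = [info_nce(a, p, temperature) for a, p in domain_batches]
    return rex_objective(risks, beta)
```

Penalizing the variance of the per-domain risks pushes the encoder toward representations whose contrastive loss is similar across training domains; with β = 0 the objective reduces to the plain sum of per-domain InfoNCE risks.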
