Object detection plays a vital role in autonomous driving systems: accurate detection of surrounding objects helps ensure that a vehicle drives safely. This paper proposes a category-assisted transformer object detector for autonomous driving, called DetectFormer, which achieves better accuracy than the baseline. Specifically, the ClassDecoder is assisted by proposal categories and by global information from a Global Extract Encoder (GEE) to improve category sensitivity and detection performance. This exploits the distribution of object categories in specific scene backgrounds and the connection between objects and the image context. Data augmentation is used to improve robustness, and an attention mechanism is added to the backbone network to extract channel-wise spatial features and directional information. Benchmark experiments show that the proposed method achieves better real-time detection performance in traffic scenes than RetinaNet and FCOS, reaching 97.6% AP50 and 91.4% AP75 on the BCTSDB dataset.
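The idea of conditioning per-box category scores on scene-level context can be illustrated with a minimal sketch. This is not the paper's ClassDecoder (which is a learned transformer component); it is a hypothetical, hand-written stand-in in which detector confidences are blended with an assumed scene-conditional category prior. All names and values below are illustrative assumptions.

```python
# Hypothetical sketch: re-weight per-box class scores with a
# scene-conditional category prior, in the spirit of using global
# scene context to sharpen category decisions. Not the paper's method.

def rescore(class_scores, scene_prior, alpha=0.5):
    """Blend detector class scores with a scene-level category prior.

    class_scores: dict class -> detector confidence in [0, 1]
    scene_prior:  dict class -> assumed P(class | scene), summing to 1
    alpha:        weight of the prior (0 = detector scores only)
    """
    fused = {c: (1 - alpha) * s + alpha * scene_prior.get(c, 0.0)
             for c, s in class_scores.items()}
    total = sum(fused.values()) or 1.0
    return {c: v / total for c, v in fused.items()}

# Example: in a highway scene where signs are a priori common,
# an ambiguous detection is nudged toward the sign class.
scores = {"traffic_sign": 0.45, "pedestrian": 0.40, "car": 0.15}
prior = {"traffic_sign": 0.6, "pedestrian": 0.1, "car": 0.3}
print(rescore(scores, prior))
```

With `alpha=0.5`, the prior breaks the near-tie between the sign and pedestrian hypotheses in favor of the class the scene makes more plausible.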
Traffic sign detection is an important component of autonomous vehicles. A mismatch remains between existing detection algorithms and their practical application in real traffic scenes, mainly due to limitations in detection accuracy and data acquisition. To tackle this problem, this study proposes an improved Sparse R-CNN that integrates a coordinate attention block with ResNeSt and builds a feature pyramid to modify the backbone, which focuses the extracted features on important information and improves detection accuracy. To obtain more diverse data, the augmentation method is specifically designed for complex traffic scenarios, and a new traffic sign dataset is also presented. For on-road autonomous vehicles, two modules, self-adaption augmentation (SAA) and detection time augmentation (DTA), are designed to improve the robustness of the detection algorithm. Evaluations on traffic sign datasets and on-road testing demonstrate the accuracy and effectiveness of the proposed method.
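One common form of detection-time augmentation is to run the detector on a horizontally flipped copy of the image and map those detections back into the original coordinate frame before merging. The sketch below illustrates only that coordinate mapping step; it is a generic assumption about how a DTA-style module could work, not the paper's actual SAA/DTA implementation, and the box format `(x1, y1, x2, y2, score)` is an assumed convention.

```python
# Generic sketch of one detection-time augmentation step: map boxes
# detected on a horizontally flipped image back to the original frame,
# then pool them with the original detections for downstream NMS/voting.

def unflip_box(box, img_w):
    """Map an (x1, y1, x2, y2, score) box from the horizontally
    flipped image back to the original image of width img_w."""
    x1, y1, x2, y2, score = box
    return (img_w - x2, y1, img_w - x1, y2, score)

def merge_tta_boxes(orig_boxes, flipped_boxes, img_w):
    """Combine detections from the original image with detections
    from its horizontal flip, mapped back into original coordinates."""
    return list(orig_boxes) + [unflip_box(b, img_w) for b in flipped_boxes]

# A box at x in [10, 30] in the flipped 100-px-wide image corresponds
# to x in [70, 90] in the original image.
print(unflip_box((10, 5, 30, 20, 0.9), 100))
```

The pooled list would then typically be deduplicated with non-maximum suppression or box voting.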
Vision-based object detection is an essential element of autonomous driving. Because vehicles typically have limited on-board computing resources, a small detection model is required; at the same time, high detection accuracy and real-time inference speed are required to ensure safety while driving. In this paper, an anchor-free lightweight object detector for autonomous driving called ALODAD is proposed. ALODAD incorporates an attention scheme into the lightweight neural network GhostNet and builds an anchor-free detection framework to lower computational cost and parameter count while maintaining high detection accuracy. Specifically, the lightweight backbone integrates a convolutional block attention module that extracts valuable features from traffic scene images to generate accurate bounding boxes, and feature pyramids are then constructed for multi-scale object detection. The proposed method adds an intersection-over-union (IoU) branch to the decoupled detection head to rank the large number of candidate detections accurately. To increase data diversity, data augmentation is used during training. Extensive benchmark experiments demonstrate that the proposed method improves on the baseline, achieving increased detection accuracy while meeting the real-time requirements of autonomous driving. Compared with the YOLOv5 and RetinaNet models, the proposed method obtains 98.7% AP50 and 94.5% AP75 on the BCTSDB dataset.
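The role of an IoU branch can be sketched in a few lines: alongside each classification score, the head predicts the localization quality (IoU with the ground truth), and the two are fused so that well-localized boxes rank higher. The geometric-mean fusion below is one common choice, shown here as an assumption rather than ALODAD's exact formulation.

```python
# Minimal sketch of IoU-aware ranking: fuse classification confidence
# with a predicted localization quality so that candidate detections
# with poor localization are demoted. The fusion rule is an assumption.

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def fused_score(cls_score, iou_pred, alpha=0.5):
    """Weighted geometric mean of classification score and predicted
    IoU; alpha controls how much localization quality matters."""
    return cls_score ** (1 - alpha) * iou_pred ** alpha

# A confidently classified but poorly localized box drops in the ranking.
print(fused_score(0.81, 0.25))
```

During NMS, ranking by `fused_score` instead of the raw classification score keeps tight boxes ahead of loose ones with similar class confidence.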
In the field of computer vision, training a well-performing model on a dataset with a long-tailed distribution is challenging. Image resampling is usually introduced as a simple and effective remedy. However, in instance segmentation an image may contain multiple classes, so image resampling alone cannot produce a sufficiently balanced distribution at the level of per-category instance counts. In this paper, we propose an improved instance segmentation method for long-tailed datasets based on Mask R-CNN. Specifically, an object-centric memory bank implements an object-centric storage strategy that addresses the category imbalance problem. In the testing phase, a post-processing calibration adjusts each class logit to change the confidence score, which raises the prediction scores of tail classes. A discrete cosine transform-based mask representation yields high-quality masks, improving segmentation accuracy. Evaluation on the LVIS dataset demonstrates the effectiveness of the proposed method, which improves AP by 2.2% over EQL.
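A standard form of post-hoc logit calibration for long-tailed data subtracts a term proportional to the log of each class's training frequency from its logit before the softmax, which boosts tail classes relative to head classes. The sketch below shows this generic recipe under that assumption; the paper's exact calibration rule may differ.

```python
# Generic sketch of post-processing logit calibration for long-tailed
# classification: subtract tau * log(class frequency) from each logit,
# then apply a numerically stable softmax. tau = 0 recovers the
# uncalibrated scores. The specific rule is an assumption, not the
# paper's exact formulation.
import math

def calibrated_softmax(logits, class_freq, tau=1.0):
    """Adjust logits by class frequency, then softmax.

    logits:     raw per-class scores
    class_freq: per-class training frequencies in (0, 1]
    tau:        calibration strength (0 disables calibration)
    """
    adj = [z - tau * math.log(f) for z, f in zip(logits, class_freq)]
    m = max(adj)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in adj]
    total = sum(exps)
    return [e / total for e in exps]

# Two classes with equal logits: the rare (tail) class, seen in 10% of
# instances, ends up with the higher calibrated score.
print(calibrated_softmax([2.0, 2.0], [0.9, 0.1]))
```

Because the adjustment depends only on class statistics, it can be applied at test time without retraining the detector.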