Recently, Transformer-based end-to-end detectors (DETRs) have made remarkable progress. However, their high computational cost still limits the DETR family's performance as real-time object detectors. To address this problem, we introduce Van-DETR, a model that enhances the first real-time end-to-end object detector, RT-DETR. Specifically, we introduce a new, lighter backbone, VanillaNet, in place of the original ResNet backbone. To compensate for VanillaNet's weak nonlinearity and limited local feature analysis, we combine large-kernel convolutions with small-kernel convolutions to integrate global and local information, significantly enhancing feature extraction. Second, in the hybrid encoder, we process the backbone features in cascaded groups and design a gated linear unit with a star-shaped connection for intra-scale feature interaction. In the cross-scale feature fusion stage, we propose a high-low frequency feature fusion module with strong feature representation capabilities. To verify the model's effectiveness, we conduct experiments on two public object detection datasets: the VisDrone dataset and a people-detection dataset from Roboflow. Experimental results show that the proposed Van-DETR model achieves mAP50 of 0.471 and 0.730 on the two datasets, respectively, improvements of 4.5% and 2.8% over the original RT-DETR model. Source code is available at https://github.com/vangoghzz/Van-DETR.
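The abstract above describes fusing large-kernel (global context) and small-kernel (local detail) convolution responses. The sketch below illustrates that idea in a minimal 1-D NumPy form; the kernel sizes (7 and 3) and the simple additive fusion are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def fused_conv(signal: np.ndarray, large_k: int = 7, small_k: int = 3) -> np.ndarray:
    """Combine a large-kernel (global) and small-kernel (local) response.

    Averaging kernels stand in for learned convolution weights; the real
    model fuses learned 2-D feature maps, not 1-D signals.
    """
    large = np.convolve(signal, np.ones(large_k) / large_k, mode="same")  # global context
    small = np.convolve(signal, np.ones(small_k) / small_k, mode="same")  # local detail
    return large + small  # simple additive fusion (one of several possible schemes)

x = np.arange(16, dtype=float)
y = fused_conv(x)  # same length as the input, interior values = 2 * x there
```

On a linear ramp both averaging branches reproduce the input away from the borders, so interior outputs are exactly twice the input value, which makes the fusion easy to sanity-check.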
Robust object detection and weather classification are essential for the safe operation of autonomous vehicles (AVs) in adverse weather. While existing research often treats these tasks separately, this paper proposes a novel multi-objective model that treats weather classification and object detection as a single problem using only the AV camera sensing system. Our model offers enhanced efficiency and potential performance gains by integrating image quality assessment, a Super-Resolution Generative Adversarial Network (SRGAN), and a modified version of You Only Look Once (YOLO) version 5. Additionally, leveraging the challenging Detection in Adverse Weather Nature (DAWN) dataset, which covers four types of severe weather, including the often-overlooked sandy conditions, we apply several augmentation techniques, expanding the dataset from 1027 to 2046 images. Furthermore, we optimize the YOLO architecture for robust detection of six object classes (car, cyclist, pedestrian, motorcycle, bus, truck) across adverse weather scenarios. Comprehensive experiments demonstrate the effectiveness of our approach, achieving a mean average precision (mAP) of 74.6% and underscoring the potential of this multi-objective model to significantly advance the perception capabilities of autonomous vehicle cameras in challenging environments.
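The abstract above outlines a staged pipeline: assess image quality, super-resolve degraded frames with SRGAN, then detect objects with a modified YOLOv5. The sketch below shows the control flow only; every function body, the quality threshold, and the dummy detections are placeholders assumed for illustration, not the paper's implementation.

```python
def assess_quality(image: dict) -> float:
    # Placeholder: a real system might use a no-reference IQA metric
    # such as BRISQUE, or a learned quality score.
    return image.get("quality", 1.0)

def super_resolve(image: dict) -> dict:
    # Placeholder for the SRGAN upscaling stage.
    return {**image, "quality": min(1.0, image["quality"] * 2.0), "upscaled": True}

def detect(image: dict) -> list:
    # Placeholder for the modified YOLOv5 detector; returns dummy labels.
    return ["car", "pedestrian"]

def pipeline(image: dict, quality_threshold: float = 0.5) -> list:
    """Run detection, super-resolving first if the frame looks degraded."""
    if assess_quality(image) < quality_threshold:
        image = super_resolve(image)
    return detect(image)

detections = pipeline({"quality": 0.3})  # low-quality frame: super-resolved, then detected
```

The design choice worth noting is that super-resolution is applied conditionally, so clean frames skip the expensive GAN stage.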