Both transformer-based and one-stage detectors have shown promising object detection results and have attracted increasing attention. However, effective domain-adaptive techniques for transformer and one-stage detectors have not yet been widely explored. In this paper, we investigate this issue and propose a novel improved You Only Look Once (YOLO) model based on a cross-attention strategy transformer, called CAST-YOLO. This detector is based on Teacher–Student knowledge transfer. We design a transformer encoder layer (TE-Layer) and a convolutional block attention module (CBAM) to capture global and rich contextual information. The detector then performs cross-domain object detection through knowledge distillation. Specifically, we propose a cross-attention strategy transformer to align domain-invariant features between the source and target domains. This strategy consists of three transformers with shared weights: a source branch, a target branch, and a cross branch. Feature alignment is performed via knowledge distillation, which improves knowledge transfer from the source domain to the target domain. This strategy also makes the model more robust to noisy input. Extensive experiments show that our method outperforms existing methods in foggy-weather adaptive detection, significantly improving detection results.
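The core idea of the cross branch can be illustrated with a minimal sketch: scaled dot-product attention in which the queries come from one domain and the keys/values from the other, with projection weights shared across branches. This is a simplified NumPy illustration under stated assumptions, not the authors' exact architecture; the weight matrices `w_q`, `w_k`, `w_v` and the feature shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Scaled dot-product attention. When `queries` and `keys_values`
    come from different domains, the output mixes information from
    both, which is the intuition behind a cross-domain branch."""
    Q = queries @ w_q
    K = keys_values @ w_k
    V = keys_values @ w_v
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
d = 8
# One set of projection weights, mirroring the shared-weight branches.
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
src = rng.normal(size=(4, d))   # source-domain features (hypothetical)
tgt = rng.normal(size=(4, d))   # target-domain features, e.g. foggy images

src_branch = cross_attention(src, src, w_q, w_k, w_v)    # source self-attention
cross_branch = cross_attention(src, tgt, w_q, w_k, w_v)  # cross-domain attention
```

Because all three branches reuse the same projections, gradients from the cross branch directly shape the representation used by the source and target branches, which is what encourages domain-invariant features.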
Three-dimensional (3D) object detection is vital to the environmental awareness task in autonomous driving scenarios. At present, the accuracy of 3D object detection still has significant room for improvement. In addition, a 3D point cloud is not uniformly distributed on a regular grid because of its disorder, dispersion, and sparseness. Convolutional neural network (CNN) strategies for 3D point-cloud feature extraction therefore suffer from potential information loss and wasted operations on empty regions. To address this, we propose a graph neural network (GNN) detector based on a neighbor feature alignment mechanism for 3D object detection in LiDAR point clouds. This method exploits the structural information of graphs, aggregating neighbor and edge features to update the state of each vertex during the iteration process. It reduces the offset error of the vertices and preserves the invariance of the point cloud in the spatial domain. Experiments on the KITTI public benchmark demonstrate that the proposed method achieves competitive results.
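A single message-passing iteration of the kind described above can be sketched as follows. This is an illustrative simplification, not the paper's exact update rule: edge messages are built from relative offsets between neighboring vertices (which keeps the update invariant to translations of the whole point cloud), aggregated with a permutation-invariant max, and added to the vertex state. The weight matrices `w_edge` and `w_self` and the toy graph are hypothetical.

```python
import numpy as np

def gnn_update(vertices, edges, w_edge, w_self):
    """One message-passing step: each vertex aggregates ReLU-transformed
    edge messages from its neighbors (max-pooling, permutation-invariant)
    and applies a residual update to its own state. Messages use relative
    offsets, so a global translation of the cloud shifts every updated
    state by the same amount."""
    new_states = vertices.copy()
    for i in range(vertices.shape[0]):
        nbrs = [j for (a, j) in edges if a == i]
        if not nbrs:
            continue
        msgs = [np.maximum((vertices[j] - vertices[i]) @ w_edge, 0.0)
                for j in nbrs]
        agg = np.max(msgs, axis=0)                       # aggregate neighbors
        new_states[i] = vertices[i] + np.maximum(agg @ w_self, 0.0)
    return new_states

rng = np.random.default_rng(1)
d = 4
verts = rng.normal(size=(5, d))                          # toy vertex features
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (3, 4), (4, 3)]  # directed neighbor pairs
w_edge = rng.normal(size=(d, d))
w_self = rng.normal(size=(d, d))
updated = gnn_update(verts, edges, w_edge, w_self)
```

Because messages depend only on offsets `vertices[j] - vertices[i]`, translating the whole cloud by a constant translates the output by the same constant, illustrating the spatial-invariance property the abstract refers to.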
The object detection task usually assumes that training and test samples obey the same distribution, an assumption that rarely holds in reality, which motivates the study of cross-domain object detection. Compared with image classification, the cross-domain object detection task presents a greater challenge, since it requires both accurate classification and accurate localization of samples in the target domain. The teacher–student framework (in which the student model is supervised by pseudo-labels from the teacher model) has produced large accuracy improvements in cross-domain object detection. Feature-level adversarial training is used in the student model, which encourages features in the source and target domains to share a similar distribution. However, the direction and gradient of the weights can be divided into domain-specific and domain-invariant components, and the purpose of domain adaptation is to focus on the domain-invariant features while eliminating interference from the domain-specific ones. Inspired by this, we propose a teacher–student framework named dual adaptive branch (DAB), which uses domain adversarial learning to address the domain distribution shift. Specifically, we ensure that the student model aligns domain-invariant features and suppresses domain-specific features in this process. We further validate our method on multiple domains. The experimental results demonstrate that our proposed method significantly improves the performance of cross-domain object detection and achieves competitive results on common benchmarks.
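The feature-level adversarial training mentioned above is commonly realized with a gradient reversal layer: an identity map in the forward pass whose gradient is negated in the backward pass, so the backbone learns features that *confuse* a domain classifier. The following NumPy sketch shows that mechanism in isolation; it is a generic illustration of domain-adversarial alignment, not the DAB implementation, and the toy classifier, features, and scaling factor `lam` are hypothetical.

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; flips and scales the gradient in the
    backward pass, so minimizing the domain-classification loss for the
    classifier simultaneously maximizes it for the feature extractor."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x

    def backward(self, grad_out):
        return -self.lam * grad_out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))          # pooled detector features (toy)
w = rng.normal(size=8)                   # toy domain-classifier weights
grl = GradientReversal(lam=0.5)

# Forward: predict P(domain = target) from reversed features.
p = sigmoid(grl.forward(feats) @ w)

# Backward: gradient of binary cross-entropy w.r.t. features
# (labels: 1 = target domain), then reversed before reaching the backbone.
labels = np.ones(4)
grad_feats = np.outer(p - labels, w)
grad_to_backbone = grl.backward(grad_feats)
```

The sign flip is the entire trick: the domain classifier descends its loss while the backbone ascends it, driving the source and target feature distributions together.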