SA-YOLOv3: an efficient and accurate object detector using self-attention mechanism for autonomous driving. IEEE Transactions on Intelligent Transportation Systems, 23(5), pp. 4099-4110.
Self-driving vehicles require extensive testing to prevent fatal accidents and to ensure their appropriate operation in the physical world. However, conducting vehicle tests on the road is difficult because such tests are expensive and labor intensive. In this study, we used an autonomous-driving simulator and investigated the three-dimensional environmental perception problem of the simulated system. Using the open-source CARLA simulator, we generated a dataset, CarlaSim, from virtual traffic scenarios, comprising 15,000 camera-LiDAR (Light Detection and Ranging) samples with annotations and calibration files. We then developed a Multi-Sensor Fusion Perception (MSFP) model that consumes the two-modal data and detects objects in the scenes. Furthermore, we conducted experiments on the KITTI and CarlaSim datasets; the results demonstrated the effectiveness of our proposed methods in terms of perception accuracy, inference efficiency, and generalization performance. The results of this study will facilitate the future development of simulation-based tests for autonomous driving.
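As a rough illustration of what one camera-LiDAR sample with annotations and calibration might look like, the sketch below defines a hypothetical container type; the field names and shapes are illustrative assumptions, not the paper's actual CarlaSim format.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class CarlaSimSample:
    """Hypothetical layout of one CarlaSim frame: an RGB image, a LiDAR
    point cloud, per-object 3-D box annotations, and a camera-LiDAR
    calibration matrix (names and shapes are illustrative only)."""
    image: np.ndarray          # H x W x 3 RGB frame from the camera
    points: np.ndarray         # N x 4 LiDAR points (x, y, z, intensity)
    boxes: list = field(default_factory=list)   # 3-D box annotations
    lidar_to_cam: np.ndarray = None             # 3 x 4 extrinsic calibration

sample = CarlaSimSample(
    image=np.zeros((375, 1242, 3), dtype=np.uint8),
    points=np.zeros((1000, 4), dtype=np.float32),
    lidar_to_cam=np.hstack([np.eye(3), np.zeros((3, 1))]),
)
```

A fusion model such as MSFP would consume both `sample.image` and `sample.points`, using the calibration matrix to relate the two modalities.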
Camera-LiDAR 3D object detection has been extensively investigated due to its significance for many real-world applications. However, it remains highly challenging to address the intrinsic data differences between the two modalities and to perform accurate cross-modal feature fusion. To these ends, we propose a two-stream architecture, termed CL3D, that integrates a point enhancement module and a point-guided fusion module with an IoU-aware head for cross-modal 3D object detection. Specifically, pseudo-LiDAR points are first generated from the RGB image, and a point enhancement module (PEM) is designed to enhance the raw LiDAR with these pseudo points. Moreover, a point-guided fusion module (PFM) is developed to establish image-point correspondences at different resolutions and to combine semantic and geometric features in a point-wise manner. We also investigate the inconsistency between localization confidence and classification score in 3D detection, and introduce an IoU-aware prediction head (IoU Head) for accurate box regression. Comprehensive experiments are conducted on the publicly available KITTI dataset; CL3D achieves outstanding detection performance compared with both single- and multi-modal 3D detectors, demonstrating its effectiveness and competitiveness.
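The core of point-wise cross-modal fusion can be sketched as projecting each LiDAR point into the image plane and gathering an image feature there. The snippet below is a minimal sketch under assumed conventions (a KITTI-style 3 x 4 projection matrix, nearest-pixel gathering); the paper's PFM presumably samples multi-resolution feature maps rather than this single-scale stand-in.

```python
import numpy as np

def project_points(points_xyz, P):
    """Project N x 3 LiDAR points into pixel coordinates using a 3 x 4
    projection matrix P (hypothetical KITTI-style calibration)."""
    homo = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])  # N x 4
    uvw = homo @ P.T                       # N x 3 homogeneous image coords
    return uvw[:, :2] / uvw[:, 2:3]        # perspective divide -> N x 2 (u, v)

def pointwise_fusion(points_xyz, point_feats, image_feats, P):
    """Gather a semantic image feature for every point (nearest pixel for
    brevity) and concatenate it with the point's geometric feature, i.e.
    semantic-with-geometric fusion in a point-wise manner."""
    H, W, _ = image_feats.shape
    uv = np.round(project_points(points_xyz, P)).astype(int)
    u = np.clip(uv[:, 0], 0, W - 1)        # clamp to valid pixel range
    v = np.clip(uv[:, 1], 0, H - 1)
    semantic = image_feats[v, u]           # N x C_img gathered features
    return np.concatenate([point_feats, semantic], axis=1)
```

With identity intrinsics, a point at depth 1 lands on the pixel whose coordinates equal its x and y, so the gathered feature is easy to verify by hand.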
Automated lane marking detection is essential for advanced driver assistance systems (ADAS) and pavement management work. However, prior research has mostly detected lane marking segments from a front-view image, which easily suffers from occlusion or noise disturbance. In this paper, we aim at accurate and robust lane marking detection from a top-view perspective, and propose a deep learning-based detector with an adaptive anchor scheme, referred to as A2-LMDet. On the one hand, it is an end-to-end framework that fuses feature extraction and object detection into a single deep convolutional neural network. On the other hand, the adaptive anchor scheme is designed by formulating a bilinear interpolation algorithm, and is used to guide specific anchor-box generation and informative feature extraction. To validate the proposed method, a newly built lane marking dataset containing 24,000 high-resolution laser imaging samples is further developed for a case study. Quantitative and qualitative results demonstrate that A2-LMDet achieves highly accurate performance with 0.9927 precision, 0.9612 recall, and a 0.9767 F1 score, outperforming other advanced methods by a considerable margin. Moreover, ablation analysis illustrates the effectiveness of the adaptive anchor scheme in enhancing feature representation and improving performance. We expect our work will help the development of related research.
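An adaptive anchor scheme must evaluate feature maps at the non-integer positions where anchors are placed, which is where bilinear interpolation comes in. The following is a minimal sketch of that building block alone, not the paper's full anchor-generation pipeline.

```python
import numpy as np

def bilinear(feat, x, y):
    """Bilinearly interpolate a 2-D feature map `feat` at continuous
    coordinates (x, y): blend the four surrounding grid values by their
    fractional distances. Coordinates are clamped at the border."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, feat.shape[1] - 1)
    y1 = min(y0 + 1, feat.shape[0] - 1)
    dx, dy = x - x0, y - y0
    top = feat[y0, x0] * (1 - dx) + feat[y0, x1] * dx   # blend along x, top row
    bot = feat[y1, x0] * (1 - dx) + feat[y1, x1] * dx   # blend along x, bottom row
    return top * (1 - dy) + bot * dy                    # blend along y
```

At a patch center such as (0.5, 0.5) the result is the mean of the four neighbors, which gives anchors a smooth, sub-pixel view of the feature map instead of snapping to grid cells.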
Prior convolution-based road crack detectors typically learn more abstract visual representations with increasing receptive field via an encoder-decoder architecture. Despite the promising accuracy, progressive spatial resolution reduction causes semantic feature blurring, leading to coarse and incontiguous distress detection. To these ends, an alternative sequence-to-sequence perspective with a transformer network, termed TransCrack, is introduced for road crack detection. Specifically, an image is decomposed into a grid of fixed-size crack patches, which are flattened and combined with position embeddings into a sequence. We further propose a pure transformer-based encoder with multi-head reduced self-attention modules and feed-forward networks to explicitly model long-range dependencies over the sequential input in a global receptive field. More importantly, a simple decoder with a cross-layer aggregation architecture is developed to combine global and local attention across different regions for detailed feature recovery and pixel-wise crack mask prediction. Empirical studies are conducted on three publicly available damage detection benchmarks. The proposed TransCrack achieves state-of-the-art performance over all counterparts by a substantial margin, and qualitative results further demonstrate its superiority in contiguous crack recognition and fine-grained profile extraction.
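The patch-to-sequence decomposition described above can be sketched in a few lines: cut the image into fixed-size patches, flatten each patch into a vector, and stack them as a sequence (a learned position embedding would then be added to each token; a placeholder is shown here). This is a generic ViT-style sketch, not TransCrack's exact implementation.

```python
import numpy as np

def patchify(image, patch):
    """Decompose an H x W single-channel image into a sequence of
    flattened fixed-size patches: (rows * cols) tokens of length
    patch * patch, in row-major patch order."""
    H, W = image.shape
    rows, cols = H // patch, W // patch
    seq = (image[:rows * patch, :cols * patch]     # crop to a whole grid
           .reshape(rows, patch, cols, patch)      # split both axes
           .swapaxes(1, 2)                         # group by patch
           .reshape(rows * cols, patch * patch))   # flatten each patch
    return seq

def add_position_embedding(seq):
    """Placeholder position signal: in practice this would be a learned
    embedding; here each token is simply offset by its index."""
    return seq + np.arange(seq.shape[0])[:, None]
```

For a 4 x 4 image and patch size 2, the first token holds the top-left 2 x 2 block in row-major order, which makes the layout easy to check.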
This article is part of the theme issue ‘Artificial intelligence in failure analysis of transportation infrastructure and materials’.