Accurate and reliable perception systems are essential for autonomous driving and robotics, and 3D object detection with multiple sensors is key to achieving them. Existing 3D detectors significantly improve accuracy by adopting a two-stage paradigm that relies solely on LiDAR point clouds for 3D proposal refinement. However, the sparsity of point clouds, particularly for faraway points, makes it difficult for a LiDAR-only refinement module to recognize and locate objects accurately. To address this issue, we propose FusionRCNN, a novel multi-modality two-stage approach that effectively and efficiently fuses point clouds and camera images within Regions of Interest (RoIs). FusionRCNN adaptively integrates sparse geometry information from LiDAR and dense texture information from the camera in a unified attention mechanism. Specifically, in the RoI extraction step, FusionRCNN first uses RoIPooling to obtain image features of a unified size and samples raw points within each proposal to obtain the point set. It then applies intra-modality self-attention to enhance domain-specific features, followed by a well-designed cross-attention that fuses information from the two modalities. FusionRCNN is fundamentally plug-and-play and supports different one-stage methods with almost no architectural changes. Extensive experiments on the KITTI and Waymo benchmarks demonstrate that our method significantly boosts the performance of popular detectors. Remarkably, FusionRCNN improves the strong SECOND baseline by 6.14% mAP on Waymo and outperforms competing two-stage approaches.
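The abstract does not give implementation details, but the core fusion step it describes is cross-attention from point features (queries) to RoI-pooled image features (keys/values). A minimal numpy sketch of single-head scaled dot-product cross-attention, with random weights and synthetic features standing in for the learned projections and real sensor data (all shapes and names here are illustrative assumptions, not the paper's actual architecture):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(point_feats, image_feats, d_k=32, seed=0):
    """Fuse per-RoI point features (queries) with image features (keys/values)
    via scaled dot-product cross-attention.
    point_feats: (N_pts, C), image_feats: (N_pix, C)."""
    rng = np.random.default_rng(seed)
    C = point_feats.shape[1]
    # Stand-ins for learned projection matrices, randomly initialized.
    Wq = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wk = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wv = rng.standard_normal((C, C)) / np.sqrt(C)
    Q = point_feats @ Wq                               # (N_pts, d_k)
    K = image_feats @ Wk                               # (N_pix, d_k)
    V = image_feats @ Wv                               # (N_pix, C)
    attn = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)    # (N_pts, N_pix)
    # Residual connection keeps the original LiDAR geometry features.
    return point_feats + attn @ V

pts = np.random.default_rng(1).standard_normal((16, 64))  # 16 sampled points per RoI
img = np.random.default_rng(2).standard_normal((49, 64))  # 7x7 RoI-pooled image cells
out = cross_attention(pts, img)
print(out.shape)  # (16, 64): one fused feature per point
```

The output keeps the point set's shape, so such a module can be dropped between a one-stage detector's proposals and the refinement head without architectural changes, which is what makes the design plug-and-play.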
Sound speed profiles (SSPs) strongly affect the accuracy of underwater localization and sonar ranging. In traditional SSP inversion, the sound intensity distribution used in normal-mode-theory-based matched field processing (MFP), and the multipath signal propagation times adopted in ray-theory-based MFP, are susceptible to boundary-parameter mismatch, which reduces inversion accuracy. Moreover, the heuristic algorithms introduced in MFP require many individuals and iterations to search for the optimal feature-representation coefficients after empirical orthogonal function (EOF) decomposition, which incurs extra computational time. In this paper, we propose a two-way interactive signal propagation time measurement method based on an autonomous underwater vehicle (AUV) and a horizontal linear array (HLA), and we use the propagation times of direct-arrival signals for shallow-water SSP inversion to avoid boundary-parameter mismatch. We further propose a joint artificial neural network (ANN) and ray-theory SSP inversion model that fits the nonlinear mapping from signal propagation times to the SSP; once this mapping is learned, inversion at the working stage requires little computation. To help the ANN learn the SSP distribution of a target region and ensure good inversion accuracy, we provide an empirical data selection strategy, and we propose a virtual SSP generation algorithm that aids ANN training when insufficient training data would otherwise cause under-fitting. Simulation results show that our approach provides reliable, near-instantaneous monitoring of the shallow-water SSP distribution.
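The EOF decomposition mentioned above is the standard way to compress an SSP into a few feature coefficients: the EOFs are the leading right singular vectors of a matrix of historical profile anomalies. A minimal numpy sketch with entirely synthetic profiles (the depth grid, noise level, and number of EOFs are illustrative assumptions, not values from the paper):

```python
import numpy as np

# Hypothetical historical data set: 20 profiles sampled at 50 depths.
rng = np.random.default_rng(0)
depths = np.linspace(0.0, 100.0, 50)                   # depth grid, m
base = 1500.0 + 0.016 * depths                         # crude background profile, m/s
ssps = base + 2.0 * rng.standard_normal((20, 50))      # synthetic "historical" SSPs

mean_ssp = ssps.mean(axis=0)
anomalies = ssps - mean_ssp
# EOFs = right singular vectors of the anomaly matrix, ordered by variance.
U, S, Vt = np.linalg.svd(anomalies, full_matrices=False)
n_eof = 5
eofs = Vt[:n_eof]                                      # (n_eof, 50), orthonormal rows

# Any profile is then represented by the mean plus n_eof coefficients;
# these coefficients are what MFP-style inversion searches for.
coeffs = anomalies[0] @ eofs.T                         # project one profile
recon = mean_ssp + coeffs @ eofs                       # low-dimensional reconstruction
rmse = np.sqrt(np.mean((recon - ssps[0]) ** 2))
print(coeffs.shape, round(float(rmse), 3))
```

Reducing the search to a handful of coefficients is what makes heuristic MFP searches tractable at all; the ANN approach in the abstract goes further by learning the mapping to the profile directly, so no per-query search is needed.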