Visual simultaneous localization and mapping (SLAM) algorithms struggle in complex underwater scenarios characterized by turbidity, dynamic elements, and low texture. In these conditions point features are unreliable and can degrade performance or even cause the system to fail. To overcome these issues, we use high-level object features because of their accuracy and robustness. In this paper, we introduce an effective object-level SLAM method that uses a stereo camera to improve the navigation robustness of autonomous underwater vehicles and to generate a detailed semantic map. The proposed approach integrates both point features and object features. We first detect 2D objects in the images using a state-of-the-art neural network. We then recover 3D objects, described by a general model, using multi-view geometry, and finally construct semantic landmarks. For object data association, we present an object matching method that exploits the characteristics of the stereo camera within a single stereo frame, together with a filter-based approach for tracking landmarks during odometry. Experiments were conducted on the KITTI dataset and on our own sequences collected in a pool and along a coast. The evaluation results indicate that the proposed method can improve the performance of ORB-SLAM2 in terms of both navigation robustness and mapping information in underwater scenarios.
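The abstract does not spell out how objects are matched within a stereo frame, so the following is only a minimal sketch of the general idea, assuming a rectified stereo pair and axis-aligned 2D detection boxes. In a rectified pair, corresponding points lie on the same image row, so a plausible left/right match must share the class label, have nearly equal vertical centres, and have positive disparity; depth then follows from the standard relation Z = f·B/d. The names `Box2D`, `match_stereo`, `depth_from_disparity`, the greedy strategy, the `max_dy` threshold, and the cost function are all hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Box2D:
    """Hypothetical 2D detection: class label plus pixel box corners."""
    label: str
    x1: float  # left edge
    y1: float  # top edge
    x2: float  # right edge
    y2: float  # bottom edge

    @property
    def cx(self) -> float:
        return 0.5 * (self.x1 + self.x2)

    @property
    def cy(self) -> float:
        return 0.5 * (self.y1 + self.y2)

    @property
    def height(self) -> float:
        return self.y2 - self.y1


def match_stereo(left: List[Box2D], right: List[Box2D],
                 max_dy: float = 5.0) -> List[Tuple[int, int]]:
    """Greedily pair left-image boxes with right-image boxes (rectified pair assumed).

    A candidate pair must share the label, lie on (almost) the same rows, and
    have positive disparity, i.e. the object sits further left in the right image.
    """
    used = set()
    pairs = []
    for i, lb in enumerate(left):
        best: Optional[int] = None
        best_cost = float("inf")
        for j, rb in enumerate(right):
            if j in used or rb.label != lb.label:
                continue
            dy = abs(lb.cy - rb.cy)
            disparity = lb.cx - rb.cx
            if dy > max_dy or disparity <= 0:
                continue  # violates the epipolar (same-row) or disparity constraint
            # Hypothetical cost: row misalignment plus box-height mismatch.
            cost = dy + abs(lb.height - rb.height)
            if cost < best_cost:
                best, best_cost = j, cost
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs


def depth_from_disparity(lb: Box2D, rb: Box2D,
                         focal_px: float, baseline_m: float) -> float:
    """Depth of the box centre from the rectified-stereo relation Z = f * B / d."""
    disparity = lb.cx - rb.cx
    return focal_px * baseline_m / disparity
```

For example, a left box centred at x = 120 px matched to a right box centred at x = 100 px gives a disparity of 20 px; with an assumed focal length of 700 px and baseline of 0.12 m, the estimated depth is about 4.2 m. A full system would also fit a 3D object model and track it over time, which this sketch does not attempt.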
For unmanned surface vehicles (USVs), perception and control are commonly performed on embedded devices with limited computing power. Sea surface object detection can provide sufficient perception information for USVs, but most detection algorithms have poor real-time performance on embedded devices. To achieve real-time object detection on the USV platform, this paper presents a lightweight object detection network based on YOLO v5. In our work, an improved ShuffleNet v2 with an attention mechanism is adopted as the backbone network for feature extraction. Depth-wise separable convolution modules are introduced to rebuild the neck network. In addition, the feature fusion module is optimized by changing the fusion method from Concat to Add. Experiments show that the proposed method runs at 32.64 frames per second (FPS) on an NVIDIA Jetson AGX Xavier and achieves a mean average precision (mAP) of 93.1% on our dataset and 93.9% on the Singapore Maritime Dataset. Moreover, the proposed network has only 25% of the parameters of YOLO v5n. The proposed network therefore achieves a better balance between speed and accuracy, making it more suitable for detecting sea surface objects on USVs.
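To illustrate why the two neck changes shrink the model, the sketch below counts convolution weights under simple assumptions: bias and batch-normalization parameters are ignored, and the channel width of 128 with a 3 × 3 kernel is an arbitrary illustrative choice, since the abstract gives no layer sizes. A depth-wise separable convolution replaces one k × k × C_in × C_out convolution with a per-channel k × k depth-wise step followed by a 1 × 1 point-wise step. Fusing two feature maps with Add keeps the channel count unchanged, whereas Concat doubles it, so any convolution that follows a Concat needs twice as many input weights. The function names are hypothetical, and the sketch does not reproduce the paper's reported 25% figure, which depends on the full architecture.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dw_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depth-wise k x k conv (one filter per input channel) + point-wise 1 x 1 conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def fused_channels(c_a: int, c_b: int, mode: str) -> int:
    """Channel count after fusing two feature maps of equal spatial size."""
    if mode == "concat":
        return c_a + c_b  # maps are stacked along the channel axis
    if mode == "add":
        if c_a != c_b:
            raise ValueError("element-wise Add needs equal channel counts")
        return c_a  # maps are summed element-wise; width is unchanged
    raise ValueError(f"unknown fusion mode: {mode}")


if __name__ == "__main__":
    c, k = 128, 3  # illustrative width and kernel size (assumptions)
    std = conv_params(c, c, k)
    sep = dw_separable_params(c, c, k)
    print(f"standard 3x3 conv:       {std} weights")
    print(f"depth-wise separable:    {sep} weights ({sep / std:.3f} of standard)")
    for mode in ("concat", "add"):
        width = fused_channels(c, c, mode)
        print(f"1x1 conv after {mode:6s}:  {conv_params(width, c, 1)} weights")
```

With these assumed sizes, the separable version needs roughly 1/C_out + 1/k² (about 12%) of the standard convolution's weights, and the 1 × 1 convolution after Add needs half the weights it would need after Concat.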