Geometric Constraint-Based and Improved YOLOv5 Semantic SLAM for Dynamic Scenes

Zhang, R.X.; Zhang, Xinguang

doi:10.3390/ijgi12060211

Cited by 7 publications

(5 citation statements)

References 34 publications


“…In YOLO-SLAM proposed by Wu et al [28], a lightweight object detection network, Darknet19-YOLOv3, is designed to generate the essential semantic information required by the SLAM system with low latency. Zhang and Zhang [29] employed ShuffleNetV2 to lightweight YOLOv5 and added a pyramid-shaped scene parsing network segmentation head, simultaneously achieving target detection and semantic segmentation functionalities. However, these methods based on deep learning typically necessitate prior information and training for specific dynamic objects, leading to insufficient generalization [20].…”

Section: SLAM Combines Deep Learning (mentioning)

confidence: 99%

SLM-SLAM: a visual SLAM system based on segmented large-scale model in dynamic scenes and zero-shot conditions

Zhu, Chen, Jiang et al. 2024, Meas. Sci. Technol.

In the real world, the presence of various dynamic objects can impact the localization accuracy of the majority of typical Visual Simultaneous Localization and Mapping (VSLAM) systems. Simultaneously, many dynamic VSLAM systems based on neural networks require pre-training for specific application scenarios. We present SLM-SLAM, the first VSLAM system that implements zero-shot processing of dynamic scenes. It achieves the capability to handle various dynamic objects without the necessity for pre-training, enabling straightforward adaptation to different application scenarios. Firstly, we designed an open-world semantic segmentation module based on a segmented large-scale model to acquire semantic information in the scene. Subsequently, we devised a label-based strategy for selecting feature points, jointly optimizing poses with the weighted labels provided by both semantic and geometric information. Finally, we refined the keyframe selection strategy of ORB-SLAM3 to prevent matching errors caused by an insufficient number of remaining static feature points in the scene. We conducted experiments on the TUM dataset, the KITTI dataset, and real world scenarios. The results indicate that in dynamic scenes, our SLM-SLAM significantly improves localization accuracy compared to ORB-SLAM3, and its performance is comparable to state-of-the-art dynamic VSLAM systems.


“…A number of approaches have also emerged to improve the performance of SLAM algorithms by improving deep learning networks. Zhang, R [17] used ShuffleNetV2 to improve the YOLOv5 network. Meanwhile, to achieve semantic extraction in the environment, the segmentation head of the pyramid scene analysis network is added to the head of the YOLOv5 network, giving the improved YOLOv5 network both target detection and semantic segmentation capabilities.…”

Section: Related Work 2.1 Visual SLAM Based on Deep Learning (mentioning)

confidence: 99%

DLD-SLAM: RGB-D Visual Simultaneous Localisation and Mapping in Indoor Dynamic Environments Based on Deep Learning

Yu, Wang, Yan et al. 2024, Remote Sensing

This work presents a novel RGB-D dynamic Simultaneous Localisation and Mapping (SLAM) method that improves the precision, stability, and efficiency of localisation while relying on lightweight deep learning in a dynamic environment compared to the traditional static feature-based visual SLAM algorithm. Based on ORB-SLAM3, the GCNv2-tiny network instead of the ORB method, improves the reliability of feature extraction and matching and the accuracy of position estimation; then, the semantic segmentation thread employs the lightweight YOLOv5s object detection algorithm based on the GSConv network combined with a depth image to determine potentially dynamic regions of the image. Finally, to guarantee that the static feature points are used for position estimation, dynamic probability is employed to determine the true dynamic feature points based on the optical flow, semantic labels, and the state in last frame. We have performed experiments on the TUM datasets to verify the feasibility of the algorithm. Compared with the classical dynamic visual SLAM algorithm, the experimental results demonstrate that the absolute trajectory error is greatly reduced in dynamic environments, and that the computing efficiency is improved by 31.54% compared with the real-time dynamic visual SLAM algorithm with close accuracy, demonstrating the superiority of DLD-SLAM in accuracy, stability, and efficiency.


“…Min et al [37] not only combined the semantic information obtained by YOLOv5 with epipolar geometry constraint, but also introduced blur filtering to solve the problem of blurred image motion caused by capturing rapidly moving objects. Zhang and Zhang [38] used ShuffleNetV2 to lighten the YOLOv5 network, thereby increasing network speed without compromising system accuracy. Song et al [39] employed the newer and faster YOLOv7 network to extract semantic information, which they closely integrated with geometric information to filter dynamic points, achieving high localization accuracy and robustness in both high and low dynamic environments.…”

Section: Deep Learning-based Visual SLAM in Dynamic Scenes (mentioning)

confidence: 99%

AFO-SLAM: an improved visual SLAM in dynamic scenes using acceleration of feature extraction and object detection

Wei, Deng, Wang et al. 2024, Meas. Sci. Technol.

In visual simultaneous localization and mapping (SLAM) systems, traditional methods often excel due to rigid environmental assumptions, but face challenges in dynamic environments. To address this, learning-based approaches have been introduced, but their expensive computing costs hinder real-time performance, especially on embedded mobile platforms. In this article, we propose a robust and real-time visual SLAM method towards dynamic environments using acceleration of feature extraction and object detection (AFO-SLAM). First, AFO-SLAM employs an independent object detection thread that utilizes YOLOv5 to extract semantic information and identify the bounding boxes of moving objects. To preserve the background points within these boxes, depth information is utilized to segment target foreground and background with only a single frame, with the points of the foreground area considered as dynamic points and then rejected. To optimize performance, CUDA program accelerates feature extraction preceding point removal. Finally, extensive evaluations are performed on both TUM RGB-D dataset and real scenes using a low-power embedded platform. Experimental results demonstrate that AFO-SLAM offers a balance between accuracy and real-time performance on embedded platforms, and enables the generation of dense point cloud maps in dynamic scenarios.
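As a rough illustration of the depth-based step described in the abstract above, the following Python sketch splits keypoints that fall inside a detector bounding box into foreground points (treated as dynamic and rejected) and background points (kept as static). It is not AFO-SLAM's implementation: the function name `split_box_by_depth`, the median-depth heuristic, and the `margin` parameter are assumptions made only for illustration.

```python
from statistics import median


def split_box_by_depth(depth, box, keypoints, margin=0.3):
    """Split keypoints into (static, dynamic) using one depth frame.

    depth     : 2D list of depths in metres, indexed depth[y][x]; 0 = invalid
    box       : (x_min, y_min, x_max, y_max), inclusive pixel bounds
    keypoints : list of (x, y) integer pixel coordinates
    margin    : hypothetical tolerance (metres) behind the object's depth
    """
    x0, y0, x1, y1 = box

    # Collect the valid depth readings inside the bounding box.
    samples = [
        depth[y][x]
        for y in range(y0, y1 + 1)
        for x in range(x0, x1 + 1)
        if depth[y][x] > 0
    ]
    if not samples:
        # No usable depth: keep every point rather than guess.
        return list(keypoints), []

    # Assumption: the detected object covers most of the box, so the
    # median depth inside the box approximates the object's depth.
    object_depth = median(samples)

    static, dynamic = [], []
    for x, y in keypoints:
        inside = x0 <= x <= x1 and y0 <= y <= y1
        d = depth[y][x]
        # Points in the box at or in front of the object depth (plus the
        # margin) count as foreground; everything else is kept as static.
        if inside and d > 0 and d <= object_depth + margin:
            dynamic.append((x, y))
        else:
            static.append((x, y))
    return static, dynamic


# Toy frame: an object at 1 m in columns 1-2, background at 3 m elsewhere.
depth = [[3.0, 1.0, 1.0, 3.0, 3.0] for _ in range(4)]
static, dynamic = split_box_by_depth(
    depth, box=(1, 0, 3, 3), keypoints=[(1, 1), (3, 2), (0, 0)]
)
```

In this toy frame the point on the object is rejected, while the background point inside the box and the point outside it are kept. This is the behaviour the abstract describes as preserving background points within the boxes.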
