MISD-SLAM: Multimodal Semantic SLAM for Dynamic Environments

You, Yingxuan; Peng, Wei; Cai, Jialun; Huang, Weibo; Kang, Risheng; Liu, Hong

doi:10.1155/2022/7600669

Cited by 17 publications

(11 citation statements)

References 34 publications

“…Wu et al [32] adopted the Darknet19-YOLOv3 network and depth difference with RANSAC to distinguish dynamic features in the detecting areas. You et al [33] used instance segmentation by YOLOCT++ to remove pure dynamic feature points, using the error of reprojection depth as a threshold.…”

Section: Combination Of Geometry and Learning Methods (mentioning)

confidence: 99%

SCE-SLAM: a real-time semantic RGBD SLAM system in dynamic scenes based on spatial coordinate error

Song, Zhang, Zhong et al. 2023

Meas. Sci. Technol.

Simultaneous Localization and Mapping (SLAM) is one of the prerequisite technologies for intelligent mobile robots to accomplish various tasks in unknown environments. In recent years, many excellent SLAM systems have emerged, but most of them have a basic assumption that the environment is static, which results in their poor performance in dynamic environments. To solve this problem, this paper presents SCE-SLAM: a novel real-time semantic RGB-D SLAM system that is built on the RGB-D mode of ORB-SLAM3. SCE-SLAM tightly combines semantic and geometric information. Considering the real-time requirements, the semantic module provides semantic prior knowledge for the geometric module using the latest and fastest object detection network YOLOv7. Then, a new geometric constraint method is proposed to filter dynamic feature points. This method takes full advantage of depth images and semantic information to recover three-dimensional feature points and the initial camera pose. A three-dimensional coordinate error is used as a threshold, and SCE-SLAM removes dynamic points using the K-means clustering algorithm. In this way, SCE-SLAM effectively reduces the impact of dynamic points. Furthermore, we validate SCE-SLAM with challenging dynamic sequences of the TUM dataset. The results demonstrate that SCE-SLAM significantly improves the localization accuracy and system robustness in all kinds of dynamic environments.
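The abstract above describes thresholding a three-dimensional coordinate error and clustering to separate dynamic points. The following is a minimal sketch of that general idea, not SCE-SLAM's implementation: it assumes each feature has a predicted 3D position (for example, from the previous frame and an initial pose) and an observed one (from the depth image), and it uses a plain two-cluster K-means on the scalar error. The function names and the `min_gap` parameter are hypothetical.

```python
import math


def coordinate_errors(predicted, observed):
    """Euclidean distance between each predicted and observed 3D point."""
    return [math.dist(p, q) for p, q in zip(predicted, observed)]


def two_means_1d(values, iterations=20):
    """Two-cluster K-means on scalars; returns (low_center, high_center)."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all errors identical: nothing to split
            break
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if new_lo == lo and new_hi == hi:  # converged
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def flag_dynamic(predicted, observed, min_gap=0.05):
    """Return one boolean per point; True marks a likely dynamic point.

    min_gap (in the same units as the coordinates) is an assumed guard:
    if the two cluster centers are closer than this, every point is
    treated as static.
    """
    errors = coordinate_errors(predicted, observed)
    lo, hi = two_means_1d(errors)
    if hi - lo < min_gap:
        return [False] * len(errors)
    return [abs(e - hi) < abs(e - lo) for e in errors]


if __name__ == "__main__":
    predicted = [(0, 0, 1), (1, 0, 1), (0, 1, 2), (1, 1, 2)]
    observed = [(0, 0, 1.01), (1, 0, 1.0), (0, 1, 2.02), (1.4, 1, 2.3)]
    print(flag_dynamic(predicted, observed))
```

In this toy input only the last point moves noticeably between prediction and observation, so it is the only one placed in the high-error cluster.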

“…This system combines the semantic segmentation network SegNet providing semantic priori information with motion feature point detection to filter out dynamic objects in each frame, thus improving the accuracy of pose estimation while building a semantic octree map to meet a wider range of needs. Literature [23] designs a novel multimodal semantic SLAM system, which uses instance segmentation networks to provide semantic knowledge of the surrounding environment, directly removes ORB features from predefined dynamic objects, and combines multi-view geometric constraints with a K-means clustering algorithm to remove undefined but moving pixels. Literature [24] uses the lightweight YOLOv3 to change the backbone network of the detection model from darknet-53 to darknet-19, speeding up the detection efficiency by providing necessary semantic information in a dynamic environment.…”

Section: Combining Deep Learning Methods For Extracting Dynamic Features (mentioning)

confidence: 99%

DIG-SLAM: an accurate RGB-D SLAM based on instance segmentation and geometric clustering for dynamic indoor scenes

Liang, Yuan, Kuang et al. 2023

Meas. Sci. Technol.

Simultaneous localization and mapping (SLAM) has emerged as a critical technology enabling robots to navigate in unknown environments, drawing extensive attention within the robotics research community. However, traditional visual SLAM ignores the presence of dynamic objects in indoor scenes, and point features on dynamic objects can lead to incorrect data correlation, making it difficult for traditional visual SLAM to accurately estimate the camera’s pose when objects in the scene are moving. Using only point features cannot fully extract geometric information in dynamic indoor scenes, reducing the system’s robustness. To solve this problem, we develop an RGB-D SLAM system called DIG-SLAM. Firstly, the objects’ contour regions are extracted using the YOLOv7 instance segmentation method, serving as a prerequisite for determining dynamic objects and constructing a semantic information map. Meanwhile, the line features are extracted using the line segment detector (LSD) algorithm, and the redundant line features are optimized via K-means clustering. Secondly, moving consistency checks combined with instance partitioning determine dynamic regions, and the point and line features of the dynamic regions are removed. Finally, the combination of static line features and point features optimizes the camera pose. Meanwhile, a static semantic octree map is created to provide richer and higher-level scene understanding and perception capabilities for robots or autonomous systems. The experimental results on the Technische Universität München (TUM) dataset show that the average absolute trajectory error of the developed DIG-SLAM is reduced by 28.68% compared with the dynamic semantic SLAM (DS-SLAM). Compared with other dynamic SLAM methods, the proposed system shows better camera pose estimation accuracy and robustness in dynamic indoor environments and better map building in real indoor scenes.

“…The development of multimodal representations of the environment using a single sensor has emerged as a crucial area of research in the field of SLAM [11]. Such representations can effectively capture complementary information from diverse aspects of the scene, enriching the overall understanding of the environment and leading to more accurate and robust perception systems [12]. By using a single sensor to derive multimodal data, the complexity of hardware integration can be minimized; which reduces the overall cost of the system, and alleviate calibration and synchronization issues that may arise when using multiple sensors [13].…”

Section: Problem Presentation (mentioning)

confidence: 99%

Real-time embedded large-scale place recognition for autonomous ground vehicles using a spatial descriptor

Chghaf, Rodríguez, Elouardi et al. 2023

Real-Time Processing of Image, Depth and Video Information 2023

Place recognition is a key task in an autonomous vehicle's Simultaneous Localization and Mapping (SLAM). The motion estimation is bound to drift over time due to cumulative errors. Fortunately, the correct identification of a revisited area provided by the place recognition module enables further optimizations that correct drifting errors if detected in real time. Place recognition based on structural information of the scene is more robust to luminosity changes that can lead to false detections in the case of feature-based descriptors. However, such approaches have mainly been investigated in the context of depth sensors. Inspired by a LiDAR-based descriptor [1], we extend this global geometric descriptor to structural information from a stereo vision system. Using this descriptor, we can achieve real-time place recognition by focusing on the structural appearance of the scene derived from a 3D vision system. First, we introduce the approach used to record the 3D structural information of the visible space based on stereo images. Then, we conduct a parametric optimization protocol for precise place recognition in a given environment. Our experiments on the KITTI dataset show that the proposed approach is comparable to state-of-the-art methods, all while being low-cost. We studied the algorithm's complexity to propose an optimized parallelization on GPU and SoC architectures. Performance evaluation on different hardware (GeForce RTX 3080, Jetson AGX Xavier, and Arria 10 SoC-FPGA) shows that the real-time requirements of an embedded system are met. Compared to a CPU implementation, processing times showed a speed-up between 4x and 16x, depending on the architecture.
