DOT: Dynamic Object Tracking for Visual SLAM

Ballester, Irene; Fontan, Alejandro; Civera, Javier; Strobl, Klaus H.; Triebel, Rudolph

doi:10.1109/icra48506.2021.9561452

Cited by 64 publications

(42 citation statements)

References 21 publications

(29 reference statements)


“…AcousticFusion [30] fuses sound source direction into the RGB-D image and thus removes the effect of dynamic obstacles on the multi-robot SLAM system, but the robustness will be reduced in the case of serious noise. DOT [31] combines instance segmentation and multi-view geometry to generate masks for dynamic objects in order to avoid such image areas in their optimizations, which reduces the rate at which segmentation should be done and reduces the computational needs with respect to the state of the art. FlowFusion [32] decouples dynamic pixels from static background pixels by comparing camera motion consistency, clustering dynamic pixel points, and removing them.…”

Section: B Dynamic SLAM (mentioning)

confidence: 99%

SIIS-SLAM: A Vision SLAM Based on Sequential Image Instance Segmentation

Zhang et al. 2023

IEEE Access

Simultaneous localization and mapping (SLAM) is a fundamental function of intelligent robots. To reduce the influence of dynamic objects on SLAM in dynamic environments, this study proposes a visual SLAM based on sequential image segmentation, referred to as SIIS-SLAM. Based on ORB-SLAM3, SIIS-SLAM integrates a sequential image instance segmentation module and an optical flow dynamic detection module. The sequential image segmentation module is designed to eliminate the effect of dynamic objects on the estimation of relative pose between sequential frames. Specifically, based on the coarse relative pose estimated by ORB-SLAM3 and the box coordinates of instances detected by Mask R-CNN, the sequential image segmentation module effectively improves the speed and accuracy of instance segmentation. Dynamic objects can be effectively detected by combining the instance segmentation results and the optical flow module. Filtering the feature points in dynamic objects can improve the accuracy and robustness of SLAM. Experimental results demonstrate that SIIS-SLAM achieves better accuracy in dynamic environments compared to ORB-SLAM3 and other advanced methods.


“…Recent works have attempted to handle dynamic changes in the environment, adopting one of two common strategies. The first is to specifically identify static structure classes and treat all potentially dynamic objects, usually extracted with an image-based semantic segmentation network such as Mask R-CNN [30], as outliers, ignoring them completely in localization and mapping [31,32,33,34]. Though this method has proven effective when a small number of fast-moving objects are present, it can fail when used in large, crowded environments, as only a small number of static background structures will remain after dynamic object pruning [35].…”

Section: Handling Of Dynamic Objects (mentioning)

confidence: 99%

POCD: Probabilistic Object-Level Change Detection and Volumetric Mapping in Semi-Static Scenes

Qian, Veronica, Yang et al. 2022

Preprint

Maintaining an up-to-date map to reflect recent changes in the scene is very important, particularly in situations involving repeated traversals by a robot operating in an environment over an extended period. Undetected changes may cause a deterioration in map quality, leading to poor localization, inefficient operations, and lost robots. Volumetric methods, such as truncated signed distance functions (TSDFs), have quickly gained traction due to their real-time production of a dense and detailed map, though map updating in scenes that change over time remains a challenge. We propose a framework that introduces a novel probabilistic object state representation to track object pose changes in semi-static scenes. The representation jointly models a stationarity score and a TSDF change measure for each object. A Bayesian update rule that incorporates both geometric and semantic information is derived to achieve consistent online map maintenance. To extensively evaluate our approach alongside the state-of-the-art, we release a novel real-world dataset in a warehouse environment. We also evaluate on the public ToyCar dataset. Our method outperforms state-of-the-art methods on the reconstruction quality of semi-static environments.


“…Beyond monocular SLAM, effective segmentation and tracking of dynamic objects can be achieved [1,4,27,32,39,68,78] with auxiliary depth data from stereo, RGB-D and LiDAR, which, however, is not generally available for in-the-wild captured videos. Thanks to the rapid development of deep learning on visual recognition, many works [3,5,85,89,93] tackle this problem by exploring the combination with object detection, semantic and instance segmentation. However, these methods are often restricted to pre-defined semantic classes.…”

Section: Related Work (mentioning)

confidence: 99%

ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild

Wang, Liu, Guo et al. 2022

Preprint

Estimating the pose of a moving camera from monocular video is a challenging problem, especially due to the presence of moving objects in dynamic environments, where the performance of existing camera pose estimation methods is susceptible to pixels that are not geometrically consistent. To tackle this challenge, we present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence initialized from pairwise optical flow. Our key idea is to optimize long-range video correspondence as dense point trajectories and use it to learn robust estimation of motion segmentation. A novel neural network architecture is proposed for processing irregular point trajectory data. Camera poses are then estimated and optimized with global bundle adjustment over the portion of long-range point trajectories that are classified as static. Experiments on the MPI Sintel dataset show that our system produces significantly more accurate camera trajectories compared to existing state-of-the-art methods. In addition, our method is able to retain reasonable accuracy of camera poses on fully static scenes, where it consistently outperforms strong state-of-the-art dense correspondence based methods with end-to-end deep learning, demonstrating the potential of dense indirect methods based on optical flow and point trajectories. As the point trajectory representation is general, we further present results and comparisons on in-the-wild monocular videos with complex motion of dynamic objects. Code is available at https://github.com/bytedance/particle-sfm.
