2021 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra48506.2021.9561452
DOT: Dynamic Object Tracking for Visual SLAM

Abstract (Fig. 1 caption): Top row: ORB-SLAM2 [1] tracks on KITTI [2] images. Middle row: ORB-SLAM2 tracks with DOT segmentation masks, which differentiate between moving and static objects. Bottom row: ORB-SLAM2 tracks using Detectron2 [3] segmentation masks, encoding all potentially dynamic objects. Note how DOT segments out actually moving objects (e.g., moving cars) while keeping static ones (e.g., parked cars).

Cited by 64 publications (42 citation statements)
References 21 publications (29 reference statements)
“…AcousticFusion [30] fuses the sound source direction into the RGB-D image and thus removes the effect of dynamic obstacles on a multi-robot SLAM system, but its robustness degrades under severe noise. DOT [31] combines instance segmentation with multi-view geometry to generate masks for dynamic objects and exclude those image regions from its optimizations, which lowers the rate at which segmentation must be run and reduces the computational cost with respect to the state of the art. FlowFusion [32] decouples dynamic pixels from static background pixels by clustering points according to their consistency with the camera motion and removing the dynamic ones.…”
Section: B. Dynamic SLAM
Mentioning confidence: 99%
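The mask-based pruning this statement attributes to DOT [31] can be illustrated with a minimal sketch: keypoints that fall inside a dynamic-object mask are discarded before tracking and optimization. The helper name and the NumPy representation below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def filter_dynamic_keypoints(keypoints, dynamic_mask):
    """Drop keypoints that fall inside a dynamic-object mask.

    keypoints    : (N, 2) array of (x, y) pixel coordinates
    dynamic_mask : (H, W) boolean array, True where a moving object was segmented
    """
    xs = keypoints[:, 0].astype(int)
    ys = keypoints[:, 1].astype(int)
    keep = ~dynamic_mask[ys, xs]        # keep only points on static regions
    return keypoints[keep]

# Toy example: a 4x4 image whose left half is marked as dynamic
mask = np.zeros((4, 4), dtype=bool)
mask[:, :2] = True
kps = np.array([[0, 0], [3, 3], [1, 2]])
static = filter_dynamic_keypoints(kps, mask)   # only (3, 3) survives
```

In a real pipeline the mask would come from an instance-segmentation network (e.g., Detectron2), refined by the geometric motion check that distinguishes DOT from blanket class removal.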
“…Recent works have attempted to handle dynamic changes in the environment, adopting one of two common strategies. The first is to specifically identify static structure classes and treat all potentially dynamic objects, usually extracted with an image-based semantic segmentation network such as Mask R-CNN [30], as outliers, ignoring them completely in localization and mapping [31,32,33,34]. Though this method has proven effective when a small number of fast-moving objects are present, it can fail when used in large, crowded environments, as only a small number of static background structures will remain after dynamic object pruning [35].…”
Section: Handling of Dynamic Objects
Mentioning confidence: 99%
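The distinction this statement draws, between discarding every potentially dynamic class and checking whether an instance actually moves, can be sketched with a standard geometric test: matches on static structure satisfy the epipolar constraint induced by the camera's ego-motion, while matches on a moving object violate it. The function names and threshold below are illustrative assumptions, not taken from any of the cited systems.

```python
import numpy as np

def epipolar_residuals(F, pts1, pts2):
    """Symmetric epipolar distance for matched points (pixel coords, (N, 2)).

    F is the fundamental matrix induced by the camera's ego-motion; matches
    on static structure give residuals near zero. Degenerate matches at the
    epipole (zero denominator) are not handled in this sketch.
    """
    ones = np.ones((len(pts1), 1))
    x1 = np.hstack([pts1, ones])        # homogeneous coordinates
    x2 = np.hstack([pts2, ones])
    Fx1 = x1 @ F.T                      # epipolar lines in image 2
    Ftx2 = x2 @ F                       # epipolar lines in image 1
    num = np.sum(x2 * Fx1, axis=1) ** 2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return num / den

def is_moving(F, pts1, pts2, thresh=1.0):
    """Label a segmented instance as moving if its matches violate the
    epipolar constraint (median residual above an illustrative threshold)."""
    return np.median(epipolar_residuals(F, pts1, pts2)) > thresh

# Toy geometry: pure x-translation with identity intrinsics gives an F
# whose epipolar lines are horizontal, so static matches keep the same y.
F = np.array([[0., 0.,  0.],
              [0., 0., -1.],
              [0., 1.,  0.]])
static_flag = is_moving(F, np.array([[1., 2.]]), np.array([[3., 2.]]))  # same y
moving_flag = is_moving(F, np.array([[1., 2.]]), np.array([[1., 5.]]))  # y changed
```

Applying such a per-instance motion check keeps parked cars as useful static structure, which is exactly what the class-based pruning criticized above throws away.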
“…Beyond monocular SLAM, effective segmentation and tracking of dynamic objects can be achieved [1,4,27,32,39,68,78] with auxiliary depth data from stereo, RGB-D and LiDAR, which, however, is not generally available for in-the-wild captured videos. Thanks to the rapid development of deep learning for visual recognition, many works [3,5,85,89,93] tackle this problem by combining it with object detection and semantic and instance segmentation. However, these methods are often restricted to pre-defined semantic classes.…”
Section: Related Work
Mentioning confidence: 99%