2022
DOI: 10.48550/arxiv.2209.07919
Preprint

iDF-SLAM: End-to-End RGB-D SLAM with Neural Implicit Mapping and Deep Feature Tracking

Abstract: We propose a novel end-to-end RGB-D SLAM, iDF-SLAM, which adopts a feature-based deep neural tracker as the front-end and a NeRF-style neural implicit mapper as the back-end. The neural implicit mapper is trained on-the-fly, while the neural tracker is pretrained on the ScanNet dataset and finetuned along with the training of the neural implicit mapper. Under such a design, our iDF-SLAM is capable of learning to use scene-specific features for camera tracking, thus enabling lifelong learning of …
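The abstract describes a two-part architecture: a pretrained, finetunable tracker front-end feeding an implicit mapper that is optimized online, with gradients shared between the two. Below is a minimal, hypothetical PyTorch sketch of that joint loop; every name and shape here (NeuralTracker, ImplicitMapper, the toy SDF loss, the fake frames) is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch of a joint tracking/mapping loop in the style the
# abstract describes. All module names, dimensions, and losses are assumed.
import torch
import torch.nn as nn

class NeuralTracker(nn.Module):
    """Feature-based front-end: regresses a relative pose from frame features."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(4, feat_dim), nn.ReLU())
        self.pose_head = nn.Linear(feat_dim, 6)  # se(3) twist vector

    def forward(self, points):
        return self.pose_head(self.encoder(points).mean(dim=0))

class ImplicitMapper(nn.Module):
    """NeRF-style back-end: MLP mapping 3-D points to an SDF value + color."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, 4))  # (sdf, r, g, b)

    def forward(self, xyz):
        return self.mlp(xyz)

tracker = NeuralTracker()   # pretrained on ScanNet in the paper
mapper = ImplicitMapper()   # trained on-the-fly
opt = torch.optim.Adam(
    list(tracker.parameters()) + list(mapper.parameters()), lr=1e-3)

for step in range(100):                  # stand-in for the RGB-D stream
    frame = torch.randn(256, 4)          # fake (x, y, z, intensity) samples
    twist = tracker(frame)               # front-end pose estimate
    xyz = frame[:, :3] + twist[:3]       # crude warp into the world frame
    pred = mapper(xyz)
    loss = pred[:, 0].pow(2).mean()      # surface samples should have sdf = 0
    opt.zero_grad()
    loss.backward()                      # gradients flow into both networks:
    opt.step()                           # the tracker is finetuned jointly
```

The point the sketch illustrates is that a single backward pass updates both networks, which is how the tracker can pick up scene-specific features during operation, as the abstract claims.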

Cited by 3 publications (4 citation statements)
References 27 publications (52 reference statements)
“…For example, ScanNet [55] and NYU-Dv2 [56] have RGBD images with semantic segmentation labels, and are commonly used to train deep-learning systems to integrate semantic information into navigation. Whilst NYU-Dv2 lacks the ground-truth camera poses needed for vSLAM benchmarking, ScanNet does have this ground-truth data and has started seeing usage for evaluation purposes, especially for neural implicit SLAM [57,58].…”
Section: Visual SLAM Datasets (mentioning; confidence: 99%)
“…scenes, while NICE-SLAM can scale up to much larger indoor environments by applying hierarchical feature grids and tiny MLPs as the scene representation. Many follow-up works improve upon these two works from various perspectives, including efficient scene representation [18,21], fast optimization [67], the addition of IMU measurements [25], and different shape representations [39,30]. However, all of them require RGB-D inputs, which limits their applications in outdoor scenes or when only RGB sensors are available.…”
Section: Input RGB Stream (mentioning; confidence: 99%)
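The excerpt above credits NICE-SLAM's scalability to hierarchical feature grids decoded by tiny MLPs. A minimal sketch of that representation follows, assuming toy grid resolutions and feature dimensions; none of these values come from the cited paper.

```python
# Hypothetical "hierarchical feature grids + tiny MLP" scene representation.
# Grid sizes, feature dims, and the occupancy output are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalGrid(nn.Module):
    def __init__(self, dims=(8, 4), resolutions=(16, 64)):
        super().__init__()
        # Coarse and fine 3-D feature volumes, stored as (1, C, D, H, W).
        self.grids = nn.ParameterList(
            nn.Parameter(0.01 * torch.randn(1, c, r, r, r))
            for c, r in zip(dims, resolutions))
        self.decoder = nn.Sequential(          # the "tiny MLP" decoder
            nn.Linear(sum(dims), 32), nn.ReLU(),
            nn.Linear(32, 1))                  # e.g. an occupancy logit

    def forward(self, xyz):                    # xyz in [-1, 1]^3, shape (N, 3)
        coords = xyz.view(1, -1, 1, 1, 3)      # grid_sample expects 5-D coords
        feats = [
            F.grid_sample(g, coords, align_corners=True)
             .view(g.shape[1], -1).t()         # -> (N, C) per pyramid level
            for g in self.grids]
        return self.decoder(torch.cat(feats, dim=-1))

model = HierarchicalGrid()
occ = model(torch.rand(1024, 3) * 2 - 1)       # query 1024 random points
print(occ.shape)                               # torch.Size([1024, 1])
```

The design choice the excerpt highlights is that the learnable capacity lives in the interpolated grids rather than in one large MLP, which is what lets the representation scale to larger indoor environments.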
“…NICE-SLAM [76] introduces a hierarchical implicit encoding to perform mapping and camera tracking in much larger indoor scenes. Although follow-up works [67,30,21,18,25,39] try to improve upon NICE-SLAM and iMAP from different perspectives, all of these works still rely on the reliable depth input from RGB-D sensors.…”
Section: Introduction (mentioning; confidence: 99%)
“…Compared to iMAP, it updates only the visible grid features at each step, effectively solving the forgetting problem. Subsequent works have made further improvements, including the integration with traditional voxel grids [34] and different shape representations [35], [36]. In contrast to these methods that focus on dense reconstruction of the scene, our approach emphasizes object instances with semantic meaning.…”
Section: B. NeRFs and NeRF-based SLAM (mentioning; confidence: 99%)
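The excerpt above notes that updating only the visible grid features at each step counters the forgetting problem. One simple way to realize that idea is to mask gradients outside the current view; the snippet below is a hedged illustration with a hand-picked visibility mask standing in for the cited method's actual frustum test.

```python
# Hypothetical illustration of "update only the visible grid features":
# zero the gradients of voxels not observed by the current frame so that
# optimizing the new view cannot overwrite previously mapped regions.
import torch

grid = torch.nn.Parameter(torch.zeros(32, 32, 32))   # toy feature grid
opt = torch.optim.SGD([grid], lr=0.1)

# Assume some visibility test marked these voxels as seen by this frame.
visible = torch.zeros_like(grid, dtype=torch.bool)
visible[10:20, 10:20, 10:20] = True

loss = (grid - 1.0).pow(2).sum()   # stand-in reconstruction loss
opt.zero_grad()
loss.backward()
grid.grad[~visible] = 0.0          # freeze unobserved voxels this step
opt.step()

print(grid[15, 15, 15].item())     # updated (visible): 0.2
print(grid[0, 0, 0].item())        # unchanged (not visible): 0.0
```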