2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00987
VITAMIN-E: VIsual Tracking and MappINg With Extremely Dense Feature Points

Abstract: In this paper, we propose a novel indirect monocular SLAM algorithm called "VITAMIN-E," which is highly accurate and robust as a result of tracking extremely dense feature points. Typical indirect methods have difficulty in reconstructing dense geometry because of their careful feature point selection for accurate matching. Unlike conventional methods, the proposed method processes an enormous number of feature points by tracking the local extrema of curvature informed by dominant flow estimation. Because this…
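The curvature-extrema tracking described in the abstract can be sketched as follows. This is an illustrative approximation, not the authors' implementation: the function name, the choice of Sobel derivatives, and the fixed-window maximum filter are all assumptions. Candidate feature points are taken as local maxima of the level-curve curvature numerator κ = f_y²f_xx − 2f_x f_y f_xy + f_x²f_yy.

```python
import numpy as np
from scipy.ndimage import maximum_filter, sobel

def curvature_extrema(img, window=5):
    """Detect local maxima of image curvature (a sketch of
    dense feature extraction in the VITAMIN-E style; details assumed)."""
    f = img.astype(np.float64)
    # First derivatives
    fx = sobel(f, axis=1)
    fy = sobel(f, axis=0)
    # Second derivatives
    fxx = sobel(fx, axis=1)
    fyy = sobel(fy, axis=0)
    fxy = sobel(fx, axis=0)
    # Numerator of the level-curve curvature
    kappa = fy**2 * fxx - 2.0 * fx * fy * fxy + fx**2 * fyy
    # Keep pixels that are local maxima of |kappa| within a window
    mag = np.abs(kappa)
    peaks = (mag == maximum_filter(mag, size=window)) & (mag > 0)
    ys, xs = np.nonzero(peaks)
    return np.stack([xs, ys], axis=1)  # (N, 2) pixel coordinates

# Example: extrema of a synthetic Gaussian blob
yy, xx = np.mgrid[0:64, 0:64]
blob = np.exp(-((xx - 32.0)**2 + (yy - 20.0)**2) / 30.0)
pts = curvature_extrema(blob)
```

Because every local curvature maximum is kept rather than a carefully filtered subset, the number of candidates per frame is far larger than in typical indirect methods, which is the property the paper exploits.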

Cited by 32 publications (16 citation statements)
References 34 publications (30 reference statements)
“…Recent work investigates CPU-based approaches in combination with RGB-D sensing (e.g., Wald et al, 2018), PanopticFusion (Narita et al, 2019), and Voxblox++ (Grinvald et al, 2019). A sparser set of contributions addresses other sensing modalities, including monocular cameras (e.g., CNN-SLAM (Tateno et al, 2017), VSO (Lianos et al, 2018), VITAMIN-E (Yokozuka et al, 2019), and XIVO (Dong et al, 2017)) and lidar (Behley et al, 2019; Dubé et al, 2018).…”
Section: Related Work
confidence: 99%
“…Approaches in SLAM can generally be classified by the type of sensor used to perceive the environment. LiDAR SLAM relies on LiDAR sensors at its core, while visual SLAM uses cameras as the main sensor, whether monocular [2]-[6], stereo [7]-[9], or RGB-D [8], [10], [11]. Monocular approaches, although widely adopted for their simple and economical setup, struggle to recover metric scale; integrating an IMU (either loosely coupled or tightly coupled), yielding so-called visual-inertial methods, resolves the scale ambiguity and provides more robust navigation.…”
Section: Related Work
confidence: 99%
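The monocular scale ambiguity noted above can be verified numerically: scaling the scene points and the camera translation by the same factor leaves every pinhole projection unchanged, so no purely monocular observation can determine metric scale. A minimal sketch (all numeric values illustrative):

```python
import numpy as np

def project(points, R, t, K):
    """Pinhole projection of 3-D world points under camera pose (R, t)."""
    cam = points @ R.T + t          # world -> camera frame
    uv = cam @ K.T                  # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]   # perspective divide

K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0., 0., 1.]])
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])
pts3d = np.array([[0.0, 0.0, 5.0],
                  [1.0, -0.5, 4.0]])

s = 3.7  # any global scale factor
uv1 = project(pts3d, R, t, K)
uv2 = project(s * pts3d, R, s * t, K)  # scaled scene, scaled translation
```

Since `s * pts3d @ R.T + s * t = s * (pts3d @ R.T + t)`, the factor `s` cancels in the perspective divide, which is exactly why an IMU (providing metric accelerations) is needed to fix the scale.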
“…Owing to memory-management issues [65] with AutoStitch 5 and OpenPano 6 , we could only compare our system against Agisoft Photoscan 7 (AS) and Context Capture 8 (CC). We use the same dataset as in the pose estimation evaluation, and the computational results are presented in Tab.…”
Section: Pose Estimation Evaluation
confidence: 99%
“…Bundle Adjustment [112], consisting of the joint optimization of a set of camera poses and points, is frequently used to obtain a globally consistent model of the scene [79]. However, several recent VSLAM approaches alternate the optimization between points and poses, reducing the computational cost with only a small impact on accuracy given a sufficient number of points [84,94,120,123]. In its most basic form, the map model consists of a set of n points and m RGB-D keyframes.…”
Section: Point-based Mapping
confidence: 99%
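The alternating point/pose optimization mentioned above can be illustrated with a toy 1-D analogue: holding the points fixed, each pose has a closed-form least-squares update, and vice versa, so the two sets are updated in turn until the residual stops improving. This is a schematic of the alternation strategy only; the 1-D observation model and variable names are assumptions, not any cited system.

```python
import numpy as np

rng = np.random.default_rng(0)
n_poses, n_points = 4, 10
true_p = rng.normal(size=n_poses)    # ground-truth 1-D "poses"
true_x = rng.normal(size=n_points)   # ground-truth 1-D "points"
# z[i, j]: observation of point j from pose i (toy model z = x - p, plus noise)
z = true_x[None, :] - true_p[:, None] + 0.01 * rng.normal(size=(n_poses, n_points))

p = np.zeros(n_poses)   # pose estimates
x = z[0].copy()         # point estimates, seeded from the first pose

def residual(p, x):
    """Mean squared reprojection-style error of the current estimates."""
    return np.mean((z - (x[None, :] - p[:, None]))**2)

for _ in range(20):
    # Step 1: poses from fixed points (closed-form least squares)
    p = np.mean(x[None, :] - z, axis=1)
    p -= p[0]           # fix the gauge freedom: anchor the first pose at 0
    # Step 2: points from fixed poses (closed-form least squares)
    x = np.mean(z + p[:, None], axis=0)

final_error = residual(p, x)
```

Each half-step solves a small independent problem instead of one large joint system, which is the source of the cost reduction; full Bundle Adjustment would instead optimize `p` and `x` jointly.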