Abstract: The advent of affordable consumer-grade RGB-D cameras has brought about a profound advancement of visual scene reconstruction methods. Both computer graphics and computer vision researchers have spent significant effort developing entirely new algorithms to capture comprehensive shape models of static and dynamic scenes with RGB-D cameras. This has led to significant advances in the state of the art along several dimensions: some methods achieve very high reconstruction detail despite limited sensor resolution; others…
“…Regarding visual SLAM, many open-source approaches exist but not many can be easily used on a robot (consult Zollhöfer et al. (2018) for a review of 3D-reconstruction-focused approaches). For navigation, to avoid dealing with scale ambiguities, we limit our review to approaches able to estimate the real scale of the environment while mapping (e.g., with stereo and RGB-D cameras or with visual–inertial odometry), thus excluding structure-from-motion and monocular SLAM approaches like parallel tracking and mapping (PTAM) (Klein & Murray, 2007), semi-direct visual odometry (SVO) (Forster, Pizzoli, & Scaramuzza, 2014), REgularized MOnocular Depth Estimation (REMODE) (Pizzoli, Forster, & Scaramuzza, 2014), DT-SLAM (Herrera, Kim, Kannala, Pulli, & Heikkilä, 2014), large-scale direct monocular SLAM (LSD-SLAM) (Engel, Schöps, & Cremers, 2014), or oriented FAST and rotated BRIEF (ORB)-SLAM (Mur-Artal, Montiel, & Tardós, 2015).…”
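The scale argument in this excerpt can be made concrete: a calibrated stereo or RGB-D setup recovers metric depth directly (for stereo, from disparity and the known baseline), while a monocular reconstruction is only defined up to an unknown global scale factor. A minimal sketch, using KITTI-like camera parameters purely for illustration (not code from the cited papers):

# Why stereo/RGB-D SLAM recovers metric scale while monocular SLAM cannot:
# with a calibrated stereo rig, depth follows from disparity and the known
# baseline; a monocular reconstruction is defined only up to a scale s > 0.

fx = 718.856        # focal length in pixels (KITTI-like intrinsics)
baseline = 0.54     # stereo baseline in meters, known from calibration

def stereo_depth(disparity_px: float) -> float:
    """Metric depth (meters) from stereo disparity: Z = fx * b / d."""
    return fx * baseline / disparity_px

z = stereo_depth(disparity_px=20.0)
print(f"metric depth: {z:.2f} m")   # ~19.41 m, in real-world units
# A monocular pipeline yields the same geometry scaled by an unknown s:
for s in (0.5, 1.0, 2.0):
    print(f"monocular candidate: {s * z:.2f} m (scale s={s} unresolved)")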
Section: Popular SLAM Approaches Available on ROS
Distributed as an open-source library since 2013, real-time appearance-based mapping (RTAB-Map) started as an appearance-based loop closure detection approach with memory management to deal with large-scale and long-term online operation. It then grew to implement simultaneous localization and mapping (SLAM) on various robots and mobile platforms. As each application brings its own set of constraints on sensors, processing capabilities, and locomotion, it raises the question of which SLAM approach is the most appropriate to use in terms of cost, accuracy, computation power, and ease of integration. Since most SLAM approaches are either visual- or lidar-based, comparison is difficult. Therefore, we decided to extend RTAB-Map to support both visual and lidar SLAM, providing in one package a tool that allows users to implement and compare a variety of 3D and 2D solutions for a wide range of applications with different robots and sensors. This paper presents this extended version of RTAB-Map and its use in comparing, both quantitatively and qualitatively, a large selection of popular real-world datasets (e.g., KITTI, EuRoC, TUM RGB-D, MIT Stata Center on the PR2 robot), outlining the strengths and limitations of visual and lidar SLAM configurations from a practical perspective for autonomous navigation applications. Keywords: perception, position estimation, SLAM
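The quantitative comparison mentioned in this abstract is typically reported as absolute trajectory error (ATE) against dataset ground truth. A minimal sketch of ATE-RMSE with closed-form rigid alignment, assuming the two trajectories are already time-associated (numpy only; this is the standard benchmark metric, not RTAB-Map's own evaluation code):

import numpy as np

def ate_rmse(est: np.ndarray, gt: np.ndarray) -> float:
    """ATE-RMSE between time-associated (N, 3) trajectories after
    closed-form rigid alignment (rotation + translation, no scale)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                          # best-fit rotation
    t = mu_g - R @ mu_e                         # best-fit translation
    err = (R @ est.T).T + t - gt                # residuals after alignment
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))

# Sanity check: a rotated/translated copy of a path should give ~0 error.
gt = np.cumsum(np.random.randn(100, 3) * 0.1, axis=0)
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
est = (Rz @ gt.T).T + np.array([1.0, -2.0, 0.5])
print(f"ATE-RMSE: {ate_rmse(est, gt):.6f} m")   # ~ 0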
“…As universal 3D representations, 3D point clouds "can represent almost any type of physical object, site, landscape, geographic region, or infrastructure - at all scales and with any precision," as stated by Richter (2018), who discusses algorithms and data structures for out-of-core processing, analysis, and classification of 3D point clouds. 3D point clouds can be acquired with various technologies, including airborne or terrestrial laser scanning, mobile mapping, RGB-D cameras (Zollhöfer et al. 2018), image matching, or multi-beam echo sounding.…”
Artificial intelligence (AI) is fundamentally changing the way IT solutions are implemented and operated across all application domains, including the geospatial domain. This contribution outlines AI-based techniques for 3D point clouds and geospatial digital twins as generic components of geospatial AI. First, we briefly reflect on the term "AI" and outline the technology developments needed to apply AI to IT solutions, seen from a software engineering perspective. Next, we characterize 3D point clouds as a key category of geodata and describe their role in creating the basis for geospatial digital twins; we explain the feasibility of machine learning (ML) and deep learning (DL) approaches for 3D point clouds. In particular, we argue that 3D point clouds can be seen as a corpus with similar properties to natural language corpora and formulate a "Naturalness Hypothesis" for 3D point clouds. In the main part, we introduce a workflow for interpreting 3D point clouds based on ML/DL approaches that derives domain-specific and application-specific semantics for 3D point clouds without having to create explicit spatial 3D models or explicit rule sets. Finally, we show examples of how ML/DL enables us to efficiently build and maintain base data for geospatial digital twins such as virtual 3D city models, indoor models, or building information models.
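To make the ML workflow described above concrete: a common baseline for deriving per-point semantics without an explicit 3D model or rule set is to compute local covariance (eigenvalue) features around each point and feed them to a generic classifier. A minimal sketch, assuming a labeled training cloud is available; this is a standard generic baseline, not the authors' specific pipeline, and the data here is a random stand-in:

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.ensemble import RandomForestClassifier

def eigen_features(points: np.ndarray, k: int = 16) -> np.ndarray:
    """Per-point covariance features: linearity, planarity, sphericity.
    These help distinguish e.g. edges/cables, facades/ground, vegetation."""
    nn = NearestNeighbors(n_neighbors=k).fit(points)
    _, idx = nn.kneighbors(points)
    feats = np.empty((len(points), 3))
    for i, nbrs in enumerate(idx):
        local = points[nbrs] - points[nbrs].mean(axis=0)
        # Eigenvalues of local covariance, sorted l1 >= l2 >= l3 >= 0.
        l = np.sort(np.linalg.eigvalsh(local.T @ local / k))[::-1]
        l = np.maximum(l, 1e-12)
        feats[i] = [(l[0] - l[1]) / l[0],   # linearity
                    (l[1] - l[2]) / l[0],   # planarity
                    l[2] / l[0]]            # sphericity
    return feats

# Hypothetical usage with a labeled (N, 3) training cloud and (N,) labels:
train_xyz = np.random.rand(2000, 3)              # stand-in for real data
train_lbl = (train_xyz[:, 2] > 0.5).astype(int)  # stand-in labels
clf = RandomForestClassifier(n_estimators=100).fit(
    eigen_features(train_xyz), train_lbl)
pred = clf.predict(eigen_features(np.random.rand(500, 3)))  # per-point semantics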
“…It remains a challenge to obtain accurate depth for casual videos captured with portable devices. A survey on RGB-D cameras is provided by Zollhöfer et al. [Zollhöfer et al. 2018]. Current high-end smartphones such as the iPhone X support depth measurement using dual pixels and dedicated post-processing to generate smooth, edge-preserving depth maps.…”
In cinema, large camera lenses create beautiful shallow depth of field (DOF) but make focusing difficult and expensive. Accurate cinema focus usually relies on a script and a person to control focus in real time. Casual videographers often crave cinematic focus but fail to achieve it: we either sacrifice shallow DOF, as in smartphone videos, or we struggle to deliver accurate focus, as in videos from larger cameras. This paper is about a new approach in the pursuit of cinematic focus for casual videography. We present a system that synthetically renders refocusable video from a deep-DOF video shot with a smartphone, and analyzes future video frames to deliver context-aware autofocus for the current frame. To create refocusable video, we extend recent machine learning methods designed for still photography, contributing a new dataset for machine training, a rendering model better suited to cinema focus, and a filtering solution for temporal coherence. To choose focus accurately for each frame, we demonstrate autofocus that looks at upcoming video frames and applies AI-assist modules such as motion, face, audio, and saliency detection. We also show that autofocus benefits from machine learning and a large-scale video dataset with focus annotation, where we use our RVR-LAAF GUI to create this sizable dataset efficiently. We deliver, for example, a shallow-DOF video where the autofocus transitions onto each person before she begins to speak. This is impossible for conventional camera autofocus because it would require seeing into the future.
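The synthetic refocusing described above rests on the thin-lens circle-of-confusion (CoC) model: given a per-pixel depth map, each pixel's blur diameter follows from the chosen focus distance and aperture, and a spatially varying blur approximates the shallow-DOF render. A heavily simplified single-layer sketch under that model (the lens parameters are hypothetical, and the paper's actual rendering model additionally handles occlusion boundaries and temporal coherence):

import numpy as np
from scipy.ndimage import gaussian_filter

def coc_radius_px(depth_m, focus_m, focal_mm=50.0, f_number=1.8,
                  sensor_width_mm=36.0, image_width_px=1920):
    """Thin-lens CoC radius in pixels per pixel depth:
    c = (A * f / (z_f - f)) * |z - z_f| / z, with aperture A = f / N."""
    f = focal_mm * 1e-3
    aperture = f / f_number
    coc_m = aperture * f / (focus_m - f) * np.abs(depth_m - focus_m) / depth_m
    px_per_m = image_width_px / (sensor_width_mm * 1e-3)
    return coc_m * px_per_m

def render_refocus(image, depth_m, focus_m, n_levels=4):
    """Gather-style refocus: quantize CoC into blur levels and composite.
    Ignores occlusion; a rough approximation of shallow-DOF rendering."""
    radius = coc_radius_px(depth_m, focus_m)
    out = np.zeros_like(image, dtype=float)
    edges = np.linspace(0, radius.max() + 1e-6, n_levels + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (radius >= lo) & (radius < hi)
        sigma = max((lo + hi) / 4.0, 1e-3)  # blur radius -> gaussian sigma
        blurred = np.stack([gaussian_filter(image[..., c], sigma)
                            for c in range(image.shape[-1])], axis=-1)
        out[mask] = blurred[mask]
    return out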