Abstract: The advent of affordable consumer-grade RGB-D cameras has brought about a profound advancement of visual scene reconstruction methods. Both computer graphics and computer vision researchers have spent significant effort developing entirely new algorithms to capture comprehensive shape models of static and dynamic scenes with RGB-D cameras. This has led to significant advances in the state of the art along several dimensions: some methods achieve very high reconstruction detail despite limited sensor resolution; others…
“…Regarding visual SLAM, many open-source approaches exist but not many can be easily used on a robot (consult Zollhöfer et al. (2018) for a review of 3D-reconstruction-focused approaches). For navigation, to avoid dealing with scale ambiguities, we limit our review to approaches able to estimate the real scale of the environment while mapping (e.g., with stereo and RGB-D cameras or with visual–inertial odometry), thus excluding structure-from-motion and monocular SLAM approaches like parallel tracking and mapping (PTAM) (Klein & Murray, 2007), semi-direct visual odometry (SVO) (Forster, Pizzoli, & Scaramuzza, 2014), REgularized MOnocular Depth Estimation (REMODE) (Pizzoli, Forster, & Scaramuzza, 2014), DT-SLAM (Herrera, Kim, Kannala, Pulli, & Heikkilä, 2014), large-scale direct monocular SLAM (LSD-SLAM) (Engel, Schöps, & Cremers, 2014), or oriented FAST and rotated BRIEF (ORB)-SLAM (Mur-Artal, Montiel, & Tardós, 2015).…”
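The scale argument in this excerpt can be made concrete: a calibrated stereo or RGB-D setup recovers metric depth directly (for stereo, from disparity and the known baseline), while a monocular reconstruction is only defined up to an unknown global scale factor. A minimal sketch, using KITTI-like camera parameters purely for illustration (not code from the cited papers):

# Why stereo/RGB-D SLAM recovers metric scale while monocular SLAM cannot:
# with a calibrated stereo rig, depth follows from disparity and the known
# baseline; a monocular reconstruction is defined only up to a scale s > 0.

fx = 718.856        # focal length in pixels (KITTI-like intrinsics)
baseline = 0.54     # stereo baseline in meters, known from calibration

def stereo_depth(disparity_px: float) -> float:
    """Metric depth (meters) from stereo disparity: Z = fx * b / d."""
    return fx * baseline / disparity_px

z = stereo_depth(disparity_px=20.0)
print(f"metric depth: {z:.2f} m")   # ~19.41 m, in real-world units
# A monocular pipeline yields the same geometry scaled by an unknown s:
for s in (0.5, 1.0, 2.0):
    print(f"monocular candidate: {s * z:.2f} m (scale s={s} unresolved)")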
Section: Popular SLAM Approaches Available on ROS
Distributed as an open-source library since 2013, real-time appearance-based mapping (RTAB-Map) started as an appearance-based loop closure detection approach with memory management to deal with large-scale and long-term online operation. It then grew to implement simultaneous localization and mapping (SLAM) on various robots and mobile platforms. As each application brings its own set of constraints on sensors, processing capabilities, and locomotion, it raises the question of which SLAM approach is the most appropriate to use in terms of cost, accuracy, computation power, and ease of integration. Since most SLAM approaches are either visual- or lidar-based, comparison is difficult. Therefore, we decided to extend RTAB-Map to support both visual and lidar SLAM, providing in one package a tool that allows users to implement and compare a variety of 3D and 2D solutions for a wide range of applications with different robots and sensors. This paper presents this extended version of RTAB-Map and its use in comparing, both quantitatively and qualitatively, a large selection of popular real-world datasets (e.g., KITTI, EuRoC, TUM RGB-D, MIT Stata Center on the PR2 robot), outlining the strengths and limitations of visual and lidar SLAM configurations from a practical perspective for autonomous navigation applications. Keywords: perception, position estimation, SLAM
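The quantitative comparison mentioned in this abstract is typically reported as absolute trajectory error (ATE) against dataset ground truth. A minimal sketch of ATE-RMSE with closed-form rigid alignment, assuming the two trajectories are already time-associated (numpy only; this is the standard benchmark metric, not RTAB-Map's own evaluation code):

import numpy as np

def ate_rmse(est: np.ndarray, gt: np.ndarray) -> float:
    """ATE-RMSE between time-associated (N, 3) trajectories after
    closed-form rigid alignment (rotation + translation, no scale)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                          # best-fit rotation
    t = mu_g - R @ mu_e                         # best-fit translation
    err = (R @ est.T).T + t - gt                # residuals after alignment
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))

# Sanity check: a rotated/translated copy of a path should give ~0 error.
gt = np.cumsum(np.random.randn(100, 3) * 0.1, axis=0)
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
est = (Rz @ gt.T).T + np.array([1.0, -2.0, 0.5])
print(f"ATE-RMSE: {ate_rmse(est, gt):.6f} m")   # ~ 0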
“…As universal 3D representations, 3D point clouds "can represent almost any type of physical object, site, landscape, geographic region, or infrastructure - at all scales and with any precision," as stated by Richter (2018), who discusses algorithms and data structures for out-of-core processing, analysis, and classification of 3D point clouds. 3D point clouds can be acquired with various technologies, including airborne or terrestrial laser scanning, mobile mapping, RGB-D cameras (Zollhöfer et al. 2018), image matching, or multi-beam echo sounding.…”
Artificial intelligence (AI) is fundamentally changing the way IT solutions are implemented and operated across all application domains, including the geospatial domain. This contribution outlines AI-based techniques for 3D point clouds and geospatial digital twins as generic components of geospatial AI. First, we briefly reflect on the term "AI" and outline the technology developments needed to apply AI to IT solutions, seen from a software engineering perspective. Next, we characterize 3D point clouds as a key category of geodata and describe their role in creating the basis for geospatial digital twins; we explain the feasibility of machine learning (ML) and deep learning (DL) approaches for 3D point clouds. In particular, we argue that 3D point clouds can be seen as a corpus with similar properties to natural language corpora and formulate a "Naturalness Hypothesis" for 3D point clouds. In the main part, we introduce a workflow for interpreting 3D point clouds based on ML/DL approaches that derives domain-specific and application-specific semantics for 3D point clouds without having to create explicit spatial 3D models or explicit rule sets. Finally, we show examples of how ML/DL enables us to efficiently build and maintain base data for geospatial digital twins such as virtual 3D city models, indoor models, or building information models.
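To make the ML workflow described above concrete: a common baseline for deriving per-point semantics without an explicit 3D model or rule set is to compute local covariance (eigenvalue) features around each point and feed them to a generic classifier. A minimal sketch, assuming a labeled training cloud is available; this is a standard generic baseline, not the authors' specific pipeline, and the data here is a random stand-in:

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.ensemble import RandomForestClassifier

def eigen_features(points: np.ndarray, k: int = 16) -> np.ndarray:
    """Per-point covariance features: linearity, planarity, sphericity.
    These help distinguish e.g. edges/cables, facades/ground, vegetation."""
    nn = NearestNeighbors(n_neighbors=k).fit(points)
    _, idx = nn.kneighbors(points)
    feats = np.empty((len(points), 3))
    for i, nbrs in enumerate(idx):
        local = points[nbrs] - points[nbrs].mean(axis=0)
        # Eigenvalues of local covariance, sorted l1 >= l2 >= l3 >= 0.
        l = np.sort(np.linalg.eigvalsh(local.T @ local / k))[::-1]
        l = np.maximum(l, 1e-12)
        feats[i] = [(l[0] - l[1]) / l[0],   # linearity
                    (l[1] - l[2]) / l[0],   # planarity
                    l[2] / l[0]]            # sphericity
    return feats

# Hypothetical usage with a labeled (N, 3) training cloud and (N,) labels:
train_xyz = np.random.rand(2000, 3)              # stand-in for real data
train_lbl = (train_xyz[:, 2] > 0.5).astype(int)  # stand-in labels
clf = RandomForestClassifier(n_estimators=100).fit(
    eigen_features(train_xyz), train_lbl)
pred = clf.predict(eigen_features(np.random.rand(500, 3)))  # per-point semantics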
“…It remains a challenge to obtain accurate depth for casual videos captured with portable devices. A survey on RGB-D cameras is provided by Zollhöfer et al. [Zollhöfer et al. 2018]. Current high-end smartphones such as the iPhone X support depth measurement using dual pixels and dedicated post-processing to generate smooth, edge-preserving depth maps.…”
In cinema, large camera lenses create beautiful shallow depth of field (DOF) but make focusing difficult and expensive. Accurate cinema focus usually relies on a script and a person to control focus in real time. Casual videographers often crave cinematic focus but fail to achieve it: we either sacrifice shallow DOF, as in smartphone videos, or we struggle to deliver accurate focus, as in videos from larger cameras. This paper is about a new approach in the pursuit of cinematic focus for casual videography. We present a system that synthetically renders refocusable video from a deep-DOF video shot with a smartphone, and analyzes future video frames to deliver context-aware autofocus for the current frame. To create refocusable video, we extend recent machine learning methods designed for still photography, contributing a new dataset for machine training, a rendering model better suited to cinema focus, and a filtering solution for temporal coherence. To choose focus accurately for each frame, we demonstrate autofocus that looks at upcoming video frames and applies AI-assist modules such as motion, face, audio, and saliency detection. We also show that autofocus benefits from machine learning and a large-scale video dataset with focus annotation, where we use our RVR-LAAF GUI to create this sizable dataset efficiently. We deliver, for example, a shallow-DOF video where the autofocus transitions onto each person before she begins to speak. This is impossible for conventional camera autofocus because it would require seeing into the future.
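The synthetic refocusing described above rests on the thin-lens circle-of-confusion (CoC) model: given a per-pixel depth map, each pixel's blur diameter follows from the chosen focus distance and aperture, and a spatially varying blur approximates the shallow-DOF render. A heavily simplified single-layer sketch under that model (the lens parameters are hypothetical, and the paper's actual rendering model additionally handles occlusion boundaries and temporal coherence):

import numpy as np
from scipy.ndimage import gaussian_filter

def coc_radius_px(depth_m, focus_m, focal_mm=50.0, f_number=1.8,
                  sensor_width_mm=36.0, image_width_px=1920):
    """Thin-lens CoC radius in pixels per pixel depth:
    c = (A * f / (z_f - f)) * |z - z_f| / z, with aperture A = f / N."""
    f = focal_mm * 1e-3
    aperture = f / f_number
    coc_m = aperture * f / (focus_m - f) * np.abs(depth_m - focus_m) / depth_m
    px_per_m = image_width_px / (sensor_width_mm * 1e-3)
    return coc_m * px_per_m

def render_refocus(image, depth_m, focus_m, n_levels=4):
    """Gather-style refocus: quantize CoC into blur levels and composite.
    Ignores occlusion; a rough approximation of shallow-DOF rendering."""
    radius = coc_radius_px(depth_m, focus_m)
    out = np.zeros_like(image, dtype=float)
    edges = np.linspace(0, radius.max() + 1e-6, n_levels + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (radius >= lo) & (radius < hi)
        sigma = max((lo + hi) / 4.0, 1e-3)  # blur radius -> gaussian sigma
        blurred = np.stack([gaussian_filter(image[..., c], sigma)
                            for c in range(image.shape[-1])], axis=-1)
        out[mask] = blurred[mask]
    return out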