Achieving accurate, high-rate pose estimates from proprioceptive and/or exteroceptive measurements is the first step in the development of navigation algorithms for agile mobile robots such as Unmanned Aerial Vehicles (UAVs). In this paper, we propose a decoupled Graph-Optimization based Multi-Sensor Fusion approach (GOMSF) that combines generic 6 Degree-of-Freedom (DoF) visual-inertial odometry poses and 3 DoF globally referenced positions to infer the global 6 DoF pose of the robot in real-time. Our approach casts the fusion as a real-time alignment problem between the local base frame of the visual-inertial odometry and the global base frame. The alignment transformation that relates these coordinate systems is continuously updated by optimizing a sliding window pose graph containing the robot's most recent states. We evaluate the presented pose estimation method on both simulated data and large outdoor experiments using a small UAV that is capable of running our system onboard. Results are compared against different state-of-the-art sensor fusion frameworks, revealing that the proposed approach is substantially more accurate than other decoupled fusion strategies. We also demonstrate results comparable to those of a finely tuned Extended Kalman Filter that fuses visual, inertial and GPS measurements in a coupled way, and show that our approach is generic enough to deal with different input sources in a straightforward manner. Video: https://youtu.be/GIZNSZ2soL8
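As a rough illustration of the frame-alignment problem described above (GOMSF itself optimizes a sliding-window pose graph rather than a closed-form fit), the sketch below aligns a window of recent VIO positions to matched globally referenced positions with the Kabsch/Umeyama method. The function names, window size and simulated yaw offset are illustrative assumptions, not part of the paper.

```python
import numpy as np

def align_local_to_global(p_local, p_global):
    """Closed-form rigid alignment (Kabsch/Umeyama, no scale): find R, t
    such that R @ p_local + t best matches p_global in the least-squares
    sense over a window of corresponding positions (both arrays are N x 3)."""
    mu_l, mu_g = p_local.mean(axis=0), p_global.mean(axis=0)
    H = (p_local - mu_l).T @ (p_global - mu_g)   # cross-covariance, 3 x 3
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_l
    return R, t

# Simulate a window of 20 VIO positions offset from the global frame by a
# 30-degree yaw and a translation (the classic drift between the two frames).
yaw = np.deg2rad(30.0)
R_true = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                   [np.sin(yaw),  np.cos(yaw), 0.0],
                   [0.0,          0.0,         1.0]])
p_local = np.random.randn(20, 3)                           # recent VIO positions
p_global = p_local @ R_true.T + np.array([5.0, 2.0, 0.0])  # matched GPS fixes
R, t = align_local_to_global(p_local, p_global)
latest_global = R @ p_local[-1] + t    # latest pose, expressed in the global frame
```

Re-estimating this transform every time the window slides is what keeps the globally referenced pose consistent as the VIO frame drifts.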
In the pursuit of autonomous flying robots, the scientific community has been developing onboard real-time algorithms for localisation, mapping and planning. Despite recent progress, the available solutions still lack accuracy and robustness in many aspects. While mapping for autonomous cars has received a substantial boost from deep-learning techniques that enhance LIDAR measurements with image-based depth completion, the large viewpoint variations experienced by aerial vehicles still pose major challenges for learning-based mapping approaches. In this paper, we propose a depth completion and uncertainty estimation approach that better handles the challenges of aerial platforms, such as large viewpoint and depth variations, and limited computing resources. The core of our method is a novel compact network that performs both depth completion and confidence estimation using an image-guided approach. Real-time performance onboard a GPU suitable for small flying robots is achieved by sharing deep features between both tasks. Experiments demonstrate that our network outperforms the state of the art in depth completion and uncertainty estimation for single-view methods on mobile GPUs. We further present a new photorealistic aerial depth completion dataset that exhibits more challenging depth completion scenarios than the established indoor and car-driving datasets. The dataset includes an open-source, visual-inertial UAV simulator for photo-realistic data generation. Our results show that a network trained on this dataset can be directly deployed on real-world outdoor aerial public datasets without fine-tuning or style transfer.
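The efficiency claim above rests on sharing deep features between the depth and confidence tasks. A minimal PyTorch sketch of that idea, a single shared encoder feeding two lightweight decoder heads, is shown below; the layer counts and channel widths are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class DepthCompletionNet(nn.Module):
    """Sketch of an image-guided depth-completion network that shares one
    encoder between a depth head and a confidence head, so the second task
    adds little extra compute. Layer sizes are illustrative only."""

    def __init__(self):
        super().__init__()
        # Input: 3-channel RGB guidance stacked with 1-channel sparse depth.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        def head():  # small decoder, one per task
            return nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
            )
        self.depth_head, self.conf_head = head(), head()

    def forward(self, rgb, sparse_depth):
        feats = self.encoder(torch.cat([rgb, sparse_depth], dim=1))  # shared features
        return self.depth_head(feats), torch.sigmoid(self.conf_head(feats))

net = DepthCompletionNet()
depth, confidence = net(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64))
```

Because the encoder dominates the compute, attaching a second small head costs far less than running two separate networks, which is the property the abstract exploits for mobile GPUs.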
In this work, we present a perception-aware path-planning pipeline for Unmanned Aerial Vehicles (UAVs) for navigation in challenging environments. The objective is to reach a given destination safely and accurately by relying on monocular camera-based state estimators, such as Keyframe-based Visual-Inertial Odometry (VIO) systems. Motivated by the recent advances in semantic segmentation using deep learning, our path-planning architecture takes into consideration the semantic classes of parts of the scene that are perceptually more informative than others. Using this semantic information to compute the next best action with respect to perception quality, the proposed planning strategy avoids both texture-less regions and problematic areas, such as lakes and oceans, that may cause large drift or failures in the robot's pose estimation. We design a hierarchical planner, composed of an A* path-search step followed by B-spline trajectory optimization. While the A* search steers the UAV towards informative areas, the optimizer keeps the most promising landmarks in the camera's field of view. We extensively evaluate our approach in a set of photo-realistic simulations, showing a remarkable improvement with respect to the state of the art in active perception.
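To make the semantic-cost idea concrete, here is a minimal sketch of a perception-aware A* over a 2-D grid, where each cell carries a semantic perception cost (e.g. high over water or texture-less terrain) that is added to the usual path-length term. The grid, weight and 4-connectivity are illustrative assumptions; the paper additionally refines the A* path with a B-spline trajectory optimizer.

```python
import heapq

def perception_aware_astar(grid_cost, start, goal, w_perception=5.0):
    """A* over a 2-D grid where grid_cost[y][x] is a semantic perception
    cost per cell; each step adds a weighted perception penalty to the
    plain path length. Returns a list of (y, x) cells, or None."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_set = [(h(start), 0.0, start, None)]                # (f, g, cell, parent)
    came_from, g_best = {}, {start: 0.0}
    while open_set:
        _, g, cur, parent = heapq.heappop(open_set)
        if cur in came_from:          # already expanded with a better cost
            continue
        came_from[cur] = parent
        if cur == goal:               # walk parents back to the start
            path = [cur]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        y, x = cur
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < len(grid_cost) and 0 <= nx < len(grid_cost[0]):
                # Step cost = unit distance + weighted semantic penalty.
                ng = g + 1.0 + w_perception * grid_cost[ny][nx]
                if ng < g_best.get((ny, nx), float("inf")):
                    g_best[(ny, nx)] = ng
                    heapq.heappush(open_set, (ng + h((ny, nx)), ng, (ny, nx), cur))
    return None

# Toy map: a "lake" strip of perceptually poor cells the planner detours around.
grid = [[0.0] * 10 for _ in range(10)]
for y in range(2, 8):
    grid[y][5] = 1.0
print(perception_aware_astar(grid, (0, 0), (9, 9)))
```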
With society and industry pushing for robot-assisted systems to automate cumbersome tasks, such as inspection and maintenance, a vast amount of research effort has been dedicated to relevant technologies. Right at the forefront are small Unmanned Aerial Vehicles (UAVs) equipped with onboard cameras, which have recently demonstrated that vision-based autonomous flights without reliance on GPS are possible, sparking great interest in a plethora of areas. Current solutions, however, still lack portability and generality, struggling to perform outside the controlled laboratory environment, with onboard robotic perception constituting the biggest impediment. Driven by the need for real-time denser scene estimation, in this work we present a dramatically low-computation approach enabling estimation of the immediate surroundings of a UAV using the inertial and visual cues from a single onboard camera. Instead of following the recent trend towards dense scene reconstruction, we trade detail of reconstruction for efficiency of estimation, albeit without compromising accuracy. We present results against scene ground truth obtained by a millimetre-precise laser scanner, which we make publicly available together with our code. The ETHZ CAB Building dataset contains the ground truth and visual-inertial data captured from both handheld and flying setups. Code and dataset: the video, dataset and code related to this work are available at http://www.v4rl.ethz.ch/research/datasets-code.html
Place recognition is an essential capability for robotic autonomy. While ground robots observe the world from generally similar viewpoints over repeated visits, other robots, such as small aircraft, experience far greater viewpoint variation, requiring place recognition for images captured from very wide baselines. While traditional feature-based methods fail dramatically under extreme viewpoint changes, deep learning approaches demand heavy runtime processing. Driven by the need for cheaper alternatives able to run on computationally restricted platforms, such as small aircraft, this work proposes a novel real-time pipeline that employs depth completion on the sparse feature maps already computed during robot localization and mapping, enabling place recognition under extreme viewpoint changes. The proposed approach demonstrates unprecedented precision-recall rates on challenging benchmark datasets as well as our own synthetic and real datasets with up to 45° difference in viewpoint. In particular, our synthetic datasets are, to the best of our knowledge, the first to isolate the challenge of viewpoint changes for place recognition, addressing a crucial gap in the literature. All of the new datasets are publicly available to aid benchmarking.
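A brief sketch of why completed depth helps wide-baseline place recognition: with a dense depth map and camera intrinsics, every pixel of the query view can be lifted to 3-D and reprojected into a candidate viewpoint, so appearance can be compared despite large viewpoint change. The code below shows only this reprojection step under assumed intrinsics and relative pose; it is not the paper's full pipeline.

```python
import numpy as np

def reproject(depth, K, R, t):
    """Lift every query pixel to 3-D using the (completed) dense depth map,
    then project into a candidate camera with relative pose (R, t).
    Returns per-pixel (u, v) coordinates in the candidate view, shape (h, w, 2)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # homogeneous, 3 x N
    rays = np.linalg.inv(K) @ pix                 # normalized viewing rays
    pts = rays * depth.reshape(1, -1)             # 3-D points in the query frame
    proj = K @ (R @ pts + t.reshape(3, 1))        # into the candidate frame
    return (proj[:2] / proj[2:3]).T.reshape(h, w, 2)

# Assumed toy intrinsics and a small lateral baseline between the two views.
K = np.array([[300.0, 0.0, 32.0], [0.0, 300.0, 32.0], [0.0, 0.0, 1.0]])
uv = reproject(np.full((64, 64), 5.0), K, np.eye(3), np.array([0.5, 0.0, 0.0]))
```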