In recent decades, ego-motion estimation, or visual odometry (VO), has received considerable attention from the robotics research community, mainly because of its central importance in achieving robust localization and, as a consequence, autonomy. Different solutions have been explored, leading to a wide variety of approaches, mostly grounded in geometric methodologies and, more recently, in data-driven paradigms. To guide researchers and practitioners in choosing the best VO method, several benchmark studies have been published. However, most of them compare only a small subset of the most popular approaches, usually on specific data sets or configurations. In contrast, in this work, we aim to provide a complete and thorough study of the most popular and best-performing geometric and data-driven solutions for VO. In our investigation, we considered several scenarios and environments, comparing the estimation accuracy of the selected approaches and the role of their hyper-parameters, and analyzing the computational resources they require. Experiments and tests are performed on different data sets (both publicly available and self-collected) and on two different computational boards. The experimental results show the pros and cons of the tested approaches from different perspectives. The geometric simultaneous localization and mapping methods are confirmed to be the best performing, while data-driven approaches show robustness to the nonideal conditions present in more challenging scenarios.
Simultaneous localization and mapping (SLAM) is one of the cornerstones of autonomous navigation systems in robotics and the automotive industry. Visual SLAM (V-SLAM), which relies on image features, such as keypoints and descriptors, to estimate the pose transformation between consecutive frames, is a highly efficient and effective approach for gathering environmental information. With the rise of representation learning, feature detectors based on deep neural networks (DNNs) have emerged as an alternative to handcrafted solutions. This work examines the integration of sparse learned features into a state-of-the-art SLAM framework and benchmarks the handcrafted and learning-based approaches against each other through in-depth experiments. Specifically, we replace the ORB detector and BRIEF descriptor of the ORBSLAM3 pipeline with those provided by Superpoint, a DNN model that jointly computes keypoints and descriptors. Experiments on three publicly available datasets from different application domains were conducted to evaluate the pose estimation performance and resource usage of both solutions.