2020 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra40945.2020.9197374
Visual Odometry Revisited: What Should Be Learnt?

Cited by 123 publications (90 citation statements) | References 37 publications
“…Li et al. [17] proposed DeepSLAM, which uses a deep recurrent convolutional neural network (RCNN) to simultaneously generate a pose estimate, a depth map, and an outlier rejection mask. Zhang et al. [31] presented a monocular VO system that combines geometry-based methods with unsupervised deep learning. Liu et al. [32] presented a deep-learning-based RGB-D visual odometry system that takes an RGB image and a depth image as input and outputs the camera pose through a dual-stream recurrent convolutional neural network.…”
Section: Related Work
confidence: 99%
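As a rough illustration of the dual-stream recurrent architecture described above, the sketch below encodes RGB and depth frames in separate convolutional streams, fuses the per-frame features, and regresses a 6-DoF relative pose with an LSTM. The layer sizes and pose parameterization are hypothetical choices for illustration, not Liu et al.'s actual model.

```python
# Minimal sketch of a dual-stream recurrent convolutional VO network
# (assumed layer sizes; not the architecture from Liu et al. [32]).
import torch
import torch.nn as nn

class DualStreamRCNN(nn.Module):
    def __init__(self, hidden_size=256):
        super().__init__()
        def encoder(in_ch):
            # small conv encoder that pools each frame to a feature vector
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 7, stride=2, padding=3), nn.ReLU(),
                nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rgb_enc = encoder(3)    # RGB stream
        self.depth_enc = encoder(1)  # depth stream
        self.rnn = nn.LSTM(128, hidden_size, batch_first=True)
        self.pose_head = nn.Linear(hidden_size, 6)  # translation + axis-angle rotation

    def forward(self, rgb_seq, depth_seq):
        # rgb_seq: (B, T, 3, H, W), depth_seq: (B, T, 1, H, W)
        B, T = rgb_seq.shape[:2]
        feats = []
        for t in range(T):
            # encode each frame in both streams and fuse by concatenation
            f = torch.cat([self.rgb_enc(rgb_seq[:, t]),
                           self.depth_enc(depth_seq[:, t])], dim=1)
            feats.append(f)
        h, _ = self.rnn(torch.stack(feats, dim=1))  # (B, T, hidden)
        return self.pose_head(h)  # per-frame 6-DoF relative pose (B, T, 6)
```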
“…Because these methods can predict both depth and camera pose, they are widely used in robotics and self-driving cars as visual odometry (VO) systems. Zhan et al. investigated end-to-end unsupervised depth-VO [39] and also integrated the predicted depth with the Perspective-n-Point (PnP) method to achieve high robustness [40].…”
Section: Related Work
confidence: 99%
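The depth-plus-PnP idea can be sketched as follows: lift matched keypoints in the first frame to 3D using the predicted depth, then recover the relative pose from 3D-2D correspondences with PnP + RANSAC. This is a minimal sketch under assumed inputs (matched keypoints, a dense depth map, known intrinsics), not the exact pipeline from [40].

```python
# Minimal sketch of depth + PnP pose recovery (assumed inputs; not the
# exact DF-VO pipeline): back-project frame-1 keypoints with predicted
# depth, then solve PnP with RANSAC against frame-2 keypoints.
import cv2
import numpy as np

def pose_from_depth_pnp(kpts1, kpts2, depth1, K):
    """kpts1, kpts2: (N, 2) matched pixel coords; depth1: (H, W) predicted
    depth for frame 1; K: (3, 3) camera intrinsics."""
    # Back-project frame-1 keypoints to 3D using the predicted depth.
    z = depth1[kpts1[:, 1].astype(int), kpts1[:, 0].astype(int)]
    valid = z > 0
    x = (kpts1[valid, 0] - K[0, 2]) * z[valid] / K[0, 0]
    y = (kpts1[valid, 1] - K[1, 2]) * z[valid] / K[1, 1]
    pts3d = np.stack([x, y, z[valid]], axis=1).astype(np.float64)

    # 3D-2D PnP with RANSAC yields a pose whose translation carries the
    # metric scale of the predicted depth.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, kpts2[valid].astype(np.float64), K.astype(np.float64), None,
        iterationsCount=1000, reprojectionError=1.0)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec, inliers
```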
“…These methods are difficult to transfer across scenes and sensors, because the sensors must be photometrically recalibrated and a correct uncertainty map for matching points is required. Modern enhancements of these approaches are neural network methods trained in a self-supervised manner: D3VO [35], Deep-MatchVO [36], and DF-VO [37]. All of them estimate the relative pose between two neighboring frames of a monocular camera as well as a depth map.…”
Section: Visual-Based Robot Localization
confidence: 99%
“…For our study, the following SLAM metrics were taken: 1) relative translation ($T_{KITTI}$, %) and rotation ($R_{KITTI}$, deg/m) errors, as introduced in the KITTI Odometry Benchmark [52], [53]. Because of the short indoor tracks in the HISNav Dataset, we use subsequences of length (0.25, 0.5, 1, 2, 4, 8, 16, 20) meters instead of the conventional (100, 200, ..., 800) meters.…”
Section: B. Indoor Robot Localization Using Visual SLAM
confidence: 99%
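A minimal sketch of how such relative errors can be computed over fixed-length subsequences follows. It assumes poses are given as 4x4 camera-to-world matrices and simplifies the averaging scheme relative to the official KITTI devkit.

```python
# Minimal sketch of KITTI-style relative translation/rotation errors,
# evaluated over the shorter subsequence lengths used for indoor tracks
# (assumed pose format: list of 4x4 camera-to-world matrices).
import numpy as np

LENGTHS = [0.25, 0.5, 1, 2, 4, 8, 16, 20]  # meters, instead of 100..800

def trajectory_distances(poses):
    # cumulative distance traveled along the ground-truth trajectory
    d = [0.0]
    for i in range(1, len(poses)):
        d.append(d[-1] + np.linalg.norm(poses[i][:3, 3] - poses[i - 1][:3, 3]))
    return d

def relative_errors(gt, pred):
    dist = trajectory_distances(gt)
    t_errs, r_errs = [], []
    for i in range(len(gt)):
        for length in LENGTHS:
            # first frame j at least `length` meters ahead of frame i
            j = next((k for k in range(i, len(gt))
                      if dist[k] > dist[i] + length), None)
            if j is None:
                continue
            # error between ground-truth and predicted relative motion
            dgt = np.linalg.inv(gt[i]) @ gt[j]
            dpr = np.linalg.inv(pred[i]) @ pred[j]
            err = np.linalg.inv(dpr) @ dgt
            t_errs.append(np.linalg.norm(err[:3, 3]) / length)  # per meter
            cos = np.clip((np.trace(err[:3, :3]) - 1) / 2, -1, 1)
            r_errs.append(np.arccos(cos) / length)              # rad per meter
    return 100 * np.mean(t_errs), np.mean(r_errs)  # T (%), R (rad/m)
```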