Global Pose Estimation with an Attention-Based Recurrent Network

Parisotto, Emilio; Chaplot, Devendra Singh; Zhang, Jian; Salakhutdinov, Ruslan

doi:10.1109/cvprw.2018.00061

Cited by 79 publications

(48 citation statements)

References 44 publications


“…We harness CNNs to encode images into high-level features. Optical flow has been proved useful for estimating frame-to-frame ego-motion by lots of current works [22, 31, 32, 33, 38]. We design the encoder based on the Flownet [6] which predicts optical flow between two images.…”

Section: Encoder (mentioning)

confidence: 99%


Beyond Tracking: Selecting Memory and Refining Poses for Deep Visual Odometry

Xue, Wang, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

105


Most previous learning-based visual odometry (VO) methods take VO as a pure tracking problem. In contrast, we present a VO framework by incorporating two additional components called Memory and Refining. The Memory component preserves global information by employing an adaptive and efficient selection strategy. The Refining component ameliorates previous results with the contexts stored in the Memory by adopting a spatial-temporal attention mechanism for feature distilling. Experiments on the KITTI and TUM-RGBD benchmark datasets demonstrate that our method outperforms state-of-the-art learning-based methods by a large margin and produces competitive results against classic monocular VO approaches. Especially, our model achieves outstanding performance in challenging scenarios such as texture-less regions and abrupt motions, where classic VO algorithms tend to fail.


“…The Refining module ameliorates previous outputs by employing a spatial-temporal feature reorganization mechanism. 22,[31][32][33]. Due to the high dimensionality of depth maps, the number of frames is commonly limited to no more than 5.…”

Section: Introduction (mentioning)

confidence: 99%

Beyond Tracking: Selecting Memory and Refining Poses for Deep Visual Odometry

Xue, Wang, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

105


“…Zhu et al (2017) also performed visual navigation via deep reinforcement learning but explicitly left out mapping the environment. Parisotto et al (2018) used a neural architecture for localisation; a graph of observations can be seen as a map, which is then used to iteratively refine a pose trajectory.…”

Section: Related Work (mentioning)

confidence: 99%

Approximate Bayesian Inference in Spatial Environments

Mirchev, Kayalibay, Soelch, et al. 2019

Robotics: Science and Systems XV


We propose to learn a stochastic recurrent model to solve the problem of simultaneous localisation and mapping (SLAM). Our model is a deep variational Bayes filter augmented with a latent global variable (similar to an external memory component) representing the spatially structured environment. Reasoning about the pose of an agent and the map of the environment is then naturally expressed as posterior inference in the resulting generative model. We evaluate the method on a set of randomly generated mazes which are traversed by an agent equipped with laser range finders. Path integration based on an accurate motion model is consistently outperformed, and most importantly, drift practically eliminated. Our approach inherits favourable properties from neural networks, such as differentiability, flexibility and the ability to train components either in isolation or end-to-end.


“…Neural solutions for the SLAM problem are also being developed (e.g. [7]), but their architectures are far from purely discriminative DCNNs.…”

Section: Framework (mentioning)

confidence: 99%

Vision System for AGI: Problems and Directions

Potapov, Rodionov, Peterson, et al. 2018

Artificial General Intelligence


What frameworks and architectures are necessary to create a vision system for AGI? In this paper, we propose a formal model that states the task of perception within AGI. We show the role of discriminative and generative models in achieving efficient and general solution of this task, thus specifying the task in more detail. We discuss some existing generative and discriminative models and demonstrate their insufficiency for our purposes. Finally, we discuss some architectural dilemmas and open questions.
