2019 International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2019.8793530
GEN-SLAM: Generative Modeling for Monocular Simultaneous Localization and Mapping

Abstract: We present a Deep Learning based system for the twin tasks of localization and obstacle avoidance essential to any mobile robot. Our system learns from conventional geometric SLAM, and outputs, using a single camera, the topological pose of the camera in an environment, and the depth map of obstacles around it. We use a CNN to localize in a topological map, and a conditional VAE to output depth for a camera image, conditional on this topological location estimation. We demonstrate the effectiveness of our mono…
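For intuition, here is a minimal sketch of the conditional-VAE idea the abstract describes: an encoder maps an RGB image to a Gaussian latent code, and a decoder reconstructs a depth map conditioned on a one-hot topological pose estimate. The class name, layer sizes, and one-hot conditioning scheme below are illustrative assumptions, not the authors' published architecture.

```python
# Sketch of a conditional VAE that decodes a depth map from an RGB image,
# conditioned on a one-hot topological pose. All names and sizes are
# illustrative assumptions, not the GEN-SLAM architecture.
import torch
import torch.nn as nn

class TopoCVAE(nn.Module):
    def __init__(self, n_topo_nodes=100, z_dim=128):
        super().__init__()
        # Encoder: RGB image (3x64x64) -> flattened feature vector.
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
            nn.Flatten(),
        )
        # Gaussian parameters of the latent code, conditioned on topo pose.
        self.fc_mu = nn.Linear(128 * 8 * 8 + n_topo_nodes, z_dim)
        self.fc_logvar = nn.Linear(128 * 8 * 8 + n_topo_nodes, z_dim)
        # Decoder: latent code + topo pose -> single-channel depth map.
        self.fc_dec = nn.Linear(z_dim + n_topo_nodes, 128 * 8 * 8)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, rgb, topo_onehot):
        h = torch.cat([self.enc(rgb), topo_onehot], dim=1)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        h = self.fc_dec(torch.cat([z, topo_onehot], dim=1)).view(-1, 128, 8, 8)
        return self.dec(h), mu, logvar
```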

Cited by 24 publications (13 citation statements) | References 41 publications
“…One difference between the two approaches is that when a VAE is used for perception, the resulting embedding is stochastic, i.e., the same input image fed to the encoder multiple times will yield different embeddings, as they are sampled from the Gaussian; in a vanilla AE this is not the case. Other approaches, such as using shared latent spaces [15], [16], could also be considered in future studies.…”
Section: Related Work, A. Visual Perception (mentioning)
confidence: 99%
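To make the stochastic-versus-deterministic distinction above concrete, here is a toy sketch (the nn.Linear stand-in encoders and all sizes are assumptions for illustration): encoding the same image twice through a VAE yields two different samples, while a vanilla autoencoder returns the same embedding every time.

```python
# Toy illustration: a VAE embedding is a sample from a Gaussian, so repeated
# passes over the same input differ; a vanilla AE embedding is deterministic.
import torch
import torch.nn as nn

enc = nn.Linear(784, 2 * 16)  # stand-in VAE encoder: outputs (mu, logvar)
x = torch.rand(1, 784)

def vae_embed(x):
    mu, logvar = enc(x).chunk(2, dim=1)
    # Reparameterized sample: mu + sigma * epsilon, epsilon ~ N(0, I).
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

z1, z2 = vae_embed(x), vae_embed(x)
print(torch.allclose(z1, z2))        # False: two samples of the same image differ

ae = nn.Linear(784, 16)              # stand-in vanilla AE encoder
print(torch.allclose(ae(x), ae(x)))  # True: deterministic embedding
```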
“…4, the network contains two autoencoders with the same structure. We use the PyTorch framework to implement the autoencoder [21], which is composed of convolutional, fully connected, and upsampling layers. There is no pooling layer in the encoder; the input size is reduced only by convolutional layers with stride = 2.…”
Section: B. Network Design (mentioning)
confidence: 99%
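A minimal PyTorch sketch of the autoencoder style this excerpt describes: downsampling done only by stride-2 convolutions (no pooling), a fully connected bottleneck, and upsampling layers in the decoder. The channel counts, the 64x64 input size, and the bottleneck width are assumptions, not the cited paper's exact design.

```python
# Autoencoder sketch per the excerpt: no pooling, stride-2 convs for
# downsampling, FC bottleneck, upsample layers in the decoder. Sizes assumed.
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, bottleneck=256):
        super().__init__()
        # Encoder: input size reduced only by stride-2 convolutions.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, bottleneck),  # fully connected bottleneck
        )
        self.fc_up = nn.Linear(bottleneck, 64 * 16 * 16)
        # Decoder: upsample layers followed by convolutions.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        h = self.fc_up(self.encoder(x)).view(-1, 64, 16, 16)
        return self.decoder(h)

# Two structurally identical autoencoders, as in the description above.
ae1, ae2 = ConvAutoencoder(), ConvAutoencoder()
print(ae1(torch.rand(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 64, 64])
```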
“…These algorithms, with a wide range of parameters and layers, learn to manage feature extraction, propagation, and regularisation. However, failures in generating contextualised features have prompted the design of models specifically targeting feature extraction and propagation capabilities in image inpainting [20,21] and other research domains [22,23,24]. Based on these limitations, we propose a novel image inpainting method, namely V-LinkNet, that integrates feature extraction, high-level information dissemination, and feature propagation.…”
Section: Introduction (mentioning)
confidence: 99%