2019 International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2019.8793530
GEN-SLAM: Generative Modeling for Monocular Simultaneous Localization and Mapping

Abstract: We present a Deep Learning based system for the twin tasks of localization and obstacle avoidance essential to any mobile robot. Our system learns from conventional geometric SLAM, and outputs, using a single camera, the topological pose of the camera in an environment, and the depth map of obstacles around it. We use a CNN to localize in a topological map, and a conditional VAE to output depth for a camera image, conditional on this topological location estimation. We demonstrate the effectiveness of our mono…
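For intuition, here is a minimal sketch of the conditional-VAE idea the abstract describes: an encoder maps an RGB image to a Gaussian latent code, and a decoder reconstructs a depth map conditioned on a one-hot topological pose estimate. The class name, layer sizes, and one-hot conditioning scheme below are illustrative assumptions, not the authors' published architecture.

```python
# Sketch of a conditional VAE that decodes a depth map from an RGB image,
# conditioned on a one-hot topological pose. All names and sizes are
# illustrative assumptions, not the GEN-SLAM architecture.
import torch
import torch.nn as nn

class TopoCVAE(nn.Module):
    def __init__(self, n_topo_nodes=100, z_dim=128):
        super().__init__()
        # Encoder: RGB image (3x64x64) -> flattened feature vector.
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
            nn.Flatten(),
        )
        # Gaussian parameters of the latent code, conditioned on topo pose.
        self.fc_mu = nn.Linear(128 * 8 * 8 + n_topo_nodes, z_dim)
        self.fc_logvar = nn.Linear(128 * 8 * 8 + n_topo_nodes, z_dim)
        # Decoder: latent code + topo pose -> single-channel depth map.
        self.fc_dec = nn.Linear(z_dim + n_topo_nodes, 128 * 8 * 8)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, rgb, topo_onehot):
        h = torch.cat([self.enc(rgb), topo_onehot], dim=1)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        h = self.fc_dec(torch.cat([z, topo_onehot], dim=1)).view(-1, 128, 8, 8)
        return self.dec(h), mu, logvar
```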

Cited by 24 publications (13 citation statements) | References 41 publications
“…One difference between the two approaches is that when a VAE is used for perception, the resulting embedding is stochastic, i.e., the same input image fed to the encoder multiple times will yield different embeddings, as they are sampled from the Gaussian; in a vanilla AE this is not the case. Other approaches, such as using shared latent spaces [15], [16], could also be considered in future studies.…”
Section: Related Work, A. Visual Perception (mentioning)
confidence: 99%
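To make the stochastic-versus-deterministic distinction above concrete, here is a toy sketch (the nn.Linear stand-in encoders and all sizes are assumptions for illustration): encoding the same image twice through a VAE yields two different samples, while a vanilla autoencoder returns the same embedding every time.

```python
# Toy illustration: a VAE embedding is a sample from a Gaussian, so repeated
# passes over the same input differ; a vanilla AE embedding is deterministic.
import torch
import torch.nn as nn

enc = nn.Linear(784, 2 * 16)  # stand-in VAE encoder: outputs (mu, logvar)
x = torch.rand(1, 784)

def vae_embed(x):
    mu, logvar = enc(x).chunk(2, dim=1)
    # Reparameterized sample: mu + sigma * epsilon, epsilon ~ N(0, I).
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

z1, z2 = vae_embed(x), vae_embed(x)
print(torch.allclose(z1, z2))        # False: two samples of the same image differ

ae = nn.Linear(784, 16)              # stand-in vanilla AE encoder
print(torch.allclose(ae(x), ae(x)))  # True: deterministic embedding
```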
“…4, the network contains two autoencoders with the same structure. We use the PyTorch framework to implement the autoencoder [21], which is composed of convolutional, fully connected, and upsampling layers. There is no pooling layer in the encoder; the input size is reduced only by convolutional layers with stride = 2.…”
Section: B. Network Design (mentioning)
confidence: 99%
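A minimal PyTorch sketch of the autoencoder style this excerpt describes: downsampling done only by stride-2 convolutions (no pooling), a fully connected bottleneck, and upsampling layers in the decoder. The channel counts, the 64x64 input size, and the bottleneck width are assumptions, not the cited paper's exact design.

```python
# Autoencoder sketch per the excerpt: no pooling, stride-2 convs for
# downsampling, FC bottleneck, upsample layers in the decoder. Sizes assumed.
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, bottleneck=256):
        super().__init__()
        # Encoder: input size reduced only by stride-2 convolutions.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, bottleneck),  # fully connected bottleneck
        )
        self.fc_up = nn.Linear(bottleneck, 64 * 16 * 16)
        # Decoder: upsample layers followed by convolutions.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        h = self.fc_up(self.encoder(x)).view(-1, 64, 16, 16)
        return self.decoder(h)

# Two structurally identical autoencoders, as in the description above.
ae1, ae2 = ConvAutoencoder(), ConvAutoencoder()
print(ae1(torch.rand(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 64, 64])
```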
“…These algorithms, with a wide range of parameters and layers, learn to manage feature extraction, propagation, and regularisation. However, failures in generating contextualised features have prompted the design of models specifically targeting feature extraction and propagation capabilities in image inpainting [20,21] and other research domains [22,23,24]. Based on these limitations, we propose a novel image inpainting method, namely V-LinkNet, that integrates feature extraction, high-level information dissemination, and feature propagation.…”
Section: Introduction (mentioning)
confidence: 99%