Visual Graph Memory with Unsupervised Representation for Visual Navigation

Kwon, Obin; Kim, Nuri; Choi, Yoon-Kyung; Yoo, Hwiyeon; Park, Jeong-Ho; Oh, Songhwai

doi:10.1109/iccv48922.2021.01559

Cited by 27 publications

(19 citation statements)

References 20 publications

“…Our setup differs from recent methods in ImageNav where panoramic 360° FoV sensors are required [15,39,46]. Here, we consider a standard 90° FoV for the agent's view [72].…”

Section: Semantic Search Policy For Image Goals (mentioning)

confidence: 99%

Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation

Al-Halah, Ramakrishnan, Grauman. 2022. Preprint.

In reinforcement learning for visual navigation, it is common to develop a model for each new task, and train that model from scratch with task-specific interactions in 3D environments. However, this process is expensive; massive amounts of interactions are needed for the model to generalize well. Moreover, this process is repeated whenever there is a change in the task type or the goal modality. We present a unified approach to visual navigation using a novel modular transfer learning model. Our model can effectively leverage its experience from one source task and apply it to multiple target tasks (e.g., ObjectNav, RoomNav, ViewNav) with various goal modalities (e.g., image, sketch, audio, label). Furthermore, our model enables zero-shot experience learning, whereby it can solve the target tasks without receiving any task-specific interactive training. Our experiments on multiple photorealistic datasets and challenging tasks show that our approach learns faster, generalizes better, and outperforms SoTA models by a significant margin.

“…a) Navigation approaches: Traditional approaches to visual navigation focus on building a 3D metric map of the environment [18], [3] before using that representation for any downstream navigation tasks, which does not lend itself favourably for task-driven learnable representations that can capture contextual cues. The recent introduction of large-scale indoor environments and simulators [7], [17], [6] has fuelled a slew of learning based methods for indoor navigation tasks [1] such as point-goal [10], [19], [20], [21], [22], object-goal [23], [24], [25], [26], [27], and image-goal [8], [28], [29]. Modular approaches which incorporate explicit or learned map representations [11], [23], [25] have shown to outperform end-to-end methods on tasks such as object-goal, however, this is not currently the case for the point-goal [10], [20] task.…”

Section: Related Work (mentioning)

confidence: 99%

Uncertainty-driven Planner for Exploration and Navigation

Georgakis, Bucher, Arapin et al. 2022. Preprint.

We consider the problems of exploration and point-goal navigation in previously unseen environments, where the spatial complexity of indoor scenes and partial observability constitute these tasks challenging. We argue that learning occupancy priors over indoor maps provides significant advantages towards addressing these problems. To this end, we present a novel planning framework that first learns to generate occupancy maps beyond the field-of-view of the agent, and second leverages the model uncertainty over the generated areas to formulate path selection policies for each task of interest. For point-goal navigation the policy chooses paths with an upper confidence bound policy for efficient and traversable paths, while for exploration the policy maximizes model uncertainty over candidate paths. We perform experiments in the visually realistic environments of Matterport3D using the Habitat simulator and demonstrate: 1) Improved results on exploration and map quality metrics over competitive methods, and 2) The effectiveness of our planning module when paired with the state-of-the-art DD-PPO method for the point-goal navigation task.

“…In other words, the authors completely replaced the RNN with an attention mechanism, which shows good performance in long-term tasks. Visual graph memory (VGM) [18] constructs a topological visual memory to navigate an environment while not utilizing the landmark information of the scene. No RL no simulator (NRNS) [27] uses models trained using image input without interaction with the simulator.…”

Section: Related Work (mentioning)

confidence: 99%

“…A crucial ingredient for successful visual navigation is to construct a memory, which can represent the structure of the environment along with compact visual features for representing high-dimensional visual inputs. A metric-map memory [5,17] created with SLAM, and a graph memory [8,9,18,19,20] with nodes and edges are the two standard memory construction approaches for navigation algorithms. Even though navigation systems that use metric maps produce powerful results with exact localization and mapping, it is not practical because the navigation agent is susceptible to sensory noises.…”

Section: Introduction (mentioning)

confidence: 99%

“…The topological map, which represents geometric properties and spatial relations of places in the form of a graph, is proposed to construct a map without accurate mapping. Previous visual navigation methods [9,18] with topological map exploit image features as nodes and edges connecting the nodes in proximity. Since a node indicates a location, the robot's position can be estimated by the nodes in the topological map.…”

Section: Introduction (mentioning)

confidence: 99%

Topological Semantic Graph Memory for Image-Goal Navigation

Kim, Kwon, Yoo et al. 2022. Preprint.

Self Cite

A novel framework is proposed to incrementally collect landmark-based graph memory and use the collected memory for image goal navigation. Given a target image to search, an embodied robot utilizes semantic memory to find the target in an unknown environment. In this paper, we present a topological semantic graph memory (TSGM), which consists of (1) a graph builder that takes the observed RGB-D image to construct a topological semantic graph, (2) a cross graph mixer module that takes the collected nodes to get contextual information, and (3) a memory decoder that takes the contextual memory as an input to find an action to the target. On the task of an image goal navigation, TSGM significantly outperforms competitive baselines by +5.0-9.0% on the success rate and +7.0-23.5% on SPL, which means that the TSGM finds efficient paths. Additionally, we demonstrate our method on a mobile robot in real-world image goal scenarios.
