2022 International Conference on Robotics and Automation (ICRA) 2022
DOI: 10.1109/icra46639.2022.9812027
Object Memory Transformer for Object Goal Navigation

Cited by 18 publications (6 citation statements) · References 10 publications
“…VGM (Kwon et al., 2021) is constructed incrementally based on the similarities among unsupervised representations of observed images, and these representations are learned from an unlabeled image dataset. OMT (Fukushima et al., 2022) uses a transformer to attend to salient objects stored in memory. DUET (Chen et al., 2022) proposes joint long-term action planning to enable efficient exploration in the global action space.…”
Section: A2 Memory Methods
confidence: 99%
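The quoted mechanism — a transformer attending over salient objects held in an explicit memory — can be illustrated with plain scaled dot-product attention. The function names and toy feature vectors below are illustrative only, not OMT's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_to_memory(query, memory):
    """Scaled dot-product attention of a single query vector over an
    explicit object memory (one feature vector per stored object)."""
    d_k = query.shape[-1]
    scores = memory @ query / np.sqrt(d_k)   # relevance of each memory slot
    weights = softmax(scores)                # attention distribution over slots
    context = weights @ memory               # weighted summary of the memory
    return context, weights

# Toy memory of 3 object features (dim 2); slot 0 points the same way as the query.
memory = np.array([[2.0, 0.0],
                   [0.0, 2.0],
                   [-2.0, 0.0]])
query = np.array([1.0, 0.0])
context, weights = attend_to_memory(query, memory)
# weights places the most mass on slot 0, the object most aligned with the query
```

A real agent would stack such attention layers and feed the context vector to its policy; this sketch only shows the read-from-memory step.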
“…According to the definition of meta-ability and thinking, we summarize the current mainstream object navigation methods and identify their limitations. As shown in Figure 2, object navigation methods are divided into four categories: association methods (Dang et al., 2022a; Zhang et al., 2021), memory methods (Chen et al., 2022; Fukushima et al., 2022), deadlock-specialized methods (Du et al., 2020; Lin et al., 2021) and SLAM methods (Ravichandran et al., 2022; Liang et al., 2021). The different inductive biases introduced by these four types of methods determine which meta-abilities are emphasized and which are overlooked.…”
Section: Introduction
confidence: 99%
“…In Object Memory Transformer [37], image and object representations are saved to an explicit memory at every time step. Since TSGM only puts a new node into the graph memory based on the similarity between the memory and the current observation, for both image and object graphs, it has less redundancy than [37]. Moreover, [37] utilizes only the preceding T chunks of data.…”
Section: Appendices Appendix a Training Details And Experimental Sett...
confidence: 99%
“…Since TSGM only puts a new node into the graph memory based on the similarity between the memory and the current observation, for both image and object graphs, it has less redundancy than [37], which utilizes only the preceding T chunks of data. TSGM, on the other hand, makes use of all graph memory information derived from past exploration of the environment.…”
Section: Appendices Appendix a Training Details And Experimental Sett...
confidence: 99%
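The redundancy contrast drawn in this quote — admitting a node only when it is dissimilar from everything already stored, versus saving every step — can be sketched with a small similarity-gated buffer. The class name and cosine threshold below are hypothetical, not TSGM's actual graph construction:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

class GraphMemory:
    """Stores an observation as a new node only if it is sufficiently
    dissimilar from every node already in memory (illustrative threshold)."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.nodes = []

    def update(self, feature):
        # Skip near-duplicates of existing nodes to keep the memory compact.
        if any(cosine(feature, n) >= self.threshold for n in self.nodes):
            return False
        self.nodes.append(feature)
        return True

mem = GraphMemory(threshold=0.9)
mem.update(np.array([1.0, 0.0]))            # first observation: stored
added = mem.update(np.array([0.99, 0.05]))  # near-duplicate view: skipped
# mem.nodes still holds a single node; a step-by-step memory would hold two
```

A per-time-step memory in this sketch would simply append every feature unconditionally, which is the redundancy the quote attributes to [37].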