2022
DOI: 10.48550/arxiv.2203.14708
Preprint

Object Memory Transformer for Object Goal Navigation

Abstract: Fig. 1: (a) Object Memory Transformer (OMT) for efficient indoor navigation to find the target object. Here, the agent looks for "Pillow" on the sofa, which is in the next room and cannot be observed from the agent's start position. Even in this complex case, OMT can take an efficient path to the target, shown as the trajectory of blue dots in (b), by exploiting long-term visual cues while taking into account the relevance of the feature at each time step. Specifically, OMT stores long-term history of observed…

Citations: cited by 1 publication (2 citation statements)
References: 12 publications (29 reference statements)
“…Our DAT method is compared with three categories of relevant SOTA methods, as shown in Table 3. Methods with long-term memory. These methods theoretically depend on historical information to model environments more clearly; however, methods such as OMT (Fukushima et al 2022) store overcomplicated features, increasing the difficulty of network learning. Therefore, the current memory modules do not exert their full strength.…”
Section: Comparisons With the State-of-the-art Methods (mentioning)
confidence: 99%
“…Each node feature $m \in \mathbb{R}^{1 \times 9}$ is concatenated by three parts: the target bounding box, the target confidence and the agent's state (position and angle). This target-oriented method of storing information about visited nodes uses 400× less storage than the methods used in previous works (Fukushima et al 2022; Zhu et al 2021). Since the agent cannot obtain its own absolute position and orientation in unknown environments, the stored coordinates take the starting position as the origin and the starting orientation as the coordinate axis.…”
Section: Navigation Thinking Network (mentioning)
confidence: 99%
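
The citation statement above describes a 9-dimensional node feature whose coordinates are expressed relative to the agent's starting pose. The following is a minimal sketch of that idea, not the citing paper's implementation. The statement gives only the total width (9) and the three parts, so the 4 + 1 + 4 split used here (bounding box, confidence, and a state of x, z, yaw and pitch), the names `to_start_frame` and `node_feature`, and the angle convention are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical pose layout: (x, z, yaw, pitch), angles in radians.
# Yaw is measured from the x-axis toward the z-axis (an assumed convention).
Pose = Tuple[float, float, float, float]


def to_start_frame(pose: Pose, start: Pose) -> Pose:
    """Express `pose` in the frame whose origin is the start position
    and whose first axis points along the start heading."""
    dx = pose[0] - start[0]
    dz = pose[1] - start[1]
    # Rotate the displacement by minus the start yaw.
    c, s = math.cos(-start[2]), math.sin(-start[2])
    rel_x = c * dx - s * dz
    rel_z = s * dx + c * dz
    # Wrap the relative yaw into (-pi, pi].
    d_yaw = pose[2] - start[2]
    rel_yaw = math.atan2(math.sin(d_yaw), math.cos(d_yaw))
    # Pitch is left unchanged (assumption: it does not depend on the start yaw).
    return (rel_x, rel_z, rel_yaw, pose[3])


def node_feature(bbox: Sequence[float], confidence: float,
                 pose: Pose, start: Pose) -> List[float]:
    """Concatenate bounding box (4), confidence (1) and relative state (4)."""
    if len(bbox) != 4:
        raise ValueError("bbox must have 4 values")
    return [*bbox, confidence, *to_start_frame(pose, start)]


if __name__ == "__main__":
    start = (1.0, 2.0, math.pi / 2, 0.0)   # agent starts facing +z
    pose = (1.0, 3.0, math.pi / 2, 0.0)    # one unit further along +z
    print(node_feature((0.1, 0.2, 0.5, 0.6), 0.9, pose, start))
```

In this sketch a step of one unit along the start heading maps to a displacement along the first relative axis, so the stored state never depends on absolute world coordinates.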