2021
DOI: 10.1049/cit2.12043
Target‐driven visual navigation in indoor scenes using reinforcement learning and imitation learning

Abstract: Here, the challenges of sample efficiency and navigation performance in deep reinforcement learning for visual navigation are addressed, and a deep imitation reinforcement learning approach is proposed. Our contributions are mainly threefold: first, a framework combining imitation learning with deep reinforcement learning is presented, which enables a robot to learn a stable navigation policy faster in the target-driven navigation task. Second, the surrounding images are taken as the observation instead of seque…

Cited by 31 publications (19 citation statements). References 29 publications.
“…Following the previous works (Mirowski et al. 2017; Fang et al. 2021), we treat this task as a reinforcement learning problem and utilize the asynchronous advantage actor-critic (A3C) algorithm (Mnih et al. 2016). However, in the search thinking network, the complex multi-head attention calculations are difficult to learn directly through reinforcement learning (Du, Yu, and Zheng 2021); thus, we use imitation learning to pretrain the search thinking network.…”
Section: Policy Learning
confidence: 99%
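As a rough illustration of the two-stage scheme this quote describes (imitation learning to pretrain the policy, followed by A3C-style reinforcement learning), here is a minimal PyTorch sketch. The network sizes, the behaviour-cloning loss, and the loss weights are illustrative assumptions, not the cited papers' exact configuration.

```python
# Minimal sketch: behaviour-cloning pretraining of a policy network on
# expert actions, followed by an A3C-style actor-critic update.
# Dimensions, names, and coefficients are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    def __init__(self, obs_dim=512, n_actions=6):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.policy = nn.Linear(256, n_actions)  # actor head
        self.value = nn.Linear(256, 1)           # critic head

    def forward(self, obs):
        h = self.body(obs)
        return self.policy(h), self.value(h)

def pretrain_step(net, opt, obs, expert_actions):
    """Imitation stage: cross-entropy against expert actions."""
    logits, _ = net(obs)
    loss = F.cross_entropy(logits, expert_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def a3c_step(net, opt, obs, actions, returns, beta=0.01):
    """RL stage: policy gradient with advantage, value loss, entropy bonus."""
    logits, values = net(obs)
    log_probs = F.log_softmax(logits, dim=-1)
    advantage = returns - values.squeeze(-1).detach()
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    pg_loss = -(chosen * advantage).mean()
    value_loss = F.mse_loss(values.squeeze(-1), returns)
    entropy = -(log_probs.exp() * log_probs).sum(-1).mean()
    loss = pg_loss + 0.5 * value_loss - beta * entropy
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```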
“…Previous works [36,38] … [7,20], we treat this task as a reinforcement learning problem and utilize the asynchronous advantage actor-critic (A3C) algorithm [22], which applies policy gradients to assist the agent in choosing an appropriate action a_t in the high-dimensional action space A. In accordance with the done reminder operation presented in [41], when the agent detects the target, we use the target detection confidence to explicitly enhance the probability of the Done action in the action domain A_t ∈ ℝ^{1×6}.…”
Section: Policy Learning
confidence: 99%
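A hedged sketch of the "done reminder" behaviour quoted above: when the detector reports the target, its confidence is mixed into the probability of the Done action before the agent samples from the 6-way action domain A_t ∈ ℝ^{1×6}. The convex-mixing rule and the index of Done are assumptions made for illustration; the quote only says the confidence "explicitly enhances" the Done probability.

```python
# Sketch of confidence-driven enhancement of the Done action.
# DONE_IDX and the convex-mixing rule are illustrative assumptions.
import torch
import torch.nn.functional as F

DONE_IDX = 5  # assume Done is the last of the 6 actions

def adjust_action_probs(logits, det_confidence):
    """Raise p(Done) toward the detector confidence when the target is seen."""
    probs = F.softmax(logits, dim=-1)      # base policy over 6 actions
    boost = torch.zeros_like(probs)
    boost[..., DONE_IDX] = 1.0
    # Convex mix: higher detection confidence shifts more mass onto Done.
    mixed = (1.0 - det_confidence) * probs + det_confidence * boost
    return mixed / mixed.sum(dim=-1, keepdim=True)  # renormalise

logits = torch.randn(1, 6)
print(adjust_action_probs(logits, det_confidence=0.9))
```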
“…the well-known function approximation ability of deep architectures made them a great choice for regressing various RL functions (discussed in the next section); one of the earliest implementations is TD-Gammon, a neural network that reached champion-level performance in Backgammon decades ago [16]. Current methods address highly complex input domains such as images and videos [2,17,18,19,20]. One of the key components of RL is the Q-function; van Hasselt showed that the single estimator in Q-learning suffers from over-estimation, hence a new algorithm with a double estimator was proposed that improved the learning process immensely [21].…”
Section: Literature Review
confidence: 99%
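The double-estimator remedy attributed to van Hasselt [21] can be illustrated with tabular double Q-learning: one table selects the greedy next action and the other evaluates it, which counteracts the over-estimation bias of the single-estimator max. This is a generic textbook sketch under those assumptions, not the cited paper's code.

```python
# Tabular double Q-learning: two Q-tables, one selects, the other evaluates.
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=None):
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        # Update QA: QA picks the greedy action, QB evaluates it.
        a_star = int(np.argmax(QA[s_next]))
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
    else:
        # Symmetric update of QB.
        b_star = int(np.argmax(QB[s_next]))
        QB[s, a] += alpha * (r + gamma * QA[s_next, b_star] - QB[s, a])

# Toy usage: 4 states, 2 actions.
QA, QB = np.zeros((4, 2)), np.zeros((4, 2))
double_q_update(QA, QB, s=0, a=1, r=1.0, s_next=2)
```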