2021
DOI: 10.1049/cit2.12043
Target‐driven visual navigation in indoor scenes using reinforcement learning and imitation learning

Abstract: Here, the challenges of sample efficiency and navigation performance in deep reinforcement learning for visual navigation are addressed, and a deep imitation reinforcement learning approach is proposed. Our contributions are mainly threefold: first, a framework combining imitation learning with deep reinforcement learning is presented, which enables a robot to learn a stable navigation policy faster in the target-driven navigation task. Second, the surrounding images are taken as the observation instead of seque…

Cited by 31 publications (19 citation statements). References 29 publications.
“…Following the previous works (Mirowski et al. 2017; Fang et al. 2021), we treat this task as a reinforcement learning problem and utilize the asynchronous advantage actor-critic (A3C) algorithm (Mnih et al. 2016). However, in the search thinking network, the complex multi-head attention calculations are difficult to learn directly through reinforcement learning (Du, Yu, and Zheng 2021); thus, we use imitation learning to pretrain the search thinking network.…”
Section: Policy Learning
confidence: 99%
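As a rough illustration of the two-stage scheme this quote describes (imitation learning to pretrain the policy, followed by A3C-style reinforcement learning), here is a minimal PyTorch sketch. The network sizes, the behaviour-cloning loss, and the loss weights are illustrative assumptions, not the cited papers' exact configuration.

```python
# Minimal sketch: behaviour-cloning pretraining of a policy network on
# expert actions, followed by an A3C-style actor-critic update.
# Dimensions, names, and coefficients are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    def __init__(self, obs_dim=512, n_actions=6):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.policy = nn.Linear(256, n_actions)  # actor head
        self.value = nn.Linear(256, 1)           # critic head

    def forward(self, obs):
        h = self.body(obs)
        return self.policy(h), self.value(h)

def pretrain_step(net, opt, obs, expert_actions):
    """Imitation stage: cross-entropy against expert actions."""
    logits, _ = net(obs)
    loss = F.cross_entropy(logits, expert_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def a3c_step(net, opt, obs, actions, returns, beta=0.01):
    """RL stage: policy gradient with advantage, value loss, entropy bonus."""
    logits, values = net(obs)
    log_probs = F.log_softmax(logits, dim=-1)
    advantage = returns - values.squeeze(-1).detach()
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    pg_loss = -(chosen * advantage).mean()
    value_loss = F.mse_loss(values.squeeze(-1), returns)
    entropy = -(log_probs.exp() * log_probs).sum(-1).mean()
    loss = pg_loss + 0.5 * value_loss - beta * entropy
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```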
“…Previous works [36,38] … [7,20], we treat this task as a reinforcement learning problem and utilize the asynchronous advantage actor-critic (A3C) algorithm [22], which applies policy gradients to assist the agent in choosing an appropriate action a_t in the high-dimensional action space A. In accordance with the done reminder operation presented in [41], when the agent detects the target, we use the target detection confidence to explicitly enhance the probability of the Done action in the action domain A_t ∈ ℝ^{1×6}.…”
Section: Policy Learning
confidence: 99%
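A hedged sketch of the "done reminder" behaviour quoted above: when the detector reports the target, its confidence is mixed into the probability of the Done action before the agent samples from the 6-way action domain A_t ∈ ℝ^{1×6}. The convex-mixing rule and the index of Done are assumptions made for illustration; the quote only says the confidence "explicitly enhances" the Done probability.

```python
# Sketch of confidence-driven enhancement of the Done action.
# DONE_IDX and the convex-mixing rule are illustrative assumptions.
import torch
import torch.nn.functional as F

DONE_IDX = 5  # assume Done is the last of the 6 actions

def adjust_action_probs(logits, det_confidence):
    """Raise p(Done) toward the detector confidence when the target is seen."""
    probs = F.softmax(logits, dim=-1)      # base policy over 6 actions
    boost = torch.zeros_like(probs)
    boost[..., DONE_IDX] = 1.0
    # Convex mix: higher detection confidence shifts more mass onto Done.
    mixed = (1.0 - det_confidence) * probs + det_confidence * boost
    return mixed / mixed.sum(dim=-1, keepdim=True)  # renormalise

logits = torch.randn(1, 6)
print(adjust_action_probs(logits, det_confidence=0.9))
```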
“…the well-known function approximation ability of deep architectures made them a great choice for regressing various RL functions (discussed in the next section); one of the earliest implementations is TD-Gammon, a neural network that reached champion-level performance in Backgammon decades ago [16]. Current methods address highly complex input domains such as images and videos [2,17,18,19,20]. One of the key components of RL is the Q-function; van Hasselt showed that the single estimator in Q-learning suffers from over-estimation, hence a new algorithm with a double estimator was proposed that improved the learning process immensely [21].…”
Section: Literature Review
confidence: 99%
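The double-estimator remedy attributed to van Hasselt [21] can be illustrated with tabular double Q-learning: one table selects the greedy next action and the other evaluates it, which counteracts the over-estimation bias of the single-estimator max. This is a generic textbook sketch under those assumptions, not the cited paper's code.

```python
# Tabular double Q-learning: two Q-tables, one selects, the other evaluates.
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=None):
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        # Update QA: QA picks the greedy action, QB evaluates it.
        a_star = int(np.argmax(QA[s_next]))
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
    else:
        # Symmetric update of QB.
        b_star = int(np.argmax(QB[s_next]))
        QB[s, a] += alpha * (r + gamma * QA[s_next, b_star] - QB[s, a])

# Toy usage: 4 states, 2 actions.
QA, QB = np.zeros((4, 2)), np.zeros((4, 2))
double_q_update(QA, QB, s=0, a=1, r=1.0, s_next=2)
```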