2019
DOI: 10.1007/s10994-019-05849-4
Active deep Q-learning with demonstration

Abstract: Recent research has shown that although Reinforcement Learning (RL) can benefit from expert demonstration, it usually takes considerable effort to obtain enough demonstrations. This effort prevents training decent RL agents with expert demonstration in practice. In this work, we propose Active Reinforcement Learning with Demonstration (ARLD), a new framework to streamline RL in terms of demonstration effort by allowing the RL agent to query for demonstrations actively during training. Under the framework, we pr…
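
As a rough illustration of the active-query idea in the abstract, the sketch below lets a tabular Q-learner ask a scripted expert for an action only when its uncertainty about the current state is high, and otherwise act on its own policy. Everything in it is an assumption made for illustration: the toy environment, the count-based uncertainty proxy, and the fixed threshold `tau` stand in for the learned, adaptive query criterion the paper proposes.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 10, 4
Q = np.zeros((N_STATES, N_ACTIONS))            # tabular Q stands in for a DQN
visit_counts = np.ones((N_STATES, N_ACTIONS))

def uncertainty(state):
    # Count-based proxy: rarely visited states look uncertain. ARLD instead
    # derives uncertainty from the learned network itself.
    return 1.0 / np.sqrt(visit_counts[state].sum())

def expert_action(state):
    # Hypothetical demonstrator with a simple fixed rule.
    return state % N_ACTIONS

def env_step(state, action):
    reward = 1.0 if action == expert_action(state) else 0.0
    return rng.integers(N_STATES), reward

tau, alpha, gamma, eps = 0.4, 0.5, 0.9, 0.1    # illustrative hyperparameters
state = rng.integers(N_STATES)
for _ in range(2000):
    if uncertainty(state) > tau:               # query a demonstration only when unsure
        action = expert_action(state)
    elif rng.random() < eps:                   # otherwise explore ...
        action = int(rng.integers(N_ACTIONS))
    else:                                      # ... or act greedily
        action = int(Q[state].argmax())
    next_state, reward = env_step(state, action)
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    visit_counts[state, action] += 1
    state = next_state

matches = sum(int(Q[s].argmax()) == expert_action(s) for s in range(N_STATES))
print(f"greedy policy matches the expert in {matches}/{N_STATES} states")
```

The only part that matters here is the branching: demonstrations are requested on demand during training rather than collected up front, which is what "streamline RL in terms of demonstration effort" refers to.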

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1
1

Citation Types

0
15
0

Year Published

2020
2020
2024
2024

Publication Types

Select...
5
2
1

Relationship

0
8

Authors

Journals

Cited by 17 publications (15 citation statements)
References 9 publications

“…The learner may receive feedback in the form of sequences of actions planned by a teacher [10]. An uncertainty-based query was used in [14], but it is limited to DQN [53], which restricts the possible applications of that method. In contrast, our method can be combined with most off-policy RL algorithms and introduces the idea of goal-driven demonstrations.…”
Section: Learning From Interactive Human Feedback
confidence: 99%
“…Deep RL is a relatively new domain for action advising studies. [19] introduced a novel LfD setup in which the demonstration dataset is built interactively, as in action advising. To do so, they employed models capable of uncertainty estimation, with LfD loss terms integrated into the learning stage.…”
Section: Related Work
confidence: 99%
“…In [11], an uncertainty-based advice collection strategy was proposed. According to this, the student adopts a multi-headed neural network architecture to access epistemic uncertainty estimates, as in [19]. Later, [12] further studied the state-novelty-based idea of [10] to devise a better student-initiated advice collection method.…”
Section: Related Work
confidence: 99%
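
The multi-headed architecture mentioned in the excerpt above can be pictured as a shared trunk feeding several independent Q-value heads, with the disagreement between heads serving as an epistemic-uncertainty signal for deciding when to request advice. The layer sizes, number of heads, and variance-based measure in this sketch are illustrative choices, not the exact design used in [11] or [19].

```python
import torch
import torch.nn as nn

class MultiHeadQNet(nn.Module):
    """Shared trunk with K independent Q-value heads (bootstrapped-DQN style)."""

    def __init__(self, obs_dim: int, n_actions: int, n_heads: int = 5, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, n_actions) for _ in range(n_heads))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        z = self.trunk(obs)
        return torch.stack([head(z) for head in self.heads], dim=1)  # (batch, heads, actions)

    def uncertainty(self, obs: torch.Tensor) -> torch.Tensor:
        q = self.forward(obs)
        # Variance of each action's value across heads, averaged over actions:
        # high values mean the heads disagree, i.e. high epistemic uncertainty.
        return q.var(dim=1).mean(dim=-1)

net = MultiHeadQNet(obs_dim=8, n_actions=4)
print(net.uncertainty(torch.randn(2, 8)))   # one uncertainty score per observation
```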
“…Based on deep Q-learning, researchers [26, 27] proposed algorithms that help the agent formulate a more useful strategy when playing video games. Chen [28] proposed an algorithm that dynamically estimates the uncertainty of recent states and utilizes the queried demonstration data by optimizing a supervised loss in addition to the usual DQN loss [29]. Todd [30] presented an algorithm, Deep Q-learning from Demonstrations (DQfD), which leverages small sets of demonstration data to accelerate the learning process and can automatically assess the necessary ratio of demonstration data while learning, thanks to a prioritized replay mechanism.…”
Section: Related Work
confidence: 99%
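
The loss combination described above, a standard DQN temporal-difference loss plus a supervised term on queried demonstration transitions, might look roughly like the following sketch. The large-margin form of the supervised term, the margin value, and the weighting are DQfD-style assumptions rather than the exact objective of [28].

```python
import torch
import torch.nn.functional as F

def combined_loss(q_net, target_net, batch, demo_mask,
                  gamma=0.99, margin=0.8, lambda_sup=1.0):
    """One-step TD loss plus a large-margin supervised loss on demonstration data.

    demo_mask marks which transitions in the batch came from the expert.
    """
    obs, act, rew, next_obs, done = batch
    q = q_net(obs)                                              # (batch, actions)
    q_taken = q.gather(1, act.unsqueeze(1)).squeeze(1)

    with torch.no_grad():                                       # standard DQN target
        target = rew + gamma * (1 - done) * target_net(next_obs).max(dim=1).values
    td_loss = F.smooth_l1_loss(q_taken, target)

    # Supervised term: the demonstrated action should beat all others by `margin`.
    margins = torch.full_like(q, margin)
    margins.scatter_(1, act.unsqueeze(1), 0.0)                  # no margin on the expert action
    violation = (q + margins).max(dim=1).values - q_taken
    sup_loss = (violation * demo_mask).sum() / demo_mask.sum().clamp(min=1)

    return td_loss + lambda_sup * sup_loss

# Minimal usage with stand-in linear Q-networks and random data:
q_net, target_net = torch.nn.Linear(8, 4), torch.nn.Linear(8, 4)
batch = (torch.randn(16, 8), torch.randint(4, (16,)), torch.rand(16),
         torch.randn(16, 8), torch.zeros(16))
demo_mask = (torch.rand(16) < 0.5).float()
print(combined_loss(q_net, target_net, batch, demo_mask))
```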