Deep Q-learning from Demonstrations
2017 · Preprint
DOI: 10.48550/arxiv.1704.03732

Abstract: Deep reinforcement learning (RL) has achieved several high-profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-world tasks, where the agent must learn in the real environment. In this paper we study a setting where the agent may ac…

Cited by 32 publications (50 citation statements)
References 3 publications (3 reference statements)

“…Reinforcement learning with demonstrations can exploit the strengths of RL and IL and overcome their respective weaknesses, leading to wide usage in complex robotic control tasks. In such frameworks, demonstrations mainly function as an initialization tool to boost RL policies, with the subsequent exploration process expected to find a better policy than that of the supervisor [20], [22], [23]. Aside from bootstrapping exploration in RL, demonstrations can also be used to infer the reward function [24]; this belongs to the branch of inverse reinforcement learning (IRL) and is not covered in this study.…”
Section: Related Work
confidence: 99%
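
The statement above treats demonstrations as an initialization tool that boosts an RL policy before exploration takes over. A minimal sketch of that idea, as behavior-cloning pre-training of a discrete-action policy in PyTorch, is given below; the network size, the training loop length, and the `demo_states`/`demo_actions` arrays are illustrative assumptions rather than anything specified in the cited works.

```python
# Sketch: pre-train a policy on demonstrations (behavior cloning) before handing
# it to an RL loop whose exploration is expected to improve on the demonstrator.
import torch
import torch.nn as nn

obs_dim, n_actions = 8, 4  # assumed environment dimensions
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Hypothetical demonstration data: observed states and the expert's discrete actions.
demo_states = torch.randn(256, obs_dim)
demo_actions = torch.randint(0, n_actions, (256,))

# Supervised pre-training: fit the policy to the demonstrated actions.
for _ in range(100):
    logits = policy(demo_states)
    loss = nn.functional.cross_entropy(logits, demo_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# `policy` would now seed the RL stage, which explores to surpass the supervisor.
```
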
“…For tasks with sparse rewards (such as the insertion task considered in this paper), RL algorithms (e.g., DDPG [13]) converge slowly and are not data-efficient because non-zero rewards are difficult to discover through exploration. To address this problem, demonstration data can be added to the replay buffer [14,15,16,17]. However, these algorithms either need a large amount of demonstration data to balance the data distribution, or may still diverge because exploration remains difficult.…”
Section: Related Work
confidence: 99%
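
The fix described above, adding demonstration data into the replay buffer, can be sketched generically as follows. This is an assumed illustration in the spirit of the cited works [14,15,16,17], not a specific library API: the class name, the permanently kept demonstration partition, and the `demo_fraction` parameter are all hypothetical.

```python
# Sketch: a replay buffer seeded with demonstration transitions that are never
# overwritten, so sparse-reward training can always sample informative data.
import random
from collections import deque

class DemoReplayBuffer:
    def __init__(self, capacity, demo_transitions):
        self.demo = list(demo_transitions)   # demonstration data, kept permanently
        self.agent = deque(maxlen=capacity)  # agent data, overwritten FIFO

    def add(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size, demo_fraction=0.25):
        n_demo = int(batch_size * demo_fraction)
        batch = random.sample(self.demo, min(n_demo, len(self.demo)))
        batch += random.sample(self.agent, min(batch_size - len(batch), len(self.agent)))
        return batch

# Usage: seed with expert (s, a, r, s_next, done) tuples, then mix during training.
buffer = DemoReplayBuffer(capacity=100_000, demo_transitions=[(0, 1, 0.0, 1, False)])
buffer.add((1, 0, 1.0, 2, True))
print(buffer.sample(batch_size=2))
```
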
“…The HAT algorithm (Taylor et al., 2011) introduces an intermediate policy-summarization step, in which the demonstrated data is translated into an approximate policy that is then used to bias exploration in a final RL stage. In Hester et al. (2017), the policy is trained simultaneously on expert data and collected data, using a combination of supervised and temporal-difference losses. In Salimans & Chen (2018), the RL agent is reset at the start of each episode to a state from the single demonstration.…”
Section: Demonstration- and Plan-based Reward Shaping
confidence: 99%
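
The combination of supervised and temporal-difference losses attributed to Hester et al. (2017) above can be sketched as a single loss over mixed expert and self-collected batches. The margin value, the loss weight, and the tensor shapes below are assumptions, and the full method also includes an n-step TD term and L2 regularization, which this sketch omits.

```python
# Sketch: one-step TD loss on all samples plus a large-margin supervised loss
# applied only to demonstration samples, in the spirit of DQfD.
import torch

def dqfd_style_loss(q_net, target_net, batch, gamma=0.99, margin=0.8, lambda_e=1.0):
    # batch: states [B, obs_dim], actions [B] (long), rewards [B],
    # next_states [B, obs_dim], dones [B] (float), is_demo [B] (float mask)
    s, a, r, s2, done, is_demo = batch

    q_all = q_net(s)                                   # Q(s, ·) for every action
    q_sa = q_all.gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a_taken)

    # One-step TD target, double-Q style: online net picks a', target net evaluates it.
    with torch.no_grad():
        a2 = q_net(s2).argmax(dim=1, keepdim=True)
        target = r + gamma * (1.0 - done) * target_net(s2).gather(1, a2).squeeze(1)
    td_loss = torch.nn.functional.smooth_l1_loss(q_sa, target)

    # Large-margin supervised loss: max_a [Q(s, a) + margin * 1(a != a_E)] - Q(s, a_E),
    # zeroed out on non-demonstration samples via the is_demo mask.
    margins = torch.full_like(q_all, margin)
    margins.scatter_(1, a.unsqueeze(1), 0.0)           # no margin at the demonstrated action
    sup_loss = (((q_all + margins).max(dim=1).values - q_sa) * is_demo).mean()

    return td_loss + lambda_e * sup_loss
```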