2022
DOI: 10.48550/arxiv.2202.04628
Preprint

Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration

Abstract: A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine-grained feedback implies that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame. This is because of the large number of exploration actions that the policy has to perform before it gets any useful …
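To make concrete the kind of reward the abstract describes, here is a minimal sketch of a completion-only reward for a goal-reaching task. This is our own hypothetical illustration, not code from the paper; the names `state`, `goal`, and `tol` are assumptions.

```python
import numpy as np

def sparse_reward(state, goal, tol=0.05):
    """Binary sparse reward: 1 only when the agent reaches the goal.

    Hypothetical goal-reaching task; `state` and `goal` are positions.
    The agent gets no graded feedback until it succeeds, which is why
    exploration under such rewards is hard.
    """
    return 1.0 if np.linalg.norm(state - goal) < tol else 0.0
```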

Cited by 3 publications (11 citation statements) | References 10 publications
“…The idea here is to utilize the expert's demonstrations to guide the standard learning procedure in RL algorithms [7], [8], [18], [34], [35]. The authors in [21], [36] proposed adding expert demonstrations to replay buffers and using them to accelerate learning.…”
Section: Learning From Demonstration (mentioning)
confidence: 99%
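The replay-buffer idea in the statement above can be sketched as follows. This is our own illustration of the general technique (in the spirit of demonstration-seeded off-policy methods), not the cited authors' code; `DemoReplayBuffer`, `demo_fraction`, and the transition format are hypothetical.

```python
import random
from collections import deque

class DemoReplayBuffer:
    """Replay buffer pre-seeded with expert demonstrations (hypothetical sketch).

    Each sampled batch mixes demonstration transitions with the agent's
    own experience, so reward-bearing samples appear early in training.
    """

    def __init__(self, capacity=100_000, demo_fraction=0.25):
        self.demos = []                      # expert transitions, kept permanently
        self.agent = deque(maxlen=capacity)  # agent transitions, FIFO-evicted
        self.demo_fraction = demo_fraction

    def add_demos(self, transitions):
        # transitions: iterable of (state, action, reward, next_state, done)
        self.demos.extend(transitions)

    def add(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size):
        n_demo = min(int(batch_size * self.demo_fraction), len(self.demos))
        batch = random.sample(self.demos, n_demo)
        batch += random.sample(self.agent, min(batch_size - n_demo, len(self.agent)))
        return batch
```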
“…But as mentioned previously, the major drawback here is also their dependence on the availability of demonstrations, which are hard to obtain in practice for continuous control problems. For instance, the expert demonstrations in [8] are obtained by running TRPO with dense rewards and are later used to train a policy with sparse rewards in the same environment. This could be difficult to achieve in practice.…”
Section: Learning From Demonstration (mentioning)
confidence: 99%
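A self-contained toy version of that two-stage recipe (our own illustration, not the authors' code): the "expert" below stands in for a policy trained with TRPO on a dense reward (e.g. negative distance to goal), and its rollouts are logged under the sparse reward the learner would later face. The 1-D task and all names are hypothetical.

```python
import numpy as np

def expert_policy(state, goal):
    # Stand-in for a TRPO policy trained on a dense reward: here we
    # simply step toward the goal analytically.
    return np.clip(goal - state, -0.1, 0.1)

def collect_demonstrations(goal=1.0, n_episodes=10, horizon=50, tol=0.05):
    """Roll out the dense-reward expert, recording sparse-reward transitions.

    Hypothetical 1-D goal-reaching task. The logged reward is the sparse
    success indicator, matching the environment the learner faces.
    """
    demos = []
    for _ in range(n_episodes):
        state = np.random.uniform(-1.0, 1.0)
        for _ in range(horizon):
            action = expert_policy(state, goal)
            next_state = state + action
            reward = 1.0 if abs(next_state - goal) < tol else 0.0  # sparse
            done = reward > 0.0
            demos.append((state, action, reward, next_state, done))
            state = next_state
            if done:
                break
    return demos
```

The resulting transitions could, for example, seed the `DemoReplayBuffer` sketched earlier via `add_demos`.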