2022
DOI: 10.48550/arxiv.2202.04628
Preprint

Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration

Abstract: A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine-grained feedback implies that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame. This is because of the large number of exploration actions that the policy has to perform before it gets any useful …
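To make concrete the kind of reward the abstract describes, here is a minimal sketch of a completion-only reward for a goal-reaching task. This is our own hypothetical illustration, not code from the paper; the names `state`, `goal`, and `tol` are assumptions.

```python
import numpy as np

def sparse_reward(state, goal, tol=0.05):
    """Binary sparse reward: 1 only when the agent reaches the goal.

    Hypothetical goal-reaching task; `state` and `goal` are positions.
    The agent gets no graded feedback until it succeeds, which is why
    exploration under such rewards is hard.
    """
    return 1.0 if np.linalg.norm(state - goal) < tol else 0.0
```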

Cited by 3 publications (11 citation statements) | References 10 publications
“…The idea here is to utilize the expert's demonstrations to guide the standard learning procedure in RL algorithms [7], [8], [18], [34], [35]. The authors in [21], [36] proposed adding expert demonstrations to replay buffers and using them to accelerate learning.…”
Section: Learning From Demonstration (mentioning)
confidence: 99%
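The replay-buffer idea in the statement above can be sketched as follows. This is our own illustration of the general technique (in the spirit of demonstration-seeded off-policy methods), not the cited authors' code; `DemoReplayBuffer`, `demo_fraction`, and the transition format are hypothetical.

```python
import random
from collections import deque

class DemoReplayBuffer:
    """Replay buffer pre-seeded with expert demonstrations (hypothetical sketch).

    Each sampled batch mixes demonstration transitions with the agent's
    own experience, so reward-bearing samples appear early in training.
    """

    def __init__(self, capacity=100_000, demo_fraction=0.25):
        self.demos = []                      # expert transitions, kept permanently
        self.agent = deque(maxlen=capacity)  # agent transitions, FIFO-evicted
        self.demo_fraction = demo_fraction

    def add_demos(self, transitions):
        # transitions: iterable of (state, action, reward, next_state, done)
        self.demos.extend(transitions)

    def add(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size):
        n_demo = min(int(batch_size * self.demo_fraction), len(self.demos))
        batch = random.sample(self.demos, n_demo)
        batch += random.sample(self.agent, min(batch_size - n_demo, len(self.agent)))
        return batch
```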
“…But as mentioned previously, the major drawback here is also their dependence on the availability of demonstrations, which are hard to obtain in practice for continuous control problems. For instance, the expert demonstrations in [8] are obtained by running TRPO with dense rewards and are later used to train a policy with sparse rewards in the same environment. This could be difficult to achieve in practice.…”
Section: Learning From Demonstration (mentioning)
confidence: 99%
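A self-contained toy version of that two-stage recipe (our own illustration, not the authors' code): the "expert" below stands in for a policy trained with TRPO on a dense reward (e.g. negative distance to goal), and its rollouts are logged under the sparse reward the learner would later face. The 1-D task and all names are hypothetical.

```python
import numpy as np

def expert_policy(state, goal):
    # Stand-in for a TRPO policy trained on a dense reward: here we
    # simply step toward the goal analytically.
    return np.clip(goal - state, -0.1, 0.1)

def collect_demonstrations(goal=1.0, n_episodes=10, horizon=50, tol=0.05):
    """Roll out the dense-reward expert, recording sparse-reward transitions.

    Hypothetical 1-D goal-reaching task. The logged reward is the sparse
    success indicator, matching the environment the learner faces.
    """
    demos = []
    for _ in range(n_episodes):
        state = np.random.uniform(-1.0, 1.0)
        for _ in range(horizon):
            action = expert_policy(state, goal)
            next_state = state + action
            reward = 1.0 if abs(next_state - goal) < tol else 0.0  # sparse
            done = reward > 0.0
            demos.append((state, action, reward, next_state, done))
            state = next_state
            if done:
                break
    return demos
```

The resulting transitions could, for example, seed the `DemoReplayBuffer` sketched earlier via `add_demos`.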