Plan-based reward shaping for reinforcement learning

Grzes, Marek; Kudenko, Daniel

doi:10.1109/is.2008.4670492

Cited by 50 publications

(72 citation statements)

References 7 publications

(12 reference statements)


“…This is, however, often not the case in many practical applications. In many domains, heuristic knowledge can be easily identified by the designer of the system [24] or acquired using reasoning or learning [12]. In the area of single-agent reinforcement learning, potential-based reward shaping has been proven to be a principled and theoretically correct method of incorporating heuristic knowledge into an agent [21].…”

Section: Introduction (mentioning)

confidence: 99%

An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems

Devlin

Kudenko

Grzes

2011

Advs. Complex Syst.

Self Cite


This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potential-based reward shaping does not alter the Nash Equilibria of a stochastic game, only the exploration of the shaped agent. We demonstrate empirically the performance of reward shaping in two problem domains within the context of RoboCup KeepAway by designing three reward shaping schemes, encouraging specific behaviour such as keeping a minimum distance from other players on the same team and taking on specific roles. The results illustrate that reward shaping with multiple, simultaneous learning agents can reduce the time needed to learn a suitable policy and can alter the final group performance.
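For readers unfamiliar with the term, potential-based reward shaping in its standard single-agent form adds F(s, s') = gamma * Phi(s') - Phi(s) to the environment reward. The sketch below is illustrative only and is not taken from any of the cited papers: the function names and the distance-based potential are assumptions. It shows why the shaping terms cannot change which trajectories are best: along any trajectory their discounted sum telescopes to a quantity that depends only on the first and last states.

```python
from typing import Callable, Sequence

# Hypothetical names throughout; this is a sketch, not the authors' code.
Potential = Callable[[int], float]


def shaping_reward(phi: Potential, s: int, s_next: int, gamma: float) -> float:
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def discounted_shaping_sum(states: Sequence[int], phi: Potential, gamma: float) -> float:
    """Discounted sum of the shaping terms along one trajectory."""
    total = 0.0
    for t in range(len(states) - 1):
        total += gamma ** t * shaping_reward(phi, states[t], states[t + 1], gamma)
    return total


def telescoped_value(states: Sequence[int], phi: Potential, gamma: float) -> float:
    """Closed form of the same sum: gamma^n * phi(s_n) - phi(s_0)."""
    n = len(states) - 1
    return gamma ** n * phi(states[-1]) - phi(states[0])


# Assumed toy potential: negative distance to a goal at position 10 on a line.
def phi(s: int) -> float:
    return -abs(10 - s)


trajectory = [0, 1, 3, 2, 5]
gamma = 0.9
shaping_total = discounted_shaping_sum(trajectory, phi, gamma)
closed_form = telescoped_value(trajectory, phi, gamma)
```

The two quantities agree for any trajectory, including ones that wander backwards, which is the intuition behind the invariance results mentioned in the statements above.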


“…Lastly, planning has been combined with RL through reward shaping [8,14]. Reward shaping is a technique to hasten Reinforcement Learning when the reward is sparse, and the agent has to execute a long sequence of actions before getting any feedback about its choices.…”

Section: Related Work (mentioning)

confidence: 99%

“…The reward function is enriched by adding a term which provides feedback for intermediate states, helping guide the agent towards the goal. Grzes and Kudenko proposed a method [14] in which the agent computes a plan on a STRIPS representation of the domain, and uses it to define a shaping function. The function guides the agent along the plan, helping it find the goal sooner.…”

Section: Related Work (mentioning)

confidence: 99%

A synthesis of automated planning and reinforcement learning for efficient, robust decision-making

Leonetti

Iocchi

Stone

2016

Artificial Intelligence


Automated planning and reinforcement learning are characterized by complementary views on decision making: the former relies on previous knowledge and computation, while the latter on interaction with the world, and experience. Planning allows robots to carry out different tasks in the same domain, without the need to acquire knowledge about each one of them, but relies strongly on the accuracy of the model. Reinforcement learning, on the other hand, does not require previous knowledge, and allows robots to robustly adapt to the environment, but often necessitates an infeasible amount of experience. We present Domain Approximation for Reinforcement LearnING (DARLING), a method that takes advantage of planning to constrain the behavior of the agent to reasonable choices, and of reinforcement learning to adapt to the environment, and increase the reliability of the decision making process. We demonstrate the effectiveness of the proposed method on a service robot, carrying out a variety of tasks in an office building. We find that when the robot makes decisions by planning alone on a given model it often fails, and when it makes decisions by reinforcement learning alone it often cannot complete its tasks in a reasonable amount of time. When employing DARLING, even when seeded with the same model that was used for planning alone, however, the robot can quickly learn a behavior to carry out all the tasks, improves over time, and adapts to the environment as it changes.
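The plan-based shaping that the citation statements for this entry attribute to Grzes and Kudenko can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the potential is assumed to be the index of the plan step matching the agent's abstract state, scaled by a hypothetical weight `omega`; off-plan states are assumed to get potential 0; and the key-and-door task, `PLAN`, and `abstract` are invented for the example.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Sequence

AbstractState = FrozenSet[str]


def make_plan_potential(
    plan_states: Sequence[AbstractState],
    abstract: Callable[[Hashable], AbstractState],
    omega: float = 1.0,
) -> Callable[[Hashable], float]:
    """Build phi(s) = omega * (index of s's abstract state in the plan)."""
    step_of: Dict[AbstractState, int] = {}
    for step, abstract_state in enumerate(plan_states):
        step_of[abstract_state] = step  # a repeated state keeps its latest step

    def phi(state: Hashable) -> float:
        # Off-plan states get potential 0 in this sketch (an assumption).
        return omega * step_of.get(abstract(state), 0)

    return phi


def shaping_reward(phi, s, s_next, gamma: float) -> float:
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


# Hypothetical high-level plan: pick up a key, then open a door.
PLAN = [
    frozenset(),
    frozenset({"has_key"}),
    frozenset({"has_key", "door_open"}),
]


def abstract(state) -> AbstractState:
    """Map a low-level state (position, has_key, door_open) to symbolic facts."""
    _position, has_key, door_open = state
    facts = set()
    if has_key:
        facts.add("has_key")
    if door_open:
        facts.add("door_open")
    return frozenset(facts)


phi = make_plan_potential(PLAN, abstract, omega=10.0)
progress_bonus = shaping_reward(phi, (2, False, False), (3, True, False), 0.9)
idle_penalty = shaping_reward(phi, (3, True, False), (4, True, False), 0.9)
```

In this sketch a transition that advances one plan step earns a positive shaping term, while moving around without plan progress earns a small negative one when gamma is below 1, which is how the function nudges the agent along the plan.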


“…A related approach to learning the potential function for reward shaping was investigated by the authors of this paper in Grzes and Kudenko (2008b). In this case, symbolic knowledge which is represented in the form of STRIPS operators is used to create a high level symbolic plan (Ghallab et al, 2004).…”

Section: Related Work (mentioning)

confidence: 99%

Online learning of shaping rewards in reinforcement learning

Grzes

Kudenko

2010

Neural Networks

Self Cite


Potential-based reward shaping has been shown to be a powerful method to improve the convergence rate of reinforcement learning agents. It is a flexible technique to incorporate background knowledge into temporal-difference learning in a principled way. However, the question remains of how to compute the potential function which is used to shape the reward that is given to the learning agent. In this paper, we show how, in the absence of knowledge to define the potential function manually, this function can be learned online in parallel with the actual reinforcement learning process. Two cases are considered. The first solution which is based on the multi-grid discretisation is designed for model-free reinforcement learning. In the second case, the approach for the prototypical model-based R-max algorithm is proposed. It learns the potential function using the free space assumption about the transitions in the environment. Two novel algorithms are presented and evaluated.
