2008 4th International IEEE Conference Intelligent Systems 2008
DOI: 10.1109/is.2008.4670492

Plan-based reward shaping for reinforcement learning

Abstract: Reinforcement learning, while being a highly popular learning technique for agents and multi-agent systems, has so far encountered difficulties when applying it to more complex domains due to scaling-up problems. This paper focuses on the use of domain knowledge to improve the convergence speed and optimality of various RL techniques. Specifically, we propose the use of high-level STRIPS operator knowledge in reward shaping to focus the search for the optimal policy. Empirical results show that the plan-based …

Cited by 50 publications (72 citation statements)
References 7 publications (12 reference statements)

“…This is, however, often not the case in many practical applications. In many domains, heuristic knowledge can be easily identified by the designer of the system [24] or acquired using reasoning or learning [12]. In the area of single-agent reinforcement learning, potential-based reward shaping has been proven to be a principled and theoretically correct method of incorporating heuristic knowledge into an agent [21].…”
Section: Introduction (mentioning)
confidence: 99%
“…Lastly, planning has been combined with RL through reward shaping [8,14]. Reward shaping is a technique to hasten Reinforcement Learning when the reward is sparse, and the agent has to execute a long sequence of actions before getting any feedback about its choices.…”
Section: Related Work (mentioning)
confidence: 99%
“…The reward function is enriched by adding a term which provides feedback for intermediate states, helping guide the agent towards the goal. Grzes and Kudenko proposed a method [14] in which the agent computes a plan on a STRIPS representation of the domain, and uses it to define a shaping function. The function guides the agent along the plan, helping it find the goal sooner.…”
Section: Related Work (mentioning)
confidence: 99%
“…A related approach to learning the potential function for reward shaping was investigated by the authors of this paper in Grzes and Kudenko (2008b). In this case, symbolic knowledge which is represented in the form of STRIPS operators is used to create a high level symbolic plan (Ghallab et al, 2004).…”
Section: Related Work (mentioning)
confidence: 99%
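
The citation statements above describe potential-based reward shaping with a potential derived from a high-level plan. A minimal sketch of that idea follows, under stated assumptions: the shaping term uses the standard potential-based form F(s, s') = γΦ(s') − Φ(s), and the potential Φ is taken to be a scaled index of the plan step that a state maps to. The names `make_plan_potential`, `shaped_reward`, `abstract`, and `scale`, and the toy plan, are hypothetical and are not taken from the paper.

```python
from typing import Callable, Hashable, Sequence


def make_plan_potential(
    plan: Sequence[Hashable],
    abstract: Callable[[Hashable], Hashable],
    scale: float = 1.0,
) -> Callable[[Hashable], float]:
    """Build a potential function from an ordered high-level plan.

    Assumption: `plan` lists abstract states in the order the plan visits
    them, and `abstract` maps a low-level state to one of those abstract
    states. States that match no plan step get potential 0.
    """
    step_of = {abstract_state: i for i, abstract_state in enumerate(plan)}

    def potential(state: Hashable) -> float:
        return scale * step_of.get(abstract(state), 0)

    return potential


def shaped_reward(
    reward: float,
    state: Hashable,
    next_state: Hashable,
    potential: Callable[[Hashable], float],
    gamma: float,
) -> float:
    """Add the potential-based shaping term to the environment reward."""
    return reward + gamma * potential(next_state) - potential(state)


# Hypothetical toy domain: a low-level state is (position, plan_label).
plan = ["start", "have_key", "door_open", "at_goal"]
phi = make_plan_potential(plan, abstract=lambda s: s[1], scale=2.0)

forward = shaped_reward(0.0, ((0, 0), "start"), ((0, 1), "have_key"), phi, gamma=0.5)
backward = shaped_reward(0.0, ((0, 1), "have_key"), ((0, 0), "start"), phi, gamma=0.5)
```

In this sketch, moving one step forward along the plan yields a positive shaping bonus and moving backward yields a penalty, which is the sense in which the shaping function guides the agent along the plan.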