2010
DOI: 10.1016/j.neunet.2010.01.001

Online learning of shaping rewards in reinforcement learning

Abstract: Potential-based reward shaping has been shown to be a powerful method to improve the convergence rate of reinforcement learning agents. It is a flexible technique to incorporate background knowledge into temporal-difference learning in a principled way. However, the question remains of how to compute the potential function which is used to shape the reward that is given to the learning agent. In this paper, we show how, in the absence of knowledge to define the potential function manually, this …
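For context, the shaping referred to here is the standard potential-based construction of Ng, Harada and Russell: the environment reward is augmented with a difference of potentials, and the open question the paper addresses is how the potential function Φ is obtained when it cannot be specified by hand.

```latex
% Standard potential-based reward shaping (background, not the paper's
% specific algorithm): the agent is trained on the shaped reward R'.
\[
R'(s, a, s') = R(s, a, s') + F(s, s'), \qquad
F(s, s') = \gamma\,\Phi(s') - \Phi(s)
\]
```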

Cited by 63 publications (44 citation statements)
References 20 publications
“…In RL, it may refer either to training the agent on successive tasks of increasing complexity, until the desired complexity is reached [17,27,57,59,61,65], or, more commonly, to supplementing the MDP's reward function with additional, artificial rewards [3,14,15,24,38,49,50,53,85]. This article employs shaping functions in the latter sense.…”
Section: Shaping (mentioning)
confidence: 99%
“…In single-task RL, one approach is to construct an initial shaping function based on intuition [38] or an initial task model [24], and refine it through interaction with the task. Elfwing et al [15,16] evolve a shaping function that, when transferred to a real robot, results in better performance than when transferring Q-values.…”
Section: Potential-based Shaping (mentioning)
confidence: 99%
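The idea quoted above, starting from an intuition-based potential and refining it through interaction with the task, can be sketched roughly as follows. This is an illustrative sketch only; the class name, the tabular correction term, and the use of observed value targets are assumptions, not the interface of the cited methods.

```python
from collections import defaultdict

class OnlinePotential:
    """Illustrative sketch: a potential initialised from a heuristic guess
    and refined from value estimates observed while interacting with the task."""

    def __init__(self, heuristic, lr=0.1):
        self.heuristic = heuristic            # intuition-based initial potential
        self.correction = defaultdict(float)  # learned adjustment on top of it
        self.lr = lr

    def phi(self, state):
        return self.heuristic(state) + self.correction[state]

    def update(self, state, target):
        # Nudge the potential toward an observed value estimate for `state`.
        self.correction[state] += self.lr * (target - self.phi(state))

    def shaping_reward(self, state, next_state, gamma=0.99):
        # Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s).
        return gamma * self.phi(next_state) - self.phi(state)

# Hypothetical usage: a negative-Manhattan-distance heuristic on a 10x10 grid.
pot = OnlinePotential(lambda s: -(abs(s[0] - 9) + abs(s[1] - 9)))
f = pot.shaping_reward((0, 0), (0, 1))
```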
“…If such a selection can be done, reward shaping can significantly improve learning performance [6]. Improperly implemented, however, reward shaping, which is incorporated directly into the value-update process, can also harm the convergence of a learning algorithm by altering the optimal policy.…”
Section: Reward Shaping (mentioning)
confidence: 99%
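How a shaping term enters the value update, and why a poorly chosen (non-potential-based) bonus can change what the learner converges to, can be seen in a plain tabular Q-learning step. A minimal sketch, assuming a dictionary-valued Q-table; it is generic and not taken from the cited works.

```python
def q_learning_step(Q, s, a, r, s_next, actions, phi, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update on the shaped reward.

    With the potential-based term F = gamma*phi(s') - phi(s), the optimal
    policy of the original task is preserved; an arbitrary bonus inserted
    here instead could shift which policy the values converge to.
    """
    shaped_r = r + gamma * phi(s_next) - phi(s)            # r' = r + F(s, s')
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    q_sa = Q.get((s, a), 0.0)
    Q[(s, a)] = q_sa + alpha * (shaped_r + gamma * best_next - q_sa)
    return Q
```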
“…(1) which in this work is defined as F : S × S → R, γ = 1. Based on prior work [6] we choose to set Φ(s) = V(s), which has been shown to be an effective potential function. TiMRLA, which normally learns the original models of the transition and reward functions of the source task, now learns the augmented reward function R′, forming a training set of tuples ⟨s, a⟩ with their corresponding shaped rewards r′ = r + F(s, s′).…”
Section: Transferring Shaping Rewards (mentioning)
confidence: 99%
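A minimal sketch of the shaped-reward construction that the passage describes, taking Φ(s) = V(s) from a source-task value function and γ = 1 as stated. The function name and the dictionary-based value table are assumptions for illustration, not the TiMRLA implementation.

```python
def build_shaped_training_set(transitions, V_source, gamma=1.0):
    """Turn observed transitions (s, a, r, s') into (s, a, r') tuples with
    r' = r + F(s, s'), using the source-task values as the potential Phi = V."""
    training_set = []
    for s, a, r, s_next in transitions:
        F = gamma * V_source.get(s_next, 0.0) - V_source.get(s, 0.0)
        training_set.append((s, a, r + F))
    return training_set
```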