Twenty-First International Conference on Machine Learning - ICML '04 2004
DOI: 10.1145/1015330.1015401

Bellman goes relational

Abstract: Motivated by the interest in relational reinforcement learning, we introduce a novel relational Bellman update operator called ReBel. It employs a constraint logic programming language to compactly represent Markov decision processes over relational domains. Using ReBel, a novel value iteration algorithm is developed in which abstraction (over states and actions) plays a major role. This framework provides new insights into relational reinforcement learning. Convergence results as well as experiments are presented.
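To make the abstraction idea from the abstract concrete, the following is a minimal Python sketch of value iteration in which backups are performed once per abstract state (a partition of ground states) rather than once per ground state. The toy chain domain, the single deterministic action, and the two-block partition are assumptions chosen purely for illustration; they are not the ReBel operator or its constraint logic programming representation.

```python
# Minimal sketch: value iteration with backups per abstract state.
# The domain, action, and partition below are illustrative assumptions.
from collections import defaultdict

GROUND_STATES = range(5)   # toy chain: states 0..4, goal is state 4
GAMMA = 0.9

def step(s):
    """Single deterministic toy action: move one step toward the goal."""
    return min(s + 1, 4)

def reward(s):
    return 1.0 if s == 4 else 0.0

def abstract_of(s):
    """Illustrative partition of ground states into two abstract states."""
    return "at_goal" if s == 4 else "not_at_goal"

def abstract_value_iteration(iterations=50):
    V = defaultdict(float)                     # one value per abstract state
    for _ in range(iterations):
        new_v = defaultdict(float)
        reps = {}                              # one representative ground state per partition
        for s in GROUND_STATES:
            reps.setdefault(abstract_of(s), s)
        for name, s in reps.items():
            # Bellman backup at the abstract level (single action, so no max needed).
            new_v[name] = reward(s) + GAMMA * V[abstract_of(step(s))]
        V = new_v
    return dict(V)

if __name__ == "__main__":
    print(abstract_value_iteration())
```

With a partition this coarse, some value distinctions among ground states are lost; the exact approaches discussed in the citation statements below compute partitions fine enough that no necessary distinction is lost.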

Cited by 63 publications (53 citation statements)
References 12 publications (10 reference statements)
“…The idea is to construct minimal logical partitions of the state space required to make all necessary value function distinctions. For example, Kersting et al [13] present an exact value iteration for relational MDPs. Sanner et al [17] exploit factored transition models of first-order MDPs to approximate the value function based on linear combinations of abstract first-order value functions.…”
Section: Related Work
confidence: 99%
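The approximation mentioned in the statement above can be illustrated with a small sketch: represent the value function as a weighted sum of abstract basis functions and fit the weights by least squares. The basis functions, toy states, and target values below are hypothetical stand-ins, not the first-order construction of Sanner et al.

```python
# Minimal sketch: V(s) ~ sum_i w_i * b_i(s) with abstract basis functions.
# All names and numbers below are illustrative assumptions.
import numpy as np

# Ground states of a toy domain: number of blocks still misplaced (0..5).
states = np.arange(6)

def b_goal(s):      # abstract feature: "all blocks are in place"
    return 1.0 if s == 0 else 0.0

def b_one_left(s):  # abstract feature: "exactly one block is misplaced"
    return 1.0 if s == 1 else 0.0

def b_const(s):     # constant feature
    return 1.0

basis = [b_goal, b_one_left, b_const]

# Toy target values, e.g. as they might come from sampled Bellman backups.
targets = np.array([10.0, 9.0, 8.1, 7.3, 6.6, 5.9])

# Feature matrix: one row per ground state, one column per basis function.
Phi = np.array([[b(s) for b in basis] for s in states])

# Least-squares fit of the weights.
w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

def approx_value(s):
    return sum(wi * bi(s) for wi, bi in zip(w, basis))

if __name__ == "__main__":
    for s in states:
        print(s, round(approx_value(s), 2))
```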
“…The fourth contrast is with those methods [13,14,19,24,29] that rely upon learning, that is, upon training agents to perform well in simulated environments. Here the evolving experience of the agent is effectively translated into merit-oriented weightings of the alternative actions available to each perception.…”
Section: Positioning
confidence: 99%
“…The key observation is that each RMDP induces a traditional MDP [15], which can be obtained by starting in some initial ground state and then applying each abstract transition until no more new ground states can be computed. Thus, the existence of an optimal policy π for each (resulting) ground MDP is guaranteed.…”
Section: Relational Navigation Policies
confidence: 99%
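The grounding construction described in the statement above can be sketched as a fixpoint computation: keep applying the abstract transition rules to every ground state found so far until no new ground state appears. The relational state encoding and the single move(X, Y) rule below are illustrative assumptions, not the navigation domain of the cited work.

```python
# Minimal sketch: enumerate the ground MDP induced by an RMDP by applying
# abstract transitions from an initial ground state until closure.
# The atoms and the single rule below are illustrative assumptions.

# A ground state is a frozenset of atoms (tuples).
initial = frozenset({("robot_in", "kitchen"),
                     ("door", "kitchen", "hall"),
                     ("door", "hall", "office")})

def abstract_transitions(state):
    """Apply the abstract rule move(X, Y): robot_in(X), door(X, Y) -> robot_in(Y)."""
    successors = set()
    rooms = {a[1] for a in state if a[0] == "robot_in"}
    doors = {(a[1], a[2]) for a in state if a[0] == "door"}
    for x, y in doors:
        if x in rooms:
            successors.add((state - {("robot_in", x)}) | {("robot_in", y)})
    return successors

def ground_mdp_states(initial_state):
    """Fixpoint: all ground states reachable by repeatedly applying the abstract rule."""
    seen = {initial_state}
    frontier = [initial_state]
    while frontier:
        s = frontier.pop()
        for s2 in abstract_transitions(s):
            if s2 not in seen:
                seen.add(s2)
                frontier.append(s2)
    return seen

if __name__ == "__main__":
    for s in ground_mdp_states(initial):
        print(sorted(s))
```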
“…Later, Dietterich and Flann [22] combined this idea with reinforcement learning by associating these generalized state descriptions with values obtained from value iteration. Subsequently, Boutilier et al [23] and Kersting et al [15] generalized Dietterich and Flann's approach to relational domains, i.e., RMDPs. Recently, Mausam and Weld [10] suggested to approximate the value function by inducing a relational regression tree from observed traces.…”
Section: Relational Navigation Policies
confidence: 99%