Learning Representation and Control in Markov Decision Processes: New Frontiers (2007)
DOI: 10.1561/2200000003

Cited by 51 publications (33 citation statements). References 115 publications (163 reference statements).
“…A directed graph was constructed from the samples where vertices correspond to state variables. The digraph construction followed along the lines discussed in (Mahadevan et al, 2006) with a slight difference for the continuous domain (discussed in detail below). In the discrete domain, directed edges were added for actual state transitions seen in the training episodes.…”
Section: Methods
Mentioning confidence: 99%
“…Both directed Laplacians are symmetric matrices, ensuring a complete orthonormal basis of real eigenvectors. The symmetrization ½(ΨP + PᵀΨ) essentially creates an undirected graph with edge weights … Mahadevan et al. (2006) proposed using three different symmetrization techniques:…”
Section: Directed Graph Laplacian
Mentioning confidence: 99%
“…One class of methods aims at constructing a parsimonious set of features (basis functions). These include tuning the parameters of Gaussian RBFs using either a gradient- or the cross-entropy method in the context of LSTD (Menache et al., 2005), deriving new basis functions with nonparametric techniques (Keller et al., 2006; Parr et al., 2007), or using a combination of numerical analysis and nonparametric techniques (Mahadevan, 2009). These methods, however, do not attempt to control the tradeoff between the approximation and estimation errors.…”
Section: The Choice of the Function Space
Mentioning confidence: 99%
“…Option goal states have been selected by a variety of methods, the most common relying on computing visit or reward statistics over individual states to identify useful subgoals (Digney 1996, McGovern and Barto 2001, Şimşek and Barto 2004, 2009). Graph-based methods (Mannor et al. 2004, Menache et al. 2002, Şimşek et al. 2005) build a state-transition graph and use its properties (e.g., local graph cuts, Şimşek et al. 2005) to identify option goals.…”
Section: Hierarchical Reinforcement Learning
Mentioning confidence: 99%