2015
DOI: 10.1109/tac.2015.2418411
Classification-Based Approximate Policy Iteration

Abstract: Tackling large approximate dynamic programming or reinforcement learning problems requires methods that can exploit regularities of the problem at hand. Most current methods are geared towards exploiting the regularities of either the value function or the policy. We introduce a general classification-based approximate policy iteration (CAPI) framework that can exploit regularities of both. We establish theoretical guarantees for the sample complexity of CAPI-style algorithms, which allow the policy evaluation …
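As a rough illustration of the CAPI loop described in the abstract, here is a minimal, hypothetical Python sketch: policy evaluation is done with truncated Monte Carlo rollouts, and policy improvement is cast as training a classifier on the greedy actions. The toy chain MDP, the rollout horizon, and the decision-tree policy are illustrative assumptions, not the paper's construction; in particular, the paper's analysis covers value-weighted losses, whereas this sketch uses plain greedy labels.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Illustrative assumption: a toy deterministic chain MDP with states 0..N-1
# and actions {0: left, 1: right}; reward 1 upon reaching the right end.
N_STATES, ACTIONS, GAMMA = 10, [0, 1], 0.95

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == N_STATES - 1)

def rollout_q(s, a, policy, horizon=30):
    """Truncated Monte Carlo estimate of Q^pi(s, a): take action a, then follow pi."""
    total, disc = 0.0, 1.0
    s, r = step(s, a)
    total += r
    for _ in range(horizon - 1):
        disc *= GAMMA
        s, r = step(s, policy(s))
        total += disc * r
    return total

def capi(n_iters=5, n_samples=200, seed=0):
    rng = np.random.default_rng(seed)
    policy = lambda s: int(rng.integers(2))  # arbitrary initial policy
    for _ in range(n_iters):
        states = rng.integers(0, N_STATES, size=n_samples)
        # Policy evaluation: rollout estimates of Q^pi(s, a) for every action.
        q = np.array([[rollout_q(s, a, policy) for a in ACTIONS] for s in states])
        # Policy improvement cast as classification: the greedy action is the label.
        clf = DecisionTreeClassifier(max_depth=3).fit(states.reshape(-1, 1),
                                                      q.argmax(axis=1))
        policy = lambda s, c=clf: int(c.predict([[s]])[0])
    return policy
```

On this toy chain one would expect `capi()` to settle on the always-go-right policy within a few iterations; the point of the sketch is only that the improvement step is an ordinary supervised classification problem.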

Cited by 7 publications (4 citation statements: 0 supporting, 4 mentioning, 0 contrasting).
References 36 publications (86 reference statements).
“…The optimal solution is to use a large number of patterns (several hundred pixels) that can be used to test different spectral properties of the tested objects in individual iterations of the classification [41]. In this case, the iteration that offers the best fit for each class can be chosen, and then the main part of the classification can be run [42]. This situation is difficult in the case of mapping invasive and expansive plants because the spatial patterns created by the studied species and the environment are very variable, depending on local conditions, e.g., agricultural and agrotechnical procedures [43], as well as land cover, e.g., wasteland [44,45].…”
Section: Discussion (mentioning)
confidence: 99%
“…Weighting the error in policies with a value function is reminiscent of the loss function appearing in some classification-based approximate policy iteration methods such as the work by Lazaric et al. (2010); Farahmand et al. (2015); Lazaric et al. (2016) (and different from the original formulation by Lagoudakis & Parr (2003b) and the more recent instantiation by Silver et al. (2017a), whose policy loss does not incorporate value functions), Policy Search by Dynamic Programming (Bagnell et al., 2004), and Conservative Policy Iteration (Kakade & Langford, 2002).…”
Section: Convergence of Model-Based PG (mentioning)
confidence: 99%
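The "weighting the error in policies with a value function" that this excerpt refers to can be illustrated with a short, hypothetical sketch (not the exact loss of any cited paper): each classification mistake at a state is charged its action-gap, i.e., how much Q-value is lost by not taking the greedy action there.

```python
import numpy as np

def gap_weighted_loss(q_values, chosen_actions):
    """Value-weighted classification loss for a candidate policy.

    q_values:       (n, A) array of estimated Q^pi values at n sampled states.
    chosen_actions: (n,)   actions the candidate policy picks at those states.
    A plain 0-1 loss counts every disagreement with argmax_a Q equally; here
    each mistake is charged its action-gap max_a Q(s, a) - Q(s, chosen), so
    errors at states where the action choice barely matters cost little.
    """
    n = len(q_values)
    gaps = q_values.max(axis=1) - q_values[np.arange(n), chosen_actions]
    return float(gaps.mean())

# Example: two states, two actions; the policy errs only where the gap is tiny.
q = np.array([[1.0, 5.0],    # large gap: choosing action 0 here would cost 4.0
              [2.0, 2.1]])   # small gap: choosing action 0 here costs only 0.1
print(gap_weighted_loss(q, np.array([1, 0])))  # ≈ 0.05
```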
“…There has also been work to use classifiers to represent policies in RL (Bagnell et al., 2003; Rexakis & Lagoudakis, 2008; Dimitrakakis & Lagoudakis, 2008; Blatt & Hero, 2006), which is tangential to our work; our focus is on using the principle of Structural Risk Minimization for RL. Additional work uses classification theory to bound performance for on-policy data (Lazaric et al., 2010; Farahmand et al., 2012), which Section 3.1.3 can be seen as extending to batch, off-policy data.…”
Section: Related Work (mentioning)
confidence: 99%