2015
DOI: 10.1109/tac.2015.2418411
Classification-Based Approximate Policy Iteration

Abstract: Tackling large approximate dynamic programming or reinforcement learning problems requires methods that can exploit regularities of the problem at hand. Most current methods are geared towards exploiting the regularities of either the value function or the policy. We introduce a general classification-based approximate policy iteration (CAPI) framework that can exploit regularities of both. We establish theoretical guarantees for the sample complexity of CAPI-style algorithms, which allow the policy evaluation …
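As a rough illustration of the CAPI loop described in the abstract, here is a minimal, hypothetical Python sketch: policy evaluation is done with truncated Monte Carlo rollouts, and policy improvement is cast as training a classifier on the greedy actions. The toy chain MDP, the rollout horizon, and the decision-tree policy are illustrative assumptions, not the paper's construction; in particular, the paper's analysis covers value-weighted losses, whereas this sketch uses plain greedy labels.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Illustrative assumption: a toy deterministic chain MDP with states 0..N-1
# and actions {0: left, 1: right}; reward 1 upon reaching the right end.
N_STATES, ACTIONS, GAMMA = 10, [0, 1], 0.95

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == N_STATES - 1)

def rollout_q(s, a, policy, horizon=30):
    """Truncated Monte Carlo estimate of Q^pi(s, a): take action a, then follow pi."""
    total, disc = 0.0, 1.0
    s, r = step(s, a)
    total += r
    for _ in range(horizon - 1):
        disc *= GAMMA
        s, r = step(s, policy(s))
        total += disc * r
    return total

def capi(n_iters=5, n_samples=200, seed=0):
    rng = np.random.default_rng(seed)
    policy = lambda s: int(rng.integers(2))  # arbitrary initial policy
    for _ in range(n_iters):
        states = rng.integers(0, N_STATES, size=n_samples)
        # Policy evaluation: rollout estimates of Q^pi(s, a) for every action.
        q = np.array([[rollout_q(s, a, policy) for a in ACTIONS] for s in states])
        # Policy improvement cast as classification: the greedy action is the label.
        clf = DecisionTreeClassifier(max_depth=3).fit(states.reshape(-1, 1),
                                                      q.argmax(axis=1))
        policy = lambda s, c=clf: int(c.predict([[s]])[0])
    return policy
```

On this toy chain one would expect `capi()` to settle on the always-go-right policy within a few iterations; the point of the sketch is only that the improvement step is an ordinary supervised classification problem.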

Cited by 7 publications (4 citation statements: 0 supporting, 4 mentioning, 0 contrasting).
References 36 publications (86 reference statements).
“…The optimal solution is to use a large number of patterns (several hundred pixels) that can be used to test different spectral properties of the tested objects in individual iterations of the classification [41]. In this case, the iteration that offers the best fit for each class can be chosen, and then the main part of the classification can be run [42]. This situation is difficult in the case of mapping invasive and expansive plants because the spatial patterns created by the studied species and the environment are very variable, depending on local conditions, e.g., agricultural and agrotechnical procedures [43], as well as land cover, e.g., wasteland [44,45].…”
Section: Discussion (mentioning)
confidence: 99%
“…Weighting the error in policies with a value function is reminiscent of the loss function appearing in some classification-based approximate policy iteration methods such as the work by Lazaric et al. (2010); Farahmand et al. (2015); Lazaric et al. (2016) (and different from the original formulation by Lagoudakis & Parr (2003b) and the more recent instantiation by Silver et al. (2017a), whose policy loss does not incorporate value functions), Policy Search by Dynamic Programming (Bagnell et al., 2004), and Conservative Policy Iteration (Kakade & Langford, 2002).…”
Section: Convergence of Model-Based PG (mentioning)
confidence: 99%
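The "weighting the error in policies with a value function" that this excerpt refers to can be illustrated with a short, hypothetical sketch (not the exact loss of any cited paper): each classification mistake at a state is charged its action-gap, i.e., how much Q-value is lost by not taking the greedy action there.

```python
import numpy as np

def gap_weighted_loss(q_values, chosen_actions):
    """Value-weighted classification loss for a candidate policy.

    q_values:       (n, A) array of estimated Q^pi values at n sampled states.
    chosen_actions: (n,)   actions the candidate policy picks at those states.
    A plain 0-1 loss counts every disagreement with argmax_a Q equally; here
    each mistake is charged its action-gap max_a Q(s, a) - Q(s, chosen), so
    errors at states where the action choice barely matters cost little.
    """
    n = len(q_values)
    gaps = q_values.max(axis=1) - q_values[np.arange(n), chosen_actions]
    return float(gaps.mean())

# Example: two states, two actions; the policy errs only where the gap is tiny.
q = np.array([[1.0, 5.0],    # large gap: choosing action 0 here would cost 4.0
              [2.0, 2.1]])   # small gap: choosing action 0 here costs only 0.1
print(gap_weighted_loss(q, np.array([1, 0])))  # ≈ 0.05
```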
“…There has also been work to use classifiers to represent policies in RL (Bagnell et al., 2003; Rexakis & Lagoudakis, 2008; Dimitrakakis & Lagoudakis, 2008; Blatt & Hero, 2006), which is tangential to our work; our focus is on using the principle of Structural Risk Minimization for RL. Additional work uses classification theory to bound performance for on-policy data (Lazaric et al., 2010; Farahmand et al., 2012), which Section 3.1.3 can be seen as extending to batch, off-policy data.…”
Section: Related Work (mentioning)
confidence: 99%