2011
DOI: 10.1007/s10994-011-5254-7

Model selection in reinforcement learning

Abstract: We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting when the goal is to find an action-value function with the smallest Bellman error among a countable set of candidate functions. We propose a complexity regularization-based model selection algorithm, BERMIN, and prove that it enjoys an oracle-like property: the estimator's error differs from that of an oracle, who selects the candidate with the minimum Bellman error, by only a constant factor and …
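To make the selection principle concrete, the following minimal Python sketch illustrates complexity-regularized candidate selection in the spirit of the abstract. It is not the paper's BERMIN algorithm: the helper name `select_candidate`, the per-candidate `penalties`, and the plain squared-TD-error estimate are illustrative assumptions, and the naive estimate shown is biased under stochastic transitions, a difficulty the actual algorithm is designed to handle.

```python
import numpy as np

def select_candidate(candidates, transitions, actions, gamma, penalties):
    """Pick the candidate action-value function whose estimated Bellman
    error plus complexity penalty is smallest (hypothetical helper).

    candidates  : list of callables q(s, a) -> float
    transitions : list of (s, a, r, s_next) tuples sampled from the MDP
    actions     : iterable of available actions
    penalties   : per-candidate complexity terms (illustrative choice)
    """
    n = len(transitions)
    scores = []
    for q, pen in zip(candidates, penalties):
        # Plain empirical squared TD error as a stand-in for the Bellman
        # error.  NOTE: this naive estimate is biased when transitions are
        # stochastic; the paper's BERMIN construction addresses this.
        td_sq = sum(
            (q(s, a) - (r + gamma * max(q(s_next, b) for b in actions))) ** 2
            for s, a, r, s_next in transitions
        )
        scores.append(td_sq / n + pen)
    return int(np.argmin(scores))
```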


Cited by 33 publications (46 citation statements). References 31 publications (40 reference statements).
“…Our experiments are run using GridLAB-D, an open-source smart-grid simulator developed for the U.S. Dept. of Energy.…”
Section: Methods (mentioning)
confidence: 99%
“…There, the setup was offline, supervised learning of the transition function, while ours is an online reinforcement learning setup for approximating the value function, where there are no labels over the data, only the values to which FVI converges, which can differ from the true state values. A paper closely related to ours is [5], which designs an abstract model-selection algorithm and proves theoretical guarantees about it. As in our setting, they consider batch RL, in which a data set D of transitions sampled from the MDP is given and is used to select a candidate value function by minimizing a Bellman error.…”
Section: Related Work (mentioning)
confidence: 95%
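For context on the remark about FVI targets, here is a brief Python sketch of how fitted value iteration forms its regression targets by bootstrapping from the current approximation; the name `fvi_targets` and its arguments are hypothetical and not taken from either paper.

```python
def fvi_targets(q_approx, batch, actions, gamma):
    """One sweep of fitted value iteration target generation (hypothetical
    helper).  Unlike supervised learning, the regression 'labels' are
    bootstrapped from the current approximation q_approx, so they move as
    learning proceeds and need not equal the true state-action values.
    """
    inputs, targets = [], []
    for s, a, r, s_next in batch:
        target = r + gamma * max(q_approx(s_next, b) for b in actions)
        inputs.append((s, a))
        targets.append(target)
    return inputs, targets
```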
“…Farahmand et al [36] presented a regularized fitted Q-iteration algorithm based on L2 regularization to control the complexity of the value function. Farahmand and Szepesvári [37] developed a complexity regularization-based algorithm for model selection in batch RL, formulated as finding an action-value function with a small Bellman error among a set of candidate functions. The L2-regularized LSTD problem is obtained by adding an L2 penalty term to the projection equation (16) …”
Section: Batch RL Based on Feature Selection (mentioning)
confidence: 99%
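As an illustration of the L2 penalty mentioned above, the following Python sketch shows one common way to fold a ridge term into the LSTD normal equations; the function `l2_regularized_lstd` and its parameters are assumptions made for illustration and are not claimed to reproduce equation (16) of the citing paper.

```python
import numpy as np

def l2_regularized_lstd(phi, phi_next, rewards, gamma, lam):
    """Ridge-penalized LSTD sketch (hypothetical helper).

    phi      : (n, d) feature matrix for the visited states
    phi_next : (n, d) feature matrix for the successor states
    rewards  : (n,) observed rewards
    lam      : weight of the L2 penalty added to the LSTD normal equations
    """
    d = phi.shape[1]
    A = phi.T @ (phi - gamma * phi_next) + lam * np.eye(d)
    b = phi.T @ rewards
    # Solve A w = b for the value-function weight vector.
    return np.linalg.solve(A, b)
```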
“…If Π matches the regularity of the policy, we achieve better error upper bounds. PolicyEval and Π should ideally be chosen by an automatic model selection algorithm [25].…”
Section: CAPI Framework (mentioning)
confidence: 99%