2011
DOI: 10.1007/s10994-011-5254-7

Model selection in reinforcement learning

Abstract: We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting when the goal is to find an action-value function with the smallest Bellman error among a countable set of candidate functions. We propose a complexity regularization-based model selection algorithm, BERMIN, and prove that it enjoys an oracle-like property: the estimator's error differs from that of an oracle, who selects the candidate with the minimum Bellman error, by only a constant factor and …
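To make the selection principle concrete, the following minimal Python sketch illustrates complexity-regularized candidate selection in the spirit of the abstract. It is not the paper's BERMIN algorithm: the helper name `select_candidate`, the per-candidate `penalties`, and the plain squared-TD-error estimate are illustrative assumptions, and the naive estimate shown is biased under stochastic transitions, a difficulty the actual algorithm is designed to handle.

```python
import numpy as np

def select_candidate(candidates, transitions, actions, gamma, penalties):
    """Pick the candidate action-value function whose estimated Bellman
    error plus complexity penalty is smallest (hypothetical helper).

    candidates  : list of callables q(s, a) -> float
    transitions : list of (s, a, r, s_next) tuples sampled from the MDP
    actions     : iterable of available actions
    penalties   : per-candidate complexity terms (illustrative choice)
    """
    n = len(transitions)
    scores = []
    for q, pen in zip(candidates, penalties):
        # Plain empirical squared TD error as a stand-in for the Bellman
        # error.  NOTE: this naive estimate is biased when transitions are
        # stochastic; the paper's BERMIN construction addresses this.
        td_sq = sum(
            (q(s, a) - (r + gamma * max(q(s_next, b) for b in actions))) ** 2
            for s, a, r, s_next in transitions
        )
        scores.append(td_sq / n + pen)
    return int(np.argmin(scores))
```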


Cited by 33 publications (46 citation statements). References 31 publications (40 reference statements).
“…Our experiments are run using GridLAB-D, an open-source smart-grid simulator developed for the U.S. Dept. of Energy.…”
Section: Methods (mentioning)
confidence: 99%
“…There, the setup was offline, supervised learning of the transition function, while ours is an online reinforcement learning setup for approximating the value function, where there are no labels over the data, only the values to which FVI converges, which can differ from the true state values. A paper closely related to ours is [5], which designs an abstract model-selection algorithm and proves theoretical guarantees about it. As in our setting, they consider batch RL, in which a data set D of transitions sampled from the MDP is given and is used to select a candidate value function by minimizing a Bellman error.…”
Section: Related Work (mentioning)
confidence: 95%
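For context on the remark about FVI targets, here is a brief Python sketch of how fitted value iteration forms its regression targets by bootstrapping from the current approximation; the name `fvi_targets` and its arguments are hypothetical and not taken from either paper.

```python
def fvi_targets(q_approx, batch, actions, gamma):
    """One sweep of fitted value iteration target generation (hypothetical
    helper).  Unlike supervised learning, the regression 'labels' are
    bootstrapped from the current approximation q_approx, so they move as
    learning proceeds and need not equal the true state-action values.
    """
    inputs, targets = [], []
    for s, a, r, s_next in batch:
        target = r + gamma * max(q_approx(s_next, b) for b in actions)
        inputs.append((s, a))
        targets.append(target)
    return inputs, targets
```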
“…Farahmand et al [36] presented a regularized fitted Q-iteration algorithm based on L2 regularization to control the complexity of the value function. Farahmand and Szepesvári [37] developed a complexity regularization-based algorithm for model selection in batch RL, formulated as finding an action-value function with a small Bellman error among a set of candidate functions. The L2-regularized LSTD problem is obtained by adding an L2 penalty term to the projection equation (16) …”
Section: Batch RL Based on Feature Selection (mentioning)
confidence: 99%
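As an illustration of the L2 penalty mentioned above, the following Python sketch shows one common way to fold a ridge term into the LSTD normal equations; the function `l2_regularized_lstd` and its parameters are assumptions made for illustration and are not claimed to reproduce equation (16) of the citing paper.

```python
import numpy as np

def l2_regularized_lstd(phi, phi_next, rewards, gamma, lam):
    """Ridge-penalized LSTD sketch (hypothetical helper).

    phi      : (n, d) feature matrix for the visited states
    phi_next : (n, d) feature matrix for the successor states
    rewards  : (n,) observed rewards
    lam      : weight of the L2 penalty added to the LSTD normal equations
    """
    d = phi.shape[1]
    A = phi.T @ (phi - gamma * phi_next) + lam * np.eye(d)
    b = phi.T @ rewards
    # Solve A w = b for the value-function weight vector.
    return np.linalg.solve(A, b)
```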
“…If Π matches the regularity of the policy, we achieve better error upper bounds. PolicyEval and Π should ideally be chosen by an automatic model selection algorithm [25].…”
Section: CAPI Framework (mentioning)
confidence: 99%