2008
DOI: 10.1016/j.jcss.2007.08.009
An analysis of model-based Interval Estimation for Markov Decision Processes

Abstract: Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents a theoretical analysis of MBIE and a new variation called MBIE-EB, proving their efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to …
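A minimal tabular sketch of the MBIE-EB idea summarized in the abstract, assuming the commonly cited exploration bonus of the form β/√n(s, a) added to the empirical reward before planning; the array layout, function name, and constants below are illustrative, not the paper's code:

```python
import numpy as np

def mbie_eb_q_values(counts, reward_sums, trans_counts, gamma=0.95, beta=0.5,
                     n_iters=500):
    """Sketch of MBIE-EB style planning on a tabular empirical model.

    counts[s, a]            -- number of times (s, a) has been tried
    reward_sums[s, a]       -- total reward observed for (s, a)
    trans_counts[s, a, s']  -- observed transitions from (s, a) to s'
    Rarely visited pairs get a bonus that shrinks as 1/sqrt(n).
    """
    n_states, n_actions = counts.shape
    n = np.maximum(counts, 1)              # avoid division by zero
    r_hat = reward_sums / n                # empirical mean reward
    t_hat = trans_counts / n[..., None]    # empirical transition model
    bonus = beta / np.sqrt(n)              # exploration bonus per (s, a)

    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):               # value iteration on the
        v = q.max(axis=1)                  # bonus-augmented model
        q = r_hat + bonus + gamma * t_hat @ v
    return q
```

Acting greedily with respect to the resulting Q-values steers the agent toward under-visited state-action pairs, and the bonus vanishes as the counts grow, which is how the abstract's "balancing exploration and exploitation" plays out in this sketch.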

Cited by 295 publications (237 citation statements)
References 16 publications (32 reference statements)
“…Most approaches for exploration focus on the tabular case and generally learn models of the environment (e.g., Brafman & Tennenholtz, 2002; Kearns & Singh, 2002; Strehl & Littman, 2008). The community is just beginning to investigate exploration strategies in model-free settings when function approximation is required (e.g., Bellemare et al., 2016b; Machado, Bellemare, & Bowling, 2017; Martin et al., 2017; Osband, Blundell, Pritzel, & Roy, 2016; Ostrovski et al., 2017; Vezhnevets et al., 2017).…”
Section: Exploration (mentioning, confidence: 99%)
“…Another possibility, called the average loss (Strehl and Littman, 2008a), compares the loss in cumulative reward of an agent on the sequence of states the agent actually visits:…”
Section: Average Loss (mentioning, confidence: 99%)
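The quotation above is cut off where the citing paper displays the criterion itself. As a sketch of how the average-loss criterion is usually stated, up to the paper's exact horizon and discounting conventions (H, γ, and T below are assumed notation):

```latex
% Sketch only: over a run visiting states s_1, ..., s_T with rewards r_1, ...,
% the instantaneous loss at step t compares the optimal value of the visited
% state with the return the agent actually collects over the next H steps.
\[
  \mathrm{il}(t) \;=\; V^{*}(s_t) \;-\; \sum_{i=t}^{t+H} \gamma^{\,i-t}\, r_i,
  \qquad
  \text{average loss} \;=\; \frac{1}{T}\sum_{t=1}^{T} \mathrm{il}(t).
\]
```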
“…Furthermore, it can be shown that every PAC-MDP algorithm is probably approximately correct in the average loss criterion (Strehl and Littman, 2008a).…”
Section: Average Loss (mentioning, confidence: 99%)
“…They later added a Bayesian model-based method that maintains a distribution over MDPs, determines value functions for sampled MDPs, and then uses those value functions to approximate the true value distribution (Dearden et al., 1999). In model-based interval estimation (MBIE) one tries to build confidence intervals for the transition probability and reward estimates and then optimistically selects the action maximising the value within those confidence intervals (Wiering & Schmidhuber, 1998; Strehl & Littman, 2008). Strehl & Littman (2008) proved that MBIE is able to find near-optimal policies in polynomial time.…”
Section: Efficient Exploration in Reinforcement Learning (mentioning, confidence: 99%)
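As a rough illustration of the confidence-interval construction this quotation describes (not the paper's exact bounds), the reward estimates can be given Hoeffding-style intervals; the transition probabilities receive analogous L1-style intervals, omitted here for brevity. The function names and the delta and r_max parameters below are illustrative assumptions:

```python
import numpy as np

def reward_upper_bound(reward_sums, counts, delta=0.05, r_max=1.0):
    """Hoeffding-style upper confidence bound on the mean reward of each
    (state, action) pair -- a sketch of the reward half of MBIE's model
    confidence intervals (transition intervals are built analogously)."""
    n = np.maximum(counts, 1)                      # visit counts, >= 1
    r_hat = reward_sums / n                        # empirical mean reward
    width = r_max * np.sqrt(np.log(2.0 / delta) / (2.0 * n))
    return np.minimum(r_hat + width, r_max)        # optimistic reward estimate

def greedy_action(q_optimistic, state):
    """Act greedily with respect to Q-values obtained by planning in the
    optimistic model (the 'optimism in the face of uncertainty' step)."""
    return int(np.argmax(q_optimistic[state]))
```

Planning in the model built from these optimistic estimates and then acting greedily is what makes the agent prefer actions whose values are still uncertain, which is the behaviour the quoted passage attributes to MBIE.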