Proceedings of the 26th Annual International Conference on Machine Learning 2009
DOI: 10.1145/1553374.1553441
Near-Bayesian exploration in polynomial time

Cited by 139 publications (157 citation statements)
References 9 publications
“…In general, computing the Bayes optimal policy is often intractable, making approximation inevitable (see, e.g., Duff (2002)). It remains an active research area to develop efficient algorithms to approximate the Bayes optimal policy (Poupart et al., 2006; Kolter and Ng, 2009).…”
Section: Bayesian Framework
confidence: 99%
“…For instance, the analytic tools developed for PAC-MDP algorithms are used to derive the BEB algorithm that approximates the Bayes optimal policy except for polynomially many steps (Kolter and Ng, 2009). As another example, the notion of known state-actions is combined with the posterior distribution of models to yield a randomized PAC-MDP algorithm (BOSS) that is able to use prior knowledge about MDP models (Asmuth et al, 2009).…”
Section: Bayesian Framework
confidence: 99%
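To make the BOSS idea mentioned above concrete, here is a minimal sketch assuming a finite MDP with independent Dirichlet posteriors over next-state distributions. The function and variable names (counts, prior, n_samples) are illustrative, and the merged-model construction is a simplified paraphrase of Asmuth et al. (2009), not their exact algorithm.

```python
import numpy as np

def sample_merged_models(counts, prior, n_samples, rng):
    """Illustrative sketch: draw several transition models from the
    Dirichlet posterior of every (state, action) pair. A BOSS-style
    agent then plans in a merged MDP in which each (action, sample)
    pair acts as a distinct action, which is implicitly optimistic."""
    n_states, n_actions, _ = counts.shape
    models = np.empty((n_samples, n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            # Posterior over the next-state distribution is Dirichlet(prior + counts).
            models[:, s, a, :] = rng.dirichlet(prior + counts[s, a], size=n_samples)
    # Merge: fold the sample index into the action dimension.
    return models.transpose(1, 2, 0, 3).reshape(n_states, n_actions * n_samples, n_states)
```

With a reward model attached, value iteration on this merged MDP yields the exploration-driving policy; roughly speaking, BOSS resamples models only when a state-action pair first accumulates enough observations to count as "known".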
“…: a uniform distribution, which amounts to assuming each transition has been observed once). Bayesian Exploration Bonus (BEB) (Kolter and Ng, 2009a) builds the expected MDP given the current history at each timestep. The reward function of this MDP is slightly modified to give an exploration bonus to transitions that have been observed less frequently.…”
Section: State-of-the-art
confidence: 99%
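As a rough illustration of the BEB recipe described above (plan in the mean-posterior "expected" MDP, with a count-based bonus added to the reward), here is a minimal sketch assuming a finite MDP with uniform Dirichlet transition priors; the array shapes, the helper name, and the choice of bonus constant are assumptions for illustration, not the paper's reference implementation.

```python
import numpy as np

def beb_greedy_policy(counts, rewards, beta, gamma=0.95, n_iter=500):
    """Sketch of the BEB idea (Kolter and Ng, 2009): plan in the expected
    MDP implied by the current history, but add a bonus beta / (1 + n(s, a))
    to the reward of rarely observed state-action pairs.
    counts[s, a, s'] are assumed transition counts; rewards has shape (S, A)."""
    n_sa = counts.sum(axis=2)                          # visit counts n(s, a)
    # Mean-posterior transition model under a Dirichlet(1,...,1) prior (assumption).
    p_hat = (counts + 1.0) / (n_sa + counts.shape[2])[:, :, None]
    r_tilde = rewards + beta / (1.0 + n_sa)            # reward plus exploration bonus
    v = np.zeros(counts.shape[0])
    for _ in range(n_iter):                            # value iteration on the expected MDP
        q = r_tilde + gamma * (p_hat @ v)              # shape (S, A)
        v = q.max(axis=1)
    return q.argmax(axis=1)                            # greedy policy w.r.t. the bonus-augmented MDP
```

The bonus shrinks as n(s, a) grows, so exploration fades once a pair has been visited often, which is what drives the near-Bayesian guarantee cited above.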
“…First, sampling possible transition probabilities, based on past observations, relies on the computation of P(f | h_t) ∝ P(h_t | f) P(f), which is intractable for most probabilistic models (Duff, 2002; Kaelbling et al., 1998; Kolter and Ng, 2009b). Second, the BAMDP state space is actually made of all possible histories and is infinite.…”
Section: Solving BAMDP
confidence: 99%
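For intuition about the formula quoted above: under the common simplifying assumption of a discrete MDP with an independent Dirichlet prior α_{s,a} over each next-state distribution θ_{s,a} (the conjugate setting BEB relies on), the posterior does collapse to a closed form driven by visit counts, roughly

    P(f | h_t) ∝ P(h_t | f) P(f)   ⇒   θ_{s,a} | h_t ~ Dirichlet(α_{s,a} + n_{s,a}(h_t)),

where n_{s,a}(h_t) is the vector of next-state counts observed for (s, a) in the history h_t (notation assumed here, not taken from the quoted papers). The intractability the quotation points to arises for richer model classes and from the fact that BAMDP states are entire histories, so the planning problem remains over an infinite state space even when the posterior itself is tractable.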