Proceedings of the 23rd International Conference on Machine Learning (ICML '06), 2006
DOI: 10.1145/1143844.1143932
An analytic solution to discrete Bayesian reinforcement learning

Abstract: Reinforcement learning (RL) was originally proposed as a framework to allow agents to learn in an online fashion as they interact with their environment. Existing RL algorithms come short of achieving this goal because the amount of exploration required is often too costly and/or too time consuming for online learning. As a result, RL is mostly used for offline learning in simulated environments. We propose a new algorithm, called BEETLE, for effective online learning that is computationally efficient while mi…
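The abstract is cut off above. For background on the setting it refers to, discrete Bayesian RL is usually formulated with Dirichlet priors over the unknown transition probabilities, so the agent's belief reduces to a table of transition counts. The sketch below (Python/NumPy) shows that standard belief representation and its conjugate update; it is a generic illustration under those assumptions, not the BEETLE algorithm itself, and the class and parameter names are invented for the example.

```python
import numpy as np

class DirichletBeliefMDP:
    """Belief over an unknown discrete MDP: one independent Dirichlet
    posterior over the next-state distribution of every (state, action)
    pair. Illustrative only: the standard conjugate setup for discrete
    Bayesian RL, not the BEETLE algorithm from the paper."""

    def __init__(self, n_states, n_actions, prior_count=1.0):
        # counts[s, a, s'] are Dirichlet parameters; a symmetric
        # prior_count of 1.0 corresponds to a uniform (Laplace) prior.
        self.counts = np.full((n_states, n_actions, n_states), prior_count)

    def update(self, s, a, s_next):
        # Conjugate posterior update after observing one transition.
        self.counts[s, a, s_next] += 1.0

    def expected_model(self):
        # Posterior-mean transition probabilities P(s' | s, a).
        return self.counts / self.counts.sum(axis=2, keepdims=True)

    def sample_model(self, rng=None):
        # Draw one plausible MDP from the posterior; used by
        # sampling-based approximations of Bayesian RL.
        rng = np.random.default_rng() if rng is None else rng
        return np.apply_along_axis(rng.dirichlet, 2, self.counts)
```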

Cited by 178 publications (155 citation statements: 2 supporting, 153 mentioning, 0 contrasting). References 13 publications.
“…1 shows per-step regret of the algorithms as a function of the number of states. As predicted by the theoretical bounds, the per-step regret ∆ of UCRL2 significantly increases as the number of states increases, whereas the average regret of our RLPA is essentially independent of the state-space size. Although UCWM has a lower regret than RLPA for a small number of states, it quickly loses its advantage as the number of states grows.…”
Section: Methods (supporting)
confidence: 58%
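As an aside not taken from the excerpt, the scaling it describes is what the standard UCRL2 bound of Jaksch et al. (2010) would predict. Up to logarithmic factors, for a diameter-D MDP with S states and A actions,

\[
\mathrm{Regret}(T) = \tilde{O}\!\left(D S \sqrt{A T}\right)
\quad\Longrightarrow\quad
\Delta = \frac{\mathrm{Regret}(T)}{T} = \tilde{O}\!\left(\frac{D S \sqrt{A}}{\sqrt{T}}\right),
\]

so at any fixed horizon T the per-step regret grows linearly with S, whereas a bound that does not depend on the size of the state space (as claimed for RLPA) becomes favorable as S grows.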
“…In general, computing the Bayes optimal policy is often intractable, making approximation inevitable (see, e.g., Duff (2002)). It remains an active research area to develop efficient algorithms to approximate the Bayes optimal policy (Poupart et al., 2006; Kolter and Ng, 2009).…
Section: Bayesian Framework (mentioning)
confidence: 99%
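To make the intractability concrete (a standard formulation, not quoted from the excerpt): the Bayes-optimal policy solves the Bellman equation of the belief-augmented MDP,

\[
V^*(s, b) = \max_{a} \Big[ r(s, a) + \gamma \sum_{s'} \Pr(s' \mid s, a, b)\, V^*\!\big(s', b_{s a s'}\big) \Big],
\]

where b is the posterior over the unknown dynamics and b_{sas'} is the posterior after additionally observing the transition (s, a, s'). Because the set of reachable posteriors grows exponentially with the planning horizon, solving this equation exactly is generally infeasible, which is what motivates the approximations cited above.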
“…This augmented state representation results in an enormous state space, making the full Bayesian algorithm intractable. Attempts have been made to approximate the full algorithm by parameterizing the model and tying model parameters together [7] or by sampling from the model distribution [8], [9], but these methods have still only been tested in domains with 5-36 states. In addition to requiring a large amount of time to compute a policy, these methods must maintain a belief state over the model and require the user to create a well-defined model prior.…
Section: Related Work (mentioning)
confidence: 99%
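As an illustration of the "sampling from the model distribution" family of approximations mentioned above (often called posterior sampling, or Thompson sampling for RL), the sketch below reuses the Dirichlet belief class from the earlier snippet. The reward table R, the env_step callback, and the value-iteration helper are assumptions made for this example, not details taken from the cited papers.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, iters=200):
    """Plan in a *known* MDP: P[s, a, s'] transition probabilities and
    R[s, a] expected rewards. Returns a greedy policy (one action per state)."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = R + gamma * (P @ V)      # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def posterior_sampling_episode(belief, R, env_step, s0, horizon, rng):
    """One episode of posterior sampling: draw a model from the belief,
    act greedily as if it were the true MDP, and update the belief with
    the transitions that actually occur in the real environment."""
    P_sampled = belief.sample_model(rng)     # one plausible dynamics model
    policy = value_iteration(P_sampled, R)   # plan against the sampled model
    s = s0
    for _ in range(horizon):
        a = policy[s]
        s_next = env_step(s, a)              # real-environment transition
        belief.update(s, a, s_next)          # conjugate posterior update
        s = s_next
    return belief
```

Resampling a fresh model each episode is what drives exploration in this family: models that look promising under the current posterior get tried, and the posterior update then corrects them.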