2011
DOI: 10.1613/jair.3125

A Monte-Carlo AIXI Approximation

Abstract: This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. Our approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximati…

Cited by 112 publications (105 citation statements)
References 72 publications
“…Summing up, while there has been some work on comparing humans and machines on some specific tasks, e.g., humans and Q-learning in [2], this paper may start a series of experimental research comparing several artificial agents (such as other algorithms in reinforcement learning, MonteCarlo AIXI [16], etc.) and other biological agents (children, other apes, etc) for general tasks.…”
Section: Discussion (mentioning)
confidence: 99%
“…The above can easily be shown (for example, see Proposition 1 in the work of Veness et al (2011)) to define a valid environment model. Because of this, we can simply use…”
Section: Definition 2 Given a Finite Set Of Environment Models (mentioning)
confidence: 98%
“…Note however that in some special cases, more efficient techniques exist with time complexity sublinear in |M|. One example is Context Tree Weighting (Willems et al, 1995), which was used as the basis for our previous AIXI approximations (Veness et al, 2010, 2011).…”
Section: Definition 2 Given a Finite Set Of Environment Models (mentioning)
confidence: 99%