2019
DOI: 10.48550/arxiv.1903.01021
Preprint

A Strongly Asymptotically Optimal Agent in General Environments

Cited by 2 publications (3 citation statements)
References 0 publications
“…The API allows anyone to design their demos based on existing agents and environments, and for new agents and environments to be added and interfaced into the system. There has been some related work in adapting GRL results to a practical setting [Cohen et al., 2019; Lamont et al., 2017] that successfully implemented an AIXI model using a Monte Carlo Tree Search planning algorithm. As far as we are aware, theoretical predictions in the context of wireheading have not been verified experimentally before, with the single exception of an AIXIjs demo [Aslanides, 2017].…”
Section: Methods (mentioning)
confidence: 99%
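The approach mentioned in that statement pairs a generative environment model with Monte Carlo Tree Search over future action sequences. The following is a minimal, hypothetical sketch of such a planning loop, not the implementation from the cited works: ToyEnvModel, the two-action space, and all constants are illustrative assumptions, and in practice the model would be learned (for example a Bayesian mixture over environments) rather than hard-coded as here.

```python
# A minimal, assumption-laden sketch of Monte Carlo Tree Search planning over a
# generative environment model (not the cited implementations).
import math
import random


class ToyEnvModel:
    """Hypothetical generative model: action 1 yields slightly higher reward."""

    def sample_step(self, history, action):
        reward = random.random() + (0.1 if action == 1 else 0.0)
        observation = 0
        return observation, reward


class Node:
    def __init__(self):
        self.visits = 0
        self.value = 0.0          # running mean of sampled returns from this node
        self.children = {}        # action -> Node


def rollout(model, history, depth):
    """Return of a uniformly random policy simulated for `depth` steps."""
    total = 0.0
    for _ in range(depth):
        action = random.choice([0, 1])
        obs, reward = model.sample_step(history, action)
        history = history + [(action, obs)]
        total += reward
    return total


def mcts_plan(model, history, horizon=5, simulations=500, c=1.4):
    """Pick the root action with the best Monte Carlo value estimate."""
    root = Node()
    for _ in range(simulations):
        node, h = root, list(history)
        path, rewards, tail = [], [], 0.0
        for _ in range(horizon):
            def ucb(a):
                child = node.children.get(a)
                if child is None or child.visits == 0:
                    return float("inf")      # try unvisited actions first
                return child.value + c * math.sqrt(math.log(node.visits + 1) / child.visits)

            action = max([0, 1], key=ucb)    # UCB1 action selection
            child = node.children.setdefault(action, Node())
            obs, reward = model.sample_step(h, action)
            h.append((action, obs))
            path.append(child)
            rewards.append(reward)
            if child.visits == 0:            # expansion: finish with a random rollout
                tail = rollout(model, h, horizon)
                break
            node = child
        # Backup: each node on the path receives the return from its own step onward.
        root.visits += 1
        ret = tail
        for child, reward in zip(reversed(path), reversed(rewards)):
            ret += reward
            child.visits += 1
            child.value += (ret - child.value) / child.visits
    return max(root.children, key=lambda a: root.children[a].value)


if __name__ == "__main__":
    print("planned action:", mcts_plan(ToyEnvModel(), history=[]))
```

The selection, expansion, rollout, and backup steps above follow the generic UCT scheme; the cited implementations additionally normalize rewards and reuse the search tree across time steps.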
“…Kolmogorov complexity has also been considered in the context of reinforcement learning as a tool for complexity-constrained inference [9], [2], [15] based on Solomonoff's theory of inductive inference [19]. We differ by focusing instead on constraining the computational complexity of the obtained policy itself, assuming the underlying system to be known.…”
Section: B. Contribution (mentioning)
confidence: 99%
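For reference, the two standard quantities that statement alludes to are usually written as below, where ℓ(p) is the length of program p and U(p) = x* means the output of the universal machine U on p begins with x; the cited works may use prefix, monotone, or resource-bounded variants of these definitions.

```latex
% Kolmogorov complexity of a finite string x relative to a universal machine U
K_U(x) = \min\{\, \ell(p) \;:\; U(p) = x \,\}

% Solomonoff prior: total weight 2^{-length} of programs whose output extends x
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}
```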
“…To see this, let M be the Turing machine that takes a binary string p, inverts all zeros and ones, and outputs the result p̄. In particular, M(x*_{1:T}) = x̄_{1:T}. Let in turn M′ be the Turing machine that, given input p, simulates the universal Turing machine U corresponding to φ, obtains the output U(p), and then feeds it as input to M. Then U(p) = x*_{1:T} implies M′(p) = x̄_{1:T}.…”
Section: Appendix (mentioning)
confidence: 99%
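The composition argument in that excerpt can be illustrated concretely. Below is a small, hypothetical Python sketch, not taken from the cited paper: toy_universal stands in for the universal machine U (here it merely decodes a run-length description, purely for illustration), invert plays the role of M, and compose plays the role of M′, so whenever toy_universal(p) = x, compose(p) is the bitwise complement of x.

```python
# A minimal illustration (an assumed toy setup, not the cited paper's construction)
# of the composition step: M' simulates U on p, then feeds U's output to M.

def invert(bits: str) -> str:
    """The machine M: flip every bit of a binary string."""
    return "".join("1" if b == "0" else "0" for b in bits)


def toy_universal(program: str) -> str:
    """Hypothetical stand-in for the universal machine U.
    A 'program' is a comma-separated run-length code, e.g. '3x1,2x0' -> '11100'.
    """
    out = []
    for block in program.split(","):
        count, bit = block.split("x")
        out.append(bit * int(count))
    return "".join(out)


def compose(program: str) -> str:
    """The machine M': run U on the program, then pass U's output through M."""
    return invert(toy_universal(program))


if __name__ == "__main__":
    p = "3x1,2x0"                      # toy_universal(p) == "11100"
    assert toy_universal(p) == "11100"
    assert compose(p) == invert("11100") == "00011"
    print(p, "->", toy_universal(p), "->", compose(p))
```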