2010
DOI: 10.1007/978-3-642-16108-7_29

Consistency of Feature Markov Processes

Abstract: We study long-term sequence prediction (forecasting). We approach this by investigating criteria for choosing a compact, useful state representation; the state is supposed to summarize useful information from the history. We want a method that is asymptotically consistent in the sense that it will provably eventually only choose between alternatives that satisfy an optimality property related to the criterion used. We extend our work to the case where there is side information that one can take advantage of a…
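Read formally, the consistency claim in the abstract can be sketched as follows (the notation is ours, since the truncated abstract does not fix it): writing φ̂_n for the representation selected from the class Φ after n observations, and Φ_opt ⊆ Φ for the alternatives satisfying the optimality property of the criterion, the selection almost surely eventually stays inside Φ_opt:

\[
P\bigl(\exists\, n_0\ \forall\, n \ge n_0:\ \hat\phi_n \in \Phi_{\mathrm{opt}}\bigr) \;=\; 1 .
\]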

Cited by 8 publications (8 citation statements) · References 12 publications
“…The predictive state representation [51] approach also lacks a general and principled learning algorithm. In contrast, initial consistency results for ΦMDP show that under some assumptions, ΦMDP agents asymptotically learn the correct underlying MDP [80].…”
Section: Feature Reinforcement Learning
confidence: 90%
“…FRL starts with a class of maps Φ, compares different φ ∈ Φ, and selects the most appropriate one given the experience h_t so far. Several criteria based on how well φ reduces P to an MDP have been devised [Hut09b, Hut09a] and theoretically [SH10] and experimentally [NSH11] investigated [Ngu13]. Theorems 5-9 show that demanding P^φ to be approximately MDP is overly restrictive.…”
Section: Feature Reinforcement Learning
confidence: 99%
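The selection step described in the statement above can be illustrated in a few lines. The following is a minimal, self-contained Python sketch: the map class, the specific predictive code-length criterion, and all names are our own illustrative placeholders, not the criteria devised in [Hut09b, Hut09a] or analyzed in [SH10].

```python
# Sketch of feature-map selection in Feature RL: given a finite class
# Phi of candidate maps (history prefix -> state), pick the map whose
# induced states best predict the next observation. Illustrative only.

import math
from collections import Counter

def state_sequence(phi, history):
    """Apply the feature map phi to every prefix of the history."""
    return [phi(history[:t + 1]) for t in range(len(history))]

def predictive_code_length(states, history):
    """Empirical code length (in bits) of each next observation given
    the current state. Real criteria also penalize model complexity."""
    counts = Counter(zip(states[:-1], history[1:]))
    context = Counter(s for s, _ in zip(states[:-1], history[1:]))
    return -sum(n * math.log2(n / context[s]) for (s, _), n in counts.items())

def select_map(Phi, history):
    """Return the name of the candidate map minimizing the criterion."""
    return min(Phi, key=lambda name: predictive_code_length(
        state_sequence(Phi[name], history), history))

# Toy example: on an alternating sequence, remembering the last
# observation predicts perfectly, while aggregating everything does not.
history = [0, 1] * 10
Phi = {"last_obs": lambda h: h[-1], "aggregate_all": lambda h: 0}
print(select_map(Phi, history))  # -> "last_obs"
```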
“…The interested reader is referred to [14] for more detailed analytical formulas, and [26] for further motivation and consistency proofs of the ΦMDP model.…”
Section: Cost Function
confidence: 99%
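As a sketch of the kind of criterion meant here (our paraphrase; see [14] for the exact analytical formulas), the ΦMDP cost of a map φ on history h_n is, roughly, a two-part code length over the induced state and reward sequences:

\[
\mathrm{Cost}(\phi \mid h_n) \;=\; \mathrm{CL}(s_{1:n} \mid a_{1:n}) \;+\; \mathrm{CL}(r_{1:n} \mid s_{1:n}, a_{1:n}),
\qquad s_t = \phi(h_t),
\]

where CL(·) denotes the code length of a sequence under a frequency-based (MDL-style) estimate, and the selected map is the minimizer \(\hat\phi = \arg\min_{\phi \in \Phi} \mathrm{Cost}(\phi \mid h_n)\).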
“…The recently introduced Feature Markov Decision Process (ΦMDP) framework [14] attempts to reduce actual RL tasks to MDPs for the purpose of attacking the general RL problem where the environment's model as well as the set of states are unknown. In [26], Sunehag and Hutter take a step further in the theoretical investigation of Feature Reinforcement Learning by proving consistency results. In this article, we develop an actual Feature Reinforcement Learning algorithm and empirically analyze its performance in a number of environments.…”
Section: Introduction
confidence: 99%