Peter Sunehag scite author profile

Hutter

2013

Ray Solomonoff invented the notion of universal induction featuring an aptly termed "universal" prior probability function over all possible computable environments [Sol64]. The essential property of this prior was its ability to dominate all other such priors. Later, Levin introduced another construction -a mixture of all possible priors or "universal mixture" [ZL70]. These priors are well known to be equivalent up to multiplicative constants. Here, we seek to clarify further the relationships between these three characterisations of a universal prior (Solomonoff's, universal mixtures, and universally dominant priors). We see that the the constructions of Solomonoff and Levin define an identical class of priors, while the class of universally dominant priors is strictly larger. We provide some characterisation of the discrepancy.

Reinforcement Learning Agents acquire Flocking and Symbiotic Behaviour in Simulated Ecosystems

Lever

Liu

et al. 2019

In nature, group behaviours such as flocking as well as crossspecies symbiotic partnerships are observed in vastly different forms and circumstances. We hypothesize that such strategies can arise in response to generic predator-prey pressures in a spatial environment with range-limited sensation and action. We evaluate whether these forms of coordination can emerge by independent multi-agent reinforcement learning in simple multiple-species ecosystems. In contrast to prior work, we avoid hand-crafted shaping rewards, specific actions, or dynamics that would directly encourage coordination across agents. Instead we test whether coordination emerges as a consequence of adaptation without encouraging these specific forms of coordination, which only has indirect benefit. Our simulated ecosystems consist of a generic food chain involving three trophic levels: apex predator, mid-level predator, and prey. We conduct experiments on two different platforms, a 3D physics engine with tens of agents as well as in a 2D grid world with up to thousands. The results clearly confirm our hypothesis and show substantial coordination both within and across species. To obtain these results, we leverage and adapt recent advances in deep reinforcement learning within an ecosystem training protocol featuring homogeneous groups of independent agents from different species (sets of policies), acting in many different random combinations in parallel habitats. The policies utilize neural network architectures that are invariant to agent individuality but not type (species) and that generalize across varying numbers of observed other agents. While the emergence of complexity in artificial ecosystems have long been studied in the artificial life community, the focus has been more on individual complexity and genetic algorithms or explicit modelling, and less on group complexity and reinforcement learning emphasized in this article. Unlike what the name and intuition suggests, reinforcement learning adapts over evolutionary history rather than a life-time and is here addressing the sequential optimization of fitness that is usually approached by genetic algorithms in the artificial life community. We utilize a shift from procedures to objectives, allowing us to bring new powerful machinery to bare, and we see emergence of complex behaviour from a sequence of simple optimization problems.

Reinforcement Learning Agents acquire Flocking and Symbiotic Behaviour in Simulated Ecosystems

Sunehag¹,

Lever²,

Liu³

et al. 2019

In nature, group behaviours such as flocking as well as crossspecies symbiotic partnerships are observed in vastly different forms and circumstances. We hypothesize that such strategies can arise in response to generic predator-prey pressures in a spatial environment with range-limited sensation and action. We evaluate whether these forms of coordination can emerge by independent multi-agent reinforcement learning in simple multiple-species ecosystems. In contrast to prior work, we avoid hand-crafted shaping rewards, specific actions, or dynamics that would directly encourage coordination across agents. Instead we test whether coordination emerges as a consequence of adaptation without encouraging these specific forms of coordination, which only has indirect benefit. Our simulated ecosystems consist of a generic food chain involving three trophic levels: apex predator, mid-level predator, and prey. We conduct experiments on two different platforms, a 3D physics engine with tens of agents as well as in a 2D grid world with up to thousands. The results clearly confirm our hypothesis and show substantial coordination both within and across species. To obtain these results, we leverage and adapt recent advances in deep reinforcement learning within an ecosystem training protocol featuring homogeneous groups of independent agents from different species (sets of policies), acting in many different random combinations in parallel habitats. The policies utilize neural network architectures that are invariant to agent individuality but not type (species) and that generalize across varying numbers of observed other agents. While the emergence of complexity in artificial ecosystems have long been studied in the artificial life community, the focus has been more on individual complexity and genetic algorithms or explicit modelling, and less on group complexity and reinforcement learning emphasized in this article. Unlike what the name and intuition suggests, reinforcement learning adapts over evolutionary history rather than a lifetime and is here addressing the sequential optimization of fitness that is usually approached by genetic algorithms in the artificial life community. We utilize a shift from procedures to objectives, allowing us to bring new powerful machinery to bare, and we see emergence of complex behaviour from a sequence of simple optimization problems.

Wearable sensor activity analysis using semi-Markov models with a grammar

Thomas

Pervasive and Mobile Computing

Dror

et al. 2010

Detailed monitoring of training sessions of elite athletes is an important component of their training. In this paper we describe an application that performs a precise segmentation and labeling of swimming sessions. This allows a comprehensive break-down of the training session, including lap times, detailed statistics of strokes, and turns. To this end we use semi-Markov models (SMM), a formalism for labeling and segmenting sequential data, trained in a max-margin setting. To reduce the computational complexity of the task and at the same time enforce sensible output, we introduce a grammar into the SMM framework. Using the trained model on test swimming sessions of different swimmers provides highly accurate segmentation as well as perfect labeling of individual segments. The results are significantly better than those achieved by discriminative hidden Markov models.

Sparse Kernel-SARSA(λ) with an Eligibility Trace

Robards

Sanner

et al. 2011

Abstract.We introduce the first online kernelized version of SARSA(λ) to permit sparsification for arbitrary λ for 0 ≤ λ ≤ 1; this is possible via a novel kernelization of the eligibility trace that is maintained separately from the kernelized value function. This separation is crucial for preserving the functional structure of the eligibility trace when using sparse kernel projection techniques that are essential for memory efficiency and capacity control. The result is a simple and practical Kernel-SARSA(λ) algorithm for general 0 ≤ λ ≤ 1 that is memory-efficient in comparison to standard SARSA(λ) (using various basis functions) on a range of domains including a real robotics task running on a Willow Garage PR2 robot.