2011
DOI: 10.1007/978-3-642-22887-2_5
Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments

Abstract: To maximize its success, an AGI typically needs to explore its initially unknown world. Is there an optimal way of doing so? Here we derive an affirmative answer for a broad class of environments.
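The paper's notion of optimal Bayesian exploration rewards an agent for the information it expects to gain about the environment's dynamics. The following is a minimal, hypothetical sketch (not the paper's algorithm, which optimizes cumulative information gain over a planning horizon): it assumes a discrete environment whose unknown transition probabilities are modeled with Dirichlet priors, and greedily picks the action whose outcome is expected to be most informative, i.e. maximizes the expected KL divergence between posterior and prior.

```python
import numpy as np
from scipy.special import digamma, gammaln

# Illustrative one-step "curious" action selection for a discrete environment
# with a Dirichlet-categorical transition model. All names here are for
# illustration only; the paper derives the optimal multi-step policy.

def dirichlet_kl(alpha_post, alpha_prior):
    """KL( Dir(alpha_post) || Dir(alpha_prior) )."""
    a0_post = alpha_post.sum()
    return (gammaln(a0_post) - gammaln(alpha_prior.sum())
            - np.sum(gammaln(alpha_post) - gammaln(alpha_prior))
            + np.sum((alpha_post - alpha_prior)
                     * (digamma(alpha_post) - digamma(a0_post))))

def expected_information_gain(alpha):
    """Expected KL between updated posterior and current prior over next-state
    probabilities, averaged under the posterior predictive p(s') = alpha / sum(alpha)."""
    predictive = alpha / alpha.sum()
    gain = 0.0
    for s_next, p in enumerate(predictive):
        alpha_post = alpha.copy()
        alpha_post[s_next] += 1.0          # pseudo-observation of s_next
        gain += p * dirichlet_kl(alpha_post, alpha)
    return gain

def curious_action(counts, state):
    """Greedily pick the action expected to be most informative.
    counts[s, a] holds Dirichlet parameters over next states."""
    gains = [expected_information_gain(counts[state, a])
             for a in range(counts.shape[1])]
    return int(np.argmax(gains))

# Example: 3 states, 2 actions, uniform Dirichlet(1) priors plus some visits.
counts = np.ones((3, 2, 3))
counts[0, 0] += np.array([10.0, 5.0, 1.0])   # action 0 in state 0 is well explored
print(curious_action(counts, state=0))        # -> 1, the less-explored action
```

The one-step greedy rule above only illustrates the quantity being maximized; the paper's contribution is showing how to plan over sequences of such information-gathering actions in dynamic environments.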

Cited by 108 publications (111 citation statements) · References 7 publications
“…As a generalization of exploration methods in reinforcement learning, such as [18], ideas have been suggested such as planning to be surprised [56] or the combination of empirical learning progress with visit counts [39].…”

Section: Planning Topics (mentioning)
confidence: 99%
“…In the context of online learning, one way to avoid bad-bootstraps is to select actions based on (expected) epistemic value (Schwartenbeck et al., 2018; Friston et al., 2017; Sun et al., 2011), where agents seek out novel interactions based on counterfactually informed beliefs about which actions will lead to informative transitions. By utilising the uncertainty encoded by (beliefs about) model parameters, this approach can proactively identify optimally informative transitions.…”

Section: Learning Action-oriented Models: Good and Bad Bootstraps (mentioning)
confidence: 99%
“…However, random exploration of this sort is likely to be inefficient in rich and complex environments. In such environments, a more powerful method is to utilize the uncertainty quantified by probabilistic models to determine epistemic (or intrinsic, information-seeking, uncertainty reducing) actions that attempt to minimize the model uncertainty in a directed manner (Stadie et al., 2015; Houthooft et al., 2016; Sun et al., 2011; Friston et al., 2015; Burda et al., 2018; Friston et al., 2017). While epistemic actions can help avoid bad-bootstraps and sub-optimal convergence, such actions necessarily increase the diversity and dimensionality of sampled data, thus sacrificing the benefits afforded by learning in the presence of goal-directed actions.…”

Section: Introduction (mentioning)
confidence: 99%
“…This is why expected Bayesian surprise has to be maximised when selecting actions, where it plays the role of epistemic affordance (Parr and Friston 2017). As noted above, this is an important imperative that underwrites uncertainty reducing, exploratory behaviour; known as intrinsic motivation in neurorobotics (Schmidhuber 2006) or salience when ‘planning to be surprised’ (Sun, Gomez et al. 2011, Barto, Mirolli et al. 2013). An intuitive way of thinking about whether surprise should be maximised or minimised is to appeal to the analogy of scientific experiment.…”

Section: Simulations (mentioning)
confidence: 99%