Action and Perception as Divergence Minimization

Hafner, Danijar; Ortega, Pedro A.; Ba, Jimmy; Parr, Thomas; Friston, Karl J.; Heess, Nicolas

doi:10.48550/arxiv.2009.01791

Cited by 5 publications

(7 citation statements)

References 106 publications

“…The derivation of alternative functionals that preserve the desirable epistemic behavior of EFE optimization is an active research area [41,8]. There have been several interesting proposals such as the Free Energy of the Expected Future [22,7,9] or Generalized Free Energy [5], as well as amortization strategies [42,43]. However, the approach for a majority of the alternative functionals is to facilitate epistemics by the same mutual information term utilized by EFE while finessing the remainder of the functional.…”

Section: Discussion (mentioning)

confidence: 99%

“…The AIF literature describes multiple Free Energy (FE) objectives for policy planning, e.g., the Expected FE [4], Generalized FE [5] and Predicted (Bethe) FE [6] (among others, see e.g. [7,8,9]). Traditionally, the Expected Free Energy (EFE) is evaluated for a selection of policies, and a posterior distribution over policies is constructed from the corresponding EFEs.…”

Section: Introduction (mentioning)

confidence: 99%

Active Inference and Epistemic Value in Graphical Models

van de Laar, Koudahl, van Erp et al. 2021

Preprint

The Free Energy Principle (FEP) postulates that biological agents perceive and interact with their environment in order to minimize a Variational Free Energy (VFE) with respect to a generative model of their environment. The inference of a policy (future control sequence) according to the FEP is known as Active Inference (AIF). The AIF literature describes multiple VFE objectives for policy planning that lead to epistemic (information-seeking) behavior. However, most objectives have limited modeling flexibility. This paper approaches epistemic behavior from a constrained Bethe Free Energy (CBFE) perspective. Crucially, variational optimization of the CBFE can be expressed in terms of message passing on free-form generative models. The key intuition behind the CBFE is that we impose a point-mass constraint on predicted outcomes, which explicitly encodes the assumption that the agent will make observations in the future. We interpret the CBFE objective in terms of its constituent behavioral drives. We then illustrate resulting behavior of the CBFE by planning and interacting with a simulated T-maze environment. Simulations for the T-maze task illustrate how the CBFE agent exhibits an epistemic drive, and actively plans ahead to account for the impact of predicted outcomes. Compared to an EFE agent, the CBFE agent incurs expected reward in significantly more environmental scenarios. We conclude that CBFE optimization by message passing suggests a general mechanism for epistemic-aware AIF in free-form generative models.

“…Such loop-closures have a further functional significance in allowing experiences to be bound together into a unified representational system where updates can be propagated in a mutually-constrained wholistic fashion, so providing a basis for the rapid and flexible construction and refinement of knowledge structures in the form of cognitive schemas that have both graph-like and map-like properties. With further experience, these schemata can then be transferred to the neocortex in the form of more stable adaptive action and thought tendencies, so forming a powerful hybrid architecture for instantiating robust causal world models (Hafner et al., 2020; Safron, 2021b).…”

Section: LatentSLAM a Bio-inspired SLAM Algorithm (mentioning)

confidence: 99%

Generalized Simultaneous Localization and Mapping (G-SLAM) as unification framework for natural and artificial intelligences: towards reverse engineering the hippocampal/entorhinal system and principles of high-level cognition

Safron, Çatal, Verbelen 2022

Front. Syst. Neurosci.

Simultaneous localization and mapping (SLAM) represents a fundamental problem for autonomous embodied systems, for which the hippocampal/entorhinal system (H/E-S) has been optimized over the course of evolution. We have developed a biologically-inspired SLAM architecture based on latent variable generative modeling within the Free Energy Principle and Active Inference (FEP-AI) framework, which affords flexible navigation and planning in mobile robots. We have primarily focused on attempting to reverse engineer H/E-S “design” properties, but here we consider ways in which SLAM principles from robotics may help us better understand nervous systems and emergent minds. After reviewing LatentSLAM and notable features of this control architecture, we consider how the H/E-S may realize these functional properties not only for physical navigation, but also with respect to high-level cognition understood as generalized simultaneous localization and mapping (G-SLAM). We focus on loop-closure, graph-relaxation, and node duplication as particularly impactful architectural features, suggesting these computational phenomena may contribute to understanding cognitive insight (as proto-causal-inference), accommodation (as integration into existing schemas), and assimilation (as category formation). All these operations can similarly be describable in terms of structure/category learning on multiple levels of abstraction. However, here we adopt an ecological rationality perspective, framing H/E-S functions as orchestrating SLAM processes within both concrete and abstract hypothesis spaces. In this navigation/search process, adaptive cognitive equilibration between assimilation and accommodation involves balancing tradeoffs between exploration and exploitation; this dynamic equilibrium may be near optimally realized in FEP-AI, wherein control systems governed by expected free energy objective functions naturally balance model simplicity and accuracy. 
With respect to structure learning, such a balance would involve constructing models and categories that are neither too inclusive nor exclusive. We propose these (generalized) SLAM phenomena may represent some of the most impactful sources of variation in cognition both within and between individuals, suggesting that modulators of H/E-S functioning may potentially illuminate their adaptive significances as fundamental cybernetic control parameters. Finally, we discuss how understanding H/E-S contributions to G-SLAM may provide a unifying framework for high-level cognition and its potential realization in artificial intelligences.

“…As a final note, we highlight alternative approaches to task specification. Building on the Free Energy Principle [13,12], Hafner et al [17] consider a variety of task types in terms of minimization of distance to a desired target distribution [3]. Alternatively, Littman et al [30] and Li et al [28] propose variations of linear temporal logic (LTL) as a mechanism for specifying a task to RL agents, with related literature extending LTL to the multi-task [58] and multi-agent [18] settings, or using reward machines for capturing task structure [19].…”

Section: Other Perspectives On Reward (mentioning)

confidence: 99%

On the Expressivity of Markov Reward

Abel, Dabney, Harutyunyan et al. 2021

Preprint

Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings. 35th Conference on Neural Information Processing Systems (NeurIPS 2021).
