Humanity faces multiple existential risks in the coming decades due to technological advances in AI and the possibility of unintended behaviors emerging from such systems. We believe that better outcomes may be possible by rigorously exploring frameworks for intelligent (goal-oriented) behavior inspired by computational neuroscience. Here, we explore how the Free Energy Principle and Active Inference (FEP-AI) framework may provide solutions to these challenges by affording the realization of control systems that operate according to principles of hierarchical Bayesian modeling and prediction-error (i.e., surprisal) minimization. Such FEP-AI agents are equipped with hierarchically organized world models capable of counterfactual planning, realized by the kinds of reciprocal message passing performed by mammalian nervous systems, thus allowing for the flexible construction of representations of self-world dynamics with varying degrees of temporal depth. We will describe how such systems can not only infer the abstract causal structure of their environments, but also develop capacities for "theory of mind" and collaborative (human-aligned) decision making. Such architectures could help to sidestep potentially dangerous combinations of high intelligence and human-incompatible values, since these mental processes are entangled (rather than orthogonal) in FEP-AI agents. We will further describe how (meta-)learned deep goal hierarchies may also well describe biological systems, suggesting that potential risks from "mesa-optimisers" may actually represent one of the most promising approaches to AI safety: minimizing prediction error relative to causal self-world models can be used to cultivate modes of policy selection and agent personalities that robustly optimize for achieving goals consistently aligned with both individual and shared values.
Finally, we will describe how iterative policy selection and preference learning can result in "value cores": self-reinforcing, relatively stable attracting states to which agents will seek to return through their goal-oriented imaginings and actions.
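The surprisal-minimizing policy selection described above can be illustrated with a minimal sketch: a discrete agent scores each available action by the expected surprisal of its preferred observations under the observation distribution that action is predicted to produce, then selects the action with the lowest score. All names here (`A`, `B`, `prior`, `preferred_obs`) and the toy three-state model are illustrative assumptions, not part of the framework's formal presentation.

```python
import numpy as np

n_states, n_obs = 3, 3

# Likelihood P(o|s): mostly identity mapping with a little observation noise
A = np.eye(n_obs, n_states) * 0.8 + 0.1
A /= A.sum(axis=0, keepdims=True)

# Transition model per action: B[a][s_next, s] = P(s_next | s, action=a).
# Each toy action deterministically shifts the hidden state by `a`.
B = [np.roll(np.eye(n_states), shift, axis=0) for shift in range(n_states)]

prior = np.array([1.0, 0.0, 0.0])          # current belief over hidden states
preferred_obs = np.array([0.0, 0.0, 1.0])  # "goal" distribution over observations

def expected_surprisal(action):
    """Expected negative log-probability of preferred observations
    under the observation distribution predicted for this action."""
    q_next = B[action] @ prior   # predicted next-state belief
    p_obs = A @ q_next           # predicted observation distribution
    # Cross-entropy of preferred outcomes against the prediction
    return -(preferred_obs * np.log(p_obs + 1e-12)).sum()

scores = [expected_surprisal(a) for a in range(n_states)]
best = int(np.argmin(scores))
print("chosen action:", best)  # the action predicted to yield the preferred observation
```

In a fuller active inference treatment, the score would be an expected free energy that also rewards uncertainty-reducing (epistemic) actions; this sketch keeps only the pragmatic, goal-seeking term to show how prediction-error minimization relative to preferred outcomes drives policy selection.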