Humanity faces multiple existential risks in the coming decades due to technological advances in AI and the possibility of unintended behaviors emerging from such systems. We believe that better outcomes may be possible by rigorously exploring frameworks for intelligent (goal-oriented) behavior inspired by computational neuroscience. Here, we explore how the Free Energy Principle and Active Inference (FEP-AI) framework may provide solutions to these challenges by affording the realization of control systems that operate according to principles of hierarchical Bayesian modeling and prediction-error (i.e., surprisal) minimization. Such FEP-AI agents are equipped with hierarchically organized world models capable of counterfactual planning, realized by the kinds of reciprocal message passing performed by mammalian nervous systems, thus allowing for the flexible construction of representations of self-world dynamics with varying degrees of temporal depth. We will describe how such systems can not only infer the abstract causal structure of their environments, but also develop capacities for "theory of mind" and collaborative (human-aligned) decision making. Such architectures could help to sidestep potentially dangerous combinations of high intelligence and human-incompatible values, since these mental processes are entangled (rather than orthogonal) in FEP-AI agents. We will further describe how (meta-)learned deep goal hierarchies may also well describe biological systems, suggesting that potential risks from "mesa-optimisers" may actually represent one of the most promising approaches to AI safety: minimizing prediction error relative to causal self-world models can be used to cultivate modes of policy selection and agent personalities that robustly optimize for achieving goals consistently aligned with both individual and shared values.
Finally, we will describe how iterative policy selection and preference learning can result in "value cores": self-reinforcing, relatively stable attracting states to which agents will seek to return through their goal-oriented imaginings and actions.
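The surprisal-minimizing policy selection described above can be illustrated with a minimal sketch: a discrete agent scores each available action by the expected surprisal of its preferred observations under the observation distribution that action is predicted to produce, then selects the action with the lowest score. All names here (`A`, `B`, `prior`, `preferred_obs`) and the toy three-state model are illustrative assumptions, not part of the framework's formal presentation.

```python
import numpy as np

n_states, n_obs = 3, 3

# Likelihood P(o|s): mostly identity mapping with a little observation noise
A = np.eye(n_obs, n_states) * 0.8 + 0.1
A /= A.sum(axis=0, keepdims=True)

# Transition model per action: B[a][s_next, s] = P(s_next | s, action=a).
# Each toy action deterministically shifts the hidden state by `a`.
B = [np.roll(np.eye(n_states), shift, axis=0) for shift in range(n_states)]

prior = np.array([1.0, 0.0, 0.0])          # current belief over hidden states
preferred_obs = np.array([0.0, 0.0, 1.0])  # "goal" distribution over observations

def expected_surprisal(action):
    """Expected negative log-probability of preferred observations
    under the observation distribution predicted for this action."""
    q_next = B[action] @ prior   # predicted next-state belief
    p_obs = A @ q_next           # predicted observation distribution
    # Cross-entropy of preferred outcomes against the prediction
    return -(preferred_obs * np.log(p_obs + 1e-12)).sum()

scores = [expected_surprisal(a) for a in range(n_states)]
best = int(np.argmin(scores))
print("chosen action:", best)  # the action predicted to yield the preferred observation
```

In a fuller active inference treatment, the score would be an expected free energy that also rewards uncertainty-reducing (epistemic) actions; this sketch keeps only the pragmatic, goal-seeking term to show how prediction-error minimization relative to preferred outcomes drives policy selection.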