Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda

Soares, Nate; Fallenstein, Benya

doi:10.1007/978-3-662-54033-6_5

Cited by 59 publications (69 citation statements)

References 17 publications


“…This section will discuss some issues that nonetheless arise, and ways in which those issues can potentially be addressed. For more comprehensive overviews of safety concerns of intelligent agents, see [4,21,76,83]. …”

Section: Predicting and Controlling Behaviour (mentioning, confidence: 99%)

Universal Artificial Intelligence

Hutter

2018

Foundations of Trusted Autonomy


Artificial intelligence (AI) bears the promise of making us all healthier, wealthier, and happier by reducing the need for human labour and by vastly increasing our scientific and technological progress. Since the inception of the AI research field in the mid-twentieth century, a range of practical and theoretical approaches have been investigated. This chapter will discuss universal artificial intelligence (UAI) as a unifying framework and foundational theory for many (most?) of these approaches. The development of a foundational theory has been pivotal for many other research fields. Well-known examples include the development of Zermelo-Fraenkel set theory (ZFC) for mathematics, Turing machines for computer science, evolution for biology, and decision and game theory for economics and the social sciences. Successful foundational theories give a precise, coherent understanding of the field, and offer a common language for communicating research. As most research studies focus on one narrow question, it is essential that the value of each isolated result can be appreciated in light of a broader framework or goal formulation. UAI offers several benefits to AI research beyond the general advantages of foundational theories just mentioned. Substantial attention has recently been called to the safety of autonomous AI systems [10]. A highly intelligent autonomous system may cause substantial unintended harm if constructed carelessly. The trustworthiness of autonomous agents may be much improved if their design is grounded in a formal theory (such as UAI) that allows formal verification of their behavioural properties. Unsafe designs can be ruled out at an early stage, and adequate attention can be given to crucial design choices.



“…The question of how to model agents as an ordinary part of the environment is of interest in the speculative study of human-level and smarter-than-human artificial intelligence [13,14]. Although such systems are still firmly in the domain of futurism, there has been a recent wave of interest in foundational research aimed at understanding their behavior, in order to ensure that they will behave as intended if and when they are developed [15,16,14].…”

Section: Related Work (mentioning, confidence: 99%)

Reflective Oracles: A Foundation for Game Theory in Artificial Intelligence

Fallenstein

Taylor

Christiano

2015

Logic, Rationality, and Interaction


Classical game theory treats players as special (a description of a game contains a full, explicit enumeration of all players), even though in the real world "players" are no more fundamentally special than rocks or clouds. It isn't trivial to find a decision-theoretic foundation for game theory in which an agent's coplayers are a non-distinguished part of the agent's environment. Attempts to model both players and the environment as Turing machines, for example, fail for standard diagonalization reasons.


…Algorithmic bias fits into a larger body of concerns dealing with the integration and performance of technology artifacts in society. The problem of value alignment [2,15,22] captures this larger body of concerns: how can we design systems that achieve their specific objectives while remaining aligned with the broader values of society? The values implicated can include fairness, privacy, safety, trustworthiness, etc.…

Section: Introduction (mentioning, confidence: 99%)

Steps Towards Value-Aligned Systems

Osoba

Boudreaux

Yeung

2020

Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society

doi:10.1145/3375627.3375872


Algorithmic (including AI/ML) decision-making artifacts are an established and growing part of our decision-making ecosystem. They are now indispensable tools that help us manage the flood of information we use to try to make effective decisions in a complex world. The current literature is full of examples of how individual artifacts violate societal norms and expectations (e.g. violations of fairness, privacy, or safety norms). Against this backdrop, this discussion highlights an under-emphasized perspective in the body of research focused on assessing value misalignment in AI-equipped sociotechnical systems. The research on value misalignment so far has a strong focus on the behavior of individual tech artifacts. This discussion argues for a more structured systems-level approach for assessing value alignment in sociotechnical systems. We rely primarily on the research on fairness to make our arguments more concrete. And we use the opportunity to highlight how adopting a system perspective improves our ability to explain and address value misalignments. Our discussion ends with an exploration of priority questions that demand attention if we are to assure the value alignment of whole systems, not just individual artifacts.
