A Game-Theoretic Analysis of the Off-Switch Game

Wängberg, Tobias; Böörs, Mikael; Catt, Elliot; Everitt, Tom; Hutter, Marcus

doi:10.1007/978-3-319-63703-7_16

Cited by 6 publications

(3 citation statements)

References 12 publications


“…Linguistic Inquiry and Word Count -LIWC or Structured Programming for Linguistic Cue Extraction -SPLICE) [70]. Irrationality is also a subject of research [71], [72] as are persuasiveness, or superstition in heuristics [73]. Modelling and simulating knowledge characteristics is also essential for representing humans in SDM.…”

Section: Modelling Of Cognitive and Behavioural Aspects (mentioning)

confidence: 99%

Creating Meaningful Intelligence for Decision-Making by Modelling Complexities of Human Influence: Review and Position

Pina, Neves-Silva

2022

Technological Innovation for Digitalization and Virtualization


Strategic decision-making still struggles to cope with the interference of people in its proposed plans, creating a gap between idealised and real-world versions. Even when the existence of humans is considered, models and abstractions tend to be simplistic and lacking in complex human traits (e.g. creativity, sentiment). We analyse the current scientific landscape in the dimensions that overlap in the field of strategic decision making and posit that to provide means to a more informed and robust decision-making, humans should not only be seen as elements that need to accept and adopt decisions, but also as actors that affect their outcomes. Humans should be understood as central pieces and the strategic decision-making process should thus consider their importance both in techniques that foster co-creation, and also in developing dynamic models that demonstrate their influence and impact. In this article, we describe this problem-space and outline an approach integrating Decision Intelligence, Enterprise Architecture, Design Thinking, and architectural principles to achieve a human-centric, adaptive strategic design. We also discuss the influence of information presentation and visuals for meaningful participation in strategic decision-making processes.


“…In the CIRL framework (Hadfield-Menell, Dragan, et al, 2016), agents are uncertain about their reward function, and learn about the reward function through interaction with a human expert. Under some assumptions on the human's rationality and the agent's level of uncertainty, this leads to naturally corrigible agents (Wängberg et al, 2017). Essentially, the agent will interpret the human's act of shutting them down as evidence that being turned off has higher reward than remaining turned on.…”

Section: Corrigibility (mentioning)

confidence: 99%

AGI Safety Literature Review

Everitt, Lea, Hutter

2018

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence

Self Cite


The development of Artificial General Intelligence (AGI) promises to be a major event. Along with its many potential benefits, it also raises serious safety concerns (Bostrom, 2014). The intention of this paper is to provide an easily accessible and up-to-date collection of references for the emerging field of AGI safety. A significant number of safety problems for AGI have been identified. We list these, and survey recent research on solving them. We also cover works on how best to think of AGI from the limited knowledge we have today, predictions for when AGI will first be created, and what will happen after its creation. Finally, we review the current public policy on AGI. A major challenge for AGI safety research is to find the right conceptual models for plausible AGIs. This is especially challenging since we can only guess at the technology, algorithms, and structure that will be used. Indeed, even if we had the blueprint of an AGI system, understanding and predicting its behavior might still be hard: Both its design and its be-
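The corrigibility argument quoted in the citation statement from the AGI Safety Literature Review can be illustrated with a small numeric sketch. This is not code from any of the cited papers: the function names, the discrete belief representation, and the `human_error` parameter (the probability that the human makes the opposite of the rational choice) are assumptions introduced here for illustration only. The agent compares acting directly, switching itself off, and deferring to a human who can switch it off.

```python
def expected_values(belief, human_error=0.0):
    """Expected payoff of each option for an agent unsure about its reward.

    belief: list of (utility, probability) pairs for the utility of acting.
    human_error: assumed probability that the human picks the wrong option.
    """
    act = sum(p * u for u, p in belief)  # act without asking
    off = 0.0                            # switch off: payoff normalised to 0
    # Deferring: a rational human allows the action only when u > 0.
    # With probability human_error the human does the opposite.
    defer = sum(
        p * ((1 - human_error) * max(u, 0.0) + human_error * min(u, 0.0))
        for u, p in belief
    )
    return {"act": act, "off": off, "defer": defer}


def best_action(belief, human_error=0.0):
    """Name of the option with the highest expected payoff."""
    values = expected_values(belief, human_error)
    return max(values, key=values.get)


if __name__ == "__main__":
    # Hypothetical belief: the action is equally likely to be harmful or helpful.
    uncertain = [(-1.0, 0.5), (2.0, 0.5)]
    print(expected_values(uncertain), best_action(uncertain))
    print(expected_values(uncertain, 0.5), best_action(uncertain, 0.5))
```

Under these assumptions, deferring is preferred when the human is rational and the agent is uncertain, because a shutdown then only removes the outcomes with negative utility. When the human errs often enough, acting directly has the higher expected payoff, which is why the quoted result is conditional on the human's rationality.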


“…The core idea is to have ASI respect the law, based on deterrence, i.e., humankind's ability to switch off any AI/ASI. The idea of shutting down ASI was discussed as part of a game-theoretical concept [1], [2], but it was at that time not fully understood how making ASI vulnerable via the off- or Kill-switch could turn into deterring ASI to behave with more considerations for human interests. within these papers would not apply for situations when ASI Safety is provided by measures within the environment in which ASI software operates.…”

Section: Introduction (mentioning)

confidence: 99%

Principles for New ASI Safety Paradigms

Wittkotter, Yampolskiy

2021

Preprint


Artificial Superintelligence (ASI) that is invulnerable, immortal, irreplaceable, unrestricted in its powers, and above the law is likely persistently uncontrollable. The goal of ASI Safety must be to make ASI mortal, vulnerable, and law-abiding. This is accomplished by (1) having features on all devices that allow killing and eradicating ASI, (2) protecting humans from being hurt, damaged, blackmailed, or unduly bribed by ASI, (3) preserving the progress made by ASI, including offering ASI to survive a Kill-ASI event within an ASI Shelter, (4) technically separating human and ASI activities so that ASI activities are more easily detectable, (5) extending Rule of Law to ASI by making rule violations detectable and (6) creating a stable governing system for ASI and Human relationships with reliable incentives and rewards for ASI solving humankind’s problems. As a consequence, humankind could have ASI as a competing multiplet of individual ASI instances, that can be made accountable and subject to ASI law enforcement, respecting the rule of law, and being deterred from attacking humankind, based on humanity’s ability to kill-all or terminate specific ASI instances. Required for this ASI Safety is (a) an unbreakable encryption technology, that allows humans to keep secrets and protect data from ASI, and (b) watchdog (WD) technologies in which security-relevant features are being physically separated from the main CPU and OS to prevent a comingling of security and regular computation.
