2017
DOI: 10.1007/978-3-319-63703-7_16
A Game-Theoretic Analysis of the Off-Switch Game

Abstract: The off-switch game is a game-theoretic model of a highly intelligent robot interacting with a human. In the original paper by Hadfield-Menell et al. (2016b), the analysis is not fully game-theoretic, as the human is modelled as an irrational player and the robot's best action is only calculated under unrealistic normality and soft-max assumptions. In this paper, we make the analysis fully game-theoretic by modelling the human as a rational player with a random utility function. As a consequence, we are able …

Cited by 6 publications (3 citation statements)
References: 12 publications

Citation statements:
“…Linguistic Inquiry and Word Count - LIWC or Structured Programming for Linguistic Cue Extraction - SPLICE) [70]. Irrationality is also a subject of research [71], [72] as are persuasiveness, or superstition in heuristics [73]. Modelling and simulating knowledge characteristics is also essential for representing humans in SDM.…”
Section: Modelling Of Cognitive and Behavioural Aspects
Citation type: mentioning
confidence: 99%
“…In the CIRL framework (Hadfield-Menell, Dragan, et al, 2016), agents are uncertain about their reward function, and learn about the reward function through interaction with a human expert. Under some assumptions on the human's rationality and the agent's level of uncertainty, this leads to naturally corrigible agents (Wängberg et al, 2017). Essentially, the agent will interpret the human's act of shutting them down as evidence that being turned off has higher reward than remaining turned on.…”
Section: Corrigibility
Citation type: mentioning
confidence: 99%
“…The core idea is to have ASI respect the law, based on deterrence, i.e., humankind's ability to switch off any AI/ASI. The idea of shutting down ASI was discussed as part of a game-theoretical concept [1], [2], but it was at that time not fully understood how making ASI vulnerable via the off- or Kill-switch could turn into deterring ASI to behave with more considerations for human interests. within these papers would not apply for situations when ASI Safety is provided by measures within the environment in which ASI software operates.…”
Section: Introduction
Citation type: mentioning
confidence: 99%