2022
DOI: 10.1007/s10458-022-09575-5

Scalar reward is not enough: a response to Silver, Singh, Precup and Sutton (2021)

Abstract: The recent paper “Reward is Enough” by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial, and provides a suitable basis for the creation of artificial general intelligence. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and computational intelligence, and arg…
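
To make the scalar-versus-vector distinction concrete, here is a minimal illustrative sketch (not code from the paper; the environment, reward components and weight values are all hypothetical). It shows a vector-valued reward with one component per objective, and a linear scalarisation whose weights are an extra modelling commitment that a purely scalar formulation leaves implicit.

```python
import numpy as np

def vector_reward(state, action):
    """Hypothetical environment feedback: one reward component per objective,
    e.g. (task progress, energy cost, safety margin)."""
    return np.array([1.0, -0.2, 0.5])

def linear_scalarise(reward_vec, weights):
    """Collapse the vector reward to a scalar; picking the weights is a
    modelling decision hidden by a purely scalar reward signal."""
    return float(np.dot(weights, reward_vec))

r = vector_reward(state=None, action=None)
print(linear_scalarise(r, weights=np.array([0.6, 0.2, 0.2])))  # 0.66
```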

Cited by 18 publications (12 citation statements)
References: 74 publications
“…Thirdly, in the context of modeling human values, this approach might sometimes be more consistent with human value processing [28]. At almost any level of analysis possible, human intelligence is multi-objective [32]. Biological life uses a set of multi-objective homeostatic systems to prioritize acquiring resources that are needed most given the organism's state [24].…”
Section: Design Principles Research Context (mentioning)
confidence: 99%
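
As a loose illustration of the homeostatic prioritisation idea quoted above (an assumption on our part, not a model taken from the cited works), the sketch below simply attends to whichever internal variable is furthest below its set-point; the variable names and numbers are invented.

```python
# Toy homeostatic prioritisation: pick the most depleted internal variable.
set_points = {"energy": 1.0, "hydration": 1.0, "temperature": 0.5}
internal_state = {"energy": 0.4, "hydration": 0.9, "temperature": 0.45}

def most_urgent_need(internal_state, set_points):
    """Return the homeostatic variable with the largest deficit."""
    deficits = {k: set_points[k] - internal_state[k] for k in set_points}
    return max(deficits, key=deficits.get)

print(most_urgent_need(internal_state, set_points))  # -> "energy"
```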
“…We believe it is reasonable to use MARL as a first step in exploring the use of AI tools to study multi-person social dilemmas. The current model for reinforcement learning suggests that reward maximization is sufficient to drive behavior that exhibits abilities studied in the human cooperation and social dilemmas, including "knowledge, learning, perception, social intelligence, language, generalization and imitation" (Yang, 2021;Silver et al, 2021;Vamplew et al, 2022). The justification for this claim is deeply rooted in the von Neumann Morgenstern utility theory (von Neumann and Morgenstern, 2007), which is the basis for the well-known expected utility theory (Schoemaker, 2013) and essentially states that it is safe to assume an intelligent entity will always make decisions according to the highest expected utility in any complex scenarios 1 (Yang, 2021).…”
Section: Introduction (mentioning)
confidence: 99%
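
For readers unfamiliar with the expected-utility claim referenced in this excerpt, here is a tiny worked example (illustrative only; the lotteries and utility values are made up): the agent scores each lottery by its probability-weighted utility and chooses the maximum.

```python
# Expected-utility choice between two hypothetical lotteries.
lotteries = {
    "A": [(0.8, 10.0), (0.2, 0.0)],   # (probability, utility) pairs
    "B": [(0.5, 20.0), (0.5, -5.0)],
}

def expected_utility(lottery):
    return sum(p * u for p, u in lottery)

scores = {name: expected_utility(l) for name, l in lotteries.items()}
best = max(scores, key=scores.get)
print(scores, best)  # {'A': 8.0, 'B': 7.5} 'A'
```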
“…Reinforcement learning (RL) is a mechanism for an agent to maximize expected reward by calibrating behavior to match behaviors that have been reinforced with reward (or punishment) in the past ( Sutton et al, 1992 ). RL has directly measurable signals in neural circuitry ( Schultz et al, 1997 ), has been foundational for the development of our understanding of human learning in general ( Shteingart & Loewenstein, 2014 ), and not only underpins human learning but also seems fundamental for the development of human-level artificial general intelligence ( Ide et al, 2022 ; Silver et al, 2021 ; Vamplew et al, 2022 ). RL is also important in the development of appropriate response inhibition, which plays a key role in goal-directed behavior ( Berkman, 2018 ; Verbruggen & Logan, 2008 ), psychopathological conditions ( Howlett et al, 2023 ), and in inhibitory response training for reducing unhealthy food intake ( Houben, 2011 ; Lawrence et al, 2015 ).…”
Section: Introduction (mentioning)
confidence: 99%
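
As a generic illustration of the reward-maximisation mechanism this excerpt describes (a textbook tabular Q-learning sketch, assumed for illustration rather than taken from any of the cited works), behaviour is calibrated by nudging action values towards rewards observed in the past and acting greedily on those values.

```python
import random

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One-step tabular Q-learning: move Q(s, a) towards the observed
    reward plus the discounted value of the best next action."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    """Mostly exploit the highest-valued action, occasionally explore."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((s, a), 0.0))
```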