“…As a secondary contribution, the link between ongoing work on AI safety and potential work mitigating these multi-agent failures incidentally answers an objection raised by AI risk skeptics that AI safety is "not worth current attention" and that the issues are "premature to worry about" [21]. This paper instead shows how failures due to multi-agent dynamics are critical in the present, as ML and superhuman narrow AI is being widely deployed, even given the (valid) arguments put forward by Yudkowsky [22] and Bostrom [7] for why a singleton AI is a more important source of existential risk.…”
An important challenge for safety in machine learning and artificial intelligence systems is a set of related failures involving specification gaming, reward hacking, fragility to distributional shifts, and Goodhart's or Campbell's law. This paper presents additional failure modes for interactions within multi-agent systems that are closely related. These multi-agent failure modes are more complex, more problematic, and less well understood than the single-agent case, and are also already occurring, largely unnoticed. After motivating the discussion with examples from poker-playing artificial intelligence (AI), the paper explains why these failure modes are in some senses unavoidable. Following this, the paper categorizes failure modes, provides definitions, and cites examples for each of the modes: accidental steering, coordination failures, adversarial misalignment, input spoofing and filtering, and goal co-option or direct hacking. The paper then discusses how extant literature on multi-agent AI fails to address these failure modes, and identifies work which may be useful for the mitigation of these failure modes.
“…These corporate actors often wield enormous resources and have a correspondingly large effect on the overall issue, either directly or by sponsoring industry-aligned think tanks, writers, and other intermediaries. At this time, there are only hints of such behavior by AI corporations, but the profitability of AI and other factors suggest the potential for much more [12].…”
Section: Actors and Audiences
“…This practice was pioneered by the tobacco industry in the 1950s and has since been adopted by other industries including fossil fuels and industrial chemicals [10,11]. AI is increasingly important for corporate profits and thus could be a new area of anti-regulatory disinformation [12]. The history of corporate disinformation and the massive amounts of profit potentially at stake suggest that superintelligence disinformation campaigns could be funded at a large scale and could be a major factor in the overall issue.…”
Section: Introduction
“…There have been a number of attempts to reply to superintelligence misinformation in order to set the record straight [3][4][5][6][7][8][9]. However, to the best of the present author's knowledge, aside from a brief discussion in [12], there have been no efforts to examine the most effective ways of countering superintelligence misinformation. Given the potential importance of the matter, a more careful examination is warranted.…”
Section: Introduction
“…This paper synthesizes insights from these literatures and applies them to the particular circumstances of superintelligence. The paper is part of a broader effort to develop the social science of superintelligence by leveraging insights from other issues [12,16].…”
Superintelligence is a potential type of future artificial intelligence (AI) that is significantly more intelligent than humans in all major respects. If built, superintelligence could be a transformative event, with potential consequences that are massively beneficial or catastrophic. Meanwhile, the prospect of superintelligence is the subject of major ongoing debate, which includes a significant amount of misinformation. Superintelligence misinformation is potentially dangerous, ultimately leading to bad decisions by the would-be developers of superintelligence and those who influence them. This paper surveys strategies to counter superintelligence misinformation. Two types of strategies are examined: strategies to prevent the spread of superintelligence misinformation and strategies to correct it after it has spread. In general, misinformation can be difficult to correct, suggesting a high value of strategies to prevent it. This paper is the first extended study of superintelligence misinformation. It draws heavily on the study of misinformation in psychology, political science, and related fields, especially misinformation about global warming. The strategies proposed can be applied to lay public attention to superintelligence, AI education programs, and efforts to build expert consensus.