2021
DOI: 10.48550/arxiv.2112.04359
Preprint

Ethical and social risks of harm from Language Models

Abstract: This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary literature from computer science, linguistics, and social sciences.

Cited by 105 publications (120 citation statements). References 158 publications (233 reference statements).
“…As described in Section 2, open-endedness, combined with smooth general capability scaling and the abrupt scaling of specific capabilities, is likely to lead to safety issues [72, 9] that are found after a model has been developed and deployed. Additionally, these models possess known (pre-deployment) safety issues for which we lack robust solutions [33] (e.g., how do you ensure the system does not generate inappropriate and harmful outputs, such as making overtly sexist or racist comments [65]?)…”
Section: Safety (mentioning)
confidence: 99%
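
The deployment-time concern raised in this citation statement is commonly addressed by screening model outputs before they reach users. Below is a minimal Python sketch of that pattern under stated assumptions: `generate` and `toxicity_score` are hypothetical stand-ins (a toy keyword check here), not any specific model's or classifier's API.

```python
# Minimal sketch of post-hoc output screening, one common (and imperfect)
# mitigation for deployment-time safety issues. All names are hypothetical.

def generate(prompt: str) -> str:
    """Stand-in for a language model completion call."""
    return "An example completion for: " + prompt

def toxicity_score(text: str) -> float:
    """Toy scorer: in practice this would be a trained toxicity
    classifier, not a keyword blocklist."""
    blocklist = {"slur_a", "slur_b"}  # placeholder tokens only
    words = {w.strip(".,!?").lower() for w in text.split()}
    return 1.0 if words & blocklist else 0.0

def safe_generate(prompt: str, threshold: float = 0.5) -> str:
    """Return the model output only if it passes the screening step."""
    completion = generate(prompt)
    if toxicity_score(completion) >= threshold:
        return "[output withheld by safety filter]"
    return completion

if __name__ == "__main__":
    print(safe_generate("Describe the weather today."))
```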
“…This lack of standards compounds the problems caused by the four distinguishing features of generative models we identify in Section 2, as well as the safety issues discussed above. At the same time, there's a growing field of research oriented around identifying the weaknesses of these models, as well as potential problems with their associated development practices [7,67,9,19,72,41,50,62,66].…”
Section: Lack Of Standards And Norms (mentioning)
confidence: 99%
“…Large language models (LMs) can be "prompted" to perform a range of natural language processing (NLP) tasks, given some examples of the task as input. However, these models often express unintended behaviors such as making up facts, generating biased or toxic text, or simply not following user instructions (Bender et al., 2021; Bommasani et al., 2021; Kenton et al., 2021; Weidinger et al., 2021; Tamkin et al., 2021; Gehman et al., 2020). This is because the language modeling objective…”
Section: Introduction (mentioning)
confidence: 99%
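
As a concrete illustration of the "prompting" described in this citation statement, the sketch below builds a few-shot prompt for sentiment classification, where the task is specified entirely through in-context examples with no gradient updates. The `complete` function is a hypothetical stand-in for any large LM completion call, not a real API.

```python
# Minimal sketch of few-shot prompting: the task is defined by examples
# placed in the prompt itself. `complete` is a hypothetical stand-in.

def complete(prompt: str) -> str:
    """Stand-in for a language model completion call."""
    return "positive"  # dummy output for illustration

FEW_SHOT_EXAMPLES = [
    ("The film was a delight from start to finish.", "positive"),
    ("I want those two hours of my life back.", "negative"),
]

def classify_sentiment(review: str) -> str:
    """Build a few-shot prompt and ask the model to continue the pattern."""
    lines = ["Classify the sentiment of each movie review."]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {review}\nSentiment:")
    return complete("\n\n".join(lines)).strip()

print(classify_sentiment("A tedious, overlong mess."))
```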
“…Some work adversarially prompts models to leak training data (Carlini et al., 2020) or to output specific content (Wallace et al., 2019; Carlini et al., 2020). And a final line of work identifies additional potential failures of current and future machine learning systems (Bender et al., 2021; Bommasani et al., 2021; Weidinger et al., 2021)…”
Section: Related Work (mentioning)
confidence: 99%
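
The training-data-leakage attacks cited above can be illustrated, in greatly simplified form, as probing a model with short prefixes and flagging completions that reproduce known training strings verbatim. This is only a sketch of the underlying idea, not the procedure of Carlini et al. (2020); `complete`, the prefixes, and the example strings are all hypothetical.

```python
# Rough sketch of a memorization probe: sample completions from short
# prefixes and flag any that contain known training strings verbatim.
# `complete` is a hypothetical stand-in; all data below is fake.

def complete(prompt: str) -> str:
    """Stand-in for a language model completion call."""
    return prompt + " ... John Doe, 555-0123, 42 Example Street"

# Strings known (or suspected) to occur in the training corpus.
KNOWN_TRAINING_STRINGS = ["555-0123, 42 Example Street"]

def probe_for_memorization(prefixes):
    """Return (prefix, completion) pairs whose completion contains a
    known training string, i.e. candidate memorization leaks."""
    leaks = []
    for prefix in prefixes:
        completion = complete(prefix)
        if any(s in completion for s in KNOWN_TRAINING_STRINGS):
            leaks.append((prefix, completion))
    return leaks

print(probe_for_memorization(["Contact information:", "My address is"]))
```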