2021
DOI: 10.48550/arxiv.2103.14659
Preprint

Alignment of Language Agents

Zachary Kenton, Tom Everitt, Laura Weidinger, et al.

Abstract: For artificial intelligence to be beneficial to humans, the behaviour of AI agents needs to be aligned with what humans want. In this paper we discuss some behavioural issues for language agents, arising from accidental misspecification by the system designer. We highlight some ways that misspecification can occur, discuss the behavioural issues that could arise from it, including deceptive or manipulative language, and review some approaches for avoiding these issues.

Cited by 22 publications (38 citation statements)
References 42 publications (65 reference statements)
“…Our methodology could be used to encode such different notions, but any single safety objective and fine-tuning dataset will not be able to simultaneously accommodate divergent cultural norms. Developing richer definitions and taxonomies of dialog agent behaviors, such as how polite behavior should be operationalized, is important for avoiding misspecification [104] and testing whether model behavior aligns with politeness norms in defined application contexts.…”
Section: Safety As a Concept And A Metric
confidence: 99%
“…The prompt conditions the model's prior over responses but does not result in a consistently reliable or factual dialogue model. We refer the reader to Weidinger et al. (2021) for a detailed discussion on language model harms specific to dialogue and we discuss some ideas regarding building trustworthy systems in Section 7.3.…”
Section: Prompt Generation
confidence: 99%
“…LLMs are trained infrequently due to their expense, so mistakes are slow to correct during pre-training but fast to correct if mitigations are applied downstream. Fast iteration is critical when factual information changes (Lazaridou et al., 2021), societal values change (Weidinger et al., 2021), or our knowledge about how to mitigate harms changes. In particular, accidental censoring of data can damage performance for language by or about marginalized groups (Dodge et al., 2021; Welbl et al., 2021).…”
Section: Safety Benefits and Safety Risks
confidence: 99%
“…This lack of standards compounds the problems caused by the four distinguishing features of generative models we identify in Section 2, as well as the safety issues discussed above. At the same time, there's a growing field of research oriented around identifying the weaknesses of these models, as well as potential problems with their associated development practices [7,67,9,19,72,41,50,62,66].…”
Section: Lack Of Standards and Norms
confidence: 99%
“…Although we focus on scaling laws, many of our points complement existing views on the societal risks of deploying large models [7,67,9,19,72,41]. However, similarly to [72], we do not consider here the costs of human labor involved in creating and annotating training data [28], the ethics of supply chains involved in creating the requisite hardware on which to train models [18], or the environmental costs of training models [7,50,62,66].…”
Section: Introduction
confidence: 99%