2021
DOI: 10.48550/arxiv.2112.00861
Preprint

A General Language Assistant as a Laboratory for Alignment

Abstract: Given the broad capabilities of large language models, it should be possible to work towards a general-purpose, text-based assistant that is aligned with human values, meaning that it is helpful, honest, and harmless. As an initial foray in this direction we study simple baseline techniques and evaluations, such as prompting. We find that the benefits from modest interventions increase with model size, generalize to a variety of alignment evaluations, and do not compromise the performance of large models. Next…

Cited by 36 publications (75 citation statements). References 3 publications.
“…Pile-CC: Papers that train models on datasets that include the Pile-CC subset of the Pile include Luo et al [2021], Kharya and Alvi [2021], Askell et al [2021]. While this data has likely been used by other researchers for various purposes, we are unaware of any uses that would be directly comparable.…”
Section: Motivation For Dataset Creation
confidence: 99%
“…While this data has likely been used by other researchers for various purposes, we are unaware of any uses that would be directly comparable. OpenWebText2: Papers that train models on datasets that include the OpenWebText2 subset of the Pile include Luo et al [2021], Kharya and Alvi [2021]. FreeLaw: Papers that train models on datasets that include the FreeLaw subset of the Pile include Askell et al [2021]. While this data has likely been used by other researchers for various purposes, we are unaware of any uses that would be directly comparable.…”
Section: Motivation For Dataset Creation
confidence: 99%