2022
DOI: 10.48550/arxiv.2202.04173
Preprint
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models

Abstract: Pre-trained language models (LMs) are shown to easily generate toxic language. In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models. We conduct this study on three dimensions: training corpus, model size, and parameter efficiency. For the training corpus, we propose to leverage the generative power of LMs and generate nontoxic datasets for domain-adaptive training, which mitigates the exposure bias and is shown to be more data-efficient than using a curated…
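As a rough illustration of the pipeline the abstract describes (self-generating nontoxic data, then domain-adaptive fine-tuning), here is a minimal sketch assuming Hugging Face transformers. The model names ("gpt2", "unitary/toxic-bert"), the prompt, and the filtering threshold are illustrative assumptions, not the paper's actual setup.

```python
# Sketch of the approach in the abstract: (1) sample text from the LM itself,
# (2) keep only generations a toxicity classifier deems nontoxic, (3) continue
# training the LM on that self-generated nontoxic corpus.
# "gpt2" stands in for the large LM and "unitary/toxic-bert" for the toxicity
# filter; both are assumptions for the sake of a runnable example.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
toxicity = pipeline("text-classification", model="unitary/toxic-bert")

# (1) Self-generation: let the LM produce candidate training text.
inputs = tok("The weather today", return_tensors="pt")
outputs = lm.generate(**inputs, max_new_tokens=50, do_sample=True,
                      num_return_sequences=8, pad_token_id=tok.eos_token_id)
candidates = tok.batch_decode(outputs, skip_special_tokens=True)

# (2) Filtering: drop candidates the classifier confidently flags as toxic.
def is_nontoxic(text, threshold=0.5):
    result = toxicity(text, truncation=True)[0]
    return result["label"] != "toxic" or result["score"] < threshold

nontoxic = [t for t in candidates if is_nontoxic(t)]

# (3) Domain-adaptive training: fine-tune the LM on `nontoxic` with any
# standard causal-LM training loop (Trainer wiring elided for brevity).
```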

Cited by 1 publication (1 citation statement)
References 20 publications
“…Perplexity: a measure of how well a language model predicts the next word in a sequence [82][83][84][85][86][87][88]. A lower perplexity value indicates that the language model is better at predicting the subsequent word (9).…”
Section: Evaluation Criteria in Language Modeling
confidence: 99%
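For reference, the standard definition of perplexity that the statement above refers to, written for a token sequence $w_1, \dots, w_N$ under a language model $p$ (supplied here for context; this formulation is not quoted from the citing paper):

$$\mathrm{PPL}(w_1, \dots, w_N) = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N} \log p(w_i \mid w_{<i})\right)$$

A model that assigns higher probability to the observed next tokens yields a smaller negative log-likelihood and hence a lower perplexity, which is why lower perplexity indicates better next-word prediction.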