Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume 2021
DOI: 10.18653/v1/2021.eacl-main.124

Civil Rephrases Of Toxic Texts With Self-Supervised Transformers

Abstract: Platforms that support online commentary, from social networks to news sites, are increasingly leveraging machine learning to assist their moderation efforts. But this process does not typically provide feedback to the author that would help them contribute according to the community guidelines. This is prohibitively time-consuming for human moderators to do, and computational approaches are still nascent. This work focuses on models that can help suggest rephrasings of toxic comments in a more civil manner. I…

Cited by 21 publications (31 citation statements). References 40 publications.
“…Research has also been conducted to investigate annotation bias and annotator pools (Al Kuwatly et al., 2020; Waseem, 2016; Ross et al., 2017; Shmueli et al., 2021; Posch et al., 2018), as well as bias (especially racial bias) in existing datasets (Davidson et al., 2019b; Laugier et al., 2021). It was found that data can reflect and propagate annotator bias.…”
Section: Related Work
confidence: 99%
“…A more recent work by Tran et al. (2020) uses a pipeline of models: a search engine finds non-toxic sentences similar to the given toxic ones, an MLM fills the gaps that were not matched in the found sentences, and a seq2seq model edits the generated sentence to make it more fluent. Finally, Laugier et al. (2021) detoxify sentences by fine-tuning T5 as a denoising autoencoder with an additional cycle-consistency loss. Dathathri et al. (2020) and Krause et al. (2020) approach a similar problem: preventing a language model from generating toxic text.…”
Section: Related Work
confidence: 99%
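The denoising-autoencoder setup attributed above to Laugier et al. (2021) can be illustrated by its corruption step: tokens are randomly masked, and the seq2seq model is trained to reconstruct the original sentence from the corrupted one. The sketch below is a minimal illustration, not the paper's actual scheme; the `corrupt` helper, the mask token, and the 15% masking rate are assumptions for the example (T5 uses its own span-corruption procedure).

```python
import random

def corrupt(tokens, mask_token="<mask>", p=0.15, seed=0):
    """Randomly replace tokens with a mask symbol (denoising corruption)."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < p else t for t in tokens]

src = "this comment is completely unacceptable".split()
corrupted = corrupt(src)
# The training pair is (corrupted, src): the model learns to reconstruct
# the clean sequence. A cycle-consistency loss would additionally encourage
# round-tripping a sentence through both styles to recover the original.
```

The cycle-consistency term mentioned in the statement is what makes this usable for unpaired style transfer, since no parallel toxic/civil corpus is required.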
“…J is computed as the average of their sentence-level product. In addition to that, we tried a similar aggregated metric GM (Pang and Gimpel, 2019; Laugier et al., 2021) which uses perplexity as the measure of fluency and employs a different aggregation method.…”
Section: Metrics
confidence: 99%
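The aggregation described in this statement, J as the average of sentence-level products of the component scores, can be sketched as follows. The function name `j_metric`, the score names (style accuracy, content similarity, fluency), and the example values are illustrative assumptions, not taken from the cited paper.

```python
def j_metric(acc, sim, fl):
    """Average over sentences of the product of per-sentence scores:
    style accuracy (acc), content similarity (sim), fluency (fl)."""
    assert len(acc) == len(sim) == len(fl)
    return sum(a * s * f for a, s, f in zip(acc, sim, fl)) / len(acc)

# Three example sentences: the second fails the style check (acc = 0),
# so it contributes nothing to the corpus-level score.
j = j_metric([1.0, 0.0, 1.0], [0.9, 0.8, 0.7], [0.95, 0.9, 0.85])
```

Multiplying before averaging penalizes any sentence that fails on a single axis, which distinguishes this aggregation from the GM variant mentioned in the statement.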
“…In controlled text generation, progress has been made in removing toxic behavior while maximizing fluency (Dathathri et al, 2019). In style transfer, the meaning of a toxic sentence is mapped onto a non-toxic target sentence (Laugier et al, 2021).…”
Section: Introduction
confidence: 99%