Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
DOI: 10.18653/v1/2023.acl-long.493

The CRINGE Loss: Learning what language not to model

Abstract: Standard language model training employs gold human documents or human-human interaction data, and treats all training data as positive examples. Growing evidence shows that even with very large amounts of positive training data, issues remain that can be alleviated with relatively small amounts of negative data, i.e., examples of what the model should not do. In this work, we propose a novel procedure to train with such data called the CRINGE loss (ContRastive Iterative Negative GEneration). We show the effectiveness…
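
The abstract describes training against negative examples with a contrastive objective, but the excerpt above does not spell out the mechanics. The following is only a rough sketch of a contrastive negative-token loss in that spirit: for each token of a sequence labelled as negative, its logit is contrasted against a "positive" token sampled from the model's own top-k predictions at the same position. The function name, the top-k sampling choice, and the value of k are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_negative_loss(logits, negative_tokens, k=5):
    """Sketch of a contrastive loss on a sequence labelled as negative.

    For each position, the token from the negative sequence is contrasted
    against a "positive" token sampled from the model's own top-k
    predictions, pushing probability mass away from the negative token.

    logits:          (seq_len, vocab) model outputs for the negative sequence
    negative_tokens: (seq_len,) token ids of the negative example
    """
    topk_vals, _ = logits.topk(k, dim=-1)                     # (seq_len, k)
    # Sample one contrast logit per position from the model's top-k.
    probs = F.softmax(topk_vals, dim=-1)
    sampled = torch.multinomial(probs, num_samples=1)         # (seq_len, 1)
    pos_logit = topk_vals.gather(-1, sampled).squeeze(-1)     # (seq_len,)
    neg_logit = logits.gather(-1, negative_tokens.unsqueeze(-1)).squeeze(-1)
    # Binary contrastive objective: prefer the sampled token over the negative one.
    pair = torch.stack([pos_logit, neg_logit], dim=-1)        # (seq_len, 2)
    target = torch.zeros_like(negative_tokens)                # index 0 = positive logit
    return F.cross_entropy(pair, target)
```

In practice such a term would be added to the standard cross-entropy loss on positive data, and, as the name "Iterative Negative GEneration" suggests, the procedure would repeatedly generate from the model, label the generations, and feed the undesirable ones back in as new negatives.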

Cited by 2 publications (3 citation statements)
References 19 publications (40 reference statements)

Citation statements (ordered by relevance):

“…Unlikelihood training has been used in controllable text generation applications to avoid undesirable tokens with a high probability (Welleck et al, 2019). CLICK (Zheng et al, 2023), SLiC (Zhao et al, 2022), BRIO and CRINGE (Adolphs et al, 2022) also use unlikelihood training for various text generation applications such as summarization and sentiment control. Despite the popularity of unlikelihood training in text generation, it has not been widely applied to the text style transfer task.…”
Section: Penalizing Negative Examples
Citation type: mentioning
Confidence: 99%
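
The passage above refers to unlikelihood training (Welleck et al.), which penalizes the probability mass a model places on unwanted tokens. A minimal sketch of that term, assuming per-position negative token ids are available; variable names are illustrative:

```python
import torch

def unlikelihood_loss(logits, negative_tokens, eps=1e-8):
    """Unlikelihood term: penalize probability assigned to unwanted tokens.

    logits:          (seq_len, vocab) model outputs
    negative_tokens: (seq_len,) ids of tokens the model should NOT produce
    """
    probs = torch.softmax(logits, dim=-1)
    p_neg = probs.gather(-1, negative_tokens.unsqueeze(-1)).squeeze(-1)
    # -log(1 - p) grows as the model puts more mass on the unwanted token.
    return -torch.log((1.0 - p_neg).clamp_min(eps)).mean()
```

This term is typically mixed with the ordinary maximum-likelihood loss on positive data rather than used on its own.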

“…F1. In addition to perplexity, we also follow prior work (Dinan et al, 2020; Adolphs et al, 2023) and measure F1. Namely, using 2,000 Wikipedia sentences as prompts, we measure the harmonic mean between precision and recall of our model's output, where precision is the fraction of […] Note that our interventions depend on how much we scale each vector (α).…”
Section: Interventions Using Toxic Vectors
Citation type: mentioning
Confidence: 99%
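
The quote above is cut off before it finishes defining precision. For reference, the F1 used in this line of work (e.g., Dinan et al., 2020) is usually a unigram F1 between the model's output and a gold reference; the sketch below assumes simple whitespace tokenization and lower-casing, which may differ from the cited papers' exact preprocessing.

```python
from collections import Counter

def unigram_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (whitespace tokenization)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```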

“…Perhaps most commonly, human feedback data is used (Stiennon et al, 2020; Ouyang et al, 2022; Touvron et al, 2023) for methods such as PPO (Schulman et al, 2017) or DPO (Rafailov et al, 2023). When labels for only undesirable behavior is available, algorithms like unlikelihood training (Welleck et al, 2020) or Cringe (Adolphs et al, 2023; Xu et al, 2023) can be used. We study DPO because it is easy to use and currently widely used.…”
Section: Alignment Algorithms
Citation type: mentioning
Confidence: 99%
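
The passage contrasts preference-based alignment methods with negative-example losses such as unlikelihood and CRINGE. As a point of reference, here is a minimal sketch of the DPO objective it mentions (Rafailov et al., 2023); the beta value and the assumption that per-response log-probabilities are precomputed are illustrative choices.

```python
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is a tensor of summed log-probabilities of a full response
    under the trainable policy or the frozen reference model.
    """
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    # Maximize the margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```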