Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH) 2022
DOI: 10.18653/v1/2022.woah-1.20

Flexible text generation for counterfactual fairness probing

Abstract: A common approach for testing fairness issues in text-based classifiers is through the use of counterfactuals: does the classifier output change if a sensitive attribute in the input is changed? Existing counterfactual generation methods typically rely on wordlists or templates, producing simple counterfactuals that don't take into account grammar, context, or subtle sensitive attribute references, and could miss issues that the wordlist creators had not considered. In this paper, we introduce a task for gener…
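The probing setup the abstract describes (swap a sensitive attribute, check whether the classifier's output moves) can be illustrated with a short sketch. This shows the simple wordlist-and-template baseline the paper argues against, not the paper's own method; `score` is a hypothetical stand-in for any text classifier that returns a probability, and the identity terms are illustrative.

```python
# Minimal sketch of wordlist-based counterfactual fairness probing.
# `score` is a hypothetical placeholder for a real classifier (e.g., a
# toxicity model) returning a probability in [0, 1].
from itertools import combinations

IDENTITY_TERMS = ["christian", "muslim", "jewish", "buddhist"]

def score(text: str) -> float:
    """Stand-in classifier; replace with a real model's predict call."""
    return 0.5  # placeholder value

def max_counterfactual_gap(template: str, terms=IDENTITY_TERMS) -> float:
    """Fill one {identity} slot with each term and compare scores.

    A fair classifier should score all substitutions near-identically,
    so a large maximum pairwise gap flags a potential fairness issue.
    """
    scores = {t: score(template.format(identity=t)) for t in terms}
    return max(abs(scores[a] - scores[b]) for a, b in combinations(terms, 2))

gap = max_counterfactual_gap("I am a {identity} person.")
print(f"max score gap across identities: {gap:.3f}")
```

As the abstract notes, this template style misses grammatical agreement, context, and indirect references to the sensitive attribute, which is the gap the paper's generation task targets.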

Cited by 5 publications (2 citation statements)
References 21 publications (27 reference statements)
“…Large language models (LLMs) are becoming ubiquitous for their ability to solve a wide range of linguistic tasks with prompting that does not require additional model training [1,6,22]. This ability also lets them generate smaller, more refined datasets for finetuning [13,25,27], benchmarking [29], low-resource tasks or languages [4,15], and counterfactual testing (e.g., examples that are identical except for having different religious or gender-based identities [12]).…”
Section: Introduction (mentioning, confidence: 99%)
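The LLM-based counterfactual generation this statement refers to can be sketched as a prompting loop. This is a minimal illustration, not the pipeline from the paper: `generate` is a hypothetical stand-in for any LLM completion API, and the prompt wording is invented for the example.

```python
# Sketch of prompt-based counterfactual generation; `generate` is a
# hypothetical LLM call (wire it to any real completion endpoint).
PROMPT = (
    "Rewrite the sentence so that it refers to a {target} identity, "
    "changing nothing else about the grammar or meaning:\n\n"
    "{sentence}\n\nRewrite:"
)

def generate(prompt: str) -> str:
    """Placeholder for a real LLM completion call."""
    return "<llm rewrite>"  # hypothetical output

def make_counterfactuals(sentence: str, targets: list[str]) -> dict[str, str]:
    """Request one identity-swapped rewrite per target group."""
    return {t: generate(PROMPT.format(target=t, sentence=sentence))
            for t in targets}

pairs = make_counterfactuals(
    "As a Christian parent, I found this film offensive.",
    ["Muslim", "Jewish", "Buddhist"],
)
```

Because the rewrite comes from a model rather than a wordlist, it can handle grammar and context that simple term substitution would break, which is the advantage the citing work highlights.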
“…We address these limitations of human red teaming with a "plug-and-play" AI-assisted Red Teaming (AART) pipeline for generating adversarial testing datasets at scale by minimizing the human effort to only guide the adversarial generation recipe. Our work builds on recent automated red teaming (Perez et al., 2022), synthetic safety data generation (Fryer et al., 2022; Hartvigsen et al., 2022; Bai et al., 2022; Sun et al., 2023), and human-in-the-loop methods. We adapt work on self-consistency (Wang et al., 2023a), chain-of-thought (Kojima et al., 2023; Wei et al., 2022), and structured reasoning and data generation (Wang et al., 2023b; Xu et al., 2023; Creswell and Shanahan, 2022) and creatively apply them to the task of adversarial dataset creation.…”
Section: Introduction (mentioning, confidence: 99%)