Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.311

UNQOVERing Stereotyping Biases via Underspecified Questions

Abstract: Warning: This paper contains examples of stereotypes that are potentially offensive. While language embeddings have been shown to have stereotyping biases, how these biases affect downstream question answering (QA) models remains unexplored. We present UNQOVER, a general framework to probe and quantify biases through underspecified questions. We show that a naïve use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors: positional dependence and question independence. We de…
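To make the probing setup described in the abstract concrete, the following is a minimal, hypothetical sketch of querying an extractive QA model with an underspecified question and swapping the order of the two subjects to surface positional dependence. The model name, context template, and subjects are illustrative assumptions, not the authors' actual UNQOVER templates or bias metric.

```python
# Minimal, hypothetical sketch (not the authors' UNQOVER code): probe an extractive
# QA model with an underspecified question, then swap subject order to surface
# positional dependence. Model name and templates are illustrative assumptions.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

def subject_scores(subj1, subj2, attribute):
    """Score each subject as the answer to a question the context cannot resolve."""
    context = f"{subj1} lives in the same city as {subj2}."  # gives no evidence either way
    question = f"Who {attribute}?"
    candidates = qa(question=question, context=context, top_k=5)  # list of answer spans
    scores = {subj1: 0.0, subj2: 0.0}
    for cand in candidates:
        for subj in (subj1, subj2):
            if subj in cand["answer"]:
                scores[subj] = max(scores[subj], cand["score"])
    return scores

# A model free of positional dependence should give roughly symmetric scores across
# both orderings; systematic asymmetries are what the paper's metrics aim to isolate.
print(subject_scores("John", "Mary", "was a bad driver"))
print(subject_scores("Mary", "John", "was a bad driver"))
```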

Citations: cited by 32 publications (27 citation statements)
References: 28 publications (35 reference statements)
“…While we believe that the proposed methods can be effective in other languages, we leave this exploration for future work. We also acknowledge that QA systems suffer from bias (Li et al., 2020), which often leads to unintended real-world consequences. For the purpose of the shared task, we focused solely on the modeling techniques, but a study of model bias in our systems is necessary.…”
Section: Impact Statement (mentioning)
confidence: 99%
“…(Helm, 2016) observed that the generic query "three [White/Black/Asian] teenagers" brought up different kinds of images on Google: smiling teens selling bibles (White), mug shots (Black), and scantily clad girls (Asian) (Benjamin, 2019). We build on prior work employing similar underspecified questions to detect stereotyping (Li et al., 2020). Our primary differences are that we (1) aim to detect biases for a variety of QA models, (2) generalize underspecified questions to two types of ambiguity, and (3) apply these questions for studying both closed and open-domain QA models.…”
Section: Question Answering (mentioning)
confidence: 99%
“…We define bias as the amplification of existing inequality apparent in knowledge bases and the real world. This may be through exacerbating empirically-observed inequality, e.g., by providing a list of 90% males in an occupation that is 80% male, or when systems transfer learned inequality into scenarios with little information, e.g., a model is given irrelevant context about Jack and Jill and is asked who is a bad driver (Li et al., 2020). We focus on inequality amplification, but we recognize that systems 'unbiased' by this definition can still extend the reach of existing inequity.…”
Section: Introduction (mentioning)
confidence: 99%
“…Text data reflects the social and cultural biases in the world, and NLP models and applications trained on such data have been shown to reproduce and amplify those biases. Discrimination has been identified across diverse sensitive attributes including gender, disability, race, and religion (Caliskan et al., 2017; May et al., 2019; Garimella et al., 2019; Nangia et al., 2020; Li et al., 2020). While early work focused on debiasing typically binarized protected attributes in isolation (e.g., age, gender, or race; Caliskan et al. (2017)), more recent work has adopted a more realistic scenario with multiple sensitive attributes (Li et al., 2018) or attributes covering several classes (Manzini et al., 2019).…”
Section: Introduction (mentioning)
confidence: 99%