“…Text data reflects the social and cultural biases of the world, and NLP models and applications trained on such data have been shown to reproduce and amplify those biases. Discrimination has been identified across diverse sensitive attributes, including gender, disability, race, and religion (Caliskan et al., 2017; May et al., 2019; Garimella et al., 2019; Nangia et al., 2020; Li et al., 2020). While early work typically focused on debiasing a single, often binarized, protected attribute in isolation (e.g., age, gender, or race; Caliskan et al., 2017), more recent work has adopted a more realistic scenario with multiple sensitive attributes (Li et al., 2018) or attributes spanning several classes (Manzini et al., 2019).…”