Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.149

A Survey of Race, Racism, and Anti-Racism in NLP

Abstract: Despite inextricable ties between race and language, little work has considered race in NLP research and development. In this work, we survey 79 papers from the ACL anthology that mention race. These papers reveal various types of race-related bias in all stages of NLP model development, highlighting the need for proactive consideration of how NLP systems can uphold racial hierarchies. However, persistent gaps in research on race and NLP remain: race has been siloed as a niche topic and remains ignored in many…

Cited by 41 publications (40 citation statements) · References 108 publications

Citation statements (ordered by relevance):
“…We make use of pretrained language models to both generate and retrieve text in this work. Representations from pretrained language models are known to cause ethical concerns, such as perpetuating racial or gender bias (Field et al., 2021; Gala et al., 2020). We advise using caution and adopting a post-processing strategy to filter potentially offensive text produced by pretrained language models before releasing text content to users.…”
Section: Ethical Considerations (citation type: mentioning, confidence: 99%)
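The statement above recommends post-processing filtering of generated text but does not specify an implementation. As a minimal sketch, a release pipeline might screen each generation against a blocklist before it reaches users; the `BLOCKLIST` entries and the `is_safe` and `filter_generations` helpers below are hypothetical illustrations, not the cited authors' method. A production system would more likely use a maintained lexicon or a trained toxicity classifier in place of the blocklist.

```python
import re

# Hypothetical blocklist (placeholder entries, not real terms);
# a deployed filter would use a curated lexicon or a classifier.
BLOCKLIST = {"offensive_term_a", "offensive_term_b"}

_WORD_RE = re.compile(r"[a-z'_]+")

def is_safe(text: str) -> bool:
    """Return False if any blocklisted term appears in the text."""
    tokens = set(_WORD_RE.findall(text.lower()))
    return tokens.isdisjoint(BLOCKLIST)

def filter_generations(candidates: list[str]) -> list[str]:
    """Keep only generations that pass the safety check; callers can
    fall back to a canned response when nothing survives."""
    return [text for text in candidates if is_safe(text)]

if __name__ == "__main__":
    outputs = ["a harmless sentence", "a sentence with offensive_term_a"]
    print(filter_generations(outputs))  # -> ['a harmless sentence']
```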
“…Unsurprisingly, gendered and racial disparities have been documented in a number of language technologies [37,79,141], and processes of creating resources and technologies may further entrench such disparities [25,38,144]. For more detail, see [53].…”
Section: Social Context: Social Variation and Language Discrimination (citation type: mentioning, confidence: 99%)
“…The risks of models replicating or worsening harmful biases may grow as we train on ever larger data samples (Bender et al., 2021). Training models on data with representational issues can lead them to treat particular demographic groups unfairly and/or poorly (Barocas et al., 2017; Mehrabi et al., 2021), a problem that is particularly egregious for historically marginalized groups, including people of color (Field et al., 2021), and women (Hendricks et al., 2018). For example, models learned the stereotype that “women like shopping” when they were trained on data where most or all of the shoppers are women, and they learned…”
Section: Introduction (citation type: mentioning, confidence: 99%)
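The “women like shopping” stereotype in the statement above arises from co-occurrence skew in training data. One rough way to surface such skew is to count how often an activity term co-occurs with each set of demographic terms; the toy corpus, term lists, and `cooccurrence_skew` function below are illustrative assumptions, not the cited authors' auditing method.

```python
from collections import Counter

# Illustrative term lists; a real audit would use curated lexicons
# covering many demographic categories, not just binary gender.
FEMALE_TERMS = {"she", "her", "woman", "women"}
MALE_TERMS = {"he", "his", "man", "men"}

def cooccurrence_skew(sentences: list[str], activity: str = "shopping") -> Counter:
    """Count how often `activity` co-occurs with gendered terms.
    A heavily skewed ratio hints at a representational issue that a
    model trained on this data could absorb as a stereotype."""
    counts: Counter = Counter()
    for sent in sentences:
        tokens = set(sent.lower().split())
        if activity in tokens:
            if tokens & FEMALE_TERMS:
                counts["female"] += 1
            if tokens & MALE_TERMS:
                counts["male"] += 1
    return counts

corpus = [
    "She went shopping for groceries.",
    "The women finished their shopping early.",
    "He watched the game.",
]
print(cooccurrence_skew(corpus))  # Counter({'female': 2})
```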