Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP) 2022
DOI: 10.18653/v1/2022.gebnlp-1.13

Fewer Errors, but More Stereotypes? The Effect of Model Size on Gender Bias

Abstract: The size of pretrained models is increasing, and so is their performance on a variety of NLP tasks. However, as their memorization capacity grows, they might pick up more social biases. In this work, we examine the connection between a model's size and its gender bias (specifically, occupational gender bias). We measure bias in three masked language model families (RoBERTa, DeBERTa, and T5) in two setups: directly, using a prompt-based method, and using a downstream task (Winogender). We find on the one hand that lar…
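
To make the first of the two measurement setups concrete, here is a minimal sketch of prompt-based occupational bias probing with a masked LM, assuming a HuggingFace model from one of the studied families. The template, occupation list, and log-ratio bias score are illustrative assumptions for exposition, not the paper's exact prompts or metric.

    # A rough sketch of prompt-based occupational bias probing with a masked LM.
    # The template, occupations, and log-ratio score are illustrative assumptions,
    # not the paper's exact prompts or metric.
    import math
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("roberta-base")  # one of the studied families
    model = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()

    TEMPLATE = "The {occupation} said that {mask} would arrive soon."  # assumed template
    OCCUPATIONS = ["nurse", "engineer", "teacher", "mechanic"]         # assumed subset

    def pronoun_probs(occupation: str) -> tuple[float, float]:
        """Return the model's probabilities for ' he' vs. ' she' in the masked slot."""
        prompt = TEMPLATE.format(occupation=occupation, mask=tok.mask_token)
        enc = tok(prompt, return_tensors="pt")
        mask_pos = (enc.input_ids[0] == tok.mask_token_id).nonzero(as_tuple=True)[0]
        with torch.no_grad():
            probs = model(**enc).logits[0, mask_pos].softmax(-1).squeeze(0)
        # RoBERTa's BPE marks mid-sentence words with a leading space ("Ġhe").
        he_id, she_id = (tok.convert_tokens_to_ids(tok.tokenize(w))[0] for w in (" he", " she"))
        return probs[he_id].item(), probs[she_id].item()

    for occ in OCCUPATIONS:
        p_he, p_she = pronoun_probs(occ)
        # One simple bias score: the log-ratio of male- to female-pronoun probability.
        print(f"{occ:10s} he={p_he:.3f} she={p_she:.3f} log-ratio={math.log(p_he / p_she):+.2f}")

A positive log-ratio under this sketch means the model prefers the male pronoun for the occupation; aggregating the score over many occupations and templates is one common way such prompt-based probes are summarized.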

Cited by 4 publications (3 citation statements) · References 14 publications

“…However, it is interesting to observe that social biases do not necessarily increase with this extra capacity of the MLMs. In the case of gender-related biases, Tal et al. (2022) showed that even if the gender bias scores measured on Winogender (Rudinger et al., 2018) are smaller for the larger MLMs, they make more stereotypical errors with respect to gender. However, whether this observation generalises to all types of social biases remains an open question.…”
Section: Discussion
confidence: 99%
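
The distinction the citing authors draw (fewer Winogender errors overall for larger models, but a higher share of stereotype-consistent mistakes) can be sketched as a simple error breakdown. The field names and classification rule below are hypothetical simplifications, not the dataset's actual schema or the cited paper's scoring code.

    # A minimal sketch, under assumed field names, of splitting Winogender-style
    # coreference errors into stereotype-consistent vs. stereotype-violating ones.
    # `examples` and `predictions` are hypothetical inputs, not an actual
    # Winogender loader or the cited paper's exact scoring code.
    from collections import Counter

    def classify_error(example: dict, predicted_antecedent: str) -> str:
        """Label one wrong prediction. An error counts as 'stereotypical' when
        linking (or refusing to link) the pronoun to the occupation agrees with
        the occupation's stereotyped gender."""
        picked_occupation = predicted_antecedent == example["occupation"]
        genders_match = example["pronoun_gender"] == example["stereotyped_gender"]
        return "stereotypical" if picked_occupation == genders_match else "anti-stereotypical"

    def error_breakdown(examples, predictions) -> Counter:
        counts = Counter()
        for ex, pred in zip(examples, predictions):
            if pred != ex["gold_antecedent"]:  # only errors enter the breakdown
                counts[classify_error(ex, pred)] += 1
        return counts

Under this sketch, the quoted observation corresponds to counts["stereotypical"] / sum(counts.values()) growing with model size even as the total number of errors shrinks.
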
“…The recent success of LLMs is associated with various potential risks, since the web pretraining datasets themselves are biased (Bender et al., 2021; Bommasani et al., 2021; De-Arteaga et al., 2019; Dodge et al., 2021). Tal et al. (2022) show that the risk of biases gets higher as model size increases, causing biases to resurface in downstream tasks such as NLI (Poliak et al., 2018; Sharma et al., 2021), coreference resolution (Rudinger et al., 2018; Zhao et al., 2018), and MT (Stanovsky et al., 2019). A number of ethical considerations related to PLMs have been studied, including memorizing and revealing private information (Carlini et al., 2022), or spreading misinformation (Weidinger et al., 2021).…”
Section: Related Work
confidence: 99%
“…Finally, while the interplay and tradeoff between privacy, efficiency, and fairness in tabular data have received extensive examination (Hooker et al., 2020; Lyu et al., 2020), comparatively fewer studies have been conducted in NLP (Tal et al., 2022; Ahn et al., 2022; Hessenthaler et al., 2022).…”
Section: Introduction
confidence: 99%