2021
DOI: 10.48550/arxiv.2105.04054
Preprint

Societal Biases in Language Generation: Progress and Challenges

Abstract: Technology for language generation has advanced rapidly, spurred by advancements in pre-training large models on massive amounts of data and the need for intelligent agents to communicate in a natural manner. While techniques can effectively generate fluent text, they can also produce undesirable societal biases that can have a disproportionately negative impact on marginalized populations. Language generation presents unique challenges for biases in terms of direct user interaction and the structure of decodi…


Cited by 5 publications (7 citation statements)
References 72 publications (74 reference statements)
“…Distinguishing "statistical bias" from "social bias": Concerns regarding "bias" in language models generally revolve around distributional skews that result in unfavourable impacts for particular social groups (Sheng et al., 2021). We note that there are different definitions of "bias" and "discrimination" in classical statistics compared to sociotechnical studies.…”
Section: Discussion (mentioning)
confidence: 89%
“…Before ChatGPT emerged, extensive academic research was conducted on the ethical risks associated with LLMs, particularly those involved in natural language generation (NLG). These investigations delved into potential societal impacts, highlighting concerns ranging from the confident distribution of inaccurate information to the creation of widespread false news and information [20,21]. With the advent of ChatGPT, concerns have increased, as this study highlighted its risks [22].…”
Section: Large Language Models (mentioning)
confidence: 91%
“…LGMs relates to the social harms that arise from the model performing more poorly for some demographic groups, generating discriminatory speech, or further propagating discriminatory outcomes through the generated text [1,29].…”
Section: Fairness In (mentioning)
confidence: 99%
“…Despite the ever-increasing power of LGMs in generating realistic and cohesive language, they are also susceptible to learning harmful language and encoding undesirable bias across identities that can retain and magnify harmful content and stereotypes [5,28,29,33]. This reality necessitates that both the developers and the ultimate users of an LGM are keenly aware of its ethical risk levels to ensure reliable behavior.…”
Section: Introduction (mentioning)
confidence: 99%